How to remove ending tab separator in streaming output
Hi: In a streaming MR program, I use /bin/cat as the mapper and set reducer=NONE, but the outputs all end with a tab separator. How do I configure the streaming MR program to remove the trailing separator? Thanks.
Secondary Namenode on hadoop 0.20.205 ?
Hey people, How can we set up another machine in the cluster as the Secondary Namenode in hadoop 0.20.205? Can a DN also act as the SNN? Any pros and cons of having this configuration? Thanks, Praveenesh
RE: Hadoop configuration
Hey Humayun, It looks like your hostname is still not resolving properly. Even though you configured the hostnames as master, slave, etc., it is still picking up "humayun" as the hostname. Just edit the /etc/HOSTNAME file with the correct hostname you are expecting here. To confirm whether it is resolving properly or not, you can do the steps below:
#hostname    // should print the correct hostname here (ex: master)
#hostname -i // should resolve to the correct IP here (ex: the master's IP)
Also make sure slave and slave1 are pingable from each other.
Regards, Uma

From: Humayun kabir [humayun0...@gmail.com]
Sent: Saturday, December 24, 2011 9:51 PM
To: common-user@hadoop.apache.org
Subject: Re: Hadoop configuration

I've checked my log files, but I don't understand why this error occurs. Here are my log files; please give me some suggestions.
jobtracker.log http://paste.ubuntu.com/781181/
namenode.log http://paste.ubuntu.com/781183/
datanode.log (1st machine) http://paste.ubuntu.com/781176/
datanode.log (2nd machine) http://paste.ubuntu.com/781195/
tasktracker.log (1st machine) http://paste.ubuntu.com/781192/
tasktracker.log (2nd machine) http://paste.ubuntu.com/781197/

On 24 December 2011 15:26, Joey Krabacher jkrabac...@gmail.com wrote: Have you checked your log files for any clues? --Joey

On Sat, Dec 24, 2011 at 3:15 AM, Humayun kabir humayun0...@gmail.com wrote: Hi Uma, Thank you very much for your tips. We tried it on 3 nodes in VirtualBox as you suggested, but we are still facing the problem. Here are all our configuration files for the nodes; please take a look and show us some ways to solve it. It would be great if you could help us with this.
core-site.xml http://pastebin.com/Twn5edrp
hdfs-site.xml http://pastebin.com/k4hR4GE9
mapred-site.xml http://pastebin.com/gZuyHswS
/etc/hosts http://pastebin.com/5s0yhgnj
output http://paste.ubuntu.com/780807/
Hope you will understand and extend your helping hand towards us. Have a nice day.
Regards, Humayun

On 23 December 2011 17:31, Uma Maheswara Rao G mahesw...@huawei.com wrote: Hi Humayun, Let's assume you have JT, TT1, TT2, TT3. Now you should configure /etc/hosts like the example below:
10.18.xx.1 JT
10.18.xx.2 TT1
10.18.xx.3 TT2
10.18.xx.4 TT3
Configure the same set on all the machines, so that all task trackers can talk to each other with hostnames correctly. Also, please remove these entries from your files:
127.0.0.1 localhost.localdomain localhost
127.0.1.1 humayun
I have seen that others have already suggested many links for the regular configuration items; hope you are clear about them. Hope it will help.
Regards, Uma

From: Humayun kabir [humayun0...@gmail.com]
Sent: Thursday, December 22, 2011 10:34 PM
To: common-user@hadoop.apache.org; Uma Maheswara Rao G
Subject: Re: Hadoop configuration

Hello Uma, Thanks for your cordial and quick reply. It would be great if you explained what you suggested doing. Right now we are running on the following configuration. We are using Hadoop on VirtualBox. When it is a single node, it works fine even for datasets larger than the default block size, but in the case of a multi-node cluster (2 nodes) we are facing some problems. We are able to ping both Master-to-Slave and Slave-to-Master. When the input dataset is smaller than the default block size (64 MB) it works fine, but when the input dataset is larger than the default block size it shows 'too many fetch failures' in the reduce phase. Here is the output link: http://paste.ubuntu.com/707517/ and this is our /etc/hosts file:
192.168.60.147 humayun # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
::1 humayun localhost6.localdomain6 localhost6
127.0.1.1 humayun
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
192.168.60.1 master
192.168.60.2 slave
Regards, -Humayun.
On 22 December 2011 15:47, Uma Maheswara Rao G mahesw...@huawei.com wrote: Hey Humayun, To solve the 'too many fetch failures' problem, you should configure the host mapping correctly. Each tasktracker should be able to ping every other one. Regards, Uma
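The hostname checks Uma describes in this thread can be collected into a small sanity script. This is only a sketch of those steps (nothing here is Hadoop-specific; names like "master" and "slave" are the thread's examples, not fixed values):

```shell
# Sketch of the hostname-resolution sanity checks suggested in this thread.
# Run it on each node of the cluster.
name=$(hostname)                                    # should print this node's configured hostname
addr=$(hostname -i 2>/dev/null || echo unresolved)  # should be the node's real IP, not 127.0.x.1
echo "hostname=$name ip=$addr"
case "$addr" in
  127.*|unresolved) echo "WARNING: hostname resolves to a loopback/unknown address" ;;
esac
```

If the warning fires, the node's name maps to a loopback address (like the 127.0.1.1 humayun entry in the /etc/hosts file above), which is exactly the condition that leads tasktrackers to advertise unreachable addresses and to 'too many fetch failures'.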
RE: Secondary Namenode on hadoop 0.20.205 ?
Hey Praveenesh, You can also start the secondary namenode on its own by just running ./hadoop secondarynamenode. The DN process cannot act as a secondary namenode: the basic work of the secondary namenode is to do checkpointing and keep the edits in sync with the Namenode up to the last checkpointing period, while the DN's job is to physically store the real data blocks. You also need to configure the correct namenode HTTP address for the secondary NN, so that it can connect to the NN for checkpointing operations. http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html#Secondary+NameNode You can configure the secondary node's IP in the masters file; start-dfs.sh will then start the SNN automatically, just as it starts the DN and NN. Also see http://www.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/ Regards, Uma

From: praveenesh kumar [praveen...@gmail.com]
Sent: Monday, December 26, 2011 5:05 PM
To: common-user@hadoop.apache.org
Subject: Secondary Namenode on hadoop 0.20.205 ?
Re: Secondary Namenode on hadoop 0.20.205 ?
(Answering beyond Uma's reply) "Can a DN also act as SNN, any pros and cons of having this configuration?" You can run the SNN on a regular slave box if you can't dedicate a box to it; that shouldn't be an issue for small clusters. Do ensure its disk configuration is proper and that it is allocated nearly the same heap as the NameNode. For large clusters, where the fsimage and periodic edits files grow larger, it would be worth placing the SNN on a separate box, given its interactions.
Re: Hadoop configuration
Hi Uma, Thanks a lot. At last it is running without errors. Thank you very much for your suggestion.
Re: How to remove ending tab separator in streaming output
Bcc'ing common-user and cc'ing mapred-user; please use the correct mailing list for your questions. You can use -D stream.map.output.field.separator= to specify the separator. The link below should have more information. http://hadoop.apache.org/common/docs/r0.20.205.0/streaming.html#Customizing+How+Lines+are+Split+into+Key%2FValue+Pairs Hope that helps. Thanks, Mahadev
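For reference, those options would appear in a streaming invocation roughly as below. This is an untested sketch (the jar path, input/output paths, and field counts are placeholders, not values from the thread). The trailing tab itself comes from TextOutputFormat: with reducer=NONE, streaming splits each map output line at the configured separator; if the separator is absent, the whole line becomes the key with an empty value, and TextOutputFormat appends its own key/value separator, a tab, after the key.

```shell
# Hypothetical streaming job: /bin/cat mapper, no reducer.
# stream.map.output.field.separator: character where each map output line is
#   split into key and value (the default is a tab).
# stream.num.map.output.key.fields: how many separator-delimited fields form
#   the key.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -D stream.map.output.field.separator=. \
    -D stream.num.map.output.key.fields=4 \
    -input  /user/example/in \
    -output /user/example/out \
    -mapper /bin/cat \
    -reducer NONE
```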
anyone upgrade from 0.21 to 0.22?
hello all, I have a running hadoop 0.21.0 cluster; since 0.22 is the stable version and we don't use Kerberos, I just want to be sure: has anybody upgraded from hadoop 0.21 to 0.22 successfully? Any hint would be greatly appreciated! -- best wishes, Steven
Re: Secondary Namenode on hadoop 0.20.205 ?
Thanks. But my 1st question is still unanswered. I have 8 DN/TT machines and 1 NN machine, and I want to set up one of my DN/TT machines as the SNN. How do I have to configure my conf/*.xml files to achieve this? Thanks, Praveenesh
Re: Secondary Namenode on hadoop 0.20.205 ?
The link Uma passed already covers that question: http://www.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/ [dfs.http.address in hdfs-site.xml pointing to NN_HOST:50070 should do.] Also, if you are using the tarball start/stop scripts, putting the SNN's hostname in the conf/masters list is sufficient to get it auto-started there.
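As a concrete sketch of the two settings described above (the host names are placeholders, not values from the thread): on the DN/TT box chosen as the SNN, hdfs-site.xml carries the NameNode's HTTP address so the SNN knows where to fetch the image and edits from.

```xml
<!-- hdfs-site.xml on the box acting as SNN (sketch; namenode-host is a placeholder) -->
<property>
  <name>dfs.http.address</name>
  <value>namenode-host:50070</value>
</property>
```

The conf/masters file on the node where you run start-dfs.sh then contains just the SNN's hostname (one host per line), and the script launches the SNN there alongside the NN and DNs.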
Re: Secondary Namenode on hadoop 0.20.205 ?
Yes, checkpoints are helpful when your original NN image goes corrupt (very, very rare if you use two or more dfs.name.dir locations to be safe).

On 27-Dec-2011, at 12:33 PM, praveenesh kumar wrote: Cool. I just did that, and now I am seeing my fsimage file in the SNN's hadoop.tmp.dir. So in case my NN goes down, I can take this image file from the SNN, paste it at the NN's dfs.name.dir/current/fsimage, and have the NN up based on the last snapshot that the SNN had, right? Thanks, Praveenesh
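For the recovery scenario discussed above, 0.20.x also documents a path that avoids hand-copying the fsimage into dfs.name.dir. This is only a sketch with placeholder host and directory names, assuming the NN's dfs.name.dir is empty or lost and fs.checkpoint.dir on the NN points at the copied checkpoint:

```shell
# Hypothetical recovery from the SNN's last checkpoint (0.20.x sketch).
# 1. Copy the SNN's checkpoint directory to the NN host (placeholder paths):
scp -r snn-host:/app/hadoop/checkpoint nn-host:/app/hadoop/checkpoint
# 2. On the NN, with fs.checkpoint.dir pointing at that directory and an
#    empty dfs.name.dir, import the checkpoint instead of copying by hand:
hadoop namenode -importCheckpoint
```

Note that state written after the last checkpoint is still lost either way; the import only restores the namespace as of the SNN's most recent checkpoint.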