How to remove ending tab separator in streaming output

2011-12-26 Thread devdoer bird
HI:

In a streaming MR program, I use /bin/cat as the mapper and set reducer=NONE,
 but the outputs all end with a tab separator. How do I configure the
streaming MR program to remove the ending separator?

Thanks.


Secondary Namenode on hadoop 0.20.205 ?

2011-12-26 Thread praveenesh kumar
Hey people,

How can we set up another machine in the cluster as the Secondary Namenode
in hadoop 0.20.205 ?
Can a DN also act as the SNN, and what are the pros and cons of this configuration ?

Thanks,
Praveenesh


RE: Hadoop configuration

2011-12-26 Thread Uma Maheswara Rao G
Hey Humayun,
 It looks like your hostname is still not resolving properly. Even though you configured 
the hostnames as master, slave, etc., it is still picking up humayun as the hostname. 
Just edit the /etc/HOSTNAME file with the correct hostname you are expecting here.
To confirm whether it is resolving properly or not, you can do the below steps:
#hostname
  // should print the correct hostname (ex: master)
#hostname -i
  // should resolve to the correct IP (ex: master's IP)


and make sure slave and slave1 can ping each other.
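As an illustrative sketch (the host names master/slave/slave1 are taken from this thread; adjust for your cluster), the resolution checks above could be scripted like this:

```shell
#!/bin/sh
# Hypothetical check script: verify that each cluster hostname resolves
# to its LAN IP (not a 127.x loopback entry from /etc/hosts).
for h in master slave slave1; do
  ip=$(getent hosts "$h" | awk '{print $1}')
  echo "$h -> ${ip:-UNRESOLVED}"
  case "$ip" in
    127.*) echo "WARNING: $h resolves to loopback; fix /etc/hosts" ;;
  esac
done
```

Run it on every node; each hostname should print the same LAN IP everywhere.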

Regards,
Uma


From: Humayun kabir [humayun0...@gmail.com]
Sent: Saturday, December 24, 2011 9:51 PM
To: common-user@hadoop.apache.org
Subject: Re: Hadoop configuration

I've checked my log files, but I don't understand why this error occurs.
Here are my log files. Please give me some suggestions.

jobtracker.log  http://paste.ubuntu.com/781181/ 

namenode.log  http://paste.ubuntu.com/781183/ 

datanode.log(1st machine)  http://paste.ubuntu.com/781176/ 

datanode.log(2nd machine)   http://paste.ubuntu.com/781195/ 


tasktracker.log(1st machine)  http://paste.ubuntu.com/781192/ 

tasktracker.log(2nd machine)  http://paste.ubuntu.com/781197/ 



On 24 December 2011 15:26, Joey Krabacher jkrabac...@gmail.com wrote:

 have you checked your log files for any clues?

 --Joey

 On Sat, Dec 24, 2011 at 3:15 AM, Humayun kabir humayun0...@gmail.com
 wrote:
  Hi Uma,
 
  Thank you very much for your tips. We tried it on 3 nodes in VirtualBox as
  you suggested, but we are still facing the problem. Here are all our
  configuration files for all the nodes. Please take a look and show us some
  ways to solve it. It would be great if you could help us in this regard.
 
  core-site.xml  http://pastebin.com/Twn5edrp 
  hdfs-site.xml  http://pastebin.com/k4hR4GE9 
  mapred-site.xml  http://pastebin.com/gZuyHswS 
 
  /etc/hosts  http://pastebin.com/5s0yhgnj 
 
  output  http://paste.ubuntu.com/780807/ 
 
 
  Hope you will understand and extend your helping hand towards us.
 
  Have a nice day.
 
  Regards
  Humayun
 
  On 23 December 2011 17:31, Uma Maheswara Rao G mahesw...@huawei.com
 wrote:
 
  Hi Humayun ,
 
   Lets assume you have JT, TT1, TT2, TT3
 
   Now you should configure the /etc/hosts like the below example
 
   10.18.xx.1 JT
 
   10.18.xx.2 TT1
 
   10.18.xx.3 TT2
 
   10.18.xx.4 TT3
 
Configure the same set in all the machines, so that all task trackers
  can talk to each other with hostnames correctly. Also please remove these
  entries from your files:
 
127.0.0.1 localhost.localdomain localhost
 
127.0.1.1 humayun
 
 
 
  I have seen others have already suggested many links for the regular
  configuration items. Hope those are clear to you.
 
  hope it will help...
 
  Regards,
 
  Uma
 
  
 
  From: Humayun kabir [humayun0...@gmail.com]
  Sent: Thursday, December 22, 2011 10:34 PM
  To: common-user@hadoop.apache.org; Uma Maheswara Rao G
  Subject: Re: Hadoop configuration
 
  Hello Uma,
 
  Thanks for your cordial and quick reply. It would be great if you could
  explain what you suggested. Right now we are running with the following
  configuration.
 
  We are using hadoop on VirtualBox. When it is a single node it works
  fine for big datasets larger than the default block size, but in the case
  of a multinode cluster (2 nodes) we are facing some problems. We are able
  to ping both Master-Slave and Slave-Master.
  When the input dataset is smaller than the default block size (64 MB)
  it works fine, but when the input dataset is larger than the default
  block size it shows ‘too many fetch failures’ in the reduce stage.
  here is the output link
  http://paste.ubuntu.com/707517/
 
  this is our /etc/hosts file
 
  192.168.60.147 humayun # Added by NetworkManager
  127.0.0.1 localhost.localdomain localhost
  ::1 humayun localhost6.localdomain6 localhost6
  127.0.1.1 humayun
 
  # The following lines are desirable for IPv6 capable hosts
  ::1 localhost ip6-localhost ip6-loopback
  fe00::0 ip6-localnet
  ff00::0 ip6-mcastprefix
  ff02::1 ip6-allnodes
  ff02::2 ip6-allrouters
  ff02::3 ip6-allhosts
 
  192.168.60.1 master
  192.168.60.2 slave
 
 
  Regards,
 
  -Humayun.
 
 
  On 22 December 2011 15:47, Uma Maheswara Rao G mahesw...@huawei.com
  mailto:mahesw...@huawei.com wrote:
  Hey Humayun,
 
   To solve the too many fetch failures problem, you should configure host
  mapping correctly.
  Each tasktracker should be able to ping every other tasktracker.
 
  Regards,
  Uma
  
  From: Humayun kabir [humayun0...@gmail.commailto:humayun0...@gmail.com
 ]
  Sent: Thursday, December 22, 2011 2:54 PM
  To: common-user@hadoop.apache.orgmailto:common-user@hadoop.apache.org
  Subject: 

RE: Secondary Namenode on hadoop 0.20.205 ?

2011-12-26 Thread Uma Maheswara Rao G
Hey Praveenesh,
  
  You can start the secondary namenode by just giving the option: ./hadoop 
secondarynamenode
 
A DN cannot act as a secondary namenode. The basic work of the secondary namenode is to 
do checkpointing and keep the edits in sync with the Namenode up to the last 
checkpointing period. A DN's job is to store the real data blocks physically.
  You also need to configure the correct namenode HTTP address for the secondary NN, 
so that it can connect to the NN for checkpointing operations. 
 
http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html#Secondary+NameNode
You can configure the secondary node's IP in the masters file; start-dfs.sh itself will 
start the SNN automatically, as it starts the DN and NN as well.

also you can see 
http://www.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/

Regards,
Uma

From: praveenesh kumar [praveen...@gmail.com]
Sent: Monday, December 26, 2011 5:05 PM
To: common-user@hadoop.apache.org
Subject: Secondary Namenode on hadoop 0.20.205 ?

Hey people,

How can we setup another machine in the cluster as Secondary Namenode
in hadoop 0.20.205 ?
Can a DN also act as SNN, any pros and cons of having this configuration ?

Thanks,
Praveenesh


Re: Secondary Namenode on hadoop 0.20.205 ?

2011-12-26 Thread Harsh J
(Answering beyond Uma's reply)

 Can a DN also act as SNN, any pros and cons of having this configuration ?

You can run the SNN on a regular slave box if you can't dedicate a box to it; it 
shouldn't be an issue for small clusters -- do ensure its disk configuration is 
proper, and that it's allocated nearly the same heap as the NameNode.

For large clusters where the fsimage and periodic edits file sizes are larger, 
it would be worth placing it on a separate box given SNN's interactions.

On 26-Dec-2011, at 7:53 PM, Uma Maheswara Rao G wrote:

 Hey Praveenesh,
 
  You can start the secondary namenode by just giving the option: ./hadoop 
 secondarynamenode
 
 A DN cannot act as a secondary namenode. The basic work of the secondary namenode is 
 to do checkpointing and keep the edits in sync with the Namenode up to the last 
 checkpointing period. A DN's job is to store the real data blocks physically.
  You also need to configure the correct namenode HTTP address for the 
 secondary NN, so that it can connect to the NN for checkpointing operations. 
 http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html#Secondary+NameNode
 You can configure the secondary node's IP in the masters file; start-dfs.sh itself will 
 start the SNN automatically, as it starts the DN and NN as well.
 
 also you can see 
 http://www.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/
 
 Regards,
 Uma
 
 From: praveenesh kumar [praveen...@gmail.com]
 Sent: Monday, December 26, 2011 5:05 PM
 To: common-user@hadoop.apache.org
 Subject: Secondary Namenode on hadoop 0.20.205 ?
 
 Hey people,
 
 How can we setup another machine in the cluster as Secondary Namenode
 in hadoop 0.20.205 ?
 Can a DN also act as SNN, any pros and cons of having this configuration ?
 
 Thanks,
 Praveenesh



Re: Hadoop configuration

2011-12-26 Thread Humayun kabir
Hi Uma,
Thanks a lot. At last it is running without errors. Thank you very much for
your suggestion.

On 26 December 2011 20:04, Uma Maheswara Rao G mahesw...@huawei.com wrote:

 Hey Humayun,
  It looks like your hostname is still not resolving properly. Even though you
  configured the hostnames as master, slave, etc., it is still picking up
  humayun as the hostname.
  Just edit the /etc/HOSTNAME file with the correct hostname you are expecting
  here.
  To confirm whether it is resolving properly or not, you can do the below
  steps:
 #hostname
   // should print the correct hostname (ex: master)
 #hostname -i
   // should resolve to the correct IP (ex: master's IP)


  and make sure slave and slave1 can ping each other.

 Regards,
 Uma

 

Re: How to remove ending tab separator in streaming output

2011-12-26 Thread Mahadev Konar
Bccing common-user and ccing mapred-user. Please use the correct
mailing lists for your questions.

You can use -Dstream.map.output.field.separator=
 for specifying the separator.

  The link below should have more information.

http://hadoop.apache.org/common/docs/r0.20.205.0/streaming.html#Customizing+How+Lines+are+Split+into+Key%2FValue+Pairs
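As an illustrative sketch (the jar path and the empty-separator workaround via mapred.textoutputformat.separator are my assumptions, not from this thread; verify them against your version's streaming docs), a map-only job could be submitted like this:

```shell
# Hypothetical invocation: a map-only streaming job (reducer=NONE).
# TextOutputFormat writes key + separator + value on each line; with an
# empty value this leaves a trailing tab, so one workaround is to set
# the text output separator itself to an empty string.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-0.20.205.0.jar \
  -D mapred.reduce.tasks=0 \
  -D mapred.textoutputformat.separator="" \
  -input in/ \
  -output out/ \
  -mapper /bin/cat
```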

Hope that helps.

thanks
mahadev


On Mon, Dec 26, 2011 at 2:42 AM, devdoer bird devdo...@gmail.com wrote:

 HI:

 In a streaming MR program, I use /bin/cat as the mapper and set reducer=NONE,
  but the outputs all end with a tab separator. How do I configure the
 streaming MR program to remove the ending separator?

 Thanks.


anyone upgrade from 0.21 to 0.22?

2011-12-26 Thread steven zhuang
hello all,
I have a running hadoop 0.21.0 cluster. Since 0.22 is the stable version
and we don't use Kerberos, I just want to be sure: has anybody upgraded from
hadoop 0.21 to 0.22 successfully?
Any hint would be greatly appreciated!



-- 
best wishes.
Steven


Re: Secondary Namenode on hadoop 0.20.205 ?

2011-12-26 Thread praveenesh kumar
Thanks. But my 1st question is still unanswered.
I have 8 DN/TT machines and 1 NN machine.
I want to set one of my DN/TT machines as the SNN.
How do I have to configure my conf/*.xml files to achieve this ?

Thanks,
Praveenesh

On Mon, Dec 26, 2011 at 8:44 PM, Harsh J ha...@cloudera.com wrote:
 (Answering beyond Uma's reply)

 Can a DN also act as SNN, any pros and cons of having this configuration ?

 You can run the SNN on a regular slave box if you can't dedicate a box to it; it 
 shouldn't be an issue for small clusters -- do ensure its disk configuration 
 is proper, and that it's allocated nearly the same heap as the NameNode.

 For large clusters where the fsimage and periodic edits file sizes are 
 larger, it would be worth placing it on a separate box given SNN's 
 interactions.




Re: Secondary Namenode on hadoop 0.20.205 ?

2011-12-26 Thread Harsh J
The link Uma passed already covered that question: 
http://www.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/
 [dfs.http.address in hdfs-site.xml pointing to NN_HOST:50070 should do.]

Also, if you are using the tarball start/stop scripts, putting in the hostname 
for SNN in the conf/masters list is sufficient to get it auto-started there.
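A minimal sketch of that setup (dfs.http.address is from the linked post; nn-host and snn-host are placeholders):

```shell
# Hypothetical sketch; nn-host and snn-host are placeholders.
# 1) On the DN/TT box chosen as SNN, add the NN's HTTP address inside
#    <configuration> in conf/hdfs-site.xml:
#      <property>
#        <name>dfs.http.address</name>
#        <value>nn-host:50070</value>
#      </property>
# 2) On the machine the start scripts run from, name the SNN host in
#    conf/masters so start-dfs.sh launches the SNN there:
echo "snn-host" > conf/masters
# 3) Restart HDFS; the SNN should then begin periodic checkpoints.
bin/stop-dfs.sh && bin/start-dfs.sh
```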

On 27-Dec-2011, at 11:36 AM, praveenesh kumar wrote:

 Thanks. But my 1st question is still unanswered.
 I have 8 DN/TT machines and 1 NN machine.
 I want to set one of my DN/TT machines as the SNN.
 How do I have to configure my conf/*.xml files to achieve this ?
 
 Thanks,
 Praveenesh
 



Re: Secondary Namenode on hadoop 0.20.205 ?

2011-12-26 Thread Harsh J
Yes, checkpoints are helpful when your original NN image goes corrupt (very 
rare if you use two or more dfs.name.dir locations to be safe).
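For completeness, a hedged sketch of the recovery path being described (the -importCheckpoint option is documented in the 0.20 HDFS user guide; the steps around it are my reading, and paths are placeholders):

```shell
# Hypothetical recovery sketch, run on the replacement NN:
# 1) copy the SNN's checkpoint directory to this machine, and point
#    fs.checkpoint.dir in the NN's configuration at that copy;
# 2) with an empty dfs.name.dir, import the checkpoint:
bin/hadoop namenode -importCheckpoint
# The NN loads the image from fs.checkpoint.dir, saves it into
# dfs.name.dir, and comes up from the last checkpoint the SNN had.
```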

On 27-Dec-2011, at 12:33 PM, praveenesh kumar wrote:

 Cool.
 I just did that.
 So now I am seeing my fsimage file in the SNN's hadoop.tmp.dir.
 So in case my NN goes down, I can take this image file from the SNN, paste
 it at the NN's dfs.name.dir/current/fsimage,
 and I can have the NN up based on the last snapshot that the SNN had, right ?
 
 Thanks,
 Praveenesh
 