Virtual memory problems on Ubuntu 12.04 (a.k.a. MALLOC_ARENA_MAX or HADOOP-7154)

2012-10-25 Thread Henning Blohm
Recently I have installed data nodes on Ubuntu 12.04 and observed failing
M/R jobs with errors like this:

Diagnostics report from attempt_1351154628597_0002_m_00_0: Container
[pid=14529,containerID=container_1351154628597_0002_01_02] is running
beyond virtual memory limits. Current usage: 124.4mb of 1.0gb physical
memory used; 2.1gb of 2.1gb virtual memory used. Killing container.
Dump of the process-tree for container_1351154628597_0002_01_02 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 14529 13550 14529 14529 (java) 678 18 2265411584 31856
/home/gd/gd/jdk1.6.0_35/bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Xmx1000M -XX:MaxPermSize=512M
-Djava.io.tmpdir=/home/gd/gd/gi-de-nosql.cdh4-base/data/yarn/usercache/gd/appcache/application_1351154628597_0002/container_1351154628597_0002_01_02/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.mapreduce.container.log.dir=/home/gd/gd/gi-de-nosql.cdh4-base/logs/application_1351154628597_0002/container_1351154628597_0002_01_02
-Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
org.apache.hadoop.mapred.YarnChild 192.168.178.25 36183
attempt_1351154628597_0002_m_00_0 2

I am using CDH4.0.1 (Hadoop 2.0.0) with the YARN M/R implementation on
Ubuntu 12.04 64-bit.

According to HADOOP-7154, making sure MALLOC_ARENA_MAX=1 (or 4) is exported
should fix the issue.

I tried the following:

Exporting the environment variable MALLOC_ARENA_MAX with value 1 in all
Hadoop shell scripts (e.g. yarn-env.sh). Checking the launch_container.sh
script that YARN creates, I can tell that it indeed contains the line

export MALLOC_ARENA_MAX=1

But still I am getting the error above.

In addition I tried adding

<property>
   <name>mapred.child.env</name>
   <value>MALLOC_ARENA_MAX=1</value>
</property>

to mapred-site.xml. But that didn't seem to fix it either.
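
For reference, on the YARN side the per-task environment is normally driven by the newer
property names; a sketch of what could also be tried in mapred-site.xml, assuming the
Hadoop 2 names mapreduce.map.env, mapreduce.reduce.env and yarn.app.mapreduce.am.env
apply to this CDH4 build:

<property>
   <name>mapreduce.map.env</name>
   <value>MALLOC_ARENA_MAX=1</value>
</property>
<property>
   <name>mapreduce.reduce.env</name>
   <value>MALLOC_ARENA_MAX=1</value>
</property>
<property>
   <name>yarn.app.mapreduce.am.env</name>
   <value>MALLOC_ARENA_MAX=1</value>
</property>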

Is there anything special that I need to configure on the server to make
the setting effective?

Any ideas would be great!

Thanks,
  Henning


Re: Virtual memory problems on Ubuntu 12.04 (a.k.a. MALLOC_ARENA_MAX or HADOOP-7154)

2012-10-25 Thread Henning Blohm
I could not get MALLOC_ARENA_MAX to make any difference. Neither .bashrc nor 
any env script seemed to have any impact.


Made jobs work again by setting yarn.nodemanager.vmem-pmem-ratio=10. Now 
they probably run with some obscene and unnecessary vmem allocation 
(which I read does not come for free with the new malloc). What a crappy 
situation (and change) :-(
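
For reference, a minimal yarn-site.xml sketch of that workaround (assuming the default 
ratio of 2.1 is what produced the 2.1gb limit quoted above):

<property>
   <name>yarn.nodemanager.vmem-pmem-ratio</name>
   <value>10</value>
</property>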


Thanks,
  Henning

On 10/25/2012 11:47 AM, Henning Blohm wrote:
Recently I have installed data nodes on Ubuntu 12.04 and observed 
failing M/R jobs with errors like this:


Diagnostics report from attempt_1351154628597_0002_m_00_0: 
Container 
[pid=14529,containerID=container_1351154628597_0002_01_02] is 
running beyond virtual memory limits. Current usage: 124.4mb of 1.0gb 
physical memory used; 2.1gb of 2.1gb virtual memory used. Killing 
container.

Dump of the process-tree for container_1351154628597_0002_01_02 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 14529 13550 14529 14529 (java) 678 18 2265411584 31856 
/home/gd/gd/jdk1.6.0_35/bin/java -Djava.net.preferIPv4Stack=true 
-Dhadoop.metrics.log.level=WARN -Xmx1000M -XX:MaxPermSize=512M 
-Djava.io.tmpdir=/home/gd/gd/gi-de-nosql.cdh4-base/data/yarn/usercache/gd/appcache/application_1351154628597_0002/container_1351154628597_0002_01_02/tmp 
-Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.mapreduce.container.log.dir=/home/gd/gd/gi-de-nosql.cdh4-base/logs/application_1351154628597_0002/container_1351154628597_0002_01_02 
-Dyarn.app.mapreduce.container.log.filesize=0 
-Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 
192.168.178.25 36183 attempt_1351154628597_0002_m_00_0 2


I am using CDH4.0.1 (hadoop 2.0.0) with the Yarn M/R implementation on 
Ubuntu 12.04 64Bit.


According to HADOOP-7154 making sure MALLOC_ARENA_MAX=1 (or 4) is 
exported should fix the issue.


I tried the following:

Exporting the environment variable MALLOC_ARENA_MAX with value 1 in 
all Hadoop shell scripts (e.g. yarn-env.sh). Checking the 
launch_container.sh script that YARN creates, I can tell that it indeed 
contains the line


export MALLOC_ARENA_MAX=1

But still I am getting the error above.

In addition I tried adding

<property>
   <name>mapred.child.env</name>
   <value>MALLOC_ARENA_MAX=1</value>
</property>

to mapred-site.xml. But that didn't seem to fix it either.

Is there anything special that I need to configure on the server to 
make the setting effective?


Any idea would be great!!

Thanks,
  Henning




Re: ERROR:: SSH failour for distributed node hadoop cluster

2012-10-25 Thread Mohammad Tariq
Make sure the username on both machines is the same. Also, have you copied the
public key to the slave machine?

Regards,
Mohammad Tariq



On Thu, Oct 25, 2012 at 1:58 PM, yogesh.kuma...@wipro.com wrote:

  Hi all,

 I am trying to run the command

 ssh Master

 it runs and shows, after entering the password:
 Password: abc
 Last login: Thu Oct 25 13:51:06 2012 from master

 But ssh for Slave throws an error.



 ssh Slave

 it asks for the password and denies access:

 Password: abc
 Password: abc
 Password: abc
 Permission denied (publickey,keyboard-interactive).

 Please suggest.
 Thanks & regards
 Yogesh Kumar




Re: ERROR:: SSH failour for distributed node hadoop cluster

2012-10-25 Thread Mohammad Tariq
Make sure the username is the same on both machines. And you can copy the key
manually as well.
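
For example, a manual copy could look like this (a sketch, assuming the pluto@Slave
user/host from this thread):

cat ~/.ssh/id_rsa.pub | ssh pluto@Slave 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'

After that, ssh pluto@Slave should log in without prompting for a password.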

Regards,
Mohammad Tariq



On Thu, Oct 25, 2012 at 2:46 PM, yogesh.kuma...@wipro.com wrote:

  Hi Mohammad,

 It was the first issue. I have tried to copy the key by using the command

 ssh-copy-id -i $HOME/.ssh/id_rsa.pub pluto@slave

 but it showed an error.

 Master:~ mediaadmin$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub pluto@Slave
 -bash: ssh-copy-id: command not found

 Why is it so??

 Regards
 Yogesh Kumar

  --
 *From:* Mohammad Tariq [donta...@gmail.com]
 *Sent:* Thursday, October 25, 2012 2:01 PM
 *To:* user@hadoop.apache.org
 *Subject:* Re: ERROR:: SSH failour for distributed node hadoop cluster

  Make sure username on both the machines is same. Also, have you copied
 the public key to the slave machine?

 Regards,
 Mohammad Tariq



 On Thu, Oct 25, 2012 at 1:58 PM, yogesh.kuma...@wipro.com wrote:

  Hi all,

 I am trying to run the command

 ssh Master

 it runs and shows after entering password.
 Password: abc
 Last login: Thu Oct 25 13:51:06 2012 from master

 But ssh for Slave through error.



 ssh Slave

 it asks for password ans denie

 Password: abc
 Password: abc
 Password: abc
 Permission denied (publickey,keyboard-interactive).

 Please suggest.
 Thanks  regards
 Yogesh Kumar




ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

2012-10-25 Thread yogesh.kumar13
Hi All,

I am trying to copy the public key by this command.

Master:~ mediaadmin$ ssh-copy -id -i $HOME/.ssh/id_rsa.pub pluto@Slave

I have two machines; the Master's name is pluto and the Slave has the same name (admin).

And I got this error. Where am I going wrong?

ssh-copy-id: command not found


Please suggest

Thanks & regards
Yogesh Kumar





Re: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

2012-10-25 Thread Nitin Pawar
Knowing which operating system you are using would be of good help to answer your question.

Normally the command you are looking for is provided by openssh-clients.
Install this package if it is not already installed.

If installed normally on a Red Hat system it is placed at /usr/bin/ssh-copy-id
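
A sketch of getting it onto the box, assuming yum as the package manager:

yum install openssh-clients
which ssh-copy-id    # should now print /usr/bin/ssh-copy-id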

On Thu, Oct 25, 2012 at 3:24 PM,  yogesh.kuma...@wipro.com wrote:
 Hi All,

 I am trying to copy the public key by this command.

 Master:~ mediaadmin$ ssh-copy -id -i $HOME/.ssh/id_rsa.pub pluto@Slave

 I have two machines Master Name is pluto and same name is of Slave. (Admin)

 And I got this error, Where I am going wrong?

 ssh-copy-id: command not found


 Please suggest

 Thanks  regards
 Yogesh Kumar





-- 
Nitin Pawar


[no subject]

2012-10-25 Thread lei liu
http://blog.csdn.net/onlyqi/article/details/6544989
https://issues.apache.org/jira/browse/HDFS-2185
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html
http://blog.csdn.net/chenpingbupt/article/details/7922042
https://issues.apache.org/jira/browse/HADOOP-8163


Re: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

2012-10-25 Thread Mohammad Tariq
scp <file_to_copy> u...@remote.server.fi:/path/to/location

Regards,
Mohammad Tariq



On Thu, Oct 25, 2012 at 3:31 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

 operating system you are using will be of good help to answer your
 question.

 Normally the command you are looking for is provided by openssh-clients
 Install this package if not already.

 If installed normally on a redhat system its placed at /usr/bin/ssh-copy-id

 On Thu, Oct 25, 2012 at 3:24 PM,  yogesh.kuma...@wipro.com wrote:
  Hi All,
 
  I am trying to copy the public key by this command.
 
  Master:~ mediaadmin$ ssh-copy -id -i $HOME/.ssh/id_rsa.pub pluto@Slave
 
  I have two machines Master Name is pluto and same name is of Slave.
 (Admin)
 
  And I got this error, Where I am going wrong?
 
  ssh-copy-id: command not found
 
 
  Please suggest
 
  Thanks  regards
  Yogesh Kumar
 
 



 --
 Nitin Pawar



RE: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

2012-10-25 Thread yogesh.kumar13
Thanks All,

The copy has been done but here comes another horrible issue.

when I log in as Master


ssh Master asks for a password:

Master:~ mediaadmin$ ssh Master
Password: abc
Last login: Thu Oct 25 17:13:30 2012
Master:~ mediaadmin$ 


and for Slave it doesn't ask:

Master:~ mediaadmin$ ssh pluto@Slave
Last login: Thu Oct 25 17:15:16 2012 from 10.203.33.80
plutos-iMac:~ pluto$ 



Now if I run the command start-dfs.sh from the Master's logged-in terminal, it asks for 
passwords:

Master:~ mediaadmin$ ssh master
Password:
Last login: Thu Oct 25 17:16:44 2012 from master
Master:~ mediaadmin$ start-dfs.sh
starting namenode, logging to 
/HADOOP/hadoop-0.20.2/bin/../logs/hadoop-mediaadmin-namenode-Master.out
Password:Password: abc
Password: abc
Password:
Master: Permission denied (publickey,keyboard-interactive).
Password:
Password:
Slave: Permission denied (publickey,keyboard-interactive).



Why is it asking for a password when I have configured passwordless ssh?
And it's not even accepting the Master password or the Slave password.


Please help and suggest

Thanks & regards
Yogesh Kumar


From: Nitin Pawar [nitinpawar...@gmail.com]
Sent: Thursday, October 25, 2012 3:31 PM
To: user@hadoop.apache.org
Subject: Re: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

operating system you are using will be of good help to answer your question.

Normally the command you are looking for is provided by openssh-clients
Install this package if not already.

If installed normally on a redhat system its placed at /usr/bin/ssh-copy-id

On Thu, Oct 25, 2012 at 3:24 PM,  yogesh.kuma...@wipro.com wrote:
 Hi All,

 I am trying to copy the public key by this command.

 Master:~ mediaadmin$ ssh-copy -id -i $HOME/.ssh/id_rsa.pub pluto@Slave

 I have two machines Master Name is pluto and same name is of Slave. (Admin)

 And I got this error, Where I am going wrong?

 ssh-copy-id: command not found


 Please suggest

 Thanks  regards
 Yogesh Kumar





--
Nitin Pawar



Re:

2012-10-25 Thread Harsh J
Did you mean to mail that to yourself as a means of bookmarks or did
you just want to share this bundle of unrelated links with us?

On Thu, Oct 25, 2012 at 3:43 PM, lei liu liulei...@gmail.com wrote:
 http://blog.csdn.net/onlyqi/article/details/6544989
 https://issues.apache.org/jira/browse/HDFS-2185
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html
 http://blog.csdn.net/chenpingbupt/article/details/7922042
 https://issues.apache.org/jira/browse/HADOOP-8163



-- 
Harsh J


Release of HiBench 2.2 (a Hadoop benchmark suite)

2012-10-25 Thread Dai, Jason
Hi,

I would like to announce the availability of HiBench 2.2 at 
https://github.com/intel-hadoop/hibench. Since the release of HiBench 2.1, we 
have received a lot of good feedback, and HiBench 2.2 provides an update to v2.1 
based on that feedback, including:

1)  Added automatic data generators for the Nutch indexing and Bayesian 
classification workloads. In HiBench 2.1 they used a fixed input data set and 
could not easily scale up or down.

2)  Changed the PageRank workload to the implementation contained in the 
Pegasus project (http://www.cs.cmu.edu/~pegasus/). The previous PageRank 
workload in HiBench 2.1 came from Mahout 0.6 and could run into out-of-memory 
problems with large input data; Mahout has since dropped support for PageRank 
(see MAHOUT-1049, https://issues.apache.org/jira/browse/MAHOUT-1049).

3)  Upgraded the machine learning workloads (K-means clustering and Bayesian 
classification) to Mahout 0.7, which fixes many issues/bugs in Mahout 0.6 (that 
is, the version we used in HiBench 2.1).

Thanks,
-Jason

_
From: Dai, Jason
Sent: Thursday, June 14, 2012 12:27 AM
To: common-u...@hadoop.apache.org
Subject: Open source of HiBench 2.1 (a Hadoop benchmark suite)


Hi,

HiBench, a Hadoop benchmark suite constructed by Intel, is used intensively for 
Hadoop benchmarking, tuning & optimization both inside Intel and by our 
customers/partners. It consists of a set of representative Hadoop programs, 
including both micro-benchmarks and more real-world applications (e.g., 
search, machine learning and Hive queries).

We have made HiBench 2.1 available under Apache License 2.0 at 
https://github.com/hibench/HiBench-2.1, and would like to get your feedback on 
how it can be further improved. BTW, please stop by the Intel booth if you are 
at Hadoop Summit, so that we can have more interactive discussions on both 
HiBench and HiTune (our Hadoop performance analyzer, open sourced at 
https://github.com/hitune/hitune).

Thanks,
-Jason




Re: Redirects from namenode to datanode not working

2012-10-25 Thread Visioner Sadak
any hints friends will i have to try this with a cluster set up?? with
datanode installed on a diffrnt ip address

On Tue, Oct 23, 2012 at 12:34 PM, Visioner Sadak
visioner.sa...@gmail.com wrote:

 My config files already have IPs instead of localhost. Yes, if I copy-paste
 the IP into the address bar it works. Is there any other place where the IP
 has to be mentioned, like /etc/hosts in Linux?


 On Tue, Oct 23, 2012 at 12:11 PM, Georgy Gryaznov 
 georgy.gryaz...@gmail.com wrote:

 Hey Visioner.

 I think the problem here is that your config files are pointing at
 your localhost as the IP for the datanode/tasktracker. If you
 copy-paste the external IP for the box into the address bar, does it
 work?

 Thanks,
 Georgy

 On Thu, Oct 18, 2012 at 11:27 PM, Visioner Sadak
 visioner.sa...@gmail.com wrote:
  Hello friends,
   I'm running Hadoop in pseudo-distributed mode on a remote
  Linux IP. Through the namenode web UI I am able to access the directory
  structure (through the data node) by clicking on the browse-the-directory button, but
  when I click on live nodes I see my datanode listed at localhost; when I
  click on that, it redirects my browser to localhost instead of my remote
  Linux box's IP and I can't see the page. Any hints on what's happening? Is it
  because my datanode and namenode are on the same IP?
 
 





Re: HDFS HA IO Fencing

2012-10-25 Thread Todd Lipcon
Hi Liu,

Locks are not sufficient, because there is no way to enforce a lock in a
distributed system without unbounded blocking. What you might be referring
to is a lease, but leases are still problematic unless you can put bounds
on the speed with which clocks progress on different machines, _and_ have
strict guarantees on the way each node's scheduler works. With Linux and
Java, the latter is tough.

You may want to look into QuorumJournalManager which doesn't require
setting up IO fencing.
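
For reference, a rough sketch of what pointing the shared edits directory at QJM looks 
like in hdfs-site.xml (hypothetical journalnode hostnames and nameservice ID):

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>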

-Todd

On Thu, Oct 25, 2012 at 1:27 AM, lei liu liulei...@gmail.com wrote:

 I want to use HDFS HA function, I find the  IO Fencing function is
 complex in hadoop2.0. I think we can use  file lock to implement the  IO
 Fencing function, I think that is simple.

 Thanks,

 LiuLei




-- 
Todd Lipcon
Software Engineer, Cloudera


RE: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

2012-10-25 Thread Brahma Reddy Battula
I think the master machine's authorized_keys entry is missing.

Please do the following:

ssh-copy-id -i ~/.ssh/id_rsa.pub {IP of Master machine}

Before starting the cluster, it is better to check whether passwordless ssh is working by doing 
ssh {slave or master IP} from the Master (it should not ask for a password). 


From: yogesh.kuma...@wipro.com [yogesh.kuma...@wipro.com]
Sent: Thursday, October 25, 2012 7:49 PM
To: user@hadoop.apache.org
Subject: RE: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

Thanks All,

The copy has been done but here comes another horrible issue.

when I log in as Master


ssh Master it asks for Password

Master:~ mediaadmin$ ssh Master
Password: abc
Last login: Thu Oct 25 17:13:30 2012
Master:~ mediaadmin$


and for Slave it dosent ask.

Master:~ mediaadmin$ ssh pluto@Slave
Last login: Thu Oct 25 17:15:16 2012 from 10.203.33.80
plutos-iMac:~ pluto$



now if I run command start-dfs.sh from Master logged in terminal it asks for 
passwords

Master:~ mediaadmin$ ssh master
Password:
Last login: Thu Oct 25 17:16:44 2012 from master
Master:~ mediaadmin$ start-dfs.sh
starting namenode, logging to 
/HADOOP/hadoop-0.20.2/bin/../logs/hadoop-mediaadmin-namenode-Master.out
Password:Password: abc
Password: abc
Password:
Master: Permission denied (publickey,keyboard-interactive).
Password:
Password:
Slave: Permission denied (publickey,keyboard-interactive).



Why is it asking for password when I have configure password less ssh?
And even its not accepting Master password and Slave password..


Please help and suggest

Thanks  regards
Yogesh Kumar


From: Nitin Pawar [nitinpawar...@gmail.com]
Sent: Thursday, October 25, 2012 3:31 PM
To: user@hadoop.apache.org
Subject: Re: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

operating system you are using will be of good help to answer your question.

Normally the command you are looking for is provided by openssh-clients
Install this package if not already.

If installed normally on a redhat system its placed at /usr/bin/ssh-copy-id

On Thu, Oct 25, 2012 at 3:24 PM,  yogesh.kuma...@wipro.com wrote:
 Hi All,

 I am trying to copy the public key by this command.

 Master:~ mediaadmin$ ssh-copy -id -i $HOME/.ssh/id_rsa.pub pluto@Slave

 I have two machines Master Name is pluto and same name is of Slave. (Admin)

 And I got this error, Where I am going wrong?

 ssh-copy-id: command not found


 Please suggest

 Thanks  regards
 Yogesh Kumar





--
Nitin Pawar



RE: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

2012-10-25 Thread yogesh.kumar13
Hi Brahma, 

I am on Mac OS X; it doesn't have the copy command, i.e. 

ssh-copy-id -i 

I copied it as 

mediaadmin$ cat ~/.ssh/id_rsa.pub | ssh pluto@10.203.33.80 'cat >> 
~/.ssh/authorized_keys'
Password:

and did 
ssh 10.203.33.80, and it asked for a password.

Master:~ mediaadmin$ ssh 10.203.33.80
Password:
Last login: Thu Oct 25 19:04:31 2012 from master


Please suggest

Thanks & regards
Yogesh Kumar


From: Brahma Reddy Battula [brahmareddy.batt...@huawei.com]
Sent: Thursday, October 25, 2012 6:38 PM
To: user@hadoop.apache.org
Subject: RE: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

I think master machine authorized-key  is missed.

Please do following..

ssh-copy-id -i ~/.ssh/id_rsa.pub {IP of Master machine}..

Before starting cluster better to check whether ssh is enable or not by doing 
ssh {slave or master IP} from Master ( here it should not ask passwd).


From: yogesh.kuma...@wipro.com [yogesh.kuma...@wipro.com]
Sent: Thursday, October 25, 2012 7:49 PM
To: user@hadoop.apache.org
Subject: RE: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

Thanks All,

The copy has been done but here comes another horrible issue.

when I log in as Master


ssh Master it asks for Password

Master:~ mediaadmin$ ssh Master
Password: abc
Last login: Thu Oct 25 17:13:30 2012
Master:~ mediaadmin$


and for Slave it dosent ask.

Master:~ mediaadmin$ ssh pluto@Slave
Last login: Thu Oct 25 17:15:16 2012 from 10.203.33.80
plutos-iMac:~ pluto$



now if I run command start-dfs.sh from Master logged in terminal it asks for 
passwords

Master:~ mediaadmin$ ssh master
Password:
Last login: Thu Oct 25 17:16:44 2012 from master
Master:~ mediaadmin$ start-dfs.sh
starting namenode, logging to 
/HADOOP/hadoop-0.20.2/bin/../logs/hadoop-mediaadmin-namenode-Master.out
Password:Password: abc
Password: abc
Password:
Master: Permission denied (publickey,keyboard-interactive).
Password:
Password:
Slave: Permission denied (publickey,keyboard-interactive).



Why is it asking for password when I have configure password less ssh?
And even its not accepting Master password and Slave password..


Please help and suggest

Thanks  regards
Yogesh Kumar


From: Nitin Pawar [nitinpawar...@gmail.com]
Sent: Thursday, October 25, 2012 3:31 PM
To: user@hadoop.apache.org
Subject: Re: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

operating system you are using will be of good help to answer your question.

Normally the command you are looking for is provided by openssh-clients
Install this package if not already.

If installed normally on a redhat system its placed at /usr/bin/ssh-copy-id

On Thu, Oct 25, 2012 at 3:24 PM,  yogesh.kuma...@wipro.com wrote:
 Hi All,

 I am trying to copy the public key by this command.

 Master:~ mediaadmin$ ssh-copy -id -i $HOME/.ssh/id_rsa.pub pluto@Slave

 I have two machines Master Name is pluto and same name is of Slave. (Admin)

 And I got this error, Where I am going wrong?

 ssh-copy-id: command not found


 Please suggest

 Thanks  regards
 Yogesh Kumar





--
Nitin Pawar


Re: Old vs New API

2012-10-25 Thread Alberto Cordioli
Thanks,
Alberto

On 24 October 2012 16:33, Harsh J ha...@cloudera.com wrote:
 Using either is fully supported in 2.x+ at least. Neither is
 deprecated, but I'd personally recommend using the new API going
 forward. There's no known major issues with it.

 FWIW, Apache HBase uses the new API for its MR-side utilities.

 But in any case - no worries if you stick with one over the other for
 whatever reason, not until a couple more major releases I should think.

 On Wed, Oct 24, 2012 at 5:16 PM, Michael Segel
 michael_se...@hotmail.com wrote:
 They were official, back around 2009, hence the first API was deprecated.

 The reason that they removed the deprecation was that the 'new' API didn't 
 have all of the features/methods of the old APIs.

 I learned using the new APIs and ToolRunner is your friend.
 So I would suggest using the new APIs.

 But that's just me.


 On Oct 24, 2012, at 5:02 AM, Alberto Cordioli cordioli.albe...@gmail.com 
 wrote:

 Thanks Bejoy,

 my only concern is that the new api were to become official quite
 some time ago, but this seems to be a long process.
 And honestly I don't understand why. The changes are not so invasive.
 I just want to be sure to learn the more suitable api for the future.

 Anyway, as you said, let's see if a committer can comment on this.


 Alberto

 On 22 October 2012 15:40, Bejoy KS bejoy.had...@gmail.com wrote:
 Hi alberto

 The new mapreduce API is coming to shape now. The majority of the classes 
 available in old API has been ported to new API as well.

 The Old mapred API was marked depreciated in an earlier version of hadoop 
 (0.20.x) but later it was un-depreciated as all the functionality in old 
 API was not available in new mapreduce API at that point.

 Now mapreduce API is pretty good and you can go ahead with that for 
 development.  AFAIK mapreduce API is the future.

 Let's wait for a commiter to officially comment on this.

 Regards
 Bejoy KS

 Sent from handheld, please excuse typos.

 -Original Message-
 From: Alberto Cordioli cordioli.albe...@gmail.com
 Date: Mon, 22 Oct 2012 15:22:41
 To: user@hadoop.apache.org
 Reply-To: user@hadoop.apache.org
 Subject: Old vs New API

 Hi all,

 I am using last stable Hadoop version (1.0.3) and I am implementing
 right now my first MR jobs.
 I read about the presence of 2 API: the old and the new one. I read
 some stuff about them, but I am not able to find quite fresh news.
 I read that the old api was deprecated, but in my version they do not
 seem to. Moreover the new api does not have all the features
 implemented (see for example the package contrib with its classes to
 do joins).

 I found this post on the ML:
 http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201002.mbox/%3ca6906bde1002230730s24d6092av1e57b46bad806...@mail.gmail.com%3E
 but it is very old (2010) and I think that further changes have been
 made meanwhile.

 My question is: does make sense to use the new api, instead of the old
 one? Does this new version providing other functionalities with
 respect to the older one?
 Or, given the slow progress in implementation, is better to use the old 
 api?


 Thanks.



 --
 Alberto Cordioli





 --
 Harsh J



-- 
Alberto Cordioli


RE: ERROR:: SSH failour for distributed node hadoop cluster

2012-10-25 Thread Kartashov, Andy
Yogesh,

Have you figured it out?  I had the same issue (needed passwordless ssh) and 
managed. Let me know if you are still stuck.

AK47

From: yogesh.kuma...@wipro.com [mailto:yogesh.kuma...@wipro.com]
Sent: Thursday, October 25, 2012 4:28 AM
To: user@hadoop.apache.org
Subject: ERROR:: SSH failour for distributed node hadoop cluster
Importance: High

Hi all,

I am trying to run the command

ssh Master

it runs and shows after entering password.
Password: abc
Last login: Thu Oct 25 13:51:06 2012 from master

But ssh for Slave through error.



ssh Slave

it asks for password ans denie

Password: abc
Password: abc
Password: abc
Permission denied (publickey,keyboard-interactive).

Please suggest.
Thanks  regards
Yogesh Kumar



RE: ERROR:: SSH failour for distributed node hadoop cluster

2012-10-25 Thread Kartashov, Andy
Yogesh,

One needs to understand how passwordless ssh works.

Say there is a user “yosh”
He types ssh localhost and is prompted for a password. This is how to resolve 
this.

1.
Type : ssh-keygen -t rsa

-t stands for type and rsa is the encryption type; another type would be dsa.
 
Well, after you run the above command, a pair of private and public keys is 
generated and stored in your /home/yosh/.ssh folder.
 
“cd” in there and you will see two files: id_rsa (the private key) and id_rsa.pub 
(the public key).
 
If you create a file called “authorized_keys” and place your public key in 
there you will achieve passwordless access. How to do it?
 
2.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
 
The above command reads your public key and appends it to the file. After that, try 
“ssh localhost” again. Voila! No more password request.
 
Now, imagine you are ssh-ing into this very same machine from another computer 
where you log in as yosh: “ssh <ip>”. The machine will similarly let you in 
without a password.
 
p.s. If you do not know the IP address of that machine, find it out by simply typing 
“hostname -i”.
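
Putting the steps together for the Master/Slave pair discussed in this thread, a minimal 
sketch (assuming user pluto on the Slave and the usual OpenSSH defaults):

ssh-keygen -t rsa                                                        # on the Master, accept the defaults
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys                          # passwordless ssh to the Master itself
ssh localhost hostname                                                   # should return without a password prompt
cat ~/.ssh/id_rsa.pub | ssh pluto@Slave 'cat >> ~/.ssh/authorized_keys'  # same public key onto the Slave
ssh pluto@Slave hostname                                                 # should not prompt either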

Happy hadoop’ing.

AK47


From: yogesh dhari [mailto:yogeshdh...@live.com]
Sent: Thursday, October 25, 2012 12:12 PM
To: hadoop helpforoum
Subject: RE: ERROR:: SSH failour for distributed node hadoop cluster


Hi Andy,

I am still at that point; I have not been able to solve it yet.

Please help. I have raised the same issue, with all settings and commands, under the subject
ERROR: : Hadoop Installation in distributed mode.
on the mailing list.


Thanks & Regards
Yogesh Kumar


From: andy.kartas...@mpac.ca
To: user@hadoop.apache.org
Subject: RE: ERROR:: SSH failour for distributed node hadoop cluster
Date: Thu, 25 Oct 2012 16:01:56 +
Yogesh,

Have you figured it out?  I had the same issue (needed passwordless ssh) and 
managed. Let me know if you are still stuck.

AK47

From: yogesh.kuma...@wipro.com [mailto:yogesh.kuma...@wipro.com]
Sent: Thursday, October 25, 2012 4:28 AM
To: user@hadoop.apache.org
Subject: ERROR:: SSH failour for distributed node hadoop cluster
Importance: High

Hi all,

I am trying to run the command

ssh Master

it runs and shows after entering password.
Password: abc
Last login: Thu Oct 25 13:51:06 2012 from master

But ssh for Slave through error.



ssh Slave

it asks for password ans denie

Password: abc
Password: abc
Password: abc
Permission denied (publickey,keyboard-interactive).

Please suggest.
Thanks  regards
Yogesh Kumar


RE: ERROR:: SSH failour for distributed node hadoop cluster

2012-10-25 Thread yogesh dhari

Thanks Andy,

I got your point :-),

What I have done earlier..

I have configured 

1) One as a Master (plays the role of both Name node and Data node)

2) A second as a Slave (only Data node)



I have given the same name to both machines and they have admin access:
pluto (for both Master and Slave).



and generated ssh key pair through this way



Master:~ mediaadmin$ ssh-keygen -t rsa -P ""

Generating public/private rsa key pair.

Enter file in which to save the key (/Users/mediaadmin/.ssh/id_rsa): 

/Users/mediaadmin/.ssh/id_rsa already exists.

Overwrite (y/n)? y

Your identification has been saved in /Users/mediaadmin/.ssh/id_rsa.

Your public key has been saved in /Users/mediaadmin/.ssh/id_rsa.pub.

The key fingerprint is:

32:0f:be:7a:ea:44:a1:4a:b6:b1:85:3f:1f:56:48:4b mediaadmin@Master

The key's randomart image is:

+--[ RSA 2048]+

| |

| |

|.E   |

|  ..o.o  |

| =...o+.S|

|o.*. ..= |

|.o o.o. .|

|   .+ o. |

|   .+=.  |

+-+

Copy to home

Master:~ mediaadmin$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys





and copy to Slave Node as 

mediaadmin$ cat ~/.ssh/id_rsa.pub | ssh pluto@Slave 'cat >> 
~/.ssh/authorized_keys'





Login to Slave node

Master:~ mediaadmin$ ssh pluto@Slave

Last login: Thu Oct 25 18:54:15 2012 from 10.203.33.80





and Log out

plutos-iMac:~ pluto$ exit

logout



Connection to Slave closed.





Login to Master node (here it asks for password)

Master:~ mediaadmin$ ssh Master

Password:

Last login: Thu Oct 25 19:03:27 2012 from master


HERE COMES THE ISSUE



when I run start-dfs.sh, it asks for a password; I entered the password of 
Master but it didn't accept it.




Master:~ mediaadmin$ start-dfs.sh

starting namenode, logging to 
/HADOOP/hadoop-0.20.2/bin/../logs/hadoop-mediaadmin-namenode-Master.out

Password:Password:

Password:

Password:

Slave: Permission denied (publickey,keyboard-interactive).



Password:

Master: starting datanode, logging to 
/HADOOP/hadoop-0.20.2/bin/../logs/hadoop-mediaadmin-datanode-Master.out

Password:

Master: starting secondarynamenode, logging to 
/HADOOP/hadoop-0.20.2/bin/../logs/hadoop-mediaadmin-secondarynamenode-Master.out









When I enter password of Slave then it again goes like this.



Master:~ mediaadmin$ start-dfs.sh

starting namenode, logging to 
/HADOOP/hadoop-0.20.2/bin/../logs/hadoop-mediaadmin-namenode-Master.out

Password:Password:

Password:

Password:

Password:

Password:

Master: Permission denied (publickey,keyboard-interactive).



Slave: Permission denied (publickey,keyboard-interactive).

Password:

Password:

Password:

Master: Permission denied (publickey,keyboard-interactive).









Please suggest where I am going wrong, and why it is asking for a password when I 
configured passwordless SSH.



Am I making a mistake in SSH, or is it something else?





Thanks  Regards

Yogesh Kumar Dhari


From: andy.kartas...@mpac.ca
To: user@hadoop.apache.org
Subject: RE: ERROR:: SSH failour for distributed node hadoop cluster
Date: Thu, 25 Oct 2012 16:37:36 +








Yogesh,
 
One need to understand how passwordless ssh work.
 
Say there is a user “yosh”

He types ssh localhost and is prompted for a password. This is how to resolve 
this.
 
1.
Type : ssh-keygen -t rsa
 
-t stand for type and rsa (encryption)  - another type will be dsa.
 
Well, after you run above command,  a pair of private and public keys is 
generated and stored in your /home/yosh/.ssh folder.
 
“cd” in there and you will see two files. Id_rsa (private key) and id_rsa_pub 
(public key).
 
If you create a file called “authorized_keys” and place your public key in 
there you will achieve psswdless access. How to do it?
 
2.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
 
The above command reads your PubKey and appends to the file. After that try 
“ssh localhost” again. Voila! No more password request.
 
Now, imagine you are ssh-ing into this very same machine from another computer 
where u  log-in as yosh. “ssh  ip”.  The machine will similarly let you in
 without password.
 
p.s. If you do not know ip address of that machine, find out by simply typing 
“hostname –i”.
 
Happy hadoop’ing.
 
AK47

 

 


From: yogesh dhari [mailto:yogeshdh...@live.com]


Sent: Thursday, October 25, 2012 12:12 PM

To: hadoop helpforoum

Subject: RE: ERROR:: SSH failour for distributed node hadoop cluster


 



Hi Andy,



I am still at that point, not able to solve it yet



Please help, Same issue with all settings and commands I have raise by Subject 

ERROR: : Hadoop Installation in distributed mode.‏
to the mailing llist.





Thanks  Regards

Yogesh Kumar








From: andy.kartas...@mpac.ca

To: user@hadoop.apache.org

Subject: RE: ERROR:: SSH failour for distributed node hadoop cluster

Date: Thu, 25 Oct 2012 16:01:56 +

Yogesh,
 
Have you figured it out?  I had the same issue (needed passwordless ssh) and 
managed. 

RE: ERROR:: SSH failour for distributed node hadoop cluster

2012-10-25 Thread Kartashov, Andy
Yogesh,

If you are asked for a password, your passwordless SSH is not working.

Shoot, forgot one detail. Please remember to set the authorized_keys file to 600 
permissions. :)
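
Concretely, a sketch of that fix (sshd's StrictModes check rejects keys in files that 
are group- or world-writable):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys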


From: yogesh dhari [mailto:yogeshdh...@live.com]
Sent: Thursday, October 25, 2012 1:14 PM
To: hadoop helpforoum
Subject: RE: ERROR:: SSH failour for distributed node hadoop cluster

Thanks Andy,

I got your point :-),

What I have done earlier..

I have configured
1) One as a Master ( plays role of both Name node and Data Node)
2) Second as a Slave ( Only date node)

I have give same name to both Machines and they have Admin access. pluto ( for 
both Master and Slave).

and generated ssh key pair through this way

Master:~ mediaadmin$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/mediaadmin/.ssh/id_rsa):
/Users/mediaadmin/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Your identification has been saved in /Users/mediaadmin/.ssh/id_rsa.
Your public key has been saved in /Users/mediaadmin/.ssh/id_rsa.pub.
The key fingerprint is:
32:0f:be:7a:ea:44:a1:4a:b6:b1:85:3f:1f:56:48:4b mediaadmin@Master
The key's randomart image is:
+--[ RSA 2048]+
| |
| |
|.E   |
|  ..o.o  |
| =...o+.S|
|o.*. ..= |
|.o o.o. .|
|   .+ o. |
|   .+=.  |
+-+

Copy to home
Master:~ mediaadmin$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys


and copy to Slave Node as
mediaadmin$ cat ~/.ssh/id_rsa.pub | ssh pluto@Slave 'cat >> 
~/.ssh/authorized_keys'


Login to Slave node
Master:~ mediaadmin$ ssh pluto@Slave
Last login: Thu Oct 25 18:54:15 2012 from 10.203.33.80


and Log out
plutos-iMac:~ pluto$ exit
logout

Connection to Slave closed.


Login to Master node (here it asks for password)
Master:~ mediaadmin$ ssh Master
Password:
Last login: Thu Oct 25 19:03:27 2012 from master

HERE COMES THE ISSUE

when I run start-dfs.sh, its asks for password and I did enter the pass word of 
Master but i didn't accept.

Master:~ mediaadmin$ start-dfs.sh
starting namenode, logging to 
/HADOOP/hadoop-0.20.2/bin/../logs/hadoop-mediaadmin-namenode-Master.out
Password:Password:
Password:
Password:
Slave: Permission denied (publickey,keyboard-interactive).

Password:
Master: starting datanode, logging to 
/HADOOP/hadoop-0.20.2/bin/../logs/hadoop-mediaadmin-datanode-Master.out
Password:
Master: starting secondarynamenode, logging to 
/HADOOP/hadoop-0.20.2/bin/../logs/hadoop-mediaadmin-secondarynamenode-Master.out




When I enter password of Slave then it again goes like this.

Master:~ mediaadmin$ start-dfs.sh
starting namenode, logging to 
/HADOOP/hadoop-0.20.2/bin/../logs/hadoop-mediaadmin-namenode-Master.out
Password:Password:
Password:
Password:
Password:
Password:
Master: Permission denied (publickey,keyboard-interactive).


Slave: Permission denied (publickey,keyboard-interactive).
Password:
Password:
Password:
Master: Permission denied (publickey,keyboard-interactive).




Please suggest me where I am going wrong, and why is asking for password when I 
configured it as password less SSH,

where I doing mistake in SSH or some thing else.


Thanks  Regards
Yogesh Kumar Dhari


From: andy.kartas...@mpac.ca
To: user@hadoop.apache.org
Subject: RE: ERROR:: SSH failour for distributed node hadoop cluster
Date: Thu, 25 Oct 2012 16:37:36 +
Yogesh,

One need to understand how passwordless ssh work.

Say there is a user “yosh”
He types ssh localhost and is prompted for a password. This is how to resolve 
this.

1.
Type : ssh-keygen -t rsa

-t stand for type and rsa (encryption)  - another type will be dsa.

Well, after you run above command,  a pair of private and public keys is 
generated and stored in your /home/yosh/.ssh folder.

“cd” in there and you will see two files. Id_rsa (private key) and id_rsa_pub 
(public key).

If you create a file called “authorized_keys” and place your public key in 
there you will achieve psswdless access. How to do it?

2.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

The above command reads your PubKey and appends to the file. After that try 
“ssh localhost” again. Voila! No more password request.

Now, imagine you are ssh-ing into this very same machine from another computer 
where u  log-in as yosh. “ssh  ip”.  The machine will similarly let you in 
without password.

p.s. If you do not know ip address of that machine, find out by simply typing 
“hostname –i”.

Happy hadoop’ing.

AK47


From: yogesh dhari [mailto:yogeshdh...@live.com]
Sent: Thursday, October 25, 2012 12:12 PM
To: hadoop helpforoum
Subject: RE: ERROR:: SSH failour for distributed node hadoop cluster


Hi Andy,

I am still at that point, not able to solve it yet

Please help, Same issue with all settings and commands I have raise by Subject
ERROR: : Hadoop Installation in distributed mode.‏
to the mailing llist.


Thanks  Regards
Yogesh Kumar

From: 

Re: HDFS HA IO Fencing

2012-10-25 Thread Steve Loughran
On 25 October 2012 14:08, Todd Lipcon t...@cloudera.com wrote:

 Hi Liu,

 Locks are not sufficient, because there is no way to enforce a lock in a
 distributed system without unbounded blocking. What you might be referring
 to is a lease, but leases are still problematic unless you can put bounds
 on the speed with which clocks progress on different machines, _and_ have
 strict guarantees on the way each node's scheduler works. With Linux and
 Java, the latter is tough.


on any OS running in any virtual environment, including EC2, time is
entirely unpredictable, just to make things worse.


On a single machine you can use file locking as the OS will know that the
process is dead and closes the file; other programs can attempt to open the
same file with exclusive locking -and, by getting the right failures, know
that something else has the file, hence the other process is live. Shared
NFS storage you need to mount with softlock set precisely to stop file
locks lasting until some lease has expired, because the on-host liveness
probes detect failure faster and want to react to it.


-Steve


datanode daemon

2012-10-25 Thread Kartashov, Andy
Guys,

I finally solved ALL the errors in ...datanode*.log after trying to start 
the node with service datanode start.
The errors were:
- conflicting NN/DN IDs - solved by reformatting the NN.
- could not connect to 127.0.0.1:8020 - Connection refused - solved by 
correcting a typo inside hdfs-site.xml under dfs.namenode.http-address; it somehow 
had the default value instead of localhost. (Running pseudo-mode)
- conf was pointing to the wrong symlink - solved by running alternatives -set 
hadoop-conf conf.myconf

However, when I run service -status-all, I still see the datanode [FAILED] 
message. All the others (NN, SNN, JT, TT) are running [OK].


1.   Starting daemons, all seems OK:
Starting Hadoop datanode:  [  OK  ]
starting datanode, logging to 
/home/hadoop/logs/hadoop-root-datanode-ip-10-204-47-138.out
Starting Hadoop namenode:  [  OK  ]
starting namenode, logging to 
/home/hadoop/logs/hadoop-hdfs-namenode-ip-10-204-47-138.out
Starting Hadoop secondarynamenode: [  OK  ]
starting secondarynamenode, logging to 
/home/hadoop/logs/hadoop-hdfs-secondarynamenode-ip-10-204-47-138.out

2.
running service -status-all command and get:
Hadoop datanode is not running [FAILED]
Hadoop namenode is running [  OK  ]
Hadoop secondarynamenode is running[  OK  ]

3.
Here is log file on DN:
2012-10-25 15:33:37,554 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
STARTUP_MSG:
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = ip-10-204-47-138.ec2.internal/10.204.47.138
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.0.0-cdh4.1.1
STARTUP_MSG:   classpath = /etc/ha..
...
..
2012-10-25 15:33:38,098 WARN org.apache.hadoop.hdfs.server.common.Util: Path 
/home/hadoop/dfs/data should be specified as a URI in configuration files. 
Please update hdfs configuration.
2012-10-25 15:33:41,589 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
loaded properties from hadoop-metrics2.properties
2012-10-25 15:33:42,125 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
Scheduled snapshot period at 10 second(s).
2012-10-25 15:33:42,125 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
DataNode metrics system started
2012-10-25 15:33:42,204 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Configured hostname is ip-10-204-47-138.ec2.internal
2012-10-25 15:33:42,319 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Opened streaming server at /0.0.0.0:50010
2012-10-25 15:33:42,323 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Balancing bandwith is 1048576 bytes/s
2012-10-25 15:33:42,412 INFO org.mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-10-25 15:33:42,603 INFO org.apache.hadoop.http.HttpServer: Added global 
filter 'safety' (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-10-25 15:33:42,607 INFO org.apache.hadoop.http.HttpServer: Added filter 
static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
context datanode
2012-10-25 15:33:42,607 INFO org.apache.hadoop.http.HttpServer: Added filter 
static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
context logs
2012-10-25 15:33:42,607 INFO org.apache.hadoop.http.HttpServer: Added filter 
static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
context static
2012-10-25 15:33:42,682 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Opened info server at 0.0.0.0:50075
2012-10-25 15:33:42,690 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
dfs.webhdfs.enabled = false
2012-10-25 15:33:42,690 INFO org.apache.hadoop.http.HttpServer: Jetty bound to 
port 50075
2012-10-25 15:33:42,690 INFO org.mortbay.log: jetty-6.1.26.cloudera.2
2012-10-25 15:33:43,601 INFO org.mortbay.log: Started 
SelectChannelConnector@0.0.0.0:50075
2012-10-25 15:33:43,787 INFO org.apache.hadoop.ipc.Server: Starting Socket 
Reader #1 for port 50020
2012-10-25 15:33:43,905 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Opened IPC server at /0.0.0.0:50020
2012-10-25 15:33:43,917 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Refresh request received for nameservices: null
2012-10-25 15:33:43,943 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Starting BPOfferServices for nameservices: default
2012-10-25 15:33:43,950 WARN org.apache.hadoop.hdfs.server.common.Util: Path 
/home/hadoop/dfs/data should be specified as a URI in configuration files. 
Please update hdfs configuration.
2012-10-25 15:33:43,958 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Block pool registering (storage id unknown) service to 
localhost/127.0.0.1:8020 starting to offer service

Re: File Permissions on s3 FileSystem

2012-10-25 Thread Parth Savani
Hello Harsh,
 I am following steps based on this link:
http://wiki.apache.org/hadoop/AmazonS3

When I run the job, I see that Hadoop places all the jars
required for the job on S3. However, when it tries to run the job, it
complains:
The ownership on the staging directory
s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging
is not as expected. It is owned by   The directory must be owned by the
submitter ec2-user or by ec2-user

Some people seem to have solved this permissions problem here:
https://issues.apache.org/jira/browse/HDFS-1333
But they made changes to some Hadoop Java classes, and I wonder if
there's an easier workaround.


On Wed, Oct 24, 2012 at 12:21 AM, Harsh J ha...@cloudera.com wrote:

 Hey Parth,

 I don't think it's possible to run MR by basing the FS over S3
 completely. You can use S3 for I/O for your files, but your
 fs.default.name (or fs.defaultFS) must be either file:/// or hdfs://
 filesystems. This way, your MR framework can run/distribute its files
 well, and also still be able to process S3 URLs passed as input or
 output locations.
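
 In code, the setup Harsh describes looks roughly like the sketch below: fs.defaultFS
 stays on hdfs:// (as set in core-site.xml), the S3 credentials go into the s3n
 properties rather than into the URL, and only the job's input/output paths point at
 S3. The bucket name, keys and the identity mapper are illustrative placeholders, not
 taken from this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class S3InOutJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Credentials for the s3n:// filesystem; keeping them out of the URL
    // (the KEY:VALUE@bucket form) is generally the safer option.
    conf.set("fs.s3n.awsAccessKeyId", "ACCESS_KEY");       // placeholder
    conf.set("fs.s3n.awsSecretAccessKey", "SECRET_KEY");   // placeholder

    Job job = new Job(conf, "s3-in-out");
    job.setJarByClass(S3InOutJob.class);
    job.setMapperClass(Mapper.class);                      // identity mapper, for the sketch
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // S3 is used only for the job's data; staging and framework files stay on HDFS.
    FileInputFormat.addInputPath(job, new Path("s3n://my-bucket/input"));
    FileOutputFormat.setOutputPath(job, new Path("s3n://my-bucket/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}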

 On Tue, Oct 23, 2012 at 11:02 PM, Parth Savani pa...@sensenetworks.com
 wrote:
  Hello Everyone,
  I am trying to run a hadoop job with s3n as my filesystem.
  I changed the following properties in my hdfs-site.xml
 
  fs.default.name=s3n://KEY:VALUE@bucket/
  mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
 
  When I run the job from EC2, I get the following error:
 
  The ownership on the staging directory
  s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is
 owned
  by   The directory must be owned by the submitter ec2-user or by ec2-user
  at
 
 org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at
 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
  at
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
 
  I am using the Cloudera CDH4 Hadoop distribution. The error is thrown from
  the JobSubmissionFiles class:
  public static Path getStagingDir(JobClient client, Configuration conf)
      throws IOException, InterruptedException {
    Path stagingArea = client.getStagingAreaDir();
    FileSystem fs = stagingArea.getFileSystem(conf);
    String realUser;
    String currentUser;
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    realUser = ugi.getShortUserName();
    currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
    if (fs.exists(stagingArea)) {
      FileStatus fsStatus = fs.getFileStatus(stagingArea);
      String owner = fsStatus.getOwner();
      if (!(owner.equals(currentUser) || owner.equals(realUser))) {
        throw new IOException("The ownership on the staging directory " +
            stagingArea + " is not as expected. " +
            "It is owned by " + owner + ". The directory must " +
            "be owned by the submitter " + currentUser + " or " +
            "by " + realUser);
      }
      if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
        LOG.info("Permissions on staging directory " + stagingArea + " are " +
            "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
            "to correct value " + JOB_DIR_PERMISSION);
        fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
      }
    } else {
      fs.mkdirs(stagingArea,
          new FsPermission(JOB_DIR_PERMISSION));
    }
    return stagingArea;
  }
 
 
 
  I think my job calls getOwner(), which returns NULL since S3 does not have
  file permissions, and that results in the IOException I am getting.
 
  Is there any workaround for this? Any idea how I could use S3 as the filesystem
  with Hadoop in distributed mode?



 --
 Harsh J



Re: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

2012-10-25 Thread Andy Isaacson
On Thu, Oct 25, 2012 at 7:01 AM,  yogesh.kuma...@wipro.com wrote:
 Hi Brahma,

 I am on Mac OS X; it doesn't have the copy command, i.e.
 
 ssh-copy-id -i
 
 so I copied the key over manually:
 
 mediaadmin$ cat ~/.ssh/id_rsa.pub | ssh pluto@10.203.33.80 'cat >> 
 ~/.ssh/authorized_keys'
 Password:

 and did
 ssh 10.203.33.80 and it asked for password.

The most common cause of this (once the .pub is properly copied to
authorized_keys) is a permissions problem. Make sure your .ssh/
directory is mode 700 (rwx------), your .ssh/authorized_keys file
is mode 600 (rw-------), and your home directory is not group-writable.

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chmod go-w ~

If the remote machine is running RHEL or CentOS, you might also have a
SELinux labelling problem.
https://bugzilla.redhat.com/show_bug.cgi?id=499343 and
http://bugs.centos.org/view.php?id=4959

The workaround is to use the restorecon command, but it is easy to
misuse and I am not an expert so I am uncomfortable providing explicit
instructions. Check your /var/log/auth.log or similar on the remote
host to see if it has messages explaining why pubkey authentication
was not allowed.

-andy


 Master:~ mediaadmin$ ssh 10.203.33.80
 Password:
 Last login: Thu Oct 25 19:04:31 2012 from master


 Please suggest

 Thanks & regards
 Yogesh Kumar

 
 From: Brahma Reddy Battula [brahmareddy.batt...@huawei.com]
 Sent: Thursday, October 25, 2012 6:38 PM
 To: user@hadoop.apache.org
 Subject: RE: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

 I think the master machine's authorized key is missing.
 
 Please do the following:
 
 ssh-copy-id -i ~/.ssh/id_rsa.pub {IP of Master machine}
 
 Before starting the cluster, it is better to check whether ssh works by doing 
 ssh {slave or master IP} from the Master (it should not ask for a password).

 
 From: yogesh.kuma...@wipro.com [yogesh.kuma...@wipro.com]
 Sent: Thursday, October 25, 2012 7:49 PM
 To: user@hadoop.apache.org
 Subject: RE: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

 Thanks All,

 The copy has been done but here comes another horrible issue.

 When I log in as Master with
 
 
 ssh Master, it asks for a password:

 Master:~ mediaadmin$ ssh Master
 Password: abc
 Last login: Thu Oct 25 17:13:30 2012
 Master:~ mediaadmin$


 and for Slave it doesn't ask:

 Master:~ mediaadmin$ ssh pluto@Slave
 Last login: Thu Oct 25 17:15:16 2012 from 10.203.33.80
 plutos-iMac:~ pluto$



 Now, if I run start-dfs.sh from the Master's logged-in terminal, it asks for 
 passwords:

 Master:~ mediaadmin$ ssh master
 Password:
 Last login: Thu Oct 25 17:16:44 2012 from master
 Master:~ mediaadmin$ start-dfs.sh
 starting namenode, logging to 
 /HADOOP/hadoop-0.20.2/bin/../logs/hadoop-mediaadmin-namenode-Master.out
 Password:Password: abc
 Password: abc
 Password:
 Master: Permission denied (publickey,keyboard-interactive).
 Password:
 Password:
 Slave: Permission denied (publickey,keyboard-interactive).
 


 Why is it asking for passwords when I have configured passwordless ssh?
 And it's not even accepting the Master and Slave passwords.


 Please help and suggest

 Thanks & regards
 Yogesh Kumar

 
 From: Nitin Pawar [nitinpawar...@gmail.com]
 Sent: Thursday, October 25, 2012 3:31 PM
 To: user@hadoop.apache.org
 Subject: Re: ERROR: ssh-copy-id: command not found IN HADOOP DISTRIBUTED MODE

 Knowing which operating system you are using would help in answering your question.
 
 Normally the command you are looking for is provided by the openssh-clients package.
 Install this package if it is not already installed.
 
 On a Red Hat system it is normally installed at /usr/bin/ssh-copy-id.

 On Thu, Oct 25, 2012 at 3:24 PM,  yogesh.kuma...@wipro.com wrote:
 Hi All,

 I am trying to copy the public key with this command:

 Master:~ mediaadmin$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub pluto@Slave

 I have two machines; the Master name is pluto and the Slave has the same name (Admin).

 And I got this error. Where am I going wrong?

 ssh-copy-id: command not found


 Please suggest

 Thanks & regards
 Yogesh Kumar





 --
 Nitin Pawar


Re: reference architecture

2012-10-25 Thread Steve Loughran
On 25 October 2012 20:24, Daniel Käfer d.kae...@hs-furtwangen.de wrote:

 Hello all,

 I'm looking for a reference architecture for hadoop. The only result I
 found is Lambda architecture from Nathan Marz[0].


I quite like the new Hadoop in Practice for a lot of that, especially the
answer to #2, how to store the data, where he looks at all the options.
Joining is the other big issue.

http://steveloughran.blogspot.co.uk/2012/10/hadoop-in-practice-applied-hadoop.html

Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig and
Hive can work with that, as well as with rawer data kept directly in HDFS.


 With architecture I mean answers to questions like:
 - How should I store the data? CSV, Thrift, ProtoBuf?
 - How should I model the data? ER model, star schema, something new?
 - Normalized or denormalized, or both (master data normalized, then
 transformed to denormalized, like ETL)?
 - How should I combine the database and HDFS files?

 Are there any other documented architectures for hadoop?

 Regards
 Daniel Käfer


 [0] http://www.manning.com/marz/ (still a preprint, not yet complete)




Re: Unsatisfied link error - how to load native library without copying it in /lib/native folder

2012-10-25 Thread Brock Noland
Hi,

That should be:

-files path_to_my_library.so

and to include jars for your MR jobs, you would do:

2) -libjars path_to_my1.jar,path_to_my2.jar

Brock
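
One detail worth keeping in mind alongside this: -files and -libjars are generic
options, so they only take effect if the driver parses its arguments through
GenericOptionsParser, which is what ToolRunner does. A minimal driver sketch
(class name, job name and paths are illustrative, not from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// ToolRunner runs GenericOptionsParser first, which consumes -files, -libjars
// and -archives; run() then only sees the remaining application arguments.
public class MyJobDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    Job job = new Job(getConf(), "my-job");   // getConf() already carries the parsed options
    job.setJarByClass(MyJobDriver.class);
    job.setMapperClass(Mapper.class);         // identity mapper, just for the sketch
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyJobDriver(), args));
  }
}

which would then be invoked along the lines of:

  hadoop jar myjob.jar MyJobDriver -files /path/to/mylibrary.so \
      -libjars /path/to/dep1.jar,/path/to/dep2.jar /input /output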

On Thu, Oct 25, 2012 at 6:10 PM, Dipesh Khakhkhar
dipeshsoftw...@gmail.com wrote:
 Hi,

 I am a new Hadoop user and have a few very basic questions (they might sound
 very stupid to many people, so please bear with me).

 I am running an MR task and my launcher program needs to load a library using
 System.loadLibrary(somelibrary). This works fine if I put this library in
 lib/native/Linux-amd64-64. I tried the following:

 1. provided -files=/path_to_directory_containging_my_library
 2. provided the following in mapred-site.xml (didn't try it in core-site.xml
 or hdfs-site.xml)

 -Djava.library.path=//path_to_directory_containging_my_library

 I'm using hadoop 1.0.3 and this is a single node cluster for testing
 purpose.

 I have a production environment where I'm running 4 data nodes and currently
 I'm copying this file in  lib/native/Linux-amd64-64 folder in each node's
 hadoop installation.

 A related question regarding providing the jars required for running the whole
 M/R application: currently I have edited the HADOOP_CLASSPATH variable in
 hadoop-env.sh. For a cluster, if I provide the -libjars option, will that work
 without editing the classpath? I need this jar's classes before launching the M/R
 jobs.

 Also, how can I provide my application jar (i.e. bin/hadoop jar myjar
 com.x.x.ProgramName) to the data nodes? Currently I'm copying it into the
 lib directory of the Hadoop installation.

 Thanks in advance for answering my queries.

 Thanks.



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/


Re: Unsatisfied link error - how to load native library without copying it in /lib/native folder

2012-10-25 Thread Dipesh Khakhkhar
Thanks for answering my query.

1. I have tried -files path_to_my_library.so while invoking my MR
application, but I still get UnsatisfiedLinkError: no mylibrary in
java.library.path.
 
2. I have removed the path to my jar from HADOOP_CLASSPATH in hadoop-env.sh,
provided -libjars path_to_myfile.jar, and tried running my MR application
(bin/hadoop jar ...), but it failed to load classes from the jar file
given via -libjars. I am using classes from this jar before
launching my M/R jobs.
 
Unfortunately the above methods didn't work for me.

Thanks.


On Thu, Oct 25, 2012 at 4:50 PM, Brock Noland br...@cloudera.com wrote:

 Hi,

 That should be:

 -files path_to_my_library.so

 and to include jars in for your mrjobs, you would do:

 2) -libjars path_to_my1.jar,path_to_my2.jar

 Brock

 On Thu, Oct 25, 2012 at 6:10 PM, Dipesh Khakhkhar
 dipeshsoftw...@gmail.com wrote:
  Hi,
 
  I am a new hadoop user and have few very basic questions (they might
 sound
  very stupid to many people so please bear with me).
 
  I am running a MR task and my launcher program needs to load a library
 using
  System.loadLibrary(somelibrary). This works fine if I put this library in
  lib/native/Linux-amd64-64. I tried the following -
 
  1. provided -files=/path_to_directory_containging_my_library
  2. provided the following in mapred-site.xml (didn't try it in
 core-site.xml
  or hdfs-site.xml)
 
  -Djava.library.path=//path_to_directory_containging_my_library
 
  I'm using hadoop 1.0.3 and this is a single node cluster for testing
  purpose.
 
  I have a production environment where I'm running 4 data nodes and
 currently
  I'm copying this file in  lib/native/Linux-amd64-64 folder in each node's
  hadoop installation.
 
  A related question regarding providing jars required for running the
 whole
  M/R application - currently I have edited hadoop-classpath variable in
  hadoop-env.sh. For cluster if I provide -libjars option will that work
  without editing classpath? I require this jar's classes before launching
 M/R
  jobs.
 
  Also how can I provide my application jar ( i.e. bin/hadoop jar myjar
  com.x.x.ProgramName )  in the data nodes? Currently I'm copying it in the
  lib directory of hadoop installation.
 
  Thanks in advance for answering my queries.
 
  Thanks.



 --
 Apache MRUnit - Unit testing MapReduce -
 http://incubator.apache.org/mrunit/



Re: Unsatisfied link error - how to load native library without copying it in /lib/native folder

2012-10-25 Thread Brock Noland
1) Does your local program use the native library before submitting
the job to the cluster?

Here is an example of using native code in MR
https://github.com/brockn/hadoop-thumbnail

2) I thought -libjars would work for local classpath issues as well as
remote ones. However, to add the jar to your local classpath as well you
can:

env HADOOP_CLASSPATH=my.jar hadoop jar ...

Brock
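
For the client-side (launcher) half of the problem, one possible workaround is to
load the library by absolute path instead of relying on java.library.path. A small
sketch, with the directory and library name as placeholders:

import java.io.File;

// System.load() takes an absolute path, so the launcher JVM does not need the
// .so to be on java.library.path at all. The directory could come from a
// config property or a command-line argument.
public class NativeLibLoader {
  public static void loadFrom(String dir) {
    // mapLibraryName("mylibrary") -> "libmylibrary.so" on Linux
    File lib = new File(dir, System.mapLibraryName("mylibrary"));
    System.load(lib.getAbsolutePath());
  }

  public static void main(String[] args) {
    loadFrom(args[0]);
    System.out.println("native library loaded in the client JVM");
  }
}

The task side is a separate question: -files ships the .so into each task's working
directory, which (in the 1.x task runner, as far as I can tell) is appended to the
task JVM's java.library.path.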


On Thu, Oct 25, 2012 at 7:11 PM, Dipesh Khakhkhar
dipeshsoftw...@gmail.com wrote:
 Thanks for answering my query.

 1. I have tried -files path _o_my_libary.so while invoking my MR application
 but I still UnsatisfiedLinkError: no mylibrary in java.library.path

 2. I have removed path to my jar in hadoop-classpath in hadoop-env.sh and
 provide -libjars path_to_myfile.jar and tried running my MR application
 (bin/hadoop jar..) but it failed to load class from the jar file
 mentioned in libjars path. I'm using this classes from this jar before
 launching my M/R jobs.

 Unfortunately above methods didn't work for me.

 Thanks.


 On Thu, Oct 25, 2012 at 4:50 PM, Brock Noland br...@cloudera.com wrote:

 Hi,

 That should be:

 -files path_to_my_library.so

 and to include jars in for your mrjobs, you would do:

 2) -libjars path_to_my1.jar,path_to_my2.jar

 Brock

 On Thu, Oct 25, 2012 at 6:10 PM, Dipesh Khakhkhar
 dipeshsoftw...@gmail.com wrote:
  Hi,
 
  I am a new hadoop user and have few very basic questions (they might
  sound
  very stupid to many people so please bear with me).
 
  I am running a MR task and my launcher program needs to load a library
  using
  System.loadLibrary(somelibrary). This works fine if I put this library
  in
  lib/native/Linux-amd64-64. I tried the following -
 
  1. provided -files=/path_to_directory_containging_my_library
  2. provided the following in mapred-site.xml (didn't try it in
  core-site.xml
  or hdfs-site.xml)
 
  -Djava.library.path=//path_to_directory_containging_my_library
 
  I'm using hadoop 1.0.3 and this is a single node cluster for testing
  purpose.
 
  I have a production environment where I'm running 4 data nodes and
  currently
  I'm copying this file in  lib/native/Linux-amd64-64 folder in each
  node's
  hadoop installation.
 
  A related question regarding providing jars required for running the
  whole
  M/R application - currently I have edited hadoop-classpath variable in
  hadoop-env.sh. For cluster if I provide -libjars option will that work
  without editing classpath? I require this jar's classes before launching
  M/R
  jobs.
 
  Also how can I provide my application jar ( i.e. bin/hadoop jar myjar
  com.x.x.ProgramName )  in the data nodes? Currently I'm copying it in
  the
  lib directory of hadoop installation.
 
  Thanks in advance for answering my queries.
 
  Thanks.



 --
 Apache MRUnit - Unit testing MapReduce -
 http://incubator.apache.org/mrunit/





-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/


python write hdfs over thrift,gzip file changes

2012-10-25 Thread hisname
 Hi all,
 
I want to write files to HDFS over Thrift.
If the file is a gzip or tar file, then after uploading it I find that the file 
size changes and I cannot tar xvzf/xvf it anymore.
For normal plain text files it works well.
 
[hadoop@HOST s_cripts]$ echo $LANG
en_US.UTF-8
[hadoop@HOST s_cripts]$
[hadoop@HOST s_cripts]$ jps
25868 TaskTracker
9116 Jps
25928 HadoopThriftServer   #the thrift server
25749 JobTracker
25655 SecondaryNameNode
25375 NameNode
25495 DataNode
[hadoop@HOST s_cripts]$
[hadoop@HOST s_cripts]$ pwd
/home/hadoop/hadoop/src/contrib/thriftfs/s_cripts
[hadoop@HOST s_cripts]$ hadoop fs -ls log/ff.tar.gz
ls: Cannot access log/ff.tar.gz: No such file or directory.
[hadoop@HOST s_cripts]$ python hdfs.py
hdfs put ./my.tar.gz log/ff.tar.gz
<thrift.protocol.TBinaryProtocol.TBinaryProtocol instance at 0x2348e60>
in writeString :688
upload over:688
hdfs quit
[hadoop@HOST s_cripts]$ hadoop fs -ls log/ff.tar.gz
Found 1 items
-rw-r--r--   1 hadoop supergroup   1253 2012-10-25 08:57 
/user/hadoop/log/ff.tar.gz#notice the size here is 1253
[hadoop@HOST s_cripts]$ ls -l my.tar.gz
-rw-rw-r-- 1 hadoop hadoop 688 Oct 24 14:43 my.tar.gz #notice the size here 
is 688
[hadoop@HOST s_cripts]$ file my.tar.gz
my.tar.gz: gzip compressed data, from Unix, last modified: Wed Oct 24 14:43:29 
2012#the file format
[hadoop@HOST s_cripts]$ hadoop fs -get log/ff.tar.gz .
[hadoop@HOST s_cripts]$ file ff.tar.gz
ff.tar.gz: data   #the file format
[hadoop@HOST s_cripts]$ tar xvzf ff.tar.gz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
[hadoop@HOST s_cripts]$
[hadoop@HOST s_cripts]$ head -2 my.tar.gz |xxd
000: 1f8b 0800 118e 8750 0003 ed99 4d53 db30  ...PMS.0
010: 1086 732d bf42 070e 7040 966c c78e 7da3  ..s-.B..p@.l..}.
020: 4006 2ec0 8c69 7be8 7418 c551 1c37 b2e4  @i{.t..Q.7..
[hadoop@HOST s_cripts]$ head -2 ff.tar.gz |xxd
000: 1fef bfbd 0800 11ef bfbd efbf bd50 0003  .P..
010: efbf bd4d 53ef bfbd 3010 efbf bd73 2def  ...MS...0s-.
020: bfbd 4207 0e70 40ef bfbd 6cc7 8e7d efbf  ..B..p@...l..}..
030: bd40 062e efbf bdef bfbd 697b efbf bd74  .@i{...t
040: 18ef bfbd 511c 37ef bfbd efbf bd65 0aef  Q.7..e..
 
The Thrift server and the hdfs.py client are on the same box (HOST).
If I use the hadoop shell commands to put/get the files, everything is fine.
It seems that the Thrift client writes to the Thrift server in binary mode, but the 
Thrift server writes the data to the HDFS file encoded in some other charset.
Why do the uploaded files change? Thanks a lot!
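
For what it's worth, the hexdump above already points at the cause: 1f 8b becoming
1f ef bf bd is exactly what happens when raw bytes are decoded and re-encoded as
UTF-8, since 0x8b is not valid UTF-8 and gets replaced by U+FFFD (EF BF BD). A small
self-contained Java sketch that reproduces the pattern:

import java.nio.charset.Charset;

public class GzipUtf8Corruption {
  public static void main(String[] args) {
    // First bytes of a gzip stream: magic 1f 8b, deflate method 08, flags 00.
    byte[] gzipHeader = {(byte) 0x1f, (byte) 0x8b, 0x08, 0x00};
    Charset utf8 = Charset.forName("UTF-8");
    // Decode as text (malformed bytes become U+FFFD), then re-encode as UTF-8.
    byte[] roundTripped = new String(gzipHeader, utf8).getBytes(utf8);
    StringBuilder hex = new StringBuilder();
    for (byte b : roundTripped) {
      hex.append(String.format("%02x ", b & 0xff));
    }
    System.out.println(hex);   // prints: 1f ef bf bd 08 00
  }
}

That matches the corrupted ff.tar.gz dump, which suggests the write path needs to
treat the payload as raw bytes end to end rather than as a string in some charset.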

Re: Unsatisfied link error - how to load native library without copying it in /lib/native folder

2012-10-25 Thread Dipesh Khakhkhar
Yes, I am trying to use both (classes from my jar file and the native
library) before submitting the job to the cluster.
 
Everything works if I put the native library in the lib/native/Linux-amd64-64
folder and add the path to my jar in hadoop-env.sh.
 
I thought the -files/-archives/-libjars options would be very useful for running
jobs on every data node without any need to copy these jars and libraries
to each node, and into the Hadoop folders at that.

Thanks.


On Thu, Oct 25, 2012 at 5:59 PM, Brock Noland br...@cloudera.com wrote:

 1) Does your local program use the native library before submitting
 the job to the cluster?

 Here is an example of using native code in MR
 https://github.com/brockn/hadoop-thumbnail

 2) I thought libjars would work for local classpath issues as well as
 remove. However, to add the jar to your local classpath as well you
 can:

 env HADOOP_CLASSPATH=my.jar hadoop jar ...

 Brock


 On Thu, Oct 25, 2012 at 7:11 PM, Dipesh Khakhkhar
 dipeshsoftw...@gmail.com wrote:
  Thanks for answering my query.
 
  1. I have tried -files path _o_my_libary.so while invoking my MR
 application
  but I still UnsatisfiedLinkError: no mylibrary in java.library.path
 
  2. I have removed path to my jar in hadoop-classpath in hadoop-env.sh and
  provide -libjars path_to_myfile.jar and tried running my MR application
  (bin/hadoop jar..) but it failed to load class from the jar file
  mentioned in libjars path. I'm using this classes from this jar before
  launching my M/R jobs.
 
  Unfortunately above methods didn't work for me.
 
  Thanks.
 
 
  On Thu, Oct 25, 2012 at 4:50 PM, Brock Noland br...@cloudera.com
 wrote:
 
  Hi,
 
  That should be:
 
  -files path_to_my_library.so
 
  and to include jars in for your mrjobs, you would do:
 
  2) -libjars path_to_my1.jar,path_to_my2.jar
 
  Brock
 
  On Thu, Oct 25, 2012 at 6:10 PM, Dipesh Khakhkhar
  dipeshsoftw...@gmail.com wrote:
   Hi,
  
   I am a new hadoop user and have few very basic questions (they might
   sound
   very stupid to many people so please bear with me).
  
   I am running a MR task and my launcher program needs to load a library
   using
   System.loadLibrary(somelibrary). This works fine if I put this library
   in
   lib/native/Linux-amd64-64. I tried the following -
  
   1. provided -files=/path_to_directory_containging_my_library
   2. provided the following in mapred-site.xml (didn't try it in
   core-site.xml
   or hdfs-site.xml)
  
   -Djava.library.path=//path_to_directory_containging_my_library
  
   I'm using hadoop 1.0.3 and this is a single node cluster for testing
   purpose.
  
   I have a production environment where I'm running 4 data nodes and
   currently
   I'm copying this file in  lib/native/Linux-amd64-64 folder in each
   node's
   hadoop installation.
  
   A related question regarding providing jars required for running the
   whole
   M/R application - currently I have edited hadoop-classpath variable in
   hadoop-env.sh. For cluster if I provide -libjars option will that work
   without editing classpath? I require this jar's classes before
 launching
   M/R
   jobs.
  
   Also how can I provide my application jar ( i.e. bin/hadoop jar myjar
   com.x.x.ProgramName )  in the data nodes? Currently I'm copying it in
   the
   lib directory of hadoop installation.
  
   Thanks in advance for answering my queries.
  
   Thanks.
 
 
 
  --
  Apache MRUnit - Unit testing MapReduce -
  http://incubator.apache.org/mrunit/
 
 



 --
 Apache MRUnit - Unit testing MapReduce -
 http://incubator.apache.org/mrunit/



MultipleOutputs directed to two different locations

2012-10-25 Thread David Parks
I've got MultipleOutputs configured to generate 2 named outputs. I'd like to
send one to s3n:// and one to hdfs://

Is this possible?  One is a final summary report, the other is input to the
next job.

Thanks,
David




Hadoop 2.0.2 -- warnings

2012-10-25 Thread Nishant Neeraj
I am new to Hadoop. When I execute

bin/hadoop jar
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.2-alpha.jar  pi -
Dmapreduce.clientfactory.class.name=org.apache.hadoop.mapred.YarnClientFactory
-libjars
share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.2-alpha.jar 16
1

1. I get many warnings; see footnote [1]. How do I get rid of them?
2. I also get WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable.
Why is that?

You can see config files in footnote [2]

= = = =
[1]
12/10/26 09:59:31 WARN conf.Configuration: mapred.job.classpath.files is
deprecated. Instead, use mapreduce.job.classpath.files
12/10/26 09:59:31 WARN conf.Configuration: mapred.jar is deprecated.
Instead, use mapreduce.job.jar
12/10/26 09:59:31 WARN conf.Configuration: mapred.cache.files is
deprecated. Instead, use mapreduce.job.cache.files
12/10/26 09:59:31 WARN conf.Configuration:
mapred.map.tasks.speculative.execution is deprecated. Instead, use
mapreduce.map.speculative
12/10/26 09:59:31 WARN conf.Configuration: mapred.reduce.tasks is
deprecated. Instead, use mapreduce.job.reduces
12/10/26 09:59:31 WARN conf.Configuration: mapred.output.value.class is
deprecated. Instead, use mapreduce.job.output.value.class
12/10/26 09:59:31 WARN conf.Configuration:
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
mapreduce.reduce.speculative
12/10/26 09:59:31 WARN conf.Configuration: mapred.used.genericoptionsparser
is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/10/26 09:59:31 WARN conf.Configuration: mapreduce.map.class is
deprecated. Instead, use mapreduce.job.map.class
12/10/26 09:59:31 WARN conf.Configuration: mapred.job.name is deprecated.
Instead, use mapreduce.job.name
12/10/26 09:59:31 WARN conf.Configuration: mapreduce.reduce.class is
deprecated. Instead, use mapreduce.job.reduce.class
12/10/26 09:59:31 WARN conf.Configuration: mapreduce.inputformat.class is
deprecated. Instead, use mapreduce.job.inputformat.class
12/10/26 09:59:31 WARN conf.Configuration: mapred.input.dir is deprecated.
Instead, use mapreduce.input.fileinputformat.inputdir
12/10/26 09:59:31 WARN conf.Configuration: mapred.output.dir is deprecated.
Instead, use mapreduce.output.fileoutputformat.outputdir
12/10/26 09:59:31 WARN conf.Configuration: mapreduce.outputformat.class is
deprecated. Instead, use mapreduce.job.outputformat.class
12/10/26 09:59:31 WARN conf.Configuration: mapred.map.tasks is deprecated.
Instead, use mapreduce.job.maps
12/10/26 09:59:31 WARN conf.Configuration: mapred.cache.files.timestamps is
deprecated. Instead, use mapreduce.job.cache.files.timestamps
12/10/26 09:59:31 WARN conf.Configuration: mapred.output.key.class is
deprecated. Instead, use mapreduce.job.output.key.class
12/10/26 09:59:31 WARN conf.Configuration: mapred.working.dir is
deprecated. Instead, use mapreduce.job.working.dir



[2]
<!-- core-site.xml -->
<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
 </property>
</configuration>
--
<!-- hdfs-site.xml -->
<configuration>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
</configuration>
---
<!-- mapred-site.xml -->
<configuration>
 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
 </property>
</configuration>

<!-- yarn-site.xml -->
<configuration>
 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
 </property>
 <property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
</configuration>