Re: How to run many jobs at the same time?

2009-04-22 Thread nguyenhuynh.mr
Billy Pearson wrote:

 The only way I know of is try using different Scheduling Queue's for
 each group

 Billy

 nguyenhuynh.mr nguyenhuynh...@gmail.com wrote in message
 news:49ee6e56.7080...@gmail.com...
 Tom White wrote:

 You need to start each JobControl in its own thread so they can run
 concurrently. Something like:

 Thread t = new Thread(jobControl);
 t.start();

 Then poll the jobControl.allFinished() method.

 Tom

 On Tue, Apr 21, 2009 at 10:02 AM, nguyenhuynh.mr
 nguyenhuynh...@gmail.com wrote:

 Hi all!


 I have some jobs: job1, job2, job3, ... . Each job works on a group. To
 control the jobs I have JobControllers; each JobController controls the
 jobs that belong to its specified group.


 Example:

 - Have 2 Group: g1 and g2

 - 2 JobController: jController1, jcontroller2

  + jController1 contains jobs: job1, job2, job3, ...

  + jController2 contains jobs: job1, job2, job3, ...


 * To run the jobs, I use:

 for (int i = 0; i < 2; i++) {
    jCtrl[i] = new JobController(group[i]);
    jCtrl[i].run();
 }


 * I want jController1 and jController2 to run in parallel. But actually,
 jController2 only begins to run after jController1 has finished.


 Why?

 Please help me!


 * P/S: JobController uses org.apache.hadoop.mapred.jobcontrol.JobControl


 Thanks,


 cheer,

 Nguyen.





 Thanks for your response!

 I have used a Thread to start each JobControl, something like:

 public class JobController {

    public JobController(String g) {
       ...
    }

    public void run() {
       Job j1 = new Job(...);
       Job j2 = new Job(...);
       JobControl jc = new JobControl(g);
       jc.addJob(j1);
       jc.addJob(j2);

       Thread t = new Thread(jc);
       t.start();

       while (!jc.allFinished()) {
           // Display state
       }
    }
 }

 * To run the code, something like:

 JobController[] jController = new JobController[2];
 for (int i = 0; i < 2; i++) {
    jController[i] = new JobController(group[i]);
    jController[i].run();
 }

 * But they do not run in parallel :( !

 Please help me!

 Thanks,

 Best regards,
 Nguyen,





Thanks for all your help!

Please show your solution in detail and give me an example.

Thanks much,

Best regards,
Nguyen.
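
For reference, a minimal sketch of the pattern Tom describes, using the old
org.apache.hadoop.mapred.jobcontrol API: each group gets its own JobControl,
each JobControl runs in its own thread, and the wait loop lives in the caller
rather than inside each controller's run(). The class name, group names and
job configurations below are placeholders, not code from this thread.

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class ParallelGroups {

        // Build one JobControl per group; the JobConf objects are assumed
        // to be configured elsewhere.
        static JobControl buildController(String group, JobConf... confs)
                throws Exception {
            JobControl jc = new JobControl(group);
            for (JobConf conf : confs) {
                jc.addJob(new Job(conf));
            }
            return jc;
        }

        public static void main(String[] args) throws Exception {
            JobControl g1 = buildController("group1" /* , job confs for g1 */);
            JobControl g2 = buildController("group2" /* , job confs for g2 */);

            // Start each JobControl in its own thread so both groups submit
            // and monitor their jobs at the same time.
            Thread t1 = new Thread(g1);
            Thread t2 = new Thread(g2);
            t1.start();
            t2.start();

            // Poll from the main thread instead of blocking inside run().
            while (!g1.allFinished() || !g2.allFinished()) {
                Thread.sleep(5000);
            }
            g1.stop();
            g2.stop();
        }
    }

The key difference from the code quoted above is that the second controller
no longer has to wait for the first controller's while-loop to finish before
it is started.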



Re: Num map task?

2009-04-22 Thread Edward J. Yoon
Hi,

 In that case the atomic unit of splitting is a file, so you need to
 increase the number of files, or use TextInputFormat as below.

jobConf.setInputFormat(TextInputFormat.class);
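
For context, a minimal sketch of where that call fits in an old-API
(org.apache.hadoop.mapred) job driver. The driver class, job name, paths and
mapper below are placeholders; only the setInputFormat() line comes from the
suggestion above.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;

    public class ImportDriver {                            // placeholder driver class
        public static void main(String[] args) throws Exception {
            JobConf jobConf = new JobConf(ImportDriver.class);
            jobConf.setJobName("import-to-hbase");         // placeholder name

            // TextInputFormat splits each text file on block boundaries,
            // so one large file can still produce several map tasks.
            jobConf.setInputFormat(TextInputFormat.class);

            FileInputFormat.setInputPaths(jobConf, new Path(args[0]));
            FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));
            jobConf.setMapperClass(IdentityMapper.class);  // replace with the real import mapper
            jobConf.setNumReduceTasks(0);

            JobClient.runJob(jobConf);
        }
    }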

On Wed, Apr 22, 2009 at 4:35 PM, nguyenhuynh.mr
nguyenhuynh...@gmail.com wrote:
 Hi all!


 I have a MR job used to import content into HBase.

 The content is text files in HDFS. I use a map file to store the local
 path of each piece of content.

 Each piece of content has a map file. (The map file is a text file in HDFS
 containing one line of info.)


 I created a maps directory to contain the map files, and this maps
 directory is used as the input path for the job.

 When I run the job, the number of map tasks is the same as the number of
 map files.
 Ex: I have 5 map files - 5 map tasks.

 Therefore, the map phase is slow :(

 Why is the map phase slow when the number of map tasks is large and equal
 to the number of files?

 * P/S: Jobs run on 3 nodes: 1 master and 2 slaves.

 Please help me!
 Thanks.

 Best,
 Nguyen.






-- 
Best Regards, Edward J. Yoon
edwardy...@apache.org
http://blog.udanax.org


Re: Num map task?

2009-04-22 Thread nguyenhuynh.mr
Edward J. Yoon wrote:

 Hi,

 In that case, The atomic unit of split is a file. So, you need to
 increase the number of files. or Use the TextInputFormat as below.

 jobConf.setInputFormat(TextInputFormat.class);

 On Wed, Apr 22, 2009 at 4:35 PM, nguyenhuynh.mr
 nguyenhuynh...@gmail.com wrote:
 Hi all!


 I have a MR job used to import content into HBase.

 The content is text files in HDFS. I use a map file to store the local
 path of each piece of content.

 Each piece of content has a map file. (The map file is a text file in HDFS
 containing one line of info.)


 I created a maps directory to contain the map files, and this maps
 directory is used as the input path for the job.

 When I run the job, the number of map tasks is the same as the number of
 map files.
 Ex: I have 5 map files - 5 map tasks.

 Therefore, the map phase is slow :(

 Why is the map phase slow when the number of map tasks is large and equal
 to the number of files?

 * P/S: Jobs run on 3 nodes: 1 master and 2 slaves.

 Please help me!
 Thanks.

 Best,
 Nguyen.






Currently, I use TextInputFormat as the InputFormat for the map phase.


Custom Input Split

2009-04-22 Thread Rakhi Khatwani
Hi,
 I have a table with N records,
 and now I want to run a map/reduce job with 4 maps and 0 reduces.
 Is there a way I can create my own custom input split so that I can
send 'n' records to each map?
If there is a way, can I have a sample code snippet to gain a better
understanding?

Thanks
Raakhi.
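
If the records live in a plain text file in HDFS, one line per record, one
off-the-shelf option is NLineInputFormat, which hands a fixed number of lines
to each map task; a custom InputFormat overriding getSplits() would only be
needed for other data sources (an HBase table, for example, is normally split
per region). Below is a minimal sketch with the old mapred API; the driver
class, paths and lines-per-map value are placeholders, not a tested answer to
this exact setup.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class FixedRecordsPerMap {                     // placeholder driver class
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(FixedRecordsPerMap.class);

            // Each map task receives this many input lines (records).
            conf.setInputFormat(NLineInputFormat.class);
            conf.setInt("mapred.line.input.format.linespermap", 1000);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            conf.setMapperClass(IdentityMapper.class);    // replace with the real mapper
            conf.setNumReduceTasks(0);                    // 0 reduces, as in the question

            JobClient.runJob(conf);
        }
    }

With N input lines, setting linespermap to roughly N/4 should give the 4 map
tasks mentioned in the question.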


Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Stas Oskin
Hi.


2009/4/22 jason hadoop jason.had...@gmail.com

 Most likely that machine is affected by some firewall somewhere that
 prevents traffic on port 50075. The no route to host is a strong indicator,
 particularly if the Datanode registered with the namenode.



Yes, this was my first thought as well. But there is no firewall, and the
port can be connected via netcat from any other machine.

Any other idea?

Thanks.


NameNode Startup Problem

2009-04-22 Thread Tamir Kamara
Hi,

After a while working with hadoop I'm now faced with a situation where the
namenode won't start up. I'm working with a patched-up version of 0.19.1
with the ganglia patches (3422, 4675) and with 5269, which is supposed to deal
with the killed_unclean task status and the massive number of serious problem
lines in the JT logs.
The latest NN logs are below.

Can you help me figure out what is going on ?

Thanks,
Tamir



2009-04-22 18:12:36,966 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = lb-emu-3/192.168.14.11
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.19.2-dev
STARTUP_MSG:   build =  -r ; compiled by 'tkamara' on Tue Apr 21 12:03:50
IDT 2009
/
2009-04-22 18:12:37,448 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=54310
2009-04-22 18:12:37,456 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
lb-emu-3.israel.verisign.com/192.168.14.11:54310
2009-04-22 18:12:37,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2009-04-22 18:12:37,474 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullC
ontext
2009-04-22 18:12:37,627 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2009-04-22 18:12:37,628 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2009-04-22 18:12:37,628 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2009-04-22 18:12:37,649 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.sp
i.NullContext
2009-04-22 18:12:37,651 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2009-04-22 18:12:37,814 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 3427
2009-04-22 18:12:38,486 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 28
2009-04-22 18:12:38,511 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 488333 loaded in 0 seconds.
2009-04-22 18:12:38,634 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /usr/local/hadoop-datastore/hadoop/dfs/name/current/edits of size
82110 edits # 477 loaded in
 0 seconds.
2009-04-22 18:12:40,893 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Invalid opcode, reached
end of edit log Number of transactions found 36635
2009-04-22 18:12:40,893 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /usr/local/hadoop-datastore/hadoop/dfs/name/current/edits.new of
size 5229334 edits # 36635 l
oaded in 2 seconds.
2009-04-22 18:12:41,024 ERROR
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
initialization failed.
java.io.IOException: saveLeases found path
/tmp/temp623789763/tmp659456056/_temporary/_attempt_200904211331_0010_r_02_0/part-2
but no matching entry in namespace.
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4608)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1010)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1031)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:288)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:208)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:194)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
2009-04-22 18:12:41,038 INFO org.apache.hadoop.ipc.Server: Stopping server
on 54310
2009-04-22 18:12:41,038 ERROR
org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException:
saveLeases found path
/tmp/temp623789763/tmp659456056/_temporary/_attempt_20090
4211331_0010_r_02_0/part-2 but no matching entry in namespace.
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4608)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1010)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1031)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
at

Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread jason hadoop
The no route to host message means one of two things: either there is no
actual route (which would usually generate a different error), or some
firewall is sending back a no-route message.

I have seen the no route to host problem several times, and it is usually
because there is a firewall in place that no one is expecting to be there.

In the following IP and PORT are the IP address and port from the failure
message in your log file. the server machine is the machine that has IP as
an address, and the remote machine is the machine that the connection is
failing on.

The way to diagnose this explicitly is:
1) on the server machine that should be accepting connections on the port,
run telnet localhost PORT and telnet IP PORT; you should get a connection. If
not, then the server is not binding the port.
2) on the remote machine, verify that you can communicate with the server
machine via normal tools such as ssh and/or ping and/or traceroute, using
the IP address from the error message in your log file.
3) on the remote machine, run telnet IP PORT. If (1) and (2) succeeded and
(3) does not, then there is something blocking packets for the port range in
question. If (3) does succeed, then there is probably some more interesting
problem. (A small Java probe equivalent to step 3 is sketched below.)
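
Where telnet is not available on the remote machine, a small Java probe can
stand in for the check in step 3; the host and port are taken from the failure
message in the log. This is only a sketch, not part of the steps above.

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class PortProbe {
        public static void main(String[] args) throws Exception {
            String host = args[0];                   // IP from the failure message
            int port = Integer.parseInt(args[1]);    // e.g. 50010 or 50075
            Socket socket = new Socket();
            try {
                // A NoRouteToHostException or ConnectException here reproduces
                // the same failure the DataNode is reporting.
                socket.connect(new InetSocketAddress(host, port), 5000);
                System.out.println("connected to " + host + ":" + port);
            } finally {
                socket.close();
            }
        }
    }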



On Wed, Apr 22, 2009 at 7:31 AM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 No route to host generally means machines have routing problems. Machine A
  doesn't know how to route packets to Machine B. Reboot everything, router
  first, see if it goes away. Otherwise, now is the time to learn to debug
  routing problems. traceroute is the best starting place.


 I used traceroute to check whether the problematic node is accessible by
 other machines. It just works - everything except HDFS, that is.

 Any way to check what causes this exception?

 Regards.




-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: anyone knows why setting mapred.tasktracker.map.tasks.maximum not working?

2009-04-22 Thread javateck javateck
Not exactly.
When I run just a standalone server, meaning the server is the namenode,
datanode, jobtracker and tasktracker, and I configured the map maximum to 10,
I have 174 files of 62~75 MB each and my block size is 65 MB. I can see that
189 map tasks are generated for this, but only 2 are running; the others are
waiting.

When I configured another datanode with the same tasktracker settings, the
same job (which still produces 189 map tasks) runs at 12 concurrent map
tasks: it is using 2 map task slots from my namenode and 10 slots from my
datanode.

I just can't figure out why the namenode machine is running only 2 map tasks
while 10 slots are available.
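
One thing that may be worth checking here (an observation, not something
raised in this thread): mapred.tasktracker.map.tasks.maximum is read by each
TaskTracker daemon from its own local configuration when the daemon starts,
so a value set only in a job's configuration, or only on one machine, does
not raise the slot count of a TaskTracker that still carries the default of
2. A quick way to see what the configuration on a given machine's classpath
actually contains:

    import org.apache.hadoop.mapred.JobConf;

    public class SlotCheck {
        public static void main(String[] args) {
            // JobConf loads the default Hadoop configuration resources from
            // the classpath; the value below is what a TaskTracker started
            // with this classpath would use (the default is 2).
            JobConf conf = new JobConf();
            System.out.println("mapred.tasktracker.map.tasks.maximum = "
                    + conf.getInt("mapred.tasktracker.map.tasks.maximum", 2));
        }
    }

If the TaskTracker on the namenode machine was started before its local
configuration was updated, restarting that TaskTracker should make the extra
slots appear.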

On Tue, Apr 21, 2009 at 7:47 PM, jason hadoop jason.had...@gmail.comwrote:

 There must be only 2 input splits being produced for your job.
 Either you have 2 unsplittable files, or the input file(s) you have are not
 large enough compared to the block size to be split.

 Table 6-1 in chapter 06 gives a breakdown of all of the configuration
 parameters that affect split size in hadoop 0.19. Alphas are available :)

 This is detailed in my book in ch06

 On Tue, Apr 21, 2009 at 5:07 PM, javateck javateck javat...@gmail.com
 wrote:

  anyone knows why setting *mapred.tasktracker.map.tasks.maximum* not
  working?
  I set it to 10, but still see only 2 map tasks running when running one
 job
 



 --
 Alpha Chapters of my book on Hadoop are available
 http://www.apress.com/book/view/9781430219422



RE: Hadoop UI beta

2009-04-22 Thread Patterson, Josh
Stefan,
Thanks for contributing this, it is very nice. We may try to use
the Hadoop-ui (web server part) as an XML data source to feed a web app
showing users the state of their jobs, as this seems like a good, simple
webserver to customize for pulling job info to another server or via
AJAX. Thanks!

Josh Patterson
TVA 

-Original Message-
From: Stefan Podkowinski [mailto:spo...@gmail.com] 
Sent: Tuesday, March 31, 2009 7:12 AM
To: core-user@hadoop.apache.org
Subject: ANN: Hadoop UI beta

Hello,

I'd like to invite you to take a look at the recently released first
beta of Hadoop UI, a graphical Flex/Java based client for Hadoop Core.
Hadoop UI currently includes an HDFS file explorer and basic job
tracking features.

Get it here:
http://code.google.com/p/hadoop-ui/

As this is the first release it may (and does) still contain bugs, but
I'd like to give everyone the chance to send feedback as early as
possible.
Give it a try :)

- Stefan


Re: How to access data node without a passphrase?

2009-04-22 Thread Alex Loddengaard
RPMs won't work on Ubuntu, but we're almost finished with DEBs, which will
work on Ubuntu.  Shoot Todd an email if you want to try out our DEBs:

t...@cloudera.com

Are you asking about choosing a Linux distribution?  The problem with Ubuntu
is that it changes very frequently and generally uses relatively new
software, making it a great desktop distribution, but perhaps not as good of
a stable server distribution.  (Note that the LTS Ubuntu releases are
supported longer but still user newer, possibly unstable software.)  I think
that the majority of Linux server people use Redhat derivatives, in
particular RHEL and CentOS, as they're not updated frequently and use stable
software (RHEL costs money; CentOS is free).  That said, CentOS is annoying
to administer if you're hoping to use a version of Python newer than 2.4.
I'm sure that the Debian people on this list will yell at me for saying
Redhat derivatives are the majority, but we'll see I guess.

So anyway, give Todd a shout if you want to try DEBs out.  Otherwise, if
you're interested in going down the Redhat derivative route (Fedora, RHEL,
CentOS), you can use the RPMs.

Alex

On Tue, Apr 21, 2009 at 10:04 PM, Yabo-Arber Xu arber.resea...@gmail.comwrote:

 Thanks for all your help, especially Aseem's detailed instructions. It
 works now!

 Alex: I did not use RPMs, but several of my existing nodes have Ubuntu
 installed. Is there any difference when running Hadoop on Ubuntu? I am
 thinking of choosing one distribution before I start scaling up the
 cluster, but I am not sure which one benefits more in the long term,
 i.e. gets more support, etc.

 Best
 Arber


 On Wed, Apr 22, 2009 at 12:35 PM, Puri, Aseem aseem.p...@honeywell.com
 wrote:

  cat ~/.ssh/master-key.pub >> ~/.ssh/authorized_keys
 



Re: NameNode Startup Problem

2009-04-22 Thread Alex Loddengaard
Can you post your hadoop-site.xml?  Also, what prompted this problem?  Did
you bounce the cluster?

Alex

On Wed, Apr 22, 2009 at 8:16 AM, Tamir Kamara tamirkam...@gmail.com wrote:

 Hi,

 After a while working with hadoop I'm now faced with a situation where the
 namenode won't start up. I'm working with a patched up version of 0.19.1
 with ganglia patches (3422, 4675) and with 5269 which suppose to deal with
 killed_unclean task status and the massive serious problem lines in the
 JT
 logs.
 The latest NN logs are below.

 Can you help me figure out what is going on ?

 Thanks,
 Tamir



 2009-04-22 18:12:36,966 INFO
 org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
 /
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = lb-emu-3/192.168.14.11
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 0.19.2-dev
 STARTUP_MSG:   build =  -r ; compiled by 'tkamara' on Tue Apr 21 12:03:50
 IDT 2009
 /
 2009-04-22 18:12:37,448 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
 Initializing RPC Metrics with hostName=NameNode, port=54310
 2009-04-22 18:12:37,456 INFO
 org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
 lb-emu-3.israel.verisign.com/192.168.14.11:54310
 2009-04-22 18:12:37,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
 Initializing JVM Metrics with processName=NameNode, sessionId=null
 2009-04-22 18:12:37,474 INFO
 org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
 Initializing
 NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullC
 ontext
 2009-04-22 18:12:37,627 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
 2009-04-22 18:12:37,628 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
 2009-04-22 18:12:37,628 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 isPermissionEnabled=true
 2009-04-22 18:12:37,649 INFO
 org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
 Initializing FSNamesystemMetrics using context
 object:org.apache.hadoop.metrics.sp
 i.NullContext
 2009-04-22 18:12:37,651 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
 FSNamesystemStatusMBean
 2009-04-22 18:12:37,814 INFO org.apache.hadoop.hdfs.server.common.Storage:
 Number of files = 3427
 2009-04-22 18:12:38,486 INFO org.apache.hadoop.hdfs.server.common.Storage:
 Number of files under construction = 28
 2009-04-22 18:12:38,511 INFO org.apache.hadoop.hdfs.server.common.Storage:
 Image file of size 488333 loaded in 0 seconds.
 2009-04-22 18:12:38,634 INFO org.apache.hadoop.hdfs.server.common.Storage:
 Edits file /usr/local/hadoop-datastore/hadoop/dfs/name/current/edits of
 size
 82110 edits # 477 loaded in
  0 seconds.
 2009-04-22 18:12:40,893 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Invalid opcode,
 reached
 end of edit log Number of transactions found 36635
 2009-04-22 18:12:40,893 INFO org.apache.hadoop.hdfs.server.common.Storage:
 Edits file /usr/local/hadoop-datastore/hadoop/dfs/name/current/edits.new of
 size 5229334 edits # 36635 l
 oaded in 2 seconds.
 2009-04-22 18:12:41,024 ERROR
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
 initialization failed.
 java.io.IOException: saveLeases found path

 /tmp/temp623789763/tmp659456056/_temporary/_attempt_200904211331_0010_r_02_0/part-2
 but no matching entry in namespace.
at

 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4608)
at

 org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1010)
at

 org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1031)
at

 org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
at

 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
at

 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:288)
at

 org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
at
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:208)
at
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:194)
at

 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
at
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
 2009-04-22 18:12:41,038 INFO org.apache.hadoop.ipc.Server: Stopping server
 on 54310
 2009-04-22 18:12:41,038 ERROR
 org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException:
 saveLeases found path
 /tmp/temp623789763/tmp659456056/_temporary/_attempt_20090
 4211331_0010_r_02_0/part-2 but no matching entry in namespace.
at


Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Raghu Angadi


There is some mismatch here.. what is the expected ip address of this 
machine (or does it have multiple interfaces and properly routed)? 
Looking at the Receiving Block message DN thinks its address is 
192.168.253.20 but NN thinks it is 253.32 (and client is able to connect 
 using 253.32).


If you want to find the destination ip that this DN is unable to connect 
to, you can check client's log for this block number.


Stas Oskin wrote:

Hi.


2009/4/22 jason hadoop jason.had...@gmail.com


Most likely that machine is affected by some firewall somewhere that
prevents traffic on port 50075. The no route to host is a strong indicator,
particularly if the Datanode registered with the namenode.




Yes, this was my first thought as well. But there is no firewall, and the
port can be connected via netcat from any other machine.

Any other idea?

Thanks.





Re: NameNode Startup Problem

2009-04-22 Thread Tamir Kamara
Hey,

hadoop-site.xml from the name node is attached.
I performed a cluster restart and after that it wouldn't come up.

Thanks in advance,
Tamir

On Wed, Apr 22, 2009 at 9:03 PM, Alex Loddengaard a...@cloudera.com wrote:

 Can you post your hadoop-site.xml?  Also, what prompted this problem?  Did
 you bounce the cluster?

 Alex

 On Wed, Apr 22, 2009 at 8:16 AM, Tamir Kamara tamirkam...@gmail.com
 wrote:

  Hi,
 
  After a while working with hadoop I'm now faced with a situation where
 the
  namenode won't start up. I'm working with a patched up version of 0.19.1
  with ganglia patches (3422, 4675) and with 5269 which suppose to deal
 with
  killed_unclean task status and the massive serious problem lines in the
  JT
  logs.
  The latest NN logs are below.
 
  Can you help me figure out what is going on ?
 
  Thanks,
  Tamir
 
 
 
  2009-04-22 18:12:36,966 INFO
  org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
  /
  STARTUP_MSG: Starting NameNode
  STARTUP_MSG:   host = lb-emu-3/192.168.14.11
  STARTUP_MSG:   args = []
  STARTUP_MSG:   version = 0.19.2-dev
  STARTUP_MSG:   build =  -r ; compiled by 'tkamara' on Tue Apr 21 12:03:50
  IDT 2009
  /
  2009-04-22 18:12:37,448 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
  Initializing RPC Metrics with hostName=NameNode, port=54310
  2009-04-22 18:12:37,456 INFO
  org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
  lb-emu-3.israel.verisign.com/192.168.14.11:54310
  2009-04-22 18:12:37,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
  Initializing JVM Metrics with processName=NameNode, sessionId=null
  2009-04-22 18:12:37,474 INFO
  org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
  Initializing
  NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullC
  ontext
  2009-04-22 18:12:37,627 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 fsOwner=hadoop,hadoop
  2009-04-22 18:12:37,628 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 supergroup=supergroup
  2009-04-22 18:12:37,628 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
  isPermissionEnabled=true
  2009-04-22 18:12:37,649 INFO
  org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
  Initializing FSNamesystemMetrics using context
  object:org.apache.hadoop.metrics.sp
  i.NullContext
  2009-04-22 18:12:37,651 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
  FSNamesystemStatusMBean
  2009-04-22 18:12:37,814 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Number of files = 3427
  2009-04-22 18:12:38,486 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Number of files under construction = 28
  2009-04-22 18:12:38,511 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Image file of size 488333 loaded in 0 seconds.
  2009-04-22 18:12:38,634 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Edits file /usr/local/hadoop-datastore/hadoop/dfs/name/current/edits of
  size
  82110 edits # 477 loaded in 0 seconds.
  2009-04-22 18:12:40,893 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Invalid opcode,
  reached
  end of edit log Number of transactions found 36635
  2009-04-22 18:12:40,893 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Edits file /usr/local/hadoop-datastore/hadoop/dfs/name/current/edits.new
 of
  size 5229334 edits # 36635 l
  oaded in 2 seconds.
  2009-04-22 18:12:41,024 ERROR
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
  initialization failed.
  java.io.IOException: saveLeases found path
 
 
 /tmp/temp623789763/tmp659456056/_temporary/_attempt_200904211331_0010_r_02_0/part-2
  but no matching entry in namespace.
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4608)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1010)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1031)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:288)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
 at
  org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:208)
 at
  org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:194)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
 at
  org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
  2009-04-22 18:12:41,038 INFO org.apache.hadoop.ipc.Server: 

[ANNOUNCE] Hadoop release 0.20.0 available

2009-04-22 Thread Nigel Daley
Release 0.20.0 contains many improvements, new features, bug fixes and  
optimizations.


For Hadoop release details and downloads, visit:
http://hadoop.apache.org/core/releases.html

Hadoop 0.20.0 Release Notes are at
http://hadoop.apache.org/core/docs/r0.20.0/releasenotes.html

Thanks to all who contributed to this release!

Nigel



Re: [ANNOUNCE] Hadoop release 0.20.0 available

2009-04-22 Thread Farhan Husain
Has the release 0.19 now become a stable one?

On Wed, Apr 22, 2009 at 4:53 PM, Nigel Daley nda...@yahoo-inc.com wrote:

 Release 0.20.0 contains many improvements, new features, bug fixes and
 optimizations.

 For Hadoop release details and downloads, visit:
 http://hadoop.apache.org/core/releases.html

 Hadoop 0.20.0 Release Notes are at
 http://hadoop.apache.org/core/docs/r0.20.0/releasenotes.html

 Thanks to all who contributed to this release!

 Nigel




Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Stas Oskin
Hi.

There is some mismatch here.. what is the expected ip address of this
 machine (or does it have multiple interfaces and properly routed)? Looking
 at the Receiving Block message DN thinks its address is 192.168.253.20 but
 NN thinks it is 253.32 (and client is able to connect  using 253.32).

 If you want to find the destination ip that this DN is unable to connect
 to, you can check client's log for this block number.



Hmm, .253.32 is the client workstation (has only our test application with
core-hadoop.jar + configs).

The expected address of the DataNode should be 192.168.253.20.

According to what I've seen, the problem is in the DataNode itself - it just
throws the DatanodeRegistration error every so often:


2009-04-23 00:05:05,961 INFO org.apache.hadoop.dfs.DataNode: Receiving block
blk_7209884038924026671_8033 src: /192.168.253.32:42932
 dest: /192.168.253.32:50010
2009-04-23 00:05:05,962 INFO org.apache.hadoop.dfs.DataNode: writeBlock
blk_7209884038924026671_8033 received exception
java.net.NoRouteToHostException: No route to host
2009-04-23 00:05:05,962 ERROR org.apache.hadoop.dfs.DataNode:
DatanodeRegistration(192.168.253.20:50010,
storageID=DS-1790181121-127.0.0.1-50010-1239123237447, infoPort=50075,
ipcPort=50020):DataXceiver: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:402)
at
org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1255)
at
org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1092)
at java.lang.Thread.run(Thread.java:619)

Regards.


Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Stas Oskin
Hi.

The way to diagnose this explicitly is:
 1) on the server machine that should be accepting connections on the port,
 telnet localhost PORT, and telnet IP PORT you should get a connection, if
 not then the server is not binding the port.
 2) on the remote machine verify that you can communicate to the server
 machine via normal tools such as ssh and or ping and or traceroute, using
 the IP address from the error message in your log file
 3) on the remote machine run telnet IP PORT. if (1) and (2) succeeded and
 (3) does not, then there is something blocking packets for the port range
 in
 question. If (3) does succeed then there is some probably interesting
 problem.


 In step 3, I tried to telnet to both port 50010 and port 8010 of the
problematic datanode - both worked.

I agree there is indeed an interesting problem :). Question is how it can be
solved.

Thanks.


Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Matt Massie
Stas-

Is it possible to paste the output from the following command on both your
DataNode and NameNode?

% route -v -n

-Matt


On Wed, Apr 22, 2009 at 4:36 PM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 The way to diagnose this explicitly is:
  1) on the server machine that should be accepting connections on the
 port,
  telnet localhost PORT, and telnet IP PORT you should get a connection, if
  not then the server is not binding the port.
  2) on the remote machine verify that you can communicate to the server
  machine via normal tools such as ssh and or ping and or traceroute, using
  the IP address from the error message in your log file
  3) on the remote machine run telnet IP PORT. if (1) and (2) succeeded and
  (3) does not, then there is something blocking packets for the port range
  in
  question. If (3) does succeed then there is some probably interesting
  problem.
 

  Tried in step 3 to telnet both the 50010 and the 8010 ports of the
 problematic datanode - both worked.

 I agree there is indeed an interesting problem :). Question is how it can
 be
 solved.

 Thanks.



Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Stas Oskin
Hi.

Is it possible to paste the output from the following command on both your
 DataNode and NameNode?

 % route -v -n


Sure, here it is:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.253.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         192.168.253.1   0.0.0.0         UG    0      0        0 eth0


As you might recall, the problematic data node runs on the same server as the
NameNode.

Regards.


Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Matt Massie
Just for clarity: are you using any type of virtualization (e.g. vmware,
xen) or just running the DataNode java process on the same machine?

What is fs.default.name set to in your hadoop-site.xml?

-Matt


On Wed, Apr 22, 2009 at 5:22 PM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 Is it possible to paste the output from the following command on both your
  DataNode and NameNode?
 
  % route -v -n
 

 Sure, here it is:

 Kernel IP routing table
 Destination Gateway Genmask Flags Metric RefUse
 Iface
 192.168.253.0   0.0.0.0 255.255.255.0   U 0  00
 eth0
 169.254.0.0 0.0.0.0 255.255.0.0 U 0  00
 eth0
 0.0.0.0 192.168.253.1   0.0.0.0 UG0  00
 eth0


 As you might recall, the problematic data node runs in same server as the
 NameNode.

 Regards.



Re: How to access data node without a passphrase?

2009-04-22 Thread Yabo-Arber Xu
Dear Alex,

Thanks for your suggestion. I would be very interested in trying the DEBs,
and will shoot an email to Todd soon.

Best,
Arber


On Thu, Apr 23, 2009 at 2:01 AM, Alex Loddengaard a...@cloudera.com wrote:

 RPMs won't work on Ubuntu, but we're almost finished with DEBs, which will
 work on Ubuntu.  Shoot Todd an email if you want to try out our DEBs:

 t...@cloudera.com

 Are you asking about choosing a Linux distribution?  The problem with
 Ubuntu
 is that it changes very frequently and generally uses relatively new
 software, making it a great desktop distribution, but perhaps not as good
 of
 a stable server distribution.  (Note that the LTS Ubuntu releases are
 supported longer but still user newer, possibly unstable software.)  I
 think
 that the majority of Linux server people use Redhat derivatives, in
 particular RHEL and CentOS, as they're not updated frequently and use
 stable
 software (RHEL costs money; CentOS is free).  That said, CentOS is annoying
 to administer if you're hoping to use a version of Python newer than 2.4.
 I'm sure that the Debian people on this list will yell at me for saying
 Redhat derivatives are the majority, but we'll see I guess.

 So anyway, give Todd a shout if you want to try DEBs out.  Otherwise, if
 you're interested in going down the Redhat derivative route (Fedora, RHEL,
 CentOS), you can use the RPMs.

 Alex

 On Tue, Apr 21, 2009 at 10:04 PM, Yabo-Arber Xu arber.resea...@gmail.com
 wrote:

  Thanks for all your help, especially Asteem's detailed instruction. It
  works
  now!
 
  Alex: I did not use RPMs, but several of my existing nodes are installed
  with Ubuntu. Is there any diff on running Hadoop on Ubuntu? I am thinking
  of
  choosing one before I started scaling up the cluster, but not sure which
  one
  benefit from long-term, i.e. get more support etc.
 
  Best
  Arber
 
 
  On Wed, Apr 22, 2009 at 12:35 PM, Puri, Aseem aseem.p...@honeywell.com
  wrote:
 
   cat ~/.ssh/master-key.pub >> ~/.ssh/authorized_keys
  
 



Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread jason hadoop
I wonder if this is an obscure case of running out of file descriptors. I
would expect a different message out of the JVM core, though.

On Wed, Apr 22, 2009 at 5:34 PM, Matt Massie m...@cloudera.com wrote:

 Just for clarity: are you using any type of virtualization (e.g. vmware,
 xen) or just running the DataNode java process on the same machine?

 What is fs.default.name set to in your hadoop-site.xml?

 -Matt


 On Wed, Apr 22, 2009 at 5:22 PM, Stas Oskin stas.os...@gmail.com wrote:

  Hi.
 
  Is it possible to paste the output from the following command on both
 your
   DataNode and NameNode?
  
   % route -v -n
  
 
  Sure, here it is:
 
  Kernel IP routing table
  Destination Gateway Genmask Flags Metric RefUse
  Iface
  192.168.253.0   0.0.0.0 255.255.255.0   U 0  00
  eth0
  169.254.0.0 0.0.0.0 255.255.0.0 U 0  00
  eth0
  0.0.0.0 192.168.253.1   0.0.0.0 UG0  00
  eth0
 
 
  As you might recall, the problematic data node runs in same server as the
  NameNode.
 
  Regards.
 




-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: NameNode Startup Problem

2009-04-22 Thread jason hadoop
It looks like this is during the hdfs recovery phase of the cluster start.
Perhaps a tmp cleaner has removed some of the files, and now this portion of
the restart is causing a failure.


I am not terribly familiar with the job recovery code.

On Wed, Apr 22, 2009 at 11:44 AM, Tamir Kamara tamirkam...@gmail.comwrote:

 Hey,

 hadoop-site.xml from the name node is attached.
 I performed a cluster restart and then it would come up.

 Thanks in advance,
 Tamir


 On Wed, Apr 22, 2009 at 9:03 PM, Alex Loddengaard a...@cloudera.comwrote:

 Can you post your hadoop-site.xml?  Also, what prompted this problem?  Did
 you bounce the cluster?

 Alex

 On Wed, Apr 22, 2009 at 8:16 AM, Tamir Kamara tamirkam...@gmail.com
 wrote:

  Hi,
 
  After a while working with hadoop I'm now faced with a situation where
 the
  namenode won't start up. I'm working with a patched up version of 0.19.1
  with ganglia patches (3422, 4675) and with 5269 which suppose to deal
 with
  killed_unclean task status and the massive serious problem lines in
 the
  JT
  logs.
  The latest NN logs are below.
 
  Can you help me figure out what is going on ?
 
  Thanks,
  Tamir
 
 
 
  2009-04-22 18:12:36,966 INFO
  org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
  /
  STARTUP_MSG: Starting NameNode
  STARTUP_MSG:   host = lb-emu-3/192.168.14.11
  STARTUP_MSG:   args = []
  STARTUP_MSG:   version = 0.19.2-dev
  STARTUP_MSG:   build =  -r ; compiled by 'tkamara' on Tue Apr 21
 12:03:50
  IDT 2009
  /
  2009-04-22 18:12:37,448 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
  Initializing RPC Metrics with hostName=NameNode, port=54310
  2009-04-22 18:12:37,456 INFO
  org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
  lb-emu-3.israel.verisign.com/192.168.14.11:54310
  2009-04-22 18:12:37,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
  Initializing JVM Metrics with processName=NameNode, sessionId=null
  2009-04-22 18:12:37,474 INFO
  org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
  Initializing
  NameNodeMeterics using context
 object:org.apache.hadoop.metrics.spi.NullC
  ontext
  2009-04-22 18:12:37,627 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 fsOwner=hadoop,hadoop
  2009-04-22 18:12:37,628 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 supergroup=supergroup
  2009-04-22 18:12:37,628 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
  isPermissionEnabled=true
  2009-04-22 18:12:37,649 INFO
  org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
  Initializing FSNamesystemMetrics using context
  object:org.apache.hadoop.metrics.sp
  i.NullContext
  2009-04-22 18:12:37,651 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
  FSNamesystemStatusMBean
  2009-04-22 18:12:37,814 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Number of files = 3427
  2009-04-22 18:12:38,486 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Number of files under construction = 28
  2009-04-22 18:12:38,511 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Image file of size 488333 loaded in 0 seconds.
  2009-04-22 18:12:38,634 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Edits file /usr/local/hadoop-datastore/hadoop/dfs/name/current/edits of
  size
  82110 edits # 477 loaded in 0 seconds.
  2009-04-22 18:12:40,893 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Invalid opcode,
  reached
  end of edit log Number of transactions found 36635
  2009-04-22 18:12:40,893 INFO
 org.apache.hadoop.hdfs.server.common.Storage:
  Edits file /usr/local/hadoop-datastore/hadoop/dfs/name/current/edits.new
 of
  size 5229334 edits # 36635 l
  oaded in 2 seconds.
  2009-04-22 18:12:41,024 ERROR
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
  initialization failed.
  java.io.IOException: saveLeases found path
 
 
 /tmp/temp623789763/tmp659456056/_temporary/_attempt_200904211331_0010_r_02_0/part-2
  but no matching entry in namespace.
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4608)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1010)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1031)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:288)
 at
 
 
 org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
 at
 
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:208)

Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Raghu Angadi

Stas Oskin wrote:


 Tried in step 3 to telnet both the 50010 and the 8010 ports of the
problematic datanode - both worked.


Shouldn't you be testing connecting _from_ the datanode? The error you
posted is while this DN is trying to connect to another DN.


Raghu.


I agree there is indeed an interesting problem :). Question is how it can be
solved.

Thanks.





Re: core-user Digest 23 Apr 2009 02:09:48 -0000 Issue 887

2009-04-22 Thread Nigel Daley
No, I didn't mark 0.19.1 stable.  I left 0.18.3 as our most stable  
release.


My company skipped deploying 0.19.x so I have no experience with that  
branch.  Others?


Nige


Has the release 0.19 now become a stable one?

On Wed, Apr 22, 2009 at 4:53 PM, Nigel Daley nda...@yahoo-inc.com  
wrote:


Release 0.20.0 contains many improvements, new features, bug fixes and
optimizations.

For Hadoop release details and downloads, visit:
http://hadoop.apache.org/core/releases.html

Hadoop 0.20.0 Release Notes are at
http://hadoop.apache.org/core/docs/r0.20.0/releasenotes.html

Thanks to all who contributed to this release!

Nigel




RE: core-user Digest 23 Apr 2009 02:09:48 -0000 Issue 887

2009-04-22 Thread Koji Noguchi
Nigel, 

When you have time, could you release 0.18.4 that contains some of the
patches that make our clusters 'stable'?
 
Koji

-Original Message-
From: Nigel Daley [mailto:nda...@yahoo-inc.com] 
Sent: Wednesday, April 22, 2009 10:31 PM
To: core-user@hadoop.apache.org
Subject: Re: core-user Digest 23 Apr 2009 02:09:48 - Issue 887

No, I didn't mark 0.19.1 stable.  I left 0.18.3 as our most stable  
release.

My company skipped deploying 0.19.x so I have no experience with that  
branch.  Others?

Nige

 Has the release 0.19 now become a stable one?

 On Wed, Apr 22, 2009 at 4:53 PM, Nigel Daley nda...@yahoo-inc.com  
 wrote:

 Release 0.20.0 contains many improvements, new features, bug fixes  
 and
 optimizations.

 For Hadoop release details and downloads, visit:
 http://hadoop.apache.org/core/releases.html

 Hadoop 0.20.0 Release Notes are at
 http://hadoop.apache.org/core/docs/r0.20.0/releasenotes.html

 Thanks to all who contributed to this release!

 Nigel