Re: How to run many jobs at the same time?
Billy Pearson wrote:
> The only way I know of is to try using different Scheduling Queues for each group.
> Billy

"nguyenhuynh.mr" <nguyenhuynh...@gmail.com> wrote in message news:49ee6e56.7080...@gmail.com...

Tom White wrote:
> You need to start each JobControl in its own thread so they can run
> concurrently. Something like:
>
>     Thread t = new Thread(jobControl);
>     t.start();
>
> Then poll the jobControl.allFinished() method.
>
> Tom
>
> On Tue, Apr 21, 2009 at 10:02 AM, nguyenhuynh.mr <nguyenhuynh...@gmail.com> wrote:
>> Hi all!
>>
>> I have some jobs: job1, job2, job3, ... Each job works on a group. To
>> control the jobs I have JobControllers; each JobController controls the
>> jobs of one group. Example:
>> - There are 2 groups: g1 and g2
>> - 2 JobControllers: jController1 and jController2
>>   + jController1 contains jobs: job1, job2, job3, ...
>>   + jController2 contains jobs: job1, job2, job3, ...
>>
>> * To run the jobs, I use:
>>
>>     for (int i = 0; i < 2; i++) {
>>         jCtrl[i] = new JController(group[i]);
>>         jCtrl[i].run();
>>     }
>>
>> * I want jController1 and jController2 to run in parallel, but in fact
>> jController2 only begins running after jController1 has finished. Why?
>> Please help me!
>>
>> * P/S: JController uses org.apache.hadoop.mapred.jobcontrol.JobControl.
>>
>> Thanks,
>> cheer,
>> Nguyen.

Thanks for your response!

I have used a Thread to start the JobControl, something like:

    public class JobController {
        public JobController(String g) { ... }

        public void run() {
            Job j1 = new Job(...);
            Job j2 = new Job(...);
            JobControl jc = new JobControl(g);
            Thread t = new Thread(jc);
            t.start();
            while (!jc.allFinished()) {
                // Display state
            }
        }
    }

* To run the code, something like:

    JobController[] jController = new JobController[2];
    for (int i = 0; i < 2; i++) {
        jController[i] = new JobController(group[i]);
        jController[i].run();
    }

* But they do not run in parallel :( !
Please help me!

Thanks,
Best regards,
Nguyen.

Thanks for all your help! Please show your solution in detail and give me an
example.

Thanks much,
Best regards,
Nguyen.
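[A minimal sketch of the fix Tom describes, assuming the 0.19-era
org.apache.hadoop.mapred.jobcontrol API. JobController.run() blocks in its
while (!jc.allFinished()) polling loop, so calling run() directly serializes
the groups; wrapping each controller in its own thread and calling start()
lets both JobControls progress concurrently. The GroupRunner class name and
group strings are illustrative, not from the thread:]

    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    // Illustrative sketch: one thread per group, each polling its own JobControl.
    public class GroupRunner extends Thread {
        private final JobControl jc;

        public GroupRunner(String group) {
            this.jc = new JobControl(group);
            // ... jc.addJob(...) for each job in this group
        }

        @Override
        public void run() {
            Thread poller = new Thread(jc);   // JobControl is a Runnable
            poller.start();
            while (!jc.allFinished()) {       // polling blocks only this thread
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    return;
                }
            }
            jc.stop();                        // shut down the JobControl thread
        }

        public static void main(String[] args) throws InterruptedException {
            GroupRunner g1 = new GroupRunner("g1");
            GroupRunner g2 = new GroupRunner("g2");
            g1.start();   // start(), not run(): run() would block until g1 finishes
            g2.start();
            g1.join();
            g2.join();
        }
    }

The key difference from the code in the thread is calling start() on each
controller instead of run(); run() executes the polling loop in the caller's
thread, which is exactly the serialization being observed.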
Re: Num map task?
Hi,

In that case, the atomic unit of a split is a file, so you need to increase
the number of files, or use the TextInputFormat as below:

    jobConf.setInputFormat(TextInputFormat.class);

On Wed, Apr 22, 2009 at 4:35 PM, nguyenhuynh.mr <nguyenhuynh...@gmail.com> wrote:
> Hi all!
>
> I have an MR job used to import contents into HBase. The content is a text
> file in HDFS. I use map files to store the local paths of the contents; each
> content has a map file. (A map file is a text file in HDFS containing one
> line of info.) I created a "maps" directory to hold the map files, and this
> maps directory is used as the input path for the job.
>
> When I run the job, the number of map tasks is the same as the number of map
> files. Ex: I have 5 map files -> 5 map tasks. Therefore the map phase is
> slow :(
>
> Why is the map phase slow when the number of map tasks is large and equal to
> the number of files?
>
> * P/S: jobs run on 3 nodes: 1 master and 2 slaves.
>
> Please help me!
> Thanks.
> Best,
> Nguyen.

--
Best Regards, Edward J. Yoon
edwardy...@apache.org
http://blog.udanax.org
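[A minimal sketch of Edward's suggestion against the 0.19-era mapred API;
class and path names are illustrative. Since each one-line file becomes at
least one split, per-task startup cost dominates; packing the info lines into
one larger file read through TextInputFormat lets a single map task cover many
records:]

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;

    // Illustrative setup: one big file with one record per line instead of
    // one tiny file per record, so a split can cover many records.
    public class ImportJob {
        public static void main(String[] args) throws Exception {
            JobConf jobConf = new JobConf(ImportJob.class);
            jobConf.setJobName("hbase-import");
            jobConf.setInputFormat(TextInputFormat.class);
            jobConf.setMapperClass(IdentityMapper.class);   // real import logic goes here
            jobConf.setNumReduceTasks(0);
            jobConf.setOutputKeyClass(LongWritable.class);  // TextInputFormat keys
            jobConf.setOutputValueClass(Text.class);        // TextInputFormat values
            FileInputFormat.setInputPaths(jobConf, new Path("/user/nguyen/maps.txt"));
            FileOutputFormat.setOutputPath(jobConf, new Path("/user/nguyen/import-out"));
            JobClient.runJob(jobConf);
        }
    }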
Re: Num map task?
Edward J. Yoon wrote:
> Hi,
>
> In that case, the atomic unit of a split is a file, so you need to increase
> the number of files, or use the TextInputFormat as below:
>
>     jobConf.setInputFormat(TextInputFormat.class);

Currently, I already use TextInputFormat as the InputFormat for the map phase.
Custom Input Split
Hi,

I have a table with N records. I want to run a map/reduce job with 4 maps and
0 reduces. Is there a way I can create my own custom input split so that I can
send 'n' records to each map? If there is, can I have a sample code snippet to
gain a better understanding?

Thanks,
Raakhi.
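[Not an answer from the thread, but a minimal sketch of one stock option in
the 0.19-era API, if the records are lines in a text file:
org.apache.hadoop.mapred.lib.NLineInputFormat hands each map task a fixed
number of input lines, which gives the "n records per map" behaviour without
writing a custom InputFormat. The path and n are illustrative:]

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    // Sketch: each map task receives n lines (records) of the input; 0 reduces.
    public class NLineSetup {
        public static JobConf configure(int n) {
            JobConf conf = new JobConf(NLineSetup.class);
            conf.setInputFormat(NLineInputFormat.class);
            conf.setInt("mapred.line.input.format.linespermap", n); // n records per map
            conf.setNumReduceTasks(0);
            FileInputFormat.setInputPaths(conf, new Path("/user/raakhi/records.txt"));
            return conf;
        }
    }

For records that are not plain text lines, the equivalent is a custom
InputFormat whose getSplits() returns N/n splits and whose RecordReader stops
after n records.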
Re: No route to host prevents from storing files to HDFS
Hi.

2009/4/22 jason hadoop <jason.had...@gmail.com>
> Most likely that machine is affected by some firewall somewhere that
> prevents traffic on port 50075. The "no route to host" is a strong
> indicator, particularly if the DataNode registered with the namenode.

Yes, this was my first thought as well. But there is no firewall, and the port
can be connected to via netcat from any other machine.

Any other idea?

Thanks.
NameNode Startup Problem
Hi,

After a while working with Hadoop, I'm now faced with a situation where the
namenode won't start up. I'm working with a patched-up version of 0.19.1 with
the ganglia patches (3422, 4675) and with 5269, which is supposed to deal with
the killed_unclean task status and the massive "serious problem" lines in the
JT logs. The latest NN logs are below. Can you help me figure out what is
going on?

Thanks,
Tamir

2009-04-22 18:12:36,966 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = lb-emu-3/192.168.14.11
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.19.2-dev
STARTUP_MSG:   build = -r ; compiled by 'tkamara' on Tue Apr 21 12:03:50 IDT 2009
************************************************************/
2009-04-22 18:12:37,448 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=54310
2009-04-22 18:12:37,456 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: lb-emu-3.israel.verisign.com/192.168.14.11:54310
2009-04-22 18:12:37,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2009-04-22 18:12:37,474 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2009-04-22 18:12:37,627 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2009-04-22 18:12:37,628 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2009-04-22 18:12:37,628 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2009-04-22 18:12:37,649 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2009-04-22 18:12:37,651 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2009-04-22 18:12:37,814 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 3427
2009-04-22 18:12:38,486 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 28
2009-04-22 18:12:38,511 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 488333 loaded in 0 seconds.
2009-04-22 18:12:38,634 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /usr/local/hadoop-datastore/hadoop/dfs/name/current/edits of size 82110 edits # 477 loaded in 0 seconds.
2009-04-22 18:12:40,893 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Invalid opcode, reached end of edit log Number of transactions found 36635
2009-04-22 18:12:40,893 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /usr/local/hadoop-datastore/hadoop/dfs/name/current/edits.new of size 5229334 edits # 36635 loaded in 2 seconds.
2009-04-22 18:12:41,024 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: saveLeases found path /tmp/temp623789763/tmp659456056/_temporary/_attempt_200904211331_0010_r_02_0/part-2 but no matching entry in namespace.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4608)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1010)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1031)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:288)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:208)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:194)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
2009-04-22 18:12:41,038 INFO org.apache.hadoop.ipc.Server: Stopping server on 54310
2009-04-22 18:12:41,038 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: saveLeases found path /tmp/temp623789763/tmp659456056/_temporary/_attempt_200904211331_0010_r_02_0/part-2 but no matching entry in namespace.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4608)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1010)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1031)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
Re: No route to host prevents from storing files to HDFS
The "no route to host" message means one of two things: either there is no
actual route (which would have generated a different error), or some firewall
is sending back a "no route" message. I have seen the "no route to host"
problem several times, and it is usually because there is a firewall in place
that no one is expecting to be there.

In the following, IP and PORT are the IP address and port from the failure
message in your log file. The "server" machine is the machine that has IP as
an address, and the "remote" machine is the machine that the connection is
failing from. The way to diagnose this explicitly is:

1) On the server machine that should be accepting connections on the port,
   run "telnet localhost PORT" and "telnet IP PORT". You should get a
   connection; if not, then the server is not binding the port.
2) On the remote machine, verify that you can communicate with the server
   machine via normal tools such as ssh and/or ping and/or traceroute, using
   the IP address from the error message in your log file.
3) On the remote machine, run "telnet IP PORT".

If (1) and (2) succeed and (3) does not, then there is something blocking
packets for the port range in question. If (3) does succeed, then there is
some probably interesting problem.

On Wed, Apr 22, 2009 at 7:31 AM, Stas Oskin <stas.os...@gmail.com> wrote:
> Hi.
>
>> "No route to host" generally means machines have routing problems: Machine
>> A doesn't know how to route packets to Machine B. Reboot everything, router
>> first, and see if it goes away. Otherwise, now is the time to learn to
>> debug routing problems. traceroute is the best starting place.
>
> I used traceroute to check whether the problematic node is accessible by
> other machines. It just works - all except HDFS, that is.
>
> Any way to check what causes this exception?
>
> Regards.

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
Re: anyone knows why setting mapred.tasktracker.map.tasks.maximum not working?
Not actually. When I just run a standalone server, meaning the server is a
namenode, datanode, jobtracker and tasktracker, and I configure the map max to
10: I have 174 files of 62~75 MB, and my block size is 65 MB. I can see that
189 map tasks are generated for this, but only 2 are running; the others are
waiting.

When I configure another datanode with the same tasktracker settings, the job
runs 12 map tasks concurrently for the same input that produces 189 map tasks:
it's using 2 map task slots from my namenode and 10 slots from my datanode. I
just can't figure out why the namenode is running only 2 map tasks while 10
are available.

On Tue, Apr 21, 2009 at 7:47 PM, jason hadoop <jason.had...@gmail.com> wrote:
> There must be only 2 input splits being produced for your job. Either you
> have 2 unsplittable files, or the input file(s) you have are not large
> enough compared to the block size to be split. Table 6-1 in chapter 06 gives
> a breakdown of all of the configuration parameters that affect split size in
> Hadoop 0.19. Alphas are available :) This is detailed in my book in ch06.
>
> On Tue, Apr 21, 2009 at 5:07 PM, javateck javateck <javat...@gmail.com> wrote:
>> Anyone know why setting *mapred.tasktracker.map.tasks.maximum* is not
>> working? I set it to 10, but still see only 2 map tasks running when
>> running one job.

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
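[For reference, a hadoop-site.xml fragment for the property under discussion,
assuming Hadoop 0.19: mapred.tasktracker.map.tasks.maximum is read by each
TaskTracker daemon at startup, so it must be set on every worker node and the
TaskTracker restarted; setting it per-job in a JobConf has no effect in this
era of Hadoop.]

    <!-- hadoop-site.xml on each TaskTracker node; requires a TaskTracker restart -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>10</value>
      <description>Maximum number of map tasks run simultaneously per TaskTracker.</description>
    </property>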
RE: Hadoop UI beta
Stefan,

Thanks for contributing this; this is very nice. We may try to use the Hadoop
UI (web server part) as an XML data source to feed a web app showing users the
state of their jobs, as this seems like a good, simple web server to customize
for pulling job info to another server or via AJAX.

Thanks!
Josh Patterson
TVA

-----Original Message-----
From: Stefan Podkowinski [mailto:spo...@gmail.com]
Sent: Tuesday, March 31, 2009 7:12 AM
To: core-user@hadoop.apache.org
Subject: ANN: Hadoop UI beta

Hello,

I'd like to invite you to take a look at the recently released first beta of
Hadoop UI, a graphical Flex/Java based client for Hadoop Core. Hadoop UI
currently includes an HDFS file explorer and basic job tracking features.

Get it here: http://code.google.com/p/hadoop-ui/

As this is the first release it may (and does) still contain bugs, but I'd
like to give everyone the chance to send feedback as early as possible. Give
it a try :)

- Stefan
Re: How to access data node without a passphrase?
RPMs won't work on Ubuntu, but we're almost finished with DEBs, which will
work on Ubuntu. Shoot Todd an email if you want to try out our DEBs:
t...@cloudera.com

Are you asking about choosing a Linux distribution? The problem with Ubuntu is
that it changes very frequently and generally uses relatively new software,
making it a great desktop distribution but perhaps not as good a choice for a
stable server distribution. (Note that the LTS Ubuntu releases are supported
longer but still use newer, possibly unstable software.) I think that the
majority of Linux server people use Redhat derivatives, in particular RHEL and
CentOS, as they're not updated frequently and use stable software (RHEL costs
money; CentOS is free). That said, CentOS is annoying to administer if you're
hoping to use a version of Python newer than 2.4. I'm sure that the Debian
people on this list will yell at me for saying Redhat derivatives are the
majority, but we'll see, I guess.

So anyway, give Todd a shout if you want to try DEBs out. Otherwise, if you're
interested in going down the Redhat derivative route (Fedora, RHEL, CentOS),
you can use the RPMs.

Alex

On Tue, Apr 21, 2009 at 10:04 PM, Yabo-Arber Xu <arber.resea...@gmail.com> wrote:
> Thanks for all your help, especially Aseem's detailed instruction. It works
> now!
>
> Alex: I did not use RPMs, but several of my existing nodes are installed
> with Ubuntu. Is there any difference running Hadoop on Ubuntu? I am thinking
> of choosing one before I start scaling up the cluster, but I'm not sure
> which one benefits more in the long term, i.e. gets more support etc.
>
> Best,
> Arber
>
> On Wed, Apr 22, 2009 at 12:35 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
>> cat ~/.ssh/master-key.pub >> ~/.ssh/authorized_keys
Re: NameNode Startup Problem
Can you post your hadoop-site.xml? Also, what prompted this problem? Did you
bounce the cluster?

Alex

On Wed, Apr 22, 2009 at 8:16 AM, Tamir Kamara <tamirkam...@gmail.com> wrote:
> Hi,
>
> After a while working with Hadoop, I'm now faced with a situation where the
> namenode won't start up. I'm working with a patched-up version of 0.19.1
> with the ganglia patches (3422, 4675) and with 5269, which is supposed to
> deal with the killed_unclean task status and the massive "serious problem"
> lines in the JT logs. The latest NN logs are below. Can you help me figure
> out what is going on?
>
> Thanks,
> Tamir
>
> 2009-04-22 18:12:41,024 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> java.io.IOException: saveLeases found path /tmp/temp623789763/tmp659456056/_temporary/_attempt_200904211331_0010_r_02_0/part-2 but no matching entry in namespace.
Re: No route to host prevents from storing files to HDFS
There is some mismatch here: what is the expected IP address of this machine
(or does it have multiple interfaces and proper routing)? Looking at the
"Receiving block" message, the DN thinks its address is 192.168.253.20, but
the NN thinks it is .253.32 (and the client is able to connect using .253.32).
If you want to find the destination IP that this DN is unable to connect to,
you can check the client's log for this block number.

Stas Oskin wrote:
> Hi.
>
> 2009/4/22 jason hadoop <jason.had...@gmail.com>
>> Most likely that machine is affected by some firewall somewhere that
>> prevents traffic on port 50075. The "no route to host" is a strong
>> indicator, particularly if the DataNode registered with the namenode.
>
> Yes, this was my first thought as well. But there is no firewall, and the
> port can be connected to via netcat from any other machine.
>
> Any other idea?
>
> Thanks.
Re: NameNode Startup Problem
Hey,

hadoop-site.xml from the name node is attached. I performed a cluster restart
and then it wouldn't come up.

Thanks in advance,
Tamir

On Wed, Apr 22, 2009 at 9:03 PM, Alex Loddengaard <a...@cloudera.com> wrote:
> Can you post your hadoop-site.xml? Also, what prompted this problem? Did you
> bounce the cluster?
>
> Alex
[ANNOUNCE] Hadoop release 0.20.0 available
Release 0.20.0 contains many improvements, new features, bug fixes and
optimizations.

For Hadoop release details and downloads, visit:
http://hadoop.apache.org/core/releases.html

Hadoop 0.20.0 Release Notes are at
http://hadoop.apache.org/core/docs/r0.20.0/releasenotes.html

Thanks to all who contributed to this release!

Nigel
Re: [ANNOUNCE] Hadoop release 0.20.0 available
Has release 0.19 now become a stable one?

On Wed, Apr 22, 2009 at 4:53 PM, Nigel Daley <nda...@yahoo-inc.com> wrote:
> Release 0.20.0 contains many improvements, new features, bug fixes and
> optimizations. For Hadoop release details and downloads, visit:
> http://hadoop.apache.org/core/releases.html
>
> Hadoop 0.20.0 Release Notes are at
> http://hadoop.apache.org/core/docs/r0.20.0/releasenotes.html
>
> Thanks to all who contributed to this release!
> Nigel
Re: No route to host prevents from storing files to HDFS
Hi.

> There is some mismatch here: what is the expected IP address of this machine
> (or does it have multiple interfaces and proper routing)? Looking at the
> "Receiving block" message, the DN thinks its address is 192.168.253.20, but
> the NN thinks it is .253.32 (and the client is able to connect using
> .253.32). If you want to find the destination IP that this DN is unable to
> connect to, you can check the client's log for this block number.

Hmm, .253.32 is the client workstation (it has only our test application with
core-hadoop.jar + configs). The expected address of the DataNode should be
192.168.253.20. From what I've seen, the problem is in the DataNode itself -
it just throws this DatanodeRegistration error every so often:

2009-04-23 00:05:05,961 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_7209884038924026671_8033 src: /192.168.253.32:42932 dest: /192.168.253.32:50010
2009-04-23 00:05:05,962 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_7209884038924026671_8033 received exception java.net.NoRouteToHostException: No route to host
2009-04-23 00:05:05,962 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(192.168.253.20:50010, storageID=DS-1790181121-127.0.0.1-50010-1239123237447, infoPort=50075, ipcPort=50020):DataXceiver: java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:402)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1255)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1092)
        at java.lang.Thread.run(Thread.java:619)

Regards.
Re: No route to host prevents from storing files to HDFS
Hi.

> The way to diagnose this explicitly is:
> 1) On the server machine that should be accepting connections on the port,
> run "telnet localhost PORT" and "telnet IP PORT". You should get a
> connection; if not, then the server is not binding the port.
> 2) On the remote machine, verify that you can communicate with the server
> machine via normal tools such as ssh and/or ping and/or traceroute, using
> the IP address from the error message in your log file.
> 3) On the remote machine, run "telnet IP PORT".
> If (1) and (2) succeed and (3) does not, then there is something blocking
> packets for the port range in question. If (3) does succeed, then there is
> some probably interesting problem.

In step 3, I tried to telnet both port 50010 and port 8010 of the problematic
datanode - both worked.

I agree there is indeed an interesting problem :). The question is how it can
be solved.

Thanks.
Re: No route to host prevents from storing files to HDFS
Stas-

Is it possible to paste the output from the following command on both your
DataNode and NameNode?

% route -v -n

-Matt

On Wed, Apr 22, 2009 at 4:36 PM, Stas Oskin <stas.os...@gmail.com> wrote:
> Hi.
>
> In step 3, I tried to telnet both port 50010 and port 8010 of the
> problematic datanode - both worked.
>
> I agree there is indeed an interesting problem :). The question is how it
> can be solved.
>
> Thanks.
Re: No route to host prevents from storing files to HDFS
Hi.

> Is it possible to paste the output from the following command on both your
> DataNode and NameNode?
>
> % route -v -n

Sure, here it is:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.253.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         192.168.253.1   0.0.0.0         UG    0      0        0 eth0

As you might recall, the problematic data node runs on the same server as the
NameNode.

Regards.
Re: No route to host prevents from storing files to HDFS
Just for clarity: are you using any type of virtualization (e.g. VMware, Xen),
or just running the DataNode java process on the same machine?

What is fs.default.name set to in your hadoop-site.xml?

-Matt

On Wed, Apr 22, 2009 at 5:22 PM, Stas Oskin <stas.os...@gmail.com> wrote:
> As you might recall, the problematic data node runs on the same server as
> the NameNode.
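[As background to Matt's question, a typical fs.default.name entry; the
hostname below is illustrative, not from the thread. On a multi-homed box like
this one (NN and a DN on the same server, plus a 169.254.x.x link-local route
in the table above), the value should be a name or IP that resolves to the
same routable address on every node - not localhost - so the DN and NN agree
on addresses:]

    <!-- hadoop-site.xml; "master.example.com" is an illustrative name that must
         resolve to the same routable address (e.g. 192.168.253.20) on all nodes -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://master.example.com:54310</value>
    </property>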
Re: How to access data node without a passphrase?
Dear Alex,

Thanks for your suggestion. I would be very interested in trying the DEBs, and
will shoot an email to Todd soon.

Best,
Arber

On Thu, Apr 23, 2009 at 2:01 AM, Alex Loddengaard <a...@cloudera.com> wrote:
> RPMs won't work on Ubuntu, but we're almost finished with DEBs, which will
> work on Ubuntu. Shoot Todd an email if you want to try out our DEBs:
> t...@cloudera.com
Re: No route to host prevents from storing files to HDFS
I wonder if this is an obscure case of running out of file descriptors. I
would expect a different message out of the jvm core.

On Wed, Apr 22, 2009 at 5:34 PM, Matt Massie <m...@cloudera.com> wrote:
> Just for clarity: are you using any type of virtualization (e.g. VMware,
> Xen), or just running the DataNode java process on the same machine?
>
> What is fs.default.name set to in your hadoop-site.xml?
>
> -Matt

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
Re: NameNode Startup Problem
It looks like this is during the HDFS recovery phase of the cluster start.
Perhaps a tmp cleaner has removed some of the files, and now this portion of
the restart is causing a failure. I am not terribly familiar with the job
recovery code.

On Wed, Apr 22, 2009 at 11:44 AM, Tamir Kamara <tamirkam...@gmail.com> wrote:
> Hey,
>
> hadoop-site.xml from the name node is attached. I performed a cluster
> restart and then it wouldn't come up.
>
> Thanks in advance,
> Tamir
>
> On Wed, Apr 22, 2009 at 9:03 PM, Alex Loddengaard <a...@cloudera.com> wrote:
>> Can you post your hadoop-site.xml? Also, what prompted this problem? Did
>> you bounce the cluster?
>>
>> Alex
Re: No route to host prevents from storing files to HDFS
Stas Oskin wrote:
> In step 3, I tried to telnet both port 50010 and port 8010 of the
> problematic datanode - both worked.

Shouldn't you be testing connecting _from_ the datanode? The error you posted
is from this DN trying to connect to another DN.

Raghu.

> I agree there is indeed an interesting problem :). The question is how it
> can be solved.
>
> Thanks.
Re: core-user Digest 23 Apr 2009 02:09:48 -0000 Issue 887
No, I didn't mark 0.19.1 stable. I left 0.18.3 as our most stable release. My
company skipped deploying 0.19.x, so I have no experience with that branch.
Others?

Nige

> Has release 0.19 now become a stable one?
>
> On Wed, Apr 22, 2009 at 4:53 PM, Nigel Daley <nda...@yahoo-inc.com> wrote:
>> Release 0.20.0 contains many improvements, new features, bug fixes and
>> optimizations. For Hadoop release details and downloads, visit:
>> http://hadoop.apache.org/core/releases.html
>>
>> Hadoop 0.20.0 Release Notes are at
>> http://hadoop.apache.org/core/docs/r0.20.0/releasenotes.html
>>
>> Thanks to all who contributed to this release!
>> Nigel
RE: core-user Digest 23 Apr 2009 02:09:48 -0000 Issue 887
Nigel,

When you have time, could you release 0.18.4 containing some of the patches
that make our clusters 'stable'?

Koji

-----Original Message-----
From: Nigel Daley [mailto:nda...@yahoo-inc.com]
Sent: Wednesday, April 22, 2009 10:31 PM
To: core-user@hadoop.apache.org
Subject: Re: core-user Digest 23 Apr 2009 02:09:48 -0000 Issue 887

No, I didn't mark 0.19.1 stable. I left 0.18.3 as our most stable release. My
company skipped deploying 0.19.x, so I have no experience with that branch.
Others?

Nige