No locks available
Dear all,

Yesterday I was working on a cluster of 6 Hadoop nodes (loading data, running some jobs), but today when I started the cluster one of my datanodes failed to start with the following error:

2011-01-11 12:54:10,367 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = hadoop3/172.16.1.4
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2011-01-11 12:55:57,031 INFO org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: No locks available
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:881)
        at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:112)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:216)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)
2011-01-11 12:55:57,043 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: No locks available
        (same stack trace repeated as above)

Is anyone familiar with this issue? Please help.

Thanks & Regards,
Adarsh Sharma
Re: When applying a patch, which attachment should I use?
Thanks for the info. I am currently using Hadoop 0.20.2, so I guess I only need to apply hdfs-630-0.20-append.patch (https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch). I wasn't familiar with the term "trunk"; I guess it means the latest development line. Thanks again.

Best Regards,
Ed

2011/1/11 Konstantin Boudnik c...@apache.org:
> Yeah, that's pretty crazy all right. In your case it looks like the 3 patches at the top are the latest for the 0.20-append branch, the 0.21 branch, and trunk (which is perhaps the 0.22 branch at the moment). It doesn't look like you need to apply all of them - just try the latest one for your particular branch. The mess is caused by the fact that people use different names for consecutive patches (as in file.1.patch, file.2.patch, etc.). This is _very_ confusing indeed, especially when different contributors work on the same fix/feature.
> --
> Take care,
> Konstantin (Cos) Boudnik
>
> On Mon, Jan 10, 2011 at 01:10, edward choi mp2...@gmail.com wrote:
>> Hi,
>>
>> For the first time I am about to apply a patch to HDFS. https://issues.apache.org/jira/browse/HDFS-630 is the one that I am trying to apply. But there are about 15 patch attachments and I don't know which one to use. Could anyone tell me if I need to apply them all or just the one at the top? The whole patching process is just so confusing :-(
>>
>> Ed
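For reference, applying a single attachment like this against an unpacked 0.20.2 source tree generally looks like the following (the directory name is illustrative, and the patch level may be -p1 rather than -p0 for some attachments, so the --dry-run check first is worthwhile):

cd hadoop-0.20.2
patch -p0 --dry-run < hdfs-630-0.20-append.patch   # check that it applies cleanly first
patch -p0 < hdfs-630-0.20-append.patch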
Re: TeraSort question.
I used 9500 maps. The number of maps defaults to 2 for teragen. For terasort, it depends on the number of input files, dfs.block.size, and the number of nodes.

Raj

From: Phil Whelan phil...@gmail.com
To: common-user@hadoop.apache.org; Raj V rajv...@yahoo.com
Sent: Monday, January 10, 2011 10:39:29 PM
Subject: Re: TeraSort question.

Hi Raj,

> Two of the 5 systems were seriously busy, big IO with lots of disk and network activity. On the other three systems the CPU was more or less 100% idle, with slight network and I/O activity.

This process defaults to just 2 map jobs, so only 2 nodes are utilized. Did you try the mapred.map.tasks option? I found a very similar question and answer here: http://www.mail-archive.com/common-user@hadoop.apache.org/msg5.html

> 1. The data is generated in a fashion where it is not balanced across my cluster. This is because the data is generated with 2 maps.
>
> These are due to the default #maps/#reduces in Map-Reduce. Use:
>
> $ bin/hadoop jar hadoop-*-dev-examples.jar teragen -Dmapred.map.tasks=8000 100 /tera/in
> $ bin/hadoop jar hadoop-*-dev-examples.jar terasort -Dmapred.reduce.tasks=5300 /tera/in /tera/out
>
> Arun

Hope that helps.

Thanks,
Phil

On Mon, Jan 10, 2011 at 9:06 PM, Raj V rajv...@yahoo.com wrote:
> All,
>
> I have been running terasort on a 480-node Hadoop cluster. I have also collected CPU, memory, disk and network statistics during this run. The system stats are quite interesting; I can post them when I have put them together in some presentable format (if there is interest). While looking at the data, I noticed something interesting. I thought, intuitively, that all the systems in the cluster would show more or less similar behaviour (time translation was possible, but the overall graph would look the same). Just to confirm it, I took 5 random nodes and looked at the CPU, disk, network etc. activity while the sort was running. Strangely enough, it was not so. Two of the 5 systems were seriously busy, big IO with lots of disk and network activity. On the other three systems the CPU was more or less 100% idle, with slight network and I/O activity. Is that normal and/or expected? Shouldn't all the nodes be utilized more or less evenly over the length of the run? I generated the data for the sort using teragen (128 MB block size, replication = 3).
>
> I would also be interested in other people's sort timings. Is there some place where people can post sort numbers (not just the record)? I will post the actual graphs of the 5 nodes, if there is interest, tomorrow. (Some logistical issues about posting them tonight.)
>
> I am using CDH3B3, even though I think this is not specific to CDH3B3. Sorry for the cross post.
>
> Raj
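Raj's point about what drives the terasort map count can be made concrete: in this Hadoop version, terasort gets roughly one map task per HDFS block of input, so the count follows from the input size and dfs.block.size. A back-of-the-envelope sketch (the figures are illustrative, not taken from Raj's cluster):

public class MapCountEstimate {
    public static void main(String[] args) {
        long inputBytes = 1000L * 1000 * 1000 * 1000; // ~1 TB of teragen output
        long blockSize  = 128L * 1024 * 1024;         // dfs.block.size = 128 MB
        // Roughly one map task per block of input.
        long maps = (inputBytes + blockSize - 1) / blockSize;
        System.out.println("approx map tasks: " + maps); // prints 7451
    }
}

Raj's reported 9500 maps is in the same ballpark once the actual input size and file layout are accounted for.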
Re: When applying a patch, which attachment should I use?
You may also be interested in the append branch: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/

On Tue, Jan 11, 2011 at 3:12 AM, edward choi mp2...@gmail.com wrote:
> Thanks for the info. I am currently using Hadoop 0.20.2, so I guess I only need to apply hdfs-630-0.20-append.patch.
> [earlier quoted messages snipped]
Re: TeraSort question.
Ted,

Thanks. I have all the graphs I need, including the map/reduce timeline and system activity for all the nodes while the sort was running. I will publish them once I have them in some presentable format. For legal reasons, I really don't want to send the complete job history files.

My question is still this: when running terasort, would the CPU, disk and network utilization of all the nodes be more or less similar, or completely different? Sometime during the day I will post the system data from 5 nodes, which will probably explain my question better.

Raj

From: Ted Dunning tdunn...@maprtech.com
To: common-user@hadoop.apache.org; Raj V rajv...@yahoo.com
Sent: Tuesday, January 11, 2011 8:22:17 AM
Subject: Re: TeraSort question.

Raj,

Do you have the job history files? That would be very useful. I would be happy to create some swimlane and related graphs for you if you can send me the history files.

On Mon, Jan 10, 2011 at 9:06 PM, Raj V rajv...@yahoo.com wrote:
> [original message snipped; quoted in full earlier in the thread]
libjars options
Hi,

Could anyone please guide me as to how to use the -libjars option in Hadoop? I have added the necessary jar file (the HBase jar, to be precise) to the classpath of the node where I am starting the job. The following is the format that I am invoking:

bin/hadoop jar <our jar> <MainClass> -libjars <dependent jars, separated by commas> <arguments to our main class>

bin/hadoop jar /Users/hdp/cvk/myjob.jar mr2.mr2a.MR2ADriver -libjars /Users/hdp/hadoop/lib/hbase-0.20.6.jar inputmr2a outputmr2a

Despite this, I find that I get a java.lang.ClassNotFoundException! :(

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.io.ImmutableBytesWritable
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:841)
        at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:551)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:793)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:524)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.io.ImmutableBytesWritable
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)

The strange thing is that there is another MR job I have that runs perfectly with the -libjars option! Could anybody tell me what I am doing wrong? One more thing, not sure if it is relevant: I am using the new Hadoop MapReduce API.

Thanks in advance!

Regards,
Krishnakumar.
Re: libjars options
Refer to Alex Kozlov's answer on 12/11/10.

On Tue, Jan 11, 2011 at 10:10 AM, C.V.Krishnakumar Iyer f2004...@gmail.com wrote:
> [original question snipped; quoted in full above]
Re: TeraSort question.
Raj,

Have a look at the graph shown here: http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1.1_--_Generating_Task_Timelines

It should make clear that the number of tasks varies greatly over the lifetime of a job. Depending on the nodes available, this may leave some nodes idle.

Niels

2011/1/11 Raj V rajv...@yahoo.com:
> [earlier quoted messages snipped]

--
With kind regards,
Niels Basjes
Re: libjars options
Hi,

I have tried that as well, using -files <jar file>, but it still gives the exact same error. Anything else I could try?

Thanks,
Krishna.

On Jan 11, 2011, at 10:23 AM, Ted Yu wrote:
> Refer to Alex Kozlov's answer on 12/11/10.
> [earlier quoted messages snipped]
Re: TeraSort question.
Can't attach the PDF file that shows the different maps; the file is too big.

From: Niels Basjes ni...@basjes.nl
To: common-user@hadoop.apache.org; Raj V rajv...@yahoo.com
Sent: Tuesday, January 11, 2011 11:07:08 AM
Subject: Re: TeraSort question.

Raj,

Have a look at the graph shown here: http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1.1_--_Generating_Task_Timelines

It should make clear that the number of tasks varies greatly over the lifetime of a job. Depending on the nodes available, this may leave some nodes idle.

Niels

2011/1/11 Raj V rajv...@yahoo.com:
> [earlier quoted messages snipped]
Re: libjars options
Have you implemented GenericOptionsParser? Do you see your jar in the mapred.cache.files or tmpjars parameter in your job.xml file (you can view it via the JT web UI)?

--
Alex Kozlov
Solutions Architect
Cloudera, Inc
twitter: alexvk2009
http://www.cloudera.com/company/press-center/hadoop-world-nyc/

On Tue, Jan 11, 2011 at 11:49 AM, C.V.Krishnakumar Iyer f2004...@gmail.com wrote:
> Hi,
>
> I have tried that as well, using -files <jar file>, but it still gives the exact same error. Anything else I could try?
>
> Thanks,
> Krishna.
> [earlier quoted messages snipped]
Re: libjars options
There is also a blog post that I recently wrote, if it helps: http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job

On Tue, Jan 11, 2011 at 12:33 PM, Alex Kozlov ale...@cloudera.com wrote:
> Have you implemented GenericOptionsParser? Do you see your jar in the mapred.cache.files or tmpjars parameter in your job.xml file (you can view it via the JT web UI)?
> [earlier quoted messages snipped]
Re: libjars options
Hi,

Thanks a lot! I shall try this and let you know!

Regards,
Krishna.

On Jan 11, 2011, at 12:48 PM, Alex Kozlov wrote:
> There is also a blog post that I recently wrote, if it helps: http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job
> [earlier quoted messages snipped]
Re: libjars options
Hi,

Thanks a lot, Alex! Using GenericOptionsParser solved the issue. Previously I had used Tool and had assumed that it would take care of this.

Regards,
Krishna.

On Jan 11, 2011, at 12:48 PM, Alex Kozlov wrote:
> There is also a blog post that I recently wrote, if it helps: http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job
> [earlier quoted messages snipped]
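For readers hitting the same problem: -libjars, -files and -D are consumed by GenericOptionsParser, which can be invoked directly (as Krishna did) or, more commonly, via ToolRunner; implementing Tool by itself does nothing unless something actually runs the parser. A minimal sketch of the ToolRunner pattern, with the class name borrowed from the thread and the mapper/reducer setup omitted (this skeleton runs the identity map/reduce):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MR2ADriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // getConf() already reflects whatever -libjars/-files/-D set up;
        // only the job-specific arguments remain in args[].
        Job job = new Job(getConf(), "mr2a");
        job.setJarByClass(MR2ADriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner feeds the command line through GenericOptionsParser
        // before calling run(); this is what makes -libjars take effect.
        System.exit(ToolRunner.run(new Configuration(), new MR2ADriver(), args));
    }
}

When the option is picked up, the jar should appear in the tmpjars (or mapred.cache.files) parameter of the job's job.xml, which is exactly the check Alex suggested.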
Re: No locks available
On Jan 11, 2011, at 2:39 AM, Adarsh Sharma wrote:
> Dear all,
>
> Yesterday I was working on a cluster of 6 Hadoop nodes (loading data, running some jobs), but today when I started my cluster I came across a problem on one of my datanodes.

Are you running this on NFS?

> 2011-01-11 12:55:57,031 INFO org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: No locks available
Re: When applying a patch, which attachment should I use?
I am not familiar with this whole svn and patch stuff, so please bear with me. I was going to apply hdfs-630-0.20-append.patch (https://issues.apache.org/jira/secure/attachment/12446812/hdfs-630-0.20-append.patch) only because I wanted to install HBase and the installation guide told me to. The append branch you mentioned - does that include hdfs-630-0.20-append.patch as well? Is it like the latest patch with all the good stuff packed in one?

Regards,
Ed

2011/1/12 Ted Dunning tdunn...@maprtech.com:
> You may also be interested in the append branch: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/
> [earlier quoted messages snipped]
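The branch can also be checked out directly with svn; the viewvc page Ted linked corresponds to this repository path (the local directory name is illustrative):

svn checkout http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append/ hadoop-0.20-append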
Re: No locks available
Allen Wittenauer wrote:
> Are you running this on NFS?

No Sir, I am running this on 3 servers with local filesystems. Each server contains 2 hard disks (/hdd2-1, /hdd1-1), and each server runs 2 VMs, one occupying /hdd2-1 and the other /hdd1-1. My namenode contains all the predefined IPs of the VMs.

Thanks
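Given the VM setup, it is still worth verifying that the filesystem backing dfs.data.dir accepts POSIX locks: the failing call in the log is FileChannel.tryLock() on the storage directory's in_use.lock file, and "No locks available" means the lock request reached a filesystem that refuses it (classically NFS with locking disabled, though some virtualized storage layers behave the same way). A small stand-alone probe, written here to mimic what Storage.tryLock does, run on the affected VM with the data directory path as its argument:

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class LockProbe {
    public static void main(String[] args) throws Exception {
        // args[0] = the dfs.data.dir path to test
        File f = new File(args[0], "probe.lock");
        RandomAccessFile file = new RandomAccessFile(f, "rws");
        // This is the call that fails in the DataNode log; an
        // IOException("No locks available") here means the filesystem
        // itself cannot take the lock, independent of Hadoop.
        FileLock lock = file.getChannel().tryLock();
        if (lock == null) {
            System.out.println("Directory is already locked by another process");
        } else {
            System.out.println("Lock acquired OK");
            lock.release();
        }
        file.close();
        f.delete();
    }
}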
Re: Application for testing
(Moving general@ to Bcc:)

Bo, you can try to run TeraSort from the Hadoop examples: you'll see if the cluster is up and running and can compare its performance between upgrades, if needed.

Also, please don't use general@ for user questions: there's a common-user@ list exactly for these purposes.

With regards,
Cos

On Tue, Jan 11, 2011 at 07:50AM, Bo Sang wrote:
> Hi, guys:
>
> I have deployed Hadoop on our group's nodes. Could you recommend some typical applications for me? I want to test whether it can really work and observe its performance.
>
> --
> Best Regards!
> Sincerely, Bo Sang
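For a quick smoke test along these lines, the bundled examples jar can generate, sort and validate a small data set (row count and paths here are illustrative, and the jar name varies slightly between releases):

bin/hadoop jar hadoop-*-examples.jar teragen 10000000 /benchmarks/tera/in
bin/hadoop jar hadoop-*-examples.jar terasort /benchmarks/tera/in /benchmarks/tera/out
bin/hadoop jar hadoop-*-examples.jar teravalidate /benchmarks/tera/out /benchmarks/tera/report

teragen writes 100-byte rows, so 10,000,000 rows is roughly 1 GB before replication; scale the row count up to exercise the cluster harder.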
Re: Too-many fetch failure Reduce Error
Any update on this error?

Thanks

Adarsh Sharma wrote:
> Esteban Gutierrez Moguel wrote:
>> Adarsh, do you have the hostnames for the masters and slaves in /etc/hosts?
>
> Yes, I know about this issue. But do you think the error occurs while reading the output of the map? I want to know the proper reason for the lines below:
>
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201101071129_0001/attempt_201101071129_0001_m_12_0/output/file.out.index
>
> esteban.

On Fri, Jan 7, 2011 at 06:47, Adarsh Sharma adarsh.sha...@orkash.com wrote:

Dear all,

I am researching the error below and have not been able to find the reason:

Data size: 3.4 GB
Hadoop 0.20.2

had...@ws32-test-lin:~/project/hadoop-0.20.2$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount /user/hadoop/page_content.txt page_content_output.txt
11/01/07 16:11:14 INFO input.FileInputFormat: Total input paths to process : 1
11/01/07 16:11:15 INFO mapred.JobClient: Running job: job_201101071129_0001
11/01/07 16:11:16 INFO mapred.JobClient: map 0% reduce 0%
11/01/07 16:11:41 INFO mapred.JobClient: map 1% reduce 0%
11/01/07 16:11:45 INFO mapred.JobClient: map 2% reduce 0%
11/01/07 16:11:48 INFO mapred.JobClient: map 3% reduce 0%
11/01/07 16:11:52 INFO mapred.JobClient: map 4% reduce 0%
11/01/07 16:11:56 INFO mapred.JobClient: map 5% reduce 0%
11/01/07 16:12:00 INFO mapred.JobClient: map 6% reduce 0%
11/01/07 16:12:05 INFO mapred.JobClient: map 7% reduce 0%
11/01/07 16:12:08 INFO mapred.JobClient: map 8% reduce 0%
11/01/07 16:12:11 INFO mapred.JobClient: map 9% reduce 0%
11/01/07 16:12:14 INFO mapred.JobClient: map 10% reduce 0%
11/01/07 16:12:17 INFO mapred.JobClient: map 11% reduce 0%
11/01/07 16:12:21 INFO mapred.JobClient: map 12% reduce 0%
11/01/07 16:12:24 INFO mapred.JobClient: map 13% reduce 0%
11/01/07 16:12:27 INFO mapred.JobClient: map 14% reduce 0%
11/01/07 16:12:30 INFO mapred.JobClient: map 15% reduce 0%
11/01/07 16:12:33 INFO mapred.JobClient: map 16% reduce 0%
11/01/07 16:12:36 INFO mapred.JobClient: map 17% reduce 0%
11/01/07 16:12:40 INFO mapred.JobClient: map 18% reduce 0%
11/01/07 16:12:45 INFO mapred.JobClient: map 19% reduce 0%
11/01/07 16:12:48 INFO mapred.JobClient: map 20% reduce 0%
11/01/07 16:12:54 INFO mapred.JobClient: map 21% reduce 0%
11/01/07 16:13:00 INFO mapred.JobClient: map 22% reduce 0%
11/01/07 16:13:04 INFO mapred.JobClient: map 22% reduce 1%
11/01/07 16:13:13 INFO mapred.JobClient: map 23% reduce 1%
11/01/07 16:13:19 INFO mapred.JobClient: map 24% reduce 1%
11/01/07 16:13:25 INFO mapred.JobClient: map 25% reduce 1%
11/01/07 16:13:30 INFO mapred.JobClient: map 26% reduce 1%
11/01/07 16:13:34 INFO mapred.JobClient: map 26% reduce 3%
11/01/07 16:13:36 INFO mapred.JobClient: map 27% reduce 3%
11/01/07 16:13:37 INFO mapred.JobClient: map 27% reduce 4%
11/01/07 16:13:39 INFO mapred.JobClient: map 28% reduce 4%
11/01/07 16:13:43 INFO mapred.JobClient: map 29% reduce 4%
11/01/07 16:13:46 INFO mapred.JobClient: map 30% reduce 4%
11/01/07 16:13:49 INFO mapred.JobClient: map 31% reduce 4%
11/01/07 16:13:52 INFO mapred.JobClient: map 32% reduce 4%
11/01/07 16:13:55 INFO mapred.JobClient: map 33% reduce 4%
11/01/07 16:13:58 INFO mapred.JobClient: map 34% reduce 4%
11/01/07 16:14:02 INFO mapred.JobClient: map 35% reduce 4%
11/01/07 16:14:05 INFO mapred.JobClient: map 36% reduce 4%
11/01/07 16:14:08 INFO mapred.JobClient: map 37% reduce 4%
11/01/07 16:14:11 INFO mapred.JobClient: map 38% reduce 4%
11/01/07 16:14:15 INFO mapred.JobClient: map 39% reduce 4%
11/01/07 16:14:19 INFO mapred.JobClient: map 40% reduce 4%
11/01/07 16:14:20 INFO mapred.JobClient: map 40% reduce 5%
11/01/07 16:14:25 INFO mapred.JobClient: map 41% reduce 5%
11/01/07 16:14:32 INFO mapred.JobClient: map 42% reduce 5%
11/01/07 16:14:38 INFO mapred.JobClient: map 43% reduce 5%
11/01/07 16:14:41 INFO mapred.JobClient: map 43% reduce 6%
11/01/07 16:14:43 INFO mapred.JobClient: map 44% reduce 6%
11/01/07 16:14:47 INFO mapred.JobClient: map 45% reduce 6%
11/01/07 16:14:50 INFO mapred.JobClient: map 46% reduce 6%
11/01/07 16:14:54 INFO mapred.JobClient: map 47% reduce 7%
11/01/07 16:14:59 INFO mapred.JobClient: map 48% reduce 7%
11/01/07 16:15:02 INFO mapred.JobClient: map 49% reduce 7%
11/01/07 16:15:05 INFO mapred.JobClient: map 50% reduce 7%
11/01/07 16:15:11 INFO mapred.JobClient: map 51% reduce 7%
11/01/07 16:15:14 INFO mapred.JobClient: map 52% reduce 7%
11/01/07 16:15:16 INFO mapred.JobClient: map 52% reduce 8%
11/01/07 16:15:20 INFO mapred.JobClient: map 53% reduce 8%
11/01/07 16:15:25 INFO mapred.JobClient: map 54% reduce 8%
11/01/07 16:15:29 INFO mapred.JobClient: map 55% reduce 8%
11/01/07 16:15:31 INFO mapred.JobClient: map 55% reduce 9%
11/01/07 16:15:33 INFO mapred.JobClient: map 56% reduce 9%
11/01/07 16:15:38 INFO mapred.JobClient: map 57% reduce 9%
11/01/07 16:15:42 INFO mapred.JobClient: map 58% reduce 9%
11/01/07 16:15:43 INFO
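On Esteban's /etc/hosts question: "Too many fetch failures" typically means reducers cannot fetch map output from the TaskTracker that produced it, and inconsistent name resolution is a common cause, since reducers contact the map-side node by the hostname that node advertised. Every node should resolve every other node's hostname to the same routable address, along these lines (addresses and names here are illustrative):

127.0.0.1    localhost
172.16.1.1   hadoop1
172.16.1.2   hadoop2
172.16.1.3   hadoop3

In particular, a node's cluster hostname should not be mapped to 127.0.0.1 (or 127.0.1.1) in its own /etc/hosts, or other nodes can end up being handed a loopback address they cannot reach.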