Re: hadoop jobs take a long time to set up
Of course... Thanks for the help!

Cheers //Marcus

On Mon, Jun 29, 2009 at 12:32 AM, Mikhail Bautin mbau...@gmail.com wrote:

Marcus, The code that needs to be patched is in the tasktracker, because the tasktracker is what starts the child JVM that runs user code. Thanks, Mikhail

On Sun, Jun 28, 2009 at 6:14 PM, Marcus Herou marcus.he...@tailsweep.com wrote:

Hi. Just to be clear: is it the jobtracker that needs the patched code, or is it the tasktrackers? Kindly //Marcus

On Mon, Jun 29, 2009 at 12:08 AM, Mikhail Bautin mbau...@gmail.com wrote:

Marcus, We currently use 0.20.0, but this patch just inserts 8 lines of code into TaskRunner.java, which could certainly be done with 0.18.3. Yes, this patch just appends additional jars to the child JVM classpath. I've never really used tmpjars myself, but if it involves uploading multiple jar files into HDFS every time a job is started, I can see how it would be really slow. On our ~80-job workflow this would have really slowed things down. Thanks, Mikhail

On Sun, Jun 28, 2009 at 5:40 PM, Marcus Herou marcus.he...@tailsweep.com wrote:

Makes sense... I will try both rsync and NFS, but I think rsync will beat NFS since NFS can be slow as hell sometimes. But what the heck, we already have our maven2 repo on NFS, so why not :) Are you saying that this patch makes the client able to configure which extra local jar files to add to the classpath when firing up the TaskTracker child? To be explicit: do you confirm that using tmpjars like I do is a costly, slow operation? To which branch do you apply the patch (we use 0.18.3)? Cheers //Marcus

On Sun, Jun 28, 2009 at 11:26 PM, Mikhail Bautin mbau...@gmail.com wrote:

This is the way we deal with this problem, too. We put our jar files on NFS, and the attached patch makes it possible to add those jar files to the tasktracker classpath through a configuration property. Thanks, Mikhail

On Sun, Jun 28, 2009 at 5:21 PM, Stuart White stuart.whi...@gmail.com wrote:

Although I've never done it, I believe you could manually copy your jar files out to your cluster somewhere in hadoop's classpath, and that would remove the need for you to copy them to your cluster at the start of each job.

On Sun, Jun 28, 2009 at 4:08 PM, Marcus Herou marcus.he...@tailsweep.com wrote:

Hi. Running without a jobtracker makes the job start almost instantly. I think it is due to something with the classloader. I use a huge number of jar files, jobConf.set("tmpjars", "jar1.jar,jar2.jar")..., which need to be loaded every time, I guess. By issuing conf.setNumTasksToExecutePerJvm(-1), will the TaskTracker child live forever then? Cheers //Marcus

On Sun, Jun 28, 2009 at 9:54 PM, tim robertson timrobertson...@gmail.com wrote:

How long does it take to start the code locally in a single thread? Can you reuse the JVM so it only starts once per node per job? conf.setNumTasksToExecutePerJvm(-1) Cheers, Tim

On Sun, Jun 28, 2009 at 9:43 PM, Marcus Herou marcus.he...@tailsweep.com wrote:

Hi. I wonder how one should improve the startup times of a Hadoop job. Some of my jobs, which have a lot of dependencies in terms of many jar files, take a long time to start in Hadoop, up to 2 minutes sometimes. The data input amounts in these cases are negligible, so it seems that Hadoop has a really high setup cost, which I can live with, but this seems like too much. Let's say a job takes 10 minutes to complete; then it is bad if it takes 2 minutes to set it up... 20-30 seconds max would be a lot more reasonable. Hints?
//Marcus

--
Marcus Herou
CTO and co-founder
Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
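Mikhail's actual TaskRunner.java patch is not included in the archive. Purely as an illustration of the idea (the property name and every identifier below are invented, not the real patch), appending locally available jars from a configuration property to the child JVM's classpath string could look roughly like this:

```java
// Hypothetical sketch of the idea behind the patch discussed above: append a
// configured list of locally available jars (e.g. on NFS) to the classpath
// string built for the child JVM, instead of shipping them via tmpjars.
// The property name and all identifiers here are invented for illustration.
public class ChildClasspath {
    // Invented property name, standing in for whatever the real patch reads.
    public static final String EXTRA_JARS_PROP = "mapred.child.extra.classpath";

    public static String appendExtraJars(String classpath, String extraJars) {
        if (extraJars == null || extraJars.trim().isEmpty()) {
            return classpath;
        }
        StringBuilder sb = new StringBuilder(classpath);
        // Assume a comma-separated value, analogous to tmpjars; ':' is the
        // Unix classpath separator used on the cluster nodes.
        for (String jar : extraJars.split(",")) {
            sb.append(':').append(jar.trim());
        }
        return sb.toString();
    }
}
```

Because the jars already sit on every node (NFS or rsync), nothing has to be uploaded to HDFS at submission time, which is where the per-job cost of tmpjars comes from.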
hadoop jobs take a long time to set up
Hi. I wonder how one should improve the startup times of a Hadoop job. Some of my jobs, which have a lot of dependencies in terms of many jar files, take a long time to start in Hadoop, up to 2 minutes sometimes. The data input amounts in these cases are negligible, so it seems that Hadoop has a really high setup cost, which I can live with, but this seems like too much. Let's say a job takes 10 minutes to complete; then it is bad if it takes 2 minutes to set it up... 20-30 seconds max would be a lot more reasonable. Hints?

//Marcus

--
Marcus Herou
CTO and co-founder
Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
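Elsewhere in the thread the slow setup is attributed to the tmpjars list being uploaded to HDFS on every submission. As a purely illustrative sketch (class and method names invented, not Hadoop API), this is the kind of string-building Marcus describes doing from bash to turn a ':'-separated CLASSPATH into the comma-separated value that jobConf.set("tmpjars", ...) expects:

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: build a comma-separated "tmpjars" value from a
// ':'-separated classpath. Every jar named in the result is shipped to HDFS
// on each job submission, which is exactly the per-job setup cost discussed
// in this thread. Class and method names are invented.
public class TmpJars {
    public static String fromClasspath(String classpath) {
        List<String> jars = new ArrayList<String>();
        for (String entry : classpath.split(":")) {
            if (entry.endsWith(".jar")) {
                jars.add(entry);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < jars.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(jars.get(i));
        }
        return sb.toString();
    }
}
```

The longer this list, the more jars are copied per job; keeping it short (or pre-distributing jars to the nodes, as suggested in the replies) is what cuts the setup time.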
Re: Scaling out/up or a mix
Hi. The crawlers are _very_ threaded, but no, we use our own threading framework since it was not available at the time in hadoop-core. Crawlers normally just wait a lot on clients, inducing very little CPU, but consume some memory due to the parallelism.

//Marcus

On Sat, Jun 27, 2009 at 6:10 PM, jason hadoop jason.had...@gmail.com wrote:

How about multi-threaded mappers? Multi-threaded mappers are ideal for map tasks that are not locally I/O-bound and talk to many distinct endpoints. You can also control the thread count on a per-job basis.

On Sat, Jun 27, 2009 at 8:26 AM, Marcus Herou marcus.he...@tailsweep.com wrote:

The argument currently against increasing num-mappers is that the machines will run into OOM, and since a lot of the jobs are crawlers I need more IP numbers so I don't get banned :) The thing is that we currently have Solr on the very same machines, and data-nodes as well, so I can only give the MR nodes about 1G of memory since I need Solr to have 4G... Now I see that I should get some obvious and just critique about the layout of this arch, but I'm a little limited in budget, and so is the arch :) However, is it wise to have the MR tasks on the same nodes as the data-nodes, or should I split the arch? I mean, the data-nodes perhaps need more disk I/O and the MR nodes more memory and CPU? Trying to find a sweet-spot hardware spec for those two roles.

//Marcus

On Sat, Jun 27, 2009 at 4:24 AM, Brian Bockelman bbock...@cse.unl.edu wrote:

Hey Marcus, Are you recording the data rates coming out of HDFS? Since you have such low CPU utilization, I'd look at boxes utterly packed with big hard drives (also, why are you using RAID1 for Hadoop??). You can get 1U boxes with 4 drive bays or 2U boxes with 12 drive bays. Based on the data rates you see, make the call. On the other hand, what's the argument against running 3x more mappers per box? It seems that your boxes still have more overhead to use: there's no I/O wait. Brian

On Jun 26, 2009, at 4:43 PM, Marcus Herou wrote:

Hi. We have a deployment of 10 Hadoop servers, and I now need more mapping capability (no, not just adding more mappers per instance) since I have so many jobs running. Now I am wondering what I should aim for... memory, CPU or disk... "How long is a rope?" perhaps you would say. A typical server currently uses about 15-20% CPU on a quad-core 2.4GHz 8GB RAM machine with 2 RAID1 SATA 500GB disks. Some specs below.

mpstat 2 5
Linux 2.6.24-19-server (mapreduce2)  06/26/2009

11:36:13 PM  CPU  %user  %nice  %sys  %iowait  %irq  %soft  %steal  %idle   intr/s
11:36:15 PM  all  22.82   0.00  3.24     1.37  0.62   2.49    0.00  69.45  8572.50
11:36:17 PM  all  13.56   0.00  1.74     1.99  0.62   2.61    0.00  79.48  8075.50
11:36:19 PM  all  14.32   0.00  2.24     1.12  1.12   2.24    0.00  78.95  9219.00
11:36:21 PM  all  14.71   0.00  0.87     1.62  0.25   1.75    0.00  80.80  8489.50
11:36:23 PM  all  12.69   0.00  0.87     1.24  0.50   0.75    0.00  83.96  5495.00
Average:     all  15.62   0.00  1.79     1.47  0.62   1.97    0.00  78.53  7970.30

What I am thinking is... is it wiser to go for many of these cheap boxes with 8GB of RAM, or should I for instance focus on machines which can give more I/O throughput? I know these things are hard, but perhaps someone has drawn some conclusions before, rather than the pragmatic way.

Kindly //Marcus

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals
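Jason's multi-threaded mapper suggestion fits crawlers well because the threads mostly block on remote hosts. This toy sketch (plain Java, not Hadoop's actual MultithreadedMapRunner code; all names invented) shows the principle of a thread pool draining input records concurrently:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy illustration (not Hadoop code) of what a multithreaded map runner does:
// a pool of worker threads drains the input records concurrently, which helps
// when each record triggers a slow remote call, as in a crawler.
public class ThreadedRunner {
    public static List<String> runMap(List<String> urls, int threads)
            throws InterruptedException {
        final List<String> fetched =
                Collections.synchronizedList(new ArrayList<String>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (final String url : urls) {
            pool.submit(new Runnable() {
                public void run() {
                    // A real crawler would block on network I/O here,
                    // keeping CPU usage low while many requests are in flight.
                    fetched.add("fetched:" + url);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return fetched;
    }
}
```

The same idea is why raising the thread count per map task can lift throughput without adding map slots (and without the per-slot memory cost Marcus worries about).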
Re: hadoop jobs take a long time to set up
Hi. Running without a jobtracker makes the job start almost instantly. I think it is due to something with the classloader. I use a huge number of jar files, jobConf.set("tmpjars", "jar1.jar,jar2.jar")..., which need to be loaded every time, I guess. By issuing conf.setNumTasksToExecutePerJvm(-1), will the TaskTracker child live forever then?

Cheers //Marcus
Re: hadoop jobs take a long time to set up
Makes sense... I will try both rsync and NFS, but I think rsync will beat NFS since NFS can be slow as hell sometimes. But what the heck, we already have our maven2 repo on NFS, so why not :)

Are you saying that this patch makes the client able to configure which extra local jar files to add to the classpath when firing up the TaskTracker child? To be explicit: do you confirm that using tmpjars like I do is a costly, slow operation? To which branch do you apply the patch (we use 0.18.3)?

Cheers //Marcus
Re: hadoop jobs take a long time to set up
Hi. Just to be clear: is it the jobtracker that needs the patched code, or is it the tasktrackers?

Kindly //Marcus
Re: Scaling out/up or a mix
The argument currently against increasing num-mappers is that the machines will run into OOM, and since a lot of the jobs are crawlers I need more IP numbers so I don't get banned :) The thing is that we currently have Solr on the very same machines, and data-nodes as well, so I can only give the MR nodes about 1G of memory since I need Solr to have 4G... Now I see that I should get some obvious and just critique about the layout of this arch, but I'm a little limited in budget, and so is the arch :)

However, is it wise to have the MR tasks on the same nodes as the data-nodes, or should I split the arch? I mean, the data-nodes perhaps need more disk I/O and the MR nodes more memory and CPU? Trying to find a sweet-spot hardware spec for those two roles.

//Marcus
Scaling out/up or a mix
Hi. We have a deployment of 10 Hadoop servers, and I now need more mapping capability (no, not just adding more mappers per instance) since I have so many jobs running. Now I am wondering what I should aim for... memory, CPU or disk... "How long is a rope?" perhaps you would say. A typical server currently uses about 15-20% CPU on a quad-core 2.4GHz 8GB RAM machine with 2 RAID1 SATA 500GB disks. Some specs below.

mpstat 2 5
Linux 2.6.24-19-server (mapreduce2)  06/26/2009

11:36:13 PM  CPU  %user  %nice  %sys  %iowait  %irq  %soft  %steal  %idle   intr/s
11:36:15 PM  all  22.82   0.00  3.24     1.37  0.62   2.49    0.00  69.45  8572.50
11:36:17 PM  all  13.56   0.00  1.74     1.99  0.62   2.61    0.00  79.48  8075.50
11:36:19 PM  all  14.32   0.00  2.24     1.12  1.12   2.24    0.00  78.95  9219.00
11:36:21 PM  all  14.71   0.00  0.87     1.62  0.25   1.75    0.00  80.80  8489.50
11:36:23 PM  all  12.69   0.00  0.87     1.24  0.50   0.75    0.00  83.96  5495.00
Average:     all  15.62   0.00  1.79     1.47  0.62   1.97    0.00  78.53  7970.30

What I am thinking is... is it wiser to go for many of these cheap boxes with 8GB of RAM, or should I for instance focus on machines which can give more I/O throughput? I know these things are hard, but perhaps someone has drawn some conclusions before, rather than the pragmatic way.

Kindly //Marcus

--
Marcus Herou
CTO and co-founder
Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
Re: Very asymmetric data allocation
Great, thanks for the info! Right after I finished my last question I started to think about how Hadoop measures data allocation. Are the figures presented actually the size of HDFS on each machine, or the amount of disk allocated as measured by issuing something like df? The reason why I am asking is that df -h is quite close to the figures presented in the GUI, but it could be a coincidence.

//Marcus

On Tue, Apr 7, 2009 at 4:02 PM, Koji Noguchi knogu...@yahoo-inc.com wrote:

Marcus, One known issue in 0.18.3 is HADOOP-5465. Copy-paste from https://issues.apache.org/jira/browse/HADOOP-4489?focusedCommentId=12693956&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12693956 where Hairong said: "This bug might be caused by HADOOP-5465. Once a datanode hits HADOOP-5465, NameNode sends an empty replication request to the data node on every reply to a heartbeat, thus not a single scheduled block deletion request can be sent to the data node." (Also, if you're always writing from one of the nodes, that node is more likely to get full.) Nigel, not sure if this is the issue, but it would be nice to have 0.18.4 out. Koji

-----Original Message-----
From: Marcus Herou [mailto:marcus.he...@tailsweep.com]
Sent: Tuesday, April 07, 2009 12:45 AM
To: hadoop-u...@lucene.apache.org
Subject: Very asymmetric data allocation

Hi. We are running Hadoop 0.18.3 and noticed a strange issue when one of our machines ran out of disk yesterday. As you can see in the table below, the server mapredcoord is 66.91% allocated while the others are almost empty. How can that be? Any information about this would be very helpful. mapredcoord is also our jobtracker.
//Marcus

Node         Last Contact  Admin State  Size (GB)  Used (%)  Remaining (GB)  Blocks
mapredcoord  2             In Service   416.69     66.91     90.94           19806
mapreduce2   2             In Service   416.69     6.71      303.5           4456
mapreduce3   2             In Service   416.69     0.44      351.69          3975
mapreduce4   0             In Service   416.69     0.25      355.82          1549
mapreduce5   2             In Service   416.69     0.42      347.68          3995
mapreduce6   0             In Service   416.69     0.43      352.7           3982
mapreduce7   0             In Service   416.69     0.5       351.91          4079
mapreduce8   1             In Service   416.69     0.48      350.15          4169

--
Marcus Herou
CTO and co-founder
Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/
Very asymmetric data allocation
Hi. We are running Hadoop 0.18.3 and noticed a strange issue when one of our machines ran out of disk yesterday. As you can see in the table below, the server mapredcoord is 66.91% allocated while the others are almost empty. How can that be? Any information about this would be very helpful. mapredcoord is also our jobtracker.

//Marcus

Node         Last Contact  Admin State  Size (GB)  Used (%)  Remaining (GB)  Blocks
mapredcoord  2             In Service   416.69     66.91     90.94           19806
mapreduce2   2             In Service   416.69     6.71      303.5           4456
mapreduce3   2             In Service   416.69     0.44      351.69          3975
mapreduce4   0             In Service   416.69     0.25      355.82          1549
mapreduce5   2             In Service   416.69     0.42      347.68          3995
mapreduce6   0             In Service   416.69     0.43      352.7           3982
mapreduce7   0             In Service   416.69     0.5       351.91          4079
mapreduce8   1             In Service   416.69     0.48      350.15          4169

--
Marcus Herou
CTO and co-founder
Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/
Lingering TaskTracker$Child
Hi. Today I noticed, when I ran a Solr indexing job through our Hadoop cluster, that the master MySQL database was screaming about "Too many connections". I wondered how that could happen, so I logged into my Hadoop machines and searched through the logs. Nothing strange there. Then I just did a jps:

r...@mapreduce1:~# jps
10701 TaskTracker$Child
9567 NameNode
5435 TaskTracker$Child
31801 Bootstrap
7349 TaskTracker$Child
6197 TaskTracker$Child
7761 TaskTracker$Child
10453 TaskTracker$Child
11232 TaskTracker$Child
3 TaskTracker$Child
9688 DataNode
10877 TaskTracker$Child
6504 TaskTracker$Child
10236 TaskTracker$Child
9852 TaskTracker
6515 TaskTracker$Child
11396 TaskTracker$Child
11741 Jps
6191 TaskTracker$Child
10981 TaskTracker$Child
7742 TaskTracker$Child
5946 TaskTracker$Child
11315 TaskTracker$Child
8112 TaskTracker$Child
11580 TaskTracker$Child
11490 TaskTracker$Child
5687 TaskTracker$Child
5927 TaskTracker$Child
27144 WrapperSimpleApp
7368 TaskTracker$Child

Damn! Each child has its own DataSource (dbcp pool), tweaked down so it can only have one active connection to any shard at any time. Background: I ran out of connections during the Christmas holidays, since I have 60 shards (10 per MySQL machine) and each required a DB pool which allowed too many active+idle connections. Anyway, I have no active jobs at the moment, so the children should have died by themselves. Fortunately I have a nice little script which kills the bastards:

jps | egrep TaskTracker.+ | awk '{print $1}' | xargs kill

I will probably put that in a cronjob which kills long-running children... Anyway, how can this happen? Am I doing something really stupid along the way?

Hard facts:
Ubuntu Hardy Heron, 2.6.24-19-server
java version 1.6.0_06
Hadoop-0.18.2

It's my own classes which fire the jobs through JobClient (JobClient.runJob(job)). I feed the jar to hadoop by issuing job.setJar(jarFile); (comes from a bash script). I feed deps into hadoop by issuing job.set("tmpjars", jarFiles); (comes from parsing the external CLASSPATH env in bash). The client does not complain, see example output below (I write no data to HDFS (HDFS bytes written=774), since I mostly use it for crawling and all crawlers/indexers access my sharded DB structure directly without intermediate storage):

2009-01-25 17:12:11.175 INFO main org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
2009-01-25 17:12:11.176 INFO main org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
2009-01-25 17:12:11.437 INFO main org.apache.hadoop.mapred.JobClient - Running job: job_200901251629_0011
2009-01-25 17:12:12.439 INFO main org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
2009-01-25 17:12:35.481 INFO main org.apache.hadoop.mapred.JobClient - map 6% reduce 0%
2009-01-25 17:12:40.493 INFO main org.apache.hadoop.mapred.JobClient - map 21% reduce 0%
2009-01-25 17:12:45.502 INFO main org.apache.hadoop.mapred.JobClient - map 31% reduce 0%
2009-01-25 17:12:50.511 INFO main org.apache.hadoop.mapred.JobClient - map 51% reduce 0%
2009-01-25 17:12:55.520 INFO main org.apache.hadoop.mapred.JobClient - map 67% reduce 0%
2009-01-25 17:13:00.533 INFO main org.apache.hadoop.mapred.JobClient - map 72% reduce 0%
2009-01-25 17:13:05.543 INFO main org.apache.hadoop.mapred.JobClient - map 84% reduce 0%
2009-01-25 17:13:10.552 INFO main org.apache.hadoop.mapred.JobClient - map 95% reduce 0%
2009-01-25 17:13:15.560 INFO main org.apache.hadoop.mapred.JobClient - map 98% reduce 0%
2009-01-25 17:13:20.568 INFO main org.apache.hadoop.mapred.JobClient - Job complete: job_200901251629_0011
2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient - Counters: 7
2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient -   File Systems
2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient -     HDFS bytes read=2741143
2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient -     HDFS bytes written=774
2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient -   Job Counters
2009-01-25 17:13:20.570 INFO main org.apache.hadoop.mapred.JobClient -     Rack-local map tasks=9
2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient -     Launched map tasks=9
2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient -   Map-Reduce Framework
2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient -     Map input records=48314
2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient -     Map input bytes=2732424
2009-01-25 17:13:20.571 INFO main org.apache.hadoop.mapred.JobClient -     Map output records=0

Any suggestions or pointers would be greatly appreciated. Hmm, coming to think of something: I start X threads from inside Hadoop, almost cut'n'pasted from Nutch. If a thread somehow would linger, would Hadoop not be able to shut down even though there is nothing more to read from the RecordReader?

Kindly //Marcus

--
Marcus Herou
CTO and co-founder
Tailsweep AB
+46702561312
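On the closing question: yes, a leftover non-daemon thread keeps a JVM alive after main() (or a map task) returns, which matches the symptom of children that never exit. A minimal self-contained demonstration (plain Java, not Hadoop or Nutch code):

```java
// Demonstrates why a lingering worker thread keeps a JVM alive: a non-daemon
// thread blocks JVM exit until it finishes, while a daemon thread does not.
// This mirrors Nutch-style fetcher threads started from inside a map task.
public class LingerDemo {
    public static Thread startWorker(boolean daemon) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(50); // stand-in for a crawler waiting on a slow host
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        // With daemon=false the JVM waits for this thread before exiting,
        // exactly like a TaskTracker child that cannot shut down on its own.
        t.setDaemon(daemon);
        t.start();
        return t;
    }
}
```

So if the crawler threads are not marked as daemons (or not joined/interrupted when the RecordReader is exhausted), the child JVM will hang around with its DB pool still open.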
Re: Lingering TaskTracker$Child
0.78037554% 4094 docs, 0 errors, 107.7 docs/s
2009-01-25 17:49:10.115 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_04_0 0.84660184% 4448 docs, 0 errors, 108.5 docs/s
2009-01-25 17:49:13.117 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_04_0 0.9146803% 4810 docs, 0 errors, 109.3 docs/s
2009-01-25 17:49:16.118 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_04_0 0.9521195% 5044 docs, 0 errors, 107.3 docs/s
2009-01-25 17:49:18.626 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_04_0 1.0% 5364 docs, 0 errors, 107.3 docs/s
2009-01-25 17:49:18.631 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - Task attempt_200901251629_0012_m_04_0 is done.
2009-01-25 17:49:18.631 INFO main org.apache.hadoop.mapred.TaskRunner - Task 'attempt_200901251629_0012_m_04_0' done.
2009-01-25 17:49:19.120 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_04_0 1.0% 5364 docs, 0 errors, 107.3 docs/s
2009-01-25 17:49:19.120 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0012_m_04_0 Ignoring status-update since task is 'done'
2009-01-25 17:49:35.582 INFO taskCleanup org.apache.hadoop.mapred.TaskTracker - Received 'KillJobAction' for job: job_200901251629_0012
2009-01-25 17:49:35.582 INFO taskCleanup org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0012_m_04_0 done; removing files.
# Still processes left even though the TaskTracker said: Received 'KillJobAction' for job: job_200901251629_0012
r...@mapreduce2:~# jps
10732 Jps
10634 TaskTracker$Child
8660 DataNode
8824 TaskTracker
8730 SecondaryNameNode
25060 Bootstrap
r...@mapreduce2:~# date
Sun Jan 25 17:51:48 CET 2009
r...@mapreduce2:~#

On Sun, Jan 25, 2009 at 5:42 PM, Marcus Herou marcus.he...@tailsweep.com wrote:

Hi. Today I noticed, when I ran a Solr indexing job through our Hadoop cluster, that the master MySQL database was screaming about Too Many Connections. I wondered how that could happen, so I logged into my Hadoop machines and searched through the logs. Nothing strange there. Then I just did a jps:

r...@mapreduce1:~# jps
10701 TaskTracker$Child
9567 NameNode
5435 TaskTracker$Child
31801 Bootstrap
7349 TaskTracker$Child
6197 TaskTracker$Child
7761 TaskTracker$Child
10453 TaskTracker$Child
11232 TaskTracker$Child
3 TaskTracker$Child
9688 DataNode
10877 TaskTracker$Child
6504 TaskTracker$Child
10236 TaskTracker$Child
9852 TaskTracker
6515 TaskTracker$Child
11396 TaskTracker$Child
11741 Jps
6191 TaskTracker$Child
10981 TaskTracker$Child
7742 TaskTracker$Child
5946 TaskTracker$Child
11315 TaskTracker$Child
8112 TaskTracker$Child
11580 TaskTracker$Child
11490 TaskTracker$Child
5687 TaskTracker$Child
5927 TaskTracker$Child
27144 WrapperSimpleApp
7368 TaskTracker$Child

Damn! Each Child has its own DataSource (dbcp pool), tweaked down so it can only have one active connection to any shard at any time. Background: I ran out of connections during the Christmas holidays, since I have 60 shards (10 per MySQL machine) and each required a DB pool which allowed too many active+idle connections. Anyway, I have no active jobs at the moment, so the children should have died by themselves. Fortunately I have a nice little script which kills the bastards:

jps | egrep 'TaskTracker.+' | awk '{print $1}' | xargs kill

I will probably put that in a cronjob which kills long-running children...
Anyway, how can this happen? Am I doing something really stupid along the way?

Hard facts: Ubuntu Hardy Heron, 2.6.24-19-server, java version 1.6.0_06, Hadoop-0.18.2. It's my own classes which fire the jobs through JobClient (JobClient.runJob(job)). I feed the jar to Hadoop by issuing job.setJar(jarFile); (comes from a bash script). I feed deps into Hadoop by issuing job.set("tmpjars", jarFiles); (comes from parsing the external CLASSPATH env in bash).

The client does not complain, see example output below (I write no data to HDFS ((HDFS bytes written=774)), since I mostly use it for crawling, and all crawlers/indexers access my sharded db structure directly without intermediate storage):

2009-01-25 17:12:11.175 INFO main org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
2009-01-25 17:12:11.176 INFO main org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
2009-01-25 17:12:11.437 INFO main org.apache.hadoop.mapred.JobClient - Running job: job_200901251629_0011
2009-01-25 17:12:12.439 INFO main org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
2009-01-25 17:12:35.481 INFO main org.apache.hadoop.mapred.JobClient - map 6% reduce 0%
2009-01-25 17:12:40.493 INFO main org.apache.hadoop.mapred.JobClient - map 21% reduce
Re: Lingering TaskTracker$Child
Thanks! So in your experience, would this be good enough? (Notice the System.exit(0).) I implement the MapRunnable interface.

public void run(RecordReader<LongWritable, Text> recordReader,
                OutputCollector<WritableComparable, WritableComparable> outputCollector,
                Reporter reporter) throws IOException {
    this.recordReader = recordReader;
    this.outputCollector = outputCollector;
    this.reporter = reporter;
    int threads = Integer.valueOf(this.getConf().get(getClass().getName() + ".threads", "10"));
    log.info("Starting with " + threads + " threads");
    long timeout = Long.valueOf(this.getConf().get(getClass().getName() + ".timeout", "60"));
    for (int i = 0; i < threads; i++) {
        // spawn threads
        new FetcherThread().start();
    }
    do {
        // wait for threads to exit
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {}
        reportStatus();
        // some requests seem to hang, despite all intentions
        synchronized (this) {
            if ((System.currentTimeMillis() - lastRequestStart) > timeout) {
                if (log.isWarnEnabled()) {
                    log.warn("Aborting with " + activeThreads + " hung threads.");
                }
                return;
            }
        }
    } while (activeThreads > 0);
    log.info("All threads seem to be done, exiting");
    System.exit(0);
}

On Sun, Jan 25, 2009 at 5:57 PM, jason hadoop jason.had...@gmail.com wrote:

We had trouble like that with some jobs, when the child ran additional threads that were not set at daemon priority. These hold the Child JVM from exiting. JMX was the cause in our case, but we have seen our JNI jobs do it also. In the end we made a local mod that forced a System.exit in the finally block of the Child main.

On Sun, Jan 25, 2009 at 8:53 AM, Marcus Herou marcus.he...@tailsweep.com wrote:

Some extra info: apparently the child exits with a status of 143.

2009-01-25 17:13:11.110 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0011_m_05_0 1.0% 5364 docs, 0 errors, 124.7 docs/s
2009-01-25 17:13:11.114 INFO IPC Server handler 1 on 32274 org.apache.hadoop.mapred.TaskTracker - Task attempt_200901251629_0011_m_05_0 is done.
2009-01-25 17:13:11.116 INFO main org.apache.hadoop.mapred.TaskRunner - Task 'attempt_200901251629_0011_m_05_0' done.
2009-01-25 17:13:12.644 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0011_m_05_0 1.0% 5364 docs, 0 errors, 124.7 docs/s
2009-01-25 17:13:12.644 INFO IPC Server handler 0 on 32274 org.apache.hadoop.mapred.TaskTracker - attempt_200901251629_0011_m_05_0 Ignoring status-update since task is 'done'
2009-01-25 17:13:24.996 INFO taskCleanup org.apache.hadoop.mapred.TaskTracker - Received 'KillJobAction' for job: job_200901251629_0011
2009-01-25 17:13:24.996 INFO taskCleanup org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0011_m_05_0 done; removing files.
2009-01-25 17:47:22.668 WARN Thread-23 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0001_m_06_0 Child Error
java.io.IOException: Task process exit with nonzero status of 143.
    at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403)
2009-01-25 17:47:22.669 WARN Thread-23 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0001_m_06_0. Ignored.
2009-01-25 17:47:22.671 WARN Thread-23 org.apache.hadoop.mapred.TaskTracker - Unknown child task finshed: attempt_200901251629_0001_m_06_0. Ignored.
2009-01-25 17:47:22.713 WARN Thread-79 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0002_m_07_0 Child Error
java.io.IOException: Task process exit with nonzero status of 143.
    at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403)
2009-01-25 17:47:22.713 WARN Thread-159 org.apache.hadoop.mapred.TaskRunner - attempt_200901251629_0011_m_05_0 Child Error
java.io.IOException: Task process exit with nonzero status of 143.
    at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403)
2009-01-25 17:47:22.713 WARN Thread-79 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0002_m_07_0. Ignored.
2009-01-25 17:47:22.714 WARN Thread-159 org.apache.hadoop.mapred.TaskTracker - Error from unknown child task: attempt_200901251629_0011_m_05_0. Ignored.
2009
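The point made above about threads "not set at daemon priority" can also be addressed where the threads are created, instead of patching a System.exit into the Child. A minimal standalone sketch (class and method names here are illustrative, not from the original code): a thread marked as a daemon before start() does not keep the JVM alive once the main thread returns, so fetcher-style workers created this way cannot leave a TaskTracker$Child lingering.

```java
// Sketch: daemon vs. non-daemon worker threads. Only non-daemon ("user")
// threads keep a JVM alive after main() returns; marking workers as daemons
// before start() lets the child JVM exit normally.
public final class DaemonWorkerSketch {
    public static Thread newWorker(Runnable task, boolean daemon) {
        Thread t = new Thread(task, daemon ? "daemon-worker" : "user-worker");
        t.setDaemon(daemon); // must be called before start(), or it throws
        return t;
    }

    public static void main(String[] args) {
        Thread worker = newWorker(() -> System.out.println("fetching..."), true);
        System.out.println(worker.isDaemon()); // prints "true"
    }
}
```

The trade-off: daemon threads are killed abruptly at JVM exit, so work that must flush (DB connections, output) still needs an explicit join or shutdown hook.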
DataNode/TaskTracker memory constraints.
Hi. All Hadoop components are started with -Xmx1000M by default. I am planning to throw in some data/task nodes here and there in my architecture. However, most machines have only 4G of physical RAM, so allocating 2G + overhead (~2.5G) to Hadoop is a little risky, since they could very well become inaccessible if Hadoop has to compete with other processes for RAM. I have experienced this many times with Java processes going haywire where I run other services in parallel. Anyway, I would like to understand the reasoning behind allocating 1G per process. I figure the DataNode could survive with a little less, as could the TaskTracker if the jobs running in it do not consume much memory. Of course each process would like even more memory than 1G, but if I need to cut down I would like to know which to cut and what I lose by doing so. Any thoughts? Trial and error is of course an option, but I would like to hear the basic thinking about how memory should be utilized to get the most out of the boxes. Kindly //Marcus -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/
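For what it's worth, the knobs involved are separate: the 1000 MB default comes from HADOOP_HEAPSIZE in conf/hadoop-env.sh (used by the daemons such as DataNode and TaskTracker), while each task child gets its heap from the mapred.child.java.opts property. A hedged sketch for a 4 GB box; the values are illustrative guesses for the split, not tested recommendations:

```shell
# conf/hadoop-env.sh -- trims the daemon heaps (DataNode, TaskTracker).
# 512 MB is an illustrative guess, not a tuned value.
export HADOOP_HEAPSIZE=512

# The per-task child heap is a separate property, set in hadoop-site.xml:
#   <property>
#     <name>mapred.child.java.opts</name>
#     <value>-Xmx400m</value>   <!-- again, an illustrative guess -->
#   </property>
```

With two concurrent map slots per node that budget would be roughly 2 x 512 MB for daemons + 2 x 400 MB for children, leaving headroom for the OS page cache and the other services on the box.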
NumMapTasks and NumReduceTasks with MapRunnable
Hi. We are finally in the beta stage with our crawler and have tested it with a few hundred thousand URLs. However, it performs worse than if we ran it on a local machine without connecting to a Hadoop JobTracker. Each crawl job is fairly similar to a Nutch Fetcher job: it spawns X threads which all read from the same RecordReader and start fetching the URL they are assigned. However, I am not able to utilize all our 9 machines at the same time, which is really preferable since this is an externally I/O-bound job (remote servers). How can I, with a crawl list of just 9 URLs (stupidly small, I know), make sure that all machines are used at least once? With a crawl list of 900, how can I make sure at least 100 are crawled at the same time across all machines? And so on with much bigger crawl lists (which is why we need Hadoop anyway). Just as I write this, I launched a job where I manually set numMapTasks to 9 and it seems to be fruitful, quite a fast crawl actually :) but I wonder if this is how I should think with all MapRunnables? The next job we call is PersistOutLinks, and yes, it goes through a massive list of source-target links and saves them in a DB. This list is at least 100 times larger than the Fetcher list. Is it still smart to hardcode a value of 9 for numMapTasks for this MapRunnable job? Or should I create some form of InputFormat.getSplits based on the crawl/outlink sizes? Of course numMapTasks is not actually hardcoded; it is injected into the Configuration based on a properties file. Kindly //Marcus -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/
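One middle ground between hardcoding 9 and writing a custom getSplits is to compute the map task count from the crawl-list size before submitting, then hand it to JobConf.setNumMapTasks(). A hypothetical helper (the names and the 100-urls-per-task figure are my own assumptions, not from the post): clamp the count to at least the number of nodes so a tiny list still touches every machine, and let it grow with the input for big lists.

```java
// Hypothetical sketch: derive the map task count from the input size
// instead of hardcoding it. urlsPerTask (e.g. 100) and numNodes (9 here)
// are illustrative assumptions.
public final class MapTaskCount {
    public static int numMapTasks(long numUrls, long urlsPerTask, int numNodes) {
        long byWork = (numUrls + urlsPerTask - 1) / urlsPerTask; // ceil(numUrls / urlsPerTask)
        return (int) Math.max(numNodes, byWork); // never idle a node on small lists
    }

    public static void main(String[] args) {
        System.out.println(numMapTasks(9, 100, 9));      // tiny crawl list -> 9
        System.out.println(numMapTasks(90000, 100, 9));  // big list -> 900
        // then, before JobClient.runJob(job):
        //   job.setNumMapTasks(numMapTasks(numUrls, 100, 9));
    }
}
```

Note that numMapTasks is only a hint to the framework; the actual split count is ultimately decided by the InputFormat, which is why a size-aware getSplits is the more reliable option for the PersistOutLinks-scale job.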
Parhely (ORM for HBase) released!
Hi guys. I have finally released the first draft of an ORM for HBase, named Parhely. Check it out at http://dev.tailsweep.com/ Kindly //Marcus
Perfect sysctl
Hi. Just wondering if someone has found good Linux settings for I/O-intensive workloads with Hadoop. Since most use cases with Hadoop are I/O-bound, and since it uses the network frequently, I guess that the TCP buffers and kernel buffers should be tweaked (even with CPU-bound load). I also guess that you should choose an I/O scheduler like deadline or perhaps cfq. We will use many of the tricks found here: http://www.gluster.org/docs/index.php/Guide_to_Optimizing_GlusterFS Kindly //Marcus -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 [EMAIL PROTECTED] http://www.tailsweep.com/ http://blogg.tailsweep.com/
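As a concrete starting point in the spirit of that GlusterFS guide: the kernel TCP buffer ceilings and the disk elevator are the usual first knobs. A hedged sketch; every value below is an illustrative guess to benchmark against, not a recommendation, and the device name sda is an assumption:

```shell
# Raise TCP buffer ceilings for the many concurrent HDFS/shuffle streams
# (illustrative values; benchmark before adopting).
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
sysctl -w net.ipv4.tcp_rmem="4096 87380 8388608"
sysctl -w net.ipv4.tcp_wmem="4096 65536 8388608"

# Switch a data disk to the deadline elevator (sda is an assumption;
# repeat per data disk). Persist choices in /etc/sysctl.conf and the
# boot cmdline respectively.
echo deadline > /sys/block/sda/queue/scheduler
```

Deadline tends to suit the large sequential reads/writes HDFS does; cfq is worth comparing when interactive services share the spindles.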