Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel
Todd Lipcon wrote:
> > Yes, it looks like it is a kernel bug alright (see thread on kernel netdev at http://marc.info/?t=12709428891r=1w=2 if interested). To be fair, I don't think these bugs are confined to Debian - I did some initial testing with Scientific Linux and also ran into problems with forcedeth.
>
> Interesting, good find. I try to avoid forcedeth now and have heard the same from ops people at various large linux deployments. Not sure why, but it's traditionally had a lot of bugs/regressions.

FYI, the netdev guys have proposed a patch and initial testing indicates it fixes the problem (and brings the TeraSort down to about 18 minutes, so win win :)

I share similar feelings about forcedeth, particularly after this, but then I'm also dubious about at least some Broadcom chipsets, and even Intel have had their issues (https://bugzilla.kernel.org/show_bug.cgi?id=11382), so maybe it's just that all NICs suck.

> > Finally, I figured burning in our cluster was a good opportunity to give back to the community and do some testing on their behalf.
>
> Very admirable of you :) It is good to have some people running new kernels to suss these issues out before the rest of us check out modern technology ;-)

It also means there aren't problems lurking for us in the future when we get forced to newer kernels for support/maintenance issues. I also ran into http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556030 while testing a 2.6.30 kernel, which may be lurking in older kernels too (and seems to have been fixed in 2.6.32), so there are perils to staying back and to going forward.

> > With regard to our TeraSort benchmark time of ~23 minutes - is that in the right ballpark for a cluster of 45 data nodes and a nn and 2nn?
>
> Yep, sounds about the right ballpark.

Cool, thanks for the feedback. I'm surprised that others didn't comment on the TeraSort result - perhaps others use something else for smoke-testing/benchmarking their Hadoop clusters? If so, anyone want to suggest what they do use?

It'd be nice to see a collection of TeraSort results somewhere to get an idea of what cluster configs work well and for people who want to sanity check a new cluster.

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
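For anyone wanting to compare TeraSort runs across clusters, one rough way to normalise the numbers is sorted bytes per second, overall and per data node. The sketch below assumes the standard 100-byte TeraGen row; the row count in the usage example is hypothetical, since the thread does not reliably state the data set size, and the function names are made up for illustration.

```python
# Back-of-the-envelope TeraSort throughput, for comparing runs across clusters.
# Assumption: each TeraGen row is 100 bytes (the standard TeraSort record size).

ROW_BYTES = 100


def aggregate_mb_per_s(rows, seconds):
    """Sorted data volume divided by wall-clock time, in MB/s (1 MB = 10**6 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return rows * ROW_BYTES / seconds / 1e6


def per_node_mb_per_s(rows, seconds, data_nodes):
    """Aggregate throughput shared evenly across the data nodes."""
    if data_nodes <= 0:
        raise ValueError("data_nodes must be positive")
    return aggregate_mb_per_s(rows, seconds) / data_nodes


def speedup(before_minutes, after_minutes):
    """Ratio of two run times; values above 1 mean the second run was faster."""
    return before_minutes / after_minutes


if __name__ == "__main__":
    rows = 10**10  # hypothetical: 1 TB of 100-byte rows, NOT taken from the thread
    for minutes in (23, 18):  # the two run times mentioned in the thread
        secs = minutes * 60
        print(minutes, "min:",
              round(aggregate_mb_per_s(rows, secs), 1), "MB/s aggregate,",
              round(per_node_mb_per_s(rows, secs, 45), 1), "MB/s per data node")
    print("speedup:", round(speedup(23, 18), 2))
```

A per-node figure like this is only a sanity check: it ignores replication traffic, the number of disks per server and the map/reduce slot configuration, all of which affect the result.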
Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel
Todd Lipcon wrote:
> On Tue, Apr 13, 2010 at 4:13 AM, stephen mulcahy stephen.mulc...@deri.org wrote:
> > Sure, but I figured I'd go with a distro now that can be largely left untouched for the next 2-3 years and Debian lenny felt that bit old for that. I know RHEL/CentOS would fit that requirement also, will see. I'm also interested in using DRBD in some of our nodes for redundancy, again, running with a newer distro should reduce the pain of configuring that. Finally, I figured burning in our cluster was a good opportunity to give back to the community and do some testing on their behalf.
>
> Very admirable of you :) It is good to have some people running new kernels to suss these issues out before the rest of us check out modern technology ;-)

Tom White is planning to split off a Hadoop 0.21 branch from SVN_TRUNK at the end of the month, so if you still want to do some cluster testing, he'd be grateful for that being tested on Debian too.

> > With regard to our TeraSort benchmark time of ~23 minutes - is that in the right ballpark for a cluster of 45 data nodes and a nn and 2nn?

# of HDDs/server will be a factor too, and no, I don't know how to predict it.
Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel
Todd Lipcon wrote:
> Most likely a kernel bug. In previous versions of Debian there was a buggy forcedeth driver, for example, that caused it to drop off the network in high load. Who knows what new bug is in 2.6.32 which is brand spanking new.

Yes, it looks like it is a kernel bug alright (see thread on kernel netdev at http://marc.info/?t=12709428891r=1w=2 if interested). To be fair, I don't think these bugs are confined to Debian - I did some initial testing with Scientific Linux and also ran into problems with forcedeth.

> The overwhelming majority of production clusters run on RHEL 5.3 or RHEL 5.4 in my experience (I'm lumping CentOS 5.3/5.4 in with RHEL here). I know one or two production clusters running Debian Lenny, but none running something as new as what you're talking about.

This is useful info - much appreciated. I guess if we don't manage to stabilise the current config we'll look at moving to one of those.

> Hadoop doesn't exercise the new features in very recent kernels, so there's no sense accepting instability - just go with something old that works!

Sure, but I figured I'd go with a distro now that can be largely left untouched for the next 2-3 years and Debian lenny felt that bit old for that. I know RHEL/CentOS would fit that requirement also, will see. I'm also interested in using DRBD in some of our nodes for redundancy, again, running with a newer distro should reduce the pain of configuring that. Finally, I figured burning in our cluster was a good opportunity to give back to the community and do some testing on their behalf.

With regard to our TeraSort benchmark time of ~23 minutes - is that in the right ballpark for a cluster of 45 data nodes and a nn and 2nn?

Thanks,

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel
On Tue, Apr 13, 2010 at 4:13 AM, stephen mulcahy stephen.mulc...@deri.org wrote:
> Todd Lipcon wrote:
> > Most likely a kernel bug. In previous versions of Debian there was a buggy forcedeth driver, for example, that caused it to drop off the network in high load. Who knows what new bug is in 2.6.32 which is brand spanking new.
>
> Yes, it looks like it is a kernel bug alright (see thread on kernel netdev at http://marc.info/?t=12709428891r=1w=2 if interested). To be fair, I don't think these bugs are confined to Debian - I did some initial testing with Scientific Linux and also ran into problems with forcedeth.

Interesting, good find. I try to avoid forcedeth now and have heard the same from ops people at various large linux deployments. Not sure why, but it's traditionally had a lot of bugs/regressions.

> Sure, but I figured I'd go with a distro now that can be largely left untouched for the next 2-3 years and Debian lenny felt that bit old for that. I know RHEL/CentOS would fit that requirement also, will see. I'm also interested in using DRBD in some of our nodes for redundancy, again, running with a newer distro should reduce the pain of configuring that. Finally, I figured burning in our cluster was a good opportunity to give back to the community and do some testing on their behalf.

Very admirable of you :) It is good to have some people running new kernels to suss these issues out before the rest of us check out modern technology ;-)

> With regard to our TeraSort benchmark time of ~23 minutes - is that in the right ballpark for a cluster of 45 data nodes and a nn and 2nn?

Yep, sounds about the right ballpark.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel
Allen Wittenauer wrote:
> On Apr 8, 2010, at 9:37 AM, stephen mulcahy wrote:
> > When I run this on the Debian 2.6.32 kernel - over the course of the run, 1 or 2 datanodes of the cluster enter a state whereby they are no longer responsive to network traffic.
>
> How much free memory do you have?

Lots, a few GB.

> How many tasks per node do you have?

I left this at the default.

> What are the service times, etc, on your IO system?

Can you clarify this query?

> > Has anyone run into similar problems with their environments? I noticed that when the nodes become unresponsive, it often happens when the TeraSort is at
>
> I've always seen Linux nodes go unresponsive when they get memory starved to the point that the OOM can't function because it can't allocate enough mem.

Sure, but I can log in to the unresponsive nodes via the console - it's just the network that has become unresponsive.

To be clear here, I don't suspect Hadoop is the root cause of the problem - I suspect either a kernel bug or some other operating system level bug. I was wondering if others had run into similar problems.

I was also wondering in general what kernel versions and distros people are using, especially for larger production clusters.

Thanks,

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
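To put a number on the "how much free memory" question from the console of an affected node, something like the following could be run. It is a minimal sketch, assuming a Linux /proc/meminfo with the usual "Name: value kB" layout; the function names are hypothetical, and summing MemFree, Buffers and Cached is only a rough estimate of memory the kernel could still hand out.

```python
# Rough free-memory check for a node, based on /proc/meminfo (Linux only).

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Values are in kB for most fields; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def reclaimable_kb(info):
    """MemFree plus buffers and page cache: a rough, optimistic estimate."""
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            info = parse_meminfo(handle.read())
    except OSError:
        print("/proc/meminfo not available on this system")
    else:
        print("MemFree kB:", info.get("MemFree"))
        print("MemFree+Buffers+Cached kB:", reclaimable_kb(info))
```

If these figures stay in the GB range while the node is off the network, memory starvation becomes a less likely explanation than a driver or kernel problem.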
Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel
On Fri, Apr 9, 2010 at 8:18 AM, stephen mulcahy stephen.mulc...@deri.org wrote:
> Allen Wittenauer wrote:
> > On Apr 8, 2010, at 9:37 AM, stephen mulcahy wrote:
> > > When I run this on the Debian 2.6.32 kernel - over the course of the run, 1 or 2 datanodes of the cluster enter a state whereby they are no longer responsive to network traffic.
> >
> > How much free memory do you have?
>
> Lots, a few GB.
>
> > How many tasks per node do you have?
>
> I left this at the default.
>
> > What are the service times, etc, on your IO system?
>
> Can you clarify this query?
>
> > > Has anyone run into similar problems with their environments? I noticed that when the nodes become unresponsive, it often happens when the TeraSort is at
> >
> > I've always seen Linux nodes go unresponsive when they get memory starved to the point that the OOM can't function because it can't allocate enough mem.
>
> Sure, but I can log in to the unresponsive nodes via the console - it's just the network that has become unresponsive.
>
> To be clear here, I don't suspect Hadoop is the root cause of the problem - I suspect either a kernel bug or some other operating system level bug. I was wondering if others had run into similar problems.

Most likely a kernel bug. In previous versions of Debian there was a buggy forcedeth driver, for example, that caused it to drop off the network in high load. Who knows what new bug is in 2.6.32 which is brand spanking new.

> I was also wondering in general what kernel versions and distros people are using, especially for larger production clusters.

The overwhelming majority of production clusters run on RHEL 5.3 or RHEL 5.4 in my experience (I'm lumping CentOS 5.3/5.4 in with RHEL here). I know one or two production clusters running Debian Lenny, but none running something as new as what you're talking about.

Hadoop doesn't exercise the new features in very recent kernels, so there's no sense accepting instability - just go with something old that works!

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel
Hi,

I'm commissioning a new Hadoop cluster with the following spec.

45 x data nodes:
- 2 x Quad-Core AMD Opteron(tm) Processor 2378
- 16GB ram
- 4 x WDC WD1002FBYS 1TB SATA drives (configured as separate ext4 filesystems)

3 x name nodes:
- 2 x Quad-Core AMD Opteron(tm) Processor 2378
- 32GB ram
- 2 x WDC WD1002FBYS 1TB SATA drives (in software RAID1 config and ext4 filesystem)

All nodes are running Debian testing/squeeze.

I'm doing my benchmarking with TeraSort running as follows:

    hadoop jar hadoop-0.20.2-examples.jar teragen -Dmapred.map.tasks=8000 100 /terasort/in
    hadoop jar hadoop-0.20.2-examples.jar terasort -Dmapred.reduce.tasks=530 /terasort/in /terasort/out

When I run this on the Debian 2.6.30 kernel, it runs to completion in about 23 minutes (occasionally running into the CPU soft lockup problems described in [1]). I assume that is a reasonable time for this benchmark to complete in?

When I run this on the Debian 2.6.32 kernel, over the course of the run 1 or 2 datanodes of the cluster enter a state whereby they are no longer responsive to network traffic. Logging into these nodes via the console reveals no messages in the log files. Running ifdown eth0 followed by ifup eth0 brings these systems back online. The systems that become unresponsive vary from run to run, suggesting this is not a h/w problem specific to certain nodes.

I have raised this issue with the Debian kernel team [2] and have tested various system and switch changes in an attempt to identify the cause - but without success.

Has anyone run into similar problems with their environments? I noticed that when the nodes become unresponsive, it often happens when the TeraSort is at

    map 100%, reduce 78%

Is there any significance to that?

Any feedback welcome (including comments on what distro/kernel combinations others are using).

Thanks,

-stephen

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556030
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572201

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
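The manual recovery described above (ifdown eth0, then ifup eth0) could be automated as a stopgap while the underlying bug is chased. The following is only a sketch under stated assumptions: it must run as root on the affected node, it assumes the Linux iputils `ping` options `-c` and `-W`, the gateway address 192.0.2.1 is a placeholder, and the function names and thresholds are hypothetical. It works around the symptom and does nothing about the cause.

```python
# Stopgap watchdog: bounce a network interface after repeated failed pings.
import subprocess
import time


def ping_once(host):
    """Return True if a single ICMP echo to host succeeds (iputils ping assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def bounce_interface(iface, run=subprocess.call):
    """Run `ifdown IFACE` followed by `ifup IFACE`, as in the message above."""
    run(["ifdown", iface])
    run(["ifup", iface])


def watchdog(host, iface, max_failures=3, checks=None, interval=10.0,
             probe=ping_once, bounce=bounce_interface, sleep=time.sleep):
    """Bounce iface after max_failures consecutive failed probes.

    checks=None loops forever; otherwise stop after that many probes.
    Returns the number of times the interface was bounced.
    """
    failures = 0
    bounces = 0
    done = 0
    while checks is None or done < checks:
        if probe(host):
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                bounce(iface)
                bounces += 1
                failures = 0
        done += 1
        sleep(interval)
    return bounces


# Example (placeholder gateway address, runs until interrupted):
#   watchdog("192.0.2.1", "eth0")
```

Requiring several consecutive failures before bouncing avoids reacting to a single dropped ping under heavy shuffle traffic. Recording the time of each bounce would also help correlate the hangs with job progress such as the "map 100%, reduce 78%" point.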