Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-15 Thread stephen mulcahy

Todd Lipcon wrote:

Yes, it looks like it is a kernel bug alright (see thread on kernel netdev
at http://marc.info/?t=12709428891r=1w=2 if interested). To be fair,
I don't think these bugs are confined to Debian - I did some initial testing
with Scientific Linux and also ran into problems with forcedeth.



Interesting, good find. I try to avoid forcedeth now and have heard the same
from ops people at various large linux deployments. Not sure why, but it's
traditionally had a lot of bugs/regressions.


FYI, the netdev guys have proposed a patch and initial testing indicates 
it fixes the problem (and brings the TeraSort down to about 18 minutes, 
so win win :)


I share similar feelings about forcedeth, particularly after this, but 
then I'm also dubious about at least some broadcom chipsets and even 
Intel have had their issues 
(https://bugzilla.kernel.org/show_bug.cgi?id=11382) so maybe it's just 
that all NICs suck.



Finally, I figured burning in our cluster was a good opportunity to give
back to the community and do some testing on their behalf.


Very admirable of you :) It is good to have some people running new kernels
to suss these issues out before the rest of us check out modern technology
;-)


It also means there aren't problems lurking for us in the future when we 
get forced to newer kernels for support/maintenance issues. I also ran 
into http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556030 while 
testing a 2.6.30 kernel which may be lurking in older kernels too (and 
seems to have been fixed in 2.6.32) so there are perils to staying back 
and going forward.



With regard to our TeraSort benchmark time of ~23 minutes - is that in the
right ballpark for a cluster of 45 data nodes and a nn and 2nn?



Yep, sounds about the right ballpark.


Cool, thanks for the feedback. I'm surprised that others didn't comment 
on the TeraSort result - perhaps others use something else for 
smoke-testing/benchmarking their Hadoop clusters? If so, anyone want to 
suggest what they do use? It'd be nice to see a collection of TeraSort 
results somewhere to get an idea of what cluster configs work well and 
for people who want to sanity check a new cluster.


-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com


Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-15 Thread Steve Loughran

Todd Lipcon wrote:

On Tue, Apr 13, 2010 at 4:13 AM, stephen mulcahy
stephen.mulc...@deri.org wrote:



Sure, but I figured I'd go with a distro now that can be largely left
untouched for the next 2-3 years and Debian lenny felt that bit old for
that. I know RHEL/CentOS would fit that requirement also, will see. I'm also
interested in using DRBD in some of our nodes for redundancy, again, running
with a newer distro should reduce the pain of configuring that.

Finally, I figured burning in our cluster was a good opportunity to give
back to the community and do some testing on their behalf.



Very admirable of you :) It is good to have some people running new kernels
to suss these issues out before the rest of us check out modern technology
;-)


Tom White is planning to split off a Hadoop 0.21 branch from SVN_TRUNK 
at the end of the month, so if you still want to do some cluster 
testing, he'd be grateful for that being tested on Debian too.






With regard to our TeraSort benchmark time of ~23 minutes - is that in the
right ballpark for a cluster of 45 data nodes and a nn and 2nn?


The number of HDDs per server will be a factor too, and no, I don't know how to
predict it.





Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-13 Thread stephen mulcahy

Todd Lipcon wrote:

Most likely a kernel bug. In previous versions of Debian there was a buggy
forcedeth driver, for example, that caused it to drop off the network in
high load. Who knows what new bug is in 2.6.32 which is brand spanking new.


Yes, it looks like it is a kernel bug alright (see thread on kernel 
netdev at http://marc.info/?t=12709428891r=1w=2 if interested). To 
be fair, I don't think these bugs are confined to Debian - I did some 
initial testing with Scientific Linux and also ran into problems with 
forcedeth.



The overwhelming majority of production clusters run on RHEL 5.3 or RHEL 5.4
in my experience (I'm lumping CentOS 5.3/5.4 in with RHEL here). I know one
or two production clusters running Debian Lenny, but none running something
as new as what you're talking about. 


This is useful info - much appreciated. I guess if we don't manage to 
stabilise the current config we'll look at moving to one of those.



Hadoop doesn't exercise the new
features in very recent kernels, so there's no sense accepting instability -
just go with something old that works!


Sure, but I figured I'd go with a distro now that can be largely left 
untouched for the next 2-3 years and Debian lenny felt that bit old for 
that. I know RHEL/CentOS would fit that requirement also, will see. I'm 
also interested in using DRBD in some of our nodes for redundancy, 
again, running with a newer distro should reduce the pain of configuring 
that.


Finally, I figured burning in our cluster was a good opportunity to give 
back to the community and do some testing on their behalf.


With regard to our TeraSort benchmark time of ~23 minutes - is that in 
the right ballpark for a cluster of 45 data nodes and a nn and 2nn?


Thanks,

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com


Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-13 Thread Todd Lipcon
On Tue, Apr 13, 2010 at 4:13 AM, stephen mulcahy
stephen.mulc...@deri.org wrote:

 Todd Lipcon wrote:

 Most likely a kernel bug. In previous versions of Debian there was a buggy
 forcedeth driver, for example, that caused it to drop off the network in
 high load. Who knows what new bug is in 2.6.32 which is brand spanking
 new.


 Yes, it looks like it is a kernel bug alright (see thread on kernel netdev
 at http://marc.info/?t=12709428891r=1w=2 if interested). To be fair,
 I don't think these bugs are confined to Debian - I did some initial testing
 with Scientific Linux and also ran into problems with forcedeth.


Interesting, good find. I try to avoid forcedeth now and have heard the same
from ops people at various large linux deployments. Not sure why, but it's
traditionally had a lot of bugs/regressions.


 Sure, but I figured I'd go with a distro now that can be largely left
 untouched for the next 2-3 years and Debian lenny felt that bit old for
 that. I know RHEL/CentOS would fit that requirement also, will see. I'm also
 interested in using DRBD in some of our nodes for redundancy, again, running
 with a newer distro should reduce the pain of configuring that.

 Finally, I figured burning in our cluster was a good opportunity to give
 back to the community and do some testing on their behalf.


Very admirable of you :) It is good to have some people running new kernels
to suss these issues out before the rest of us check out modern technology
;-)



 With regard to our TeraSort benchmark time of ~23 minutes - is that in the
 right ballpark for a cluster of 45 data nodes and a nn and 2nn?


Yep, sounds about the right ballpark.

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-09 Thread stephen mulcahy

Allen Wittenauer wrote:

On Apr 8, 2010, at 9:37 AM, stephen mulcahy wrote:

When I run this on the Debian 2.6.32 kernel - over the course of the run, 1 or 
2 datanodes of the cluster enter a state whereby they are no longer responsive 
to network traffic.


How much free memory do you have?


Lots, a few GB



How many tasks per node do you have?


I left this at the default.



What are the service times, etc, on your IO system?  


Can you clarify this query?




Has anyone run into similar problems with their environments? I noticed that 
when the nodes become unresponsive, it often happens when the TeraSort is at


I've always seen Linux nodes go unresponsive when they get memory starved to 
the point that the OOM can't function because it can't allocate enough mem.


Sure, but I can login to the unresponsive nodes via the console - it's 
just the network that has become unresponsive. To be clear here, I don't 
suspect Hadoop is the root cause of the problem - I suspect either a 
kernel bug or some other operating system level bug. I was wondering if 
others had run into similar problems.


I was also wondering in general what kernel versions and distros people 
are using, especially for larger production clusters.


Thanks,

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com


Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-09 Thread Todd Lipcon
On Fri, Apr 9, 2010 at 8:18 AM, stephen mulcahy stephen.mulc...@deri.org wrote:

 Allen Wittenauer wrote:

 On Apr 8, 2010, at 9:37 AM, stephen mulcahy wrote:

 When I run this on the Debian 2.6.32 kernel - over the course of the run,
 1 or 2 datanodes of the cluster enter a state whereby they are no longer
 responsive to network traffic.


 How much free memory do you have?


 Lots, a few GB



 How many tasks per node do you have?


 I left this at the default.



 What are the service times, etc, on your IO system?


 Can you clarify this query?



  Has anyone run into similar problems with their environments? I noticed
 that when the nodes become unresponsive, it often happens when the
 TeraSort is at


 I've always seen Linux nodes go unresponsive when they get memory starved
 to the point that the OOM can't function because it can't allocate enough
 mem.


 Sure, but I can login to the unresponsive nodes via the console - it's just
 the network that has become unresponsive. To be clear here, I don't suspect
 Hadoop is the root cause of the problem - I suspect either a kernel bug or
 some other operating system level bug. I was wondering if others had run
 into similar problems.


Most likely a kernel bug. In previous versions of Debian there was a buggy
forcedeth driver, for example, that caused it to drop off the network in
high load. Who knows what new bug is in 2.6.32 which is brand spanking new.



 I was also wondering in general what kernel versions and distros people are
 using, especially for larger production clusters.


The overwhelming majority of production clusters run on RHEL 5.3 or RHEL 5.4
in my experience (I'm lumping CentOS 5.3/5.4 in with RHEL here). I know one
or two production clusters running Debian Lenny, but none running something
as new as what you're talking about. Hadoop doesn't exercise the new
features in very recent kernels, so there's no sense accepting instability -
just go with something old that works!

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera


Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-08 Thread stephen mulcahy

Hi,

I'm commissioning a new Hadoop cluster with the following spec.

45 x data nodes:
- 2 x Quad-Core AMD Opteron(tm) Processor 2378
- 16GB ram
- 4 x WDC WD1002FBYS 1TB SATA drives (configured as separate ext4 
filesystems)


3 x name nodes:
- 2 x Quad-Core AMD Opteron(tm) Processor 2378
- 32GB ram
- 2 x WDC WD1002FBYS 1TB SATA drives (in software RAID1 config and ext4 
filesystem)
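
As a rough sizing aid for the spec above, the raw and usable HDFS capacity of the data nodes can be sketched as follows. The replication factor of 3 is the stock dfs.replication default and is an assumption here, since the thread does not say which value this cluster uses. Filesystem overhead and space reserved for non-HDFS use are ignored.

```shell
# Back-of-the-envelope HDFS capacity for the data nodes listed above.
# REPLICATION=3 is the stock dfs.replication default (assumed, not
# stated in the thread). Disk sizes are decimal terabytes.
DATA_NODES=45
DISKS_PER_NODE=4
TB_PER_DISK=1
REPLICATION=3

# Raw capacity across all data-node disks.
RAW_TB=$(( DATA_NODES * DISKS_PER_NODE * TB_PER_DISK ))
# Capacity left once every block is stored REPLICATION times.
USABLE_TB=$(( RAW_TB / REPLICATION ))

echo "raw: ${RAW_TB} TB, usable at ${REPLICATION}x replication: ${USABLE_TB} TB"
```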


All nodes are running Debian testing/squeeze.

I'm doing my benchmarking with TeraSort running as follows

hadoop jar hadoop-0.20.2-examples.jar teragen -Dmapred.map.tasks=8000 \
    100 /terasort/in

hadoop jar hadoop-0.20.2-examples.jar terasort -Dmapred.reduce.tasks=530 \
    /terasort/in /terasort/out
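
For anyone adapting these commands: teragen's first argument is a row count, and each generated row is 100 bytes, so the row count for a given input size can be derived as in the sketch below. The 1 TB target is purely illustrative and is not taken from this thread. The rows_for_bytes helper is a hypothetical name.

```shell
# teragen takes a number of rows; each generated row is 100 bytes.
ROW_BYTES=100

# rows_for_bytes N: rows needed to generate N bytes of input
# (hypothetical helper, integer division).
rows_for_bytes() {
    echo $(( $1 / ROW_BYTES ))
}

# Illustrative target only: 1 TB (10^12 bytes, decimal).
TARGET_BYTES=1000000000000
ROWS=$(rows_for_bytes "$TARGET_BYTES")

# Print the resulting command rather than running it.
echo "hadoop jar hadoop-0.20.2-examples.jar teragen -Dmapred.map.tasks=8000 $ROWS /terasort/in"
```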


When I run this on the Debian 2.6.30 kernel - it runs to completion in 
about 23 minutes (occasionally running into the CPU soft lockup 
problems described in [1]). I assume that is a reasonable time for this 
benchmark to complete in?


When I run this on the Debian 2.6.32 kernel - over the course of the 
run, 1 or 2 datanodes of the cluster enter a state whereby they are no 
longer responsive to network traffic.


Logging into these nodes via the console reveals no messages in the 
log-files. Running ifdown eth0 followed by ifup eth0 brings these 
systems back online. The systems that become unresponsive vary from run 
to run, suggesting this is not a hardware problem specific to certain nodes.
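
As a stopgap while the underlying bug is investigated, the manual ifdown/ifup recovery described above could be automated along the following lines. This is only a sketch under stated assumptions: the peer address is a placeholder for some host that should always answer (for example the gateway), the function name is hypothetical, and bouncing the interface hides the fault rather than fixing it.

```shell
#!/bin/sh
# Hypothetical NIC watchdog: bounce the interface when a known-good
# peer stops answering. IFACE and PEER are placeholders; PING, IFDOWN
# and IFUP can be overridden, which also makes dry runs possible.
IFACE="${IFACE:-eth0}"
PEER="${PEER:-192.0.2.1}"
PING="${PING:-ping}"
IFDOWN="${IFDOWN:-ifdown}"
IFUP="${IFUP:-ifup}"

bounce_if_unreachable() {
    # Send three probes, waiting up to 2 seconds for each reply;
    # ping exits 0 if at least one reply arrives.
    if "$PING" -c 3 -W 2 "$PEER" >/dev/null 2>&1; then
        echo "ok"
        return 0
    fi
    echo "bouncing $IFACE"
    "$IFDOWN" "$IFACE" && "$IFUP" "$IFACE"
}
```

The function would be called as root on each data node, for example from a cron entry every minute. Keeping its output makes it possible to see afterwards which nodes dropped off the network and when.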


I have raised this issue with the Debian kernel team[2] and have tested
various system and switch changes in an attempt to identify the cause -
but without success.

Has anyone run into similar problems with their environments? I noticed 
that when the nodes become unresponsive, it often happens when the 
TeraSort is at


map 100%, reduce 78%

Is there any significance to that?

Any feedback welcome (including comments on what distro/kernel 
combinations others are using).


Thanks,

-stephen

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556030
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572201

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com