On Tue, Apr 13, 2010 at 4:13 AM, stephen mulcahy <stephen.mulc...@deri.org> wrote:
> Todd Lipcon wrote:
>
>> Most likely a kernel bug. In previous versions of Debian there was a buggy
>> forcedeth driver, for example, that caused it to drop off the network in
>> high load. Who knows what new bug is in 2.6.32 which is brand spanking
>> new.
>>
>
> Yes, it looks like it is a kernel bug alright (see thread on kernel netdev
> at http://marc.info/?t=127094288900001&r=1&w=2 if interested). To be fair,
> I don't think these bugs are confined to Debian - I did some initial testing
> with Scientific Linux and also ran into problems with forcedeth.

Interesting, good find. I try to avoid forcedeth now and have heard the same
from ops people at various large linux deployments. Not sure why, but it's
traditionally had a lot of bugs/regressions.

> Sure, but I figured I'd go with a distro now that can be largely left
> untouched for the next 2-3 years and Debian lenny felt that bit old for
> that. I know RHEL/CentOS would fit that requirement also, will see. I'm also
> interested in using DRBD in some of our nodes for redundancy, again, running
> with a newer distro should reduce the pain of configuring that.
>
> Finally, I figured burning in our cluster was a good opportunity to give
> back to the community and do some testing on their behalf.
>

Very admirable of you :) It is good to have some people running new kernels
to suss these issues out before the rest of us check out modern technology ;-)

>
> With regard to our TeraSort benchmark time of ~23 minutes - is that in the
> right ballpark for a cluster of 45 data nodes and a nn and 2nn?
>

Yep, sounds about the right ballpark.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera