On Mon, 2009-02-02 at 20:06 -0800, jason hadoop wrote:
> This can be made significantly worse by your underlying host file
> system and the disks that support it.

Oh, yes, we know... We realized only yesterday, far too late, that we weren't mounting with noatime on that cluster's slaves.

The attached graph is instructive. We have our nightly-rotated DataNode logs all the way back to when this test cluster was created in November. This morning, on one node, I sampled the first 10 BlockReport scan lines from each day's log, up through the current hour today, and handed them to gnuplot to graph. The seriously erratic behavior that begins around the 900K-1M point is very disturbing.

Immediate solutions for us include noatime, nodiratime, a BIOS upgrade on the discs, and eliminating enough small files (blocks) in DFS to get the total count below 400K.
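
For anyone who wants to repeat the sampling on their own logs, here is a rough Python sketch of the idea. It is not the script I actually used. The regex assumes a DataNode log line of the form "BlockReport of N blocks got processed in M msecs", and the function names are made up for illustration, so adjust both to whatever your DataNode version really writes.

```python
import re
from itertools import islice

# ASSUMPTION: the DataNode logs block reports in this shape.
# Check your own logs and adjust the pattern if the wording differs.
BLOCK_REPORT = re.compile(
    r"BlockReport of (\d+) blocks got processed in (\d+) msecs"
)


def sample_block_reports(lines, limit=10):
    """Return (block_count, msecs) for the first `limit` matching lines."""
    matches = (BLOCK_REPORT.search(line) for line in lines)
    hits = (m for m in matches if m is not None)
    return [(int(m.group(1)), int(m.group(2))) for m in islice(hits, limit)]


def write_gnuplot_data(samples, out):
    """Write one 'blocks msecs' pair per line, which gnuplot can plot directly."""
    for blocks, msecs in samples:
        out.write(f"{blocks} {msecs}\n")
```

Run sample_block_reports over each day's log file in turn, append the results to a single data file with write_gnuplot_data, and point gnuplot at that file (for example, plot "reports.dat" using 1:2, where reports.dat is just a placeholder name).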