On Sun, 2009-02-01 at 17:58 -0800, jason hadoop wrote:
> The Datanodes use multiple threads with locking, and one of the
> assumptions is that the block report (once per hour by default) takes
> little time. The datanode will pause while the block report is running,
> and if it happens to take a while, weird things start to happen.
Thank you for responding; this is very informative for us. Having looked through the source code for the periodic scan with a co-worker and then checked the logs once again, we are finding reports of this sort:

BlockReport of 1158499 blocks got processed in 308860 msecs
BlockReport of 1159840 blocks got processed in 237925 msecs
BlockReport of 1161274 blocks got processed in 177853 msecs
BlockReport of 1162408 blocks got processed in 285094 msecs
BlockReport of 1164194 blocks got processed in 184478 msecs
BlockReport of 1165673 blocks got processed in 226401 msecs

The third of these exactly straddles the particular example timeline I discussed in my original email about this question. I suspect I'll find more of the same as I look through other related errors.

--karl
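To put those durations in perspective, here is a minimal sketch that pulls the block count and processing time out of log lines shaped like the six quoted above and converts them to seconds, minutes and blocks per second. It assumes the exact "BlockReport of N blocks got processed in M msecs" wording; the regex and the helper name parse_block_reports are hypothetical, not part of Hadoop.

```python
import re

# Hypothetical pattern, assuming log lines worded exactly like those quoted above.
PATTERN = re.compile(r"BlockReport of (\d+) blocks got processed in (\d+) msecs")

LOG_TEXT = """\
BlockReport of 1158499 blocks got processed in 308860 msecs
BlockReport of 1159840 blocks got processed in 237925 msecs
BlockReport of 1161274 blocks got processed in 177853 msecs
BlockReport of 1162408 blocks got processed in 285094 msecs
BlockReport of 1164194 blocks got processed in 184478 msecs
BlockReport of 1165673 blocks got processed in 226401 msecs
"""


def parse_block_reports(text):
    """Return a (blocks, msecs) tuple for every matching line in text."""
    return [(int(blocks), int(msecs)) for blocks, msecs in PATTERN.findall(text)]


reports = parse_block_reports(LOG_TEXT)

for blocks, msecs in reports:
    seconds = msecs / 1000
    # Show each report's duration in seconds and minutes, plus its throughput.
    print(f"{blocks:>9} blocks  {seconds:7.1f} s  "
          f"({seconds / 60:.1f} min)  {blocks / seconds:,.0f} blocks/s")
```

On these six lines every report takes roughly three to five minutes (177.9 s to 308.9 s), which is hardly "little time" for something the datanode is said to pause on.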