The /tmp default has caught us once or twice too. Now we put the files elsewhere.

The DFS is stored in /tmp on each box. The developers who own the machines occasionally reboot and reprofile them

Won't you lose your blocks after a reboot, since /tmp gets cleaned up? Could this
be the reason you see data corruption?
A good idea is to configure DFS to use any place other than /tmp.
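
A minimal sketch of what that could look like in conf/hadoop-site.xml, assuming the property names dfs.name.dir, dfs.data.dir and hadoop.tmp.dir used by Hadoop releases of this era. The /var/hadoop paths are placeholders chosen for illustration and do not come from this thread; any local directory that survives reboots and reprofiling should do.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Base for Hadoop's scratch files. The default lives under /tmp,
       which is what makes DFS storage land there. Path is a placeholder. -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/tmp</value>
  </property>

  <!-- Where the namenode keeps the filesystem image. Placeholder path. -->
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hadoop/dfs/name</value>
  </property>

  <!-- Where each datanode keeps its blocks. Placeholder path. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hadoop/dfs/data</value>
  </property>
</configuration>
```

Setting dfs.name.dir and dfs.data.dir explicitly means the block storage no longer follows hadoop.tmp.dir, so a later change to the scratch directory would not move the data.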
----- Original Message ----
From: Jeff Eastman <[EMAIL PROTECTED]>
Sent: Wednesday, January 16, 2008 9:32:41 AM
Subject: Platform reliability with Hadoop

I've been running Hadoop 0.14.4 and, more recently, 0.15.2 on a dozen
machines in our CUBiT array for the last month. During this time I have
experienced two major data corruption losses on relatively small amounts
of data (<50 GB) that make me wonder about the suitability of this
platform for hosting Hadoop. CUBiT is one of our products for managing a
pool of development servers, allowing developers to check out machines,
install various OS profiles on them and monitor their utilization via
the web. With most machines reporting very low utilization it seemed a
natural place to run Hadoop in the background. I have an NFS-mounted
account on all of the machines and have installed Hadoop there. The DFS
is stored in /tmp on each box. The developers who own the machines
occasionally reboot and reprofile them, but this occurs infrequently and
does not clobber /tmp. Hadoop is designed to deal with slave failures of
this nature, though this platform may well be an acid test.

My initial cloud was configured for a replication factor of 3 and I have
increased that now to 4 in hopes of improving data reliability in the
face of these more-prevalent slave outages. Ted Dunning has suggested
aggressive rebalancing in his recent posts and I have done this by
increasing replication to 5 (from 3) and then dropping it to 4. Are
there other rebalancing or configuration techniques that might improve
my data reliability? Or, is this platform just too unstable to be a
fit for Hadoop?
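
For context, the cluster-wide default replication described above is normally set through the dfs.replication property in conf/hadoop-site.xml. The fragment below is a sketch that assumes the value 4 mentioned in the message. As far as I know, this default applies only to files written after the change; existing files keep their old factor until it is changed per path, for example with the dfs shell's -setrep option.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Default number of replicas for newly created files.
       4 is the value quoted in the message above. -->
  <property>
    <name>dfs.replication</name>
    <value>4</value>
  </property>
</configuration>
```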

