The easiest way would be to use only your reliable machines as
datanodes. Alternatively, for better performance, you could run two
DFS systems, one on all machines and one on just the reliable machines,
and back the former up to the latter before you shut down the
"unreliable" nodes each night. Then, in the morning, restore things.
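If your Hadoop version includes distcp, the nightly copy could look
something like the following (the namenode hosts, ports, and paths are
just placeholders for illustration):

  # evening, before shutting down the unreliable nodes:
  bin/hadoop distcp hdfs://all-nodes-nn:9000/data hdfs://reliable-nn:9000/data

  # morning, after the unreliable nodes are back up:
  bin/hadoop distcp hdfs://reliable-nn:9000/data hdfs://all-nodes-nn:9000/data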
Long-term, we hope to add a feature that permits one to remove a number
of nodes from DFS at once, forcing all of the blocks stored on these
nodes to migrate to other nodes. But that feature has not yet been
implemented.
Doug
Mikkel Kamstrup Erlandsen wrote:
I will be running a cluster with 100-200 nodes, most of which will be
shut down at night. For the sake of example, let's say that 4 'reliable
slaves' will remain turned on continuously, and let me call the rest
'unreliable slaves'.
Storage-wise, how would I go about this (using HDFS)? I figure it
would be a bad idea to put persistent data on the unreliable slaves,
since turning ~100 computers off simultaneously might wreak havoc on
HDFS(?). So the idea would be to let persistent data reside only on
the reliable slaves.
Would setting dfs.datanode.du.pct=0 on the unreliable slaves do the
trick?
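To be concrete, I am thinking of something like this in hadoop-site.xml
on each unreliable slave (my guess at the syntax, assuming the property
is set like the other dfs.* properties):

  <property>
    <name>dfs.datanode.du.pct</name>
    <!-- report 0% of the real free space to DFS,
         so no new blocks should be placed on this node -->
    <value>0</value>
  </property>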
Cheers,
Mikkel