Thanks for listing this out Adam.

Data Residency:

> - Should we destroy the sandbox/hdfs-data when shutting down a DN?
> - If starting DN on node that was previously running a DN, can/should we
>   try to revive the existing data?

I think this is one of the key challenges for a production-quality HDFS on Mesos. Currently, since the sandbox is deleted after a task exits, if all the data nodes that hold a block (and its replicas) get lost or killed for whatever reason, that block's data is lost. A short-term solution would be to write outside the sandbox and use slave attributes to track where to relaunch data node tasks.
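
To make the short-term idea a bit more concrete, here is a rough sketch of how the scheduler might choose where to relaunch a data node. It is only an illustration under assumptions: offers are modeled as plain Python dicts rather than real Mesos messages, and the attribute name `hdfs_data_dir`, the function `pick_offer_for_datanode`, and the `known_data_hosts` set are all hypothetical names I made up for this example.

```python
# Hypothetical sketch: offers are plain dicts, not real Mesos protobufs.
# The attribute name "hdfs_data_dir" is an assumption for illustration.

def pick_offer_for_datanode(offers, known_data_hosts, attr_name="hdfs_data_dir"):
    """Return (offer, data_dir), preferring hosts that already hold DN data.

    Returns None when no offer advertises a data directory outside the sandbox.
    """
    fallback = None
    for offer in offers:
        data_dir = offer.get("attributes", {}).get(attr_name)
        if data_dir is None:
            continue  # slave does not advertise a persistent data directory
        if offer["hostname"] in known_data_hosts:
            return offer, data_dir  # relaunch where the blocks already live
        if fallback is None:
            fallback = (offer, data_dir)  # first usable host with no prior data
    return fallback


offers = [
    {"hostname": "node-a", "attributes": {}},
    {"hostname": "node-b", "attributes": {"hdfs_data_dir": "/var/hdfs"}},
    {"hostname": "node-c", "attributes": {"hdfs_data_dir": "/data/hdfs"}},
]
choice = pick_offer_for_datanode(offers, known_data_hosts={"node-c"})
```

With these sample offers, node-c is preferred because it ran a data node before; if no previously used host is on offer, the first slave advertising a data directory is used instead. The scheduler would still have to persist `known_data_hosts` somewhere that survives its own restarts.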
