As Maxime mentioned, the long-term solution is for Mesos to support the notion of "persistent resources", i.e., resources that remain (and are still accounted for) after the life cycle of a task/executor ends. The idea still needs fleshing out.

On Thu, Jun 26, 2014 at 8:23 AM, Vetoshkin Nikita <[email protected]> wrote:

> What about long term solution? Any ideas? Twitter's Manhattan database
> claims to use Mesos for scaling up and down. Can you shed some light how do
> they deal with the situation like this?
>
> On Jun 26, 2014 5:01 AM, "Vinod Kone" <[email protected]> wrote:
>
> > Thanks for listing this out Adam.
> >
> > Data Residency:
> > > - Should we destroy the sandbox/hdfs-data when shutting down a DN?
> > > - If starting DN on node that was previously running a DN, can/should we
> > >   try to revive the existing data?
> >
> > I think this is one of the key challenges for a production quality HDFS on
> > Mesos. Currently, since sandbox is deleted after a task exits, if all the
> > data nodes that hold a block (and its replicas) get lost/killed for
> > whatever reason there would be data loss. A short-term solution would be
> > to write outside sandbox and use slave attributes to track where to
> > re-launch data node tasks.
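
For what it's worth, here is a rough sketch of the short-term idea quoted above (write outside the sandbox, use slave attributes to decide where to re-launch data nodes). It is only an illustration: plain dicts stand in for resource offers, no real Mesos API is called, and the attribute name `hdfs_data_dir` and the function `pick_offers_for_datanodes` are made up for this example.

```python
# Hypothetical sketch: plain dicts stand in for Mesos resource offers.
# The attribute name "hdfs_data_dir" is an assumption for illustration;
# it marks a slave whose DN data lives outside the task sandbox.

DATA_DIR_ATTR = "hdfs_data_dir"


def pick_offers_for_datanodes(offers, wanted):
    """Pick up to `wanted` offers, one per host, preferring tagged slaves.

    Offers from slaves carrying the data-dir attribute come first, so a
    re-launched data node can pick up the data left by its predecessor.
    """
    tagged = [o for o in offers if DATA_DIR_ATTR in o["attributes"]]
    untagged = [o for o in offers if DATA_DIR_ATTR not in o["attributes"]]

    chosen = []
    seen_hosts = set()
    for offer in tagged + untagged:
        if len(chosen) == wanted:
            break
        # Never place two data nodes on the same host.
        if offer["hostname"] in seen_hosts:
            continue
        seen_hosts.add(offer["hostname"])
        chosen.append(offer)
    return chosen


if __name__ == "__main__":
    offers = [
        {"hostname": "a", "attributes": {}},
        {"hostname": "b", "attributes": {DATA_DIR_ATTR: "/var/lib/hdfs"}},
        {"hostname": "c", "attributes": {}},
    ]
    for offer in pick_offers_for_datanodes(offers, wanted=2):
        print(offer["hostname"])
```

A real framework would also have to check that the offer has enough cpu/mem/disk and decide what to do when a tagged slave never comes back; none of that is covered here.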
