[jira] [Commented] (MESOS-1554) Persistent resources support for storage-like services
[ https://issues.apache.org/jira/browse/MESOS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289835#comment-15289835 ]

Adam B commented on MESOS-1554:
-------------------------------

[~mcypark], [~jieyu], what's left before we can say that "Persistent Volumes" has shipped? Can we move the unresolved tasks from this JIRA into a Persistent Volumes v2 Epic, so we can close this one out?

> Persistent resources support for storage-like services
> ------------------------------------------------------
>
>          Key: MESOS-1554
>          URL: https://issues.apache.org/jira/browse/MESOS-1554
>      Project: Mesos
>   Issue Type: Epic
>   Components: general, hadoop
>     Reporter: Nikita Vetoshkin
>     Assignee: Michael Park
>     Priority: Critical
>       Labels: mesosphere, twitter
>
> This question came up in the [dev mailing list|http://mail-archives.apache.org/mod_mbox/mesos-dev/201406.mbox/%3CCAK8jAgNDs9Fe011Sq1jeNr0h%3DE-tDD9rak6hAsap3PqHx1y%3DKQ%40mail.gmail.com%3E].
> It seems reasonable for storage-like services (e.g. HDFS or Cassandra) to use Mesos to manage their instances. But right now, if we'd like to restart an instance (e.g. to spin up a new version), all of the previous instance's sandbox filesystem resources will be recycled by the slave's garbage collector.
> At the moment, filesystem resources can be managed out of band - i.e. instances can save their data in some database-specific place that various instances can share (e.g. {{/var/lib/cassandra}}).
> [~benjaminhindman] suggested an idea in the mailing list (though it still needs some fleshing out):
> {quote}
> The idea originally came about because, even today, if we allocate some file system space to a task/executor, and then that task/executor terminates, we haven't officially "freed" those file system resources until after we garbage collect the task/executor sandbox! (We keep the sandbox around so a user/operator can get the stdout/stderr or anything else left around from their task/executor.)
> To solve this problem we wanted to be able to let a task/executor terminate but not *give up* all of its resources, hence: persistent resources.
> Pushing this concept even further, you could imagine always reallocating resources to a framework that had already been allocated those resources for a previous task/executor. Looked at from another perspective, these are "late-binding", or "lazy", resource reservations.
> At one point in time we had considered just doing 'right-of-first-refusal' for allocations after a task/executor terminates. But this is really insufficient for supporting storage-like frameworks well (and likely even harder to reliably implement than 'persistent resources', IMHO).
> There are a ton of things that need to get worked out in this model, including (but not limited to): how should a file system (or disk) be exposed in order to be made persistent? How should persistent resources be returned to a master? How many persistent resources can a framework get allocated?
> {quote}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
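The "late-binding, or lazy, resource reservations" idea quoted above can be illustrated with a toy allocator: disk reserved as persistent survives task termination and is preferentially re-offered to the framework that holds the reservation, instead of being returned to the free pool by the garbage collector. This is a minimal sketch of the concept only; the `Allocator` class and its method names are hypothetical and not part of the Mesos API.

```python
class Allocator:
    """Toy model of 'late-binding' persistent disk reservations."""

    def __init__(self, total_disk_mb):
        self.free_disk_mb = total_disk_mb
        # Persistent reservations outlive tasks: framework_id -> reserved MB.
        self.persistent = {}

    def offer(self, framework_id, disk_mb):
        """Allocate disk, preferring the framework's existing reservation."""
        reserved = self.persistent.get(framework_id, 0)
        if reserved >= disk_mb:
            # Lazy binding: re-offer disk this framework already reserved,
            # without touching the free pool.
            return disk_mb
        needed = disk_mb - reserved
        if needed > self.free_disk_mb:
            raise RuntimeError("insufficient free disk")
        self.free_disk_mb -= needed
        self.persistent[framework_id] = disk_mb
        return disk_mb

    def task_terminated(self, framework_id):
        """Unlike sandbox GC, persistent disk is NOT reclaimed here."""
        pass  # The reservation is kept until explicitly released.

    def release(self, framework_id):
        """Framework explicitly returns its persistent resources."""
        self.free_disk_mb += self.persistent.pop(framework_id, 0)
```

Under this model, a Cassandra-like framework whose task dies can be re-offered the same disk on restart, which also hints at why the quoted questions (how persistent resources are returned to the master, and how much a framework may hold) need answers: without an explicit `release`, the reservation is held forever.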
[jira] [Commented] (MESOS-1554) Persistent resources support for storage-like services
[ https://issues.apache.org/jira/browse/MESOS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542968#comment-14542968 ]

Adam B commented on MESOS-1554:
-------------------------------

This Epic/feature is critical for stateful frameworks in Mesos 0.23 and beyond. Upgraded Priority to Critical.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-1554) Persistent resources support for storage-like services
[ https://issues.apache.org/jira/browse/MESOS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120041#comment-14120041 ]

Steven Schlansker commented on MESOS-1554:
------------------------------------------

It would be nice to be able to manage e.g. Amazon EBS (or generic SAN) volumes in this way. That would be very powerful indeed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-1554) Persistent resources support for storage-like services
[ https://issues.apache.org/jira/browse/MESOS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065785#comment-14065785 ]

Wolfram Arnold commented on MESOS-1554:
---------------------------------------

I'd love to see this feature for running Elasticsearch instances on Mesos.

--
This message was sent by Atlassian JIRA
(v6.2#6252)