RE: AW: Feature request: move in-flight containers w/o stopping them
If this is of any use to anyone: there is also an outstanding branch of Docker with checkpoint/restore functionality in it (based on CRIU, I believe), which will hopefully be merged into experimental soon.

From: Sharma Podila [spod...@netflix.com]
Sent: 19 February 2016 14:49
To: user@mesos.apache.org
Subject: Re: AW: Feature request: move in-flight containers w/o stopping them

Moving stateless services can be trivial or a non-problem, as others have suggested. Migrating stateful services becomes a function of migrating the state, including any network connections, etc.

To think aloud, from past experience with HPC-like systems: approaches ranged from relying on the underlying system to support migration (vMotion, etc.), to third-party libraries (was that Meiosys?) that could work on existing application binaries, to libraries (BLCR <http://crd.lbl.gov/departments/computer-science/CLaSS/research/BLCR/>) that need support from the application developer. I was involved with providing support for BLCR-based applications. One of the challenges was the time to checkpoint an application with a large memory footprint, say 100 GB or more, which isn't uncommon in HPC. Incremental checkpointing wasn't an option, at least at that point.

Regardless, Mesos' support for checkpoint-restore would have to consider the type of checkpoint-restore being used. I would imagine that the core part of the solution would be simple-ish: providing a "workflow" for the checkpoint-restore system (roughly: send a signal to start the checkpoint, then wait a certain time for it to complete or time out). Relatively less simple would be the actual integration of the checkpoint-restore system and dealing with its constraints and idiosyncrasies.
RE: AW: Feature request: move in-flight containers w/o stopping them
Would you be able to elaborate a bit more on how you did this?

From: Mauricio Garavaglia [mauri...@medallia.com]
Sent: 19 February 2016 19:20
To: user@mesos.apache.org
Subject: Re: AW: Feature request: move in-flight containers w/o stopping them

Mesos is not only about running stateless microservices to handle HTTP requests. There are long-running workloads that would benefit from being rescheduled to a different host without being interrupted, e.g. to implement dynamic bin packing in the cluster. As for the networking issues, CRIU has shown that this is possible even at the socket level. Regarding moving IPs around, Project Calico <https://www.projectcalico.org/> offers a way to do that; we tried it with homemade modifications, using Docker and OSPF, and it works very well.
Re: AW: Feature request: move in-flight containers w/o stopping them
Mesos is not only about running stateless microservices to handle HTTP requests. There are long-running workloads that would benefit from being rescheduled to a different host without being interrupted, e.g. to implement dynamic bin packing in the cluster. As for the networking issues, CRIU has shown that this is possible even at the socket level. Regarding moving IPs around, Project Calico <https://www.projectcalico.org/> offers a way to do that; we tried it with homemade modifications, using Docker and OSPF, and it works very well.

On Fri, Feb 19, 2016 at 11:49 AM, Sharma Podila wrote:
> Moving stateless services can be trivial or a non-problem, as others have suggested. Migrating stateful services becomes a function of migrating the state, including any network connections, etc.
>
> To think aloud, from past experience with HPC-like systems: approaches ranged from relying on the underlying system to support migration (vMotion, etc.), to third-party libraries (was that Meiosys?) that could work on existing application binaries, to libraries (BLCR <http://crd.lbl.gov/departments/computer-science/CLaSS/research/BLCR/>) that need support from the application developer. I was involved with providing support for BLCR-based applications. One of the challenges was the time to checkpoint an application with a large memory footprint, say 100 GB or more, which isn't uncommon in HPC. Incremental checkpointing wasn't an option, at least at that point.
>
> Regardless, Mesos' support for checkpoint-restore would have to consider the type of checkpoint-restore being used. I would imagine that the core part of the solution would be simple-ish: providing a "workflow" for the checkpoint-restore system (roughly: send a signal to start the checkpoint, then wait a certain time for it to complete or time out). Relatively less simple would be the actual integration of the checkpoint-restore system and dealing with its constraints and idiosyncrasies.
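To make the "dynamic bin packing" motivation concrete, here is a minimal sketch, assuming each container has a single integer resource demand (say, CPU cores) and every host has the same capacity. It uses first-fit decreasing, a textbook heuristic chosen for illustration only; it is not what Mesos or any framework actually does, and the function name is hypothetical.

```python
from typing import List


def first_fit_decreasing(demands: List[int], capacity: int) -> List[List[int]]:
    """Pack container demands onto as few equal-sized hosts as the heuristic finds.

    Illustrative only: a real scheduler would weigh several resources
    and the cost of each migration.
    """
    hosts: List[List[int]] = []
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds host capacity {capacity}")
        for host in hosts:
            # Place the container on the first host with enough room left.
            if sum(host) + demand <= capacity:
                host.append(demand)
                break
        else:
            # No existing host fits, so open a new one.
            hosts.append([demand])
    return hosts


if __name__ == "__main__":
    # Four containers currently on four separate 8-core hosts.
    print(first_fit_decreasing([5, 3, 4, 4], capacity=8))
```

In this toy case the four containers fit on two hosts, so live migration would let the cluster free two hosts without interrupting the workloads.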
Re: AW: Feature request: move in-flight containers w/o stopping them
Moving stateless services can be trivial or a non-problem, as others have suggested. Migrating stateful services becomes a function of migrating the state, including any network connections, etc.

To think aloud, from past experience with HPC-like systems: approaches ranged from relying on the underlying system to support migration (vMotion, etc.), to third-party libraries (was that Meiosys?) that could work on existing application binaries, to libraries (BLCR <http://crd.lbl.gov/departments/computer-science/CLaSS/research/BLCR/>) that need support from the application developer. I was involved with providing support for BLCR-based applications. One of the challenges was the time to checkpoint an application with a large memory footprint, say 100 GB or more, which isn't uncommon in HPC. Incremental checkpointing wasn't an option, at least at that point.

Regardless, Mesos' support for checkpoint-restore would have to consider the type of checkpoint-restore being used. I would imagine that the core part of the solution would be simple-ish: providing a "workflow" for the checkpoint-restore system (roughly: send a signal to start the checkpoint, then wait a certain time for it to complete or time out). Relatively less simple would be the actual integration of the checkpoint-restore system and dealing with its constraints and idiosyncrasies.

On Fri, Feb 19, 2016 at 4:50 AM, Dick Davies wrote:
> Agreed, vMotion always struck me as something for those monolithic apps with a lot of local state.
>
> The industry seems to be moving away from that as fast as its little legs will carry it.
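The "workflow" described above (send a signal to start the checkpoint, then wait for completion or a timeout) could be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the choice of SIGUSR1, the polling callback, and the write rate used to size the timeout. It is not the BLCR, CRIU, or Mesos API.

```python
import os
import signal
import time
from typing import Callable


def estimate_checkpoint_seconds(footprint_gb: float, write_gb_per_s: float) -> float:
    """Rough lower bound on checkpoint time: memory footprint / write rate."""
    if write_gb_per_s <= 0:
        raise ValueError("write rate must be positive")
    return footprint_gb / write_gb_per_s


def request_checkpoint(
    pid: int,
    is_done: Callable[[], bool],
    timeout_s: float,
    poll_s: float = 0.1,
    sig: int = signal.SIGUSR1,                    # hypothetical trigger signal
    send: Callable[[int, int], None] = os.kill,   # injectable for testing
) -> bool:
    """Signal the process to checkpoint, then poll until done or timed out."""
    send(pid, sig)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_done():
            return True
        time.sleep(poll_s)
    # One last check so a checkpoint finishing right at the deadline counts.
    return is_done()


if __name__ == "__main__":
    # A 100 GB footprint at an assumed 0.5 GB/s takes at least 200 s to write.
    print(estimate_checkpoint_seconds(100, 0.5))
```

The estimate illustrates why a fixed, short timeout does not suit the large-memory HPC case: the wait has to scale with the footprint and with whatever storage or network the checkpoint is written to.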
Re: AW: Feature request: move in-flight containers w/o stopping them
Agreed, vMotion always struck me as something for those monolithic apps with a lot of local state.

The industry seems to be moving away from that as fast as its little legs will carry it.

On 19 February 2016 at 11:35, Jason Giedymin wrote:
> Food for thought:
>
> One should refrain from monolithic apps. If they're small and stateless you should be doing rolling upgrades.
>
> If you find yourself with one container and you can't easily distribute that workload by just scaling and load balancing, then you have a monolith. Time to enhance it.
>
> Containers should not be treated like VMs.
>
> -Jason
Re: AW: Feature request: move in-flight containers w/o stopping them
Food for thought:

One should refrain from monolithic apps. If they're small and stateless you should be doing rolling upgrades.

If you find yourself with one container and you can't easily distribute that workload by just scaling and load balancing, then you have a monolith. Time to enhance it.

Containers should not be treated like VMs.

-Jason

> On Feb 19, 2016, at 6:05 AM, Mike Michel wrote:
>
> The question is whether you really need this when you are moving in the world of containers/microservices, where it is about building stateless 12-factor apps (except for databases). Why move a service when you can just kill it and let the work be done by 10 other containers doing the same? I remember a talk at DockerCon about containers and live migration. It was like: "And now that you know how to do it, don't do it!"
AW: Feature request: move in-flight containers w/o stopping them
The question is whether you really need this when you are moving in the world of containers/microservices, where it is about building stateless 12-factor apps (except for databases). Why move a service when you can just kill it and let the work be done by 10 other containers doing the same? I remember a talk at DockerCon about containers and live migration. It was like: "And now that you know how to do it, don't do it!"

From: Avinash Sridharan [mailto:avin...@mesosphere.io]
Sent: Friday, 19 February 2016 05:48
To: user@mesos.apache.org
Subject: Re: Feature request: move in-flight containers w/o stopping them

One problem with implementing something like vMotion for Mesos is that it must also address seamless movement of network connectivity. This effectively requires moving the IP address of the container across hosts. If the container shares the host network stack, this won't be possible, since it would imply moving the host IP address from one host to another. When a container has its own network namespace, attached to the host using a bridge, moving across L2 segments might be a possibility. To move across L3 segments you will need some form of overlay (VXLAN, maybe?).
Re: Feature request: move in-flight containers w/o stopping them
One problem with implementing something like vMotion for Mesos is that it must also address seamless movement of network connectivity. This effectively requires moving the IP address of the container across hosts. If the container shares the host network stack, this won't be possible, since it would imply moving the host IP address from one host to another. When a container has its own network namespace, attached to the host using a bridge, moving across L2 segments might be a possibility. To move across L3 segments you will need some form of overlay (VXLAN, maybe?).

On Thu, Feb 18, 2016 at 7:34 PM, Jay Taylor wrote:
> Is this theoretically feasible with Linux checkpoint and restore, perhaps via CRIU? http://criu.org/Main_Page

--
Avinash Sridharan, Mesosphere
+1 (323) 702 5245
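The three networking cases above can be written down as a small decision function. This is only a sketch of one reading of the message: "moving across L2 segments" is taken here to mean source and destination hosts sharing an L2 segment, and all names are hypothetical rather than part of any Mesos or Docker API.

```python
from enum import Enum


class NetworkMode(Enum):
    HOST = "host"      # container shares the host network stack
    BRIDGE = "bridge"  # container has its own namespace behind a bridge


def ip_move_plausible(mode: NetworkMode, same_l2_segment: bool,
                      has_overlay: bool = False) -> bool:
    """Return True if the container's IP could plausibly follow it to a new host."""
    if mode is NetworkMode.HOST:
        # Moving the container would mean moving the host's own IP address.
        return False
    if same_l2_segment:
        # Own namespace on a bridge: the address can stay valid within one L2 segment.
        return True
    # Across L3 segments some form of overlay (e.g. VXLAN) is needed.
    return has_overlay


if __name__ == "__main__":
    print(ip_move_plausible(NetworkMode.BRIDGE, same_l2_segment=False, has_overlay=True))
```

A True result only says that keeping the address is not ruled out; preserving established connections is a separate problem.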
Re: Feature request: move in-flight containers w/o stopping them
Is this theoretically feasible with Linux checkpoint and restore, perhaps via CRIU? http://criu.org/Main_Page

> On Feb 18, 2016, at 4:35 AM, Paul Bell wrote:
>
> Hello All,
>
> Has there ever been any consideration of the ability to move in-flight containers from one Mesos host node to another?
>
> I see this as analogous to VMware's "vMotion" facility wherein VMs can be moved from one ESXi host to another.
>
> I suppose something like this could be useful from a load-balancing perspective.
>
> Just curious if it's ever been considered and if so - and rejected - why rejected?
>
> Thanks.
>
> -Paul
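For readers unfamiliar with CRIU, a checkpoint and restore at the command-line level looks roughly like the sketch below. The Python helpers only build the argument lists for the `criu dump` and `criu restore` commands and do not run them. The helper names, the PID, and the image directory are hypothetical, and this says nothing about how Mesos or Docker would integrate CRIU.

```python
from typing import List


def criu_dump_cmd(pid: int, images_dir: str, tcp_established: bool = False,
                  shell_job: bool = False) -> List[str]:
    """Build a `criu dump` command line for the process tree rooted at pid."""
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if tcp_established:
        cmd.append("--tcp-established")  # also checkpoint established TCP connections
    if shell_job:
        cmd.append("--shell-job")        # needed for processes started from a shell
    return cmd


def criu_restore_cmd(images_dir: str, tcp_established: bool = False,
                     shell_job: bool = False) -> List[str]:
    """Build the matching `criu restore` command line."""
    cmd = ["criu", "restore", "--images-dir", images_dir]
    if tcp_established:
        cmd.append("--tcp-established")
    if shell_job:
        cmd.append("--shell-job")
    return cmd


if __name__ == "__main__":
    print(" ".join(criu_dump_cmd(4242, "/tmp/ckpt", tcp_established=True)))
    print(" ".join(criu_restore_cmd("/tmp/ckpt", tcp_established=True)))
```

Moving a process between hosts would additionally require copying the image directory to the destination and making the same filesystem and IP address available there.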
Feature request: move in-flight containers w/o stopping them
Hello All,

Has there ever been any consideration of the ability to move in-flight containers from one Mesos host node to another?

I see this as analogous to VMware's "vMotion" facility wherein VMs can be moved from one ESXi host to another.

I suppose something like this could be useful from a load-balancing perspective.

Just curious if it's ever been considered and if so - and rejected - why rejected?

Thanks.

-Paul