Re: Elvis upstreaming plan
On Thu, Nov 28, 2013 at 09:31:50AM +0200, Abel Gordon wrote: > > > Stefan Hajnoczi wrote on 27/11/2013 05:00:53 PM: > > > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: > > > Hi, > > > > > > Razya is out for a few days, so I will try to answer the questions as > well > > > as I can: > > > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > > > > > From: "Michael S. Tsirkin" > > > > To: Abel Gordon/Haifa/IBM@IBMIL, > > > > Cc: Anthony Liguori , abel.gor...@gmail.com, > > > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ > > > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ > > > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ > > > > Haifa/IBM@IBMIL > > > > Date: 27/11/2013 01:08 AM > > > > Subject: Re: Elvis upstreaming plan > > > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 > 08:05:00 > > > PM: > > > > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > > > > > > > > > That's why we are proposing to implement a mechanism that will > enable > > > > > the management stack to configure 1 thread per I/O device (as it is > > > today) > > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > > > Once you are scheduling multiple guests in a single vhost device, > you > > > > > > now create a whole new class of DoS attacks in the best case > > > scenario. > > > > > > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > > > vhost thread. We are proposing to schedule multiple devices > belonging > > > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > > > > > > > I guess a question then becomes why have multiple devices? > > > > > > If you mean "why serve multiple devices from a single thread" the > answer is > > > that we cannot rely on the Linux scheduler which has no knowledge of > I/O > > > queues to do a decent job of scheduling I/O. The idea is to take over > the > > > I/O scheduling responsibilities from the kernel's thread scheduler with > a > > > more efficient I/O scheduler inside each vhost thread. So by combining > all > > > of the I/O devices from the same guest (disks, network cards, etc) in a > > > single I/O thread, it allows us to provide better scheduling by giving > us > > > more knowledge of the nature of the work. So now instead of relying on > the > > > linux scheduler to perform context switches between multiple vhost > threads, > > > we have a single thread context in which we can do the I/O scheduling > more > > > efficiently. We can closely monitor the performance needs of each > queue of > > > each device inside the vhost thread which gives us much more > information > > > than relying on the kernel's thread scheduler. > > > > And now there are 2 performance-critical pieces that need to be > > optimized/tuned instead of just 1: > > > > 1. Kernel infrastructure that QEMU and vhost use today but you decided > > to bypass. > > We are NOT bypassing existing components. We are just changing the > threading > model: instead of having one vhost-thread per virtio device, we propose to > use > 1 vhost thread to server devices belonging to the same VM. In addition, we > propose to add new features such as polling. What I meant with "bypassing" is that reducing scope to single VMs leaves multi-VM performance unchanged. 
I know the original aim was to improve multi-VM performance too and I hope that will be possible by extending the current approach. Stefan
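To make the threading-model change above easier to picture, here is a minimal, self-contained userspace sketch: several virtio devices belonging to the same VM feed one shared work queue that a single worker thread drains, instead of one kernel thread per device. The names (vm_worker, work_item, queue_work, handle_dev) are invented for this sketch; it is not the drivers/vhost code, it only models the proposed per-VM sharing.

/*
 * Minimal userspace model of the proposed threading change: one worker
 * thread per VM drains work queued by all of that VM's virtio devices.
 * Names (vm_worker, work_item, queue_work) are invented for this sketch
 * and are not the drivers/vhost data structures.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct work_item {
    int device_id;                 /* which virtio device queued this */
    void (*fn)(int device_id);     /* e.g. a tx/rx or request handler */
    struct work_item *next;
};

struct vm_worker {                 /* one instance per VM, not per device */
    pthread_mutex_t lock;
    pthread_cond_t kick;
    struct work_item *head, *tail;
    int stop;
};

static void *worker_fn(void *arg)
{
    struct vm_worker *w = arg;

    pthread_mutex_lock(&w->lock);
    while (!w->stop) {
        while (w->head) {
            struct work_item *it = w->head;

            w->head = it->next;
            if (!w->head)
                w->tail = NULL;
            pthread_mutex_unlock(&w->lock);
            it->fn(it->device_id);       /* serve this device's queue */
            free(it);
            pthread_mutex_lock(&w->lock);
        }
        pthread_cond_wait(&w->kick, &w->lock);   /* sleep until a kick */
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Called on a device kick: enqueue work for the VM's shared worker. */
static void queue_work(struct vm_worker *w, int device_id, void (*fn)(int))
{
    struct work_item *it = malloc(sizeof(*it));

    it->device_id = device_id;
    it->fn = fn;
    it->next = NULL;
    pthread_mutex_lock(&w->lock);
    if (w->tail)
        w->tail->next = it;
    else
        w->head = it;
    w->tail = it;
    pthread_cond_signal(&w->kick);
    pthread_mutex_unlock(&w->lock);
}

static void handle_dev(int id)
{
    printf("served a request for device %d\n", id);
}

int main(void)
{
    struct vm_worker w = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                           NULL, NULL, 0 };
    pthread_t tid;

    pthread_create(&tid, NULL, worker_fn, &w);
    queue_work(&w, 0, handle_dev);   /* e.g. virtio-net tx queue */
    queue_work(&w, 1, handle_dev);   /* e.g. virtio-blk request queue */
    sleep(1);

    pthread_mutex_lock(&w.lock);
    w.stop = 1;
    pthread_cond_signal(&w.kick);
    pthread_mutex_unlock(&w.lock);
    pthread_join(tid, NULL);
    return 0;
}

In the real proposal the same role is played by the vhost worker and its work list; the point here is only that a device kick enqueues work and one per-VM thread serves all of that VM's devices.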
Re: Elvis upstreaming plan
On Thu, Nov 28, 2013 at 09:31:50AM +0200, Abel Gordon wrote: > Isolation is important but the question is what isolation means ?
Mostly two things:
- Count resource usage against the correct cgroups, and limit it as appropriate
- If one user does something silly and is blocked, another user isn't affected
-- MST
Re: Elvis upstreaming plan
Anthony Liguori wrote on 28/11/2013 12:33:36 AM: > From: Anthony Liguori > To: Abel Gordon/Haifa/IBM@IBMIL, "Michael S. Tsirkin" , > Cc: abel.gor...@gmail.com, as...@redhat.com, digitale...@google.com, > Eran Raichstein/Haifa/IBM@IBMIL, g...@redhat.com, > jasow...@redhat.com, Joel Nider/Haifa/IBM@IBMIL, > kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/Haifa/IBM@IBMIL > Date: 28/11/2013 12:33 AM > Subject: Re: Elvis upstreaming plan > > Abel Gordon writes: > > > "Michael S. Tsirkin" wrote on 27/11/2013 12:27:19 PM: > > > >> > >> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: > >> > Hi, > >> > > >> > Razya is out for a few days, so I will try to answer the questions as > > well > >> > as I can: > >> > > >> > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > >> > > >> > > From: "Michael S. Tsirkin" > >> > > To: Abel Gordon/Haifa/IBM@IBMIL, > >> > > Cc: Anthony Liguori , abel.gor...@gmail.com, > >> > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ > >> > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ > >> > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ > >> > > Haifa/IBM@IBMIL > >> > > Date: 27/11/2013 01:08 AM > >> > > Subject: Re: Elvis upstreaming plan > >> > > > >> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > >> > > > > >> > > > > >> > > > Anthony Liguori wrote on 26/11/2013 > > 08:05:00 > >> > PM: > >> > > > > >> > > > > > >> > > > > Razya Ladelsky writes: > >> > > > > > >> > > >> > > > > >> > > > That's why we are proposing to implement a mechanism that will > > enable > >> > > > the management stack to configure 1 thread per I/O device (as it is > >> > today) > >> > > > or 1 thread for many I/O devices (belonging to the same VM). > >> > > > > >> > > > > Once you are scheduling multiple guests in a single vhost device, > > you > >> > > > > now create a whole new class of DoS attacks in the best case > >> > scenario. > >> > > > > >> > > > Again, we are NOT proposing to schedule multiple guests in a single > >> > > > vhost thread. We are proposing to schedule multiple devices > > belonging > >> > > > to the same guest in a single (or multiple) vhost thread/s. > >> > > > > >> > > > >> > > I guess a question then becomes why have multiple devices? > >> > > >> > If you mean "why serve multiple devices from a single thread" the > > answer is > >> > that we cannot rely on the Linux scheduler which has no knowledge of > > I/O > >> > queues to do a decent job of scheduling I/O. The idea is to take over > > the > >> > I/O scheduling responsibilities from the kernel's thread scheduler with > > a > >> > more efficient I/O scheduler inside each vhost thread. So by combining > > all > >> > of the I/O devices from the same guest (disks, network cards, etc) in a > >> > single I/O thread, it allows us to provide better scheduling by giving > > us > >> > more knowledge of the nature of the work. So now instead of relying on > > the > >> > linux scheduler to perform context switches between multiple vhost > > threads, > >> > we have a single thread context in which we can do the I/O scheduling > > more > >> > efficiently. We can closely monitor the performance needs of each > > queue of > >> > each device inside the vhost thread which gives us much more > > information > >> > than relying on the kernel's thread scheduler. 
> >> > This does not expose any additional opportunities for attacks (DoS or > >> > other) than are already available since all of the I/O traffic belongs > > to a > >> > single guest. > >> > You can make the argument that with low I/O loads this mechanism may > > not > >> > make much difference. However when you try to maximize the utilization > > of > >> > your hardware (such as in a commercial scenario) this technique can >
Re: Elvis upstreaming plan
Stefan Hajnoczi wrote on 27/11/2013 05:00:53 PM: > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: > > Hi, > > > > Razya is out for a few days, so I will try to answer the questions as well > > as I can: > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > > > From: "Michael S. Tsirkin" > > > To: Abel Gordon/Haifa/IBM@IBMIL, > > > Cc: Anthony Liguori , abel.gor...@gmail.com, > > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ > > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ > > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ > > > Haifa/IBM@IBMIL > > > Date: 27/11/2013 01:08 AM > > > Subject: Re: Elvis upstreaming plan > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 08:05:00 > > PM: > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > > > > > That's why we are proposing to implement a mechanism that will enable > > > > the management stack to configure 1 thread per I/O device (as it is > > today) > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > Once you are scheduling multiple guests in a single vhost device, you > > > > > now create a whole new class of DoS attacks in the best case > > scenario. > > > > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > > vhost thread. We are proposing to schedule multiple devices belonging > > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > > > > I guess a question then becomes why have multiple devices? > > > > If you mean "why serve multiple devices from a single thread" the answer is > > that we cannot rely on the Linux scheduler which has no knowledge of I/O > > queues to do a decent job of scheduling I/O. The idea is to take over the > > I/O scheduling responsibilities from the kernel's thread scheduler with a > > more efficient I/O scheduler inside each vhost thread. So by combining all > > of the I/O devices from the same guest (disks, network cards, etc) in a > > single I/O thread, it allows us to provide better scheduling by giving us > > more knowledge of the nature of the work. So now instead of relying on the > > linux scheduler to perform context switches between multiple vhost threads, > > we have a single thread context in which we can do the I/O scheduling more > > efficiently. We can closely monitor the performance needs of each queue of > > each device inside the vhost thread which gives us much more information > > than relying on the kernel's thread scheduler. > > And now there are 2 performance-critical pieces that need to be > optimized/tuned instead of just 1: > > 1. Kernel infrastructure that QEMU and vhost use today but you decided > to bypass. We are NOT bypassing existing components. We are just changing the threading model: instead of having one vhost-thread per virtio device, we propose to use 1 vhost thread to server devices belonging to the same VM. In addition, we propose to add new features such as polling. > 2. The new ELVIS code which only affects vhost devices in the same VM. Also existent vhost code (or any other user-space back-end) should be optimized/tuned if you care about performance. > > If you split the code paths it results in more effort in the long run > and the benefit seems quite limited once you acknowledge that isolation > is important. Isolation is important but the question is what isolation means ? 
I personally don't believe that 2 kernel threads provide more isolation than 1 kernel thread that changes the mm (use_mm) and avoids queue starvation. Anyway, we propose to start with the simple approach (not sharing threads across VMs) but once we show the value for this case we can discuss if it makes sense to extend the approach and share threads between different VMs. > Isn't the sane thing to do taking lessons from ELVIS improving existing > pieces instead of bypassing them? That way both the single VM and > host-wide performance improves. And as a bonus non-virtualization use > cases may also benefit. The model we are proposing is specific to I/O virtualization... not sure if it is applicable to bare-metal. > > Stefan
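A small sketch of the queue-starvation point: a single shared worker can bound how much it serves each device per round, so one busy device cannot monopolize the thread. The device_queue structure, serve_round and the budget value are assumptions made for illustration and are not taken from the ELVIS patches.

/*
 * Sketch only: how a single worker can avoid starving one device's queues.
 * The per-round budget and the device array are illustrative, not vhost code.
 */
#define MAX_DEVICES   8
#define ROUND_BUDGET  64   /* max requests served per device per round */

struct device_queue {
    int pending;                                /* requests waiting in the ring */
    int (*handle_one)(struct device_queue *q);  /* serve one request */
};

/* One pass over all devices of the VM; returns how much work was done. */
int serve_round(struct device_queue *devs, int ndevs)
{
    int done = 0;

    for (int i = 0; i < ndevs; i++) {
        int budget = ROUND_BUDGET;

        /* Serve at most ROUND_BUDGET requests, then move on, so a busy
         * device cannot monopolize the shared thread. */
        while (devs[i].pending > 0 && budget-- > 0) {
            devs[i].handle_one(&devs[i]);
            devs[i].pending--;
            done++;
        }
    }
    return done;
}

Within a single VM all devices share one address space, so the use_mm switch mentioned above would not even need to change per device.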
Re: Elvis upstreaming plan
Stefan Hajnoczi wrote on 27/11/2013 05:00:53 PM: > From: Stefan Hajnoczi > To: Joel Nider/Haifa/IBM@IBMIL, > Cc: "Michael S. Tsirkin" , Abel Gordon/Haifa/ > IBM@IBMIL, abel.gor...@gmail.com, Anthony Liguori > , as...@redhat.com, digitale...@google.com, > Eran Raichstein/Haifa/IBM@IBMIL, g...@redhat.com, > jasow...@redhat.com, kvm@vger.kernel.org, pbonz...@redhat.com, Razya > Ladelsky/Haifa/IBM@IBMIL > Date: 27/11/2013 05:00 PM > Subject: Re: Elvis upstreaming plan > > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: > > Hi, > > > > Razya is out for a few days, so I will try to answer the questions as well > > as I can: > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > > > From: "Michael S. Tsirkin" > > > To: Abel Gordon/Haifa/IBM@IBMIL, > > > Cc: Anthony Liguori , abel.gor...@gmail.com, > > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ > > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ > > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ > > > Haifa/IBM@IBMIL > > > Date: 27/11/2013 01:08 AM > > > Subject: Re: Elvis upstreaming plan > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 08:05:00 > > PM: > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > > > > > That's why we are proposing to implement a mechanism that will enable > > > > the management stack to configure 1 thread per I/O device (as it is > > today) > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > Once you are scheduling multiple guests in a single vhost device, you > > > > > now create a whole new class of DoS attacks in the best case > > scenario. > > > > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > > vhost thread. We are proposing to schedule multiple devices belonging > > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > > > > I guess a question then becomes why have multiple devices? > > > > If you mean "why serve multiple devices from a single thread" the answer is > > that we cannot rely on the Linux scheduler which has no knowledge of I/O > > queues to do a decent job of scheduling I/O. The idea is to take over the > > I/O scheduling responsibilities from the kernel's thread scheduler with a > > more efficient I/O scheduler inside each vhost thread. So by combining all > > of the I/O devices from the same guest (disks, network cards, etc) in a > > single I/O thread, it allows us to provide better scheduling by giving us > > more knowledge of the nature of the work. So now instead of relying on the > > linux scheduler to perform context switches between multiple vhost threads, > > we have a single thread context in which we can do the I/O scheduling more > > efficiently. We can closely monitor the performance needs of each queue of > > each device inside the vhost thread which gives us much more information > > than relying on the kernel's thread scheduler. > > And now there are 2 performance-critical pieces that need to be > optimized/tuned instead of just 1: > > 1. Kernel infrastructure that QEMU and vhost use today but you decided > to bypass. > 2. The new ELVIS code which only affects vhost devices in the same VM. > > If you split the code paths it results in more effort in the long run > and the benefit seems quite limited once you acknowledge that isolation > is important. 
Yes, you are correct that there are now 2 performance-critical pieces of code. However, what we are proposing is just proper module decoupling. I believe you will be hard pressed to make a good case that all of this logic could be integrated into the Linux thread scheduler more efficiently. Think of this as an I/O scheduler for virtualized guests. I don't believe anyone would try to integrate the Linux I/O schedulers with the Linux thread scheduler, even though they are both performance-critical modules. Even if we were to take the route of using these principles to improve the existing scheduler, I have to ask: which scheduler? If we spend this effort on CFS (the completely fair scheduler) but then someone switches their thread scheduler to O(1) or some other scheduler, all of our advantage would be lost. We would then have to
Re: Elvis upstreaming plan
Abel Gordon writes: > "Michael S. Tsirkin" wrote on 27/11/2013 12:27:19 PM: > >> >> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: >> > Hi, >> > >> > Razya is out for a few days, so I will try to answer the questions as > well >> > as I can: >> > >> > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: >> > >> > > From: "Michael S. Tsirkin" >> > > To: Abel Gordon/Haifa/IBM@IBMIL, >> > > Cc: Anthony Liguori , abel.gor...@gmail.com, >> > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ >> > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ >> > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ >> > > Haifa/IBM@IBMIL >> > > Date: 27/11/2013 01:08 AM >> > > Subject: Re: Elvis upstreaming plan >> > > >> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: >> > > > >> > > > >> > > > Anthony Liguori wrote on 26/11/2013 > 08:05:00 >> > PM: >> > > > >> > > > > >> > > > > Razya Ladelsky writes: >> > > > > >> > >> > > > >> > > > That's why we are proposing to implement a mechanism that will > enable >> > > > the management stack to configure 1 thread per I/O device (as it is >> > today) >> > > > or 1 thread for many I/O devices (belonging to the same VM). >> > > > >> > > > > Once you are scheduling multiple guests in a single vhost device, > you >> > > > > now create a whole new class of DoS attacks in the best case >> > scenario. >> > > > >> > > > Again, we are NOT proposing to schedule multiple guests in a single >> > > > vhost thread. We are proposing to schedule multiple devices > belonging >> > > > to the same guest in a single (or multiple) vhost thread/s. >> > > > >> > > >> > > I guess a question then becomes why have multiple devices? >> > >> > If you mean "why serve multiple devices from a single thread" the > answer is >> > that we cannot rely on the Linux scheduler which has no knowledge of > I/O >> > queues to do a decent job of scheduling I/O. The idea is to take over > the >> > I/O scheduling responsibilities from the kernel's thread scheduler with > a >> > more efficient I/O scheduler inside each vhost thread. So by combining > all >> > of the I/O devices from the same guest (disks, network cards, etc) in a >> > single I/O thread, it allows us to provide better scheduling by giving > us >> > more knowledge of the nature of the work. So now instead of relying on > the >> > linux scheduler to perform context switches between multiple vhost > threads, >> > we have a single thread context in which we can do the I/O scheduling > more >> > efficiently. We can closely monitor the performance needs of each > queue of >> > each device inside the vhost thread which gives us much more > information >> > than relying on the kernel's thread scheduler. >> > This does not expose any additional opportunities for attacks (DoS or >> > other) than are already available since all of the I/O traffic belongs > to a >> > single guest. >> > You can make the argument that with low I/O loads this mechanism may > not >> > make much difference. However when you try to maximize the utilization > of >> > your hardware (such as in a commercial scenario) this technique can > gain >> > you a large benefit. >> > >> > Regards, >> > >> > Joel Nider >> > Virtualization Research >> > IBM Research and Development >> > Haifa Research Lab >> >> So all this would sound more convincing if we had sharing between VMs. >> When it's only a single VM it's somehow less convincing, isn't it? 
>> Of course if we would bypass a scheduler like this it becomes harder to >> enforce cgroup limits. > > True, but here the issue becomes isolation/cgroups. We can start to show > the value for VMs that have multiple devices / queues and then we could > re-consider extending the mechanism for multiple VMs (at least as a > experimental feature). > >> But it might be easier to give scheduler the info it needs to do what we >> need. Would an API that basically says "r
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 04:00:53PM +0100, Stefan Hajnoczi wrote: > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: > > Hi, > > > > Razya is out for a few days, so I will try to answer the questions as well > > as I can: > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > > > From: "Michael S. Tsirkin" > > > To: Abel Gordon/Haifa/IBM@IBMIL, > > > Cc: Anthony Liguori , abel.gor...@gmail.com, > > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ > > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ > > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ > > > Haifa/IBM@IBMIL > > > Date: 27/11/2013 01:08 AM > > > Subject: Re: Elvis upstreaming plan > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 08:05:00 > > PM: > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > > > > > That's why we are proposing to implement a mechanism that will enable > > > > the management stack to configure 1 thread per I/O device (as it is > > today) > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > Once you are scheduling multiple guests in a single vhost device, you > > > > > now create a whole new class of DoS attacks in the best case > > scenario. > > > > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > > vhost thread. We are proposing to schedule multiple devices belonging > > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > > > > I guess a question then becomes why have multiple devices? > > > > If you mean "why serve multiple devices from a single thread" the answer is > > that we cannot rely on the Linux scheduler which has no knowledge of I/O > > queues to do a decent job of scheduling I/O. The idea is to take over the > > I/O scheduling responsibilities from the kernel's thread scheduler with a > > more efficient I/O scheduler inside each vhost thread. So by combining all > > of the I/O devices from the same guest (disks, network cards, etc) in a > > single I/O thread, it allows us to provide better scheduling by giving us > > more knowledge of the nature of the work. So now instead of relying on the > > linux scheduler to perform context switches between multiple vhost threads, > > we have a single thread context in which we can do the I/O scheduling more > > efficiently. We can closely monitor the performance needs of each queue of > > each device inside the vhost thread which gives us much more information > > than relying on the kernel's thread scheduler. > > And now there are 2 performance-critical pieces that need to be > optimized/tuned instead of just 1: > > 1. Kernel infrastructure that QEMU and vhost use today but you decided > to bypass. > 2. The new ELVIS code which only affects vhost devices in the same VM. > > If you split the code paths it results in more effort in the long run > and the benefit seems quite limited once you acknowledge that isolation > is important. > > Isn't the sane thing to do taking lessons from ELVIS improving existing > pieces instead of bypassing them? That way both the single VM and > host-wide performance improves. And as a bonus non-virtualization use > cases may also benefit. > > Stefan I'm not sure about that. elvis is all about specific behaviour patterns that are virtualization specific, and general claims that we can improve scheduler for all workloads seem somewhat optimistic. 
-- MST
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: > Hi, > > Razya is out for a few days, so I will try to answer the questions as well > as I can: > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > From: "Michael S. Tsirkin" > > To: Abel Gordon/Haifa/IBM@IBMIL, > > Cc: Anthony Liguori , abel.gor...@gmail.com, > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ > > Haifa/IBM@IBMIL > > Date: 27/11/2013 01:08 AM > > Subject: Re: Elvis upstreaming plan > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > Anthony Liguori wrote on 26/11/2013 08:05:00 > PM: > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > That's why we are proposing to implement a mechanism that will enable > > > the management stack to configure 1 thread per I/O device (as it is > today) > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > Once you are scheduling multiple guests in a single vhost device, you > > > > now create a whole new class of DoS attacks in the best case > scenario. > > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > vhost thread. We are proposing to schedule multiple devices belonging > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > I guess a question then becomes why have multiple devices? > > If you mean "why serve multiple devices from a single thread" the answer is > that we cannot rely on the Linux scheduler which has no knowledge of I/O > queues to do a decent job of scheduling I/O. The idea is to take over the > I/O scheduling responsibilities from the kernel's thread scheduler with a > more efficient I/O scheduler inside each vhost thread. So by combining all > of the I/O devices from the same guest (disks, network cards, etc) in a > single I/O thread, it allows us to provide better scheduling by giving us > more knowledge of the nature of the work. So now instead of relying on the > linux scheduler to perform context switches between multiple vhost threads, > we have a single thread context in which we can do the I/O scheduling more > efficiently. We can closely monitor the performance needs of each queue of > each device inside the vhost thread which gives us much more information > than relying on the kernel's thread scheduler. And now there are 2 performance-critical pieces that need to be optimized/tuned instead of just 1: 1. Kernel infrastructure that QEMU and vhost use today but you decided to bypass. 2. The new ELVIS code which only affects vhost devices in the same VM. If you split the code paths it results in more effort in the long run and the benefit seems quite limited once you acknowledge that isolation is important. Isn't the sane thing to do taking lessons from ELVIS improving existing pieces instead of bypassing them? That way both the single VM and host-wide performance improves. And as a bonus non-virtualization use cases may also benefit. Stefan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
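As a rough illustration of the in-thread I/O scheduling argument above: a worker that owns all of a VM's queues can rank them itself, using per-queue pressure it observes directly, instead of leaving the ordering to context switches between separate vhost threads. The vq_state fields and the scoring rule below are assumptions made for the sketch, not the actual ELVIS heuristic.

/*
 * Sketch of in-thread "I/O scheduling": the shared worker tracks per-queue
 * pressure and picks the next queue itself. Field names and the scoring
 * heuristic are made up for illustration.
 */
struct vq_state {
    unsigned pending;      /* descriptors waiting in this virtqueue */
    unsigned served;       /* requests handled since the last reset */
    unsigned avg_cost;     /* rough cycles per request for this queue */
};

/* Choose the queue with the largest backlog, weighted by how cheap it is
 * to serve; return -1 when every queue is empty and the worker may sleep
 * or start polling. */
int pick_next_queue(struct vq_state *vqs, int nvqs)
{
    int best = -1;
    unsigned long best_score = 0;

    for (int i = 0; i < nvqs; i++) {
        unsigned long score;

        if (vqs[i].pending == 0)
            continue;
        score = (unsigned long)vqs[i].pending * 1024 /
                (vqs[i].avg_cost ? vqs[i].avg_cost : 1);
        if (best < 0 || score > best_score) {
            best = i;
            best_score = score;
        }
    }
    return best;
}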
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 01:05:40PM +0200, Abel Gordon wrote: > > > (CCing Eyal Moscovici who is actually prototyping with multiple > > > policies and may want to join this thread) > > > > > > Starting with basic policies: we can use a single vhost thread > > > and create new vhost threads if it becomes saturated and there > > > are enough cpu cycles available in the system > > > or if the latency (how long the requests in the virtio queues wait > > > until they are handled) is too high. > > > We can merge threads if the latency is already low or if the threads > > > are not saturated. > > > > > > There is a hidden trade-off here: when you run more vhost threads you > > > may actually be stealing cpu cycles from the vcpu threads and also > > > increasing context switches. So, from the vhost perspective it may > > > improve performance but from the vcpu threads perspective it may > > > degrade performance. > > > > So this is a very interesting problem to solve but what does > > management know that suggests it can solve it better? > > Yep, and Eyal is currently working on this. > What the management knows ? depends who the management is :) > Could be just I/O activity (black-box: I/O request rate, I/O > handling rate, latency) We know much more about this than management, don't we? > or application performance (white-box). This would have to come with a proposal for getting this white-box info out of the guest somehow. -- MST
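The split/merge policy sketched in this exchange could take roughly the following shape. The worker_stats fields, the thresholds and the action names are placeholders chosen for illustration; the real policy, and whether it belongs in the kernel or in the management stack, is exactly what is being discussed.

/*
 * Illustrative policy sketch for the split/merge heuristic described above.
 * The thresholds and the stats structure are assumptions, not part of any
 * existing vhost interface.
 */
enum policy_action { POLICY_KEEP, POLICY_SPLIT, POLICY_MERGE };

struct worker_stats {
    double utilization;     /* fraction of time the vhost thread was busy */
    double queue_latency;   /* avg time requests waited in the virtio queues */
    double idle_cpu;        /* spare cpu available on the host */
};

enum policy_action decide(const struct worker_stats *s)
{
    /* Saturated worker, long queueing delays, and spare cycles on the
     * host: add a vhost thread (steal cycles from vcpus only if idle
     * cpu really exists). */
    if (s->utilization > 0.90 && s->queue_latency > 1e-3 && s->idle_cpu > 0.10)
        return POLICY_SPLIT;

    /* Underutilized workers with low latency: merge them to cut context
     * switches and give the cycles back to the vcpu threads. */
    if (s->utilization < 0.40 && s->queue_latency < 1e-4)
        return POLICY_MERGE;

    return POLICY_KEEP;
}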
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 01:02:37PM +0200, Abel Gordon wrote: > > > "Michael S. Tsirkin" wrote on 27/11/2013 12:59:38 PM: > > > > On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote: > > > > > > > > > "Michael S. Tsirkin" wrote on 27/11/2013 12:27:19 PM: > > > > > > > > > > > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: > > > > > Hi, > > > > > > > > > > Razya is out for a few days, so I will try to answer the questions > as > > > well > > > > > as I can: > > > > > > > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 > PM: > > > > > > > > > > > From: "Michael S. Tsirkin" > > > > > > To: Abel Gordon/Haifa/IBM@IBMIL, > > > > > > Cc: Anthony Liguori , > abel.gor...@gmail.com, > > > > > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ > > > > > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel > Nider/Haifa/ > > > > > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya > Ladelsky/ > > > > > > Haifa/IBM@IBMIL > > > > > > Date: 27/11/2013 01:08 AM > > > > > > Subject: Re: Elvis upstreaming plan > > > > > > > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 > > > 08:05:00 > > > > > PM: > > > > > > > > > > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > > > > > > > > > > > > > > > > > That's why we are proposing to implement a mechanism that will > > > enable > > > > > > > the management stack to configure 1 thread per I/O device (as > it is > > > > > today) > > > > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > > > > > > > Once you are scheduling multiple guests in a single vhost > device, > > > you > > > > > > > > now create a whole new class of DoS attacks in the best case > > > > > scenario. > > > > > > > > > > > > > > Again, we are NOT proposing to schedule multiple guests in a > single > > > > > > > vhost thread. We are proposing to schedule multiple devices > > > belonging > > > > > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > > > > > > > > > > > > > I guess a question then becomes why have multiple devices? > > > > > > > > > > If you mean "why serve multiple devices from a single thread" the > > > answer is > > > > > that we cannot rely on the Linux scheduler which has no knowledge > of > > > I/O > > > > > queues to do a decent job of scheduling I/O. The idea is to take > over > > > the > > > > > I/O scheduling responsibilities from the kernel's thread scheduler > with > > > a > > > > > more efficient I/O scheduler inside each vhost thread. So by > combining > > > all > > > > > of the I/O devices from the same guest (disks, network cards, etc) > in a > > > > > single I/O thread, it allows us to provide better scheduling by > giving > > > us > > > > > more knowledge of the nature of the work. So now instead of > relying on > > > the > > > > > linux scheduler to perform context switches between multiple vhost > > > threads, > > > > > we have a single thread context in which we can do the I/O > scheduling > > > more > > > > > efficiently. We can closely monitor the performance needs of each > > > queue of > > > > > each device inside the vhost thread which gives us much more > > > information > > > > > than relying on the kernel's thread scheduler. 
> > > > > This does not expose any additional opportunities for attacks (DoS > or > > > > > other) than are already available since all of the I/O traffic > belongs > > > to a > > > > > single guest. &
Re: Elvis upstreaming plan
"Michael S. Tsirkin" wrote on 27/11/2013 01:03:25 PM: > > On Wed, Nov 27, 2013 at 12:55:07PM +0200, Abel Gordon wrote: > > > > > > "Michael S. Tsirkin" wrote on 27/11/2013 12:29:43 PM: > > > > > > > > On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote: > > > > > > > > > > > > "Michael S. Tsirkin" wrote on 27/11/2013 11:21:00 AM: > > > > > > > > > > > > > > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote: > > > > > > > > > > > > > > > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 > > PM: > > > > > > > > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 > > > > 08:05:00 > > > > > > PM: > > > > > > > > > > > > > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization > > team, > > > > which > > > > > > > > > > developed Elvis, presented by Abel Gordon at the last KVM > > > > forum: > > > > > > > > > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > > > > > > > > > ELVIS slides: > > > > > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > According to the discussions that took place at the forum, > > > > > > upstreaming > > > > > > > > > > some of the Elvis approaches seems to be a good idea, which > > we > > > > > > would > > > > > > > > like > > > > > > > > > > to pursue. > > > > > > > > > > > > > > > > > > > > Our plan for the first patches is the following: > > > > > > > > > > > > > > > > > > > > 1.Shared vhost thread between mutiple devices > > > > > > > > > > This patch creates a worker thread and worker queue shared > > > > across > > > > > > > > multiple > > > > > > > > > > virtio devices > > > > > > > > > > We would like to modify the patch posted in > > > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > > > > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > > > > > > > > > to limit a vhost thread to serve multiple devices only if > > they > > > > > > belong > > > > > > > > to > > > > > > > > > > the same VM as Paolo suggested to avoid isolation or > > cgroups > > > > > > concerns. > > > > > > > > > > > > > > > > > > > > Another modification is related to the creation and removal > > of > > > > > > vhost > > > > > > > > > > threads, which will be discussed next. > > > > > > > > > > > > > > > > > > I think this is an exceptionally bad idea. > > > > > > > > > > > > > > > > > > We shouldn't throw away isolation without exhausting every > > other > > > > > > > > > possibility. > > > > > > > > > > > > > > > > Seems you have missed the important details here. > > > > > > > > Anthony, we are aware you are concerned about isolation > > > > > > > > and you believe we should not share a single vhost thread > > across > > > > > > > > multiple VMs. That's why Razya proposed to change the patch > > > > > > > > so we will serve multiple virtio devices using a single vhost > > > > thread > > > > > > > > "only if the devices belong to the same VM". This series of > > patches > > > > > > > > will not allow two different VMs to share the same vhost > > thread. > > > > > > > > So, I don't see why this will be throwing away isolation and > > why > > > > > > > > this could be a "exceptionally bad idea". 
> > > > > > > > > > > > > > > > By the way, I remember that during the KVM forum a similar > > > > > > > > approach of having a single data plane thread for many devices > > > > > > > > was discussed > > > > > > > > > We've seen very positive results from adding threads. We > > should > > > > also > > > > > > > > > look at scheduling. > > > > > > > > > > > > > > > > ...and we have also seen exceptionally negative results from > > > > > > > > adding threads, both for vhost and data-plane. If you have lot > > of > > > > idle > > > > > > > > time/cores > > > > > > > > then it makes sense to run multiple threads. But IMHO in many > > > > scenarios > > > > > > you > > > > > > > > don't have lot of idle time/cores.. and if you have them you > > would > > > > > > probably > > > > > > > > prefer to run more VMs/VCPUshosting a single SMP VM when > > you > > > > have > > > > > > > > enough physical cores to run all the VCPU threads and the I/O > > > > threads > > > > > > is > > > > > > > > not a > > > > > > > > realistic scenario. > > > > > > > > > > > > > > > > That's why we are proposing to implement a mechanism that will > > > > enable > > > > > > > > the management stack to configure 1 thread per I/O device (as > > it is > > > > > > today) > > > > > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > > > > > > > > > Once you are scheduling multiple guests in a single vhost > > device, > > > > you > > > > > > > > > now create a whole new class of DoS attacks in the best case > > > > > > scenario.
Re: Elvis upstreaming plan
"Michael S. Tsirkin" wrote on 27/11/2013 12:59:38 PM: > On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote: > > > > > > "Michael S. Tsirkin" wrote on 27/11/2013 12:27:19 PM: > > > > > > > > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: > > > > Hi, > > > > > > > > Razya is out for a few days, so I will try to answer the questions as > > well > > > > as I can: > > > > > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > > > > > > > From: "Michael S. Tsirkin" > > > > > To: Abel Gordon/Haifa/IBM@IBMIL, > > > > > Cc: Anthony Liguori , abel.gor...@gmail.com, > > > > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ > > > > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ > > > > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ > > > > > Haifa/IBM@IBMIL > > > > > Date: 27/11/2013 01:08 AM > > > > > Subject: Re: Elvis upstreaming plan > > > > > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 > > 08:05:00 > > > > PM: > > > > > > > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > > > > > > > > > > > > > That's why we are proposing to implement a mechanism that will > > enable > > > > > > the management stack to configure 1 thread per I/O device (as it is > > > > today) > > > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > > > > > Once you are scheduling multiple guests in a single vhost device, > > you > > > > > > > now create a whole new class of DoS attacks in the best case > > > > scenario. > > > > > > > > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > > > > vhost thread. We are proposing to schedule multiple devices > > belonging > > > > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > > > > > > > > > > I guess a question then becomes why have multiple devices? > > > > > > > > If you mean "why serve multiple devices from a single thread" the > > answer is > > > > that we cannot rely on the Linux scheduler which has no knowledge of > > I/O > > > > queues to do a decent job of scheduling I/O. The idea is to take over > > the > > > > I/O scheduling responsibilities from the kernel's thread scheduler with > > a > > > > more efficient I/O scheduler inside each vhost thread. So by combining > > all > > > > of the I/O devices from the same guest (disks, network cards, etc) in a > > > > single I/O thread, it allows us to provide better scheduling by giving > > us > > > > more knowledge of the nature of the work. So now instead of relying on > > the > > > > linux scheduler to perform context switches between multiple vhost > > threads, > > > > we have a single thread context in which we can do the I/O scheduling > > more > > > > efficiently. We can closely monitor the performance needs of each > > queue of > > > > each device inside the vhost thread which gives us much more > > information > > > > than relying on the kernel's thread scheduler. > > > > This does not expose any additional opportunities for attacks (DoS or > > > > other) than are already available since all of the I/O traffic belongs > > to a > > > > single guest. > > > > You can make the argument that with low I/O loads this mechanism may > > not > > > > make much difference. 
However when you try to maximize the utilization > > of > > > > your hardware (such as in a commercial scenario) this technique can > > gain > > > > you a large benefit. > > > > > > > > Regards, > > > > > > > > Joel Nider > > > > Virtualization Research > > > > IBM Research and Development > > > > Haifa Research Lab > >
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 12:55:07PM +0200, Abel Gordon wrote: > > > "Michael S. Tsirkin" wrote on 27/11/2013 12:29:43 PM: > > > > > On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote: > > > > > > > > > "Michael S. Tsirkin" wrote on 27/11/2013 11:21:00 AM: > > > > > > > > > > > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote: > > > > > > > > > > > > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 > PM: > > > > > > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 > > > 08:05:00 > > > > > PM: > > > > > > > > > > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization > team, > > > which > > > > > > > > > developed Elvis, presented by Abel Gordon at the last KVM > > > forum: > > > > > > > > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > > > > > > > > ELVIS slides: > > > > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > > > > > > > > > > > > > > > > > > > > > > According to the discussions that took place at the forum, > > > > > upstreaming > > > > > > > > > some of the Elvis approaches seems to be a good idea, which > we > > > > > would > > > > > > > like > > > > > > > > > to pursue. > > > > > > > > > > > > > > > > > > Our plan for the first patches is the following: > > > > > > > > > > > > > > > > > > 1.Shared vhost thread between mutiple devices > > > > > > > > > This patch creates a worker thread and worker queue shared > > > across > > > > > > > multiple > > > > > > > > > virtio devices > > > > > > > > > We would like to modify the patch posted in > > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > > > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > > > > > > > > to limit a vhost thread to serve multiple devices only if > they > > > > > belong > > > > > > > to > > > > > > > > > the same VM as Paolo suggested to avoid isolation or > cgroups > > > > > concerns. > > > > > > > > > > > > > > > > > > Another modification is related to the creation and removal > of > > > > > vhost > > > > > > > > > threads, which will be discussed next. > > > > > > > > > > > > > > > > I think this is an exceptionally bad idea. > > > > > > > > > > > > > > > > We shouldn't throw away isolation without exhausting every > other > > > > > > > > possibility. > > > > > > > > > > > > > > Seems you have missed the important details here. > > > > > > > Anthony, we are aware you are concerned about isolation > > > > > > > and you believe we should not share a single vhost thread > across > > > > > > > multiple VMs. That's why Razya proposed to change the patch > > > > > > > so we will serve multiple virtio devices using a single vhost > > > thread > > > > > > > "only if the devices belong to the same VM". This series of > patches > > > > > > > will not allow two different VMs to share the same vhost > thread. > > > > > > > So, I don't see why this will be throwing away isolation and > why > > > > > > > this could be a "exceptionally bad idea". > > > > > > > > > > > > > > By the way, I remember that during the KVM forum a similar > > > > > > > approach of having a single data plane thread for many devices > > > > > > > was discussed > > > > > > > > We've seen very positive results from adding threads. We > should > > > also > > > > > > > > look at scheduling. 
> > > > > > > > > > > > > > ...and we have also seen exceptionally negative results from > > > > > > > adding threads, both for vhost and data-plane. If you have lot > of > > > idle > > > > > > > time/cores > > > > > > > then it makes sense to run multiple threads. But IMHO in many > > > scenarios > > > > > you > > > > > > > don't have lot of idle time/cores.. and if you have them you > would > > > > > probably > > > > > > > prefer to run more VMs/VCPUshosting a single SMP VM when > you > > > have > > > > > > > enough physical cores to run all the VCPU threads and the I/O > > > threads > > > > > is > > > > > > > not a > > > > > > > realistic scenario. > > > > > > > > > > > > > > That's why we are proposing to implement a mechanism that will > > > enable > > > > > > > the management stack to configure 1 thread per I/O device (as > it is > > > > > today) > > > > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > > > > > > > Once you are scheduling multiple guests in a single vhost > device, > > > you > > > > > > > > now create a whole new class of DoS attacks in the best case > > > > > scenario. > > > > > > > > > > > > > > Again, we are NOT proposing to schedule multiple guests in a > single > > > > > > > vhost thread. We are proposing to schedule multiple devices > > > belonging > > > > > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > > > > > > > > > > > > > I guess a question
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote: > > > "Michael S. Tsirkin" wrote on 27/11/2013 12:27:19 PM: > > > > > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: > > > Hi, > > > > > > Razya is out for a few days, so I will try to answer the questions as > well > > > as I can: > > > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > > > > > From: "Michael S. Tsirkin" > > > > To: Abel Gordon/Haifa/IBM@IBMIL, > > > > Cc: Anthony Liguori , abel.gor...@gmail.com, > > > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ > > > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ > > > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ > > > > Haifa/IBM@IBMIL > > > > Date: 27/11/2013 01:08 AM > > > > Subject: Re: Elvis upstreaming plan > > > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 > 08:05:00 > > > PM: > > > > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > > > > > > > > > That's why we are proposing to implement a mechanism that will > enable > > > > > the management stack to configure 1 thread per I/O device (as it is > > > today) > > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > > > Once you are scheduling multiple guests in a single vhost device, > you > > > > > > now create a whole new class of DoS attacks in the best case > > > scenario. > > > > > > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > > > vhost thread. We are proposing to schedule multiple devices > belonging > > > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > > > > > > > I guess a question then becomes why have multiple devices? > > > > > > If you mean "why serve multiple devices from a single thread" the > answer is > > > that we cannot rely on the Linux scheduler which has no knowledge of > I/O > > > queues to do a decent job of scheduling I/O. The idea is to take over > the > > > I/O scheduling responsibilities from the kernel's thread scheduler with > a > > > more efficient I/O scheduler inside each vhost thread. So by combining > all > > > of the I/O devices from the same guest (disks, network cards, etc) in a > > > single I/O thread, it allows us to provide better scheduling by giving > us > > > more knowledge of the nature of the work. So now instead of relying on > the > > > linux scheduler to perform context switches between multiple vhost > threads, > > > we have a single thread context in which we can do the I/O scheduling > more > > > efficiently. We can closely monitor the performance needs of each > queue of > > > each device inside the vhost thread which gives us much more > information > > > than relying on the kernel's thread scheduler. > > > This does not expose any additional opportunities for attacks (DoS or > > > other) than are already available since all of the I/O traffic belongs > to a > > > single guest. > > > You can make the argument that with low I/O loads this mechanism may > not > > > make much difference. However when you try to maximize the utilization > of > > > your hardware (such as in a commercial scenario) this technique can > gain > > > you a large benefit. > > > > > > Regards, > > > > > > Joel Nider > > > Virtualization Research > > > IBM Research and Development > > > Haifa Research Lab > > > > So all this would sound more convincing if we had sharing between VMs. 
> > When it's only a single VM it's somehow less convincing, isn't it? > > Of course if we would bypass a scheduler like this it becomes harder to > > enforce cgroup limits. > > True, but here the issue becomes isolation/cgroups. We can start to show > the value for VMs that have multiple devices / queues and then we could > re-consider extending the mechanism for multiple VMs (at least as a > experimental feature). Sorry, If it
Re: Elvis upstreaming plan
"Michael S. Tsirkin" wrote on 27/11/2013 12:29:43 PM: > > On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote: > > > > > > "Michael S. Tsirkin" wrote on 27/11/2013 11:21:00 AM: > > > > > > > > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote: > > > > > > > > > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 > > 08:05:00 > > > > PM: > > > > > > > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, > > which > > > > > > > > developed Elvis, presented by Abel Gordon at the last KVM > > forum: > > > > > > > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > > > > > > > ELVIS slides: > > > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > > > > > > > > > > > > > > > > > > > According to the discussions that took place at the forum, > > > > upstreaming > > > > > > > > some of the Elvis approaches seems to be a good idea, which we > > > > would > > > > > > like > > > > > > > > to pursue. > > > > > > > > > > > > > > > > Our plan for the first patches is the following: > > > > > > > > > > > > > > > > 1.Shared vhost thread between mutiple devices > > > > > > > > This patch creates a worker thread and worker queue shared > > across > > > > > > multiple > > > > > > > > virtio devices > > > > > > > > We would like to modify the patch posted in > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > > > > > > > to limit a vhost thread to serve multiple devices only if they > > > > belong > > > > > > to > > > > > > > > the same VM as Paolo suggested to avoid isolation or cgroups > > > > concerns. > > > > > > > > > > > > > > > > Another modification is related to the creation and removal of > > > > vhost > > > > > > > > threads, which will be discussed next. > > > > > > > > > > > > > > I think this is an exceptionally bad idea. > > > > > > > > > > > > > > We shouldn't throw away isolation without exhausting every other > > > > > > > possibility. > > > > > > > > > > > > Seems you have missed the important details here. > > > > > > Anthony, we are aware you are concerned about isolation > > > > > > and you believe we should not share a single vhost thread across > > > > > > multiple VMs. That's why Razya proposed to change the patch > > > > > > so we will serve multiple virtio devices using a single vhost > > thread > > > > > > "only if the devices belong to the same VM". This series of patches > > > > > > will not allow two different VMs to share the same vhost thread. > > > > > > So, I don't see why this will be throwing away isolation and why > > > > > > this could be a "exceptionally bad idea". > > > > > > > > > > > > By the way, I remember that during the KVM forum a similar > > > > > > approach of having a single data plane thread for many devices > > > > > > was discussed > > > > > > > We've seen very positive results from adding threads. We should > > also > > > > > > > look at scheduling. > > > > > > > > > > > > ...and we have also seen exceptionally negative results from > > > > > > adding threads, both for vhost and data-plane. If you have lot of > > idle > > > > > > time/cores > > > > > > then it makes sense to run multiple threads. 
But IMHO in many > > scenarios > > > > you > > > > > > don't have lot of idle time/cores.. and if you have them you would > > > > probably > > > > > > prefer to run more VMs/VCPUshosting a single SMP VM when you > > have > > > > > > enough physical cores to run all the VCPU threads and the I/O > > threads > > > > is > > > > > > not a > > > > > > realistic scenario. > > > > > > > > > > > > That's why we are proposing to implement a mechanism that will > > enable > > > > > > the management stack to configure 1 thread per I/O device (as it is > > > > today) > > > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > > > > > Once you are scheduling multiple guests in a single vhost device, > > you > > > > > > > now create a whole new class of DoS attacks in the best case > > > > scenario. > > > > > > > > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > > > > vhost thread. We are proposing to schedule multiple devices > > belonging > > > > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > > > > > > > > > > I guess a question then becomes why have multiple devices? > > > > > > > > I assume that there are guests that have multiple vhost devices > > > > (net or scsi/tcm). > > > > > > These are kind of uncommon though. In fact a kernel thread is not a > > > unit of isolation - cgroups supply isolation. > > > If we had use_cgroups kind of like use_mm, we could thi
Re: Elvis upstreaming plan
"Michael S. Tsirkin" wrote on 27/11/2013 12:27:19 PM: > > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: > > Hi, > > > > Razya is out for a few days, so I will try to answer the questions as well > > as I can: > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > > > From: "Michael S. Tsirkin" > > > To: Abel Gordon/Haifa/IBM@IBMIL, > > > Cc: Anthony Liguori , abel.gor...@gmail.com, > > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ > > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ > > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ > > > Haifa/IBM@IBMIL > > > Date: 27/11/2013 01:08 AM > > > Subject: Re: Elvis upstreaming plan > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 08:05:00 > > PM: > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > > > > > That's why we are proposing to implement a mechanism that will enable > > > > the management stack to configure 1 thread per I/O device (as it is > > today) > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > Once you are scheduling multiple guests in a single vhost device, you > > > > > now create a whole new class of DoS attacks in the best case > > scenario. > > > > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > > vhost thread. We are proposing to schedule multiple devices belonging > > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > > > > I guess a question then becomes why have multiple devices? > > > > If you mean "why serve multiple devices from a single thread" the answer is > > that we cannot rely on the Linux scheduler which has no knowledge of I/O > > queues to do a decent job of scheduling I/O. The idea is to take over the > > I/O scheduling responsibilities from the kernel's thread scheduler with a > > more efficient I/O scheduler inside each vhost thread. So by combining all > > of the I/O devices from the same guest (disks, network cards, etc) in a > > single I/O thread, it allows us to provide better scheduling by giving us > > more knowledge of the nature of the work. So now instead of relying on the > > linux scheduler to perform context switches between multiple vhost threads, > > we have a single thread context in which we can do the I/O scheduling more > > efficiently. We can closely monitor the performance needs of each queue of > > each device inside the vhost thread which gives us much more information > > than relying on the kernel's thread scheduler. > > This does not expose any additional opportunities for attacks (DoS or > > other) than are already available since all of the I/O traffic belongs to a > > single guest. > > You can make the argument that with low I/O loads this mechanism may not > > make much difference. However when you try to maximize the utilization of > > your hardware (such as in a commercial scenario) this technique can gain > > you a large benefit. > > > > Regards, > > > > Joel Nider > > Virtualization Research > > IBM Research and Development > > Haifa Research Lab > > So all this would sound more convincing if we had sharing between VMs. > When it's only a single VM it's somehow less convincing, isn't it? > Of course if we would bypass a scheduler like this it becomes harder to > enforce cgroup limits. True, but here the issue becomes isolation/cgroups. 
We can start to show the value for VMs that have multiple devices / queues and then we could re-consider extending the mechanism for multiple VMs (at least as an experimental feature). > But it might be easier to give scheduler the info it needs to do what we need. Would an API that basically says "run this kthread right now" > do the trick? ...do you really believe it would be possible to push this kind of change to the Linux scheduler? In addition, we need more than "run this kthread right now" because you need to monitor the virtio ring activity to specify "when" you would like to run a "specific kthread" and for "how long".
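To make the "when and for how long" point concrete, here is a purely hypothetical sketch: no such scheduler API exists, and the structure, field names, and formulas below are invented only to show what information vhost would have to derive from the virtio ring activity it already observes before a bare "run this kthread now" call could be useful.

#include <linux/sched.h>
#include <linux/time.h>

/* Hypothetical only -- there is no such scheduler interface. */
struct vhost_sched_hint {
        struct task_struct *worker;     /* vhost kthread to run */
        u64 deadline_ns;                /* run it no later than this */
        u64 timeslice_ns;               /* and for roughly this long */
};

/* Derive the hint from per-virtqueue counters the vhost thread already
 * tracks (pending descriptors, observed arrival rate).  Both formulas
 * are placeholders, not a proposed policy. */
static void vhost_fill_hint(struct vhost_sched_hint *hint,
                            struct task_struct *worker,
                            u64 pending_descs, u64 descs_per_sec)
{
        hint->worker = worker;
        /* the busier the ring, the sooner and the longer we want to run */
        hint->deadline_ns = descs_per_sec ?
                NSEC_PER_SEC / descs_per_sec : NSEC_PER_MSEC;
        hint->timeslice_ns = pending_descs * NSEC_PER_USEC;
}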
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 12:18:51PM +0200, Abel Gordon wrote: > > > Jason Wang wrote on 27/11/2013 04:49:20 AM: > > > > > On 11/24/2013 05:22 PM, Razya Ladelsky wrote: > > > Hi all, > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > > > developed Elvis, presented by Abel Gordon at the last KVM forum: > > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > > ELVIS slides: > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > > > > According to the discussions that took place at the forum, upstreaming > > > some of the Elvis approaches seems to be a good idea, which we would > like > > > to pursue. > > > > > > Our plan for the first patches is the following: > > > > > > 1.Shared vhost thread between mutiple devices > > > This patch creates a worker thread and worker queue shared across > multiple > > > virtio devices > > > We would like to modify the patch posted in > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > > to limit a vhost thread to serve multiple devices only if they belong > to > > > the same VM as Paolo suggested to avoid isolation or cgroups concerns. > > > > > > Another modification is related to the creation and removal of vhost > > > threads, which will be discussed next. > > > > > > 2. Sysfs mechanism to add and remove vhost threads > > > This patch allows us to add and remove vhost threads dynamically. > > > > > > A simpler way to control the creation of vhost threads is statically > > > determining the maximum number of virtio devices per worker via a > kernel > > > module parameter (which is the way the previously mentioned patch is > > > currently implemented) > > > > Any chance we can re-use the cwmq instead of inventing another > > mechanism? Looks like there're lots of function duplication here. Bandan > > has an RFC to do this. > > Thanks for the suggestion. We should certainly take a look at Bandan's > patches which I guess are: > > http://www.mail-archive.com/kvm@vger.kernel.org/msg96603.html > > My only concern here is that we may not be able to easily implement > our polling mechanism and heuristics with cwmq. It's not so hard, to poll you just requeue work to make sure it's re-invoked. > > > > > > I'd like to ask for advice here about the more preferable way to go: > > > Although having the sysfs mechanism provides more flexibility, it may > be a > > > good idea to start with a simple static parameter, and have the first > > > patches as simple as possible. What do you think? > > > > > > 3.Add virtqueue polling mode to vhost > > > Have the vhost thread poll the virtqueues with high I/O rate for new > > > buffers , and avoid asking the guest to kick us. > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > 26616133fafb7855cc80fac070b0572fd1aaf5d0 > > > > Maybe we can make poll_stop_idle adaptive which may help the light load > > case. Consider guest is often slow than vhost, if we just have one or > > two vms, polling too much may waste cpu in this case. > > Yes, make polling adaptive based on the amount of wasted cycles (cycles > we did polling but didn't find new work) and I/O rate is a very good idea. > Note we already measure and expose these values but we do not use them > to adapt the polling mechanism. > > Having said that, note that adaptive polling may be a bit tricky. 
> Remember that the cycles we waste polling in the vhost thread actually > improves the performance of the vcpu threads because the guest is no longer > > require to kick (pio==exit) the host when vhost does polling. So even if > we waste cycles in the vhost thread, we are saving cycles in the > vcpu thread and improving performance. So my suggestion would be: - guest runs some kicks - measures how long it took, e.g. kick = T cycles - sends this info to host host polls for at most fraction * T cycles > > > 4. vhost statistics > > > This patch introduces a set of statistics to monitor different > performance > > > metrics of vhost and our polling and I/O scheduling mechanisms. The > > > statistics are exposed using debugfs and can be easily displayed with a > > > > Python script (vhost_stat, based on the old kvm_stats) > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > > > How about using trace points instead? Besides statistics, it can also > > help more in debugging. > > Yep, we just had a discussion with Gleb about this :) > > > > > > > 5. Add heuristics to improve I/O scheduling > > > This patch enhances the round-robin mechanism with a set of heuristics > to > > > decide when to leave a virtqueue and proceed to the next. > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d > > > > > > This patch improves the handling of the requests by the vhost thread, > but > > > could perhaps be delayed to a > > > later time , and not submitted
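A minimal userspace-style sketch of the host side of the suggestion above (guest measures how many cycles one kick/pio exit costs, host polls for at most a fraction of that). The variable kick_cost_cycles, the 1/2 fraction, and the reporting channel are assumptions for illustration, not part of the posted patches; timing uses __rdtsc() on x86.

#include <stdint.h>
#include <stdbool.h>
#include <x86intrin.h>   /* __rdtsc() */

/* Cost of one guest kick (pio exit) in cycles, as measured and reported
 * by the guest.  How it reaches the host is left open in this sketch. */
static uint64_t kick_cost_cycles = 50000;

/* Poll for at most half of what a kick would have cost the guest: if we
 * poll longer than that, we burn more cycles than the saved exit. */
static uint64_t poll_budget_cycles(void)
{
        return kick_cost_cycles / 2;
}

/* Returns true if new work showed up while polling, false if we gave up
 * and should fall back to waiting for a kick. */
static bool poll_virtqueue(bool (*ring_has_work)(void *), void *vq)
{
        uint64_t start = __rdtsc();

        while (__rdtsc() - start < poll_budget_cycles()) {
                if (ring_has_work(vq))
                        return true;
        }
        return false;
}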
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote: > > > "Michael S. Tsirkin" wrote on 27/11/2013 11:21:00 AM: > > > > > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote: > > > > > > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 > 08:05:00 > > > PM: > > > > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, > which > > > > > > > developed Elvis, presented by Abel Gordon at the last KVM > forum: > > > > > > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > > > > > > ELVIS slides: > > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > > > > > > > > > > > > > > > > According to the discussions that took place at the forum, > > > upstreaming > > > > > > > some of the Elvis approaches seems to be a good idea, which we > > > would > > > > > like > > > > > > > to pursue. > > > > > > > > > > > > > > Our plan for the first patches is the following: > > > > > > > > > > > > > > 1.Shared vhost thread between mutiple devices > > > > > > > This patch creates a worker thread and worker queue shared > across > > > > > multiple > > > > > > > virtio devices > > > > > > > We would like to modify the patch posted in > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > > > > > > to limit a vhost thread to serve multiple devices only if they > > > belong > > > > > to > > > > > > > the same VM as Paolo suggested to avoid isolation or cgroups > > > concerns. > > > > > > > > > > > > > > Another modification is related to the creation and removal of > > > vhost > > > > > > > threads, which will be discussed next. > > > > > > > > > > > > I think this is an exceptionally bad idea. > > > > > > > > > > > > We shouldn't throw away isolation without exhausting every other > > > > > > possibility. > > > > > > > > > > Seems you have missed the important details here. > > > > > Anthony, we are aware you are concerned about isolation > > > > > and you believe we should not share a single vhost thread across > > > > > multiple VMs. That's why Razya proposed to change the patch > > > > > so we will serve multiple virtio devices using a single vhost > thread > > > > > "only if the devices belong to the same VM". This series of patches > > > > > will not allow two different VMs to share the same vhost thread. > > > > > So, I don't see why this will be throwing away isolation and why > > > > > this could be a "exceptionally bad idea". > > > > > > > > > > By the way, I remember that during the KVM forum a similar > > > > > approach of having a single data plane thread for many devices > > > > > was discussed > > > > > > We've seen very positive results from adding threads. We should > also > > > > > > look at scheduling. > > > > > > > > > > ...and we have also seen exceptionally negative results from > > > > > adding threads, both for vhost and data-plane. If you have lot of > idle > > > > > time/cores > > > > > then it makes sense to run multiple threads. But IMHO in many > scenarios > > > you > > > > > don't have lot of idle time/cores.. 
and if you have them you would > > > probably > > > > > prefer to run more VMs/VCPUshosting a single SMP VM when you > have > > > > > enough physical cores to run all the VCPU threads and the I/O > threads > > > is > > > > > not a > > > > > realistic scenario. > > > > > > > > > > That's why we are proposing to implement a mechanism that will > enable > > > > > the management stack to configure 1 thread per I/O device (as it is > > > today) > > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > > > Once you are scheduling multiple guests in a single vhost device, > you > > > > > > now create a whole new class of DoS attacks in the best case > > > scenario. > > > > > > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > > > vhost thread. We are proposing to schedule multiple devices > belonging > > > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > > > > > > > I guess a question then becomes why have multiple devices? > > > > > > I assume that there are guests that have multiple vhost devices > > > (net or scsi/tcm). > > > > These are kind of uncommon though. In fact a kernel thread is not a > > unit of isolation - cgroups supply isolation. > > If we had use_cgroups kind of like use_mm, we could thinkably > > do work for multiple VMs on the same thread. > > > > > > > We can also extend the approach to consider > > > multiqueue devices, so we can create 1 vhost thread shared for all the > > > queues, > > > 1 vhost thread for each queue or a few threads for multiple queues. We > > > could also share
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: > Hi, > > Razya is out for a few days, so I will try to answer the questions as well > as I can: > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > From: "Michael S. Tsirkin" > > To: Abel Gordon/Haifa/IBM@IBMIL, > > Cc: Anthony Liguori , abel.gor...@gmail.com, > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ > > Haifa/IBM@IBMIL > > Date: 27/11/2013 01:08 AM > > Subject: Re: Elvis upstreaming plan > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > Anthony Liguori wrote on 26/11/2013 08:05:00 > PM: > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > That's why we are proposing to implement a mechanism that will enable > > > the management stack to configure 1 thread per I/O device (as it is > today) > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > Once you are scheduling multiple guests in a single vhost device, you > > > > now create a whole new class of DoS attacks in the best case > scenario. > > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > vhost thread. We are proposing to schedule multiple devices belonging > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > I guess a question then becomes why have multiple devices? > > If you mean "why serve multiple devices from a single thread" the answer is > that we cannot rely on the Linux scheduler which has no knowledge of I/O > queues to do a decent job of scheduling I/O. The idea is to take over the > I/O scheduling responsibilities from the kernel's thread scheduler with a > more efficient I/O scheduler inside each vhost thread. So by combining all > of the I/O devices from the same guest (disks, network cards, etc) in a > single I/O thread, it allows us to provide better scheduling by giving us > more knowledge of the nature of the work. So now instead of relying on the > linux scheduler to perform context switches between multiple vhost threads, > we have a single thread context in which we can do the I/O scheduling more > efficiently. We can closely monitor the performance needs of each queue of > each device inside the vhost thread which gives us much more information > than relying on the kernel's thread scheduler. > This does not expose any additional opportunities for attacks (DoS or > other) than are already available since all of the I/O traffic belongs to a > single guest. > You can make the argument that with low I/O loads this mechanism may not > make much difference. However when you try to maximize the utilization of > your hardware (such as in a commercial scenario) this technique can gain > you a large benefit. > > Regards, > > Joel Nider > Virtualization Research > IBM Research and Development > Haifa Research Lab So all this would sound more convincing if we had sharing between VMs. When it's only a single VM it's somehow less convincing, isn't it? Of course if we would bypass a scheduler like this it becomes harder to enforce cgroup limits. But it might be easier to give scheduler the info it needs to do what we need. Would an API that basically says "run this kthread right now" do the trick? 
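As an illustration of what "doing the I/O scheduling inside one vhost thread" means in the explanation above, here is a toy userspace-style sketch (not the Elvis code; all names are illustrative): one loop walks the queues of every device belonging to a single guest round-robin, so switching between devices is a function call instead of a kernel context switch between vhost threads.

#include <stddef.h>
#include <stdbool.h>

struct queue {
        bool (*has_work)(struct queue *q);
        void (*handle_one)(struct queue *q);    /* process one request */
};

/* All queues of all virtio devices (disk, net, ...) of one guest. */
struct worker {
        struct queue **queues;
        size_t nqueues;
};

/* One scheduling pass: visit every queue, draining a bounded number of
 * requests from each so a busy disk cannot starve a NIC.  The per-queue
 * budget and the heuristics for leaving a queue early are where the
 * Elvis patches add their policy. */
static void worker_round_robin_pass(struct worker *w, int budget_per_queue)
{
        for (size_t i = 0; i < w->nqueues; i++) {
                struct queue *q = w->queues[i];
                int budget = budget_per_queue;

                while (budget-- > 0 && q->has_work(q))
                        q->handle_one(q);
        }
}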
Re: Elvis upstreaming plan
Jason Wang wrote on 27/11/2013 04:49:20 AM: > > On 11/24/2013 05:22 PM, Razya Ladelsky wrote: > > Hi all, > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > > developed Elvis, presented by Abel Gordon at the last KVM forum: > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > According to the discussions that took place at the forum, upstreaming > > some of the Elvis approaches seems to be a good idea, which we would like > > to pursue. > > > > Our plan for the first patches is the following: > > > > 1.Shared vhost thread between mutiple devices > > This patch creates a worker thread and worker queue shared across multiple > > virtio devices > > We would like to modify the patch posted in > > https://github.com/abelg/virtual_io_acceleration/commit/ > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > to limit a vhost thread to serve multiple devices only if they belong to > > the same VM as Paolo suggested to avoid isolation or cgroups concerns. > > > > Another modification is related to the creation and removal of vhost > > threads, which will be discussed next. > > > > 2. Sysfs mechanism to add and remove vhost threads > > This patch allows us to add and remove vhost threads dynamically. > > > > A simpler way to control the creation of vhost threads is statically > > determining the maximum number of virtio devices per worker via a kernel > > module parameter (which is the way the previously mentioned patch is > > currently implemented) > > Any chance we can re-use the cwmq instead of inventing another > mechanism? Looks like there're lots of function duplication here. Bandan > has an RFC to do this. Thanks for the suggestion. We should certainly take a look at Bandan's patches which I guess are: http://www.mail-archive.com/kvm@vger.kernel.org/msg96603.html My only concern here is that we may not be able to easily implement our polling mechanism and heuristics with cwmq. > > > > I'd like to ask for advice here about the more preferable way to go: > > Although having the sysfs mechanism provides more flexibility, it may be a > > good idea to start with a simple static parameter, and have the first > > patches as simple as possible. What do you think? > > > > 3.Add virtqueue polling mode to vhost > > Have the vhost thread poll the virtqueues with high I/O rate for new > > buffers , and avoid asking the guest to kick us. > > https://github.com/abelg/virtual_io_acceleration/commit/ > 26616133fafb7855cc80fac070b0572fd1aaf5d0 > > Maybe we can make poll_stop_idle adaptive which may help the light load > case. Consider guest is often slow than vhost, if we just have one or > two vms, polling too much may waste cpu in this case. Yes, make polling adaptive based on the amount of wasted cycles (cycles we did polling but didn't find new work) and I/O rate is a very good idea. Note we already measure and expose these values but we do not use them to adapt the polling mechanism. Having said that, note that adaptive polling may be a bit tricky. Remember that the cycles we waste polling in the vhost thread actually improves the performance of the vcpu threads because the guest is no longer require to kick (pio==exit) the host when vhost does polling. So even if we waste cycles in the vhost thread, we are saving cycles in the vcpu thread and improving performance. > > 4. 
vhost statistics > > This patch introduces a set of statistics to monitor different performance > > metrics of vhost and our polling and I/O scheduling mechanisms. The > > statistics are exposed using debugfs and can be easily displayed with a > > Python script (vhost_stat, based on the old kvm_stats) > > https://github.com/abelg/virtual_io_acceleration/commit/ > ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > How about using trace points instead? Besides statistics, it can also > help more in debugging. Yep, we just had a discussion with Gleb about this :) > > > > 5. Add heuristics to improve I/O scheduling > > This patch enhances the round-robin mechanism with a set of heuristics to > > decide when to leave a virtqueue and proceed to the next. > > https://github.com/abelg/virtual_io_acceleration/commit/ > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d > > > > This patch improves the handling of the requests by the vhost thread, but > > could perhaps be delayed to a > > later time , and not submitted as one of the first Elvis patches. > > I'd love to hear some comments about whether this patch needs to be part > > of the first submission. > > > > Any other feedback on this plan will be appreciated, > > Thank you, > > Razya > > > > -- > > To unsubscribe from this list: send the line "unsubscribe kvm" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord.
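A sketch of the adaptive polling idea agreed on in this message: shrink the polling window when most polling cycles are wasted, grow it when polling keeps finding work. poll_stop_idle is the knob named in the thread; the adjustment policy, thresholds, and constants below are assumptions, not the posted code.

#include <stdint.h>

/* How long the vhost thread keeps polling an idle virtqueue before it
 * re-enables guest kicks (the poll_stop_idle knob from the patches). */
static uint64_t poll_stop_idle_cycles = 100000;

#define POLL_STOP_IDLE_MIN   10000
#define POLL_STOP_IDLE_MAX 1000000

/* Called periodically with counters the thread already keeps: cycles
 * spent polling that found nothing vs. cycles that found new work. */
static void adapt_poll_stop_idle(uint64_t wasted_cycles, uint64_t useful_cycles)
{
        /* Mostly wasted polling: guest is slower than vhost, back off. */
        if (wasted_cycles > 4 * useful_cycles) {
                poll_stop_idle_cycles /= 2;
                if (poll_stop_idle_cycles < POLL_STOP_IDLE_MIN)
                        poll_stop_idle_cycles = POLL_STOP_IDLE_MIN;
        /* Polling keeps paying off: spend a bit more before sleeping. */
        } else if (useful_cycles > wasted_cycles) {
                poll_stop_idle_cycles += poll_stop_idle_cycles / 4;
                if (poll_stop_idle_cycles > POLL_STOP_IDLE_MAX)
                        poll_stop_idle_cycles = POLL_STOP_IDLE_MAX;
        }
}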
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 11:33:19AM +0200, Abel Gordon wrote: > > > Gleb Natapov wrote on 27/11/2013 11:21:59 AM: > > > > On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote: > > > > > > > > > Gleb Natapov wrote on 27/11/2013 09:35:01 AM: > > > > > > > On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote: > > > > > > 4. vhost statistics > > > > > > This patch introduces a set of statistics to monitor different > > > > performance > > > > > > metrics of vhost and our polling and I/O scheduling mechanisms. > The > > > > > > statistics are exposed using debugfs and can be easily displayed > with > > > a > > > > > > Python script (vhost_stat, based on the old kvm_stats) > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > > > > > > > > > How about using trace points instead? Besides statistics, it can > also > > > > > help more in debugging. > > > > Definitely. kvm_stats has moved to ftrace long time ago. > > > > > > > > > > We should use trace points for debugging information but IMHO we > should > > > have a dedicated (and different) mechanism to expose data that can be > > > easily consumed by a user-space (policy) application to control how > many > > > vhost threads we need or any other vhost feature we may introduce > > > (e.g. polling). That's why we proposed something like vhost_stat > > > based on sysfs. > > > > > > This is not like kvm_stat that can be replaced with tracepoints. Here > > > we will like to expose data to "control" the system. So I would > > > say what we are trying to do something that resembles the ksm interface > > > implemented under /sys/kernel/mm/ksm/ > > There are control operation and there are performance/statistic > > gathering operations use /sys for former and ftrace for later. The fact > > that you need /sys interface for other things does not mean you can > > abuse it for statistics too. > > Agree. Any statistics that we add for debugging purposes should be > implemented > using tracepoints. But control and related data interfaces (that are not > for > debugging purposes) should be in sysfs. Look for example at Yes things that are not for statistics only and part of control interface that management will use should not use ftrace (I do not think adding more knobs is a good idea, but this is for vhost maintainer to decide), but ksm predates ftrace, so some things below could have been implemented as ftrace points. > /sys/kernel/mm/ksm/full_scans > /sys/kernel/mm/ksm/pages_shared > /sys/kernel/mm/ksm/pages_sharing > /sys/kernel/mm/ksm/pages_to_scan > /sys/kernel/mm/ksm/pages_unshared > /sys/kernel/mm/ksm/pages_volatile > -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Elvis upstreaming plan
"Michael S. Tsirkin" wrote on 27/11/2013 11:21:00 AM: > > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote: > > > > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > > > > Anthony Liguori wrote on 26/11/2013 08:05:00 > > PM: > > > > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > > > Hi all, > > > > > > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > > > > > > developed Elvis, presented by Abel Gordon at the last KVM forum: > > > > > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > > > > > ELVIS slides: > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > > > > > > > > > > > > > According to the discussions that took place at the forum, > > upstreaming > > > > > > some of the Elvis approaches seems to be a good idea, which we > > would > > > > like > > > > > > to pursue. > > > > > > > > > > > > Our plan for the first patches is the following: > > > > > > > > > > > > 1.Shared vhost thread between mutiple devices > > > > > > This patch creates a worker thread and worker queue shared across > > > > multiple > > > > > > virtio devices > > > > > > We would like to modify the patch posted in > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > > > > > to limit a vhost thread to serve multiple devices only if they > > belong > > > > to > > > > > > the same VM as Paolo suggested to avoid isolation or cgroups > > concerns. > > > > > > > > > > > > Another modification is related to the creation and removal of > > vhost > > > > > > threads, which will be discussed next. > > > > > > > > > > I think this is an exceptionally bad idea. > > > > > > > > > > We shouldn't throw away isolation without exhausting every other > > > > > possibility. > > > > > > > > Seems you have missed the important details here. > > > > Anthony, we are aware you are concerned about isolation > > > > and you believe we should not share a single vhost thread across > > > > multiple VMs. That's why Razya proposed to change the patch > > > > so we will serve multiple virtio devices using a single vhost thread > > > > "only if the devices belong to the same VM". This series of patches > > > > will not allow two different VMs to share the same vhost thread. > > > > So, I don't see why this will be throwing away isolation and why > > > > this could be a "exceptionally bad idea". > > > > > > > > By the way, I remember that during the KVM forum a similar > > > > approach of having a single data plane thread for many devices > > > > was discussed > > > > > We've seen very positive results from adding threads. We should also > > > > > look at scheduling. > > > > > > > > ...and we have also seen exceptionally negative results from > > > > adding threads, both for vhost and data-plane. If you have lot of idle > > > > time/cores > > > > then it makes sense to run multiple threads. But IMHO in many scenarios > > you > > > > don't have lot of idle time/cores.. and if you have them you would > > probably > > > > prefer to run more VMs/VCPUshosting a single SMP VM when you have > > > > enough physical cores to run all the VCPU threads and the I/O threads > > is > > > > not a > > > > realistic scenario. 
> > > > > > > > That's why we are proposing to implement a mechanism that will enable > > > > the management stack to configure 1 thread per I/O device (as it is > > today) > > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > > > Once you are scheduling multiple guests in a single vhost device, you > > > > > now create a whole new class of DoS attacks in the best case > > scenario. > > > > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > > vhost thread. We are proposing to schedule multiple devices belonging > > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > > > > I guess a question then becomes why have multiple devices? > > > > I assume that there are guests that have multiple vhost devices > > (net or scsi/tcm). > > These are kind of uncommon though. In fact a kernel thread is not a > unit of isolation - cgroups supply isolation. > If we had use_cgroups kind of like use_mm, we could thinkably > do work for multiple VMs on the same thread. > > > > We can also extend the approach to consider > > multiqueue devices, so we can create 1 vhost thread shared for all the > > queues, > > 1 vhost thread for each queue or a few threads for multiple queues. We > > could also share a thread across multiple queues even if they do not belong > > to the same device. > > > > Remember the experiments Shirley Ma did with the split > > tx/rx ? If we have a control interface we could support both > > approaches: different threads or a single thread. > > > I'm a bit concerned about interface managing specifi
Re: Elvis upstreaming plan
Gleb Natapov wrote on 27/11/2013 11:21:59 AM: > On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote: > > > > > > Gleb Natapov wrote on 27/11/2013 09:35:01 AM: > > > > > On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote: > > > > > 4. vhost statistics > > > > > This patch introduces a set of statistics to monitor different > > > performance > > > > > metrics of vhost and our polling and I/O scheduling mechanisms. The > > > > > statistics are exposed using debugfs and can be easily displayed with > > a > > > > > Python script (vhost_stat, based on the old kvm_stats) > > > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > > > > > > > How about using trace points instead? Besides statistics, it can also > > > > help more in debugging. > > > Definitely. kvm_stats has moved to ftrace long time ago. > > > > > > > We should use trace points for debugging information but IMHO we should > > have a dedicated (and different) mechanism to expose data that can be > > easily consumed by a user-space (policy) application to control how many > > vhost threads we need or any other vhost feature we may introduce > > (e.g. polling). That's why we proposed something like vhost_stat > > based on sysfs. > > > > This is not like kvm_stat that can be replaced with tracepoints. Here > > we will like to expose data to "control" the system. So I would > > say what we are trying to do something that resembles the ksm interface > > implemented under /sys/kernel/mm/ksm/ > There are control operation and there are performance/statistic > gathering operations use /sys for former and ftrace for later. The fact > that you need /sys interface for other things does not mean you can > abuse it for statistics too. Agree. Any statistics that we add for debugging purposes should be implemented using tracepoints. But control and related data interfaces (that are not for debugging purposes) should be in sysfs. Look for example at /sys/kernel/mm/ksm/full_scans /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing /sys/kernel/mm/ksm/pages_to_scan /sys/kernel/mm/ksm/pages_unshared /sys/kernel/mm/ksm/pages_volatile -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote: > > > Gleb Natapov wrote on 27/11/2013 09:35:01 AM: > > > On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote: > > > > 4. vhost statistics > > > > This patch introduces a set of statistics to monitor different > > performance > > > > metrics of vhost and our polling and I/O scheduling mechanisms. The > > > > statistics are exposed using debugfs and can be easily displayed with > a > > > > Python script (vhost_stat, based on the old kvm_stats) > > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > > > > > How about using trace points instead? Besides statistics, it can also > > > help more in debugging. > > Definitely. kvm_stats has moved to ftrace long time ago. > > > > We should use trace points for debugging information but IMHO we should > have a dedicated (and different) mechanism to expose data that can be > easily consumed by a user-space (policy) application to control how many > vhost threads we need or any other vhost feature we may introduce > (e.g. polling). That's why we proposed something like vhost_stat > based on sysfs. > > This is not like kvm_stat that can be replaced with tracepoints. Here > we will like to expose data to "control" the system. So I would > say what we are trying to do something that resembles the ksm interface > implemented under /sys/kernel/mm/ksm/ There are control operation and there are performance/statistic gathering operations use /sys for former and ftrace for later. The fact that you need /sys interface for other things does not mean you can abuse it for statistics too. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Elvis upstreaming plan
Gleb Natapov wrote on 27/11/2013 09:35:01 AM: > On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote: > > > 4. vhost statistics > > > This patch introduces a set of statistics to monitor different > performance > > > metrics of vhost and our polling and I/O scheduling mechanisms. The > > > statistics are exposed using debugfs and can be easily displayed with a > > > Python script (vhost_stat, based on the old kvm_stats) > > > https://github.com/abelg/virtual_io_acceleration/commit/ > ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > > > How about using trace points instead? Besides statistics, it can also > > help more in debugging. > Definitely. kvm_stats has moved to ftrace long time ago. > We should use trace points for debugging information but IMHO we should have a dedicated (and different) mechanism to expose data that can be easily consumed by a user-space (policy) application to control how many vhost threads we need or any other vhost feature we may introduce (e.g. polling). That's why we proposed something like vhost_stat based on sysfs. This is not like kvm_stat that can be replaced with tracepoints. Here we would like to expose data to "control" the system. So I would say that what we are trying to do is something that resembles the ksm interface implemented under /sys/kernel/mm/ksm/ -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
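For comparison with the ksm files listed above, a minimal sketch of what a ksm-style control/data directory for vhost could look like as kernel sysfs attributes. The /sys/kernel/vhost/ location, the devs_per_worker tunable, and the workers counter are made up for illustration; they are not the interface proposed in the patches.

#include <linux/module.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>

static unsigned int devs_per_worker = 4;   /* writable knob */
static unsigned int workers;               /* read-only data */

static ssize_t devs_per_worker_show(struct kobject *kobj,
                                    struct kobj_attribute *attr, char *buf)
{
        return sprintf(buf, "%u\n", devs_per_worker);
}

static ssize_t devs_per_worker_store(struct kobject *kobj,
                                     struct kobj_attribute *attr,
                                     const char *buf, size_t count)
{
        int err = kstrtouint(buf, 10, &devs_per_worker);

        return err ? err : count;
}

static ssize_t workers_show(struct kobject *kobj,
                            struct kobj_attribute *attr, char *buf)
{
        return sprintf(buf, "%u\n", workers);
}

static struct kobj_attribute devs_per_worker_attr =
        __ATTR(devs_per_worker, 0644, devs_per_worker_show, devs_per_worker_store);
static struct kobj_attribute workers_attr =
        __ATTR(workers, 0444, workers_show, NULL);

static struct attribute *vhost_attrs[] = {
        &devs_per_worker_attr.attr,
        &workers_attr.attr,
        NULL,
};
static struct attribute_group vhost_attr_group = { .attrs = vhost_attrs };

static struct kobject *vhost_kobj;

static int __init vhost_sysfs_init(void)
{
        /* would show up as /sys/kernel/vhost/{devs_per_worker,workers} */
        vhost_kobj = kobject_create_and_add("vhost", kernel_kobj);
        if (!vhost_kobj)
                return -ENOMEM;
        return sysfs_create_group(vhost_kobj, &vhost_attr_group);
}
module_init(vhost_sysfs_init);

static void __exit vhost_sysfs_exit(void)
{
        kobject_put(vhost_kobj);
}
module_exit(vhost_sysfs_exit);
MODULE_LICENSE("GPL");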
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote: > > > "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > > > > Anthony Liguori wrote on 26/11/2013 08:05:00 > PM: > > > > > > > > > > > Razya Ladelsky writes: > > > > > > > > > Hi all, > > > > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > > > > > developed Elvis, presented by Abel Gordon at the last KVM forum: > > > > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > > > > ELVIS slides: > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > > > > > > > > > > According to the discussions that took place at the forum, > upstreaming > > > > > some of the Elvis approaches seems to be a good idea, which we > would > > > like > > > > > to pursue. > > > > > > > > > > Our plan for the first patches is the following: > > > > > > > > > > 1.Shared vhost thread between mutiple devices > > > > > This patch creates a worker thread and worker queue shared across > > > multiple > > > > > virtio devices > > > > > We would like to modify the patch posted in > > > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > > > > to limit a vhost thread to serve multiple devices only if they > belong > > > to > > > > > the same VM as Paolo suggested to avoid isolation or cgroups > concerns. > > > > > > > > > > Another modification is related to the creation and removal of > vhost > > > > > threads, which will be discussed next. > > > > > > > > I think this is an exceptionally bad idea. > > > > > > > > We shouldn't throw away isolation without exhausting every other > > > > possibility. > > > > > > Seems you have missed the important details here. > > > Anthony, we are aware you are concerned about isolation > > > and you believe we should not share a single vhost thread across > > > multiple VMs. That's why Razya proposed to change the patch > > > so we will serve multiple virtio devices using a single vhost thread > > > "only if the devices belong to the same VM". This series of patches > > > will not allow two different VMs to share the same vhost thread. > > > So, I don't see why this will be throwing away isolation and why > > > this could be a "exceptionally bad idea". > > > > > > By the way, I remember that during the KVM forum a similar > > > approach of having a single data plane thread for many devices > > > was discussed > > > > We've seen very positive results from adding threads. We should also > > > > look at scheduling. > > > > > > ...and we have also seen exceptionally negative results from > > > adding threads, both for vhost and data-plane. If you have lot of idle > > > time/cores > > > then it makes sense to run multiple threads. But IMHO in many scenarios > you > > > don't have lot of idle time/cores.. and if you have them you would > probably > > > prefer to run more VMs/VCPUshosting a single SMP VM when you have > > > enough physical cores to run all the VCPU threads and the I/O threads > is > > > not a > > > realistic scenario. > > > > > > That's why we are proposing to implement a mechanism that will enable > > > the management stack to configure 1 thread per I/O device (as it is > today) > > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > > > Once you are scheduling multiple guests in a single vhost device, you > > > > now create a whole new class of DoS attacks in the best case > scenario. 
> > > > > > Again, we are NOT proposing to schedule multiple guests in a single > > > vhost thread. We are proposing to schedule multiple devices belonging > > > to the same guest in a single (or multiple) vhost thread/s. > > > > > > > I guess a question then becomes why have multiple devices? > > I assume that there are guests that have multiple vhost devices > (net or scsi/tcm). These are kind of uncommon though. In fact a kernel thread is not a unit of isolation - cgroups supply isolation. If we had use_cgroups kind of like use_mm, we could thinkably do work for multiple VMs on the same thread. > We can also extend the approach to consider > multiqueue devices, so we can create 1 vhost thread shared for all the > queues, > 1 vhost thread for each queue or a few threads for multiple queues. We > could also share a thread across multiple queues even if they do not belong > to the same device. > > Remember the experiments Shirley Ma did with the split > tx/rx ? If we have a control interface we could support both > approaches: different threads or a single thread. I'm a bit concerned about interface managing specific threads being so low level. What exactly is it that management knows that makes it efficient to group threads together? That host is over-committed so we should use less CPU? I'd like the interface to express that knowledge. > > > > > > > > > > > > > 2. Sysfs mechanism to add and remove
Re: Elvis upstreaming plan
"Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > Anthony Liguori wrote on 26/11/2013 08:05:00 PM: > > > > > > > > Razya Ladelsky writes: > > > > > > > Hi all, > > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > > > > developed Elvis, presented by Abel Gordon at the last KVM forum: > > > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > > > ELVIS slides: > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > > > > > > > According to the discussions that took place at the forum, upstreaming > > > > some of the Elvis approaches seems to be a good idea, which we would > > like > > > > to pursue. > > > > > > > > Our plan for the first patches is the following: > > > > > > > > 1.Shared vhost thread between mutiple devices > > > > This patch creates a worker thread and worker queue shared across > > multiple > > > > virtio devices > > > > We would like to modify the patch posted in > > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > > > to limit a vhost thread to serve multiple devices only if they belong > > to > > > > the same VM as Paolo suggested to avoid isolation or cgroups concerns. > > > > > > > > Another modification is related to the creation and removal of vhost > > > > threads, which will be discussed next. > > > > > > I think this is an exceptionally bad idea. > > > > > > We shouldn't throw away isolation without exhausting every other > > > possibility. > > > > Seems you have missed the important details here. > > Anthony, we are aware you are concerned about isolation > > and you believe we should not share a single vhost thread across > > multiple VMs. That's why Razya proposed to change the patch > > so we will serve multiple virtio devices using a single vhost thread > > "only if the devices belong to the same VM". This series of patches > > will not allow two different VMs to share the same vhost thread. > > So, I don't see why this will be throwing away isolation and why > > this could be a "exceptionally bad idea". > > > > By the way, I remember that during the KVM forum a similar > > approach of having a single data plane thread for many devices > > was discussed > > > We've seen very positive results from adding threads. We should also > > > look at scheduling. > > > > ...and we have also seen exceptionally negative results from > > adding threads, both for vhost and data-plane. If you have lot of idle > > time/cores > > then it makes sense to run multiple threads. But IMHO in many scenarios you > > don't have lot of idle time/cores.. and if you have them you would probably > > prefer to run more VMs/VCPUshosting a single SMP VM when you have > > enough physical cores to run all the VCPU threads and the I/O threads is > > not a > > realistic scenario. > > > > That's why we are proposing to implement a mechanism that will enable > > the management stack to configure 1 thread per I/O device (as it is today) > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > Once you are scheduling multiple guests in a single vhost device, you > > > now create a whole new class of DoS attacks in the best case scenario. > > > > Again, we are NOT proposing to schedule multiple guests in a single > > vhost thread. We are proposing to schedule multiple devices belonging > > to the same guest in a single (or multiple) vhost thread/s. 
> > > > I guess a question then becomes why have multiple devices? I assume that there are guests that have multiple vhost devices (net or scsi/tcm). We can also extend the approach to consider multiqueue devices, so we can create 1 vhost thread shared for all the queues, 1 vhost thread for each queue or a few threads for multiple queues. We could also share a thread across multiple queues even if they do not belong to the same device. Remember the experiments Shirley Ma did with the split tx/rx ? If we have a control interface we could support both approaches: different threads or a single thread. > > > > > > > > > 2. Sysfs mechanism to add and remove vhost threads > > > > This patch allows us to add and remove vhost threads dynamically. > > > > > > > > A simpler way to control the creation of vhost threads is statically > > > > determining the maximum number of virtio devices per worker via a > > kernel > > > > module parameter (which is the way the previously mentioned patch is > > > > currently implemented) > > > > > > > > I'd like to ask for advice here about the more preferable way to go: > > > > Although having the sysfs mechanism provides more flexibility, it may > > be a > > > > good idea to start with a simple static parameter, and have the first > > > > patches as simple as possible. What do you think? > > > > > > > > 3.Add virtqueue polling mode to vhost > > > > Have the vhost thread poll the virtqueues with high I/O rate
Re: Elvis upstreaming plan
Gleb Natapov wrote on 27/11/2013 09:35:01 AM: > From: Gleb Natapov > To: Jason Wang , > Cc: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@vger.kernel.org, > anth...@codemonkey.ws, "Michael S. Tsirkin" , > pbonz...@redhat.com, as...@redhat.com, digitale...@google.com, > abel.gor...@gmail.com, Abel Gordon/Haifa/IBM@IBMIL, Eran Raichstein/ > Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, b...@redhat.com > Date: 27/11/2013 11:35 AM > Subject: Re: Elvis upstreaming plan > > On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote: > > > 4. vhost statistics > > > This patch introduces a set of statistics to monitor different > performance > > > metrics of vhost and our polling and I/O scheduling mechanisms. The > > > statistics are exposed using debugfs and can be easily displayed with a > > > Python script (vhost_stat, based on the old kvm_stats) > > > https://github.com/abelg/virtual_io_acceleration/commit/ > ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > > > How about using trace points instead? Besides statistics, it can also > > help more in debugging. > Definitely. kvm_stats has moved to ftrace long time ago. > > -- > Gleb. > Ok - we will look at this newer mechanism. Joel Nider Virtualization Research IBM Research and Development Haifa Research Lab Phone: 972-4-829-6326 | Mobile: 972-54-3155635 E-mail: jo...@il.ibm.com
Re: Elvis upstreaming plan
Hi, Razya is out for a few days, so I will try to answer the questions as well as I can: "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM: > From: "Michael S. Tsirkin" > To: Abel Gordon/Haifa/IBM@IBMIL, > Cc: Anthony Liguori , abel.gor...@gmail.com, > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ > Haifa/IBM@IBMIL > Date: 27/11/2013 01:08 AM > Subject: Re: Elvis upstreaming plan > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > > > > Anthony Liguori wrote on 26/11/2013 08:05:00 PM: > > > > > > > > Razya Ladelsky writes: > > > > > > > That's why we are proposing to implement a mechanism that will enable > > the management stack to configure 1 thread per I/O device (as it is today) > > or 1 thread for many I/O devices (belonging to the same VM). > > > > > Once you are scheduling multiple guests in a single vhost device, you > > > now create a whole new class of DoS attacks in the best case scenario. > > > > Again, we are NOT proposing to schedule multiple guests in a single > > vhost thread. We are proposing to schedule multiple devices belonging > > to the same guest in a single (or multiple) vhost thread/s. > > > > I guess a question then becomes why have multiple devices? If you mean "why serve multiple devices from a single thread" the answer is that we cannot rely on the Linux scheduler which has no knowledge of I/O queues to do a decent job of scheduling I/O. The idea is to take over the I/O scheduling responsibilities from the kernel's thread scheduler with a more efficient I/O scheduler inside each vhost thread. So by combining all of the I/O devices from the same guest (disks, network cards, etc) in a single I/O thread, it allows us to provide better scheduling by giving us more knowledge of the nature of the work. So now instead of relying on the linux scheduler to perform context switches between multiple vhost threads, we have a single thread context in which we can do the I/O scheduling more efficiently. We can closely monitor the performance needs of each queue of each device inside the vhost thread which gives us much more information than relying on the kernel's thread scheduler. This does not expose any additional opportunities for attacks (DoS or other) than are already available since all of the I/O traffic belongs to a single guest. You can make the argument that with low I/O loads this mechanism may not make much difference. However when you try to maximize the utilization of your hardware (such as in a commercial scenario) this technique can gain you a large benefit. Regards, Joel Nider Virtualization Research IBM Research and Development Haifa Research Lab Phone: 972-4-829-6326 | Mobile: 972-54-3155635 (Embedded image moved to file: E-mail: jo...@il.ibm.com pic31578.gif)IBM > > > > Hi all, > > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > > > > developed Elvis, presented by Abel Gordon at the last KVM forum: > > > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > > > ELVIS slides: > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > > > > > > > According to the discussions that took place at the forum, upstreaming > > > > some of the Elvis approaches seems to be a good idea, which we would > > like > > > > to pursue. 
> > > > > > > > Our plan for the first patches is the following: > > > > > > > > 1.Shared vhost thread between mutiple devices > > > > This patch creates a worker thread and worker queue shared across > > multiple > > > > virtio devices > > > > We would like to modify the patch posted in > > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > > > to limit a vhost thread to serve multiple devices only if they belong > > to > > > > the same VM as Paolo s
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote: > > 4. vhost statistics > > This patch introduces a set of statistics to monitor different performance > > metrics of vhost and our polling and I/O scheduling mechanisms. The > > statistics are exposed using debugfs and can be easily displayed with a > > Python script (vhost_stat, based on the old kvm_stats) > > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > How about using trace points instead? Besides statistics, it can also > help more in debugging. Definitely. kvm_stats has moved to ftrace long time ago. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
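For the "use trace points instead" suggestion, here is a sketch of what one vhost tracepoint could look like using the standard TRACE_EVENT boilerplate. The event name (vhost_poll_stop) and its fields are illustrative only, not taken from an actual patch; such a definition would live in a header like trace/events/vhost.h and be emitted from the worker as trace_vhost_poll_stop(...).

#undef TRACE_SYSTEM
#define TRACE_SYSTEM vhost

#if !defined(_TRACE_VHOST_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_VHOST_H

#include <linux/tracepoint.h>

TRACE_EVENT(vhost_poll_stop,

        TP_PROTO(unsigned int vq_id, u64 wasted_cycles, u64 found_work),

        TP_ARGS(vq_id, wasted_cycles, found_work),

        TP_STRUCT__entry(
                __field(unsigned int, vq_id)
                __field(u64, wasted_cycles)
                __field(u64, found_work)
        ),

        TP_fast_assign(
                __entry->vq_id = vq_id;
                __entry->wasted_cycles = wasted_cycles;
                __entry->found_work = found_work;
        ),

        TP_printk("vq=%u wasted=%llu found=%llu",
                  __entry->vq_id,
                  (unsigned long long)__entry->wasted_cycles,
                  (unsigned long long)__entry->found_work)
);

#endif /* _TRACE_VHOST_H */

#include <trace/define_trace.h>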
Re: Elvis upstreaming plan
On 11/24/2013 05:22 PM, Razya Ladelsky wrote: > Hi all, > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > developed Elvis, presented by Abel Gordon at the last KVM forum: > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > According to the discussions that took place at the forum, upstreaming > some of the Elvis approaches seems to be a good idea, which we would like > to pursue. > > Our plan for the first patches is the following: > > 1.Shared vhost thread between mutiple devices > This patch creates a worker thread and worker queue shared across multiple > virtio devices > We would like to modify the patch posted in > https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > to limit a vhost thread to serve multiple devices only if they belong to > the same VM as Paolo suggested to avoid isolation or cgroups concerns. > > Another modification is related to the creation and removal of vhost > threads, which will be discussed next. > > 2. Sysfs mechanism to add and remove vhost threads > This patch allows us to add and remove vhost threads dynamically. > > A simpler way to control the creation of vhost threads is statically > determining the maximum number of virtio devices per worker via a kernel > module parameter (which is the way the previously mentioned patch is > currently implemented) Any chance we can re-use the cwmq instead of inventing another mechanism? Looks like there're lots of function duplication here. Bandan has an RFC to do this. > > I'd like to ask for advice here about the more preferable way to go: > Although having the sysfs mechanism provides more flexibility, it may be a > good idea to start with a simple static parameter, and have the first > patches as simple as possible. What do you think? > > 3.Add virtqueue polling mode to vhost > Have the vhost thread poll the virtqueues with high I/O rate for new > buffers , and avoid asking the guest to kick us. > https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0 Maybe we can make poll_stop_idle adaptive which may help the light load case. Consider guest is often slow than vhost, if we just have one or two vms, polling too much may waste cpu in this case. > 4. vhost statistics > This patch introduces a set of statistics to monitor different performance > metrics of vhost and our polling and I/O scheduling mechanisms. The > statistics are exposed using debugfs and can be easily displayed with a > Python script (vhost_stat, based on the old kvm_stats) > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 How about using trace points instead? Besides statistics, it can also help more in debugging. > > 5. Add heuristics to improve I/O scheduling > This patch enhances the round-robin mechanism with a set of heuristics to > decide when to leave a virtqueue and proceed to the next. > https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d > > This patch improves the handling of the requests by the vhost thread, but > could perhaps be delayed to a > later time , and not submitted as one of the first Elvis patches. > I'd love to hear some comments about whether this patch needs to be part > of the first submission. 
> > Any other feedback on this plan will be appreciated, > Thank you, > Razya > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
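On the suggestion to reuse cmwq rather than invent a new worker mechanism, a minimal sketch of queuing vhost work items onto one shared workqueue, including the "to poll you just requeue work" trick mentioned in the thread. The workqueue name, the vq_work structure, and the stubbed handler are illustrative assumptions, not the Elvis code.

#include <linux/module.h>
#include <linux/workqueue.h>

struct vq_work {
        struct work_struct work;
        void *vq;                       /* the virtqueue this item serves */
};

static struct workqueue_struct *vhost_wq;

/* Stub: the real handler would drain the vring and return true while it
 * expects more descriptors to show up soon. */
static bool vq_handle_pending(void *vq)
{
        return false;
}

static void vq_work_fn(struct work_struct *work)
{
        struct vq_work *vw = container_of(work, struct vq_work, work);

        /* Polling by requeueing: stay on the workqueue instead of
         * re-enabling guest kicks while the ring looks busy. */
        if (vq_handle_pending(vw->vq))
                queue_work(vhost_wq, &vw->work);
}

static void vq_work_init(struct vq_work *vw, void *vq)
{
        vw->vq = vq;
        INIT_WORK(&vw->work, vq_work_fn);
}

static int __init vhost_wq_init(void)
{
        /* one unbound workqueue shared by the virtqueues of a VM */
        vhost_wq = alloc_workqueue("vhost-shared", WQ_UNBOUND, 0);
        return vhost_wq ? 0 : -ENOMEM;
}
module_init(vhost_wq_init);
MODULE_LICENSE("GPL");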
Re: Elvis upstreaming plan
Razya Ladelsky writes: > Hi all, > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > developed Elvis, presented by Abel Gordon at the last KVM forum: > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > According to the discussions that took place at the forum, upstreaming > some of the Elvis approaches seems to be a good idea, which we would like > to pursue. > > Our plan for the first patches is the following: > > 1.Shared vhost thread between mutiple devices > This patch creates a worker thread and worker queue shared across multiple > virtio devices > We would like to modify the patch posted in > https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > to limit a vhost thread to serve multiple devices only if they belong to > the same VM as Paolo suggested to avoid isolation or cgroups concerns. > > Another modification is related to the creation and removal of vhost > threads, which will be discussed next. > > 2. Sysfs mechanism to add and remove vhost threads > This patch allows us to add and remove vhost threads dynamically. > > A simpler way to control the creation of vhost threads is statically > determining the maximum number of virtio devices per worker via a kernel > module parameter (which is the way the previously mentioned patch is > currently implemented) Does the sysfs interface aim to let the _user_ control the maximum number of devices per vhost thread or/and let the user create and destroy worker threads at will ? Setting the limit on the number of devices makes sense but I am not sure if there is any reason to actually expose an interface to create or destroy workers. Also, it might be worthwhile to think if it's better to just let the worker thread stay around (hoping it might be used again in the future) rather then destroying it.. > I'd like to ask for advice here about the more preferable way to go: > Although having the sysfs mechanism provides more flexibility, it may be a > good idea to start with a simple static parameter, and have the first > patches as simple as possible. What do you think? I am actually inclined more towards a static limit. I think that in a typical setup, the user will set this for his/her environment just once at load time and forget about it. Bandan > 3.Add virtqueue polling mode to vhost > Have the vhost thread poll the virtqueues with high I/O rate for new > buffers , and avoid asking the guest to kick us. > https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0 > > 4. vhost statistics > This patch introduces a set of statistics to monitor different performance > metrics of vhost and our polling and I/O scheduling mechanisms. The > statistics are exposed using debugfs and can be easily displayed with a > Python script (vhost_stat, based on the old kvm_stats) > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > > 5. Add heuristics to improve I/O scheduling > This patch enhances the round-robin mechanism with a set of heuristics to > decide when to leave a virtqueue and proceed to the next. > https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d > > This patch improves the handling of the requests by the vhost thread, but > could perhaps be delayed to a > later time , and not submitted as one of the first Elvis patches. 
> I'd love to hear some comments about whether this patch needs to be part > of the first submission. > > Any other feedback on this plan will be appreciated, > Thank you, > Razya
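For concreteness, here is a minimal sketch of what the static limit Bandan favours could look like as a vhost module parameter. The parameter name devs_per_worker, the vhost_shared_worker struct and the helper are made up for illustration; they are not taken from the patch linked above.

#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/sched.h>

/* Illustrative upper bound on how many virtio devices one worker serves. */
static unsigned int devs_per_worker = 4;
module_param(devs_per_worker, uint, 0444);
MODULE_PARM_DESC(devs_per_worker,
                 "Maximum number of virtio devices sharing one vhost worker thread");

/* Illustrative stand-in for the shared worker such a patch would introduce. */
struct vhost_shared_worker {
        struct task_struct *task;
        unsigned int num_devices;
};

/* A new device may join this worker only while it is under the limit;
 * otherwise the caller spawns another worker thread. */
static bool worker_has_capacity(const struct vhost_shared_worker *w)
{
        return w->num_devices < devs_per_worker;
}

With a read-only permission such as 0444 the value is fixed at module load time (vhost_net.devs_per_worker=...), which matches the "set once and forget" usage Bandan describes, while a sysfs/writeable variant would be needed for runtime changes.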
Re: Elvis upstreaming plan
On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: > > > Anthony Liguori wrote on 26/11/2013 08:05:00 PM: > > > > > Razya Ladelsky writes: > > > > > Hi all, > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > > > developed Elvis, presented by Abel Gordon at the last KVM forum: > > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > > ELVIS slides: > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > > > > According to the discussions that took place at the forum, upstreaming > > > some of the Elvis approaches seems to be a good idea, which we would > like > > > to pursue. > > > > > > Our plan for the first patches is the following: > > > > > > 1.Shared vhost thread between mutiple devices > > > This patch creates a worker thread and worker queue shared across > multiple > > > virtio devices > > > We would like to modify the patch posted in > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > > to limit a vhost thread to serve multiple devices only if they belong > to > > > the same VM as Paolo suggested to avoid isolation or cgroups concerns. > > > > > > Another modification is related to the creation and removal of vhost > > > threads, which will be discussed next. > > > > I think this is an exceptionally bad idea. > > > > We shouldn't throw away isolation without exhausting every other > > possibility. > > Seems you have missed the important details here. > Anthony, we are aware you are concerned about isolation > and you believe we should not share a single vhost thread across > multiple VMs. That's why Razya proposed to change the patch > so we will serve multiple virtio devices using a single vhost thread > "only if the devices belong to the same VM". This series of patches > will not allow two different VMs to share the same vhost thread. > So, I don't see why this will be throwing away isolation and why > this could be a "exceptionally bad idea". > > By the way, I remember that during the KVM forum a similar > approach of having a single data plane thread for many devices > was discussed > > We've seen very positive results from adding threads. We should also > > look at scheduling. > > ...and we have also seen exceptionally negative results from > adding threads, both for vhost and data-plane. If you have lot of idle > time/cores > then it makes sense to run multiple threads. But IMHO in many scenarios you > don't have lot of idle time/cores.. and if you have them you would probably > prefer to run more VMs/VCPUshosting a single SMP VM when you have > enough physical cores to run all the VCPU threads and the I/O threads is > not a > realistic scenario. > > That's why we are proposing to implement a mechanism that will enable > the management stack to configure 1 thread per I/O device (as it is today) > or 1 thread for many I/O devices (belonging to the same VM). > > > Once you are scheduling multiple guests in a single vhost device, you > > now create a whole new class of DoS attacks in the best case scenario. > > Again, we are NOT proposing to schedule multiple guests in a single > vhost thread. We are proposing to schedule multiple devices belonging > to the same guest in a single (or multiple) vhost thread/s. > I guess a question then becomes why have multiple devices? > > > > > 2. Sysfs mechanism to add and remove vhost threads > > > This patch allows us to add and remove vhost threads dynamically. 
> > > > > > A simpler way to control the creation of vhost threads is statically > > > determining the maximum number of virtio devices per worker via a > kernel > > > module parameter (which is the way the previously mentioned patch is > > > currently implemented) > > > > > > I'd like to ask for advice here about the more preferable way to go: > > > Although having the sysfs mechanism provides more flexibility, it may > be a > > > good idea to start with a simple static parameter, and have the first > > > patches as simple as possible. What do you think? > > > > > > 3.Add virtqueue polling mode to vhost > > > Have the vhost thread poll the virtqueues with high I/O rate for new > > > buffers , and avoid asking the guest to kick us. > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > 26616133fafb7855cc80fac070b0572fd1aaf5d0 > > > > Ack on this. > > :) > > Regards, > Abel. > > > > > Regards, > > > > Anthony Liguori > > > > > 4. vhost statistics > > > This patch introduces a set of statistics to monitor different > performance > > > metrics of vhost and our polling and I/O scheduling mechanisms. The > > > statistics are exposed using debugfs and can be easily displayed with a > > > > Python script (vhost_stat, based on the old kvm_stats) > > > https://github.com/abelg/virtual_io_acceleration/commit/ > > ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > > > > > > > > 5. Add heuristics to improve I/O scheduling > > > This patch enhances
Re: Elvis upstreaming plan
Anthony Liguori wrote on 26/11/2013 08:05:00 PM: > > Razya Ladelsky writes: > > > Hi all, > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > > developed Elvis, presented by Abel Gordon at the last KVM forum: > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > According to the discussions that took place at the forum, upstreaming > > some of the Elvis approaches seems to be a good idea, which we would like > > to pursue. > > > > Our plan for the first patches is the following: > > > > 1. Shared vhost thread between multiple devices > > This patch creates a worker thread and worker queue shared across multiple > > virtio devices > > We would like to modify the patch posted in > > https://github.com/abelg/virtual_io_acceleration/commit/ > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > to limit a vhost thread to serve multiple devices only if they belong to > > the same VM as Paolo suggested to avoid isolation or cgroups concerns. > > > > Another modification is related to the creation and removal of vhost > > threads, which will be discussed next. > > I think this is an exceptionally bad idea. > > We shouldn't throw away isolation without exhausting every other > possibility. Seems you have missed the important details here. Anthony, we are aware you are concerned about isolation and you believe we should not share a single vhost thread across multiple VMs. That's why Razya proposed to change the patch so we will serve multiple virtio devices using a single vhost thread "only if the devices belong to the same VM". This series of patches will not allow two different VMs to share the same vhost thread. So, I don't see why this will be throwing away isolation and why this could be an "exceptionally bad idea". By the way, I remember that during the KVM forum a similar approach of having a single data plane thread for many devices was discussed. > We've seen very positive results from adding threads. We should also > look at scheduling. ...and we have also seen exceptionally negative results from adding threads, both for vhost and data-plane. If you have a lot of idle time/cores then it makes sense to run multiple threads. But IMHO in many scenarios you don't have a lot of idle time/cores, and if you have them you would probably prefer to run more VMs/VCPUs. Hosting a single SMP VM when you have enough physical cores to run all the VCPU threads and the I/O threads is not a realistic scenario. That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM). > Once you are scheduling multiple guests in a single vhost device, you > now create a whole new class of DoS attacks in the best case scenario. Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost thread/s. > > > 2. Sysfs mechanism to add and remove vhost threads > > This patch allows us to add and remove vhost threads dynamically.
> > > > A simpler way to control the creation of vhost threads is statically > > determining the maximum number of virtio devices per worker via a kernel > > module parameter (which is the way the previously mentioned patch is > > currently implemented) > > > > I'd like to ask for advice here about the more preferable way to go: > > Although having the sysfs mechanism provides more flexibility, it may be a > > good idea to start with a simple static parameter, and have the first > > patches as simple as possible. What do you think? > > > > 3.Add virtqueue polling mode to vhost > > Have the vhost thread poll the virtqueues with high I/O rate for new > > buffers , and avoid asking the guest to kick us. > > https://github.com/abelg/virtual_io_acceleration/commit/ > 26616133fafb7855cc80fac070b0572fd1aaf5d0 > > Ack on this. :) Regards, Abel. > > Regards, > > Anthony Liguori > > > 4. vhost statistics > > This patch introduces a set of statistics to monitor different performance > > metrics of vhost and our polling and I/O scheduling mechanisms. The > > statistics are exposed using debugfs and can be easily displayed with a > > Python script (vhost_stat, based on the old kvm_stats) > > https://github.com/abelg/virtual_io_acceleration/commit/ > ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > > > > > 5. Add heuristics to improve I/O scheduling > > This patch enhances the round-robin mechanism with a set of heuristics to > > decide when to leave a virtqueue and proceed to the next. > > https://github.com/abelg/virtual_io_acceleration/commit/ > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d > > > > This patch improves the handling of the requests by the vhost thread, but > > could perhaps be delayed to a > > later time , and not submitted as
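To make the "one worker per VM" policy argued above concrete, here is a rough sketch of how a vhost device setup path might reuse a worker owned by the same QEMU process (i.e. the same VM) instead of spawning a new thread per device. All names here (elvis_worker, elvis_get_worker) and the list handling are assumptions for illustration only; the referenced commit may structure this quite differently.

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/sched.h>
#include <linux/slab.h>

/* Illustrative shared worker: one per VM (per owner mm), not one per device. */
struct elvis_worker {
        struct list_head list;
        struct mm_struct *owner_mm;     /* identifies the VM (the QEMU process) */
        struct task_struct *task;
        unsigned int num_devices;
};

static LIST_HEAD(elvis_workers);
static DEFINE_MUTEX(elvis_workers_lock);

static int worker_fn(void *data)
{
        /* Placeholder: the real loop round-robins over the worker's virtqueues. */
        while (!kthread_should_stop()) {
                set_current_state(TASK_INTERRUPTIBLE);
                schedule();
        }
        return 0;
}

/* Find (or create) the worker serving this VM.  Devices of different VMs
 * never share a worker, which keeps the per-VM isolation boundary intact. */
static struct elvis_worker *elvis_get_worker(struct mm_struct *mm)
{
        struct elvis_worker *w;

        mutex_lock(&elvis_workers_lock);
        list_for_each_entry(w, &elvis_workers, list) {
                if (w->owner_mm == mm) {
                        w->num_devices++;
                        mutex_unlock(&elvis_workers_lock);
                        return w;
                }
        }

        w = kzalloc(sizeof(*w), GFP_KERNEL);
        if (w) {
                w->owner_mm = mm;
                w->num_devices = 1;
                w->task = kthread_run(worker_fn, w, "vhost-%d", current->pid);
                if (IS_ERR(w->task)) {
                        kfree(w);
                        w = NULL;
                } else {
                        list_add(&w->list, &elvis_workers);
                }
        }
        mutex_unlock(&elvis_workers_lock);
        return w;
}

Because workers are keyed by the owning mm, two different VMs can never end up behind the same thread, which is the isolation property Abel points to above; a per-worker device limit (as in the previous sketch) could be added to the capacity check.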
Re: Elvis upstreaming plan
Razya Ladelsky writes: > Hi all, > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > developed Elvis, presented by Abel Gordon at the last KVM forum: > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > According to the discussions that took place at the forum, upstreaming > some of the Elvis approaches seems to be a good idea, which we would like > to pursue. > > Our plan for the first patches is the following: > > 1.Shared vhost thread between mutiple devices > This patch creates a worker thread and worker queue shared across multiple > virtio devices > We would like to modify the patch posted in > https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > to limit a vhost thread to serve multiple devices only if they belong to > the same VM as Paolo suggested to avoid isolation or cgroups concerns. > > Another modification is related to the creation and removal of vhost > threads, which will be discussed next. I think this is an exceptionally bad idea. We shouldn't throw away isolation without exhausting every other possibility. We've seen very positive results from adding threads. We should also look at scheduling. Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario. > 2. Sysfs mechanism to add and remove vhost threads > This patch allows us to add and remove vhost threads dynamically. > > A simpler way to control the creation of vhost threads is statically > determining the maximum number of virtio devices per worker via a kernel > module parameter (which is the way the previously mentioned patch is > currently implemented) > > I'd like to ask for advice here about the more preferable way to go: > Although having the sysfs mechanism provides more flexibility, it may be a > good idea to start with a simple static parameter, and have the first > patches as simple as possible. What do you think? > > 3.Add virtqueue polling mode to vhost > Have the vhost thread poll the virtqueues with high I/O rate for new > buffers , and avoid asking the guest to kick us. > https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0 Ack on this. Regards, Anthony Liguori > 4. vhost statistics > This patch introduces a set of statistics to monitor different performance > metrics of vhost and our polling and I/O scheduling mechanisms. The > statistics are exposed using debugfs and can be easily displayed with a > Python script (vhost_stat, based on the old kvm_stats) > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > > 5. Add heuristics to improve I/O scheduling > This patch enhances the round-robin mechanism with a set of heuristics to > decide when to leave a virtqueue and proceed to the next. > https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d > > This patch improves the handling of the requests by the vhost thread, but > could perhaps be delayed to a > later time , and not submitted as one of the first Elvis patches. > I'd love to hear some comments about whether this patch needs to be part > of the first submission. 
> > Any other feedback on this plan will be appreciated, > Thank you, > Razya
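A simplified sketch of the polling idea Anthony acks above: for a busy virtqueue the worker disables guest kicks and checks the ring itself, falling back to notifications once the queue goes idle. vhost_disable_notify()/vhost_enable_notify() are the existing helpers in drivers/vhost/vhost.h; vq_has_work(), serve_vq() and the idle threshold are illustrative assumptions, not the code in the linked commit.

#include <linux/ktime.h>
#include <linux/sched.h>
#include "vhost.h"      /* in-tree drivers/vhost/vhost.h: vhost_dev, vhost_virtqueue */

/* Illustrative knob: how long to keep polling a queue that has gone idle. */
#define POLL_IDLE_LIMIT_NS      (100 * 1000)    /* 100us, arbitrary */

/* Hypothetical helpers standing in for "the avail ring has unconsumed
 * buffers" and "process one batch of buffers". */
static bool vq_has_work(struct vhost_virtqueue *vq);
static void serve_vq(struct vhost_dev *dev, struct vhost_virtqueue *vq);

static void poll_vq(struct vhost_dev *dev, struct vhost_virtqueue *vq)
{
        u64 idle_since = 0;

        /* Stop the guest from kicking us; the worker looks for work itself. */
        vhost_disable_notify(dev, vq);

        for (;;) {
                if (vq_has_work(vq)) {
                        serve_vq(dev, vq);
                        idle_since = 0;
                        continue;
                }
                if (!idle_since)
                        idle_since = ktime_to_ns(ktime_get());
                else if (ktime_to_ns(ktime_get()) - idle_since > POLL_IDLE_LIMIT_NS)
                        break;
                cpu_relax();
        }

        /* The queue went quiet: fall back to guest notifications ("kicks"). */
        if (vhost_enable_notify(dev, vq))
                serve_vq(dev, vq);      /* work raced in while re-enabling */
}

The trade-off is the usual one: polling removes the exit/notification cost on busy queues but burns CPU on idle ones, which is why the idle limit (and which queues to poll at all) ends up being a tuning decision.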
Re: Elvis upstreaming plan
On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote: > 5. Add heuristics to improve I/O scheduling > This patch enhances the round-robin mechanism with a set of heuristics to > decide when to leave a virtqueue and proceed to the next. > https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d This patch should probably do something portable instead of relying on x86-only rdtscll(). Stefan
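A portable variant of such a time-budget check could use one of the kernel's generic clocks, e.g. local_clock() (or ktime_get()), instead of reading the TSC directly. The slice value and the helper below are illustrative assumptions, not the actual heuristics patch:

#include <linux/sched.h>        /* local_clock() */

struct vhost_virtqueue;         /* from drivers/vhost/vhost.h */

/* Illustrative per-queue budget before the worker moves to the next queue. */
#define VQ_TIME_SLICE_NS        (50 * 1000)     /* 50us, arbitrary */

/* Hypothetical helper: handle one pending request, return false when empty. */
static bool handle_one_request(struct vhost_virtqueue *vq);

/* Serve one virtqueue until it is empty or its time slice is used up.
 * local_clock() is a monotonic-per-cpu nanosecond counter available on
 * every architecture, so no raw rdtscll() is needed. */
static void serve_vq_slice(struct vhost_virtqueue *vq)
{
        u64 start = local_clock();

        while (handle_one_request(vq)) {
                if (local_clock() - start > VQ_TIME_SLICE_NS)
                        break;
        }
}

local_clock() is cheap because it avoids cross-CPU synchronization, at the cost of small per-cpu drift, which is usually acceptable for a scheduling heuristic like this one.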
Re: Elvis upstreaming plan
"Michael S. Tsirkin" wrote on 24/11/2013 12:26:15 PM: > From: "Michael S. Tsirkin" > To: Razya Ladelsky/Haifa/IBM@IBMIL, > Cc: kvm@vger.kernel.org, anth...@codemonkey.ws, g...@redhat.com, > pbonz...@redhat.com, as...@redhat.com, jasow...@redhat.com, > digitale...@google.com, abel.gor...@gmail.com, Abel Gordon/Haifa/ > IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL > Date: 24/11/2013 12:22 PM > Subject: Re: Elvis upstreaming plan > > On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote: > > Hi all, > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > > developed Elvis, presented by Abel Gordon at the last KVM forum: > > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > > > > According to the discussions that took place at the forum, upstreaming > > some of the Elvis approaches seems to be a good idea, which we would like > > to pursue. > > > > Our plan for the first patches is the following: > > > > 1.Shared vhost thread between mutiple devices > > This patch creates a worker thread and worker queue shared across multiple > > virtio devices > > We would like to modify the patch posted in > > https://github.com/abelg/virtual_io_acceleration/commit/ > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > to limit a vhost thread to serve multiple devices only if they belong to > > the same VM as Paolo suggested to avoid isolation or cgroups concerns. > > > > Another modification is related to the creation and removal of vhost > > threads, which will be discussed next. > > > > 2. Sysfs mechanism to add and remove vhost threads > > This patch allows us to add and remove vhost threads dynamically. > > > > A simpler way to control the creation of vhost threads is statically > > determining the maximum number of virtio devices per worker via a kernel > > module parameter (which is the way the previously mentioned patch is > > currently implemented) > > > > I'd like to ask for advice here about the more preferable way to go: > > Although having the sysfs mechanism provides more flexibility, it may be a > > good idea to start with a simple static parameter, and have the first > > patches as simple as possible. What do you think? > > > > 3.Add virtqueue polling mode to vhost > > Have the vhost thread poll the virtqueues with high I/O rate for new > > buffers , and avoid asking the guest to kick us. > > https://github.com/abelg/virtual_io_acceleration/commit/ > 26616133fafb7855cc80fac070b0572fd1aaf5d0 > > > > 4. vhost statistics > > This patch introduces a set of statistics to monitor different performance > > metrics of vhost and our polling and I/O scheduling mechanisms. The > > statistics are exposed using debugfs and can be easily displayed with a > > Python script (vhost_stat, based on the old kvm_stats) > > https://github.com/abelg/virtual_io_acceleration/commit/ > ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > > > > > 5. Add heuristics to improve I/O scheduling > > This patch enhances the round-robin mechanism with a set of heuristics to > > decide when to leave a virtqueue and proceed to the next. > > https://github.com/abelg/virtual_io_acceleration/commit/ > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d > > > > This patch improves the handling of the requests by the vhost thread, but > > could perhaps be delayed to a > > later time , and not submitted as one of the first Elvis patches. 
> > I'd love to hear some comments about whether this patch needs to be part > > of the first submission. > > > > Any other feedback on this plan will be appreciated, > > Thank you, > > Razya > > > How about we start with the stats patch? > This will make it easier to evaluate the other patches. > Hi Michael, Thank you for your quick reply. Our plan was to send all these patches that contain the Elvis code. We can start with the stats patch, however, many of the statistics there are related to the features that the other patches provide... B.T.W. if you have a chance to look at the rest of the patches, I'd really appreciate your comments, Thank you very much, Razya
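Since the statistics patch may go in first, here is a minimal example of exporting counters through debugfs in the style that kvm_stat-like scripts consume. The directory and counter names are made up for illustration and are not the ones in the linked commit.

#include <linux/module.h>
#include <linux/debugfs.h>

/* Illustrative counters; the real patch exposes many more. */
static u64 polled_buffers;
static u64 notif_kicks;

static struct dentry *vhost_stats_dir;

static int __init vhost_stats_init(void)
{
        vhost_stats_dir = debugfs_create_dir("vhost-stats", NULL);
        if (!vhost_stats_dir)
                return -ENOMEM;

        /* Each counter shows up as a read-only file under
         * /sys/kernel/debug/vhost-stats/. */
        debugfs_create_u64("polled_buffers", 0444, vhost_stats_dir,
                           &polled_buffers);
        debugfs_create_u64("notif_kicks", 0444, vhost_stats_dir,
                           &notif_kicks);
        return 0;
}

static void __exit vhost_stats_exit(void)
{
        debugfs_remove_recursive(vhost_stats_dir);
}

module_init(vhost_stats_init);
module_exit(vhost_stats_exit);
MODULE_LICENSE("GPL");

A userspace tool then just reads the files under /sys/kernel/debug/vhost-stats/ periodically and displays deltas, which is essentially the model the vhost_stat script described above follows.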
Re: Elvis upstreaming plan
On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote: > Hi all, > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which > developed Elvis, presented by Abel Gordon at the last KVM forum: > ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE > > > According to the discussions that took place at the forum, upstreaming > some of the Elvis approaches seems to be a good idea, which we would like > to pursue. > > Our plan for the first patches is the following: > > 1.Shared vhost thread between mutiple devices > This patch creates a worker thread and worker queue shared across multiple > virtio devices > We would like to modify the patch posted in > https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 > > to limit a vhost thread to serve multiple devices only if they belong to > the same VM as Paolo suggested to avoid isolation or cgroups concerns. > > Another modification is related to the creation and removal of vhost > threads, which will be discussed next. > > 2. Sysfs mechanism to add and remove vhost threads > This patch allows us to add and remove vhost threads dynamically. > > A simpler way to control the creation of vhost threads is statically > determining the maximum number of virtio devices per worker via a kernel > module parameter (which is the way the previously mentioned patch is > currently implemented) > > I'd like to ask for advice here about the more preferable way to go: > Although having the sysfs mechanism provides more flexibility, it may be a > good idea to start with a simple static parameter, and have the first > patches as simple as possible. What do you think? > > 3.Add virtqueue polling mode to vhost > Have the vhost thread poll the virtqueues with high I/O rate for new > buffers , and avoid asking the guest to kick us. > https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0 > > 4. vhost statistics > This patch introduces a set of statistics to monitor different performance > metrics of vhost and our polling and I/O scheduling mechanisms. The > statistics are exposed using debugfs and can be easily displayed with a > Python script (vhost_stat, based on the old kvm_stats) > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 > > > 5. Add heuristics to improve I/O scheduling > This patch enhances the round-robin mechanism with a set of heuristics to > decide when to leave a virtqueue and proceed to the next. > https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d > > This patch improves the handling of the requests by the vhost thread, but > could perhaps be delayed to a > later time , and not submitted as one of the first Elvis patches. > I'd love to hear some comments about whether this patch needs to be part > of the first submission. > > Any other feedback on this plan will be appreciated, > Thank you, > Razya How about we start with the stats patch? This will make it easier to evaluate the other patches. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html