Re: Elvis upstreaming plan
On Thu, Nov 28, 2013 at 09:31:50AM +0200, Abel Gordon wrote:

Stefan Hajnoczi stefa...@gmail.com wrote on 27/11/2013 05:00:53 PM:

On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: Hi, Razya is out for a few days, so I will try to answer the questions as well as I can:

Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM: On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00 PM: Razya Ladelsky ra...@il.ibm.com writes:

That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM).

Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario.

Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost threads.

I guess a question then becomes: why have multiple devices?

If you mean why serve multiple devices from a single thread, the answer is that we cannot rely on the Linux scheduler, which has no knowledge of I/O queues, to do a decent job of scheduling I/O. The idea is to take over the I/O scheduling responsibilities from the kernel's thread scheduler with a more efficient I/O scheduler inside each vhost thread.
So by combining all of the I/O devices from the same guest (disks, network cards, etc.) in a single I/O thread, we can provide better scheduling because we have more knowledge of the nature of the work. Instead of relying on the Linux scheduler to perform context switches between multiple vhost threads, we have a single thread context in which we can do the I/O scheduling more efficiently. We can closely monitor the performance needs of each queue of each device inside the vhost thread, which gives us much more information than relying on the kernel's thread scheduler.

And now there are 2 performance-critical pieces that need to be optimized/tuned instead of just 1:

1. Kernel infrastructure that QEMU and vhost use today but you decided to bypass.

We are NOT bypassing existing components. We are just changing the threading model: instead of having one vhost thread per virtio device, we propose to use 1 vhost thread to serve devices belonging to the same VM. In addition, we propose to add new features such as polling.

What I meant by bypassing is that reducing the scope to single VMs leaves multi-VM performance unchanged. I know the original aim was to improve multi-VM performance too, and I hope that will be possible by extending the current approach.

Stefan

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Elvis upstreaming plan
Anthony Liguori anth...@codemonkey.ws wrote on 28/11/2013 12:33:36 AM:

Abel Gordon ab...@il.ibm.com writes:

Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost threads.
I guess a question then becomes: why have multiple devices?

This does not expose any additional opportunities for attacks (DoS or other) than are already available, since all of the I/O traffic belongs to a single guest. You can make the argument that with low I/O loads this mechanism may not make much difference. However, when you try to maximize the utilization of your hardware (such as in a commercial scenario), this technique can gain you a large benefit.

Regards,
Joel Nider
Virtualization Research, IBM Research and Development, Haifa Research Lab

So all this would sound more convincing if we had sharing between VMs. When it's only a single VM it's somehow less convincing, isn't it? Of course, if we bypass the scheduler like this, it becomes harder to enforce cgroup limits.

True, but here the issue becomes isolation/cgroups. We can start to show the value for VMs that have multiple devices/queues, and then we could re-consider extending the mechanism to multiple VMs (at least as an experimental feature).
But it might be easier to give the scheduler the info it needs to do what we need. Would an API that basically says "run this kthread right now" do the trick?

...do you really believe it would be possible to push this kind of change to the Linux scheduler? In addition, we need more than "run this kthread right now", because you need to monitor the virtio ring activity to specify when you would like to run a specific kthread and for how long.

Paul Turner has a proposal for exactly this: http://www.linuxplumbersconf.org/2013/ocw/sessions/1653 The video is up on Youtube I think. It definitely is a general problem that is not at all virtual I/O specific.

Interesting, thanks for sharing. If you have a link
Re: Elvis upstreaming plan
On Thu, Nov 28, 2013 at 09:31:50AM +0200, Abel Gordon wrote:

Isolation is important, but the question is: what does isolation mean?

Mostly two things:
- Count resource usage against the correct cgroups, and limit it as appropriate
- If one user does something silly and is blocked, another user isn't affected

-- MST
Re: Elvis upstreaming plan
Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:

On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:

Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00 PM:

Razya Ladelsky ra...@il.ibm.com writes:

Hi all, I am Razya Ladelsky. I work at the IBM Haifa virtualization team, which developed Elvis, presented by Abel Gordon at the last KVM forum:
ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs
ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE

According to the discussions that took place at the forum, upstreaming some of the Elvis approaches seems to be a good idea, which we would like to pursue. Our plan for the first patches is the following:

1. Shared vhost thread between multiple devices

This patch creates a worker thread and worker queue shared across multiple virtio devices. We would like to modify the patch posted in https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 to limit a vhost thread to serve multiple devices only if they belong to the same VM, as Paolo suggested, to avoid isolation or cgroups concerns. Another modification is related to the creation and removal of vhost threads, which will be discussed next.

I think this is an exceptionally bad idea. We shouldn't throw away isolation without exhausting every other possibility.

Seems you have missed the important details here. Anthony, we are aware you are concerned about isolation and you believe we should not share a single vhost thread across multiple VMs. That's why Razya proposed to change the patch so we will serve multiple virtio devices using a single vhost thread only if the devices belong to the same VM. This series of patches will not allow two different VMs to share the same vhost thread. So, I don't see why this would be throwing away isolation or why it could be an exceptionally bad idea.
By the way, I remember that during the KVM forum a similar approach of having a single data-plane thread for many devices was discussed.

We've seen very positive results from adding threads. We should also look at scheduling.

...and we have also seen exceptionally negative results from adding threads, both for vhost and data-plane. If you have lots of idle time/cores then it makes sense to run multiple threads. But IMHO in many scenarios you don't have lots of idle time/cores, and if you have them you would probably prefer to run more VMs/VCPUs. Hosting a single SMP VM when you have enough physical cores to run all the VCPU threads and the I/O threads is not a realistic scenario. That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM).

Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario.

Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost threads.

I guess a question then becomes: why have multiple devices?

I assume that there are guests that have multiple vhost devices (net or scsi/tcm). We can also extend the approach to consider multiqueue devices, so we can create 1 vhost thread shared by all the queues, 1 vhost thread for each queue, or a few threads for multiple queues. We could also share a thread across multiple queues even if they do not belong to the same device. Remember the experiments Shirley Ma did with the split tx/rx? If we have a control interface, we could support both approaches: different threads or a single thread.

2. Sysfs mechanism to add and remove vhost threads

This patch allows us to add and remove vhost threads dynamically.
A simpler way to control the creation of vhost threads is statically determining the maximum number of virtio devices per worker via a kernel module parameter (which is the way the previously mentioned patch is currently implemented). I'd like to ask for advice here about the preferable way to go: although the sysfs mechanism provides more flexibility, it may be a good idea to start with a simple static parameter and keep the first patches as simple as possible. What do you think?

3. Add virtqueue polling mode to vhost

Have the vhost thread poll the virtqueues with a high I/O rate for new buffers, and avoid asking the guest to kick us. https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0

Ack on this. :)

Regards, Abel.

Regards, Anthony
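The threading model in item 1 - one vhost worker draining the virtqueues of several devices of the same VM, instead of one kernel thread per device - can be sketched as a toy user-space simulation. Everything here (struct vq, worker_run_once, the fixed per-queue budget) is an illustrative assumption for explanation, not the actual vhost code from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one shared worker servicing all virtqueues of a VM.
 * `pending` counts requests waiting in a queue; `completed` counts
 * requests the worker has already serviced. */
struct vq {
    int pending;
    int completed;
};

/* One round-robin pass: service at most `budget` requests per queue
 * before moving to the next, so a busy queue cannot starve the rest. */
static void worker_run_once(struct vq *vqs, size_t nvqs, int budget)
{
    for (size_t i = 0; i < nvqs; i++) {
        int n = vqs[i].pending < budget ? vqs[i].pending : budget;
        vqs[i].pending -= n;
        vqs[i].completed += n;
    }
}
```

With two devices of the same VM mapped to one worker, a pass with budget 3 over queues holding 5 and 1 pending requests leaves 2 and 0 pending. The point of the single thread context is exactly this: the worker sees every queue's backlog directly, instead of the kernel scheduler context-switching blindly between per-device threads.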
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
I guess a question then becomes: why have multiple devices?

I assume that there are guests that have multiple vhost devices (net or scsi/tcm).

These are kind of uncommon though. In fact, a kernel thread is not a unit of isolation - cgroups supply isolation. If we had use_cgroups, kind of like use_mm, we could thinkably do work for multiple VMs on the same thread.

We can also extend the approach to consider multiqueue devices, so we can create 1 vhost thread shared by all the queues, 1 vhost thread for each queue, or a few threads for multiple queues. We could also share a thread across multiple queues even if they do not belong to the same device. Remember the experiments Shirley Ma did with the split tx/rx?
If we have a control interface, we could support both approaches: different threads or a single thread.

I'm a bit concerned about an interface managing specific threads being so low level. What exactly is it that management knows that makes it efficient to group threads together? That the host is over-committed, so we should use less CPU? I'd like the interface to express that knowledge.

2. Sysfs mechanism to add and remove vhost threads

This patch allows us to add and remove vhost threads dynamically. A simpler way to control the creation of vhost threads is statically determining the maximum number of virtio devices per worker via a kernel module parameter (which is the way the
Re: Elvis upstreaming plan
Gleb Natapov g...@redhat.com wrote on 27/11/2013 09:35:01 AM:

On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:

4. vhost statistics

This patch introduces a set of statistics to monitor different performance metrics of vhost and our polling and I/O scheduling mechanisms. The statistics are exposed using debugfs and can be easily displayed with a Python script (vhost_stat, based on the old kvm_stats): https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0

How about using trace points instead? Besides statistics, it can also help more in debugging.

Definitely. kvm_stats moved to ftrace a long time ago.

We should use trace points for debugging information, but IMHO we should have a dedicated (and different) mechanism to expose data that can be easily consumed by a user-space (policy) application to control how many vhost threads we need or any other vhost feature we may introduce (e.g. polling). That's why we proposed something like vhost_stat based on sysfs. This is not like kvm_stat, which can be replaced with tracepoints: here we would like to expose data in order to control the system. So I would say what we are trying to do resembles the ksm interface implemented under /sys/kernel/mm/ksm/
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote:

That's why we proposed something like vhost_stat based on sysfs. This is not like kvm_stat, which can be replaced with tracepoints: here we would like to expose data in order to control the system.

There are control operations and there are performance/statistics-gathering operations: use /sys for the former and ftrace for the latter. The fact that you need a /sys interface for other things does not mean you can abuse it for statistics too.

-- Gleb.
Re: Elvis upstreaming plan
Gleb Natapov g...@redhat.com wrote on 27/11/2013 11:21:59 AM:

There are control operations and there are performance/statistics-gathering operations: use /sys for the former and ftrace for the latter. The fact that you need a /sys interface for other things does not mean you can abuse it for statistics too.

Agree. Any statistics that we add for debugging purposes should be implemented using tracepoints. But control and related data interfaces (that are not for debugging purposes) should be in sysfs.
Look for example at:
/sys/kernel/mm/ksm/full_scans
/sys/kernel/mm/ksm/pages_shared
/sys/kernel/mm/ksm/pages_sharing
/sys/kernel/mm/ksm/pages_to_scan
/sys/kernel/mm/ksm/pages_unshared
/sys/kernel/mm/ksm/pages_volatile
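A user-space policy application would consume such sysfs attributes the same way KSM tuning tools do: open the file, parse a single integer. A minimal sketch, assuming one value per file (read_stat is a hypothetical helper name, and error handling is reduced to a -1 sentinel):

```c
#include <stdio.h>

/* Read a single integer value from a ksm-style sysfs attribute,
 * e.g. /sys/kernel/mm/ksm/pages_shared. Returns -1 on any error. */
static long read_stat(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;

    if (f) {
        if (fscanf(f, "%ld", &v) != 1)
            v = -1;
        fclose(f);
    }
    return v;
}
```

A vhost policy daemon built on such an interface could periodically read per-worker load values and decide whether to split or merge vhost threads, which is exactly the control loop the sysfs proposal is meant to enable.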
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 11:33:19AM +0200, Abel Gordon wrote:

Agree. Any statistics that we add for debugging purposes should be implemented using tracepoints. But control and related data interfaces (that are not for debugging purposes) should be in sysfs.
Look for example at:

Yes, things that are not statistics only and are part of a control interface that management will use should not use ftrace (I do not think adding more knobs is a good idea, but this is for the vhost maintainer to decide). But ksm predates ftrace, so some of the things below could have been implemented as ftrace points.

/sys/kernel/mm/ksm/full_scans
/sys/kernel/mm/ksm/pages_shared
/sys/kernel/mm/ksm/pages_sharing
/sys/kernel/mm/ksm/pages_to_scan
/sys/kernel/mm/ksm/pages_unshared
/sys/kernel/mm/ksm/pages_volatile

-- Gleb.
Re: Elvis upstreaming plan
Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 11:21:00 AM:
I'm a bit concerned about an interface managing specific threads being so low level. What exactly is it that management knows that makes it efficient to group threads together? That the host is over-committed, so we should use less CPU? I'd like the interface to express that knowledge.

We can expose information such as the amount of I/O being handled for each queue, the amount of CPU cycles consumed for processing the I/O, latency, and more. If
Re: Elvis upstreaming plan
Jason Wang jasow...@redhat.com wrote on 27/11/2013 04:49:20 AM:

On 11/24/2013 05:22 PM, Razya Ladelsky wrote:

2. Sysfs mechanism to add and remove vhost threads

This patch allows us to add and remove vhost threads dynamically. A simpler way to control the creation of vhost threads is statically determining the maximum number of virtio devices per worker via a kernel module parameter (which is the way the previously mentioned patch is currently implemented).

Any chance we can re-use cmwq instead of inventing another mechanism? Looks like there is a lot of function duplication here. Bandan has an RFC to do this.

Thanks for the suggestion. We should certainly take a look at Bandan's patches, which I guess are: http://www.mail-archive.com/kvm@vger.kernel.org/msg96603.html My only concern here is that we may not be able to easily implement our polling mechanism and heuristics with cmwq.
I'd like to ask for advice here about the more preferable way to go: although having the sysfs mechanism provides more flexibility, it may be a good idea to start with a simple static parameter, and have the first patches as simple as possible. What do you think? 3. Add virtqueue polling mode to vhost Have the vhost thread poll the virtqueues with a high I/O rate for new buffers, and avoid asking the guest to kick us. https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0 Maybe we can make poll_stop_idle adaptive, which may help the light-load case. Consider that the guest is often slower than vhost; if we just have one or two VMs, polling too much may waste CPU in this case. Yes, making polling adaptive based on the amount of wasted cycles (cycles we spent polling but didn't find new work) and the I/O rate is a very good idea. Note we already measure and expose these values but we do not use them to adapt the polling mechanism. Having said that, note that adaptive polling may be a bit tricky. Remember that the cycles we waste polling in the vhost thread actually improve the performance of the vcpu threads, because the guest is no longer required to kick (pio==exit) the host when vhost does polling. So even if we waste cycles in the vhost thread, we are saving cycles in the vcpu thread and improving performance. 4. vhost statistics This patch introduces a set of statistics to monitor different performance metrics of vhost and our polling and I/O scheduling mechanisms. The statistics are exposed using debugfs and can be easily displayed with a Python script (vhost_stat, based on the old kvm_stats) https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 How about using trace points instead? Besides statistics, it can also help more in debugging. Yep, we just had a discussion with Gleb about this :) 5.
Add heuristics to improve I/O scheduling This patch enhances the round-robin mechanism with a set of heuristics to decide when to leave a virtqueue and proceed to the next. https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d This patch improves the handling of the requests by the vhost thread, but could perhaps be delayed to a later time, and not submitted as one of the first Elvis patches. I'd love to hear some comments about whether this patch needs to be part of the first submission. Any other feedback on this plan will be appreciated, Thank you, Razya -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
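The round-robin-plus-heuristics idea in point 5 can be sketched as follows. This is a toy Python model, not the patch: the actual Elvis heuristics for deciding when to leave a virtqueue are not spelled out in this thread, so a simple per-visit budget stands in for them (serve at most a few requests per queue, then move on so no queue starves the others).

```python
from collections import deque

# Toy model of a single vhost thread round-robinning over the
# virtqueues of one VM's devices. `budget` is a stand-in for the
# real leave-the-queue heuristics, which this thread does not detail.

def service_round_robin(queues, budget=2):
    """queues: dict name -> deque of pending requests.
    Returns the order in which requests were serviced."""
    order = []
    names = list(queues)
    while any(queues[n] for n in names):
        for n in names:
            served = 0
            # Leave-queue heuristic: stop after `budget` requests
            # even if more work is pending, and visit the next queue.
            while queues[n] and served < budget:
                order.append((n, queues[n].popleft()))
                served += 1
    return order
```

With a busy rx queue and a light tx queue, the budget guarantees tx is serviced before rx is fully drained, which is the fairness property the heuristics aim to keep while still batching work per queue.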
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: Hi, Razya is out for a few days, so I will try to answer the questions as well as I can: Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM: From: Michael S. Tsirkin m...@redhat.com To: Abel Gordon/Haifa/IBM@IBMIL, Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com, as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/Haifa/IBM@IBMIL Date: 27/11/2013 01:08 AM Subject: Re: Elvis upstreaming plan On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00 PM: Razya Ladelsky ra...@il.ibm.com writes: That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM). Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario. Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost thread(s). I guess a question then becomes why have multiple devices? If you mean why serve multiple devices from a single thread, the answer is that we cannot rely on the Linux scheduler, which has no knowledge of I/O queues, to do a decent job of scheduling I/O. The idea is to take over the I/O scheduling responsibilities from the kernel's thread scheduler with a more efficient I/O scheduler inside each vhost thread. So by combining all of the I/O devices from the same guest (disks, network cards, etc.) in a single I/O thread, it allows us to provide better scheduling by giving us more knowledge of the nature of the work.
So now instead of relying on the Linux scheduler to perform context switches between multiple vhost threads, we have a single thread context in which we can do the I/O scheduling more efficiently. We can closely monitor the performance needs of each queue of each device inside the vhost thread, which gives us much more information than relying on the kernel's thread scheduler. This does not expose any additional opportunities for attacks (DoS or other) than are already available, since all of the I/O traffic belongs to a single guest. You can make the argument that with low I/O loads this mechanism may not make much difference. However, when you try to maximize the utilization of your hardware (such as in a commercial scenario) this technique can gain you a large benefit. Regards, Joel Nider Virtualization Research IBM Research and Development Haifa Research Lab Phone: 972-4-829-6326 | Mobile: 972-54-3155635 | E-mail: jo...@il.ibm.com So all this would sound more convincing if we had sharing between VMs. When it's only a single VM it's somehow less convincing, isn't it? Of course if we would bypass a scheduler like this it becomes harder to enforce cgroup limits. But it might be easier to give the scheduler the info it needs to do what we need. Would an API that basically says "run this kthread right now" do the trick?
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote: Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 11:21:00 AM: On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote: Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM: On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00 PM: I think this is an exceptionally bad idea. We shouldn't throw away isolation without exhausting every other possibility. Seems you have missed the important details here. Anthony, we are aware you are concerned about isolation and you believe we should not share a single vhost thread across multiple VMs. That's why Razya proposed to change the patch so we will serve multiple virtio devices using a single vhost thread only if the devices belong to the same VM. This series of patches will not allow two different VMs to share the same vhost thread.
So, I don't see why this will be throwing away isolation and why this could be an exceptionally bad idea. By the way, I remember that during the KVM forum a similar approach of having a single data plane thread for many devices was discussed. We've seen very positive results from adding threads. We should also look at scheduling. ...and we have also seen exceptionally negative results from adding threads, both for vhost and data-plane. If you have a lot of idle time/cores then it makes sense to run multiple threads. But IMHO in many scenarios you don't have a lot of idle time/cores... and if you have them you would probably prefer to run more VMs/VCPUs. Hosting a single SMP VM when you have enough physical cores to run all the VCPU threads and the I/O threads is not a realistic scenario. That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM). Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario. Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost thread(s). I guess a question then becomes why have multiple devices? I assume that there are guests that have multiple vhost devices (net or scsi/tcm). These are kind of uncommon though. In fact a kernel thread is not a unit of isolation - cgroups supply isolation. If we had use_cgroups kind of like use_mm, we could conceivably do work for multiple VMs on the same thread. We can also extend the approach to consider multiqueue devices, so we can create 1 vhost thread shared for all the queues, 1 vhost thread for each queue, or a few threads for multiple queues. We could also share a thread across multiple queues even if they do not belong to the same device.
Remember the experiments Shirley Ma did with the split tx/rx? If we have a control interface we could support both approaches: different threads or a single thread. I'm a bit concerned about an interface managing specific threads being so low-level. What exactly is it that management knows that makes it efficient to group threads together? That the host is over-committed, so we should use less CPU? I'd
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 12:18:51PM +0200, Abel Gordon wrote: Jason Wang jasow...@redhat.com wrote on 27/11/2013 04:49:20 AM: On 11/24/2013 05:22 PM, Razya Ladelsky wrote: 2. Sysfs mechanism to add and remove vhost threads This patch allows us to add and remove vhost threads dynamically. A simpler way to control the creation of vhost threads is statically determining the maximum number of virtio devices per worker via a kernel module parameter (which is the way the previously mentioned patch is currently implemented). Any chance we can re-use cmwq instead of inventing another mechanism? Looks like there's a lot of function duplication here. Bandan has an RFC to do this. Thanks for the suggestion. We should certainly take a look at Bandan's patches, which I guess are: http://www.mail-archive.com/kvm@vger.kernel.org/msg96603.html My only concern here is that we may not be able to easily implement our polling mechanism and heuristics with cmwq.
It's not so hard: to poll, you just requeue the work to make sure it's re-invoked. I'd like to ask for advice here about the more preferable way to go: although having the sysfs mechanism provides more flexibility, it may be a good idea to start with a simple static parameter, and have the first patches as simple as possible. What do you think? 3. Add virtqueue polling mode to vhost Have the vhost thread poll the virtqueues with a high I/O rate for new buffers, and avoid asking the guest to kick us. https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0 Maybe we can make poll_stop_idle adaptive, which may help the light-load case. Consider that the guest is often slower than vhost; if we just have one or two VMs, polling too much may waste CPU in this case. Yes, making polling adaptive based on the amount of wasted cycles (cycles we spent polling but didn't find new work) and the I/O rate is a very good idea. Note we already measure and expose these values but we do not use them to adapt the polling mechanism. Having said that, note that adaptive polling may be a bit tricky. Remember that the cycles we waste polling in the vhost thread actually improve the performance of the vcpu threads, because the guest is no longer required to kick (pio==exit) the host when vhost does polling. So even if we waste cycles in the vhost thread, we are saving cycles in the vcpu thread and improving performance. So my suggestion would be: - guest runs some kicks and measures how long they took, e.g. kick = T cycles - guest sends this info to the host - host polls for at most fraction * T cycles 4. vhost statistics This patch introduces a set of statistics to monitor different performance metrics of vhost and our polling and I/O scheduling mechanisms.
The statistics are exposed using debugfs and can be easily displayed with a Python script (vhost_stat, based on the old kvm_stats) https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 How about using trace points instead? Besides statistics, it can also help more in debugging. Yep, we just had a discussion with Gleb about this :) 5. Add heuristics to improve I/O scheduling This patch enhances the round-robin mechanism with a set of heuristics to decide when to leave a virtqueue and proceed to the next. https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d This patch improves the handling of the requests by the vhost thread, but could perhaps be delayed to a later time, and not submitted as one of the first Elvis patches. I'd love to hear some comments about whether this patch needs to be part of the first submission. Any other feedback on this plan will be appreciated, Thank you, Razya
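Michael's polling-budget suggestion above (the guest measures what a kick costs, and the host polls for at most a fraction of that) can be modeled in a few lines. This is a hedged sketch: the function names and the 0.5 fraction are invented for illustration, and real vhost code would read cycle counters rather than take them as arguments.

```python
# Toy model of the suggested polling budget: if a kick (pio exit)
# costs the guest T cycles, the host should never burn more than
# fraction * T cycles polling, because polling longer than the exit
# it replaces can no longer pay for itself.

def poll_budget(kick_cost_cycles, fraction=0.5):
    """Upper bound, in cycles, on host-side polling of one virtqueue."""
    return int(kick_cost_cycles * fraction)

def should_keep_polling(cycles_spent, kick_cost_cycles, fraction=0.5):
    """True while the vhost thread is still within its polling budget."""
    return cycles_spent < poll_budget(kick_cost_cycles, fraction)
```

This also captures why adaptive polling is "a bit tricky" as Abel notes: the budget bounds the cycles the vhost thread may waste, while any poll that finds work still saves the guest a full kick.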
Re: Elvis upstreaming plan
Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:27:19 PM: On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: Hi, Razya is out for a few days, so I will try to answer the questions as well as I can: That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM). Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario. Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost thread(s). I guess a question then becomes why have multiple devices? If you mean why serve multiple devices from a single thread, the answer is that we cannot rely on the Linux scheduler, which has no knowledge of I/O queues, to do a decent job of scheduling I/O. The idea is to take over the I/O scheduling responsibilities from the kernel's thread scheduler with a more efficient I/O scheduler inside each vhost thread.
So by combining all of the I/O devices from the same guest (disks, network cards, etc.) in a single I/O thread, it allows us to provide better scheduling by giving us more knowledge of the nature of the work. So now instead of relying on the Linux scheduler to perform context switches between multiple vhost threads, we have a single thread context in which we can do the I/O scheduling more efficiently. We can closely monitor the performance needs of each queue of each device inside the vhost thread, which gives us much more information than relying on the kernel's thread scheduler. This does not expose any additional opportunities for attacks (DoS or other) than are already available, since all of the I/O traffic belongs to a single guest. You can make the argument that with low I/O loads this mechanism may not make much difference. However, when you try to maximize the utilization of your hardware (such as in a commercial scenario) this technique can gain you a large benefit. Regards, Joel Nider Virtualization Research IBM Research and Development Haifa Research Lab So all this would sound more convincing if we had sharing between VMs. When it's only a single VM it's somehow less convincing, isn't it? Of course if we would bypass a scheduler like this it becomes harder to enforce cgroup limits. True, but here the issue becomes isolation/cgroups. We can start to show the value for VMs that have multiple devices/queues and then we could re-consider extending the mechanism to multiple VMs (at least as an experimental feature). But it might be easier to give the scheduler the info it needs to do what we need. Would an API that basically says "run this kthread right now" do the trick? ...do you really believe it would be possible to push this kind of change to the Linux scheduler? In addition, we need more than "run this kthread right now" because you need to monitor the virtio ring activity to specify when you would like to run a specific kthread and for how long.
Re: Elvis upstreaming plan
Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:29:43 PM: On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote: Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 11:21:00 AM: On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote: Seems you have missed the important details here. Anthony, we are aware you are concerned about isolation and you believe we should not share a single vhost thread across multiple VMs. That's why Razya proposed to change the patch so we will serve multiple virtio devices using a single vhost thread only if the devices belong to the same VM.
This series of patches will not allow two different VMs to share the same vhost thread. So, I don't see why this will be throwing away isolation and why this could be an exceptionally bad idea. By the way, I remember that during the KVM forum a similar approach of having a single data plane thread for many devices was discussed. We've seen very positive results from adding threads. We should also look at scheduling. ...and we have also seen exceptionally negative results from adding threads, both for vhost and data-plane. If you have a lot of idle time/cores then it makes sense to run multiple threads. But IMHO in many scenarios you don't have a lot of idle time/cores... and if you have them you would probably prefer to run more VMs/VCPUs. Hosting a single SMP VM when you have enough physical cores to run all the VCPU threads and the I/O threads is not a realistic scenario. That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM). Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario. Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost thread(s). I guess a question then becomes why have multiple devices? I assume that there are guests that have multiple vhost devices (net or scsi/tcm). These are kind of uncommon though. In fact a kernel thread is not a unit of isolation - cgroups supply isolation. If we had use_cgroups kind of like use_mm, we could conceivably do work for multiple VMs on the same thread. We can also extend the approach to consider multiqueue devices, so we can create 1 vhost thread shared for all the queues, 1 vhost thread for each queue, or a few threads for multiple queues.
We could also share a thread across multiple queues even if they do not belong to the same device. Remember the experiments Shirley Ma did with the split tx/rx? If we have a control interface we could support both approaches: different threads or a single thread. I'm a bit concerned about interface
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote: Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:27:19 PM: On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: Hi, Razya is out for a few days, so I will try to answer the questions as well as I can: That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM). Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario. Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost thread(s). I guess a question then becomes why have multiple devices? If you mean why serve multiple devices from a single thread, the answer is that we cannot rely on the Linux scheduler, which has no knowledge of I/O queues, to do a decent job of scheduling I/O. The idea is to take over the I/O scheduling responsibilities from the kernel's thread scheduler with a more efficient I/O scheduler inside each vhost thread.
So by combining all of the I/O devices from the same guest (disks, network cards, etc.) in a single I/O thread, it allows us to provide better scheduling by giving us more knowledge of the nature of the work. So now instead of relying on the Linux scheduler to perform context switches between multiple vhost threads, we have a single thread context in which we can do the I/O scheduling more efficiently. We can closely monitor the performance needs of each queue of each device inside the vhost thread, which gives us much more information than relying on the kernel's thread scheduler. This does not expose any additional opportunities for attacks (DoS or other) than are already available, since all of the I/O traffic belongs to a single guest. You can make the argument that with low I/O loads this mechanism may not make much difference. However, when you try to maximize the utilization of your hardware (such as in a commercial scenario) this technique can gain you a large benefit. Regards, Joel Nider Virtualization Research IBM Research and Development Haifa Research Lab So all this would sound more convincing if we had sharing between VMs. When it's only a single VM it's somehow less convincing, isn't it? Of course if we would bypass a scheduler like this it becomes harder to enforce cgroup limits. True, but here the issue becomes isolation/cgroups. We can start to show the value for VMs that have multiple devices/queues and then we could re-consider extending the mechanism to multiple VMs (at least as an experimental feature). Sorry, if it's unsafe we can't merge it even if it's experimental. But it might be easier to give the scheduler the info it needs to do what we need. Would an API that basically says "run this kthread right now" do the trick? ...do you really believe it would be possible to push this kind of change to the Linux scheduler?
In addition, we need more than "run this kthread right now" because you need to monitor the virtio ring activity to specify when you would like to run a specific kthread and for how long. "How long" is easy - just call schedule. "When" sounds like specifying a deadline, which sounds like a reasonable fit to how the scheduler works now. Certainly adding an in-kernel API sounds like a better approach than a bunch of user-visible ones. So I'm not at all saying we need to change the scheduler - it's more adding APIs to existing functionality.
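The in-kernel API being discussed ("run this kthread by some deadline") might look, in a toy userspace model, like an earliest-deadline-first pick. This is entirely hypothetical - no such Linux scheduler API existed at the time of this thread - and is only meant to illustrate the kind of information vhost could derive from virtio ring activity and hand to the scheduler.

```python
import heapq

# Hypothetical model of the deadline-hint idea: vhost tells the
# scheduler "run this kthread by time t" per thread, and the
# scheduler services the earliest deadline first. All names here
# are invented for the sketch.

class DeadlineHints:
    def __init__(self):
        self._heap = []  # min-heap of (deadline, kthread)

    def hint(self, deadline, kthread):
        """Record that `kthread` should run no later than `deadline`."""
        heapq.heappush(self._heap, (deadline, kthread))

    def pick_next(self):
        """Return the kthread with the earliest deadline, or None."""
        return heapq.heappop(self._heap)[1] if self._heap else None
```

In this framing, vhost's ring monitoring only computes deadlines; the pick itself stays with the scheduler, matching Michael's point that this is adding an API rather than changing scheduler behavior.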
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 12:55:07PM +0200, Abel Gordon wrote: Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:29:43 PM: On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote: Seems you have missed the important details here. Anthony, we are aware you are concerned about isolation and you believe we should not share a single vhost thread across multiple VMs.
That's why Razya proposed to change the patch so we will serve multiple virtio devices using a single vhost thread only if the devices belong to the same VM. This series of patches will not allow two different VMs to share the same vhost thread. So, I don't see why this will be throwing away isolation and why this could be an exceptionally bad idea. By the way, I remember that during the KVM forum a similar approach of having a single data plane thread for many devices was discussed. We've seen very positive results from adding threads. We should also look at scheduling. ...and we have also seen exceptionally negative results from adding threads, both for vhost and data-plane. If you have a lot of idle time/cores then it makes sense to run multiple threads. But IMHO in many scenarios you don't have a lot of idle time/cores... and if you have them you would probably prefer to run more VMs/VCPUs. Hosting a single SMP VM when you have enough physical cores to run all the VCPU threads and the I/O threads is not a realistic scenario. That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM). Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario. Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost thread(s). I guess a question then becomes why have multiple devices? I assume that there are guests that have multiple vhost devices (net or scsi/tcm). These are kind of uncommon though. In fact a kernel thread is not a unit of isolation - cgroups supply isolation. If we had use_cgroups kind of like use_mm, we could conceivably do work for multiple VMs on the same thread.
We can also extend the approach to consider multiqueue devices, so we can create 1 vhost thread shared by all the queues, 1 vhost thread for each queue, or a few threads for multiple queues. We could also share a thread across multiple queues even if they do not belong to the same device.
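The threading model being discussed - one shared worker serving all the virtio devices (or queues) that belong to the same VM, and refusing devices from any other VM - can be sketched in plain C. This is an illustrative user-space model only, not the vhost code from the patches; the names (vm_worker, dev_queue, worker_attach, worker_poll_once) are invented for the sketch:

```c
#include <stddef.h>

/* Hypothetical model of the shared-worker idea: each virtio device owns a
 * queue of pending work items; a single worker serves all devices that
 * belong to the same VM, instead of one kernel thread per device. */

#define MAX_DEVS 8

struct dev_queue {
    int vm_id;      /* owning VM; the worker only accepts matching devices */
    int pending;    /* number of queued work items */
};

struct vm_worker {
    int vm_id;
    struct dev_queue *devs[MAX_DEVS];
    int ndevs;
};

/* Attach a device to the shared worker only if it belongs to the same VM,
 * mirroring the isolation rule discussed in this thread (no cross-VM
 * sharing). Returns 0 on success, -1 on refusal. */
int worker_attach(struct vm_worker *w, struct dev_queue *d)
{
    if (d->vm_id != w->vm_id || w->ndevs == MAX_DEVS)
        return -1;
    w->devs[w->ndevs++] = d;
    return 0;
}

/* One service pass: round-robin over all attached devices, handling up to
 * `budget` items per device, and report how many items were processed. */
int worker_poll_once(struct vm_worker *w, int budget)
{
    int done = 0;
    for (int i = 0; i < w->ndevs; i++) {
        int n = w->devs[i]->pending < budget ? w->devs[i]->pending : budget;
        w->devs[i]->pending -= n;
        done += n;
    }
    return done;
}
```

The attach check is where the isolation rule lives: the worker never accepts a device from a different VM, which is exactly the restriction Razya proposed for the first patches.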
Re: Elvis upstreaming plan
Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:59:38 PM:
On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote:
Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:27:19 PM:
On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:

Hi, Razya is out for a few days, so I will try to answer the questions as well as I can.

That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM).

Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario.

Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost thread/s.

I guess a question then becomes why have multiple devices?

If you mean why serve multiple devices from a single thread, the answer is that we cannot rely on the Linux scheduler, which has no knowledge of I/O queues, to do a decent job of scheduling I/O. The idea is to take over the I/O scheduling responsibilities from the kernel's thread scheduler with a more efficient I/O scheduler inside each vhost thread.
By combining all of the I/O devices from the same guest (disks, network cards, etc.) in a single I/O thread, we can provide better scheduling because we have more knowledge of the nature of the work. So now, instead of relying on the Linux scheduler to perform context switches between multiple vhost threads, we have a single thread context in which we can do the I/O scheduling more efficiently. We can closely monitor the performance needs of each queue of each device inside the vhost thread, which gives us much more information than relying on the kernel's thread scheduler.

This does not expose any additional opportunities for attacks (DoS or other) than are already available, since all of the I/O traffic belongs to a single guest. You can make the argument that with low I/O loads this mechanism may not make much difference. However, when you try to maximize the utilization of your hardware (such as in a commercial scenario) this technique can gain you a large benefit.

Regards, Joel Nider
Virtualization Research, IBM Research and Development, Haifa Research Lab

So all this would sound more convincing if we had sharing between VMs. When it's only a single VM it's somehow less convincing, isn't it? Of course, if we bypass the scheduler like this it becomes harder to enforce cgroup limits.

True, but then the issue becomes isolation/cgroups. We can start by showing the value for VMs that have multiple devices/queues, and then we could re-consider extending the mechanism to multiple VMs (at least as an experimental feature).

Sorry, if it's unsafe we can't merge it even if it's experimental. But it might be easier to give the scheduler the info it needs to do what we need. Would an API that basically says "run this kthread right now" do the trick?

...do you really believe it would be possible to push this kind of change to the Linux scheduler?
In addition, we need more than "run this kthread right now", because you need to monitor the virtio ring activity to specify when you would like to run a specific kthread and for how long.

How long is easy - just call schedule. When sounds like specifying a deadline, which sounds like a reasonable fit to how the scheduler works now.

...but when you should call schedule actually depends on the I/O activity of the queues. The patches we shared constantly monitor the virtio rings (pending items and for how long they are pending there) to decide if we should continue processing the same queue or switch to another queue. Certainly adding an in-kernel API sounds like a better approach than a bunch of user
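The switching decision Abel describes - keep servicing the current ring while it has work, but yield to another ring whose oldest request has waited too long - might look roughly like this. This is a hedged sketch in plain C, not the actual ELVIS code; the real patches track more state, and the names (ring_state, pick_next_ring) are invented:

```c
/* Illustrative queue-switch decision for a vhost worker polling several
 * virtio rings. Stay on the current ring while it has work and no other
 * ring's oldest request has exceeded a wait threshold; otherwise move to
 * the most-starved non-empty ring. */

struct ring_state {
    int pending;            /* items currently in the ring */
    long oldest_wait_us;    /* how long the oldest item has been waiting */
};

/* Returns the index of the ring the worker should service next. */
int pick_next_ring(const struct ring_state *rings, int nrings,
                   int cur, long max_wait_us)
{
    int best = cur;
    long worst = -1;

    /* Find the non-empty ring whose oldest request has waited longest. */
    for (int i = 0; i < nrings; i++) {
        if (rings[i].pending > 0 && rings[i].oldest_wait_us > worst) {
            worst = rings[i].oldest_wait_us;
            best = i;
        }
    }

    if (rings[cur].pending > 0 && worst <= max_wait_us)
        return cur;     /* current ring still has work, nobody is starving */
    return best;        /* switch to the most-starved non-empty ring */
}
```

The point of the sketch is that the decision depends on per-ring activity (pending items and their age), which is exactly the information the kernel's thread scheduler does not have.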
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 01:05:40PM +0200, Abel Gordon wrote:

(CCing Eyal Moscovici, who is actually prototyping multiple policies and may want to join this thread)

Starting with basic policies: we can use a single vhost thread, and create new vhost threads if it becomes saturated and there are enough CPU cycles available in the system, or if the latency (how long the requests in the virtio queues wait until they are handled) is too high. We can merge threads if the latency is already low or if the threads are not saturated. There is a hidden trade-off here: when you run more vhost threads you may actually be stealing CPU cycles from the VCPU threads and also increasing context switches. So from the vhost perspective it may improve performance, but from the VCPU threads' perspective it may degrade performance.

So this is a very interesting problem to solve, but what does management know that suggests it can solve it better?

Yep, and Eyal is currently working on this. What does the management know? Depends who the management is :) It could be just I/O activity (black-box: I/O request rate, I/O handling rate, latency)

We know much more about this than management, don't we?

or application performance (white-box).

This would have to come with a proposal for getting this white-box info out of the guest somehow. -- MST

--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
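The basic split/merge policy Abel outlines could be modeled as a single decision function. This is an illustrative user-space sketch, not Eyal's prototype code; the thresholds, field names, and function name are all invented:

```c
/* Hedged sketch of the basic policy discussed above: split off another
 * vhost thread when the current one is saturated, latency is high, and
 * there are spare cores; merge (consolidate) when the thread is
 * under-utilized. Otherwise keep the current layout. */

enum policy_action { KEEP, SPLIT, MERGE };

struct thread_stats {
    int util_pct;        /* CPU utilization of the vhost thread, 0..100 */
    long avg_latency_us; /* avg time requests wait in the virtio queues */
    int idle_cores;      /* spare cores available in the system */
};

enum policy_action decide(const struct thread_stats *s,
                          int util_hi, int util_lo, long lat_hi)
{
    /* Split only when there are idle cores: otherwise a new vhost thread
     * would steal cycles from the VCPU threads (the hidden trade-off). */
    if (s->util_pct >= util_hi && s->avg_latency_us > lat_hi &&
        s->idle_cores > 0)
        return SPLIT;
    if (s->util_pct <= util_lo)
        return MERGE;   /* under-utilized: consolidate threads */
    return KEEP;
}
```

Note how the idle_cores check encodes the trade-off from the message above: a saturated thread with no spare cores is left alone rather than split, because splitting would come out of the VCPUs' budget.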
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:

If you mean why serve multiple devices from a single thread, the answer is that we cannot rely on the Linux scheduler, which has no knowledge of I/O queues, to do a decent job of scheduling I/O. The idea is to take over the I/O scheduling responsibilities from the kernel's thread scheduler with a more efficient I/O scheduler inside each vhost thread. By combining all of the I/O devices from the same guest (disks, network cards, etc.) in a single I/O thread, we can provide better scheduling because we have more knowledge of the nature of the work. We can closely monitor the performance needs of each queue of each device inside the vhost thread, which gives us much more information than relying on the kernel's thread scheduler.

And now there are 2 performance-critical pieces that need to be optimized/tuned instead of just 1:

1. Kernel infrastructure that QEMU and vhost use today but you decided to bypass.
2. The new ELVIS code, which only affects vhost devices in the same VM.

If you split the code paths it results in more effort in the long run, and the benefit seems quite limited once you acknowledge that isolation is important. Isn't the sane thing to do taking lessons from ELVIS and improving the existing pieces instead of bypassing them? That way both single-VM and host-wide performance improve. And as a bonus, non-virtualization use cases may also benefit.

Stefan
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 04:00:53PM +0100, Stefan Hajnoczi wrote:

If you split the code paths it results in more effort in the long run, and the benefit seems quite limited once you acknowledge that isolation is important. Isn't the sane thing to do taking lessons from ELVIS and improving the existing pieces instead of bypassing them? That way both single-VM and host-wide performance improve. And as a bonus, non-virtualization use cases may also benefit.

I'm not sure about that. ELVIS is all about specific behaviour patterns that are virtualization-specific, and general claims that we can improve the scheduler for all workloads seem somewhat optimistic. -- MST
Re: Elvis upstreaming plan
Abel Gordon ab...@il.ibm.com writes:

But it might be easier to give the scheduler the info it needs to do what we need. Would an API that basically says "run this kthread right now" do the trick?

...do you really believe it would be possible to push this kind of change to the Linux scheduler? In addition, we need more than "run this kthread right now", because you need to monitor the virtio ring activity to specify when you would like to run a specific kthread and for how long.
Paul Turner has a proposal for exactly this: http://www.linuxplumbersconf.org/2013/ocw/sessions/1653 The video is up on YouTube, I think. It definitely is a general problem that is not at all virtual-I/O specific.

Regards, Anthony Liguori
Re: Elvis upstreaming plan
Stefan Hajnoczi stefa...@gmail.com wrote on 27/11/2013 05:00:53 PM:

And now there are 2 performance-critical pieces that need to be optimized/tuned instead of just 1: 1. Kernel infrastructure that QEMU and vhost use today but you decided to bypass. 2. The new ELVIS code, which only affects vhost devices in the same VM. If you split the code paths it results in more effort in the long run, and the benefit seems quite limited once you acknowledge that isolation is important.

Yes, you are correct that there are now 2 performance-critical pieces of code. However, what we are proposing is just proper module decoupling. I believe you will be hard pressed to make a good case that all of this logic could be integrated into the Linux thread scheduler more efficiently. Think of this as an I/O scheduler for virtualized guests. I don't believe anyone would try to integrate the Linux I/O schedulers with the Linux thread scheduler, even though they are both performance-critical modules. Even if we were to take the route of using these principles to improve the existing scheduler, I have to ask: which scheduler?
If we spend this effort on CFS (the Completely Fair Scheduler) but then someone switches their thread scheduler to O(1) or some other scheduler, all of our advantage would be lost. We would then have to reimplement for every possible thread scheduler.

I don't agree that we are losing isolation, even if you go with the full ELVIS as originally proposed. But that is a discussion for another day. For now, let's agree that in this reduced ELVIS solution no isolation is lost, since each vhost thread is only dealing with I/O from the same guest.

As for more effort - for whom do you mean? Development time? Maintenance effort? CPU time? I would say all of those are actually less effort in the long run. Dividing responsibility between modules with well-defined interfaces reduces both development and maintenance effort. If we were to modify the thread scheduler
Re: Elvis upstreaming plan
Stefan Hajnoczi stefa...@gmail.com wrote on 27/11/2013 05:00:53 PM: On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote: Hi, Razya is out for a few days, so I will try to answer the questions as well as I can: Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM: From: Michael S. Tsirkin m...@redhat.com To: Abel Gordon/Haifa/IBM@IBMIL, Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com, as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/ IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/ IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/ Haifa/IBM@IBMIL Date: 27/11/2013 01:08 AM Subject: Re: Elvis upstreaming plan On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00 PM: Razya Ladelsky ra...@il.ibm.com writes: edit That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM). Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario. Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost thread/s. I guess a question then becomes why have multiple devices? If you mean why serve multiple devices from a single thread the answer is that we cannot rely on the Linux scheduler which has no knowledge of I/O queues to do a decent job of scheduling I/O. The idea is to take over the I/O scheduling responsibilities from the kernel's thread scheduler with a more efficient I/O scheduler inside each vhost thread. 
So by combining all of the I/O devices from the same guest (disks, network cards, etc.) in a single I/O thread, we can provide better scheduling by giving ourselves more knowledge of the nature of the work. So now instead of relying on the Linux scheduler to perform context switches between multiple vhost threads, we have a single thread context in which we can do the I/O scheduling more efficiently. We can closely monitor the performance needs of each queue of each device inside the vhost thread, which gives us much more information than relying on the kernel's thread scheduler. And now there are 2 performance-critical pieces that need to be optimized/tuned instead of just 1: 1. Kernel infrastructure that QEMU and vhost use today but you decided to bypass. We are NOT bypassing existing components. We are just changing the threading model: instead of having one vhost thread per virtio device, we propose to use 1 vhost thread to serve devices belonging to the same VM. In addition, we propose to add new features such as polling. 2. The new ELVIS code which only affects vhost devices in the same VM. Also, existing vhost code (or any other user-space back-end) should be optimized/tuned if you care about performance. If you split the code paths it results in more effort in the long run, and the benefit seems quite limited once you acknowledge that isolation is important. Isolation is important, but the question is what isolation means. I personally don't believe that 2 kernel threads provide more isolation than 1 kernel thread that changes the mm (use_mm) and avoids queue starvation. Anyway, we propose to start with the simple approach (not sharing threads across VMs), but once we show the value for this case we can discuss whether it makes sense to extend the approach and share threads between different VMs. Isn't the sane thing to do to take lessons from ELVIS and improve existing pieces instead of bypassing them? That way both single-VM and host-wide performance improves.
And as a bonus non-virtualization use cases may also benefit. The models we are proposing are specific to I/O virtualization... not sure if they are applicable to bare metal. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Elvis upstreaming plan
On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote: 5. Add heuristics to improve I/O scheduling This patch enhances the round-robin mechanism with a set of heuristics to decide when to leave a virtqueue and proceed to the next. https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d This patch should probably do something portable instead of relying on x86-only rdtscll(). Stefan
Re: Elvis upstreaming plan
Razya Ladelsky ra...@il.ibm.com writes: Hi all, I am Razya Ladelsky, I work at the IBM Haifa virtualization team, which developed Elvis, presented by Abel Gordon at the last KVM forum: ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE According to the discussions that took place at the forum, upstreaming some of the Elvis approaches seems to be a good idea, which we would like to pursue. Our plan for the first patches is the following: 1. Shared vhost thread between multiple devices. This patch creates a worker thread and worker queue shared across multiple virtio devices. We would like to modify the patch posted in https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 to limit a vhost thread to serve multiple devices only if they belong to the same VM, as Paolo suggested, to avoid isolation or cgroups concerns. Another modification is related to the creation and removal of vhost threads, which will be discussed next. I think this is an exceptionally bad idea. We shouldn't throw away isolation without exhausting every other possibility. We've seen very positive results from adding threads. We should also look at scheduling. Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario. 2. Sysfs mechanism to add and remove vhost threads. This patch allows us to add and remove vhost threads dynamically. A simpler way to control the creation of vhost threads is statically determining the maximum number of virtio devices per worker via a kernel module parameter (which is the way the previously mentioned patch is currently implemented). I'd like to ask for advice here about the more preferable way to go: although having the sysfs mechanism provides more flexibility, it may be a good idea to start with a simple static parameter, and have the first patches as simple as possible.
What do you think? 3. Add virtqueue polling mode to vhost. Have the vhost thread poll the virtqueues with high I/O rate for new buffers, and avoid asking the guest to kick us. https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0 Ack on this. Regards, Anthony Liguori 4. vhost statistics This patch introduces a set of statistics to monitor different performance metrics of vhost and our polling and I/O scheduling mechanisms. The statistics are exposed using debugfs and can be easily displayed with a Python script (vhost_stat, based on the old kvm_stats). https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 5. Add heuristics to improve I/O scheduling This patch enhances the round-robin mechanism with a set of heuristics to decide when to leave a virtqueue and proceed to the next. https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d This patch improves the handling of the requests by the vhost thread, but could perhaps be delayed to a later time, and not submitted as one of the first Elvis patches. I'd love to hear some comments about whether this patch needs to be part of the first submission. Any other feedback on this plan will be appreciated. Thank you, Razya
Re: Elvis upstreaming plan
Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00 PM: Razya Ladelsky ra...@il.ibm.com writes: Hi all, I am Razya Ladelsky, I work at the IBM Haifa virtualization team, which developed Elvis, presented by Abel Gordon at the last KVM forum: ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE According to the discussions that took place at the forum, upstreaming some of the Elvis approaches seems to be a good idea, which we would like to pursue. Our plan for the first patches is the following: 1. Shared vhost thread between multiple devices. This patch creates a worker thread and worker queue shared across multiple virtio devices. We would like to modify the patch posted in https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 to limit a vhost thread to serve multiple devices only if they belong to the same VM, as Paolo suggested, to avoid isolation or cgroups concerns. Another modification is related to the creation and removal of vhost threads, which will be discussed next. I think this is an exceptionally bad idea. We shouldn't throw away isolation without exhausting every other possibility. Seems you have missed the important details here. Anthony, we are aware you are concerned about isolation and you believe we should not share a single vhost thread across multiple VMs. That's why Razya proposed to change the patch so we will serve multiple virtio devices using a single vhost thread only if the devices belong to the same VM. This series of patches will not allow two different VMs to share the same vhost thread. So, I don't see why this would be throwing away isolation and why this could be an exceptionally bad idea. By the way, I remember that during the KVM forum a similar approach of having a single data-plane thread for many devices was discussed. We've seen very positive results from adding threads.
We should also look at scheduling. ...and we have also seen exceptionally negative results from adding threads, both for vhost and data-plane. If you have a lot of idle time/cores then it makes sense to run multiple threads. But IMHO in many scenarios you don't have a lot of idle time/cores... and if you have them you would probably prefer to run more VMs/VCPUs. Hosting a single SMP VM when you have enough physical cores to run all the VCPU threads and the I/O threads is not a realistic scenario. That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM). Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario. Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost thread(s). 2. Sysfs mechanism to add and remove vhost threads. This patch allows us to add and remove vhost threads dynamically. A simpler way to control the creation of vhost threads is statically determining the maximum number of virtio devices per worker via a kernel module parameter (which is the way the previously mentioned patch is currently implemented). I'd like to ask for advice here about the more preferable way to go: although having the sysfs mechanism provides more flexibility, it may be a good idea to start with a simple static parameter, and have the first patches as simple as possible. What do you think? 3. Add virtqueue polling mode to vhost. Have the vhost thread poll the virtqueues with high I/O rate for new buffers, and avoid asking the guest to kick us. https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0 Ack on this. :) Regards, Abel. Regards, Anthony Liguori 4.
vhost statistics This patch introduces a set of statistics to monitor different performance metrics of vhost and our polling and I/O scheduling mechanisms. The statistics are exposed using debugfs and can be easily displayed with a Python script (vhost_stat, based on the old kvm_stats). https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 5. Add heuristics to improve I/O scheduling This patch enhances the round-robin mechanism with a set of heuristics to decide when to leave a virtqueue and proceed to the next. https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d This patch improves the handling of the requests by the vhost thread, but could perhaps be delayed to a later time, and not submitted as one of the first Elvis patches. I'd love to hear some comments about whether this patch needs to be
Re: Elvis upstreaming plan
On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00 PM: Razya Ladelsky ra...@il.ibm.com writes: Hi all, I am Razya Ladelsky, I work at the IBM Haifa virtualization team, which developed Elvis, presented by Abel Gordon at the last KVM forum: ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE According to the discussions that took place at the forum, upstreaming some of the Elvis approaches seems to be a good idea, which we would like to pursue. Our plan for the first patches is the following: 1. Shared vhost thread between multiple devices. This patch creates a worker thread and worker queue shared across multiple virtio devices. We would like to modify the patch posted in https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 to limit a vhost thread to serve multiple devices only if they belong to the same VM, as Paolo suggested, to avoid isolation or cgroups concerns. Another modification is related to the creation and removal of vhost threads, which will be discussed next. I think this is an exceptionally bad idea. We shouldn't throw away isolation without exhausting every other possibility. Seems you have missed the important details here. Anthony, we are aware you are concerned about isolation and you believe we should not share a single vhost thread across multiple VMs. That's why Razya proposed to change the patch so we will serve multiple virtio devices using a single vhost thread only if the devices belong to the same VM. This series of patches will not allow two different VMs to share the same vhost thread. So, I don't see why this would be throwing away isolation and why this could be an exceptionally bad idea.
By the way, I remember that during the KVM forum a similar approach of having a single data-plane thread for many devices was discussed. We've seen very positive results from adding threads. We should also look at scheduling. ...and we have also seen exceptionally negative results from adding threads, both for vhost and data-plane. If you have a lot of idle time/cores then it makes sense to run multiple threads. But IMHO in many scenarios you don't have a lot of idle time/cores... and if you have them you would probably prefer to run more VMs/VCPUs. Hosting a single SMP VM when you have enough physical cores to run all the VCPU threads and the I/O threads is not a realistic scenario. That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM). Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario. Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost thread(s). I guess a question then becomes why have multiple devices? 2. Sysfs mechanism to add and remove vhost threads. This patch allows us to add and remove vhost threads dynamically. A simpler way to control the creation of vhost threads is statically determining the maximum number of virtio devices per worker via a kernel module parameter (which is the way the previously mentioned patch is currently implemented). I'd like to ask for advice here about the more preferable way to go: although having the sysfs mechanism provides more flexibility, it may be a good idea to start with a simple static parameter, and have the first patches as simple as possible. What do you think?
3. Add virtqueue polling mode to vhost. Have the vhost thread poll the virtqueues with high I/O rate for new buffers, and avoid asking the guest to kick us. https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0 Ack on this. :) Regards, Abel. Regards, Anthony Liguori 4. vhost statistics This patch introduces a set of statistics to monitor different performance metrics of vhost and our polling and I/O scheduling mechanisms. The statistics are exposed using debugfs and can be easily displayed with a Python script (vhost_stat, based on the old kvm_stats). https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 5. Add heuristics to improve I/O scheduling This patch enhances the round-robin mechanism with a set of heuristics to decide when to leave a virtqueue and proceed to the next. https://github.com/abelg/virtual_io_acceleration/commit/
Re: Elvis upstreaming plan
Razya Ladelsky ra...@il.ibm.com writes: Hi all, I am Razya Ladelsky, I work at the IBM Haifa virtualization team, which developed Elvis, presented by Abel Gordon at the last KVM forum: ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE According to the discussions that took place at the forum, upstreaming some of the Elvis approaches seems to be a good idea, which we would like to pursue. Our plan for the first patches is the following: 1. Shared vhost thread between multiple devices. This patch creates a worker thread and worker queue shared across multiple virtio devices. We would like to modify the patch posted in https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 to limit a vhost thread to serve multiple devices only if they belong to the same VM, as Paolo suggested, to avoid isolation or cgroups concerns. Another modification is related to the creation and removal of vhost threads, which will be discussed next. 2. Sysfs mechanism to add and remove vhost threads. This patch allows us to add and remove vhost threads dynamically. A simpler way to control the creation of vhost threads is statically determining the maximum number of virtio devices per worker via a kernel module parameter (which is the way the previously mentioned patch is currently implemented). Does the sysfs interface aim to let the _user_ control the maximum number of devices per vhost thread, and/or let the user create and destroy worker threads at will? Setting the limit on the number of devices makes sense, but I am not sure if there is any reason to actually expose an interface to create or destroy workers. Also, it might be worthwhile to think about whether it's better to just let the worker thread stay around (hoping it might be used again in the future) rather than destroying it.
I'd like to ask for advice here about the more preferable way to go: Although having the sysfs mechanism provides more flexibility, it may be a good idea to start with a simple static parameter, and have the first patches as simple as possible. What do you think? I am actually inclined more towards a static limit. I think that in a typical setup, the user will set this for his/her environment just once at load time and forget about it. Bandan 3. Add virtqueue polling mode to vhost. Have the vhost thread poll the virtqueues with high I/O rate for new buffers, and avoid asking the guest to kick us. https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0 4. vhost statistics This patch introduces a set of statistics to monitor different performance metrics of vhost and our polling and I/O scheduling mechanisms. The statistics are exposed using debugfs and can be easily displayed with a Python script (vhost_stat, based on the old kvm_stats). https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 5. Add heuristics to improve I/O scheduling This patch enhances the round-robin mechanism with a set of heuristics to decide when to leave a virtqueue and proceed to the next. https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d This patch improves the handling of the requests by the vhost thread, but could perhaps be delayed to a later time, and not submitted as one of the first Elvis patches. I'd love to hear some comments about whether this patch needs to be part of the first submission.
Any other feedback on this plan will be appreciated. Thank you, Razya
Re: Elvis upstreaming plan
On 11/24/2013 05:22 PM, Razya Ladelsky wrote: Hi all, I am Razya Ladelsky, I work at the IBM Haifa virtualization team, which developed Elvis, presented by Abel Gordon at the last KVM forum: ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE According to the discussions that took place at the forum, upstreaming some of the Elvis approaches seems to be a good idea, which we would like to pursue. Our plan for the first patches is the following: 1. Shared vhost thread between multiple devices. This patch creates a worker thread and worker queue shared across multiple virtio devices. We would like to modify the patch posted in https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 to limit a vhost thread to serve multiple devices only if they belong to the same VM, as Paolo suggested, to avoid isolation or cgroups concerns. Another modification is related to the creation and removal of vhost threads, which will be discussed next. 2. Sysfs mechanism to add and remove vhost threads. This patch allows us to add and remove vhost threads dynamically. A simpler way to control the creation of vhost threads is statically determining the maximum number of virtio devices per worker via a kernel module parameter (which is the way the previously mentioned patch is currently implemented). Any chance we can re-use cmwq (the concurrency-managed workqueue) instead of inventing another mechanism? Looks like there's a lot of function duplication here. Bandan has an RFC to do this. I'd like to ask for advice here about the more preferable way to go: Although having the sysfs mechanism provides more flexibility, it may be a good idea to start with a simple static parameter, and have the first patches as simple as possible. What do you think? 3. Add virtqueue polling mode to vhost. Have the vhost thread poll the virtqueues with high I/O rate for new buffers, and avoid asking the guest to kick us.
https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0 Maybe we can make poll_stop_idle adaptive, which may help the light-load case. Considering the guest is often slower than vhost, if we just have one or two VMs, polling too much may waste CPU in this case. 4. vhost statistics This patch introduces a set of statistics to monitor different performance metrics of vhost and our polling and I/O scheduling mechanisms. The statistics are exposed using debugfs and can be easily displayed with a Python script (vhost_stat, based on the old kvm_stats). https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 How about using trace points instead? Besides statistics, it can also help more in debugging. 5. Add heuristics to improve I/O scheduling This patch enhances the round-robin mechanism with a set of heuristics to decide when to leave a virtqueue and proceed to the next. https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d This patch improves the handling of the requests by the vhost thread, but could perhaps be delayed to a later time, and not submitted as one of the first Elvis patches. I'd love to hear some comments about whether this patch needs to be part of the first submission. Any other feedback on this plan will be appreciated. Thank you, Razya
Re: Elvis upstreaming plan
On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote: 4. vhost statistics This patch introduces a set of statistics to monitor different performance metrics of vhost and our polling and I/O scheduling mechanisms. The statistics are exposed using debugfs and can be easily displayed with a Python script (vhost_stat, based on the old kvm_stats) https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 How about using trace points instead? Besides statistics, it can also help more in debugging. Definitely. kvm_stats moved to ftrace a long time ago. -- Gleb.
Re: Elvis upstreaming plan
Hi, Razya is out for a few days, so I will try to answer the questions as well as I can: Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM: From: Michael S. Tsirkin m...@redhat.com To: Abel Gordon/Haifa/IBM@IBMIL, Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com, as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/Haifa/IBM@IBMIL Date: 27/11/2013 01:08 AM Subject: Re: Elvis upstreaming plan On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote: Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00 PM: Razya Ladelsky ra...@il.ibm.com writes: [snip] That's why we are proposing to implement a mechanism that will enable the management stack to configure 1 thread per I/O device (as it is today) or 1 thread for many I/O devices (belonging to the same VM). Once you are scheduling multiple guests in a single vhost device, you now create a whole new class of DoS attacks in the best case scenario. Again, we are NOT proposing to schedule multiple guests in a single vhost thread. We are proposing to schedule multiple devices belonging to the same guest in a single (or multiple) vhost thread(s). I guess a question then becomes why have multiple devices? If you mean why serve multiple devices from a single thread, the answer is that we cannot rely on the Linux scheduler, which has no knowledge of I/O queues, to do a decent job of scheduling I/O. The idea is to take over the I/O scheduling responsibilities from the kernel's thread scheduler with a more efficient I/O scheduler inside each vhost thread. So by combining all of the I/O devices from the same guest (disks, network cards, etc.) in a single I/O thread, we can provide better scheduling by giving ourselves more knowledge of the nature of the work.
So now instead of relying on the Linux scheduler to perform context switches between multiple vhost threads, we have a single thread context in which we can do the I/O scheduling more efficiently. We can closely monitor the performance needs of each queue of each device inside the vhost thread, which gives us much more information than relying on the kernel's thread scheduler. This does not expose any additional opportunities for attacks (DoS or other) than are already available, since all of the I/O traffic belongs to a single guest. You can make the argument that with low I/O loads this mechanism may not make much difference. However when you try to maximize the utilization of your hardware (such as in a commercial scenario) this technique can gain you a large benefit. Regards, Joel Nider Virtualization Research IBM Research and Development Haifa Research Lab Phone: 972-4-829-6326 | Mobile: 972-54-3155635 E-mail: jo...@il.ibm.com Hi all, I am Razya Ladelsky, I work at the IBM Haifa virtualization team, which developed Elvis, presented by Abel Gordon at the last KVM forum: ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE According to the discussions that took place at the forum, upstreaming some of the Elvis approaches seems to be a good idea, which we would like to pursue. Our plan for the first patches is the following: 1. Shared vhost thread between multiple devices. This patch creates a worker thread and worker queue shared across multiple virtio devices. We would like to modify the patch posted in https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 to limit a vhost thread to serve multiple devices only if they belong to the same VM, as Paolo suggested, to avoid isolation or cgroups concerns. Another modification is related to the creation and removal of vhost threads, which will be discussed next.
I think this is an exceptionally bad idea. We shouldn't throw away isolation without exhausting every other possibility. Seems you have missed the important details here. Anthony, we are aware you are concerned about isolation and you believe we should not share a single vhost thread across multiple VMs. That's why Razya proposed to change the patch so we
Re: Elvis upstreaming plan
Gleb Natapov g...@redhat.com wrote on 27/11/2013 09:35:01 AM: From: Gleb Natapov g...@redhat.com To: Jason Wang jasow...@redhat.com, Cc: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@vger.kernel.org, anth...@codemonkey.ws, Michael S. Tsirkin m...@redhat.com, pbonz...@redhat.com, as...@redhat.com, digitale...@google.com, abel.gor...@gmail.com, Abel Gordon/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, b...@redhat.com Date: 27/11/2013 11:35 AM Subject: Re: Elvis upstreaming plan On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote: 4. vhost statistics This patch introduces a set of statistics to monitor different performance metrics of vhost and our polling and I/O scheduling mechanisms. The statistics are exposed using debugfs and can be easily displayed with a Python script (vhost_stat, based on the old kvm_stats) https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0 How about using trace points instead? Besides statistics, it can also help more in debugging. Definitely. kvm_stats moved to ftrace a long time ago. -- Gleb. Ok - we will look at this newer mechanism. Joel Nider Virtualization Research IBM Research and Development Haifa Research Lab Phone: 972-4-829-6326 | Mobile: 972-54-3155635 E-mail: jo...@il.ibm.com
Re: Elvis upstreaming plan
Michael S. Tsirkin m...@redhat.com wrote on 24/11/2013 12:26:15 PM: From: Michael S. Tsirkin m...@redhat.com To: Razya Ladelsky/Haifa/IBM@IBMIL, Cc: kvm@vger.kernel.org, anth...@codemonkey.ws, g...@redhat.com, pbonz...@redhat.com, as...@redhat.com, jasow...@redhat.com, digitale...@google.com, abel.gor...@gmail.com, Abel Gordon/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL Date: 24/11/2013 12:22 PM Subject: Re: Elvis upstreaming plan On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote: Hi all, I am Razya Ladelsky, I work at the IBM Haifa virtualization team, which developed Elvis, presented by Abel Gordon at the last KVM forum: ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE According to the discussions that took place at the forum, upstreaming some of the Elvis approaches seems to be a good idea, which we would like to pursue. Our plan for the first patches is the following: 1. Shared vhost thread between multiple devices. This patch creates a worker thread and worker queue shared across multiple virtio devices. We would like to modify the patch posted in https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 to limit a vhost thread to serve multiple devices only if they belong to the same VM, as Paolo suggested, to avoid isolation or cgroups concerns. Another modification is related to the creation and removal of vhost threads, which will be discussed next. 2. Sysfs mechanism to add and remove vhost threads. This patch allows us to add and remove vhost threads dynamically.
A simpler way to control the creation of vhost threads is statically determining the maximum number of virtio devices per worker via a kernel module parameter (which is the way the previously mentioned patch is currently implemented) I'd like to ask for advice here about the more preferable way to go: Although having the sysfs mechanism provides more flexibility, it may be a good idea to start with a simple static parameter, and have the first patches as simple as possible. What do you think? 3.Add virtqueue polling mode to vhost Have the vhost thread poll the virtqueues with high I/O rate for new buffers , and avoid asking the guest to kick us. https://github.com/abelg/virtual_io_acceleration/commit/ 26616133fafb7855cc80fac070b0572fd1aaf5d0 4. vhost statistics This patch introduces a set of statistics to monitor different performance metrics of vhost and our polling and I/O scheduling mechanisms. The statistics are exposed using debugfs and can be easily displayed with a Python script (vhost_stat, based on the old kvm_stats) https://github.com/abelg/virtual_io_acceleration/commit/ ac14206ea56939ecc3608dc5f978b86fa322e7b0 5. Add heuristics to improve I/O scheduling This patch enhances the round-robin mechanism with a set of heuristics to decide when to leave a virtqueue and proceed to the next. https://github.com/abelg/virtual_io_acceleration/commit/ f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d This patch improves the handling of the requests by the vhost thread, but could perhaps be delayed to a later time , and not submitted as one of the first Elvis patches. I'd love to hear some comments about whether this patch needs to be part of the first submission. Any other feedback on this plan will be appreciated, Thank you, Razya How about we start with the stats patch? This will make it easier to evaluate the other patches. Hi Michael, Thank you for your quick reply. Our plan was to send all these patches that contain the Elvis code. 
We can start with the stats patch, however, many of the statistics there are related to the features that the other patches provide... B.T.W. If you a chance to look at the rest of the patches, I'd really appreciate your comments, Thank you very much, Razya -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
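For readers unfamiliar with the threading change in item 1 of the plan above, the following is a toy user-space model (with invented names; the real code is a kernel worker loop in drivers/vhost) of the essential mapping: virtio devices of the same VM share one worker thread and work queue, while devices of different VMs never share a worker, which sidesteps the isolation/cgroups concern Paolo raised:

```python
import threading
import queue

class VhostWorker(threading.Thread):
    """One worker thread serving work items from every virtio device
    of a single VM, instead of one thread per device."""
    def __init__(self, vm_id):
        super().__init__(daemon=True)
        self.vm_id = vm_id
        self.work = queue.Queue()
        self.handled = []

    def run(self):
        while True:
            dev, item = self.work.get()
            if dev is None:          # shutdown sentinel
                break
            self.handled.append((dev, item))

workers = {}

def worker_for(vm_id):
    """Devices of the same VM map to the same worker; a new VM gets
    a fresh worker of its own."""
    if vm_id not in workers:
        workers[vm_id] = VhostWorker(vm_id)
        workers[vm_id].start()
    return workers[vm_id]

# Two devices of VM "A" share one worker; VM "B" gets its own.
w1 = worker_for("A")
w2 = worker_for("A")
w3 = worker_for("B")
print(w1 is w2, w1 is w3)  # True False

# Work from both of VM "A"'s devices flows through the single thread.
w1.work.put(("net0", "req1"))
w1.work.put(("blk0", "req2"))
w1.work.put((None, None))
w1.join()
print(w1.handled)  # [('net0', 'req1'), ('blk0', 'req2')]
```

The point of the single thread context is that the worker sees the queues of all the VM's devices together, so I/O scheduling decisions can be made with full knowledge of the VM's workload rather than being left to the kernel's thread scheduler.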
Re: Elvis upstreaming plan
On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote:

> Hi all,
>
> I am Razya Ladelsky. I work in the IBM Haifa virtualization team, which
> developed Elvis, presented by Abel Gordon at the last KVM forum:
> ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs
> ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
>
> According to the discussions that took place at the forum, upstreaming
> some of the Elvis approaches seems to be a good idea, which we would
> like to pursue.
>
> Our plan for the first patches is the following:
>
> 1. Shared vhost thread between multiple devices
> This patch creates a worker thread and work queue shared across multiple
> virtio devices. We would like to modify the patch posted in
> https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> to limit a vhost thread to serving multiple devices only if they belong
> to the same VM, as Paolo suggested, to avoid isolation and cgroups
> concerns. Another modification is related to the creation and removal of
> vhost threads, which will be discussed next.
>
> 2. Sysfs mechanism to add and remove vhost threads
> This patch allows us to add and remove vhost threads dynamically.
> A simpler way to control the creation of vhost threads is to statically
> determine the maximum number of virtio devices per worker via a kernel
> module parameter (which is how the previously mentioned patch is
> currently implemented).
> I'd like to ask for advice here about the preferable way to go: although
> the sysfs mechanism provides more flexibility, it may be a good idea to
> start with a simple static parameter and keep the first patches as
> simple as possible. What do you think?
>
> 3. Add virtqueue polling mode to vhost
> Have the vhost thread poll the virtqueues with a high I/O rate for new
> buffers, and avoid asking the guest to kick us.
> https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
>
> 4. vhost statistics
> This patch introduces a set of statistics to monitor different
> performance metrics of vhost and our polling and I/O scheduling
> mechanisms. The statistics are exposed using debugfs and can be easily
> displayed with a Python script (vhost_stat, based on the old kvm_stats).
> https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
>
> 5. Add heuristics to improve I/O scheduling
> This patch enhances the round-robin mechanism with a set of heuristics
> to decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> This patch improves the handling of requests by the vhost thread, but
> could perhaps be delayed to a later time and not submitted as one of the
> first Elvis patches. I'd love to hear some comments about whether this
> patch needs to be part of the first submission.
>
> Any other feedback on this plan will be appreciated,
> Thank you,
> Razya

How about we start with the stats patch? This will make it easier to
evaluate the other patches.
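Taken together, items 3 and 5 of the plan describe one scheduling pass of the shared vhost thread: round-robin over the virtqueues, handle a bounded batch per queue before moving on (the "when to leave" heuristic), and keep guest notifications disabled for queues busy enough to be worth polling. A toy user-space model of that pass; the field names, budget, and threshold are invented for illustration, not taken from the patches:

```python
def service_round(queues, budget=4, poll_threshold=8):
    """One scheduling pass of a hypothetical shared vhost thread.

    Each queue is a dict with a 'pending' request count and a recent
    'rate' estimate. The worker round-robins over the queues, handling
    at most `budget` requests per visit so one busy queue cannot starve
    the others, and leaves guest kicks disabled (pure polling) for any
    queue whose recent rate exceeds `poll_threshold`."""
    handled = 0
    for q in queues:
        take = min(q["pending"], budget)   # leave after a bounded batch
        q["pending"] -= take
        handled += take
        # High-rate queues are polled; idle queues keep notifications.
        q["kick_enabled"] = q["rate"] < poll_threshold
    return handled

queues = [
    {"pending": 10, "rate": 20, "kick_enabled": True},   # busy: poll it
    {"pending": 1,  "rate": 2,  "kick_enabled": True},   # idle: keep kicks
]
done = service_round(queues)
print(done, queues[0]["kick_enabled"], queues[1]["kick_enabled"])
# 5 False True
```

The polling decision trades CPU for latency: spinning on a busy queue avoids the exit/notification round trip to the guest, while an idle queue would only waste cycles, so it falls back to kick-based notification.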