Re: Updated Elvis Upstreaming Roadmap
Hi,

To summarize the issues raised and the next steps:

1. Shared vhost thread: the shared vhost thread will serve multiple VMs while still respecting cgroups. As soon as we have a design for supporting cgroups with multiple VMs, we'll share it.

2. Vhost polling mode: this patch can be submitted independently of (1). We'll add a condition, checked periodically, that stops polling if the guest is not currently running (i.e. it has been scheduled out). A sketch of such a loop is given below.

3. Heuristics (policies) for adding/removing vhost threads: these will be implemented inside the vhost module; we will not expose an interface to user space at this time.

Thank you,
Razya
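For illustration only, here is a minimal kernel-style C sketch of what a periodically checked stop condition might look like. All names below (`vhost_poll_ctx`, `has_pending_work`, `guest_task`) are invented for the example and are not taken from the Elvis patches or from existing vhost code.

```c
/*
 * Minimal sketch (not from the Elvis patches): a polling loop that
 * busy-waits for new virtqueue work, but periodically re-checks a stop
 * condition -- here, whether the guest vCPU task that would kick us is
 * still running -- and bails out to the normal kick/wait path otherwise.
 * All names below are invented for illustration.
 */
#include <linux/sched.h>
#include <linux/jiffies.h>

struct vhost_poll_ctx {
	struct task_struct *guest_task;	/* vCPU thread that kicks this queue */
	bool (*has_pending_work)(void *data);
	void *data;
};

static bool keep_polling(struct vhost_poll_ctx *ctx)
{
	/* Stop polling when the guest is scheduled out: busy waiting then
	 * only burns cycles the guest itself could use to make progress. */
	return ctx->guest_task && task_curr(ctx->guest_task) &&
	       !need_resched();
}

/* Returns true if work showed up while polling, false if we gave up. */
static bool vhost_poll_for_work(struct vhost_poll_ctx *ctx,
				unsigned long timeout_jiffies)
{
	unsigned long deadline = jiffies + timeout_jiffies;
	unsigned int spins = 0;

	while (!ctx->has_pending_work(ctx->data)) {
		/* Re-check the stop condition only every so often, as the
		 * roadmap suggests, rather than on every spin. */
		if ((++spins & 0xff) == 0 &&
		    (time_after(jiffies, deadline) || !keep_polling(ctx)))
			return false;	/* fall back to kick + wait */
		cpu_relax();
	}
	return true;
}
```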
Re: Updated Elvis Upstreaming Roadmap
On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
> 4. vhost statistics
> The issue that was raised for the vhost statistics was using ftrace
> instead of the debugfs mechanism. However, looking further into the
> kvm stat mechanism, we learned that ftrace didn't replace the plain
> debugfs mechanism, but was used in addition to it.

It did. Statistics in debugfs are deprecated; no new statistics are added there. kvm_stat uses ftrace now (if available), and ftrace of course gives seamless integration with perf.

--
Gleb.
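To make the ftrace route concrete: statistics exported this way are just tracepoints, which ftrace, kvm_stat-style scripts, and perf can all consume (e.g. as a `vhost:...` event). The declaration below is only a hypothetical sketch; the event name and fields are invented for illustration and are not an existing vhost interface.

```c
/*
 * Illustrative only: what a vhost statistics tracepoint could look like
 * if vhost followed kvm's ftrace-based approach. The event name, fields
 * and file placement are assumptions, not an existing vhost interface.
 */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM vhost

#if !defined(_TRACE_VHOST_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_VHOST_H

#include <linux/tracepoint.h>

TRACE_EVENT(vhost_work_queued,
	TP_PROTO(unsigned int vq_idx, unsigned int pending),
	TP_ARGS(vq_idx, pending),
	TP_STRUCT__entry(
		__field(unsigned int, vq_idx)
		__field(unsigned int, pending)
	),
	TP_fast_assign(
		__entry->vq_idx  = vq_idx;
		__entry->pending = pending;
	),
	TP_printk("vq=%u pending=%u", __entry->vq_idx, __entry->pending)
);

#endif /* _TRACE_VHOST_H */

/* This part must be outside the include guard. */
#include <trace/define_trace.h>
```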
Re: Updated Elvis Upstreaming Roadmap
Gleb Natapov wrote on 24/12/2013 06:21:03 PM:
> On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
> > 4. vhost statistics
> > The issue that was raised for the vhost statistics was using ftrace
> > instead of the debugfs mechanism. However, looking further into the
> > kvm stat mechanism, we learned that ftrace didn't replace the plain
> > debugfs mechanism, but was used in addition to it.
>
> It did. Statistics in debugfs are deprecated; no new statistics are
> added there. kvm_stat uses ftrace now (if available), and ftrace of
> course gives seamless integration with perf.

OK, I understand. We'll look further into ftrace to see whether it fully supports our vhost statistics requirements.

Thank you,
Razya
Re: Updated Elvis Upstreaming Roadmap
On Thu, Dec 19, 2013 at 08:40:44AM +0200, Abel Gordon wrote:
> On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin m...@redhat.com wrote:
> > On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
> > > 1. Shared vhost thread for multiple devices.
> > > [...]
> > With respect to the upstream kernel, I'm not sure a bunch of changes
> > just for the sake of guests with multiple virtual NIC cards makes
> > sense. And I wonder how this step, in isolation, will affect e.g.
> > multiqueue workloads. But I guess if the numbers are convincing, this
> > can be mergeable.
>
> Even if you have a single multiqueue device, this change allows creating
> one vhost thread for all the queues, one vhost thread per queue, or any
> other combination. I guess that depending on the workload and on the
> system utilization (free cycles/cores, density) you would prefer to use
> one or more vhost threads.

That is already controllable from the guest though, which likely has a better idea about the workload.

> > > 2. Creation of vhost threads
> > > [...]
> > So I'm unlikely to merge this upstream without making it useful
> > cross-VM, and yes, this means isolation and accounting with cgroups
> > need to work properly.
>
> Agree, but even if you use a single multiqueue device, having the
> ability to use one thread to serve all the queues, or multiple threads
> to serve all the queues, looks like a useful feature.

Could be. At the moment, multiqueue is off by default because it causes regressions for some workloads as compared to a single queue.
If we have heuristics in vhost that fix this by auto-tuning threading, that would be nice. But if you need to tune it manually anyway, then from an upstream perspective it does not seem to be worth it - you can just turn multiqueue on/off in the guest.

> > > Regarding cwmq, it is an interesting mechanism, which we need to
> > > explore further. [...]
> > Hmm, that's still assuming some smart management tool configuring this
> > correctly. Can't this be determined automatically depending on the
> > workload? This is what the cwmq suggestion was really about: detect
> > that we need more threads and spawn them. It's less about sharing the
> > implementation with workqueues - that would be very nice but is not a
> > must.
>
> But how can cwmq consider cgroup accounting?

I think cwmq is just a replacement for our own thread pool. It doesn't make cgroup accounting easier or harder.

> In any case, IMHO, the kernel should first provide the mechanism so
> that later on a user-space management application (the policy) can
> orchestrate it.

I think policy would be something coarse-grained, like setting priority.
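The "detect that we need more threads and spawn them" idea can be made concrete with a toy heuristic. The fragment below is purely illustrative; none of these names exist in vhost, and it only shows the shape of a "grow when backlogged, shrink when idle" policy rather than anything proposed in this thread.

```c
/*
 * Toy heuristic, purely illustrative: decide how many vhost threads to
 * run from two periodically sampled numbers. Nothing here corresponds to
 * real vhost code; it only sketches an automatic policy instead of a
 * management-tool knob.
 */
struct vhost_pool_sample {
	unsigned int backlog;	/* work items queued but not yet served */
	unsigned int busy_pct;	/* % of the sample period workers spent busy */
};

static unsigned int tune_nr_threads(unsigned int cur,
				    const struct vhost_pool_sample *s,
				    unsigned int max_threads)
{
	/* Workers saturated and work is piling up: add a thread. */
	if (s->busy_pct > 90 && s->backlog > 0 && cur < max_threads)
		return cur + 1;

	/* Plenty of idle time: shrink back toward a single shared thread. */
	if (s->busy_pct < 30 && cur > 1)
		return cur - 1;

	return cur;
}
```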
Re: Updated Elvis Upstreaming Roadmap
On Thu, Dec 19, 2013 at 12:13 PM, Michael S. Tsirkin m...@redhat.com wrote:
> On Thu, Dec 19, 2013 at 08:40:44AM +0200, Abel Gordon wrote:
> > [...]
> > Even if you have a single multiqueue device, this change allows
> > creating one vhost thread for all the queues, one vhost thread per
> > queue, or any other combination. I guess that depending on the
> > workload and on the system utilization (free cycles/cores, density)
> > you would prefer to use one or more vhost threads.
>
> That is already controllable from the guest though, which likely has a
> better idea about the workload.

But the guest has no idea about what's going on in the host system (e.g. other VMs' I/O, cpu utilization of the host cores...).

> Could be. At the moment, multiqueue is off by default because it causes
> regressions for some workloads as compared to a single queue. If we have
> heuristics in vhost that fix this by auto-tuning threading, that would
> be nice. But if you need to tune it manually anyway, then from an
> upstream perspective it does not seem to be worth it - you can just turn
> multiqueue on/off in the guest.

I see. But we are again mixing up the policy and the mechanism. We first need a mechanism to control the system, and then we need to implement the policy that orchestrates it (whether it ends up implemented in the kernel as part of vhost or outside in user space). I don't see why we should wait for a policy in order to upstream the mechanism. If we upstream the mechanism in a manner where the defaults do not affect today's vhost behavior, then it will be possible to play with the policies and upstream them later.
Re: Updated Elvis Upstreaming Roadmap
On Thu, Dec 19, 2013 at 12:36:30PM +0200, Abel Gordon wrote:
> On Thu, Dec 19, 2013 at 12:13 PM, Michael S. Tsirkin m...@redhat.com wrote:
> > [...]
> > That is already controllable from the guest though, which likely has a
> > better idea about the workload.
>
> But the guest has no idea about what's going on in the host system
> (e.g. other VMs' I/O, cpu utilization of the host cores...).

But again, you want to do things per VM now, so you will have no idea about other VMs, right? Host cpu utilization could be a useful input for some heuristics, I agree, but nothing prevents us from sending this info to a guest agent and controlling multiqueue based on that (kind of like the balloon).
Re: Updated Elvis Upstreaming Roadmap
On Thu, Dec 19, 2013 at 1:37 PM, Michael S. Tsirkin m...@redhat.com wrote:
> On Thu, Dec 19, 2013 at 12:36:30PM +0200, Abel Gordon wrote:
> > [...]
> > But the guest has no idea about what's going on in the host system
> > (e.g. other VMs' I/O, cpu utilization of the host cores...).
>
> But again, you want to do things per VM now, so you will have no idea
> about other VMs, right? Host cpu utilization could be a useful input

Razya shared a roadmap. The first step was to support sharing a thread for a single VM, but the goal is to later extend the mechanism to support multiple VMs and cgroups.

> for some heuristics, I agree, but nothing prevents us from sending this
> info to a guest agent and controlling multiqueue based on that (kind of
> like the balloon).

IMHO, we should never share host-internal information (e.g. resource utilization) with the guest. That's supposed to be confidential information :)
The balloon is a bit different... kvm asks the guest OS to give back (if possible) some pages, but kvm never sends the balloon information about the memory utilization of the host. If the guest wishes to send information about its own memory consumption (like it does for MOM), that's OK. So, the guest can share information with the host, but the host should be the one to make the decisions. KVM should never share host information with the guest.
Re: Updated Elvis Upstreaming Roadmap
On Thu, Dec 19, 2013 at 02:56:10PM +0200, Abel Gordon wrote:
> On Thu, Dec 19, 2013 at 1:37 PM, Michael S. Tsirkin m...@redhat.com wrote:
> > [...]
> > But again, you want to do things per VM now, so you will have no idea
> > about other VMs, right? Host cpu utilization could be a useful input
>
> Razya shared a roadmap. The first step was to support sharing a thread
> for a single VM, but the goal is to later extend the mechanism to
> support multiple VMs and cgroups.

Yes, I got that. What I'm not sure of is whether this is just a development roadmap, or whether you expect to be able to merge things upstream in this order as well.
If the latter, all I'm saying is that I think you are doing this in the wrong order: we'll likely have to merge 4 first, then 3, then possibly 1+2 together - but maybe 1+2 will have to wait until cgroups are sorted out. That's just a hunch of course until you actually try to do it.
If the former, most of my comments don't really apply.

> [...]
> So, the guest can share information with the host, but the host should
> be the one to make the decisions. KVM should never share host
> information with the guest.

It's also easy to just tell the guest agent to turn multiqueue on/off if you have a mind to.
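For reference, turning multiqueue on or off from inside the guest is an ethtool channels operation; a guest agent could do the equivalent programmatically with the ETHTOOL_SCHANNELS ioctl, roughly as sketched below. This is a minimal sketch: error handling is trimmed and "eth0" is only an example interface name.

```c
/*
 * Minimal sketch: set the number of combined channels (multiqueue on/off)
 * on a virtio-net interface from inside the guest, the programmatic
 * equivalent of "ethtool -L eth0 combined N".
 */
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

static int set_combined_queues(const char *ifname, unsigned int n)
{
	struct ethtool_channels ch = { .cmd = ETHTOOL_GCHANNELS };
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	ifr.ifr_data = (char *)&ch;

	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {	/* read current settings */
		close(fd);
		return -1;
	}

	ch.cmd = ETHTOOL_SCHANNELS;
	ch.combined_count = n;			/* 1 = single queue */
	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		close(fd);
		return -1;
	}

	close(fd);
	return 0;
}

int main(void)
{
	return set_combined_queues("eth0", 1) ? 1 : 0;
}
```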
Re: Updated Elvis Upstreaming Roadmap
On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
> Hi,
> Thank you all for your comments. I'm sorry for taking this long to
> reply, I was away on vacation..
> It was a good, long discussion, many issues were raised, which we'd
> like to address with the following proposed roadmap for Elvis patches.
> In general, we believe it would be best to start with patches that are
> as simple as possible, providing the basic Elvis functionality, and
> attend to the more complicated issues in subsequent patches. Here's the
> road map for Elvis patches:

Thanks for the follow-up. Some suggestions below. Please note the suggestions below merely represent thoughts on merging upstream. If, as the first step, you are content with keeping this work as out-of-tree patches in order to have the freedom to experiment with interfaces and performance, please feel free to ignore them.

> 1. Shared vhost thread for multiple devices.
> The way to go here, we believe, is to start with a patch having a
> shared vhost thread for multiple devices of the SAME vm. The next
> step/patch may be handling vms belonging to the same cgroup. Finally,
> we need to extend the functionality so that the shared vhost thread
> serves multiple vms (not necessarily belonging to the same cgroup).
> There was a lot of discussion about the way to address the enforcement
> of cgroup policies, and we will consider the various solutions in a
> future patch.

With respect to the upstream kernel, I'm not sure a bunch of changes just for the sake of guests with multiple virtual NIC cards makes sense. And I wonder how this step, in isolation, will affect e.g. multiqueue workloads. But I guess if the numbers are convincing, this can be mergeable.

> 2. Creation of vhost threads
> We suggested two ways of controlling the creation and removal of vhost
> threads:
> - statically: determining the maximum number of virtio devices per
>   worker via a kernel module parameter
> - dynamically: a sysfs mechanism to add and remove vhost threads
> It seems that it would be simplest to take the static approach as a
> first stage. At a second stage (next patch), we'll advance to
> dynamically changing the number of vhost threads, using the static
> module parameter only as a default value.

I'm not sure how independent this is from 1.
With respect to the upstream kernel, introducing interfaces (which we'll have to maintain forever) just for the sake of guests with multiple virtual NIC cards does not look like a good tradeoff. So I'm unlikely to merge this upstream without making it useful cross-VM, and yes, this means isolation and accounting with cgroups need to work properly.

> Regarding cwmq, it is an interesting mechanism, which we need to
> explore further. At the moment we prefer not to change the vhost model
> to use cwmq, as some of the issues that were discussed, such as
> cgroups, are not supported by cwmq, and this adds more complexity.
> However, we'll look further into it and consider it at a later stage.

Hmm, that's still assuming some smart management tool configuring this correctly. Can't this be determined automatically depending on the workload? This is what the cwmq suggestion was really about: detect that we need more threads and spawn them. It's less about sharing the implementation with workqueues - that would be very nice but is not a must.

> 3. Adding polling mode to vhost
> It is a good idea to make polling adaptive based on various factors
> such as the I/O rate, the guest kick overhead (which is the tradeoff of
> polling), or the amount of wasted cycles (cycles we kept polling but no
> new work was added). However, as a first polling patch, we would prefer
> a naive polling approach, which could be tuned in later patches.

While any polling approach would still need a lot of testing to prove we don't, for example, steal CPU from a guest which could be doing other useful work, given that an exit costs at least ~1.5K cycles, in theory it seems like something that can improve performance.
I'm not sure how naive we can be without introducing regressions for some workloads. For example, if we are on the same host CPU, there's no chance busy waiting will help us make progress. How about detecting that the VCPU thread that kicked us is currently running on another CPU, and only polling in that case?

> 4. vhost statistics
> The issue that was raised for the vhost statistics was using ftrace
> instead of the debugfs mechanism. However, looking further into the kvm
> stat mechanism, we learned that ftrace didn't replace the plain debugfs
> mechanism, but was used in addition to it. We propose to continue using
> debugfs for statistics, in a manner similar to kvm, and at some point
> in the future ftrace can be added to vhost as well.

IMHO, while kvm_stat is a useful script, the best tool for performance stats is still perf, so I would try to integrate with that. How it works internally is IMHO less important.
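For concreteness, the debugfs route Razya describes would look roughly like the sketch below: a handful of counters under /sys/kernel/debug/vhost/, the way kvm historically exposed /sys/kernel/debug/kvm/*. All names here are invented for illustration, and note Gleb's reply elsewhere in this thread that debugfs statistics are deprecated in favor of ftrace/perf integration.

```c
/*
 * Illustrative sketch only (invented names): export a few vhost counters
 * via debugfs, kvm-style. Gleb's reply in this thread points out that
 * debugfs statistics are deprecated; this merely shows what the proposal
 * would amount to in code.
 */
#include <linux/init.h>
#include <linux/debugfs.h>

static struct dentry *vhost_debugfs_dir;
static u64 vhost_stat_kicks;		/* guest->host notifications seen */
static u64 vhost_stat_polled_work;	/* work items picked up by polling */

static int __init vhost_stats_init(void)
{
	vhost_debugfs_dir = debugfs_create_dir("vhost", NULL);
	debugfs_create_u64("kicks", 0444, vhost_debugfs_dir,
			   &vhost_stat_kicks);
	debugfs_create_u64("polled_work", 0444, vhost_debugfs_dir,
			   &vhost_stat_polled_work);
	return 0;
}
```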
Re: Updated Elvis Upstreaming Roadmap
On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin m...@redhat.com wrote:
> On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
> > 1. Shared vhost thread for multiple devices.
> > [...]
> With respect to the upstream kernel, I'm not sure a bunch of changes
> just for the sake of guests with multiple virtual NIC cards makes
> sense. And I wonder how this step, in isolation, will affect e.g.
> multiqueue workloads. But I guess if the numbers are convincing, this
> can be mergeable.

Even if you have a single multiqueue device, this change allows creating one vhost thread for all the queues, one vhost thread per queue, or any other combination. I guess that depending on the workload and on the system utilization (free cycles/cores, density) you would prefer to use one or more vhost threads.

> > 2. Creation of vhost threads
> > [...]
> I'm not sure how independent this is from 1.
> With respect to the upstream kernel, introducing interfaces (which
> we'll have to maintain forever) just for the sake of guests with
> multiple virtual NIC cards does not look like a good tradeoff. So I'm
> unlikely to merge this upstream without making it useful cross-VM, and
> yes, this means isolation and accounting with cgroups need to work
> properly.

Agree, but even if you use a single multiqueue device, having the ability to use one thread to serve all the queues, or multiple threads to serve all the queues, looks like a useful feature.

> > Regarding cwmq, it is an interesting mechanism, which we need to
> > explore further. [...]
> Hmm, that's still assuming some smart management tool configuring this
> correctly. Can't this be determined automatically depending on the
> workload? This is what the cwmq suggestion was really about: detect
> that we need more threads and spawn them. It's less about sharing the
> implementation with workqueues - that would be very nice but is not a
> must.

But how can cwmq consider cgroup accounting?
In any case, IMHO, the kernel should first provide the mechanism so that later on a user-space management application (the policy) can orchestrate it.
Updated Elvis Upstreaming Roadmap
Hi,

Thank you all for your comments. I'm sorry for taking this long to reply, I was away on vacation.

It was a good, long discussion, and many issues were raised, which we'd like to address with the following proposed roadmap for the Elvis patches. In general, we believe it would be best to start with patches that are as simple as possible, providing the basic Elvis functionality, and to attend to the more complicated issues in subsequent patches. Here is the roadmap for the Elvis patches:

1. Shared vhost thread for multiple devices.
The way to go here, we believe, is to start with a patch having a shared vhost thread for multiple devices of the SAME vm. The next step/patch may be handling vms belonging to the same cgroup. Finally, we need to extend the functionality so that the shared vhost thread serves multiple vms (not necessarily belonging to the same cgroup). There was a lot of discussion about the way to address the enforcement of cgroup policies, and we will consider the various solutions in a future patch.

2. Creation of vhost threads
We suggested two ways of controlling the creation and removal of vhost threads:
- statically: determining the maximum number of virtio devices per worker via a kernel module parameter
- dynamically: a sysfs mechanism to add and remove vhost threads
It seems that it would be simplest to take the static approach as a first stage. At a second stage (next patch), we'll advance to dynamically changing the number of vhost threads, using the static module parameter only as a default value. (A sketch of the static knob appears after this message.)

Regarding cwmq, it is an interesting mechanism, which we need to explore further. At the moment we prefer not to change the vhost model to use cwmq, as some of the issues that were discussed, such as cgroups, are not supported by cwmq, and this adds more complexity. However, we'll look further into it and consider it at a later stage.

3. Adding polling mode to vhost
It is a good idea to make polling adaptive based on various factors such as the I/O rate, the guest kick overhead (which is the tradeoff of polling), or the amount of wasted cycles (cycles we kept polling but no new work was added). However, as a first polling patch, we would prefer a naive polling approach, which could be tuned in later patches.

4. vhost statistics
The issue that was raised for the vhost statistics was using ftrace instead of the debugfs mechanism. However, looking further into the kvm stat mechanism, we learned that ftrace didn't replace the plain debugfs mechanism, but was used in addition to it. We propose to continue using debugfs for statistics, in a manner similar to kvm, and at some point in the future ftrace can be added to vhost as well.

Does this plan look OK? If there are no further comments, I'll start preparing the patches according to what we've agreed on thus far.

Thank you,
Razya
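As a rough illustration of the static approach in item 2, the knob could be a plain module parameter consulted when a new virtio device is attached to a worker. All names here (`devs_per_worker`, `vhost_worker`, `vhost_pick_worker`) are invented for the example and are not taken from the Elvis patches.

```c
/*
 * Illustrative sketch only: a module parameter capping how many virtio
 * devices a single shared vhost worker may serve. The names below are
 * invented; the real patches may differ.
 */
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/list.h>

static unsigned int devs_per_worker = 4;
module_param(devs_per_worker, uint, 0444);
MODULE_PARM_DESC(devs_per_worker,
		 "Maximum number of virtio devices served by one vhost thread");

struct vhost_worker {
	struct list_head list;		/* all workers in the pool */
	unsigned int nr_devs;		/* devices currently attached */
	/* ... work list, task pointer, etc. ... */
};

static LIST_HEAD(workers);

/* Reuse an existing worker if it still has room, otherwise ask the
 * caller to create a new one (returns NULL in that case). */
static struct vhost_worker *vhost_pick_worker(void)
{
	struct vhost_worker *w;

	list_for_each_entry(w, &workers, list)
		if (w->nr_devs < devs_per_worker)
			return w;
	return NULL;
}
```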