Re: Updated Elvis Upstreaming Roadmap

2013-12-24 Thread Razya Ladelsky
Hi,

To summarize the issues raised and following steps:

1. The shared vhost thread will support multiple VMs while still
respecting cgroups.
As soon as we have a design for supporting cgroups with multiple VMs,
we'll share it.

2. Adding vhost polling mode: this patch can be submitted independently
of (1).
We'll add a condition that is checked periodically, in order to stop
polling if the guest is not running (scheduled out) at that time
(a rough sketch follows below).

3. Implement good heuristics (policies) in the vhost module for 
adding/removing vhost
threads. We will not expose an interface to user-space at this time.
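
For illustration only, here is a rough sketch of the periodic check from
(2). The helper name and the way the vCPU task is obtained are made up
for this example and do not reflect the actual patch:

#include <linux/sched.h>

/* Sketch only: stop polling when the vCPU thread that feeds this
 * virtqueue is scheduled out.  vcpu_task is assumed to be the
 * task_struct of the vCPU thread that kicks the virtqueue (how it is
 * looked up is omitted here).
 */
static bool vhost_should_stop_polling(struct task_struct *vcpu_task)
{
        /* ->on_cpu is non-zero while the task is actually running on
         * some CPU (SMP builds only).
         */
        return !vcpu_task || !vcpu_task->on_cpu;
}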


Thank you,
Razya



Re: Updated Elvis Upstreaming Roadmap

2013-12-24 Thread Gleb Natapov
On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
 4. vhost statistics 
 
 The issue that was raised for the vhost statistics was using ftrace 
 instead of the debugfs mechanism.
 However, looking further into the kvm stat mechanism, we learned that 
 ftrace didn't replace the plain debugfs mechanism, but was used in 
 addition to it.
  
It did. Statistics in debugfs are deprecated. No new statistics are
added there.  kvm_stat is using ftrace now (if available) and of course
ftrace gives seamless integration with perf.

--
Gleb.


Re: Updated Elvis Upstreaming Roadmap

2013-12-24 Thread Razya Ladelsky
Gleb Natapov g...@minantech.com wrote on 24/12/2013 06:21:03 PM:

 From: Gleb Natapov g...@kernel.org
 To: Razya Ladelsky/Haifa/IBM@IBMIL, 
 Cc: Michael S. Tsirkin m...@redhat.com, abel.gor...@gmail.com, 
 Anthony Liguori anth...@codemonkey.ws, as...@redhat.com, 
 digitale...@google.com, Eran Raichstein/Haifa/IBM@IBMIL, 
 g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/IBM@IBMIL, 
 kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, pbonz...@redhat.com,
 Stefan Hajnoczi stefa...@gmail.com, Yossi Kuperman1/Haifa/
 IBM@IBMIL, Eyal Moscovici/Haifa/IBM@IBMIL, b...@redhat.com
 Date: 24/12/2013 06:21 PM
 Subject: Re: Updated Elvis Upstreaming Roadmap
 Sent by: Gleb Natapov g...@minantech.com
 
 On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
  4. vhost statistics 
  
  The issue that was raised for the vhost statistics was using ftrace 
  instead of the debugfs mechanism.
  However, looking further into the kvm stat mechanism, we learned that 
  ftrace didn't replace the plain debugfs mechanism, but was used in 
  addition to it.
  
 It did. Statistics in debugfs are deprecated. No new statistics are
 added there.  kvm_stat is using ftrace now (if available) and of course
 ftrace gives seamless integration with perf.


OK, I understand.
We'll look further into ftrace to verify that it fully supports our vhost
statistics requirements.
Thank you,
Razya
 
 --
  Gleb.
 



Re: Updated Elvis Upstreaming Roadmap

2013-12-19 Thread Michael S. Tsirkin
On Thu, Dec 19, 2013 at 08:40:44AM +0200, Abel Gordon wrote:
 On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
  Hi,
 
  Thank you all for your comments.
  I'm sorry for taking this long to reply, I was away on vacation..
 
  It was a good, long discussion, many issues were raised, which we'd like
  to address with the following proposed roadmap for Elvis patches.
  In general, we believe it would be best to start with patches that are
  as simple as possible, providing the basic Elvis functionality,
  and attend to the more complicated issues in subsequent patches.
 
  Here's the road map for Elvis patches:
 
  Thanks for the follow up. Some suggestions below.
  Please note the suggestions below merely represent
  thoughts on merging upstream.
  If as the first step you are content with keeping this
  work as out of tree patches, in order to have
  the freedom to experiment with interfaces and
  performance, please feel free to ignore them.
 
  1. Shared vhost thread for multiple devices.
 
  The way to go here, we believe, is to start with a patch having a shared
  vhost thread for multiple devices of the SAME vm.
  The next step/patch may be handling vms belonging to the same cgroup.
 
  Finally, we need to extend the functionality so that the shared vhost
  thread
  serves multiple vms (not necessarily belonging to the same cgroup).
 
  There was a lot of discussion about the way to address the enforcement
  of cgroup policies, and we will consider the various solutions with a
  future
  patch.
 
  With respect to the upstream kernel,
  I'm not sure a bunch of changes just for the sake of guests with
  multiple virtual NIC cards makes sense.
  And I wonder how this step, in isolation, will affect e.g.
  multiqueue workloads.
  But I guess if the numbers are convincing, this can be mergeable.
 
 Even if you have a single multiqueue device this change allows you
 to create one vhost thread for all the queues, one vhost thread per
 queue or any other combination. I guess that depending on the workload
 and depending on the system utilization (free cycles/cores, density)
 you would prefer
 to use one or more vhost threads.

That is already controllable from the guest though, which likely has a better
idea about the workload.

 
 
  2. Creation of vhost threads
 
  We suggested two ways of controlling the creation and removal of vhost
  threads:
  - statically determining the maximum number of virtio devices per worker
  via a kernel module parameter
  - dynamically: Sysfs mechanism to add and remove vhost threads
 
  It seems that it would be simplest to take the static approach as
  a first stage. At a second stage (next patch), we'll advance to
  dynamically
  changing the number of vhost threads, using the static module parameter
  only as a default value.
 
  I'm not sure how independent this is from 1.
  With respect to the upstream kernel,
  Introducing interfaces (which we'll have to maintain
  forever) just for the sake of guests with
  multiple virtual NIC cards does not look like a good tradeoff.
 
  So I'm unlikely to merge this upstream without making it useful cross-VM,
  and yes this means isolation and accounting with cgroups need to
  work properly.
 
 Agree, but even if you use a single multiqueue device having the
 ability to use 1 thread to serve all the queues or multiple threads to
 serve all the queues looks like a useful feature.

Could be.  At the moment, multiqueue is off by default because it causes
regressions for some workloads as compared to a single queue.
If we have heuristics in vhost that fix this by auto-tuning threading, that
would be nice.  But if you need to tune it manually anyway,
then from upstream perspective it does not seem to be worth it - you can just
turn multiqueue on/off in the guest.
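
Purely as an illustration of the kind of auto-tuning heuristic meant
here (the names, thresholds and the busy_pct input are all invented,
not proposed code):

/* Invented illustration: decide whether a shared vhost worker should
 * be split or merged based on how busy it was over the last interval.
 */
enum { WORKER_SPLIT = 1, WORKER_KEEP = 0, WORKER_MERGE = -1 };

static int worker_rebalance_hint(unsigned int busy_pct, unsigned int nqueues)
{
        if (busy_pct > 90 && nqueues > 1)
                return WORKER_SPLIT;    /* overloaded: move queues away */
        if (busy_pct < 20)
                return WORKER_MERGE;    /* mostly idle: can absorb more */
        return WORKER_KEEP;
}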


 
  Regarding cwmq, it is an interesting mechanism, which we need to explore
  further.
  At the moment we prefer not to change the vhost model to use cwmq, as some
  of the issues that were discussed, such as cgroups, are not supported by
  cwmq, and this is adding more complexity.
  However, we'll look further into it, and consider it at a later stage.
 
  Hmm that's still assuming some smart management tool configuring
  this correctly.  Can't this be determined automatically depending
  on the workload?
  This is what the cwmq suggestion was really about: detect
  that we need more threads and spawn them.
  It's less about sharing the implementation with workqueues -
  would be very nice but not a must.
 
 But how can cwmq consider cgroup accounting?

I think cwmq is just a replacement for our own thread pool.
It doesn't make cgroup accounting easier or harder.

 In any case, IMHO, the kernel should first provide the mechanism so
 later on a user-space management application (the policy) can
 orchestrate it.

I think policy would be something coarse-grained, like setting priority.

Re: Updated Elvis Upstreaming Roadmap

2013-12-19 Thread Abel Gordon
On Thu, Dec 19, 2013 at 12:13 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Dec 19, 2013 at 08:40:44AM +0200, Abel Gordon wrote:
 On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
  Hi,
 
  Thank you all for your comments.
  I'm sorry for taking this long to reply, I was away on vacation..
 
  It was a good, long discussion, many issues were raised, which we'd like
  to address with the following proposed roadmap for Elvis patches.
  In general, we believe it would be best to start with patches that are
  as simple as possible, providing the basic Elvis functionality,
  and attend to the more complicated issues in subsequent patches.
 
  Here's the road map for Elvis patches:
 
  Thanks for the follow up. Some suggestions below.
  Please note the suggestions below merely represent
  thoughts on merging upstream.
  If as the first step you are content with keeping this
  work as out of tree patches, in order to have
  the freedom to experiment with interfaces and
  performance, please feel free to ignore them.
 
  1. Shared vhost thread for multiple devices.
 
  The way to go here, we believe, is to start with a patch having a shared
  vhost thread for multiple devices of the SAME vm.
  The next step/patch may be handling vms belonging to the same cgroup.
 
  Finally, we need to extend the functionality so that the shared vhost
  thread
  serves multiple vms (not necessarily belonging to the same cgroup).
 
  There was a lot of discussion about the way to address the enforcement
  of cgroup policies, and we will consider the various solutions with a
  future
  patch.
 
  With respect to the upstream kernel,
  I'm not sure a bunch of changes just for the sake of guests with
  multiple virtual NIC cards makes sense.
  And I wonder how this step, in isolation, will affect e.g.
  multiqueue workloads.
  But I guess if the numbers are convincing, this can be mergeable.

 Even if you have a single multiqueue device this change allows you
 to create one vhost thread for all the queues, one vhost thread per
 queue or any other combination. I guess that depending on the workload
 and depending on the system utilization (free cycles/cores, density)
 you would prefer
 to use one or more vhost threads.

 That is already controllable from the guest though, which likely has a better
 idea about the workload.

but the guest has no idea about what's going on in the host system
(e.g. other VMs' I/O, cpu utilization of the host cores...)


 
 
  2. Creation of vhost threads
 
  We suggested two ways of controlling the creation and removal of vhost
  threads:
  - statically determining the maximum number of virtio devices per worker
  via a kernel module parameter
  - dynamically: Sysfs mechanism to add and remove vhost threads
 
  It seems that it would be simplest to take the static approach as
  a first stage. At a second stage (next patch), we'll advance to
  dynamically
  changing the number of vhost threads, using the static module parameter
  only as a default value.
 
  I'm not sure how independent this is from 1.
  With respect to the upstream kernel,
  Introducing interfaces (which we'll have to maintain
  forever) just for the sake of guests with
  multiple virtual NIC cards does not look like a good tradeoff.
 
  So I'm unlikely to merge this upstream without making it useful cross-VM,
  and yes this means isolation and accounting with cgroups need to
  work properly.

 Agree, but even if you use a single multiqueue device having the
 ability to use 1 thread to serve all the queues or multiple threads to
 serve all the queues looks like a useful feature.

 Could be.  At the moment, multiqueue is off by default because it causes
 regressions for some workloads as compared to a single queue.
 If we have heuristics in vhost that fix this by auto-tuning threading, that
 would be nice.  But if you need to tune it manually anyway,
 then from upstream perspective it does not seem to be worth it - you can just
 turn multiqueue on/off in the guest.

I see. But we are again mixing the policy and the mechanism.
We first need a mechanism to control the system and then we need to
implement the policy that orchestrates it (whether it is implemented
in the kernel as part of vhost or outside in user-space).
I don't see why we should wait for a policy before upstreaming the
mechanism. If we upstream the mechanism in a manner where the defaults
do not affect today's vhost behavior, then it will be possible to play
with the policies and upstream them later.



 
  Regarding cwmq, it is an interesting mechanism, which we need to explore
  further.
  At the moment we prefer not to change the vhost model to use cwmq, as some
  of the issues that were discussed, such as cgroups, are not supported by
  cwmq, and this is adding more complexity.
  However, we'll look further into it, and consider it at a later stage.
 
  Hmm that's still 

Re: Updated Elvis Upstreaming Roadmap

2013-12-19 Thread Michael S. Tsirkin
On Thu, Dec 19, 2013 at 12:36:30PM +0200, Abel Gordon wrote:
 On Thu, Dec 19, 2013 at 12:13 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Thu, Dec 19, 2013 at 08:40:44AM +0200, Abel Gordon wrote:
  On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin m...@redhat.com 
  wrote:
   On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
   Hi,
  
   Thank you all for your comments.
   I'm sorry for taking this long to reply, I was away on vacation..
  
   It was a good, long discussion, many issues were raised, which we'd like
   to address with the following proposed roadmap for Elvis patches.
   In general, we believe it would be best to start with patches that are
   as simple as possible, providing the basic Elvis functionality,
   and attend to the more complicated issues in subsequent patches.
  
   Here's the road map for Elvis patches:
  
   Thanks for the follow up. Some suggestions below.
   Please note the suggestions below merely represent
   thoughts on merging upstream.
   If as the first step you are content with keeping this
   work as out of tree patches, in order to have
   the freedom to experiment with interfaces and
   performance, please feel free to ignore them.
  
   1. Shared vhost thread for multiple devices.
  
   The way to go here, we believe, is to start with a patch having a shared
   vhost thread for multiple devices of the SAME vm.
   The next step/patch may be handling vms belonging to the same cgroup.
  
   Finally, we need to extend the functionality so that the shared vhost
   thread
   serves multiple vms (not necessarily belonging to the same cgroup).
  
   There was a lot of discussion about the way to address the enforcement
   of cgroup policies, and we will consider the various solutions with a
   future
   patch.
  
   With respect to the upstream kernel,
   I'm not sure a bunch of changes just for the sake of guests with
   multiple virtual NIC cards makes sense.
   And I wonder how this step, in isolation, will affect e.g.
   multiqueue workloads.
   But I guess if the numbers are convincing, this can be mergeable.
 
  Even if you have a single multiqueue device this change allows you
  to create one vhost thread for all the queues, one vhost thread per
  queue or any other combination. I guess that depending on the workload
  and depending on the system utilization (free cycles/cores, density)
  you would prefer
  to use one or more vhost threads.
 
  That is already controllable from the guest though, which likely has a 
  better
  idea about the workload.
 
 but the guest has no idea about what's going on in the host system
 (e.g. other VMs I/O, cpu utilization of the host cores...)

But again, you want to do things per VM now so you will have no idea
about other VMs, right? Host cpu utilization could be a useful input
for some heuristics, I agree, but nothing prevents us from sending
this info to guest agent and controlling multiqueue based on that
(kind of like balloon).

 
  
  
   2. Creation of vhost threads
  
   We suggested two ways of controlling the creation and removal of vhost
   threads:
   - statically determining the maximum number of virtio devices per worker
   via a kernel module parameter
   - dynamically: Sysfs mechanism to add and remove vhost threads
  
   It seems that it would be simplest to take the static approach as
   a first stage. At a second stage (next patch), we'll advance to
   dynamically
   changing the number of vhost threads, using the static module parameter
   only as a default value.
  
   I'm not sure how independent this is from 1.
   With respect to the upstream kernel,
   Introducing interfaces (which we'll have to maintain
   forever) just for the sake of guests with
   multiple virtual NIC cards does not look like a good tradeoff.
  
   So I'm unlikely to merge this upstream without making it useful cross-VM,
   and yes this means isolation and accounting with cgroups need to
   work properly.
 
  Agree, but even if you use a single multiqueue device having the
  ability to use 1 thread to serve all the queues or multiple threads to
  serve all the queues looks like a useful feature.
 
  Could be.  At the moment, multiqueue is off by default because it causes
  regressions for some workloads as compared to a single queue.
  If we have heuristics in vhost that fix this by auto-tuning threading, that
  would be nice.  But if you need to tune it manually anyway,
  then from upstream perspective it does not seem to be worth it - you can 
  just
  turn multiqueue on/off in the guest.
 
 I see. But we are mixing again between the policy and the mechanism.
 We first need a mechanism to control the system and then we need to
 implement the policy to orchestrate it (whenever it will be
 implemented in the kernel as part of vhost or outside in user-space).
 I don't see why to wait to have a policy to upstream the mechanism. If
 we upstream the mechanism in a manner that the defaults do not affect
 today's vhost 

Re: Updated Elvis Upstreaming Roadmap

2013-12-19 Thread Abel Gordon
On Thu, Dec 19, 2013 at 1:37 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Dec 19, 2013 at 12:36:30PM +0200, Abel Gordon wrote:
 On Thu, Dec 19, 2013 at 12:13 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Thu, Dec 19, 2013 at 08:40:44AM +0200, Abel Gordon wrote:
  On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin m...@redhat.com 
  wrote:
   On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
   Hi,
  
   Thank you all for your comments.
   I'm sorry for taking this long to reply, I was away on vacation..
  
   It was a good, long discussion, many issues were raised, which we'd 
   like
   to address with the following proposed roadmap for Elvis patches.
   In general, we believe it would be best to start with patches that are
   as simple as possible, providing the basic Elvis functionality,
   and attend to the more complicated issues in subsequent patches.
  
   Here's the road map for Elvis patches:
  
   Thanks for the follow up. Some suggestions below.
   Please note the suggestions below merely represent
   thoughts on merging upstream.
   If as the first step you are content with keeping this
   work as out of tree patches, in order to have
   the freedom to experiment with interfaces and
   performance, please feel free to ignore them.
  
   1. Shared vhost thread for multiple devices.
  
   The way to go here, we believe, is to start with a patch having a 
   shared
   vhost thread for multiple devices of the SAME vm.
   The next step/patch may be handling vms belonging to the same cgroup.
  
   Finally, we need to extend the functionality so that the shared vhost
   thread
   serves multiple vms (not necessarily belonging to the same cgroup).
  
   There was a lot of discussion about the way to address the enforcement
   of cgroup policies, and we will consider the various solutions with a
   future
   patch.
  
   With respect to the upstream kernel,
   I'm not sure a bunch of changes just for the sake of guests with
   multiple virtual NIC cards makes sense.
   And I wonder how this step, in isolation, will affect e.g.
   multiqueue workloads.
   But I guess if the numbers are convincing, this can be mergeable.
 
  Even if you have a single multiqueue device this change allows you
  to create one vhost thread for all the queues, one vhost thread per
  queue or any other combination. I guess that depending on the workload
  and depending on the system utilization (free cycles/cores, density)
  you would prefer
  to use one or more vhost threads.
 
  That is already controllable from the guest though, which likely has a 
  better
  idea about the workload.

 but the guest has no idea about what's going on in the host system
 (e.g. other VMs I/O, cpu utilization of the host cores...)

 But again, you want to do things per VM now so you will have no idea
 about other VMs, right? Host cpu utilization could be a useful input

Razya shared a roadmap. The first step was to support sharing a thread
for a single VM, but the goal is to later extend the mechanism to
support multiple VMs and cgroups.

 for some heuristics, I agree, but nothing prevents us from sending
 this info to guest agent and controlling multiqueue based on that
 (kind of like balloon).

IMHO, we should never share host internal information (e.g. resource
utilization) with the guest. That's supposed to be confidential
information :)
The balloon is a bit different... kvm asks the guest OS to give (if
possible) some pages, but kvm never sends the balloon information
about the memory utilization of the host. If the guest wishes to send
information about its own memory consumption (like it does for MOM),
that's OK.

So, the guest can share information with the host but the host should
be the one to make the decisions. KVM should never share host
information with the guest.


 
  
  
   2. Creation of vhost threads
  
   We suggested two ways of controlling the creation and removal of vhost
   threads:
   - statically determining the maximum number of virtio devices per 
   worker
   via a kernel module parameter
   - dynamically: Sysfs mechanism to add and remove vhost threads
  
   It seems that it would be simplest to take the static approach as
   a first stage. At a second stage (next patch), we'll advance to
   dynamically
   changing the number of vhost threads, using the static module parameter
   only as a default value.
  
   I'm not sure how independent this is from 1.
   With respect to the upstream kernel,
   Introducing interfaces (which we'll have to maintain
   forever) just for the sake of guests with
   multiple virtual NIC cards does not look like a good tradeoff.
  
   So I'm unlikely to merge this upstream without making it useful 
   cross-VM,
   and yes this means isolation and accounting with cgroups need to
   work properly.
 
  Agree, but even if you use a single multiqueue device having the
  ability to use 1 thread to serve all the queues or multiple threads to
  serve all the 

Re: Updated Elvis Upstreaming Roadmap

2013-12-19 Thread Michael S. Tsirkin
On Thu, Dec 19, 2013 at 02:56:10PM +0200, Abel Gordon wrote:
 On Thu, Dec 19, 2013 at 1:37 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Thu, Dec 19, 2013 at 12:36:30PM +0200, Abel Gordon wrote:
  On Thu, Dec 19, 2013 at 12:13 PM, Michael S. Tsirkin m...@redhat.com 
  wrote:
   On Thu, Dec 19, 2013 at 08:40:44AM +0200, Abel Gordon wrote:
   On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin m...@redhat.com 
   wrote:
On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
Hi,
   
Thank you all for your comments.
I'm sorry for taking this long to reply, I was away on vacation..
   
It was a good, long discussion, many issues were raised, which we'd 
like
to address with the following proposed roadmap for Elvis patches.
In general, we believe it would be best to start with patches that 
are
as simple as possible, providing the basic Elvis functionality,
and attend to the more complicated issues in subsequent patches.
   
Here's the road map for Elvis patches:
   
Thanks for the follow up. Some suggestions below.
Please note the suggestions below merely represent
thoughts on merging upstream.
If as the first step you are content with keeping this
work as out of tree patches, in order to have
the freedom to experiment with interfaces and
performance, please feel free to ignore them.
   
1. Shared vhost thread for multiple devices.
   
The way to go here, we believe, is to start with a patch having a 
shared
vhost thread for multiple devices of the SAME vm.
The next step/patch may be handling vms belonging to the same cgroup.
   
Finally, we need to extend the functionality so that the shared vhost
thread
serves multiple vms (not necessarily belonging to the same cgroup).
   
There was a lot of discussion about the way to address the 
enforcement
of cgroup policies, and we will consider the various solutions with a
future
patch.
   
With respect to the upstream kernel,
I'm not sure a bunch of changes just for the sake of guests with
multiple virtual NIC cards makes sense.
And I wonder how this step, in isolation, will affect e.g.
multiqueue workloads.
But I guess if the numbers are convincing, this can be mergeable.
  
   Even if you have a single multiqueue device this change allows you
   to create one vhost thread for all the queues, one vhost thread per
   queue or any other combination. I guess that depending on the workload
   and depending on the system utilization (free cycles/cores, density)
   you would prefer
   to use one or more vhost threads.
  
   That is already controllable from the guest though, which likely has a 
   better
   idea about the workload.
 
  but the guest has no idea about what's going on in the host system
  (e.g. other VMs I/O, cpu utilization of the host cores...)
 
  But again, you want to do things per VM now so you will have no idea
  about other VMs, right? Host cpu utilization could be a useful input
 
 Razya shared a roadmap. The first step was to support sharing a thread
 for a single VM but the goal is to later on extend the mechanism to
 support multiple VMs and cgroups

Yes, I got that. What I'm not sure of is whether this is just a
development roadmap, or do you expect to be able to merge things
upstream in this order as well.
If the latter, all I'm saying is that I think you are doing this in the wrong
order: we'll likely have to merge first 4 then 3 then possibly 1+2 together -
but maybe 1+2 will have to wait until cgroups are sorted out.
That's just a hunch of course until you actually try to do it.
If the former, most of my comments don't really apply.

  for some heuristics, I agree, but nothing prevents us from sending
  this info to guest agent and controlling multiqueue based on that
  (kind of like balloon).
 
 IMHO, we should never share host internal information (e.g. resource
 utilization) with the guest. That's supposed to be confidential
 information :)
 The balloon is a bit different... kvm asks the guest OS to give  (if
 possible) some pages but  kvm never sends to the balloon information
 about the memory utilization of the host. If the guest wishes to send
 information about its own memory consumption (like it does for MOM),
 that's OK.
 
 So, the guest can share information with the host but the host should
 be the one to make the decisions. KVM should never share host
 information with the guest.

It's also easy to just tell guest agent to turn multiqueue on/off if you have
a mind to.


 
  
   
   
2. Creation of vhost threads
   
We suggested two ways of controlling the creation and removal of 
vhost
threads:
- statically determining the maximum number of virtio devices per 
worker
via a kernel module parameter
- dynamically: Sysfs mechanism to add and remove vhost threads
   
It seems that it would be simplest to take the static approach as
a 

Re: Updated Elvis Upstreaming Roadmap

2013-12-19 Thread Abel Gordon
On Thu, Dec 19, 2013 at 3:48 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Dec 19, 2013 at 02:56:10PM +0200, Abel Gordon wrote:
 On Thu, Dec 19, 2013 at 1:37 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Thu, Dec 19, 2013 at 12:36:30PM +0200, Abel Gordon wrote:
  On Thu, Dec 19, 2013 at 12:13 PM, Michael S. Tsirkin m...@redhat.com 
  wrote:
   On Thu, Dec 19, 2013 at 08:40:44AM +0200, Abel Gordon wrote:
   On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin m...@redhat.com 
   wrote:
On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
Hi,
   
Thank you all for your comments.
I'm sorry for taking this long to reply, I was away on vacation..
   
It was a good, long discussion, many issues were raised, which we'd 
like
to address with the following proposed roadmap for Elvis patches.
In general, we believe it would be best to start with patches that 
are
as simple as possible, providing the basic Elvis functionality,
and attend to the more complicated issues in subsequent patches.
   
Here's the road map for Elvis patches:
   
Thanks for the follow up. Some suggestions below.
Please note the suggestions below merely represent
thoughts on merging upstream.
If as the first step you are content with keeping this
work as out of tree patches, in order to have
the freedom to experiment with interfaces and
performance, please feel free to ignore them.
   
1. Shared vhost thread for multiple devices.
   
The way to go here, we believe, is to start with a patch having a 
shared
vhost thread for multiple devices of the SAME vm.
The next step/patch may be handling vms belonging to the same 
cgroup.
   
Finally, we need to extend the functionality so that the shared 
vhost
thread
serves multiple vms (not necessarily belonging to the same cgroup).
   
There was a lot of discussion about the way to address the 
enforcement
of cgroup policies, and we will consider the various solutions with 
a
future
patch.
   
With respect to the upstream kernel,
I'm not sure a bunch of changes just for the sake of guests with
multiple virtual NIC cards makes sense.
And I wonder how this step, in isolation, will affect e.g.
multiqueue workloads.
But I guess if the numbers are convincing, this can be mergeable.
  
   Even if you have a single multiqueue device this change allows you
   to create one vhost thread for all the queues, one vhost thread per
   queue or any other combination. I guess that depending on the workload
   and depending on the system utilization (free cycles/cores, density)
   you would prefer
   to use one or more vhost threads.
  
   That is already controllable from the guest though, which likely has a 
   better
   idea about the workload.
 
  but the guest has no idea about what's going on in the host system
  (e.g. other VMs I/O, cpu utilization of the host cores...)
 
  But again, you want to do things per VM now so you will have no idea
  about other VMs, right? Host cpu utilization could be a useful input

 Razya shared a roadmap. The first step was to support sharing a thread
 for a single VM but the goal is to later on extend the mechanism to
 support multiple VMs and cgroups

 Yes, I got that. What I'm not sure of is whether this is just a
 development roadmap, or do you expect to be able to merge things
 upstream in this order as well.
 If the latter, all I'm saying is that I think you are doing this in the wrong
 order: we'll likely have to merge first 4 then 3 then possibly 1+2 together -
 but maybe 1+2 will have to wait until cgroups are sorted out.
 That's just a hunch of course until you actually try to do it.
 If the former, most of my comments don't really apply.

  for some heuristics, I agree, but nothing prevents us from sending
  this info to guest agent and controlling multiqueue based on that
  (kind of like balloon).

 IMHO, we should never share host internal information (e.g. resource
 utilization) with the guest. That's supposed to be confidential
 information :)
 The balloon is a bit different... kvm asks the guest OS to give  (if
 possible) some pages but  kvm never sends to the balloon information
 about the memory utilization of the host. If the guest wishes to send
 information about its own memory consumption (like it does for MOM),
 that's OK.

 So, the guest can share information with the host but the host should
 be the one to make the decisions. KVM should never share host
 information with the guest.

 It's also easy to just tell guest agent to turn multiqueue on/off if you have
 a mind to.


 
  
   
   
2. Creation of vhost threads
   
We suggested two ways of controlling the creation and removal of 
vhost
threads:
- statically determining the maximum number of virtio devices per 
worker
via a kernel module parameter
- dynamically: Sysfs mechanism to add and 

Re: Updated Elvis Upstreaming Roadmap

2013-12-19 Thread Michael S. Tsirkin
On Thu, Dec 19, 2013 at 04:19:47PM +0200, Abel Gordon wrote:
 On Thu, Dec 19, 2013 at 3:48 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Thu, Dec 19, 2013 at 02:56:10PM +0200, Abel Gordon wrote:
  On Thu, Dec 19, 2013 at 1:37 PM, Michael S. Tsirkin m...@redhat.com 
  wrote:
   On Thu, Dec 19, 2013 at 12:36:30PM +0200, Abel Gordon wrote:
   On Thu, Dec 19, 2013 at 12:13 PM, Michael S. Tsirkin m...@redhat.com 
   wrote:
On Thu, Dec 19, 2013 at 08:40:44AM +0200, Abel Gordon wrote:
On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin 
m...@redhat.com wrote:
 On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
 Hi,

 Thank you all for your comments.
 I'm sorry for taking this long to reply, I was away on vacation..

 It was a good, long discussion, many issues were raised, which 
 we'd like
 to address with the following proposed roadmap for Elvis patches.
 In general, we believe it would be best to start with patches 
 that are
 as simple as possible, providing the basic Elvis functionality,
 and attend to the more complicated issues in subsequent patches.

 Here's the road map for Elvis patches:

 Thanks for the follow up. Some suggestions below.
 Please note the suggestions below merely represent
 thoughts on merging upstream.
 If as the first step you are content with keeping this
 work as out of tree patches, in order to have
 the freedom to experiment with interfaces and
 performance, please feel free to ignore them.

 1. Shared vhost thread for multiple devices.

 The way to go here, we believe, is to start with a patch having a 
 shared
 vhost thread for multiple devices of the SAME vm.
 The next step/patch may be handling vms belonging to the same 
 cgroup.

 Finally, we need to extend the functionality so that the shared 
 vhost
 thread
 serves multiple vms (not necessarily belonging to the same 
 cgroup).

 There was a lot of discussion about the way to address the 
 enforcement
 of cgroup policies, and we will consider the various solutions 
 with a
 future
 patch.

 With respect to the upstream kernel,
 I'm not sure a bunch of changes just for the sake of guests with
 multiple virtual NIC cards makes sense.
 And I wonder how this step, in isolation, will affect e.g.
 multiqueue workloads.
 But I guess if the numbers are convincing, this can be mergeable.
   
Even if you have a single multiqueue device this change allows you
to create one vhost thread for all the queues, one vhost thread per
queue or any other combination. I guess that depending on the 
workload
and depending on the system utilization (free cycles/cores, density)
you would prefer
to use one or more vhost threads.
   
That is already controllable from the guest though, which likely has 
a better
idea about the workload.
  
   but the guest has no idea about what's going on in the host system
   (e.g. other VMs I/O, cpu utilization of the host cores...)
  
   But again, you want to do things per VM now so you will have no idea
   about other VMs, right? Host cpu utilization could be a useful input
 
  Razya shared a roadmap. The first step was to support sharing a thread
  for a single VM but the goal is to later on extend the mechanism to
  support multiple VMs and cgroups
 
  Yes, I got that. What I'm not sure of is whether this is just a
  development roadmap, or do you expect to be able to merge things
  upstream in this order as well.
  If the latter, all I'm saying is that I think you are doing this in the wrong
  order: we'll likely have to merge first 4 then 3 then possibly 1+2 together 
  -
  but maybe 1+2 will have to wait until cgroups are sorted out.
  That's just a hunch of course until you actually try to do it.
  If the former, most of my comments don't really apply.
 
   for some heuristics, I agree, but nothing prevents us from sending
   this info to guest agent and controlling multiqueue based on that
   (kind of like balloon).
 
  IMHO, we should never share host internal information (e.g. resource
  utilization) with the guest. That's supposed to be confidential
  information :)
  The balloon is a bit different... kvm asks the guest OS to give  (if
  possible) some pages but  kvm never sends to the balloon information
  about the memory utilization of the host. If the guest wishes to send
  information about its own memory consumption (like it does for MOM),
  that's OK.
 
  So, the guest can share information with the host but the host should
  be the one to make the decisions. KVM should never share host
  information with the guest.
 
  It's also easy to just tell guest agent to turn multiqueue on/off if you 
  have
  a mind to.
 
 
  
   


 2. Creation of vhost threads

 We suggested two ways of controlling the creation and 

Re: Updated Elvis Upstreaming Roadmap

2013-12-18 Thread Michael S. Tsirkin
On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
 Hi,
 
 Thank you all for your comments.
 I'm sorry for taking this long to reply, I was away on vacation..
 
 It was a good, long discussion, many issues were raised, which we'd like 
 to address with the following proposed roadmap for Elvis patches.
 In general, we believe it would be best to start with patches that are 
 as simple as possible, providing the basic Elvis functionality, 
 and attend to the more complicated issues in subsequent patches.
 
 Here's the road map for Elvis patches: 

Thanks for the follow up. Some suggestions below.
Please note the suggestions below merely represent
thoughts on merging upstream.
If as the first step you are content with keeping this
work as out of tree patches, in order to have
the freedom to experiment with interfaces and
performance, please feel free to ignore them.

 1. Shared vhost thread for multiple devices.
 
 The way to go here, we believe, is to start with a patch having a shared 
 vhost thread for multiple devices of the SAME vm.
 The next step/patch may be handling vms belonging to the same cgroup.
 
 Finally, we need to extend the functionality so that the shared vhost 
 thread 
 serves multiple vms (not necessarily belonging to the same cgroup).
 
 There was a lot of discussion about the way to address the enforcement 
 of cgroup policies, and we will consider the various solutions with a 
 future
 patch.

With respect to the upstream kernel,
I'm not sure a bunch of changes just for the sake of guests with
multiple virtual NIC cards makes sense.
And I wonder how this step, in isolation, will affect e.g.
multiqueue workloads.
But I guess if the numbers are convincing, this can be mergeable.

 
 2. Creation of vhost threads
 
 We suggested two ways of controlling the creation and removal of vhost
 threads: 
 - statically determining the maximum number of virtio devices per worker 
 via a kernel module parameter 
 - dynamically: Sysfs mechanism to add and remove vhost threads 
 
 It seems that it would be simplest to take the static approach as
 a first stage. At a second stage (next patch), we'll advance to 
 dynamically 
 changing the number of vhost threads, using the static module parameter 
 only as a default value. 

I'm not sure how independent this is from 1.
With respect to the upstream kernel,
Introducing interfaces (which we'll have to maintain
forever) just for the sake of guests with
multiple virtual NIC cards does not look like a good tradeoff.

So I'm unlikely to merge this upstream without making it useful cross-VM,
and yes this means isolation and accounting with cgroups need to
work properly.

 Regarding cwmq, it is an interesting mechanism, which we need to explore 
 further.
 At the moment we prefer not to change the vhost model to use cwmq, as some 
 of the issues that were discussed, such as cgroups, are not supported by 
 cwmq, and this is adding more complexity.
 However, we'll look further into it, and consider it at a later stage.

Hmm that's still assuming some smart management tool configuring
this correctly.  Can't this be determined automatically depending
on the workload?
This is what the cwmq suggestion was really about: detect
that we need more threads and spawn them.
It's less about sharing the implementation with workqueues -
would be very nice but not a must.



 3. Adding polling mode to vhost 
 
 It is a good idea making polling adaptive based on various factors such as 
 the I/O rate, the guest kick overhead (which is the tradeoff of polling), 
 or the amount of wasted cycles (cycles we kept polling but no new work was 
 added).
 However, as a beginning polling patch, we would prefer having a naive 
 polling approach, which could be tuned with later patches.
 

While any polling approach would still need a lot of testing to prove we
don't, for example, steal CPU from a guest which could be doing other
useful work, given that an exit costs at least 1.5K cycles, in theory it
seems like something that can improve performance.  I'm not sure how
naive we can be without introducing regressions for some workloads.
For example, if we are on the same host CPU, there's no
chance busy waiting will help us make progress.
How about detecting that the VCPU thread that kicked us
is currently running on another CPU, and only polling in
this case?
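
A rough sketch of that check, assuming the virtqueue remembers the task
of the vCPU thread that last kicked it (kick_task is a hypothetical
field; task_cpu(), smp_processor_id() and ->on_cpu are standard kernel
facilities):

#include <linux/sched.h>
#include <linux/smp.h>

/* Sketch only: busy-wait only if the kicking vCPU thread is currently
 * running on a different physical CPU, i.e. it can actually make
 * progress while we poll.
 */
static bool worth_polling(struct task_struct *kick_task)
{
        return kick_task && kick_task->on_cpu &&
               task_cpu(kick_task) != smp_processor_id();
}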

 4. vhost statistics 
 
 The issue that was raised for the vhost statistics was using ftrace 
 instead of the debugfs mechanism.
 However, looking further into the kvm stat mechanism, we learned that 
 ftrace didn't replace the plain debugfs mechanism, but was used in 
 addition to it.
  
 We propose to continue using debugfs for statistics, in a manner similar 
 to kvm,
 and at some point in the future ftrace can be added to vhost as well.

IMHO, while kvm_stat is a useful script, the best tool
for perf stats is still perf, so I would try to integrate with that.
How it works internally is less important.
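
For example, a statistic exported as an ftrace tracepoint is
immediately usable from perf. A hypothetical vhost event could look
like the sketch below (no such tracepoint exists in vhost today, and
the usual trace-header boilerplate - TRACE_SYSTEM, CREATE_TRACE_POINTS
and the include guards - is omitted):

/* Hypothetical vhost tracepoint; once defined it can be consumed with
 * e.g. "perf record -e vhost:vhost_work_queued" without any debugfs
 * code.
 */
TRACE_EVENT(vhost_work_queued,
        TP_PROTO(unsigned int vq),
        TP_ARGS(vq),
        TP_STRUCT__entry(
                __field(unsigned int, vq)
        ),
        TP_fast_assign(
                __entry->vq = vq;
        ),
        TP_printk("vq=%u", __entry->vq)
);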

 Does 

Re: Updated Elvis Upstreaming Roadmap

2013-12-18 Thread Abel Gordon
On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
 Hi,

 Thank you all for your comments.
 I'm sorry for taking this long to reply, I was away on vacation..

 It was a good, long discussion, many issues were raised, which we'd like
 to address with the following proposed roadmap for Elvis patches.
 In general, we believe it would be best to start with patches that are
 as simple as possible, providing the basic Elvis functionality,
 and attend to the more complicated issues in subsequent patches.

 Here's the road map for Elvis patches:

 Thanks for the follow up. Some suggestions below.
 Please note the suggestions below merely represent
 thoughts on merging upstream.
 If as the first step you are content with keeping this
 work as out of tree patches, in order to have
 the freedom to experiment with interfaces and
 performance, please feel free to ignore them.

 1. Shared vhost thread for multiple devices.

 The way to go here, we believe, is to start with a patch having a shared
 vhost thread for multiple devices of the SAME vm.
 The next step/patch may be handling vms belonging to the same cgroup.

 Finally, we need to extend the functionality so that the shared vhost
 thread
 serves multiple vms (not necessarily belonging to the same cgroup).

 There was a lot of discussion about the way to address the enforcement
 of cgroup policies, and we will consider the various solutions with a
 future
 patch.

 With respect to the upstream kernel,
 I'm not sure a bunch of changes just for the sake of guests with
 multiple virtual NIC cards makes sense.
 And I wonder how this step, in isolation, will affect e.g.
 multiqueue workloads.
 But I guess if the numbers are convincing, this can be mergeable.

Even if you have a single multiqueue device this change allows you
to create one vhost thread for all the queues, one vhost thread per
queue or any other combination. I guess that depending on the workload
and depending on the system utilization (free cycles/cores, density)
you would prefer
to use one or more vhost threads.



 2. Creation of vhost threads

 We suggested two ways of controlling the creation and removal of vhost
 threads:
 - statically determining the maximum number of virtio devices per worker
 via a kernel module parameter
 - dynamically: Sysfs mechanism to add and remove vhost threads

 It seems that it would be simplest to take the static approach as
 a first stage. At a second stage (next patch), we'll advance to
 dynamically
 changing the number of vhost threads, using the static module parameter
 only as a default value.

 I'm not sure how independent this is from 1.
 With respect to the upstream kernel,
 Introducing interfaces (which we'll have to maintain
 forever) just for the sake of guests with
 multiple virtual NIC cards does not look like a good tradeoff.

 So I'm unlikely to merge this upstream without making it useful cross-VM,
 and yes this means isolation and accounting with cgroups need to
 work properly.

Agree, but even if you use a single multiqueue device having the
ability to use 1 thread to serve all the queues or multiple threads to
serve all the queues looks like a useful feature.


 Regarding cwmq, it is an interesting mechanism, which we need to explore
 further.
 At the moment we prefer not to change the vhost model to use cwmq, as some
 of the issues that were discussed, such as cgroups, are not supported by
 cwmq, and this is adding more complexity.
 However, we'll look further into it, and consider it at a later stage.

 Hmm that's still assuming some smart management tool configuring
 this correctly.  Can't this be determined automatically depending
 on the workload?
 This is what the cwmq suggestion was really about: detect
 that we need more threads and spawn them.
 It's less about sharing the implementation with workqueues -
 would be very nice but not a must.

But how can cwmq consider cgroup accounting?
In any case, IMHO, the kernel should first provide the mechanism so
later on a user-space management application (the policy) can
orchestrate it.



 3. Adding polling mode to vhost

 It is a good idea making polling adaptive based on various factors such as
 the I/O rate, the guest kick overhead (which is the tradeoff of polling),
 or the amount of wasted cycles (cycles we kept polling but no new work was
 added).
 However, as a beginning polling patch, we would prefer having a naive
 polling approach, which could be tuned with later patches.


 While any polling approach would still need a lot of testing to prove we
 don't, for example, steal CPU from a guest which could be doing other
 useful work, given that an exit costs at least 1.5K cycles, in theory it
 seems like something that can improve performance.  I'm not sure how
 naive we can be without introducing regressions for some workloads.
 For example, if we are on the same host CPU, there's no
 chance busy waiting 

Updated Elvis Upstreaming Roadmap

2013-12-17 Thread Razya Ladelsky
Hi,

Thank you all for your comments.
I'm sorry for taking this long to reply, I was away on vacation.

It was a good, long discussion, many issues were raised, which we'd like 
to address with the following proposed roadmap for Elvis patches.
In general, we believe it would be best to start with patches that are 
as simple as possible, providing the basic Elvis functionality, 
and attend to the more complicated issues in subsequent patches.

Here's the road map for Elvis patches: 

1. Shared vhost thread for multiple devices.

The way to go here, we believe, is to start with a patch having a shared 
vhost thread for multiple devices of the SAME vm.
The next step/patch may be handling vms belonging to the same cgroup.

Finally, we need to extend the functionality so that the shared vhost 
thread 
serves multiple vms (not necessarily belonging to the same cgroup).

There was a lot of discussion about the way to address the enforcement 
of cgroup policies, and we will consider the various solutions with a 
future
patch.
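
For reference, the data-structure direction this implies, in a very
simplified and hypothetical form (struct elvis_worker and its fields
are invented names; in today's vhost each device owns its own worker
thread, which is attached to the owner's cgroups when the device is
set up):

#include <linux/list.h>
#include <linux/sched.h>
#include <linux/spinlock.h>

/* Invented sketch: one worker thread shared by the virtqueues of
 * several vhost devices, instead of one worker per vhost_dev as today.
 */
struct elvis_worker {
        struct task_struct *task;       /* the shared vhost kernel thread */
        struct list_head work_list;     /* work items queued by all devices */
        spinlock_t work_lock;           /* protects work_list */
        int ndevs;                      /* number of devices currently served */
};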

2. Creation of vhost threads

We suggested two ways of controlling the creation and removal of vhost
threads: 
- statically: determining the maximum number of virtio devices per worker
via a kernel module parameter
- dynamically: a sysfs mechanism to add and remove vhost threads

It seems that it would be simplest to take the static approach as
a first stage. At a second stage (next patch), we'll advance to 
dynamically 
changing the number of vhost threads, using the static module parameter 
only as a default value. 
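
A minimal sketch of what the static knob could look like (the parameter
name and default are only examples):

#include <linux/module.h>

/* Example-only knob: upper bound on the number of virtio devices a
 * single vhost worker may serve.  A default of 1 keeps today's
 * one-thread-per-device behaviour.
 */
static unsigned int devs_per_worker = 1;
module_param(devs_per_worker, uint, 0444);
MODULE_PARM_DESC(devs_per_worker,
                 "Maximum number of virtio devices served by one vhost thread");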

Regarding cwmq, it is an interesting mechanism, which we need to explore 
further.
At the moment we prefer not to change the vhost model to use cwmq, as some 
of the issues that were discussed, such as cgroups, are not supported by 
cwmq, and this is adding more complexity.
However, we'll look further into it, and consider it at a later stage.

3. Adding polling mode to vhost 

It is a good idea making polling adaptive based on various factors such as 
the I/O rate, the guest kick overhead (which is the tradeoff of polling), 
or the amount of wasted cycles (cycles we kept polling but no new work was 
added).
However, as a beginning polling patch, we would prefer having a naive 
polling approach, which could be tuned with later patches.
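
For concreteness, the naive variant could be little more than a bounded
busy-loop before falling back to the usual kick path; the names and the
fixed budget below are illustrative only:

#include <asm/processor.h>      /* cpu_relax() */

#define POLL_BUDGET 1000        /* arbitrary, fixed number of iterations */

static bool vhost_vq_has_work(struct vhost_virtqueue *vq); /* hypothetical */

/* Illustrative naive polling loop: check the virtqueue for new work for
 * a fixed number of iterations before giving up and waiting for a kick.
 */
static bool vhost_naive_poll(struct vhost_virtqueue *vq)
{
        int i;

        for (i = 0; i < POLL_BUDGET; i++) {
                if (vhost_vq_has_work(vq))
                        return true;    /* work found while polling */
                cpu_relax();
        }
        return false;                   /* nothing arrived, re-enable kicks */
}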

4. vhost statistics 

The issue that was raised for the vhost statistics was using ftrace 
instead of the debugfs mechanism.
However, looking further into the kvm stat mechanism, we learned that 
ftrace didn't replace the plain debugfs mechanism, but was used in 
addition to it.
 
We propose to continue using debugfs for statistics, in a manner similar 
to kvm,
and at some point in the future ftrace can be added to vhost as well.
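
As a concrete example of that direction, plain debugfs counters can be
exported with the standard helpers, much like kvm's entries under
debugfs (the directory and counter names below are made up):

#include <linux/debugfs.h>
#include <linux/init.h>
#include <linux/types.h>

/* Made-up example: expose a single vhost counter under
 * /sys/kernel/debug/vhost/kicks, similar in spirit to kvm's debugfs
 * statistics.
 */
static u64 vhost_kicks;
static struct dentry *vhost_debugfs_dir;

static int __init vhost_stats_init(void)
{
        vhost_debugfs_dir = debugfs_create_dir("vhost", NULL);
        if (!vhost_debugfs_dir)
                return -ENOMEM;
        debugfs_create_u64("kicks", 0444, vhost_debugfs_dir, &vhost_kicks);
        return 0;
}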
 
Does this plan look o.k.?
If there are no further comments, I'll start preparing the patches 
according to what we've agreed on thus far.
Thank you,
Razya
