Re: Elvis upstreaming plan

2013-12-02 Thread Stefan Hajnoczi
On Thu, Nov 28, 2013 at 09:31:50AM +0200, Abel Gordon wrote:
 
 
 Stefan Hajnoczi stefa...@gmail.com wrote on 27/11/2013 05:00:53 PM:
 
  On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
   Hi,
  
   Razya is out for a few days, so I will try to answer the questions as
 well
   as I can:
  
   Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:
  
From: Michael S. Tsirkin m...@redhat.com
To: Abel Gordon/Haifa/IBM@IBMIL,
Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com,
as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
Haifa/IBM@IBMIL
Date: 27/11/2013 01:08 AM
Subject: Re: Elvis upstreaming plan
   
On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:


 Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
 08:05:00
   PM:

 
  Razya Ladelsky ra...@il.ibm.com writes:
 
   edit

 That's why we are proposing to implement a mechanism that will
 enable
 the management stack to configure 1 thread per I/O device (as it is
   today)
 or 1 thread for many I/O devices (belonging to the same VM).

  Once you are scheduling multiple guests in a single vhost device,
 you
  now create a whole new class of DoS attacks in the best case
   scenario.

 Again, we are NOT proposing to schedule multiple guests in a single
 vhost thread. We are proposing to schedule multiple devices
 belonging
 to the same guest in a single (or multiple) vhost thread/s.

   
I guess a question then becomes why have multiple devices?
  
   If you mean why serve multiple devices from a single thread the
 answer is
   that we cannot rely on the Linux scheduler which has no knowledge of
 I/O
   queues to do a decent job of scheduling I/O.  The idea is to take over
 the
   I/O scheduling responsibilities from the kernel's thread scheduler with
 a
   more efficient I/O scheduler inside each vhost thread.  So by combining
 all
   of the I/O devices from the same guest (disks, network cards, etc) in a
   single I/O thread, it allows us to provide better scheduling by giving
 us
   more knowledge of the nature of the work.  So now instead of relying on
 the
   linux scheduler to perform context switches between multiple vhost
 threads,
   we have a single thread context in which we can do the I/O scheduling
 more
   efficiently.  We can closely monitor the performance needs of each
 queue of
   each device inside the vhost thread which gives us much more
 information
   than relying on the kernel's thread scheduler.
 
  And now there are 2 performance-critical pieces that need to be
  optimized/tuned instead of just 1:
 
  1. Kernel infrastructure that QEMU and vhost use today but you decided
  to bypass.
 
 We are NOT bypassing existing components. We are just changing the
 threading
 model: instead of having one vhost-thread per virtio device, we propose to
 use
 1 vhost thread to server devices belonging to the same VM. In addition, we
 propose to add new features such as polling.

What I meant with bypassing is that reducing scope to single VMs
leaves multi-VM performance unchanged.  I know the original aim was to
improve multi-VM performance too and I hope that will be possible by
extending the current approach.

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-28 Thread Abel Gordon


Anthony Liguori anth...@codemonkey.ws wrote on 28/11/2013 12:33:36 AM:

 From: Anthony Liguori anth...@codemonkey.ws
 To: Abel Gordon/Haifa/IBM@IBMIL, Michael S. Tsirkin m...@redhat.com,
 Cc: abel.gor...@gmail.com, as...@redhat.com, digitale...@google.com,
 Eran Raichstein/Haifa/IBM@IBMIL, g...@redhat.com,
 jasow...@redhat.com, Joel Nider/Haifa/IBM@IBMIL,
 kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/Haifa/IBM@IBMIL
 Date: 28/11/2013 12:33 AM
 Subject: Re: Elvis upstreaming plan

 Abel Gordon ab...@il.ibm.com writes:

  Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:27:19 PM:
 
 
  On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
   Hi,
  
   Razya is out for a few days, so I will try to answer the questions
as
  well
   as I can:
  
   Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57
PM:
  
From: Michael S. Tsirkin m...@redhat.com
To: Abel Gordon/Haifa/IBM@IBMIL,
Cc: Anthony Liguori anth...@codemonkey.ws,
abel.gor...@gmail.com,
as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya
Ladelsky/
Haifa/IBM@IBMIL
Date: 27/11/2013 01:08 AM
Subject: Re: Elvis upstreaming plan
   
On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:


 Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
  08:05:00
   PM:

 
  Razya Ladelsky ra...@il.ibm.com writes:
 
   edit

 That's why we are proposing to implement a mechanism that will
  enable
 the management stack to configure 1 thread per I/O device (as it
is
   today)
 or 1 thread for many I/O devices (belonging to the same VM).

  Once you are scheduling multiple guests in a single vhost
device,
  you
  now create a whole new class of DoS attacks in the best case
   scenario.

 Again, we are NOT proposing to schedule multiple guests in a
single
 vhost thread. We are proposing to schedule multiple devices
  belonging
 to the same guest in a single (or multiple) vhost thread/s.

   
I guess a question then becomes why have multiple devices?
  
   If you mean why serve multiple devices from a single thread the
  answer is
   that we cannot rely on the Linux scheduler which has no knowledge of
  I/O
   queues to do a decent job of scheduling I/O.  The idea is to take
over
  the
   I/O scheduling responsibilities from the kernel's thread scheduler
with
  a
   more efficient I/O scheduler inside each vhost thread.  So by
combining
  all
   of the I/O devices from the same guest (disks, network cards, etc)
in a
   single I/O thread, it allows us to provide better scheduling by
giving
  us
   more knowledge of the nature of the work.  So now instead of relying
on
  the
   linux scheduler to perform context switches between multiple vhost
  threads,
   we have a single thread context in which we can do the I/O
scheduling
  more
   efficiently.  We can closely monitor the performance needs of each
  queue of
   each device inside the vhost thread which gives us much more
  information
   than relying on the kernel's thread scheduler.
   This does not expose any additional opportunities for attacks (DoS
or
   other) than are already available since all of the I/O traffic
belongs
  to a
   single guest.
   You can make the argument that with low I/O loads this mechanism may
  not
   make much difference.  However when you try to maximize the
utilization
  of
   your hardware (such as in a commercial scenario) this technique can
  gain
   you a large benefit.
  
   Regards,
  
   Joel Nider
   Virtualization Research
   IBM Research and Development
   Haifa Research Lab
 
  So all this would sound more convincing if we had sharing between VMs.
  When it's only a single VM it's somehow less convincing, isn't it?
  Of course if we would bypass a scheduler like this it becomes harder
to
  enforce cgroup limits.
 
  True, but here the issue becomes isolation/cgroups. We can start to
show
  the value for VMs that have multiple devices / queues and then we could
  re-consider extending the mechanism for multiple VMs (at least as a
  experimental feature).
 
  But it might be easier to give scheduler the info it needs to do what
we
  need.  Would an API that basically says run this kthread right now
  do the trick?
 
  ...do you really believe it would be possible to push this kind of
change
  to the Linux scheduler ? In addition, we need more than
  run this kthread right now because you need to monitor the virtio
  ring activity to specify when you will like to run a specific
kthread
  and for how long.

 Paul Turner has a proposal for exactly this:

 http://www.linuxplumbersconf.org/2013/ocw/sessions/1653

 The video is up on Youtube I think. It definitely is a general problem
 that is not at all virtual I/O specific.

Interesting, thanks for sharing. If you have a link

Re: Elvis upstreaming plan

2013-11-28 Thread Michael S. Tsirkin
On Thu, Nov 28, 2013 at 09:31:50AM +0200, Abel Gordon wrote:
 Isolation is important but the question is what isolation means ?

Mostly two things:
- Count resource usage against the correct cgroups,
  and limit it as appropriate
- If one user does something silly and is blocked,
  another user isn't affected


-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:

 On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
 
 
  Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00
PM:
 
  
   Razya Ladelsky ra...@il.ibm.com writes:
  
Hi all,
   
I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
developed Elvis, presented by Abel Gordon at the last KVM forum:
ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
ELVIS slides:
  https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
   
   
According to the discussions that took place at the forum,
upstreaming
some of the Elvis approaches seems to be a good idea, which we
would
  like
to pursue.
   
Our plan for the first patches is the following:
   
1.Shared vhost thread between mutiple devices
This patch creates a worker thread and worker queue shared across
  multiple
virtio devices
We would like to modify the patch posted in
https://github.com/abelg/virtual_io_acceleration/commit/
   3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
to limit a vhost thread to serve multiple devices only if they
belong
  to
the same VM as Paolo suggested to avoid isolation or cgroups
concerns.
   
Another modification is related to the creation and removal of
vhost
threads, which will be discussed next.
  
   I think this is an exceptionally bad idea.
  
   We shouldn't throw away isolation without exhausting every other
   possibility.
 
  Seems you have missed the important details here.
  Anthony, we are aware you are concerned about isolation
  and you believe we should not share a single vhost thread across
  multiple VMs.  That's why Razya proposed to change the patch
  so we will serve multiple virtio devices using a single vhost thread
  only if the devices belong to the same VM. This series of patches
  will not allow two different VMs to share the same vhost thread.
  So, I don't see why this will be throwing away isolation and why
  this could be a exceptionally bad idea.
 
  By the way, I remember that during the KVM forum a similar
  approach of having a single data plane thread for many devices
  was discussed
   We've seen very positive results from adding threads.  We should also
   look at scheduling.
 
  ...and we have also seen exceptionally negative results from
  adding threads, both for vhost and data-plane. If you have lot of idle
  time/cores
  then it makes sense to run multiple threads. But IMHO in many scenarios
you
  don't have lot of idle time/cores.. and if you have them you would
probably
  prefer to run more VMs/VCPUshosting a single SMP VM when you have
  enough physical cores to run all the VCPU threads and the I/O threads
is
  not a
  realistic scenario.
 
  That's why we are proposing to implement a mechanism that will enable
  the management stack to configure 1 thread per I/O device (as it is
today)
  or 1 thread for many I/O devices (belonging to the same VM).
 
   Once you are scheduling multiple guests in a single vhost device, you
   now create a whole new class of DoS attacks in the best case
scenario.
 
  Again, we are NOT proposing to schedule multiple guests in a single
  vhost thread. We are proposing to schedule multiple devices belonging
  to the same guest in a single (or multiple) vhost thread/s.
 

 I guess a question then becomes why have multiple devices?

I assume that there are guests that have multiple vhost devices
(net or scsi/tcm). We can also extend the approach to consider
multiqueue devices, so we can create 1 vhost thread shared for all the
queues,
1 vhost thread for each queue or a few threads for multiple queues. We
could also share a thread across multiple queues even if they do not belong
to the same device.

Remember the experiments Shirley Ma did with the split
tx/rx ? If we have a control interface we could support both
approaches: different threads or a single thread.



  
2. Sysfs mechanism to add and remove vhost threads
This patch allows us to add and remove vhost threads dynamically.
   
A simpler way to control the creation of vhost threads is
statically
determining the maximum number of virtio devices per worker via a
  kernel
module parameter (which is the way the previously mentioned patch
is
currently implemented)
   
I'd like to ask for advice here about the more preferable way to
go:
Although having the sysfs mechanism provides more flexibility, it
may
  be a
good idea to start with a simple static parameter, and have the
first
patches as simple as possible. What do you think?
   
3.Add virtqueue polling mode to vhost
Have the vhost thread poll the virtqueues with high I/O rate for
new
buffers , and avoid asking the guest to kick us.
https://github.com/abelg/virtual_io_acceleration/commit/
   26616133fafb7855cc80fac070b0572fd1aaf5d0
  
   Ack on this.
 
  :)
 
  Regards,
  Abel.
 
  
   Regards,
  
   Anthony 

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
 
 
 Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:
 
  On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
  
  
   Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00
 PM:
  
   
Razya Ladelsky ra...@il.ibm.com writes:
   
 Hi all,

 I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
 developed Elvis, presented by Abel Gordon at the last KVM forum:
 ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
 ELVIS slides:
   https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE


 According to the discussions that took place at the forum,
 upstreaming
 some of the Elvis approaches seems to be a good idea, which we
 would
   like
 to pursue.

 Our plan for the first patches is the following:

 1.Shared vhost thread between mutiple devices
 This patch creates a worker thread and worker queue shared across
   multiple
 virtio devices
 We would like to modify the patch posted in
 https://github.com/abelg/virtual_io_acceleration/commit/
3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
 to limit a vhost thread to serve multiple devices only if they
 belong
   to
 the same VM as Paolo suggested to avoid isolation or cgroups
 concerns.

 Another modification is related to the creation and removal of
 vhost
 threads, which will be discussed next.
   
I think this is an exceptionally bad idea.
   
We shouldn't throw away isolation without exhausting every other
possibility.
  
   Seems you have missed the important details here.
   Anthony, we are aware you are concerned about isolation
   and you believe we should not share a single vhost thread across
   multiple VMs.  That's why Razya proposed to change the patch
   so we will serve multiple virtio devices using a single vhost thread
   only if the devices belong to the same VM. This series of patches
   will not allow two different VMs to share the same vhost thread.
   So, I don't see why this will be throwing away isolation and why
   this could be a exceptionally bad idea.
  
   By the way, I remember that during the KVM forum a similar
   approach of having a single data plane thread for many devices
   was discussed
We've seen very positive results from adding threads.  We should also
look at scheduling.
  
   ...and we have also seen exceptionally negative results from
   adding threads, both for vhost and data-plane. If you have lot of idle
   time/cores
   then it makes sense to run multiple threads. But IMHO in many scenarios
 you
   don't have lot of idle time/cores.. and if you have them you would
 probably
   prefer to run more VMs/VCPUshosting a single SMP VM when you have
   enough physical cores to run all the VCPU threads and the I/O threads
 is
   not a
   realistic scenario.
  
   That's why we are proposing to implement a mechanism that will enable
   the management stack to configure 1 thread per I/O device (as it is
 today)
   or 1 thread for many I/O devices (belonging to the same VM).
  
Once you are scheduling multiple guests in a single vhost device, you
now create a whole new class of DoS attacks in the best case
 scenario.
  
   Again, we are NOT proposing to schedule multiple guests in a single
   vhost thread. We are proposing to schedule multiple devices belonging
   to the same guest in a single (or multiple) vhost thread/s.
  
 
  I guess a question then becomes why have multiple devices?
 
 I assume that there are guests that have multiple vhost devices
 (net or scsi/tcm).

These are kind of uncommon though.  In fact a kernel thread is not a
unit of isolation - cgroups supply isolation.
If we had use_cgroups kind of like use_mm, we could thinkably
do work for multiple VMs on the same thread.


 We can also extend the approach to consider
 multiqueue devices, so we can create 1 vhost thread shared for all the
 queues,
 1 vhost thread for each queue or a few threads for multiple queues. We
 could also share a thread across multiple queues even if they do not belong
 to the same device.
 
 Remember the experiments Shirley Ma did with the split
 tx/rx ? If we have a control interface we could support both
 approaches: different threads or a single thread.


I'm a bit concerned about interface managing specific
threads being so low level.
What exactly is it that management knows that makes it
efficient to group threads together?
That host is over-committed so we should use less CPU?
I'd like the interface to express that knowledge.


 
 
   
 2. Sysfs mechanism to add and remove vhost threads
 This patch allows us to add and remove vhost threads dynamically.

 A simpler way to control the creation of vhost threads is
 statically
 determining the maximum number of virtio devices per worker via a
   kernel
 module parameter (which is the way the 

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Gleb Natapov g...@redhat.com wrote on 27/11/2013 09:35:01 AM:

 On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
   4. vhost statistics
   This patch introduces a set of statistics to monitor different
 performance
   metrics of vhost and our polling and I/O scheduling mechanisms. The
   statistics are exposed using debugfs and can be easily displayed with
a
   Python script (vhost_stat, based on the old kvm_stats)
   https://github.com/abelg/virtual_io_acceleration/commit/
 ac14206ea56939ecc3608dc5f978b86fa322e7b0
 
  How about using trace points instead? Besides statistics, it can also
  help more in debugging.
 Definitely. kvm_stats has moved to ftrace long time ago.


We should use trace points for debugging information  but IMHO we should
have a dedicated (and different) mechanism to expose data that can be
easily consumed by a user-space (policy) application to control how many
vhost threads we need or any other vhost feature we may introduce
(e.g. polling). That's why we proposed something like vhost_stat
based on sysfs.

This is not like kvm_stat that can be replaced with tracepoints. Here
we will like to expose data to control the system. So I would
say what we are trying to do something that resembles the ksm interface
implemented under /sys/kernel/mm/ksm/

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Gleb Natapov
On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote:
 
 
 Gleb Natapov g...@redhat.com wrote on 27/11/2013 09:35:01 AM:
 
  On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
4. vhost statistics
This patch introduces a set of statistics to monitor different
  performance
metrics of vhost and our polling and I/O scheduling mechanisms. The
statistics are exposed using debugfs and can be easily displayed with
 a
Python script (vhost_stat, based on the old kvm_stats)
https://github.com/abelg/virtual_io_acceleration/commit/
  ac14206ea56939ecc3608dc5f978b86fa322e7b0
  
   How about using trace points instead? Besides statistics, it can also
   help more in debugging.
  Definitely. kvm_stats has moved to ftrace long time ago.
 
 
 We should use trace points for debugging information  but IMHO we should
 have a dedicated (and different) mechanism to expose data that can be
 easily consumed by a user-space (policy) application to control how many
 vhost threads we need or any other vhost feature we may introduce
 (e.g. polling). That's why we proposed something like vhost_stat
 based on sysfs.
 
 This is not like kvm_stat that can be replaced with tracepoints. Here
 we will like to expose data to control the system. So I would
 say what we are trying to do something that resembles the ksm interface
 implemented under /sys/kernel/mm/ksm/
There are control operation and there are performance/statistic
gathering operations use /sys for former and ftrace for later. The fact
that you need /sys interface for other things does not mean you can
abuse it for statistics too.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Gleb Natapov g...@redhat.com wrote on 27/11/2013 11:21:59 AM:


 On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote:
 
 
  Gleb Natapov g...@redhat.com wrote on 27/11/2013 09:35:01 AM:
 
   On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
 4. vhost statistics
 This patch introduces a set of statistics to monitor different
   performance
 metrics of vhost and our polling and I/O scheduling mechanisms.
The
 statistics are exposed using debugfs and can be easily displayed
with
  a
 Python script (vhost_stat, based on the old kvm_stats)
 https://github.com/abelg/virtual_io_acceleration/commit/
   ac14206ea56939ecc3608dc5f978b86fa322e7b0
   
How about using trace points instead? Besides statistics, it can
also
help more in debugging.
   Definitely. kvm_stats has moved to ftrace long time ago.
  
 
  We should use trace points for debugging information  but IMHO we
should
  have a dedicated (and different) mechanism to expose data that can be
  easily consumed by a user-space (policy) application to control how
many
  vhost threads we need or any other vhost feature we may introduce
  (e.g. polling). That's why we proposed something like vhost_stat
  based on sysfs.
 
  This is not like kvm_stat that can be replaced with tracepoints. Here
  we will like to expose data to control the system. So I would
  say what we are trying to do something that resembles the ksm interface
  implemented under /sys/kernel/mm/ksm/
 There are control operation and there are performance/statistic
 gathering operations use /sys for former and ftrace for later. The fact
 that you need /sys interface for other things does not mean you can
 abuse it for statistics too.

Agree. Any statistics that we add for debugging purposes should be
implemented
using tracepoints. But control and related data interfaces (that are not
for
debugging purposes) should be in sysfs. Look for example at
 /sys/kernel/mm/ksm/full_scans
 /sys/kernel/mm/ksm/pages_shared
 /sys/kernel/mm/ksm/pages_sharing
 /sys/kernel/mm/ksm/pages_to_scan
 /sys/kernel/mm/ksm/pages_unshared
 /sys/kernel/mm/ksm/pages_volatile


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Gleb Natapov
On Wed, Nov 27, 2013 at 11:33:19AM +0200, Abel Gordon wrote:
 
 
 Gleb Natapov g...@redhat.com wrote on 27/11/2013 11:21:59 AM:
 
 
  On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote:
  
  
   Gleb Natapov g...@redhat.com wrote on 27/11/2013 09:35:01 AM:
  
On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
  4. vhost statistics
  This patch introduces a set of statistics to monitor different
performance
  metrics of vhost and our polling and I/O scheduling mechanisms.
 The
  statistics are exposed using debugfs and can be easily displayed
 with
   a
  Python script (vhost_stat, based on the old kvm_stats)
  https://github.com/abelg/virtual_io_acceleration/commit/
ac14206ea56939ecc3608dc5f978b86fa322e7b0

 How about using trace points instead? Besides statistics, it can
 also
 help more in debugging.
Definitely. kvm_stats has moved to ftrace long time ago.
   
  
   We should use trace points for debugging information  but IMHO we
 should
   have a dedicated (and different) mechanism to expose data that can be
   easily consumed by a user-space (policy) application to control how
 many
   vhost threads we need or any other vhost feature we may introduce
   (e.g. polling). That's why we proposed something like vhost_stat
   based on sysfs.
  
   This is not like kvm_stat that can be replaced with tracepoints. Here
   we will like to expose data to control the system. So I would
   say what we are trying to do something that resembles the ksm interface
   implemented under /sys/kernel/mm/ksm/
  There are control operation and there are performance/statistic
  gathering operations use /sys for former and ftrace for later. The fact
  that you need /sys interface for other things does not mean you can
  abuse it for statistics too.
 
 Agree. Any statistics that we add for debugging purposes should be
 implemented
 using tracepoints. But control and related data interfaces (that are not
 for
 debugging purposes) should be in sysfs. Look for example at
Yes things that are not for statistics only and part of control interface
that management will use should not use ftrace (I do not think adding
more knobs is a good idea, but this is for vhost maintainer to decide),
but ksm predates ftrace, so some things below could have been implemented
as ftrace points.

  /sys/kernel/mm/ksm/full_scans
  /sys/kernel/mm/ksm/pages_shared
  /sys/kernel/mm/ksm/pages_sharing
  /sys/kernel/mm/ksm/pages_to_scan
  /sys/kernel/mm/ksm/pages_unshared
  /sys/kernel/mm/ksm/pages_volatile
 

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 11:21:00 AM:


 On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
 
 
  Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:
 
   On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
   
   
Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
08:05:00
  PM:
   

 Razya Ladelsky ra...@il.ibm.com writes:

  Hi all,
 
  I am Razya Ladelsky, I work at IBM Haifa virtualization team,
which
  developed Elvis, presented by Abel Gordon at the last KVM
forum:
  ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
  ELVIS slides:
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
 
 
  According to the discussions that took place at the forum,
  upstreaming
  some of the Elvis approaches seems to be a good idea, which we
  would
like
  to pursue.
 
  Our plan for the first patches is the following:
 
  1.Shared vhost thread between mutiple devices
  This patch creates a worker thread and worker queue shared
across
multiple
  virtio devices
  We would like to modify the patch posted in
  https://github.com/abelg/virtual_io_acceleration/commit/
 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
  to limit a vhost thread to serve multiple devices only if they
  belong
to
  the same VM as Paolo suggested to avoid isolation or cgroups
  concerns.
 
  Another modification is related to the creation and removal of
  vhost
  threads, which will be discussed next.

 I think this is an exceptionally bad idea.

 We shouldn't throw away isolation without exhausting every other
 possibility.
   
Seems you have missed the important details here.
Anthony, we are aware you are concerned about isolation
and you believe we should not share a single vhost thread across
multiple VMs.  That's why Razya proposed to change the patch
so we will serve multiple virtio devices using a single vhost
thread
only if the devices belong to the same VM. This series of patches
will not allow two different VMs to share the same vhost thread.
So, I don't see why this will be throwing away isolation and why
this could be a exceptionally bad idea.
   
By the way, I remember that during the KVM forum a similar
approach of having a single data plane thread for many devices
was discussed
 We've seen very positive results from adding threads.  We should
also
 look at scheduling.
   
...and we have also seen exceptionally negative results from
adding threads, both for vhost and data-plane. If you have lot of
idle
time/cores
then it makes sense to run multiple threads. But IMHO in many
scenarios
  you
don't have lot of idle time/cores.. and if you have them you would
  probably
prefer to run more VMs/VCPUshosting a single SMP VM when you
have
enough physical cores to run all the VCPU threads and the I/O
threads
  is
not a
realistic scenario.
   
That's why we are proposing to implement a mechanism that will
enable
the management stack to configure 1 thread per I/O device (as it is
  today)
or 1 thread for many I/O devices (belonging to the same VM).
   
 Once you are scheduling multiple guests in a single vhost device,
you
 now create a whole new class of DoS attacks in the best case
  scenario.
   
Again, we are NOT proposing to schedule multiple guests in a single
vhost thread. We are proposing to schedule multiple devices
belonging
to the same guest in a single (or multiple) vhost thread/s.
   
  
   I guess a question then becomes why have multiple devices?
 
  I assume that there are guests that have multiple vhost devices
  (net or scsi/tcm).

 These are kind of uncommon though.  In fact a kernel thread is not a
 unit of isolation - cgroups supply isolation.
 If we had use_cgroups kind of like use_mm, we could thinkably
 do work for multiple VMs on the same thread.


  We can also extend the approach to consider
  multiqueue devices, so we can create 1 vhost thread shared for all the
  queues,
  1 vhost thread for each queue or a few threads for multiple queues. We
  could also share a thread across multiple queues even if they do not
belong
  to the same device.
 
  Remember the experiments Shirley Ma did with the split
  tx/rx ? If we have a control interface we could support both
  approaches: different threads or a single thread.


 I'm a bit concerned about interface managing specific
 threads being so low level.
 What exactly is it that management knows that makes it
 efficient to group threads together?
 That host is over-committed so we should use less CPU?
 I'd like the interface to express that knowledge.


We can expose information such as the amount of I/O being
handled for each queue, the amount of CPU cycles consumed for
processing the I/O, latency and more.
If 

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Jason Wang jasow...@redhat.com wrote on 27/11/2013 04:49:20 AM:


 On 11/24/2013 05:22 PM, Razya Ladelsky wrote:
  Hi all,
 
  I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
  developed Elvis, presented by Abel Gordon at the last KVM forum:
  ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
  ELVIS slides:
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
 
 
  According to the discussions that took place at the forum, upstreaming
  some of the Elvis approaches seems to be a good idea, which we would
like
  to pursue.
 
  Our plan for the first patches is the following:
 
  1.Shared vhost thread between mutiple devices
  This patch creates a worker thread and worker queue shared across
multiple
  virtio devices
  We would like to modify the patch posted in
  https://github.com/abelg/virtual_io_acceleration/commit/
 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
  to limit a vhost thread to serve multiple devices only if they belong
to
  the same VM as Paolo suggested to avoid isolation or cgroups concerns.
 
  Another modification is related to the creation and removal of vhost
  threads, which will be discussed next.
 
  2. Sysfs mechanism to add and remove vhost threads
  This patch allows us to add and remove vhost threads dynamically.
 
  A simpler way to control the creation of vhost threads is statically
  determining the maximum number of virtio devices per worker via a
kernel
  module parameter (which is the way the previously mentioned patch is
  currently implemented)

 Any chance we can re-use the cwmq instead of inventing another
 mechanism? Looks like there're lots of function duplication here. Bandan
 has an RFC to do this.

Thanks for the suggestion. We should certainly take a look at Bandan's
patches which I guess are:

http://www.mail-archive.com/kvm@vger.kernel.org/msg96603.html

My only concern here is that we may not be able to easily implement
our polling mechanism and heuristics with cwmq.

 
  I'd like to ask for advice here about the more preferable way to go:
  Although having the sysfs mechanism provides more flexibility, it may
be a
  good idea to start with a simple static parameter, and have the first
  patches as simple as possible. What do you think?
 
  3.Add virtqueue polling mode to vhost
  Have the vhost thread poll the virtqueues with high I/O rate for new
  buffers , and avoid asking the guest to kick us.
  https://github.com/abelg/virtual_io_acceleration/commit/
 26616133fafb7855cc80fac070b0572fd1aaf5d0

 Maybe we can make poll_stop_idle adaptive which may help the light load
 case. Consider guest is often slow than vhost, if we just have one or
 two vms, polling too much may waste cpu in this case.

Yes, make polling adaptive based on the amount of wasted cycles (cycles
we did polling but didn't find new work) and I/O rate is a very good idea.
Note we already measure and expose these values but we do not use them
to adapt the polling mechanism.

Having said that, note that adaptive polling may be a bit tricky.
Remember that the cycles we waste polling in the vhost thread actually
improves the performance of the vcpu threads because the guest is no longer

require to kick (pio==exit) the host when vhost does polling. So even if
we waste cycles in the vhost thread, we are saving cycles in the
vcpu thread and improving performance.

  4. vhost statistics
  This patch introduces a set of statistics to monitor different
performance
  metrics of vhost and our polling and I/O scheduling mechanisms. The
  statistics are exposed using debugfs and can be easily displayed with a

  Python script (vhost_stat, based on the old kvm_stats)
  https://github.com/abelg/virtual_io_acceleration/commit/
 ac14206ea56939ecc3608dc5f978b86fa322e7b0

 How about using trace points instead? Besides statistics, it can also
 help more in debugging.

Yep, we just had a discussion with Gleb about this :)

 
  5. Add heuristics to improve I/O scheduling
  This patch enhances the round-robin mechanism with a set of heuristics
to
  decide when to leave a virtqueue and proceed to the next.
  https://github.com/abelg/virtual_io_acceleration/commit/
 f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
 
  This patch improves the handling of the requests by the vhost thread,
but
  could perhaps be delayed to a
  later time , and not submitted as one of the first Elvis patches.
  I'd love to hear some comments about whether this patch needs to be
part
  of the first submission.
 
  Any other feedback on this plan will be appreciated,
  Thank you,
  Razya
 
  --
  To unsubscribe from this list: send the line unsubscribe kvm in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
 Hi,
 
 Razya is out for a few days, so I will try to answer the questions as well
 as I can:
 
 Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:
 
  From: Michael S. Tsirkin m...@redhat.com
  To: Abel Gordon/Haifa/IBM@IBMIL,
  Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com,
  as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
  IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
  IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
  Haifa/IBM@IBMIL
  Date: 27/11/2013 01:08 AM
  Subject: Re: Elvis upstreaming plan
 
  On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
  
  
   Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00
 PM:
  
   
Razya Ladelsky ra...@il.ibm.com writes:
   
 edit
  
   That's why we are proposing to implement a mechanism that will enable
   the management stack to configure 1 thread per I/O device (as it is
 today)
   or 1 thread for many I/O devices (belonging to the same VM).
  
Once you are scheduling multiple guests in a single vhost device, you
now create a whole new class of DoS attacks in the best case
 scenario.
  
   Again, we are NOT proposing to schedule multiple guests in a single
   vhost thread. We are proposing to schedule multiple devices belonging
   to the same guest in a single (or multiple) vhost thread/s.
  
 
  I guess a question then becomes why have multiple devices?
 
 If you mean why serve multiple devices from a single thread the answer is
 that we cannot rely on the Linux scheduler which has no knowledge of I/O
 queues to do a decent job of scheduling I/O.  The idea is to take over the
 I/O scheduling responsibilities from the kernel's thread scheduler with a
 more efficient I/O scheduler inside each vhost thread.  So by combining all
 of the I/O devices from the same guest (disks, network cards, etc) in a
 single I/O thread, it allows us to provide better scheduling by giving us
 more knowledge of the nature of the work.  So now instead of relying on the
 linux scheduler to perform context switches between multiple vhost threads,
 we have a single thread context in which we can do the I/O scheduling more
 efficiently.  We can closely monitor the performance needs of each queue of
 each device inside the vhost thread which gives us much more information
 than relying on the kernel's thread scheduler.
 This does not expose any additional opportunities for attacks (DoS or
 other) than are already available since all of the I/O traffic belongs to a
 single guest.
 You can make the argument that with low I/O loads this mechanism may not
 make much difference.  However when you try to maximize the utilization of
 your hardware (such as in a commercial scenario) this technique can gain
 you a large benefit.
 
 Regards,
 
 Joel Nider
 Virtualization Research
 IBM Research and Development
 Haifa Research Lab

So all this would sound more convincing if we had sharing between VMs.
When it's only a single VM it's somehow less convincing, isn't it?
Of course if we would bypass a scheduler like this it becomes harder to
enforce cgroup limits.
But it might be easier to give scheduler the info it needs to do what we
need.  Would an API that basically says run this kthread right now
do the trick?


   
   
   
   
   
   
  Phone: 972-4-829-6326 | Mobile: 972-54-3155635  (Embedded image 
 moved to file: 
  E-mail: jo...@il.ibm.com  
 pic39571.gif)IBM 
   
   
   
   
 
 
 
 
 Hi all,

 I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
 developed Elvis, presented by Abel Gordon at the last KVM forum:
 ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
 ELVIS slides:
   https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE


 According to the discussions that took place at the forum,
 upstreaming
 some of the Elvis approaches seems to be a good idea, which we
 would
   like
 to pursue.

 Our plan for the first patches is the following:

 1.Shared vhost thread between mutiple devices
 This patch creates a worker thread and worker queue shared across
   multiple
 virtio devices
 We would like to modify the patch posted in
 https://github.com/abelg/virtual_io_acceleration/commit/
3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
 to limit a vhost thread to serve multiple devices only if they
 belong
   to
 the same VM

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote:
 
 
 Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 11:21:00 AM:
 
 
  On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
  
  
   Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:
  
On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:


 Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
 08:05:00
   PM:

 
  Razya Ladelsky ra...@il.ibm.com writes:
 
   Hi all,
  
   I am Razya Ladelsky, I work at IBM Haifa virtualization team,
 which
   developed Elvis, presented by Abel Gordon at the last KVM
 forum:
   ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
   ELVIS slides:
 https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
  
  
   According to the discussions that took place at the forum,
   upstreaming
   some of the Elvis approaches seems to be a good idea, which we
   would
 like
   to pursue.
  
   Our plan for the first patches is the following:
  
   1.Shared vhost thread between mutiple devices
   This patch creates a worker thread and worker queue shared
 across
 multiple
   virtio devices
   We would like to modify the patch posted in
   https://github.com/abelg/virtual_io_acceleration/commit/
  3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
   to limit a vhost thread to serve multiple devices only if they
   belong
 to
   the same VM as Paolo suggested to avoid isolation or cgroups
   concerns.
  
   Another modification is related to the creation and removal of
   vhost
   threads, which will be discussed next.
 
  I think this is an exceptionally bad idea.
 
  We shouldn't throw away isolation without exhausting every other
  possibility.

 Seems you have missed the important details here.
 Anthony, we are aware you are concerned about isolation
 and you believe we should not share a single vhost thread across
 multiple VMs.  That's why Razya proposed to change the patch
 so we will serve multiple virtio devices using a single vhost
 thread
 only if the devices belong to the same VM. This series of patches
 will not allow two different VMs to share the same vhost thread.
 So, I don't see why this will be throwing away isolation and why
 this could be a exceptionally bad idea.

 By the way, I remember that during the KVM forum a similar
 approach of having a single data plane thread for many devices
 was discussed
  We've seen very positive results from adding threads.  We should
 also
  look at scheduling.

 ...and we have also seen exceptionally negative results from
 adding threads, both for vhost and data-plane. If you have lot of
 idle
 time/cores
 then it makes sense to run multiple threads. But IMHO in many
 scenarios
   you
 don't have lot of idle time/cores.. and if you have them you would
   probably
 prefer to run more VMs/VCPUshosting a single SMP VM when you
 have
 enough physical cores to run all the VCPU threads and the I/O
 threads
   is
 not a
 realistic scenario.

 That's why we are proposing to implement a mechanism that will
 enable
 the management stack to configure 1 thread per I/O device (as it is
   today)
 or 1 thread for many I/O devices (belonging to the same VM).

  Once you are scheduling multiple guests in a single vhost device,
 you
  now create a whole new class of DoS attacks in the best case
   scenario.

 Again, we are NOT proposing to schedule multiple guests in a single
 vhost thread. We are proposing to schedule multiple devices
 belonging
 to the same guest in a single (or multiple) vhost thread/s.

   
I guess a question then becomes why have multiple devices?
  
   I assume that there are guests that have multiple vhost devices
   (net or scsi/tcm).
 
  These are kind of uncommon though.  In fact a kernel thread is not a
  unit of isolation - cgroups supply isolation.
  If we had use_cgroups kind of like use_mm, we could thinkably
  do work for multiple VMs on the same thread.
 
 
   We can also extend the approach to consider
   multiqueue devices, so we can create 1 vhost thread shared for all the
   queues,
   1 vhost thread for each queue or a few threads for multiple queues. We
   could also share a thread across multiple queues even if they do not
 belong
   to the same device.
  
   Remember the experiments Shirley Ma did with the split
   tx/rx ? If we have a control interface we could support both
   approaches: different threads or a single thread.
 
 
  I'm a bit concerned about interface managing specific
  threads being so low level.
  What exactly is it that management knows that makes it
  efficient to group threads together?
  That host is over-committed so we should use less CPU?
  I'd 

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 12:18:51PM +0200, Abel Gordon wrote:
 
 
 Jason Wang jasow...@redhat.com wrote on 27/11/2013 04:49:20 AM:
 
 
  On 11/24/2013 05:22 PM, Razya Ladelsky wrote:
   Hi all,
  
   I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
   developed Elvis, presented by Abel Gordon at the last KVM forum:
   ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
   ELVIS slides:
 https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
  
  
   According to the discussions that took place at the forum, upstreaming
   some of the Elvis approaches seems to be a good idea, which we would
 like
   to pursue.
  
   Our plan for the first patches is the following:
  
   1.Shared vhost thread between mutiple devices
   This patch creates a worker thread and worker queue shared across
 multiple
   virtio devices
   We would like to modify the patch posted in
   https://github.com/abelg/virtual_io_acceleration/commit/
  3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
   to limit a vhost thread to serve multiple devices only if they belong
 to
   the same VM as Paolo suggested to avoid isolation or cgroups concerns.
  
   Another modification is related to the creation and removal of vhost
   threads, which will be discussed next.
  
   2. Sysfs mechanism to add and remove vhost threads
   This patch allows us to add and remove vhost threads dynamically.
  
   A simpler way to control the creation of vhost threads is statically
   determining the maximum number of virtio devices per worker via a
 kernel
   module parameter (which is the way the previously mentioned patch is
   currently implemented)
 
  Any chance we can re-use the cwmq instead of inventing another
  mechanism? Looks like there're lots of function duplication here. Bandan
  has an RFC to do this.
 
 Thanks for the suggestion. We should certainly take a look at Bandan's
 patches which I guess are:
 
 http://www.mail-archive.com/kvm@vger.kernel.org/msg96603.html
 
 My only concern here is that we may not be able to easily implement
 our polling mechanism and heuristics with cwmq.

It's not so hard, to poll you just requeue work to make sure it's
re-invoked.

  
   I'd like to ask for advice here about the more preferable way to go:
   Although having the sysfs mechanism provides more flexibility, it may
 be a
   good idea to start with a simple static parameter, and have the first
   patches as simple as possible. What do you think?
  
   3.Add virtqueue polling mode to vhost
   Have the vhost thread poll the virtqueues with high I/O rate for new
   buffers , and avoid asking the guest to kick us.
   https://github.com/abelg/virtual_io_acceleration/commit/
  26616133fafb7855cc80fac070b0572fd1aaf5d0
 
  Maybe we can make poll_stop_idle adaptive which may help the light load
  case. Consider guest is often slow than vhost, if we just have one or
  two vms, polling too much may waste cpu in this case.
 
 Yes, make polling adaptive based on the amount of wasted cycles (cycles
 we did polling but didn't find new work) and I/O rate is a very good idea.
 Note we already measure and expose these values but we do not use them
 to adapt the polling mechanism.
 
 Having said that, note that adaptive polling may be a bit tricky.
 Remember that the cycles we waste polling in the vhost thread actually
 improves the performance of the vcpu threads because the guest is no longer
 
 require to kick (pio==exit) the host when vhost does polling. So even if
 we waste cycles in the vhost thread, we are saving cycles in the
 vcpu thread and improving performance.


So my suggestion would be:

- guest runs some kicks
- measures how long it took, e.g. kick = T cycles
- sends this info to host

host polls for at most fraction * T cycles


   4. vhost statistics
   This patch introduces a set of statistics to monitor different
 performance
   metrics of vhost and our polling and I/O scheduling mechanisms. The
   statistics are exposed using debugfs and can be easily displayed with a
 
   Python script (vhost_stat, based on the old kvm_stats)
   https://github.com/abelg/virtual_io_acceleration/commit/
  ac14206ea56939ecc3608dc5f978b86fa322e7b0
 
  How about using trace points instead? Besides statistics, it can also
  help more in debugging.
 
 Yep, we just had a discussion with Gleb about this :)
 
  
   5. Add heuristics to improve I/O scheduling
   This patch enhances the round-robin mechanism with a set of heuristics
 to
   decide when to leave a virtqueue and proceed to the next.
   https://github.com/abelg/virtual_io_acceleration/commit/
  f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
  
   This patch improves the handling of the requests by the vhost thread,
 but
   could perhaps be delayed to a
   later time , and not submitted as one of the first Elvis patches.
   I'd love to hear some comments about whether this patch needs to be
 part
   of the first submission.
  
   Any other feedback on this plan will be appreciated,
   Thank you,
   Razya
 

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:27:19 PM:


 On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
  Hi,
 
  Razya is out for a few days, so I will try to answer the questions as
well
  as I can:
 
  Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:
 
   From: Michael S. Tsirkin m...@redhat.com
   To: Abel Gordon/Haifa/IBM@IBMIL,
   Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com,
   as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
   IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
   IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
   Haifa/IBM@IBMIL
   Date: 27/11/2013 01:08 AM
   Subject: Re: Elvis upstreaming plan
  
   On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
   
   
Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
08:05:00
  PM:
   

 Razya Ladelsky ra...@il.ibm.com writes:

  edit
   
That's why we are proposing to implement a mechanism that will
enable
the management stack to configure 1 thread per I/O device (as it is
  today)
or 1 thread for many I/O devices (belonging to the same VM).
   
 Once you are scheduling multiple guests in a single vhost device,
you
 now create a whole new class of DoS attacks in the best case
  scenario.
   
Again, we are NOT proposing to schedule multiple guests in a single
vhost thread. We are proposing to schedule multiple devices
belonging
to the same guest in a single (or multiple) vhost thread/s.
   
  
   I guess a question then becomes why have multiple devices?
 
  If you mean why serve multiple devices from a single thread the
answer is
  that we cannot rely on the Linux scheduler which has no knowledge of
I/O
  queues to do a decent job of scheduling I/O.  The idea is to take over
the
  I/O scheduling responsibilities from the kernel's thread scheduler with
a
  more efficient I/O scheduler inside each vhost thread.  So by combining
all
  of the I/O devices from the same guest (disks, network cards, etc) in a
  single I/O thread, it allows us to provide better scheduling by giving
us
  more knowledge of the nature of the work.  So now instead of relying on
the
  linux scheduler to perform context switches between multiple vhost
threads,
  we have a single thread context in which we can do the I/O scheduling
more
  efficiently.  We can closely monitor the performance needs of each
queue of
  each device inside the vhost thread which gives us much more
information
  than relying on the kernel's thread scheduler.
  This does not expose any additional opportunities for attacks (DoS or
  other) than are already available since all of the I/O traffic belongs
to a
  single guest.
  You can make the argument that with low I/O loads this mechanism may
not
  make much difference.  However when you try to maximize the utilization
of
  your hardware (such as in a commercial scenario) this technique can
gain
  you a large benefit.
 
  Regards,
 
  Joel Nider
  Virtualization Research
  IBM Research and Development
  Haifa Research Lab

 So all this would sound more convincing if we had sharing between VMs.
 When it's only a single VM it's somehow less convincing, isn't it?
 Of course if we would bypass a scheduler like this it becomes harder to
 enforce cgroup limits.

True, but here the issue becomes isolation/cgroups. We can start to show
the value for VMs that have multiple devices / queues and then we could
re-consider extending the mechanism for multiple VMs (at least as a
experimental feature).

 But it might be easier to give scheduler the info it needs to do what we
 need.  Would an API that basically says run this kthread right now
 do the trick?

...do you really believe it would be possible to push this kind of change
to the Linux scheduler ? In addition, we need more than
run this kthread right now because you need to monitor the virtio
ring activity to specify when you will like to run a specific kthread
and for how long.


 

 

 

   Phone: 972-4-829-6326 | Mobile: 972-54-3155635  (Embedded
 image moved to file:
   E-mail: jo...@il.ibm.com
 pic39571.gif)IBM
 

 

 
 
 
 
  Hi all,
 
  I am Razya Ladelsky, I work at IBM Haifa virtualization team,
which
  developed Elvis, presented by Abel Gordon at the last KVM
forum:
  ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
  ELVIS slides:
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
 
 
  According to the discussions that took place at the forum,
  upstreaming
  some of the Elvis approaches seems to be a good idea, which we
  would
like
  to pursue.
 
  Our plan for the first patches is the following:
 
  1.Shared vhost thread between mutiple devices
  This patch creates a worker thread and worker queue shared
across
multiple
  virtio devices
  We would like to modify the patch posted in
  https

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:29:43 PM:


 On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote:
 
 
  Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 11:21:00 AM:
 
  
   On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
   
   
Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57
PM:
   
 On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
 
 
  Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
  08:05:00
PM:
 
  
   Razya Ladelsky ra...@il.ibm.com writes:
  
Hi all,
   
I am Razya Ladelsky, I work at IBM Haifa virtualization
team,
  which
developed Elvis, presented by Abel Gordon at the last KVM
  forum:
ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
ELVIS slides:
  https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
   
   
According to the discussions that took place at the forum,
upstreaming
some of the Elvis approaches seems to be a good idea, which
we
would
  like
to pursue.
   
Our plan for the first patches is the following:
   
1.Shared vhost thread between mutiple devices
This patch creates a worker thread and worker queue shared
  across
  multiple
virtio devices
We would like to modify the patch posted in
https://github.com/abelg/virtual_io_acceleration/commit/
   3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
to limit a vhost thread to serve multiple devices only if
they
belong
  to
the same VM as Paolo suggested to avoid isolation or
cgroups
concerns.
   
Another modification is related to the creation and removal
of
vhost
threads, which will be discussed next.
  
   I think this is an exceptionally bad idea.
  
   We shouldn't throw away isolation without exhausting every
other
   possibility.
 
  Seems you have missed the important details here.
  Anthony, we are aware you are concerned about isolation
  and you believe we should not share a single vhost thread
across
  multiple VMs.  That's why Razya proposed to change the patch
  so we will serve multiple virtio devices using a single vhost
  thread
  only if the devices belong to the same VM. This series of
patches
  will not allow two different VMs to share the same vhost
thread.
  So, I don't see why this will be throwing away isolation and
why
  this could be a exceptionally bad idea.
 
  By the way, I remember that during the KVM forum a similar
  approach of having a single data plane thread for many devices
  was discussed
   We've seen very positive results from adding threads.  We
should
  also
   look at scheduling.
 
  ...and we have also seen exceptionally negative results from
  adding threads, both for vhost and data-plane. If you have lot
of
  idle
  time/cores
  then it makes sense to run multiple threads. But IMHO in many
  scenarios
you
  don't have lot of idle time/cores.. and if you have them you
would
probably
  prefer to run more VMs/VCPUshosting a single SMP VM when
you
  have
  enough physical cores to run all the VCPU threads and the I/O
  threads
is
  not a
  realistic scenario.
 
  That's why we are proposing to implement a mechanism that will
  enable
  the management stack to configure 1 thread per I/O device (as
it is
today)
  or 1 thread for many I/O devices (belonging to the same VM).
 
   Once you are scheduling multiple guests in a single vhost
device,
  you
   now create a whole new class of DoS attacks in the best case
scenario.
 
  Again, we are NOT proposing to schedule multiple guests in a
single
  vhost thread. We are proposing to schedule multiple devices
  belonging
  to the same guest in a single (or multiple) vhost thread/s.
 

 I guess a question then becomes why have multiple devices?
   
I assume that there are guests that have multiple vhost devices
(net or scsi/tcm).
  
   These are kind of uncommon though.  In fact a kernel thread is not a
   unit of isolation - cgroups supply isolation.
   If we had use_cgroups kind of like use_mm, we could thinkably
   do work for multiple VMs on the same thread.
  
  
We can also extend the approach to consider
multiqueue devices, so we can create 1 vhost thread shared for all
the
queues,
1 vhost thread for each queue or a few threads for multiple queues.
We
could also share a thread across multiple queues even if they do
not
  belong
to the same device.
   
Remember the experiments Shirley Ma did with the split
tx/rx ? If we have a control interface we could support both
approaches: different threads or a single thread.
  
  
   I'm a bit concerned about interface 

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote:
 
 
 Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:27:19 PM:
 
 
  On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
   Hi,
  
   Razya is out for a few days, so I will try to answer the questions as
 well
   as I can:
  
   Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:
  
From: Michael S. Tsirkin m...@redhat.com
To: Abel Gordon/Haifa/IBM@IBMIL,
Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com,
as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
Haifa/IBM@IBMIL
Date: 27/11/2013 01:08 AM
Subject: Re: Elvis upstreaming plan
   
On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:


 Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
 08:05:00
   PM:

 
  Razya Ladelsky ra...@il.ibm.com writes:
 
   edit

 That's why we are proposing to implement a mechanism that will
 enable
 the management stack to configure 1 thread per I/O device (as it is
   today)
 or 1 thread for many I/O devices (belonging to the same VM).

  Once you are scheduling multiple guests in a single vhost device,
 you
  now create a whole new class of DoS attacks in the best case
   scenario.

 Again, we are NOT proposing to schedule multiple guests in a single
 vhost thread. We are proposing to schedule multiple devices
 belonging
 to the same guest in a single (or multiple) vhost thread/s.

   
I guess a question then becomes why have multiple devices?
  
   If you mean why serve multiple devices from a single thread the
 answer is
   that we cannot rely on the Linux scheduler which has no knowledge of
 I/O
   queues to do a decent job of scheduling I/O.  The idea is to take over
 the
   I/O scheduling responsibilities from the kernel's thread scheduler with
 a
   more efficient I/O scheduler inside each vhost thread.  So by combining
 all
   of the I/O devices from the same guest (disks, network cards, etc) in a
   single I/O thread, it allows us to provide better scheduling by giving
 us
   more knowledge of the nature of the work.  So now instead of relying on
 the
   linux scheduler to perform context switches between multiple vhost
 threads,
   we have a single thread context in which we can do the I/O scheduling
 more
   efficiently.  We can closely monitor the performance needs of each
 queue of
   each device inside the vhost thread which gives us much more
 information
   than relying on the kernel's thread scheduler.
   This does not expose any additional opportunities for attacks (DoS or
   other) than are already available since all of the I/O traffic belongs
 to a
   single guest.
   You can make the argument that with low I/O loads this mechanism may
 not
   make much difference.  However when you try to maximize the utilization
 of
   your hardware (such as in a commercial scenario) this technique can
 gain
   you a large benefit.
  
   Regards,
  
   Joel Nider
   Virtualization Research
   IBM Research and Development
   Haifa Research Lab
 
  So all this would sound more convincing if we had sharing between VMs.
  When it's only a single VM it's somehow less convincing, isn't it?
  Of course if we would bypass a scheduler like this it becomes harder to
  enforce cgroup limits.
 
 True, but here the issue becomes isolation/cgroups. We can start to show
 the value for VMs that have multiple devices / queues and then we could
 re-consider extending the mechanism for multiple VMs (at least as a
 experimental feature).

Sorry, If it's unsafe we can't merge it even if it's experimental.

  But it might be easier to give scheduler the info it needs to do what we
  need.  Would an API that basically says run this kthread right now
  do the trick?
 
 ...do you really believe it would be possible to push this kind of change
 to the Linux scheduler ? In addition, we need more than
 run this kthread right now because you need to monitor the virtio
 ring activity to specify when you will like to run a specific kthread
 and for how long.

How long is easy - just call schedule. When sounds like specifying a
deadline which sounds like a reasonable fit to how scheduler works now.
Certainly adding an in-kernel API sounds like a better approach than
a bunch of user-visible ones.
So I'm not at all saying we need to change the scheduler - it's more
adding APIs to existing functionality.

 
  
 
  
 
  
 
Phone: 972-4-829-6326 | Mobile: 972-54-3155635  (Embedded
  image moved to file:
E-mail: jo...@il.ibm.com
  pic39571.gif)IBM
  
 
  
 
  
  
  
  
   Hi all,
  
   I am Razya Ladelsky, I work at IBM Haifa virtualization team,
 which
   developed Elvis, presented by Abel Gordon at the last KVM
 forum

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 12:55:07PM +0200, Abel Gordon wrote:
 
 
 Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:29:43 PM:
 
 
  On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote:
  
  
   Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 11:21:00 AM:
  
   
On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:


 Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57
 PM:

  On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
  
  
   Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
   08:05:00
 PM:
  
   
Razya Ladelsky ra...@il.ibm.com writes:
   
 Hi all,

 I am Razya Ladelsky, I work at IBM Haifa virtualization
 team,
   which
 developed Elvis, presented by Abel Gordon at the last KVM
   forum:
 ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
 ELVIS slides:
   https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE


 According to the discussions that took place at the forum,
 upstreaming
 some of the Elvis approaches seems to be a good idea, which
 we
 would
   like
 to pursue.

 Our plan for the first patches is the following:

 1.Shared vhost thread between mutiple devices
 This patch creates a worker thread and worker queue shared
   across
   multiple
 virtio devices
 We would like to modify the patch posted in
 https://github.com/abelg/virtual_io_acceleration/commit/
3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
 to limit a vhost thread to serve multiple devices only if
 they
 belong
   to
 the same VM as Paolo suggested to avoid isolation or
 cgroups
 concerns.

 Another modification is related to the creation and removal
 of
 vhost
 threads, which will be discussed next.
   
I think this is an exceptionally bad idea.
   
We shouldn't throw away isolation without exhausting every
 other
possibility.
  
   Seems you have missed the important details here.
   Anthony, we are aware you are concerned about isolation
   and you believe we should not share a single vhost thread
 across
   multiple VMs.  That's why Razya proposed to change the patch
   so we will serve multiple virtio devices using a single vhost
   thread
   only if the devices belong to the same VM. This series of
 patches
   will not allow two different VMs to share the same vhost
 thread.
   So, I don't see why this will be throwing away isolation and
 why
   this could be a exceptionally bad idea.
  
   By the way, I remember that during the KVM forum a similar
   approach of having a single data plane thread for many devices
   was discussed
We've seen very positive results from adding threads.  We
 should
   also
look at scheduling.
  
   ...and we have also seen exceptionally negative results from
   adding threads, both for vhost and data-plane. If you have lot
 of
   idle
   time/cores
   then it makes sense to run multiple threads. But IMHO in many
   scenarios
 you
   don't have lot of idle time/cores.. and if you have them you
 would
 probably
   prefer to run more VMs/VCPUshosting a single SMP VM when
 you
   have
   enough physical cores to run all the VCPU threads and the I/O
   threads
 is
   not a
   realistic scenario.
  
   That's why we are proposing to implement a mechanism that will
   enable
   the management stack to configure 1 thread per I/O device (as
 it is
 today)
   or 1 thread for many I/O devices (belonging to the same VM).
  
Once you are scheduling multiple guests in a single vhost
 device,
   you
now create a whole new class of DoS attacks in the best case
 scenario.
  
   Again, we are NOT proposing to schedule multiple guests in a
 single
   vhost thread. We are proposing to schedule multiple devices
   belonging
   to the same guest in a single (or multiple) vhost thread/s.
  
 
  I guess a question then becomes why have multiple devices?

 I assume that there are guests that have multiple vhost devices
 (net or scsi/tcm).
   
These are kind of uncommon though.  In fact a kernel thread is not a
unit of isolation - cgroups supply isolation.
If we had use_cgroups kind of like use_mm, we could thinkably
do work for multiple VMs on the same thread.
   
   
 We can also extend the approach to consider
 multiqueue devices, so we can create 1 vhost thread shared for all
 the
 queues,
 1 vhost thread for each queue or a few threads for multiple queues.
 We
 could also share a thread across multiple queues even if they do
 not
   belong
 to the same device.

 

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:59:38 PM:


 On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote:
 
 
  Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:27:19 PM:
 
  
   On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
Hi,
   
Razya is out for a few days, so I will try to answer the questions
as
  well
as I can:
   
Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57
PM:
   
 From: Michael S. Tsirkin m...@redhat.com
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: Anthony Liguori anth...@codemonkey.ws,
abel.gor...@gmail.com,
 as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
 IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel
Nider/Haifa/
 IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya
Ladelsky/
 Haifa/IBM@IBMIL
 Date: 27/11/2013 01:08 AM
 Subject: Re: Elvis upstreaming plan

 On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
 
 
  Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
  08:05:00
PM:
 
  
   Razya Ladelsky ra...@il.ibm.com writes:
  
edit
 
  That's why we are proposing to implement a mechanism that will
  enable
  the management stack to configure 1 thread per I/O device (as
it is
today)
  or 1 thread for many I/O devices (belonging to the same VM).
 
   Once you are scheduling multiple guests in a single vhost
device,
  you
   now create a whole new class of DoS attacks in the best case
scenario.
 
  Again, we are NOT proposing to schedule multiple guests in a
single
  vhost thread. We are proposing to schedule multiple devices
  belonging
  to the same guest in a single (or multiple) vhost thread/s.
 

 I guess a question then becomes why have multiple devices?
   
If you mean why serve multiple devices from a single thread the
  answer is
that we cannot rely on the Linux scheduler which has no knowledge
of
  I/O
queues to do a decent job of scheduling I/O.  The idea is to take
over
  the
I/O scheduling responsibilities from the kernel's thread scheduler
with
  a
more efficient I/O scheduler inside each vhost thread.  So by
combining
  all
of the I/O devices from the same guest (disks, network cards, etc)
in a
single I/O thread, it allows us to provide better scheduling by
giving
  us
more knowledge of the nature of the work.  So now instead of
relying on
  the
linux scheduler to perform context switches between multiple vhost
  threads,
we have a single thread context in which we can do the I/O
scheduling
  more
efficiently.  We can closely monitor the performance needs of each
  queue of
each device inside the vhost thread which gives us much more
  information
than relying on the kernel's thread scheduler.
This does not expose any additional opportunities for attacks (DoS
or
other) than are already available since all of the I/O traffic
belongs
  to a
single guest.
You can make the argument that with low I/O loads this mechanism
may
  not
make much difference.  However when you try to maximize the
utilization
  of
your hardware (such as in a commercial scenario) this technique can
  gain
you a large benefit.
   
Regards,
   
Joel Nider
Virtualization Research
IBM Research and Development
Haifa Research Lab
  
   So all this would sound more convincing if we had sharing between
VMs.
   When it's only a single VM it's somehow less convincing, isn't it?
   Of course if we would bypass a scheduler like this it becomes harder
to
   enforce cgroup limits.
 
  True, but here the issue becomes isolation/cgroups. We can start to
show
  the value for VMs that have multiple devices / queues and then we could
  re-consider extending the mechanism for multiple VMs (at least as a
  experimental feature).

 Sorry, If it's unsafe we can't merge it even if it's experimental.

   But it might be easier to give scheduler the info it needs to do what
we
   need.  Would an API that basically says run this kthread right now
   do the trick?
 
  ...do you really believe it would be possible to push this kind of
change
  to the Linux scheduler ? In addition, we need more than
  run this kthread right now because you need to monitor the virtio
  ring activity to specify when you will like to run a specific
kthread
  and for how long.

 How long is easy - just call schedule. When sounds like specifying a
 deadline which sounds like a reasonable fit to how scheduler works now.

... but when you should call schedule actually depends on the I/O
activity of the queues. The patches we shared constantly monitor the
virtio rings (pending items and for how long they are pending there)
to decide if we should continue processing the same queue or switch to
other queue.

 Certainly adding an in-kernel API sounds like a better approach than
 a bunch of user

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 01:03:25 PM:


 On Wed, Nov 27, 2013 at 12:55:07PM +0200, Abel Gordon wrote:
 
 
  Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:29:43 PM:
 
  
   On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote:
   
   
Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 11:21:00
AM:
   

 On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
 
 
  Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013
11:11:57
  PM:
 
   On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
   
   
Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
08:05:00
  PM:
   

 Razya Ladelsky ra...@il.ibm.com writes:

  Hi all,
 
  I am Razya Ladelsky, I work at IBM Haifa virtualization
  team,
which
  developed Elvis, presented by Abel Gordon at the last
KVM
forum:
  ELVIS video:
https://www.youtube.com/watch?v=9EyweibHfEs
  ELVIS slides:
   
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
 
 
  According to the discussions that took place at the
forum,
  upstreaming
  some of the Elvis approaches seems to be a good idea,
which
  we
  would
like
  to pursue.
 
  Our plan for the first patches is the following:
 
  1.Shared vhost thread between mutiple devices
  This patch creates a worker thread and worker queue
shared
across
multiple
  virtio devices
  We would like to modify the patch posted in
 
https://github.com/abelg/virtual_io_acceleration/commit/
 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
  to limit a vhost thread to serve multiple devices only
if
  they
  belong
to
  the same VM as Paolo suggested to avoid isolation or
  cgroups
  concerns.
 
  Another modification is related to the creation and
removal
  of
  vhost
  threads, which will be discussed next.

 I think this is an exceptionally bad idea.

 We shouldn't throw away isolation without exhausting
every
  other
 possibility.
   
Seems you have missed the important details here.
Anthony, we are aware you are concerned about isolation
and you believe we should not share a single vhost thread
  across
multiple VMs.  That's why Razya proposed to change the
patch
so we will serve multiple virtio devices using a single
vhost
thread
only if the devices belong to the same VM. This series of
  patches
will not allow two different VMs to share the same vhost
  thread.
So, I don't see why this will be throwing away isolation
and
  why
this could be a exceptionally bad idea.
   
By the way, I remember that during the KVM forum a similar
approach of having a single data plane thread for many
devices
was discussed
 We've seen very positive results from adding threads.  We
  should
also
 look at scheduling.
   
...and we have also seen exceptionally negative results
from
adding threads, both for vhost and data-plane. If you have
lot
  of
idle
time/cores
then it makes sense to run multiple threads. But IMHO in
many
scenarios
  you
don't have lot of idle time/cores.. and if you have them
you
  would
  probably
prefer to run more VMs/VCPUshosting a single SMP VM
when
  you
have
enough physical cores to run all the VCPU threads and the
I/O
threads
  is
not a
realistic scenario.
   
That's why we are proposing to implement a mechanism that
will
enable
the management stack to configure 1 thread per I/O device
(as
  it is
  today)
or 1 thread for many I/O devices (belonging to the same
VM).
   
 Once you are scheduling multiple guests in a single vhost
  device,
you
 now create a whole new class of DoS attacks in the best
case
  scenario.
   
Again, we are NOT proposing to schedule multiple guests in
a
  single
vhost thread. We are proposing to schedule multiple devices
belonging
to the same guest in a single (or multiple) vhost thread/s.
   
  
   I guess a question then becomes why have multiple devices?
 
  I assume that there are guests that have multiple vhost devices
  (net or scsi/tcm).

 These are kind of uncommon though.  In fact a kernel thread is
not a
 unit of isolation - cgroups supply isolation.
 If we had use_cgroups kind of like use_mm, we could thinkably
 do work for multiple VMs on the same thread.


  We can also extend the approach to consider
  multiqueue devices, so we can create 1 vhost thread 

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 01:02:37PM +0200, Abel Gordon wrote:
 
 
 Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:59:38 PM:
 
 
  On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote:
  
  
   Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:27:19 PM:
  
   
On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
 Hi,

 Razya is out for a few days, so I will try to answer the questions
 as
   well
 as I can:

 Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57
 PM:

  From: Michael S. Tsirkin m...@redhat.com
  To: Abel Gordon/Haifa/IBM@IBMIL,
  Cc: Anthony Liguori anth...@codemonkey.ws,
 abel.gor...@gmail.com,
  as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
  IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel
 Nider/Haifa/
  IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya
 Ladelsky/
  Haifa/IBM@IBMIL
  Date: 27/11/2013 01:08 AM
  Subject: Re: Elvis upstreaming plan
 
  On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
  
  
   Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
   08:05:00
 PM:
  
   
Razya Ladelsky ra...@il.ibm.com writes:
   
 edit
  
   That's why we are proposing to implement a mechanism that will
   enable
   the management stack to configure 1 thread per I/O device (as
 it is
 today)
   or 1 thread for many I/O devices (belonging to the same VM).
  
Once you are scheduling multiple guests in a single vhost
 device,
   you
now create a whole new class of DoS attacks in the best case
 scenario.
  
   Again, we are NOT proposing to schedule multiple guests in a
 single
   vhost thread. We are proposing to schedule multiple devices
   belonging
   to the same guest in a single (or multiple) vhost thread/s.
  
 
  I guess a question then becomes why have multiple devices?

 If you mean why serve multiple devices from a single thread the
   answer is
 that we cannot rely on the Linux scheduler which has no knowledge
 of
   I/O
 queues to do a decent job of scheduling I/O.  The idea is to take
 over
   the
 I/O scheduling responsibilities from the kernel's thread scheduler
 with
   a
 more efficient I/O scheduler inside each vhost thread.  So by
 combining
   all
 of the I/O devices from the same guest (disks, network cards, etc)
 in a
 single I/O thread, it allows us to provide better scheduling by
 giving
   us
 more knowledge of the nature of the work.  So now instead of
 relying on
   the
 linux scheduler to perform context switches between multiple vhost
   threads,
 we have a single thread context in which we can do the I/O
 scheduling
   more
 efficiently.  We can closely monitor the performance needs of each
   queue of
 each device inside the vhost thread which gives us much more
   information
 than relying on the kernel's thread scheduler.
 This does not expose any additional opportunities for attacks (DoS
 or
 other) than are already available since all of the I/O traffic
 belongs
   to a
 single guest.
 You can make the argument that with low I/O loads this mechanism
 may
   not
 make much difference.  However when you try to maximize the
 utilization
   of
 your hardware (such as in a commercial scenario) this technique can
   gain
 you a large benefit.

 Regards,

 Joel Nider
 Virtualization Research
 IBM Research and Development
 Haifa Research Lab
   
So all this would sound more convincing if we had sharing between VMs.
When it's only a single VM it's somehow less convincing, isn't it?
Of course if we would bypass a scheduler like this it becomes harder to
enforce cgroup limits.
  
   True, but here the issue becomes isolation/cgroups. We can start to show
   the value for VMs that have multiple devices / queues and then we could
   re-consider extending the mechanism for multiple VMs (at least as a
   experimental feature).
 
  Sorry, If it's unsafe we can't merge it even if it's experimental.
 
But it might be easier to give scheduler the info it needs to do what we
need.  Would an API that basically says run this kthread right now
do the trick?
  
   ...do you really believe it would be possible to push this kind of change
   to the Linux scheduler ? In addition, we need more than
   run this kthread right now because you need to monitor the virtio
   ring activity to specify when you will like to run a specific kthread
   and for how long.
 
  How long is easy - just call schedule. When sounds like specifying a
  deadline which sounds like a reasonable fit to how scheduler works now.
 
 ... but when you should call schedule actually depends on the I/O
 activity of the queues. The patches we shared constantly monitor the
 virtio rings (pending items

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 01:05:40PM +0200, Abel Gordon wrote:
   (CCing Eyal Moscovici who is actually prototyping with multiple
   policies and may want to join this thread)
  
   Starting with basic policies: we can use a single vhost thread
   and create new vhost threads if it becomes saturated and there
   are enough cpu cycles available in the system
   or if the latency (how long the requests in the virtio queues wait
   until they are handled) is too high.
   We can merge threads if the latency is already low or if the threads
   are not saturated.
  
   There is a hidden trade-off here: when you run more vhost threads you
   may actually be stealing cpu cycles from the vcpu threads and also
   increasing context switches. So, from the vhost perspective it may
   improve performance but from the vcpu threads perspective it may
   degrade performance.
 
  So this is a very interesting problem to solve but what does
  management know that suggests it can solve it better?
 
 Yep, and Eyal is currently working on this.
 What the management knows ? depends who the management is :)
 Could be just I/O activity (black-box: I/O request rate, I/O
 handling rate, latency)

We know much more about this than managament, don't we?

 or application performance (white-box).

This would have to come with a proposal for getting
this white-box info out of guest somehow.

-- 
MSR
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Stefan Hajnoczi
On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
 Hi,
 
 Razya is out for a few days, so I will try to answer the questions as well
 as I can:
 
 Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:
 
  From: Michael S. Tsirkin m...@redhat.com
  To: Abel Gordon/Haifa/IBM@IBMIL,
  Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com,
  as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
  IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
  IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
  Haifa/IBM@IBMIL
  Date: 27/11/2013 01:08 AM
  Subject: Re: Elvis upstreaming plan
 
  On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
  
  
   Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00
 PM:
  
   
Razya Ladelsky ra...@il.ibm.com writes:
   
 edit
  
   That's why we are proposing to implement a mechanism that will enable
   the management stack to configure 1 thread per I/O device (as it is
 today)
   or 1 thread for many I/O devices (belonging to the same VM).
  
Once you are scheduling multiple guests in a single vhost device, you
now create a whole new class of DoS attacks in the best case
 scenario.
  
   Again, we are NOT proposing to schedule multiple guests in a single
   vhost thread. We are proposing to schedule multiple devices belonging
   to the same guest in a single (or multiple) vhost thread/s.
  
 
  I guess a question then becomes why have multiple devices?
 
 If you mean why serve multiple devices from a single thread the answer is
 that we cannot rely on the Linux scheduler which has no knowledge of I/O
 queues to do a decent job of scheduling I/O.  The idea is to take over the
 I/O scheduling responsibilities from the kernel's thread scheduler with a
 more efficient I/O scheduler inside each vhost thread.  So by combining all
 of the I/O devices from the same guest (disks, network cards, etc) in a
 single I/O thread, it allows us to provide better scheduling by giving us
 more knowledge of the nature of the work.  So now instead of relying on the
 linux scheduler to perform context switches between multiple vhost threads,
 we have a single thread context in which we can do the I/O scheduling more
 efficiently.  We can closely monitor the performance needs of each queue of
 each device inside the vhost thread which gives us much more information
 than relying on the kernel's thread scheduler.

And now there are 2 performance-critical pieces that need to be
optimized/tuned instead of just 1:

1. Kernel infrastructure that QEMU and vhost use today but you decided
to bypass.
2. The new ELVIS code which only affects vhost devices in the same VM.

If you split the code paths it results in more effort in the long run
and the benefit seems quite limited once you acknowledge that isolation
is important.

Isn't the sane thing to do taking lessons from ELVIS improving existing
pieces instead of bypassing them?  That way both the single VM and
host-wide performance improves.  And as a bonus non-virtualization use
cases may also benefit.

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 04:00:53PM +0100, Stefan Hajnoczi wrote:
 On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
  Hi,
  
  Razya is out for a few days, so I will try to answer the questions as well
  as I can:
  
  Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:
  
   From: Michael S. Tsirkin m...@redhat.com
   To: Abel Gordon/Haifa/IBM@IBMIL,
   Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com,
   as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
   IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
   IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
   Haifa/IBM@IBMIL
   Date: 27/11/2013 01:08 AM
   Subject: Re: Elvis upstreaming plan
  
   On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
   
   
Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00
  PM:
   

 Razya Ladelsky ra...@il.ibm.com writes:

  edit
   
That's why we are proposing to implement a mechanism that will enable
the management stack to configure 1 thread per I/O device (as it is
  today)
or 1 thread for many I/O devices (belonging to the same VM).
   
 Once you are scheduling multiple guests in a single vhost device, you
 now create a whole new class of DoS attacks in the best case
  scenario.
   
Again, we are NOT proposing to schedule multiple guests in a single
vhost thread. We are proposing to schedule multiple devices belonging
to the same guest in a single (or multiple) vhost thread/s.
   
  
   I guess a question then becomes why have multiple devices?
  
  If you mean why serve multiple devices from a single thread the answer is
  that we cannot rely on the Linux scheduler which has no knowledge of I/O
  queues to do a decent job of scheduling I/O.  The idea is to take over the
  I/O scheduling responsibilities from the kernel's thread scheduler with a
  more efficient I/O scheduler inside each vhost thread.  So by combining all
  of the I/O devices from the same guest (disks, network cards, etc) in a
  single I/O thread, it allows us to provide better scheduling by giving us
  more knowledge of the nature of the work.  So now instead of relying on the
  linux scheduler to perform context switches between multiple vhost threads,
  we have a single thread context in which we can do the I/O scheduling more
  efficiently.  We can closely monitor the performance needs of each queue of
  each device inside the vhost thread which gives us much more information
  than relying on the kernel's thread scheduler.
 
 And now there are 2 performance-critical pieces that need to be
 optimized/tuned instead of just 1:
 
 1. Kernel infrastructure that QEMU and vhost use today but you decided
 to bypass.
 2. The new ELVIS code which only affects vhost devices in the same VM.
 
 If you split the code paths it results in more effort in the long run
 and the benefit seems quite limited once you acknowledge that isolation
 is important.

 Isn't the sane thing to do taking lessons from ELVIS improving existing
 pieces instead of bypassing them?  That way both the single VM and
 host-wide performance improves.  And as a bonus non-virtualization use
 cases may also benefit.
 
 Stefan

I'm not sure about that. elvis is all about specific behaviour
patterns that are virtualization specific, and general claims
that we can improve scheduler for all workloads seem somewhat
optimistic.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Anthony Liguori
Abel Gordon ab...@il.ibm.com writes:

 Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:27:19 PM:


 On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
  Hi,
 
  Razya is out for a few days, so I will try to answer the questions as
 well
  as I can:
 
  Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:
 
   From: Michael S. Tsirkin m...@redhat.com
   To: Abel Gordon/Haifa/IBM@IBMIL,
   Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com,
   as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
   IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
   IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
   Haifa/IBM@IBMIL
   Date: 27/11/2013 01:08 AM
   Subject: Re: Elvis upstreaming plan
  
   On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
   
   
Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
 08:05:00
  PM:
   

 Razya Ladelsky ra...@il.ibm.com writes:

  edit
   
That's why we are proposing to implement a mechanism that will
 enable
the management stack to configure 1 thread per I/O device (as it is
  today)
or 1 thread for many I/O devices (belonging to the same VM).
   
 Once you are scheduling multiple guests in a single vhost device,
 you
 now create a whole new class of DoS attacks in the best case
  scenario.
   
Again, we are NOT proposing to schedule multiple guests in a single
vhost thread. We are proposing to schedule multiple devices
 belonging
to the same guest in a single (or multiple) vhost thread/s.
   
  
   I guess a question then becomes why have multiple devices?
 
  If you mean why serve multiple devices from a single thread the
 answer is
  that we cannot rely on the Linux scheduler which has no knowledge of
 I/O
  queues to do a decent job of scheduling I/O.  The idea is to take over
 the
  I/O scheduling responsibilities from the kernel's thread scheduler with
 a
  more efficient I/O scheduler inside each vhost thread.  So by combining
 all
  of the I/O devices from the same guest (disks, network cards, etc) in a
  single I/O thread, it allows us to provide better scheduling by giving
 us
  more knowledge of the nature of the work.  So now instead of relying on
 the
  linux scheduler to perform context switches between multiple vhost
 threads,
  we have a single thread context in which we can do the I/O scheduling
 more
  efficiently.  We can closely monitor the performance needs of each
 queue of
  each device inside the vhost thread which gives us much more
 information
  than relying on the kernel's thread scheduler.
  This does not expose any additional opportunities for attacks (DoS or
  other) than are already available since all of the I/O traffic belongs
 to a
  single guest.
  You can make the argument that with low I/O loads this mechanism may
 not
  make much difference.  However when you try to maximize the utilization
 of
  your hardware (such as in a commercial scenario) this technique can
 gain
  you a large benefit.
 
  Regards,
 
  Joel Nider
  Virtualization Research
  IBM Research and Development
  Haifa Research Lab

 So all this would sound more convincing if we had sharing between VMs.
 When it's only a single VM it's somehow less convincing, isn't it?
 Of course if we would bypass a scheduler like this it becomes harder to
 enforce cgroup limits.

 True, but here the issue becomes isolation/cgroups. We can start to show
 the value for VMs that have multiple devices / queues and then we could
 re-consider extending the mechanism for multiple VMs (at least as a
 experimental feature).

 But it might be easier to give scheduler the info it needs to do what we
 need.  Would an API that basically says run this kthread right now
 do the trick?

 ...do you really believe it would be possible to push this kind of change
 to the Linux scheduler ? In addition, we need more than
 run this kthread right now because you need to monitor the virtio
 ring activity to specify when you will like to run a specific kthread
 and for how long.

Paul Turner has a proposal for exactly this:

http://www.linuxplumbersconf.org/2013/ocw/sessions/1653

The video is up on Youtube I think. It definitely is a general problem
that is not at all virtual I/O specific.

Regards,

Anthony Liguori



 

 

 

   Phone: 972-4-829-6326 | Mobile: 972-54-3155635  (Embedded
 image moved to file:
   E-mail: jo...@il.ibm.com
 pic39571.gif)IBM
 

 

 
 
 
 
  Hi all,
 
  I am Razya Ladelsky, I work at IBM Haifa virtualization team,
 which
  developed Elvis, presented by Abel Gordon at the last KVM
 forum:
  ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
  ELVIS slides:
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
 
 
  According to the discussions that took place at the forum,
  upstreaming
  some of the Elvis approaches seems to be a good idea, which we
  would

Re: Elvis upstreaming plan

2013-11-27 Thread Joel Nider

Stefan Hajnoczi stefa...@gmail.com wrote on 27/11/2013 05:00:53 PM:

 From: Stefan Hajnoczi stefa...@gmail.com
 To: Joel Nider/Haifa/IBM@IBMIL,
 Cc: Michael S. Tsirkin m...@redhat.com, Abel Gordon/Haifa/
 IBM@IBMIL, abel.gor...@gmail.com, Anthony Liguori
 anth...@codemonkey.ws, as...@redhat.com, digitale...@google.com,
 Eran Raichstein/Haifa/IBM@IBMIL, g...@redhat.com,
 jasow...@redhat.com, kvm@vger.kernel.org, pbonz...@redhat.com, Razya
 Ladelsky/Haifa/IBM@IBMIL
 Date: 27/11/2013 05:00 PM
 Subject: Re: Elvis upstreaming plan

 On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
  Hi,
 
  Razya is out for a few days, so I will try to answer the questions as
well
  as I can:
 
  Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:
 
   From: Michael S. Tsirkin m...@redhat.com
   To: Abel Gordon/Haifa/IBM@IBMIL,
   Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com,
   as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
   IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
   IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
   Haifa/IBM@IBMIL
   Date: 27/11/2013 01:08 AM
   Subject: Re: Elvis upstreaming plan
  
   On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
   
   
Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
08:05:00
  PM:
   

 Razya Ladelsky ra...@il.ibm.com writes:

  edit
   
That's why we are proposing to implement a mechanism that will
enable
the management stack to configure 1 thread per I/O device (as it is
  today)
or 1 thread for many I/O devices (belonging to the same VM).
   
 Once you are scheduling multiple guests in a single vhost device,
you
 now create a whole new class of DoS attacks in the best case
  scenario.
   
Again, we are NOT proposing to schedule multiple guests in a single
vhost thread. We are proposing to schedule multiple devices
belonging
to the same guest in a single (or multiple) vhost thread/s.
   
  
   I guess a question then becomes why have multiple devices?
 
  If you mean why serve multiple devices from a single thread the
answer is
  that we cannot rely on the Linux scheduler which has no knowledge of
I/O
  queues to do a decent job of scheduling I/O.  The idea is to take over
the
  I/O scheduling responsibilities from the kernel's thread scheduler with
a
  more efficient I/O scheduler inside each vhost thread.  So by combining
all
  of the I/O devices from the same guest (disks, network cards, etc) in a
  single I/O thread, it allows us to provide better scheduling by giving
us
  more knowledge of the nature of the work.  So now instead of relying on
the
  linux scheduler to perform context switches between multiple vhost
threads,
  we have a single thread context in which we can do the I/O scheduling
more
  efficiently.  We can closely monitor the performance needs of each
queue of
  each device inside the vhost thread which gives us much more
information
  than relying on the kernel's thread scheduler.

 And now there are 2 performance-critical pieces that need to be
 optimized/tuned instead of just 1:

 1. Kernel infrastructure that QEMU and vhost use today but you decided
 to bypass.
 2. The new ELVIS code which only affects vhost devices in the same VM.

 If you split the code paths it results in more effort in the long run
 and the benefit seems quite limited once you acknowledge that isolation
 is important.

Yes you are correct that there are now 2 performance-critical pieces of
code.  However what we are proposing is just proper module decoupling.  I
believe you will be hard pressed to make a good case that all of this logic
could be integrated into the Linux thread scheduler more efficiently.
Think of this as an I/O scheduler for virtualized guests.  I don't believe
anyone would try to integrate the Linux I/O schedulers with the Linux
thread scheduler, even though they are both performance-critical modules?
Even if we were to take the route of using these principles to improve the
existing scheduler, I have to ask: which scheduler?  If we spend this
effort on CFS (completely fair scheduler) but then someone switches their
thread scheduler to O(1) or some other scheduler, all of our advantage
would be lost.  We would then have to reimplement for every possible thread
scheduler.

I don't agree that we are losing isolation, even if you go with the full
ELVIS which was originally proposed.  But that is a discussion for another
day.  For now, let's agree that in this reduced ELVIS solution, no
isolation is lost, since each vhost thread is only dealing with I/O from
the same guest.

As for more effort - for whom do you mean?  Development time? Maintenance
effort? CPU time?  I would say all of those are actually less effort in the
long run. Dividing responsibility between modules with well-defined
interfaces reduces both development and maintenance effort. If we were to
modify the thread scheduler

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Stefan Hajnoczi stefa...@gmail.com wrote on 27/11/2013 05:00:53 PM:

 On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
  Hi,
 
  Razya is out for a few days, so I will try to answer the questions as
well
  as I can:
 
  Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:
 
   From: Michael S. Tsirkin m...@redhat.com
   To: Abel Gordon/Haifa/IBM@IBMIL,
   Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com,
   as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
   IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
   IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
   Haifa/IBM@IBMIL
   Date: 27/11/2013 01:08 AM
   Subject: Re: Elvis upstreaming plan
  
   On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
   
   
Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013
08:05:00
  PM:
   

 Razya Ladelsky ra...@il.ibm.com writes:

  edit
   
That's why we are proposing to implement a mechanism that will
enable
the management stack to configure 1 thread per I/O device (as it is
  today)
or 1 thread for many I/O devices (belonging to the same VM).
   
 Once you are scheduling multiple guests in a single vhost device,
you
 now create a whole new class of DoS attacks in the best case
  scenario.
   
Again, we are NOT proposing to schedule multiple guests in a single
vhost thread. We are proposing to schedule multiple devices
belonging
to the same guest in a single (or multiple) vhost thread/s.
   
  
   I guess a question then becomes why have multiple devices?
 
  If you mean why serve multiple devices from a single thread the
answer is
  that we cannot rely on the Linux scheduler which has no knowledge of
I/O
  queues to do a decent job of scheduling I/O.  The idea is to take over
the
  I/O scheduling responsibilities from the kernel's thread scheduler with
a
  more efficient I/O scheduler inside each vhost thread.  So by combining
all
  of the I/O devices from the same guest (disks, network cards, etc) in a
  single I/O thread, it allows us to provide better scheduling by giving
us
  more knowledge of the nature of the work.  So now instead of relying on
the
  linux scheduler to perform context switches between multiple vhost
threads,
  we have a single thread context in which we can do the I/O scheduling
more
  efficiently.  We can closely monitor the performance needs of each
queue of
  each device inside the vhost thread which gives us much more
information
  than relying on the kernel's thread scheduler.

 And now there are 2 performance-critical pieces that need to be
 optimized/tuned instead of just 1:

 1. Kernel infrastructure that QEMU and vhost use today but you decided
 to bypass.

We are NOT bypassing existing components. We are just changing the
threading
model: instead of having one vhost-thread per virtio device, we propose to
use
1 vhost thread to server devices belonging to the same VM. In addition, we
propose to add new features such as polling.

 2. The new ELVIS code which only affects vhost devices in the same VM.

Also existent vhost code (or any other user-space back-end) should be
optimized/tuned if you care about performance.


 If you split the code paths it results in more effort in the long run
 and the benefit seems quite limited once you acknowledge that isolation
 is important.

Isolation is important but the question is what isolation means ?
I personally don't believe that 2 kernel threads provide more
isolation than 1 kernel threat that changes the mm (use_mm) and
avoids queue starvation.
Anyway, we propose to start with the simple approach (not sharing
threads across VMs) but once we show the value for this case we
can discuss if it makes sense to extend the approach and share
threads between different VMs.


 Isn't the sane thing to do taking lessons from ELVIS improving existing
 pieces instead of bypassing them?  That way both the single VM and
 host-wide performance improves.  And as a bonus non-virtualization use
 cases may also benefit.

The model we are proposing are specific for I/O virtualization... not sure
if they are applicable to bare-metal.


 Stefan


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-26 Thread Stefan Hajnoczi
On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote:
 5. Add heuristics to improve I/O scheduling 
 This patch enhances the round-robin mechanism with a set of heuristics to 
 decide when to leave a virtqueue and proceed to the next.
 https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d

This patch should probably do something portable instead of relying on
x86-only rdtscll().

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-26 Thread Anthony Liguori
Razya Ladelsky ra...@il.ibm.com writes:

 Hi all,

 I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
 developed Elvis, presented by Abel Gordon at the last KVM forum: 
 ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
 ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 


 According to the discussions that took place at the forum, upstreaming 
 some of the Elvis approaches seems to be a good idea, which we would like 
 to pursue.

 Our plan for the first patches is the following: 

 1.Shared vhost thread between mutiple devices 
 This patch creates a worker thread and worker queue shared across multiple 
 virtio devices 
 We would like to modify the patch posted in
 https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
  
 to limit a vhost thread to serve multiple devices only if they belong to 
 the same VM as Paolo suggested to avoid isolation or cgroups concerns.

 Another modification is related to the creation and removal of vhost 
 threads, which will be discussed next.

I think this is an exceptionally bad idea.

We shouldn't throw away isolation without exhausting every other
possibility.

We've seen very positive results from adding threads.  We should also
look at scheduling.

Once you are scheduling multiple guests in a single vhost device, you
now create a whole new class of DoS attacks in the best case scenario.

 2. Sysfs mechanism to add and remove vhost threads 
 This patch allows us to add and remove vhost threads dynamically.

 A simpler way to control the creation of vhost threads is statically 
 determining the maximum number of virtio devices per worker via a kernel 
 module parameter (which is the way the previously mentioned patch is 
 currently implemented)

 I'd like to ask for advice here about the more preferable way to go:
 Although having the sysfs mechanism provides more flexibility, it may be a 
 good idea to start with a simple static parameter, and have the first 
 patches as simple as possible. What do you think?

 3.Add virtqueue polling mode to vhost 
 Have the vhost thread poll the virtqueues with high I/O rate for new 
 buffers , and avoid asking the guest to kick us.
 https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0

Ack on this.

Regards,

Anthony Liguori

 4. vhost statistics
 This patch introduces a set of statistics to monitor different performance 
 metrics of vhost and our polling and I/O scheduling mechanisms. The 
 statistics are exposed using debugfs and can be easily displayed with a 
 Python script (vhost_stat, based on the old kvm_stats)
 https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0


 5. Add heuristics to improve I/O scheduling 
 This patch enhances the round-robin mechanism with a set of heuristics to 
 decide when to leave a virtqueue and proceed to the next.
 https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d

 This patch improves the handling of the requests by the vhost thread, but 
 could perhaps be delayed to a 
 later time , and not submitted as one of the first Elvis patches.
 I'd love to hear some comments about whether this patch needs to be part 
 of the first submission.

 Any other feedback on this plan will be appreciated,
 Thank you,
 Razya
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-26 Thread Abel Gordon


Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00 PM:


 Razya Ladelsky ra...@il.ibm.com writes:

  Hi all,
 
  I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
  developed Elvis, presented by Abel Gordon at the last KVM forum:
  ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
  ELVIS slides:
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
 
 
  According to the discussions that took place at the forum, upstreaming
  some of the Elvis approaches seems to be a good idea, which we would
like
  to pursue.
 
  Our plan for the first patches is the following:
 
  1.Shared vhost thread between mutiple devices
  This patch creates a worker thread and worker queue shared across
multiple
  virtio devices
  We would like to modify the patch posted in
  https://github.com/abelg/virtual_io_acceleration/commit/
 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
  to limit a vhost thread to serve multiple devices only if they belong
to
  the same VM as Paolo suggested to avoid isolation or cgroups concerns.
 
  Another modification is related to the creation and removal of vhost
  threads, which will be discussed next.

 I think this is an exceptionally bad idea.

 We shouldn't throw away isolation without exhausting every other
 possibility.

Seems you have missed the important details here.
Anthony, we are aware you are concerned about isolation
and you believe we should not share a single vhost thread across
multiple VMs.  That's why Razya proposed to change the patch
so we will serve multiple virtio devices using a single vhost thread
only if the devices belong to the same VM. This series of patches
will not allow two different VMs to share the same vhost thread.
So, I don't see why this will be throwing away isolation and why
this could be a exceptionally bad idea.

By the way, I remember that during the KVM forum a similar
approach of having a single data plane thread for many devices
was discussed

 We've seen very positive results from adding threads.  We should also
 look at scheduling.

...and we have also seen exceptionally negative results from
adding threads, both for vhost and data-plane. If you have lot of idle
time/cores
then it makes sense to run multiple threads. But IMHO in many scenarios you
don't have lot of idle time/cores.. and if you have them you would probably
prefer to run more VMs/VCPUshosting a single SMP VM when you have
enough physical cores to run all the VCPU threads and the I/O threads is
not a
realistic scenario.

That's why we are proposing to implement a mechanism that will enable
the management stack to configure 1 thread per I/O device (as it is today)
or 1 thread for many I/O devices (belonging to the same VM).

 Once you are scheduling multiple guests in a single vhost device, you
 now create a whole new class of DoS attacks in the best case scenario.

Again, we are NOT proposing to schedule multiple guests in a single
vhost thread. We are proposing to schedule multiple devices belonging
to the same guest in a single (or multiple) vhost thread/s.


  2. Sysfs mechanism to add and remove vhost threads
  This patch allows us to add and remove vhost threads dynamically.
 
  A simpler way to control the creation of vhost threads is statically
  determining the maximum number of virtio devices per worker via a
kernel
  module parameter (which is the way the previously mentioned patch is
  currently implemented)
 
  I'd like to ask for advice here about the more preferable way to go:
  Although having the sysfs mechanism provides more flexibility, it may
be a
  good idea to start with a simple static parameter, and have the first
  patches as simple as possible. What do you think?
 
  3.Add virtqueue polling mode to vhost
  Have the vhost thread poll the virtqueues with high I/O rate for new
  buffers , and avoid asking the guest to kick us.
  https://github.com/abelg/virtual_io_acceleration/commit/
 26616133fafb7855cc80fac070b0572fd1aaf5d0

 Ack on this.

:)

Regards,
Abel.


 Regards,

 Anthony Liguori

  4. vhost statistics
  This patch introduces a set of statistics to monitor different
performance
  metrics of vhost and our polling and I/O scheduling mechanisms. The
  statistics are exposed using debugfs and can be easily displayed with a

  Python script (vhost_stat, based on the old kvm_stats)
  https://github.com/abelg/virtual_io_acceleration/commit/
 ac14206ea56939ecc3608dc5f978b86fa322e7b0
 
 
  5. Add heuristics to improve I/O scheduling
  This patch enhances the round-robin mechanism with a set of heuristics
to
  decide when to leave a virtqueue and proceed to the next.
  https://github.com/abelg/virtual_io_acceleration/commit/
 f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
 
  This patch improves the handling of the requests by the vhost thread,
but
  could perhaps be delayed to a
  later time , and not submitted as one of the first Elvis patches.
  I'd love to hear some comments about whether this patch needs to be

Re: Elvis upstreaming plan

2013-11-26 Thread Michael S. Tsirkin
On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
 
 
 Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00 PM:
 
 
  Razya Ladelsky ra...@il.ibm.com writes:
 
   Hi all,
  
   I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
   developed Elvis, presented by Abel Gordon at the last KVM forum:
   ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
   ELVIS slides:
 https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
  
  
   According to the discussions that took place at the forum, upstreaming
   some of the Elvis approaches seems to be a good idea, which we would
 like
   to pursue.
  
   Our plan for the first patches is the following:
  
   1.Shared vhost thread between mutiple devices
   This patch creates a worker thread and worker queue shared across
 multiple
   virtio devices
   We would like to modify the patch posted in
   https://github.com/abelg/virtual_io_acceleration/commit/
  3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
   to limit a vhost thread to serve multiple devices only if they belong
 to
   the same VM as Paolo suggested to avoid isolation or cgroups concerns.
  
   Another modification is related to the creation and removal of vhost
   threads, which will be discussed next.
 
  I think this is an exceptionally bad idea.
 
  We shouldn't throw away isolation without exhausting every other
  possibility.
 
 Seems you have missed the important details here.
 Anthony, we are aware you are concerned about isolation
 and you believe we should not share a single vhost thread across
 multiple VMs.  That's why Razya proposed to change the patch
 so we will serve multiple virtio devices using a single vhost thread
 only if the devices belong to the same VM. This series of patches
 will not allow two different VMs to share the same vhost thread.
 So, I don't see why this will be throwing away isolation and why
 this could be a exceptionally bad idea.
 
 By the way, I remember that during the KVM forum a similar
 approach of having a single data plane thread for many devices
 was discussed
  We've seen very positive results from adding threads.  We should also
  look at scheduling.
 
 ...and we have also seen exceptionally negative results from
 adding threads, both for vhost and data-plane. If you have lot of idle
 time/cores
 then it makes sense to run multiple threads. But IMHO in many scenarios you
 don't have lot of idle time/cores.. and if you have them you would probably
 prefer to run more VMs/VCPUshosting a single SMP VM when you have
 enough physical cores to run all the VCPU threads and the I/O threads is
 not a
 realistic scenario.
 
 That's why we are proposing to implement a mechanism that will enable
 the management stack to configure 1 thread per I/O device (as it is today)
 or 1 thread for many I/O devices (belonging to the same VM).
 
  Once you are scheduling multiple guests in a single vhost device, you
  now create a whole new class of DoS attacks in the best case scenario.
 
 Again, we are NOT proposing to schedule multiple guests in a single
 vhost thread. We are proposing to schedule multiple devices belonging
 to the same guest in a single (or multiple) vhost thread/s.
 

I guess a question then becomes why have multiple devices?


 
   2. Sysfs mechanism to add and remove vhost threads
   This patch allows us to add and remove vhost threads dynamically.
  
   A simpler way to control the creation of vhost threads is statically
   determining the maximum number of virtio devices per worker via a
 kernel
   module parameter (which is the way the previously mentioned patch is
   currently implemented)
  
   I'd like to ask for advice here about the more preferable way to go:
   Although having the sysfs mechanism provides more flexibility, it may
 be a
   good idea to start with a simple static parameter, and have the first
   patches as simple as possible. What do you think?
  
   3.Add virtqueue polling mode to vhost
   Have the vhost thread poll the virtqueues with high I/O rate for new
   buffers , and avoid asking the guest to kick us.
   https://github.com/abelg/virtual_io_acceleration/commit/
  26616133fafb7855cc80fac070b0572fd1aaf5d0
 
  Ack on this.
 
 :)
 
 Regards,
 Abel.
 
 
  Regards,
 
  Anthony Liguori
 
   4. vhost statistics
   This patch introduces a set of statistics to monitor different
 performance
   metrics of vhost and our polling and I/O scheduling mechanisms. The
   statistics are exposed using debugfs and can be easily displayed with a
 
   Python script (vhost_stat, based on the old kvm_stats)
   https://github.com/abelg/virtual_io_acceleration/commit/
  ac14206ea56939ecc3608dc5f978b86fa322e7b0
  
  
   5. Add heuristics to improve I/O scheduling
   This patch enhances the round-robin mechanism with a set of heuristics
 to
   decide when to leave a virtqueue and proceed to the next.
   https://github.com/abelg/virtual_io_acceleration/commit/
  

Re: Elvis upstreaming plan

2013-11-26 Thread Bandan Das
Razya Ladelsky ra...@il.ibm.com writes:

 Hi all,

 I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
 developed Elvis, presented by Abel Gordon at the last KVM forum: 
 ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
 ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 


 According to the discussions that took place at the forum, upstreaming 
 some of the Elvis approaches seems to be a good idea, which we would like 
 to pursue.

 Our plan for the first patches is the following: 

 1.Shared vhost thread between mutiple devices 
 This patch creates a worker thread and worker queue shared across multiple 
 virtio devices 
 We would like to modify the patch posted in
 https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
  
 to limit a vhost thread to serve multiple devices only if they belong to 
 the same VM as Paolo suggested to avoid isolation or cgroups concerns.

 Another modification is related to the creation and removal of vhost 
 threads, which will be discussed next.

 2. Sysfs mechanism to add and remove vhost threads 
 This patch allows us to add and remove vhost threads dynamically.

 A simpler way to control the creation of vhost threads is statically 
 determining the maximum number of virtio devices per worker via a kernel 
 module parameter (which is the way the previously mentioned patch is 
 currently implemented)

Does the sysfs interface aim to let the _user_ control the maximum number of 
devices per vhost thread or/and let the user create and  destroy 
worker threads at will ?

Setting the limit on the number of devices makes sense but I am not sure
if there is any reason to actually expose an interface to create or destroy 
workers. Also, it might be worthwhile to think if it's better to just let 
the worker thread stay around (hoping it might be used again in 
the future) rather then destroying it..

 I'd like to ask for advice here about the more preferable way to go:
 Although having the sysfs mechanism provides more flexibility, it may be a 
 good idea to start with a simple static parameter, and have the first 
 patches as simple as possible. What do you think?

I am actually inclined more towards a static limit. I think that in a 
typical setup, the user will set this for his/her environment just once 
at load time and forget about it.

Bandan

 3.Add virtqueue polling mode to vhost 
 Have the vhost thread poll the virtqueues with high I/O rate for new 
 buffers , and avoid asking the guest to kick us.
 https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0

 4. vhost statistics
 This patch introduces a set of statistics to monitor different performance 
 metrics of vhost and our polling and I/O scheduling mechanisms. The 
 statistics are exposed using debugfs and can be easily displayed with a 
 Python script (vhost_stat, based on the old kvm_stats)
 https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0


 5. Add heuristics to improve I/O scheduling 
 This patch enhances the round-robin mechanism with a set of heuristics to 
 decide when to leave a virtqueue and proceed to the next.
 https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d

 This patch improves the handling of the requests by the vhost thread, but 
 could perhaps be delayed to a 
 later time , and not submitted as one of the first Elvis patches.
 I'd love to hear some comments about whether this patch needs to be part 
 of the first submission.

 Any other feedback on this plan will be appreciated,
 Thank you,
 Razya

 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-26 Thread Jason Wang
On 11/24/2013 05:22 PM, Razya Ladelsky wrote:
 Hi all,

 I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
 developed Elvis, presented by Abel Gordon at the last KVM forum: 
 ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
 ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 


 According to the discussions that took place at the forum, upstreaming 
 some of the Elvis approaches seems to be a good idea, which we would like 
 to pursue.

 Our plan for the first patches is the following: 

 1.Shared vhost thread between mutiple devices 
 This patch creates a worker thread and worker queue shared across multiple 
 virtio devices 
 We would like to modify the patch posted in
 https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
  
 to limit a vhost thread to serve multiple devices only if they belong to 
 the same VM as Paolo suggested to avoid isolation or cgroups concerns.

 Another modification is related to the creation and removal of vhost 
 threads, which will be discussed next.

 2. Sysfs mechanism to add and remove vhost threads 
 This patch allows us to add and remove vhost threads dynamically.

 A simpler way to control the creation of vhost threads is statically 
 determining the maximum number of virtio devices per worker via a kernel 
 module parameter (which is the way the previously mentioned patch is 
 currently implemented)

Any chance we can re-use the cwmq instead of inventing another
mechanism? Looks like there're lots of function duplication here. Bandan
has an RFC to do this.

 I'd like to ask for advice here about the more preferable way to go:
 Although having the sysfs mechanism provides more flexibility, it may be a 
 good idea to start with a simple static parameter, and have the first 
 patches as simple as possible. What do you think?

 3.Add virtqueue polling mode to vhost 
 Have the vhost thread poll the virtqueues with high I/O rate for new 
 buffers , and avoid asking the guest to kick us.
 https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0

Maybe we can make poll_stop_idle adaptive which may help the light load
case. Consider guest is often slow than vhost, if we just have one or
two vms, polling too much may waste cpu in this case.
 4. vhost statistics
 This patch introduces a set of statistics to monitor different performance 
 metrics of vhost and our polling and I/O scheduling mechanisms. The 
 statistics are exposed using debugfs and can be easily displayed with a 
 Python script (vhost_stat, based on the old kvm_stats)
 https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0

How about using trace points instead? Besides statistics, it can also
help more in debugging.

 5. Add heuristics to improve I/O scheduling 
 This patch enhances the round-robin mechanism with a set of heuristics to 
 decide when to leave a virtqueue and proceed to the next.
 https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d

 This patch improves the handling of the requests by the vhost thread, but 
 could perhaps be delayed to a 
 later time , and not submitted as one of the first Elvis patches.
 I'd love to hear some comments about whether this patch needs to be part 
 of the first submission.

 Any other feedback on this plan will be appreciated,
 Thank you,
 Razya

 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-26 Thread Gleb Natapov
On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
  4. vhost statistics
  This patch introduces a set of statistics to monitor different performance 
  metrics of vhost and our polling and I/O scheduling mechanisms. The 
  statistics are exposed using debugfs and can be easily displayed with a 
  Python script (vhost_stat, based on the old kvm_stats)
  https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
 
 How about using trace points instead? Besides statistics, it can also
 help more in debugging.
Definitely. kvm_stats has moved to ftrace long time ago.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-26 Thread Joel Nider
Hi,

Razya is out for a few days, so I will try to answer the questions as well
as I can:

Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:

 From: Michael S. Tsirkin m...@redhat.com
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: Anthony Liguori anth...@codemonkey.ws, abel.gor...@gmail.com,
 as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
 IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
 IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
 Haifa/IBM@IBMIL
 Date: 27/11/2013 01:08 AM
 Subject: Re: Elvis upstreaming plan

 On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
 
 
  Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00
PM:
 
  
   Razya Ladelsky ra...@il.ibm.com writes:
  
edit
 
  That's why we are proposing to implement a mechanism that will enable
  the management stack to configure 1 thread per I/O device (as it is
today)
  or 1 thread for many I/O devices (belonging to the same VM).
 
   Once you are scheduling multiple guests in a single vhost device, you
   now create a whole new class of DoS attacks in the best case
scenario.
 
  Again, we are NOT proposing to schedule multiple guests in a single
  vhost thread. We are proposing to schedule multiple devices belonging
  to the same guest in a single (or multiple) vhost thread/s.
 

 I guess a question then becomes why have multiple devices?

If you mean why serve multiple devices from a single thread the answer is
that we cannot rely on the Linux scheduler which has no knowledge of I/O
queues to do a decent job of scheduling I/O.  The idea is to take over the
I/O scheduling responsibilities from the kernel's thread scheduler with a
more efficient I/O scheduler inside each vhost thread.  So by combining all
of the I/O devices from the same guest (disks, network cards, etc) in a
single I/O thread, it allows us to provide better scheduling by giving us
more knowledge of the nature of the work.  So now instead of relying on the
linux scheduler to perform context switches between multiple vhost threads,
we have a single thread context in which we can do the I/O scheduling more
efficiently.  We can closely monitor the performance needs of each queue of
each device inside the vhost thread which gives us much more information
than relying on the kernel's thread scheduler.
This does not expose any additional opportunities for attacks (DoS or
other) than are already available since all of the I/O traffic belongs to a
single guest.
You can make the argument that with low I/O loads this mechanism may not
make much difference.  However when you try to maximize the utilization of
your hardware (such as in a commercial scenario) this technique can gain
you a large benefit.

Regards,

Joel Nider
Virtualization Research
IBM Research and Development
Haifa Research Lab






 Phone: 972-4-829-6326 | Mobile: 972-54-3155635  (Embedded image moved 
to file: 
 E-mail: jo...@il.ibm.com  
pic31578.gif)IBM 








Hi all,
   
I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
developed Elvis, presented by Abel Gordon at the last KVM forum:
ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
ELVIS slides:
  https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
   
   
According to the discussions that took place at the forum,
upstreaming
some of the Elvis approaches seems to be a good idea, which we
would
  like
to pursue.
   
Our plan for the first patches is the following:
   
1.Shared vhost thread between mutiple devices
This patch creates a worker thread and worker queue shared across
  multiple
virtio devices
We would like to modify the patch posted in
https://github.com/abelg/virtual_io_acceleration/commit/
   3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
to limit a vhost thread to serve multiple devices only if they
belong
  to
the same VM as Paolo suggested to avoid isolation or cgroups
concerns.
   
Another modification is related to the creation and removal of
vhost
threads, which will be discussed next.
  
   I think this is an exceptionally bad idea.
  
   We shouldn't throw away isolation without exhausting every other
   possibility.
 
  Seems you have missed the important details here.
  Anthony, we are aware you are concerned about isolation
  and you believe we should not share a single vhost thread across
  multiple VMs.  That's why Razya proposed to change the patch
  so we

Re: Elvis upstreaming plan

2013-11-26 Thread Joel Nider


Gleb Natapov g...@redhat.com wrote on 27/11/2013 09:35:01 AM:

 From: Gleb Natapov g...@redhat.com
 To: Jason Wang jasow...@redhat.com,
 Cc: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@vger.kernel.org,
 anth...@codemonkey.ws, Michael S. Tsirkin m...@redhat.com,
 pbonz...@redhat.com, as...@redhat.com, digitale...@google.com,
 abel.gor...@gmail.com, Abel Gordon/Haifa/IBM@IBMIL, Eran Raichstein/
 Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, b...@redhat.com
 Date: 27/11/2013 11:35 AM
 Subject: Re: Elvis upstreaming plan

 On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
   4. vhost statistics
   This patch introduces a set of statistics to monitor different
 performance
   metrics of vhost and our polling and I/O scheduling mechanisms. The
   statistics are exposed using debugfs and can be easily displayed with
a
   Python script (vhost_stat, based on the old kvm_stats)
   https://github.com/abelg/virtual_io_acceleration/commit/
 ac14206ea56939ecc3608dc5f978b86fa322e7b0
 
  How about using trace points instead? Besides statistics, it can also
  help more in debugging.
 Definitely. kvm_stats has moved to ftrace long time ago.

 --
  Gleb.


Ok - we will look at this newer mechanism.

Joel Nider
Virtualization Research
IBM Research and Development
Haifa Research Lab






 Phone: 972-4-829-6326 | Mobile: 972-54-3155635  (Embedded image moved 
to file: 
 E-mail: jo...@il.ibm.com  
pic56195.gif)IBM 





attachment: pic56195.gif

Re: Elvis upstreaming plan

2013-11-25 Thread Razya Ladelsky
Michael S. Tsirkin m...@redhat.com wrote on 24/11/2013 12:26:15 PM:

 From: Michael S. Tsirkin m...@redhat.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL, 
 Cc: kvm@vger.kernel.org, anth...@codemonkey.ws, g...@redhat.com, 
 pbonz...@redhat.com, as...@redhat.com, jasow...@redhat.com, 
 digitale...@google.com, abel.gor...@gmail.com, Abel Gordon/Haifa/
 IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL
 Date: 24/11/2013 12:22 PM
 Subject: Re: Elvis upstreaming plan
 
 On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote:
  Hi all,
  
  I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
  developed Elvis, presented by Abel Gordon at the last KVM forum: 
  ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
  ELVIS slides: 
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
  
  
  According to the discussions that took place at the forum, upstreaming 

  some of the Elvis approaches seems to be a good idea, which we would 
like 
  to pursue.
  
  Our plan for the first patches is the following: 
  
  1.Shared vhost thread between mutiple devices 
  This patch creates a worker thread and worker queue shared across 
multiple 
  virtio devices 
  We would like to modify the patch posted in
  https://github.com/abelg/virtual_io_acceleration/commit/
 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 
  to limit a vhost thread to serve multiple devices only if they belong 
to 
  the same VM as Paolo suggested to avoid isolation or cgroups concerns.
  
  Another modification is related to the creation and removal of vhost 
  threads, which will be discussed next.
 
  2. Sysfs mechanism to add and remove vhost threads 
  This patch allows us to add and remove vhost threads dynamically.
  
  A simpler way to control the creation of vhost threads is statically 
  determining the maximum number of virtio devices per worker via a 
kernel 
  module parameter (which is the way the previously mentioned patch is 
  currently implemented)
  
  I'd like to ask for advice here about the more preferable way to go:
  Although having the sysfs mechanism provides more flexibility, it may 
be a 
  good idea to start with a simple static parameter, and have the first 
  patches as simple as possible. What do you think?
  
  3.Add virtqueue polling mode to vhost 
  Have the vhost thread poll the virtqueues with high I/O rate for new 
  buffers , and avoid asking the guest to kick us.
  https://github.com/abelg/virtual_io_acceleration/commit/
 26616133fafb7855cc80fac070b0572fd1aaf5d0
  
  4. vhost statistics
  This patch introduces a set of statistics to monitor different 
performance 
  metrics of vhost and our polling and I/O scheduling mechanisms. The 
  statistics are exposed using debugfs and can be easily displayed with 
a 
  Python script (vhost_stat, based on the old kvm_stats)
  https://github.com/abelg/virtual_io_acceleration/commit/
 ac14206ea56939ecc3608dc5f978b86fa322e7b0
  
  
  5. Add heuristics to improve I/O scheduling 
  This patch enhances the round-robin mechanism with a set of heuristics 
to 
  decide when to leave a virtqueue and proceed to the next.
  https://github.com/abelg/virtual_io_acceleration/commit/
 f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
  
  This patch improves the handling of the requests by the vhost thread, 
but 
  could perhaps be delayed to a 
  later time , and not submitted as one of the first Elvis patches.
  I'd love to hear some comments about whether this patch needs to be 
part 
  of the first submission.
  
  Any other feedback on this plan will be appreciated,
  Thank you,
  Razya
 
 
 How about we start with the stats patch?
 This will make it easier to evaluate the other patches.
 

Hi Michael,
Thank you for your quick reply.
Our plan was to send all these patches that contain the Elvis code.
We can start with the stats patch, however, many of the statistics there 
are related to the features that the other patches provide...
B.T.W. If you a chance to look at the rest of the patches,
I'd really appreciate your comments,
Thank you very much,
Razya


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-24 Thread Michael S. Tsirkin
On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote:
 Hi all,
 
 I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
 developed Elvis, presented by Abel Gordon at the last KVM forum: 
 ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
 ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
 
 
 According to the discussions that took place at the forum, upstreaming 
 some of the Elvis approaches seems to be a good idea, which we would like 
 to pursue.
 
 Our plan for the first patches is the following: 
 
 1.Shared vhost thread between mutiple devices 
 This patch creates a worker thread and worker queue shared across multiple 
 virtio devices 
 We would like to modify the patch posted in
 https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
  
 to limit a vhost thread to serve multiple devices only if they belong to 
 the same VM as Paolo suggested to avoid isolation or cgroups concerns.
 
 Another modification is related to the creation and removal of vhost 
 threads, which will be discussed next.

 2. Sysfs mechanism to add and remove vhost threads 
 This patch allows us to add and remove vhost threads dynamically.
 
 A simpler way to control the creation of vhost threads is statically 
 determining the maximum number of virtio devices per worker via a kernel 
 module parameter (which is the way the previously mentioned patch is 
 currently implemented)
 
 I'd like to ask for advice here about the more preferable way to go:
 Although having the sysfs mechanism provides more flexibility, it may be a 
 good idea to start with a simple static parameter, and have the first 
 patches as simple as possible. What do you think?
 
 3.Add virtqueue polling mode to vhost 
 Have the vhost thread poll the virtqueues with high I/O rate for new 
 buffers , and avoid asking the guest to kick us.
 https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
 
 4. vhost statistics
 This patch introduces a set of statistics to monitor different performance 
 metrics of vhost and our polling and I/O scheduling mechanisms. The 
 statistics are exposed using debugfs and can be easily displayed with a 
 Python script (vhost_stat, based on the old kvm_stats)
 https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
 
 
 5. Add heuristics to improve I/O scheduling 
 This patch enhances the round-robin mechanism with a set of heuristics to 
 decide when to leave a virtqueue and proceed to the next.
 https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
 
 This patch improves the handling of the requests by the vhost thread, but 
 could perhaps be delayed to a 
 later time , and not submitted as one of the first Elvis patches.
 I'd love to hear some comments about whether this patch needs to be part 
 of the first submission.
 
 Any other feedback on this plan will be appreciated,
 Thank you,
 Razya


How about we start with the stats patch?
This will make it easier to evaluate the other patches.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html