Re: Elvis upstreaming plan

2013-12-02 Thread Stefan Hajnoczi
On Thu, Nov 28, 2013 at 09:31:50AM +0200, Abel Gordon wrote:
> 
> 
> Stefan Hajnoczi  wrote on 27/11/2013 05:00:53 PM:
> 
> > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > > Hi,
> > >
> > > Razya is out for a few days, so I will try to answer the questions as
> well
> > > as I can:
> > >
> > > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
> > >
> > > > From: "Michael S. Tsirkin" 
> > > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > > Cc: Anthony Liguori , abel.gor...@gmail.com,
> > > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
> > > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
> > > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
> > > > Haifa/IBM@IBMIL
> > > > Date: 27/11/2013 01:08 AM
> > > > Subject: Re: Elvis upstreaming plan
> > > >
> > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > >
> > > > >
> > > > > Anthony Liguori  wrote on 26/11/2013
> 08:05:00
> > > PM:
> > > > >
> > > > > >
> > > > > > Razya Ladelsky  writes:
> > > > > >
> > > 
> > > > >
> > > > > That's why we are proposing to implement a mechanism that will
> enable
> > > > > the management stack to configure 1 thread per I/O device (as it is
> > > today)
> > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > >
> > > > > > Once you are scheduling multiple guests in a single vhost device,
> you
> > > > > > now create a whole new class of DoS attacks in the best case
> > > scenario.
> > > > >
> > > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > > vhost thread. We are proposing to schedule multiple devices
> belonging
> > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > >
> > > >
> > > > I guess a question then becomes why have multiple devices?
> > >
> > > If you mean "why serve multiple devices from a single thread" the
> answer is
> > > that we cannot rely on the Linux scheduler which has no knowledge of
> I/O
> > > queues to do a decent job of scheduling I/O.  The idea is to take over
> the
> > > I/O scheduling responsibilities from the kernel's thread scheduler with
> a
> > > more efficient I/O scheduler inside each vhost thread.  So by combining
> all
> > > of the I/O devices from the same guest (disks, network cards, etc) in a
> > > single I/O thread, it allows us to provide better scheduling by giving
> us
> > > more knowledge of the nature of the work.  So now instead of relying on
> the
> > > linux scheduler to perform context switches between multiple vhost
> threads,
> > > we have a single thread context in which we can do the I/O scheduling
> more
> > > efficiently.  We can closely monitor the performance needs of each
> queue of
> > > each device inside the vhost thread which gives us much more
> information
> > > than relying on the kernel's thread scheduler.
> >
> > And now there are 2 performance-critical pieces that need to be
> > optimized/tuned instead of just 1:
> >
> > 1. Kernel infrastructure that QEMU and vhost use today but you decided
> > to bypass.
> 
> We are NOT bypassing existing components. We are just changing the
> threading
> model: instead of having one vhost-thread per virtio device, we propose to
> use
> 1 vhost thread to server devices belonging to the same VM. In addition, we
> propose to add new features such as polling.

What I meant with "bypassing" is that reducing scope to single VMs
leaves multi-VM performance unchanged.  I know the original aim was to
improve multi-VM performance too and I hope that will be possible by
extending the current approach.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-28 Thread Michael S. Tsirkin
On Thu, Nov 28, 2013 at 09:31:50AM +0200, Abel Gordon wrote:
> Isolation is important but the question is what isolation means ?

Mostly two things:
- Count resource usage against the correct cgroups,
  and limit it as appropriate
- If one user does something silly and is blocked,
  another user isn't affected


-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-28 Thread Abel Gordon


Anthony Liguori  wrote on 28/11/2013 12:33:36 AM:

> From: Anthony Liguori 
> To: Abel Gordon/Haifa/IBM@IBMIL, "Michael S. Tsirkin" ,
> Cc: abel.gor...@gmail.com, as...@redhat.com, digitale...@google.com,
> Eran Raichstein/Haifa/IBM@IBMIL, g...@redhat.com,
> jasow...@redhat.com, Joel Nider/Haifa/IBM@IBMIL,
> kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/Haifa/IBM@IBMIL
> Date: 28/11/2013 12:33 AM
> Subject: Re: Elvis upstreaming plan
>
> Abel Gordon  writes:
>
> > "Michael S. Tsirkin"  wrote on 27/11/2013 12:27:19 PM:
> >
> >>
> >> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> >> > Hi,
> >> >
> >> > Razya is out for a few days, so I will try to answer the questions
as
> > well
> >> > as I can:
> >> >
> >> > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57
PM:
> >> >
> >> > > From: "Michael S. Tsirkin" 
> >> > > To: Abel Gordon/Haifa/IBM@IBMIL,
> >> > > Cc: Anthony Liguori ,
abel.gor...@gmail.com,
> >> > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
> >> > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
> >> > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya
Ladelsky/
> >> > > Haifa/IBM@IBMIL
> >> > > Date: 27/11/2013 01:08 AM
> >> > > Subject: Re: Elvis upstreaming plan
> >> > >
> >> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> >> > > >
> >> > > >
> >> > > > Anthony Liguori  wrote on 26/11/2013
> > 08:05:00
> >> > PM:
> >> > > >
> >> > > > >
> >> > > > > Razya Ladelsky  writes:
> >> > > > >
> >> > 
> >> > > >
> >> > > > That's why we are proposing to implement a mechanism that will
> > enable
> >> > > > the management stack to configure 1 thread per I/O device (as it
is
> >> > today)
> >> > > > or 1 thread for many I/O devices (belonging to the same VM).
> >> > > >
> >> > > > > Once you are scheduling multiple guests in a single vhost
device,
> > you
> >> > > > > now create a whole new class of DoS attacks in the best case
> >> > scenario.
> >> > > >
> >> > > > Again, we are NOT proposing to schedule multiple guests in a
single
> >> > > > vhost thread. We are proposing to schedule multiple devices
> > belonging
> >> > > > to the same guest in a single (or multiple) vhost thread/s.
> >> > > >
> >> > >
> >> > > I guess a question then becomes why have multiple devices?
> >> >
> >> > If you mean "why serve multiple devices from a single thread" the
> > answer is
> >> > that we cannot rely on the Linux scheduler which has no knowledge of
> > I/O
> >> > queues to do a decent job of scheduling I/O.  The idea is to take
over
> > the
> >> > I/O scheduling responsibilities from the kernel's thread scheduler
with
> > a
> >> > more efficient I/O scheduler inside each vhost thread.  So by
combining
> > all
> >> > of the I/O devices from the same guest (disks, network cards, etc)
in a
> >> > single I/O thread, it allows us to provide better scheduling by
giving
> > us
> >> > more knowledge of the nature of the work.  So now instead of relying
on
> > the
> >> > linux scheduler to perform context switches between multiple vhost
> > threads,
> >> > we have a single thread context in which we can do the I/O
scheduling
> > more
> >> > efficiently.  We can closely monitor the performance needs of each
> > queue of
> >> > each device inside the vhost thread which gives us much more
> > information
> >> > than relying on the kernel's thread scheduler.
> >> > This does not expose any additional opportunities for attacks (DoS
or
> >> > other) than are already available since all of the I/O traffic
belongs
> > to a
> >> > single guest.
> >> > You can make the argument that with low I/O loads this mechanism may
> > not
> >> > make much difference.  However when you try to maximize the
utilization
> > of
> >> > your hardware (such as in a commercial scenario) this technique can
>

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Stefan Hajnoczi  wrote on 27/11/2013 05:00:53 PM:

> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > Hi,
> >
> > Razya is out for a few days, so I will try to answer the questions as
well
> > as I can:
> >
> > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
> >
> > > From: "Michael S. Tsirkin" 
> > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > Cc: Anthony Liguori , abel.gor...@gmail.com,
> > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
> > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
> > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
> > > Haifa/IBM@IBMIL
> > > Date: 27/11/2013 01:08 AM
> > > Subject: Re: Elvis upstreaming plan
> > >
> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > Anthony Liguori  wrote on 26/11/2013
08:05:00
> > PM:
> > > >
> > > > >
> > > > > Razya Ladelsky  writes:
> > > > >
> > 
> > > >
> > > > That's why we are proposing to implement a mechanism that will
enable
> > > > the management stack to configure 1 thread per I/O device (as it is
> > today)
> > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > >
> > > > > Once you are scheduling multiple guests in a single vhost device,
you
> > > > > now create a whole new class of DoS attacks in the best case
> > scenario.
> > > >
> > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > vhost thread. We are proposing to schedule multiple devices
belonging
> > > > to the same guest in a single (or multiple) vhost thread/s.
> > > >
> > >
> > > I guess a question then becomes why have multiple devices?
> >
> > If you mean "why serve multiple devices from a single thread" the
answer is
> > that we cannot rely on the Linux scheduler which has no knowledge of
I/O
> > queues to do a decent job of scheduling I/O.  The idea is to take over
the
> > I/O scheduling responsibilities from the kernel's thread scheduler with
a
> > more efficient I/O scheduler inside each vhost thread.  So by combining
all
> > of the I/O devices from the same guest (disks, network cards, etc) in a
> > single I/O thread, it allows us to provide better scheduling by giving
us
> > more knowledge of the nature of the work.  So now instead of relying on
the
> > linux scheduler to perform context switches between multiple vhost
threads,
> > we have a single thread context in which we can do the I/O scheduling
more
> > efficiently.  We can closely monitor the performance needs of each
queue of
> > each device inside the vhost thread which gives us much more
information
> > than relying on the kernel's thread scheduler.
>
> And now there are 2 performance-critical pieces that need to be
> optimized/tuned instead of just 1:
>
> 1. Kernel infrastructure that QEMU and vhost use today but you decided
> to bypass.

We are NOT bypassing existing components. We are just changing the
threading
model: instead of having one vhost-thread per virtio device, we propose to
use
1 vhost thread to server devices belonging to the same VM. In addition, we
propose to add new features such as polling.

> 2. The new ELVIS code which only affects vhost devices in the same VM.

Also existent vhost code (or any other user-space back-end) should be
optimized/tuned if you care about performance.

>
> If you split the code paths it results in more effort in the long run
> and the benefit seems quite limited once you acknowledge that isolation
> is important.

Isolation is important but the question is what isolation means ?
I personally don't believe that 2 kernel threads provide more
isolation than 1 kernel threat that changes the mm (use_mm) and
avoids queue starvation.
Anyway, we propose to start with the simple approach (not sharing
threads across VMs) but once we show the value for this case we
can discuss if it makes sense to extend the approach and share
threads between different VMs.


> Isn't the sane thing to do taking lessons from ELVIS improving existing
> pieces instead of bypassing them?  That way both the single VM and
> host-wide performance improves.  And as a bonus non-virtualization use
> cases may also benefit.

The model we are proposing are specific for I/O virtualization... not sure
if they are applicable to bare-metal.

>
> Stefan
>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Joel Nider

Stefan Hajnoczi  wrote on 27/11/2013 05:00:53 PM:

> From: Stefan Hajnoczi 
> To: Joel Nider/Haifa/IBM@IBMIL,
> Cc: "Michael S. Tsirkin" , Abel Gordon/Haifa/
> IBM@IBMIL, abel.gor...@gmail.com, Anthony Liguori
> , as...@redhat.com, digitale...@google.com,
> Eran Raichstein/Haifa/IBM@IBMIL, g...@redhat.com,
> jasow...@redhat.com, kvm@vger.kernel.org, pbonz...@redhat.com, Razya
> Ladelsky/Haifa/IBM@IBMIL
> Date: 27/11/2013 05:00 PM
> Subject: Re: Elvis upstreaming plan
>
> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > Hi,
> >
> > Razya is out for a few days, so I will try to answer the questions as
well
> > as I can:
> >
> > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
> >
> > > From: "Michael S. Tsirkin" 
> > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > Cc: Anthony Liguori , abel.gor...@gmail.com,
> > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
> > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
> > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
> > > Haifa/IBM@IBMIL
> > > Date: 27/11/2013 01:08 AM
> > > Subject: Re: Elvis upstreaming plan
> > >
> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > Anthony Liguori  wrote on 26/11/2013
08:05:00
> > PM:
> > > >
> > > > >
> > > > > Razya Ladelsky  writes:
> > > > >
> > 
> > > >
> > > > That's why we are proposing to implement a mechanism that will
enable
> > > > the management stack to configure 1 thread per I/O device (as it is
> > today)
> > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > >
> > > > > Once you are scheduling multiple guests in a single vhost device,
you
> > > > > now create a whole new class of DoS attacks in the best case
> > scenario.
> > > >
> > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > vhost thread. We are proposing to schedule multiple devices
belonging
> > > > to the same guest in a single (or multiple) vhost thread/s.
> > > >
> > >
> > > I guess a question then becomes why have multiple devices?
> >
> > If you mean "why serve multiple devices from a single thread" the
answer is
> > that we cannot rely on the Linux scheduler which has no knowledge of
I/O
> > queues to do a decent job of scheduling I/O.  The idea is to take over
the
> > I/O scheduling responsibilities from the kernel's thread scheduler with
a
> > more efficient I/O scheduler inside each vhost thread.  So by combining
all
> > of the I/O devices from the same guest (disks, network cards, etc) in a
> > single I/O thread, it allows us to provide better scheduling by giving
us
> > more knowledge of the nature of the work.  So now instead of relying on
the
> > linux scheduler to perform context switches between multiple vhost
threads,
> > we have a single thread context in which we can do the I/O scheduling
more
> > efficiently.  We can closely monitor the performance needs of each
queue of
> > each device inside the vhost thread which gives us much more
information
> > than relying on the kernel's thread scheduler.
>
> And now there are 2 performance-critical pieces that need to be
> optimized/tuned instead of just 1:
>
> 1. Kernel infrastructure that QEMU and vhost use today but you decided
> to bypass.
> 2. The new ELVIS code which only affects vhost devices in the same VM.
>
> If you split the code paths it results in more effort in the long run
> and the benefit seems quite limited once you acknowledge that isolation
> is important.

Yes you are correct that there are now 2 performance-critical pieces of
code.  However what we are proposing is just proper module decoupling.  I
believe you will be hard pressed to make a good case that all of this logic
could be integrated into the Linux thread scheduler more efficiently.
Think of this as an I/O scheduler for virtualized guests.  I don't believe
anyone would try to integrate the Linux I/O schedulers with the Linux
thread scheduler, even though they are both performance-critical modules?
Even if we were to take the route of using these principles to improve the
existing scheduler, I have to ask: which scheduler?  If we spend this
effort on CFS (completely fair scheduler) but then someone switches their
thread scheduler to O(1) or some other scheduler, all of our advantage
would be lost.  We would then have to 

Re: Elvis upstreaming plan

2013-11-27 Thread Anthony Liguori
Abel Gordon  writes:

> "Michael S. Tsirkin"  wrote on 27/11/2013 12:27:19 PM:
>
>>
>> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
>> > Hi,
>> >
>> > Razya is out for a few days, so I will try to answer the questions as
> well
>> > as I can:
>> >
>> > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
>> >
>> > > From: "Michael S. Tsirkin" 
>> > > To: Abel Gordon/Haifa/IBM@IBMIL,
>> > > Cc: Anthony Liguori , abel.gor...@gmail.com,
>> > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
>> > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
>> > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
>> > > Haifa/IBM@IBMIL
>> > > Date: 27/11/2013 01:08 AM
>> > > Subject: Re: Elvis upstreaming plan
>> > >
>> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
>> > > >
>> > > >
>> > > > Anthony Liguori  wrote on 26/11/2013
> 08:05:00
>> > PM:
>> > > >
>> > > > >
>> > > > > Razya Ladelsky  writes:
>> > > > >
>> > 
>> > > >
>> > > > That's why we are proposing to implement a mechanism that will
> enable
>> > > > the management stack to configure 1 thread per I/O device (as it is
>> > today)
>> > > > or 1 thread for many I/O devices (belonging to the same VM).
>> > > >
>> > > > > Once you are scheduling multiple guests in a single vhost device,
> you
>> > > > > now create a whole new class of DoS attacks in the best case
>> > scenario.
>> > > >
>> > > > Again, we are NOT proposing to schedule multiple guests in a single
>> > > > vhost thread. We are proposing to schedule multiple devices
> belonging
>> > > > to the same guest in a single (or multiple) vhost thread/s.
>> > > >
>> > >
>> > > I guess a question then becomes why have multiple devices?
>> >
>> > If you mean "why serve multiple devices from a single thread" the
> answer is
>> > that we cannot rely on the Linux scheduler which has no knowledge of
> I/O
>> > queues to do a decent job of scheduling I/O.  The idea is to take over
> the
>> > I/O scheduling responsibilities from the kernel's thread scheduler with
> a
>> > more efficient I/O scheduler inside each vhost thread.  So by combining
> all
>> > of the I/O devices from the same guest (disks, network cards, etc) in a
>> > single I/O thread, it allows us to provide better scheduling by giving
> us
>> > more knowledge of the nature of the work.  So now instead of relying on
> the
>> > linux scheduler to perform context switches between multiple vhost
> threads,
>> > we have a single thread context in which we can do the I/O scheduling
> more
>> > efficiently.  We can closely monitor the performance needs of each
> queue of
>> > each device inside the vhost thread which gives us much more
> information
>> > than relying on the kernel's thread scheduler.
>> > This does not expose any additional opportunities for attacks (DoS or
>> > other) than are already available since all of the I/O traffic belongs
> to a
>> > single guest.
>> > You can make the argument that with low I/O loads this mechanism may
> not
>> > make much difference.  However when you try to maximize the utilization
> of
>> > your hardware (such as in a commercial scenario) this technique can
> gain
>> > you a large benefit.
>> >
>> > Regards,
>> >
>> > Joel Nider
>> > Virtualization Research
>> > IBM Research and Development
>> > Haifa Research Lab
>>
>> So all this would sound more convincing if we had sharing between VMs.
>> When it's only a single VM it's somehow less convincing, isn't it?
>> Of course if we would bypass a scheduler like this it becomes harder to
>> enforce cgroup limits.
>
> True, but here the issue becomes isolation/cgroups. We can start to show
> the value for VMs that have multiple devices / queues and then we could
> re-consider extending the mechanism for multiple VMs (at least as a
> experimental feature).
>
>> But it might be easier to give scheduler the info it needs to do what we
>> need.  Would an API that basically says "r

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 04:00:53PM +0100, Stefan Hajnoczi wrote:
> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > Hi,
> > 
> > Razya is out for a few days, so I will try to answer the questions as well
> > as I can:
> > 
> > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
> > 
> > > From: "Michael S. Tsirkin" 
> > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > Cc: Anthony Liguori , abel.gor...@gmail.com,
> > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
> > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
> > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
> > > Haifa/IBM@IBMIL
> > > Date: 27/11/2013 01:08 AM
> > > Subject: Re: Elvis upstreaming plan
> > >
> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > Anthony Liguori  wrote on 26/11/2013 08:05:00
> > PM:
> > > >
> > > > >
> > > > > Razya Ladelsky  writes:
> > > > >
> > 
> > > >
> > > > That's why we are proposing to implement a mechanism that will enable
> > > > the management stack to configure 1 thread per I/O device (as it is
> > today)
> > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > >
> > > > > Once you are scheduling multiple guests in a single vhost device, you
> > > > > now create a whole new class of DoS attacks in the best case
> > scenario.
> > > >
> > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > vhost thread. We are proposing to schedule multiple devices belonging
> > > > to the same guest in a single (or multiple) vhost thread/s.
> > > >
> > >
> > > I guess a question then becomes why have multiple devices?
> > 
> > If you mean "why serve multiple devices from a single thread" the answer is
> > that we cannot rely on the Linux scheduler which has no knowledge of I/O
> > queues to do a decent job of scheduling I/O.  The idea is to take over the
> > I/O scheduling responsibilities from the kernel's thread scheduler with a
> > more efficient I/O scheduler inside each vhost thread.  So by combining all
> > of the I/O devices from the same guest (disks, network cards, etc) in a
> > single I/O thread, it allows us to provide better scheduling by giving us
> > more knowledge of the nature of the work.  So now instead of relying on the
> > linux scheduler to perform context switches between multiple vhost threads,
> > we have a single thread context in which we can do the I/O scheduling more
> > efficiently.  We can closely monitor the performance needs of each queue of
> > each device inside the vhost thread which gives us much more information
> > than relying on the kernel's thread scheduler.
> 
> And now there are 2 performance-critical pieces that need to be
> optimized/tuned instead of just 1:
> 
> 1. Kernel infrastructure that QEMU and vhost use today but you decided
> to bypass.
> 2. The new ELVIS code which only affects vhost devices in the same VM.
> 
> If you split the code paths it results in more effort in the long run
> and the benefit seems quite limited once you acknowledge that isolation
> is important.
>
> Isn't the sane thing to do taking lessons from ELVIS improving existing
> pieces instead of bypassing them?  That way both the single VM and
> host-wide performance improves.  And as a bonus non-virtualization use
> cases may also benefit.
> 
> Stefan

I'm not sure about that. elvis is all about specific behaviour
patterns that are virtualization specific, and general claims
that we can improve scheduler for all workloads seem somewhat
optimistic.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Stefan Hajnoczi
On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> Hi,
> 
> Razya is out for a few days, so I will try to answer the questions as well
> as I can:
> 
> "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
> 
> > From: "Michael S. Tsirkin" 
> > To: Abel Gordon/Haifa/IBM@IBMIL,
> > Cc: Anthony Liguori , abel.gor...@gmail.com,
> > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
> > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
> > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
> > Haifa/IBM@IBMIL
> > Date: 27/11/2013 01:08 AM
> > Subject: Re: Elvis upstreaming plan
> >
> > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > >
> > >
> > > Anthony Liguori  wrote on 26/11/2013 08:05:00
> PM:
> > >
> > > >
> > > > Razya Ladelsky  writes:
> > > >
> 
> > >
> > > That's why we are proposing to implement a mechanism that will enable
> > > the management stack to configure 1 thread per I/O device (as it is
> today)
> > > or 1 thread for many I/O devices (belonging to the same VM).
> > >
> > > > Once you are scheduling multiple guests in a single vhost device, you
> > > > now create a whole new class of DoS attacks in the best case
> scenario.
> > >
> > > Again, we are NOT proposing to schedule multiple guests in a single
> > > vhost thread. We are proposing to schedule multiple devices belonging
> > > to the same guest in a single (or multiple) vhost thread/s.
> > >
> >
> > I guess a question then becomes why have multiple devices?
> 
> If you mean "why serve multiple devices from a single thread" the answer is
> that we cannot rely on the Linux scheduler which has no knowledge of I/O
> queues to do a decent job of scheduling I/O.  The idea is to take over the
> I/O scheduling responsibilities from the kernel's thread scheduler with a
> more efficient I/O scheduler inside each vhost thread.  So by combining all
> of the I/O devices from the same guest (disks, network cards, etc) in a
> single I/O thread, it allows us to provide better scheduling by giving us
> more knowledge of the nature of the work.  So now instead of relying on the
> linux scheduler to perform context switches between multiple vhost threads,
> we have a single thread context in which we can do the I/O scheduling more
> efficiently.  We can closely monitor the performance needs of each queue of
> each device inside the vhost thread which gives us much more information
> than relying on the kernel's thread scheduler.

And now there are 2 performance-critical pieces that need to be
optimized/tuned instead of just 1:

1. Kernel infrastructure that QEMU and vhost use today but you decided
to bypass.
2. The new ELVIS code which only affects vhost devices in the same VM.

If you split the code paths it results in more effort in the long run
and the benefit seems quite limited once you acknowledge that isolation
is important.

Isn't the sane thing to do taking lessons from ELVIS improving existing
pieces instead of bypassing them?  That way both the single VM and
host-wide performance improves.  And as a bonus non-virtualization use
cases may also benefit.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 01:05:40PM +0200, Abel Gordon wrote:
> > > (CCing Eyal Moscovici who is actually prototyping with multiple
> > > policies and may want to join this thread)
> > >
> > > Starting with basic policies: we can use a single vhost thread
> > > and create new vhost threads if it becomes saturated and there
> > > are enough cpu cycles available in the system
> > > or if the latency (how long the requests in the virtio queues wait
> > > until they are handled) is too high.
> > > We can merge threads if the latency is already low or if the threads
> > > are not saturated.
> > >
> > > There is a hidden trade-off here: when you run more vhost threads you
> > > may actually be stealing cpu cycles from the vcpu threads and also
> > > increasing context switches. So, from the vhost perspective it may
> > > improve performance but from the vcpu threads perspective it may
> > > degrade performance.
> >
> > So this is a very interesting problem to solve but what does
> > management know that suggests it can solve it better?
> 
> Yep, and Eyal is currently working on this.
> What the management knows ? depends who the management is :)
> Could be just I/O activity (black-box: I/O request rate, I/O
> handling rate, latency)

We know much more about this than managament, don't we?

> or application performance (white-box).

This would have to come with a proposal for getting
this white-box info out of guest somehow.

-- 
MSR
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 01:02:37PM +0200, Abel Gordon wrote:
> 
> 
> "Michael S. Tsirkin"  wrote on 27/11/2013 12:59:38 PM:
> 
> 
> > On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote:
> > >
> > >
> > > "Michael S. Tsirkin"  wrote on 27/11/2013 12:27:19 PM:
> > >
> > > >
> > > > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > > > > Hi,
> > > > >
> > > > > Razya is out for a few days, so I will try to answer the questions
> as
> > > well
> > > > > as I can:
> > > > >
> > > > > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57
> PM:
> > > > >
> > > > > > From: "Michael S. Tsirkin" 
> > > > > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > > > > Cc: Anthony Liguori ,
> abel.gor...@gmail.com,
> > > > > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
> > > > > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel
> Nider/Haifa/
> > > > > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya
> Ladelsky/
> > > > > > Haifa/IBM@IBMIL
> > > > > > Date: 27/11/2013 01:08 AM
> > > > > > Subject: Re: Elvis upstreaming plan
> > > > > >
> > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > > > >
> > > > > > >
> > > > > > > Anthony Liguori  wrote on 26/11/2013
> > > 08:05:00
> > > > > PM:
> > > > > > >
> > > > > > > >
> > > > > > > > Razya Ladelsky  writes:
> > > > > > > >
> > > > > 
> > > > > > >
> > > > > > > That's why we are proposing to implement a mechanism that will
> > > enable
> > > > > > > the management stack to configure 1 thread per I/O device (as
> it is
> > > > > today)
> > > > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > > > >
> > > > > > > > Once you are scheduling multiple guests in a single vhost
> device,
> > > you
> > > > > > > > now create a whole new class of DoS attacks in the best case
> > > > > scenario.
> > > > > > >
> > > > > > > Again, we are NOT proposing to schedule multiple guests in a
> single
> > > > > > > vhost thread. We are proposing to schedule multiple devices
> > > belonging
> > > > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > > > >
> > > > > >
> > > > > > I guess a question then becomes why have multiple devices?
> > > > >
> > > > > If you mean "why serve multiple devices from a single thread" the
> > > answer is
> > > > > that we cannot rely on the Linux scheduler which has no knowledge
> of
> > > I/O
> > > > > queues to do a decent job of scheduling I/O.  The idea is to take
> over
> > > the
> > > > > I/O scheduling responsibilities from the kernel's thread scheduler
> with
> > > a
> > > > > more efficient I/O scheduler inside each vhost thread.  So by
> combining
> > > all
> > > > > of the I/O devices from the same guest (disks, network cards, etc)
> in a
> > > > > single I/O thread, it allows us to provide better scheduling by
> giving
> > > us
> > > > > more knowledge of the nature of the work.  So now instead of
> relying on
> > > the
> > > > > linux scheduler to perform context switches between multiple vhost
> > > threads,
> > > > > we have a single thread context in which we can do the I/O
> scheduling
> > > more
> > > > > efficiently.  We can closely monitor the performance needs of each
> > > queue of
> > > > > each device inside the vhost thread which gives us much more
> > > information
> > > > > than relying on the kernel's thread scheduler.
> > > > > This does not expose any additional opportunities for attacks (DoS
> or
> > > > > other) than are already available since all of the I/O traffic
> belongs
> > > to a
> > > > > single guest.
&

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


"Michael S. Tsirkin"  wrote on 27/11/2013 01:03:25 PM:

>
> On Wed, Nov 27, 2013 at 12:55:07PM +0200, Abel Gordon wrote:
> >
> >
> > "Michael S. Tsirkin"  wrote on 27/11/2013 12:29:43 PM:
> >
> > >
> > > On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > "Michael S. Tsirkin"  wrote on 27/11/2013 11:21:00
AM:
> > > >
> > > > >
> > > > > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
> > > > > >
> > > > > >
> > > > > > "Michael S. Tsirkin"  wrote on 26/11/2013
11:11:57
> > PM:
> > > > > >
> > > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Anthony Liguori  wrote on 26/11/2013
> > > > 08:05:00
> > > > > > PM:
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Razya Ladelsky  writes:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization
> > team,
> > > > which
> > > > > > > > > > developed Elvis, presented by Abel Gordon at the last
KVM
> > > > forum:
> > > > > > > > > > ELVIS video:
https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > > > > > ELVIS slides:
> > > > > > > >
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > According to the discussions that took place at the
forum,
> > > > > > upstreaming
> > > > > > > > > > some of the Elvis approaches seems to be a good idea,
which
> > we
> > > > > > would
> > > > > > > > like
> > > > > > > > > > to pursue.
> > > > > > > > > >
> > > > > > > > > > Our plan for the first patches is the following:
> > > > > > > > > >
> > > > > > > > > > 1.Shared vhost thread between mutiple devices
> > > > > > > > > > This patch creates a worker thread and worker queue
shared
> > > > across
> > > > > > > > multiple
> > > > > > > > > > virtio devices
> > > > > > > > > > We would like to modify the patch posted in
> > > > > > > > > >
https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > > > > > to limit a vhost thread to serve multiple devices only
if
> > they
> > > > > > belong
> > > > > > > > to
> > > > > > > > > > the same VM as Paolo suggested to avoid isolation or
> > cgroups
> > > > > > concerns.
> > > > > > > > > >
> > > > > > > > > > Another modification is related to the creation and
removal
> > of
> > > > > > vhost
> > > > > > > > > > threads, which will be discussed next.
> > > > > > > > >
> > > > > > > > > I think this is an exceptionally bad idea.
> > > > > > > > >
> > > > > > > > > We shouldn't throw away isolation without exhausting
every
> > other
> > > > > > > > > possibility.
> > > > > > > >
> > > > > > > > Seems you have missed the important details here.
> > > > > > > > Anthony, we are aware you are concerned about isolation
> > > > > > > > and you believe we should not share a single vhost thread
> > across
> > > > > > > > multiple VMs.  That's why Razya proposed to change the
patch
> > > > > > > > so we will serve multiple virtio devices using a single
vhost
> > > > thread
> > > > > > > > "only if the devices belong to the same VM". This series of
> > patches
> > > > > > > > will not allow two different VMs to share the same vhost
> > thread.
> > > > > > > > So, I don't see why this will be throwing away isolation
and
> > why
> > > > > > > > this could be a "exceptionally bad idea".
> > > > > > > >
> > > > > > > > By the way, I remember that during the KVM forum a similar
> > > > > > > > approach of having a single data plane thread for many
devices
> > > > > > > > was discussed
> > > > > > > > > We've seen very positive results from adding threads.  We
> > should
> > > > also
> > > > > > > > > look at scheduling.
> > > > > > > >
> > > > > > > > ...and we have also seen exceptionally negative results
from
> > > > > > > > adding threads, both for vhost and data-plane. If you have
lot
> > of
> > > > idle
> > > > > > > > time/cores
> > > > > > > > then it makes sense to run multiple threads. But IMHO in
many
> > > > scenarios
> > > > > > you
> > > > > > > > don't have lot of idle time/cores.. and if you have them
you
> > would
> > > > > > probably
> > > > > > > > prefer to run more VMs/VCPUshosting a single SMP VM
when
> > you
> > > > have
> > > > > > > > enough physical cores to run all the VCPU threads and the
I/O
> > > > threads
> > > > > > is
> > > > > > > > not a
> > > > > > > > realistic scenario.
> > > > > > > >
> > > > > > > > That's why we are proposing to implement a mechanism that
will
> > > > enable
> > > > > > > > the management stack to configure 1 thread per I/O device
(as
> > it is
> > > > > > today)
> > > > > > > > or 1 thread for many I/O devices (belonging to the same
VM).
> > > > > > > >
> > > > > > > > > Once you are scheduling multiple guests in a single vhost
> > device,
> > > > you
> > > > > > > > > now create a whole new class of DoS attacks in the best
case
> > > > > > scenario.

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


"Michael S. Tsirkin"  wrote on 27/11/2013 12:59:38 PM:


> On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote:
> >
> >
> > "Michael S. Tsirkin"  wrote on 27/11/2013 12:27:19 PM:
> >
> > >
> > > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > > > Hi,
> > > >
> > > > Razya is out for a few days, so I will try to answer the questions
as
> > well
> > > > as I can:
> > > >
> > > > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57
PM:
> > > >
> > > > > From: "Michael S. Tsirkin" 
> > > > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > > > Cc: Anthony Liguori ,
abel.gor...@gmail.com,
> > > > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
> > > > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel
Nider/Haifa/
> > > > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya
Ladelsky/
> > > > > Haifa/IBM@IBMIL
> > > > > Date: 27/11/2013 01:08 AM
> > > > > Subject: Re: Elvis upstreaming plan
> > > > >
> > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > > >
> > > > > >
> > > > > > Anthony Liguori  wrote on 26/11/2013
> > 08:05:00
> > > > PM:
> > > > > >
> > > > > > >
> > > > > > > Razya Ladelsky  writes:
> > > > > > >
> > > > 
> > > > > >
> > > > > > That's why we are proposing to implement a mechanism that will
> > enable
> > > > > > the management stack to configure 1 thread per I/O device (as
it is
> > > > today)
> > > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > > >
> > > > > > > Once you are scheduling multiple guests in a single vhost
device,
> > you
> > > > > > > now create a whole new class of DoS attacks in the best case
> > > > scenario.
> > > > > >
> > > > > > Again, we are NOT proposing to schedule multiple guests in a
single
> > > > > > vhost thread. We are proposing to schedule multiple devices
> > belonging
> > > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > > >
> > > > >
> > > > > I guess a question then becomes why have multiple devices?
> > > >
> > > > If you mean "why serve multiple devices from a single thread" the
> > answer is
> > > > that we cannot rely on the Linux scheduler which has no knowledge
of
> > I/O
> > > > queues to do a decent job of scheduling I/O.  The idea is to take
over
> > the
> > > > I/O scheduling responsibilities from the kernel's thread scheduler
with
> > a
> > > > more efficient I/O scheduler inside each vhost thread.  So by
combining
> > all
> > > > of the I/O devices from the same guest (disks, network cards, etc)
in a
> > > > single I/O thread, it allows us to provide better scheduling by
giving
> > us
> > > > more knowledge of the nature of the work.  So now instead of
relying on
> > the
> > > > linux scheduler to perform context switches between multiple vhost
> > threads,
> > > > we have a single thread context in which we can do the I/O
scheduling
> > more
> > > > efficiently.  We can closely monitor the performance needs of each
> > queue of
> > > > each device inside the vhost thread which gives us much more
> > information
> > > > than relying on the kernel's thread scheduler.
> > > > This does not expose any additional opportunities for attacks (DoS
or
> > > > other) than are already available since all of the I/O traffic
belongs
> > to a
> > > > single guest.
> > > > You can make the argument that with low I/O loads this mechanism
may
> > not
> > > > make much difference.  However when you try to maximize the
utilization
> > of
> > > > your hardware (such as in a commercial scenario) this technique can
> > gain
> > > > you a large benefit.
> > > >
> > > > Regards,
> > > >
> > > > Joel Nider
> > > > Virtualization Research
> > > > IBM Research and Development
> > > > Haifa Research Lab
> > 

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 12:55:07PM +0200, Abel Gordon wrote:
> 
> 
> "Michael S. Tsirkin"  wrote on 27/11/2013 12:29:43 PM:
> 
> >
> > On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote:
> > >
> > >
> > > "Michael S. Tsirkin"  wrote on 27/11/2013 11:21:00 AM:
> > >
> > > >
> > > > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
> > > > >
> > > > >
> > > > > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57
> PM:
> > > > >
> > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > > > >
> > > > > > >
> > > > > > > Anthony Liguori  wrote on 26/11/2013
> > > 08:05:00
> > > > > PM:
> > > > > > >
> > > > > > > >
> > > > > > > > Razya Ladelsky  writes:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization
> team,
> > > which
> > > > > > > > > developed Elvis, presented by Abel Gordon at the last KVM
> > > forum:
> > > > > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > > > > ELVIS slides:
> > > > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > According to the discussions that took place at the forum,
> > > > > upstreaming
> > > > > > > > > some of the Elvis approaches seems to be a good idea, which
> we
> > > > > would
> > > > > > > like
> > > > > > > > > to pursue.
> > > > > > > > >
> > > > > > > > > Our plan for the first patches is the following:
> > > > > > > > >
> > > > > > > > > 1.Shared vhost thread between mutiple devices
> > > > > > > > > This patch creates a worker thread and worker queue shared
> > > across
> > > > > > > multiple
> > > > > > > > > virtio devices
> > > > > > > > > We would like to modify the patch posted in
> > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > > > > to limit a vhost thread to serve multiple devices only if
> they
> > > > > belong
> > > > > > > to
> > > > > > > > > the same VM as Paolo suggested to avoid isolation or
> cgroups
> > > > > concerns.
> > > > > > > > >
> > > > > > > > > Another modification is related to the creation and removal
> of
> > > > > vhost
> > > > > > > > > threads, which will be discussed next.
> > > > > > > >
> > > > > > > > I think this is an exceptionally bad idea.
> > > > > > > >
> > > > > > > > We shouldn't throw away isolation without exhausting every
> other
> > > > > > > > possibility.
> > > > > > >
> > > > > > > Seems you have missed the important details here.
> > > > > > > Anthony, we are aware you are concerned about isolation
> > > > > > > and you believe we should not share a single vhost thread
> across
> > > > > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > > > > so we will serve multiple virtio devices using a single vhost
> > > thread
> > > > > > > "only if the devices belong to the same VM". This series of
> patches
> > > > > > > will not allow two different VMs to share the same vhost
> thread.
> > > > > > > So, I don't see why this will be throwing away isolation and
> why
> > > > > > > this could be a "exceptionally bad idea".
> > > > > > >
> > > > > > > By the way, I remember that during the KVM forum a similar
> > > > > > > approach of having a single data plane thread for many devices
> > > > > > > was discussed
> > > > > > > > We've seen very positive results from adding threads.  We
> should
> > > also
> > > > > > > > look at scheduling.
> > > > > > >
> > > > > > > ...and we have also seen exceptionally negative results from
> > > > > > > adding threads, both for vhost and data-plane. If you have lot
> of
> > > idle
> > > > > > > time/cores
> > > > > > > then it makes sense to run multiple threads. But IMHO in many
> > > scenarios
> > > > > you
> > > > > > > don't have lot of idle time/cores.. and if you have them you
> would
> > > > > probably
> > > > > > > prefer to run more VMs/VCPUshosting a single SMP VM when
> you
> > > have
> > > > > > > enough physical cores to run all the VCPU threads and the I/O
> > > threads
> > > > > is
> > > > > > > not a
> > > > > > > realistic scenario.
> > > > > > >
> > > > > > > That's why we are proposing to implement a mechanism that will
> > > enable
> > > > > > > the management stack to configure 1 thread per I/O device (as
> it is
> > > > > today)
> > > > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > > > >
> > > > > > > > Once you are scheduling multiple guests in a single vhost
> device,
> > > you
> > > > > > > > now create a whole new class of DoS attacks in the best case
> > > > > scenario.
> > > > > > >
> > > > > > > Again, we are NOT proposing to schedule multiple guests in a
> single
> > > > > > > vhost thread. We are proposing to schedule multiple devices
> > > belonging
> > > > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > > > >
> > > > > >
> > > > > > I guess a question 

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote:
> 
> 
> "Michael S. Tsirkin"  wrote on 27/11/2013 12:27:19 PM:
> 
> >
> > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > > Hi,
> > >
> > > Razya is out for a few days, so I will try to answer the questions as
> well
> > > as I can:
> > >
> > > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
> > >
> > > > From: "Michael S. Tsirkin" 
> > > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > > Cc: Anthony Liguori , abel.gor...@gmail.com,
> > > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
> > > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
> > > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
> > > > Haifa/IBM@IBMIL
> > > > Date: 27/11/2013 01:08 AM
> > > > Subject: Re: Elvis upstreaming plan
> > > >
> > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > >
> > > > >
> > > > > Anthony Liguori  wrote on 26/11/2013
> 08:05:00
> > > PM:
> > > > >
> > > > > >
> > > > > > Razya Ladelsky  writes:
> > > > > >
> > > 
> > > > >
> > > > > That's why we are proposing to implement a mechanism that will
> enable
> > > > > the management stack to configure 1 thread per I/O device (as it is
> > > today)
> > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > >
> > > > > > Once you are scheduling multiple guests in a single vhost device,
> you
> > > > > > now create a whole new class of DoS attacks in the best case
> > > scenario.
> > > > >
> > > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > > vhost thread. We are proposing to schedule multiple devices
> belonging
> > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > >
> > > >
> > > > I guess a question then becomes why have multiple devices?
> > >
> > > If you mean "why serve multiple devices from a single thread" the
> answer is
> > > that we cannot rely on the Linux scheduler which has no knowledge of
> I/O
> > > queues to do a decent job of scheduling I/O.  The idea is to take over
> the
> > > I/O scheduling responsibilities from the kernel's thread scheduler with
> a
> > > more efficient I/O scheduler inside each vhost thread.  So by combining
> all
> > > of the I/O devices from the same guest (disks, network cards, etc) in a
> > > single I/O thread, it allows us to provide better scheduling by giving
> us
> > > more knowledge of the nature of the work.  So now instead of relying on
> the
> > > linux scheduler to perform context switches between multiple vhost
> threads,
> > > we have a single thread context in which we can do the I/O scheduling
> more
> > > efficiently.  We can closely monitor the performance needs of each
> queue of
> > > each device inside the vhost thread which gives us much more
> information
> > > than relying on the kernel's thread scheduler.
> > > This does not expose any additional opportunities for attacks (DoS or
> > > other) than are already available since all of the I/O traffic belongs
> to a
> > > single guest.
> > > You can make the argument that with low I/O loads this mechanism may
> not
> > > make much difference.  However when you try to maximize the utilization
> of
> > > your hardware (such as in a commercial scenario) this technique can
> gain
> > > you a large benefit.
> > >
> > > Regards,
> > >
> > > Joel Nider
> > > Virtualization Research
> > > IBM Research and Development
> > > Haifa Research Lab
> >
> > So all this would sound more convincing if we had sharing between VMs.
> > When it's only a single VM it's somehow less convincing, isn't it?
> > Of course if we would bypass a scheduler like this it becomes harder to
> > enforce cgroup limits.
> 
> True, but here the issue becomes isolation/cgroups. We can start to show
> the value for VMs that have multiple devices / queues and then we could
> re-consider extending the mechanism for multiple VMs (at least as a
> experimental feature).

Sorry, If it

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


"Michael S. Tsirkin"  wrote on 27/11/2013 12:29:43 PM:

>
> On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote:
> >
> >
> > "Michael S. Tsirkin"  wrote on 27/11/2013 11:21:00 AM:
> >
> > >
> > > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57
PM:
> > > >
> > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > > >
> > > > > >
> > > > > > Anthony Liguori  wrote on 26/11/2013
> > 08:05:00
> > > > PM:
> > > > > >
> > > > > > >
> > > > > > > Razya Ladelsky  writes:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization
team,
> > which
> > > > > > > > developed Elvis, presented by Abel Gordon at the last KVM
> > forum:
> > > > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > > > ELVIS slides:
> > > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > > > >
> > > > > > > >
> > > > > > > > According to the discussions that took place at the forum,
> > > > upstreaming
> > > > > > > > some of the Elvis approaches seems to be a good idea, which
we
> > > > would
> > > > > > like
> > > > > > > > to pursue.
> > > > > > > >
> > > > > > > > Our plan for the first patches is the following:
> > > > > > > >
> > > > > > > > 1.Shared vhost thread between mutiple devices
> > > > > > > > This patch creates a worker thread and worker queue shared
> > across
> > > > > > multiple
> > > > > > > > virtio devices
> > > > > > > > We would like to modify the patch posted in
> > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > > > to limit a vhost thread to serve multiple devices only if
they
> > > > belong
> > > > > > to
> > > > > > > > the same VM as Paolo suggested to avoid isolation or
cgroups
> > > > concerns.
> > > > > > > >
> > > > > > > > Another modification is related to the creation and removal
of
> > > > vhost
> > > > > > > > threads, which will be discussed next.
> > > > > > >
> > > > > > > I think this is an exceptionally bad idea.
> > > > > > >
> > > > > > > We shouldn't throw away isolation without exhausting every
other
> > > > > > > possibility.
> > > > > >
> > > > > > Seems you have missed the important details here.
> > > > > > Anthony, we are aware you are concerned about isolation
> > > > > > and you believe we should not share a single vhost thread
across
> > > > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > > > so we will serve multiple virtio devices using a single vhost
> > thread
> > > > > > "only if the devices belong to the same VM". This series of
patches
> > > > > > will not allow two different VMs to share the same vhost
thread.
> > > > > > So, I don't see why this will be throwing away isolation and
why
> > > > > > this could be a "exceptionally bad idea".
> > > > > >
> > > > > > By the way, I remember that during the KVM forum a similar
> > > > > > approach of having a single data plane thread for many devices
> > > > > > was discussed
> > > > > > > We've seen very positive results from adding threads.  We
should
> > also
> > > > > > > look at scheduling.
> > > > > >
> > > > > > ...and we have also seen exceptionally negative results from
> > > > > > adding threads, both for vhost and data-plane. If you have lot
of
> > idle
> > > > > > time/cores
> > > > > > then it makes sense to run multiple threads. But IMHO in many
> > scenarios
> > > > you
> > > > > > don't have lot of idle time/cores.. and if you have them you
would
> > > > probably
> > > > > > prefer to run more VMs/VCPUshosting a single SMP VM when
you
> > have
> > > > > > enough physical cores to run all the VCPU threads and the I/O
> > threads
> > > > is
> > > > > > not a
> > > > > > realistic scenario.
> > > > > >
> > > > > > That's why we are proposing to implement a mechanism that will
> > enable
> > > > > > the management stack to configure 1 thread per I/O device (as
it is
> > > > today)
> > > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > > >
> > > > > > > Once you are scheduling multiple guests in a single vhost
device,
> > you
> > > > > > > now create a whole new class of DoS attacks in the best case
> > > > scenario.
> > > > > >
> > > > > > Again, we are NOT proposing to schedule multiple guests in a
single
> > > > > > vhost thread. We are proposing to schedule multiple devices
> > belonging
> > > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > > >
> > > > >
> > > > > I guess a question then becomes why have multiple devices?
> > > >
> > > > I assume that there are guests that have multiple vhost devices
> > > > (net or scsi/tcm).
> > >
> > > These are kind of uncommon though.  In fact a kernel thread is not a
> > > unit of isolation - cgroups supply isolation.
> > > If we had use_cgroups kind of like use_mm, we could thi

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


"Michael S. Tsirkin"  wrote on 27/11/2013 12:27:19 PM:

>
> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > Hi,
> >
> > Razya is out for a few days, so I will try to answer the questions as
well
> > as I can:
> >
> > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
> >
> > > From: "Michael S. Tsirkin" 
> > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > Cc: Anthony Liguori , abel.gor...@gmail.com,
> > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
> > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
> > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
> > > Haifa/IBM@IBMIL
> > > Date: 27/11/2013 01:08 AM
> > > Subject: Re: Elvis upstreaming plan
> > >
> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > Anthony Liguori  wrote on 26/11/2013
08:05:00
> > PM:
> > > >
> > > > >
> > > > > Razya Ladelsky  writes:
> > > > >
> > 
> > > >
> > > > That's why we are proposing to implement a mechanism that will
enable
> > > > the management stack to configure 1 thread per I/O device (as it is
> > today)
> > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > >
> > > > > Once you are scheduling multiple guests in a single vhost device,
you
> > > > > now create a whole new class of DoS attacks in the best case
> > scenario.
> > > >
> > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > vhost thread. We are proposing to schedule multiple devices
belonging
> > > > to the same guest in a single (or multiple) vhost thread/s.
> > > >
> > >
> > > I guess a question then becomes why have multiple devices?
> >
> > If you mean "why serve multiple devices from a single thread" the
answer is
> > that we cannot rely on the Linux scheduler which has no knowledge of
I/O
> > queues to do a decent job of scheduling I/O.  The idea is to take over
the
> > I/O scheduling responsibilities from the kernel's thread scheduler with
a
> > more efficient I/O scheduler inside each vhost thread.  So by combining
all
> > of the I/O devices from the same guest (disks, network cards, etc) in a
> > single I/O thread, it allows us to provide better scheduling by giving
us
> > more knowledge of the nature of the work.  So now instead of relying on
the
> > linux scheduler to perform context switches between multiple vhost
threads,
> > we have a single thread context in which we can do the I/O scheduling
more
> > efficiently.  We can closely monitor the performance needs of each
queue of
> > each device inside the vhost thread which gives us much more
information
> > than relying on the kernel's thread scheduler.
> > This does not expose any additional opportunities for attacks (DoS or
> > other) than are already available since all of the I/O traffic belongs
to a
> > single guest.
> > You can make the argument that with low I/O loads this mechanism may
not
> > make much difference.  However when you try to maximize the utilization
of
> > your hardware (such as in a commercial scenario) this technique can
gain
> > you a large benefit.
> >
> > Regards,
> >
> > Joel Nider
> > Virtualization Research
> > IBM Research and Development
> > Haifa Research Lab
>
> So all this would sound more convincing if we had sharing between VMs.
> When it's only a single VM it's somehow less convincing, isn't it?
> Of course if we would bypass a scheduler like this it becomes harder to
> enforce cgroup limits.

True, but here the issue becomes isolation/cgroups. We can start to show
the value for VMs that have multiple devices / queues and then we could
re-consider extending the mechanism for multiple VMs (at least as a
experimental feature).

> But it might be easier to give scheduler the info it needs to do what we
> need.  Would an API that basically says "run this kthread right now"
> do the trick?

...do you really believe it would be possible to push this kind of change
to the Linux scheduler ? In addition, we need more than
"run this kthread right now" because you need to monitor the virtio
ring activity to specify "when" you will like to run a "specific kthread"
and for "how long".

>
> >

> >

> >

> >  Phone: 972-4-829-6326 | Mobile: 972-54-3

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 12:18:51PM +0200, Abel Gordon wrote:
> 
> 
> Jason Wang  wrote on 27/11/2013 04:49:20 AM:
> 
> >
> > On 11/24/2013 05:22 PM, Razya Ladelsky wrote:
> > > Hi all,
> > >
> > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > ELVIS slides:
> https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > >
> > >
> > > According to the discussions that took place at the forum, upstreaming
> > > some of the Elvis approaches seems to be a good idea, which we would
> like
> > > to pursue.
> > >
> > > Our plan for the first patches is the following:
> > >
> > > 1.Shared vhost thread between mutiple devices
> > > This patch creates a worker thread and worker queue shared across
> multiple
> > > virtio devices
> > > We would like to modify the patch posted in
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > to limit a vhost thread to serve multiple devices only if they belong
> to
> > > the same VM as Paolo suggested to avoid isolation or cgroups concerns.
> > >
> > > Another modification is related to the creation and removal of vhost
> > > threads, which will be discussed next.
> > >
> > > 2. Sysfs mechanism to add and remove vhost threads
> > > This patch allows us to add and remove vhost threads dynamically.
> > >
> > > A simpler way to control the creation of vhost threads is statically
> > > determining the maximum number of virtio devices per worker via a
> kernel
> > > module parameter (which is the way the previously mentioned patch is
> > > currently implemented)
> >
> > Any chance we can re-use the cwmq instead of inventing another
> > mechanism? Looks like there're lots of function duplication here. Bandan
> > has an RFC to do this.
> 
> Thanks for the suggestion. We should certainly take a look at Bandan's
> patches which I guess are:
> 
> http://www.mail-archive.com/kvm@vger.kernel.org/msg96603.html
> 
> My only concern here is that we may not be able to easily implement
> our polling mechanism and heuristics with cwmq.

It's not so hard, to poll you just requeue work to make sure it's
re-invoked.

> > >
> > > I'd like to ask for advice here about the more preferable way to go:
> > > Although having the sysfs mechanism provides more flexibility, it may
> be a
> > > good idea to start with a simple static parameter, and have the first
> > > patches as simple as possible. What do you think?
> > >
> > > 3.Add virtqueue polling mode to vhost
> > > Have the vhost thread poll the virtqueues with high I/O rate for new
> > > buffers , and avoid asking the guest to kick us.
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> >
> > Maybe we can make poll_stop_idle adaptive which may help the light load
> > case. Consider guest is often slow than vhost, if we just have one or
> > two vms, polling too much may waste cpu in this case.
> 
> Yes, make polling adaptive based on the amount of wasted cycles (cycles
> we did polling but didn't find new work) and I/O rate is a very good idea.
> Note we already measure and expose these values but we do not use them
> to adapt the polling mechanism.
> 
> Having said that, note that adaptive polling may be a bit tricky.
> Remember that the cycles we waste polling in the vhost thread actually
> improves the performance of the vcpu threads because the guest is no longer
> 
> require to kick (pio==exit) the host when vhost does polling. So even if
> we waste cycles in the vhost thread, we are saving cycles in the
> vcpu thread and improving performance.


So my suggestion would be:

- guest runs some kicks
- measures how long it took, e.g. kick = T cycles
- sends this info to host

host polls for at most fraction * T cycles


> > > 4. vhost statistics
> > > This patch introduces a set of statistics to monitor different
> performance
> > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > statistics are exposed using debugfs and can be easily displayed with a
> 
> > > Python script (vhost_stat, based on the old kvm_stats)
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> >
> > How about using trace points instead? Besides statistics, it can also
> > help more in debugging.
> 
> Yep, we just had a discussion with Gleb about this :)
> 
> > >
> > > 5. Add heuristics to improve I/O scheduling
> > > This patch enhances the round-robin mechanism with a set of heuristics
> to
> > > decide when to leave a virtqueue and proceed to the next.
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > >
> > > This patch improves the handling of the requests by the vhost thread,
> but
> > > could perhaps be delayed to a
> > > later time , and not submitted

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote:
> 
> 
> "Michael S. Tsirkin"  wrote on 27/11/2013 11:21:00 AM:
> 
> >
> > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
> > >
> > >
> > > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
> > >
> > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > >
> > > > >
> > > > > Anthony Liguori  wrote on 26/11/2013
> 08:05:00
> > > PM:
> > > > >
> > > > > >
> > > > > > Razya Ladelsky  writes:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team,
> which
> > > > > > > developed Elvis, presented by Abel Gordon at the last KVM
> forum:
> > > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > > ELVIS slides:
> > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > > >
> > > > > > >
> > > > > > > According to the discussions that took place at the forum,
> > > upstreaming
> > > > > > > some of the Elvis approaches seems to be a good idea, which we
> > > would
> > > > > like
> > > > > > > to pursue.
> > > > > > >
> > > > > > > Our plan for the first patches is the following:
> > > > > > >
> > > > > > > 1.Shared vhost thread between mutiple devices
> > > > > > > This patch creates a worker thread and worker queue shared
> across
> > > > > multiple
> > > > > > > virtio devices
> > > > > > > We would like to modify the patch posted in
> > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > > to limit a vhost thread to serve multiple devices only if they
> > > belong
> > > > > to
> > > > > > > the same VM as Paolo suggested to avoid isolation or cgroups
> > > concerns.
> > > > > > >
> > > > > > > Another modification is related to the creation and removal of
> > > vhost
> > > > > > > threads, which will be discussed next.
> > > > > >
> > > > > > I think this is an exceptionally bad idea.
> > > > > >
> > > > > > We shouldn't throw away isolation without exhausting every other
> > > > > > possibility.
> > > > >
> > > > > Seems you have missed the important details here.
> > > > > Anthony, we are aware you are concerned about isolation
> > > > > and you believe we should not share a single vhost thread across
> > > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > > so we will serve multiple virtio devices using a single vhost
> thread
> > > > > "only if the devices belong to the same VM". This series of patches
> > > > > will not allow two different VMs to share the same vhost thread.
> > > > > So, I don't see why this will be throwing away isolation and why
> > > > > this could be a "exceptionally bad idea".
> > > > >
> > > > > By the way, I remember that during the KVM forum a similar
> > > > > approach of having a single data plane thread for many devices
> > > > > was discussed
> > > > > > We've seen very positive results from adding threads.  We should
> also
> > > > > > look at scheduling.
> > > > >
> > > > > ...and we have also seen exceptionally negative results from
> > > > > adding threads, both for vhost and data-plane. If you have lot of
> idle
> > > > > time/cores
> > > > > then it makes sense to run multiple threads. But IMHO in many
> scenarios
> > > you
> > > > > don't have lot of idle time/cores.. and if you have them you would
> > > probably
> > > > > prefer to run more VMs/VCPUshosting a single SMP VM when you
> have
> > > > > enough physical cores to run all the VCPU threads and the I/O
> threads
> > > is
> > > > > not a
> > > > > realistic scenario.
> > > > >
> > > > > That's why we are proposing to implement a mechanism that will
> enable
> > > > > the management stack to configure 1 thread per I/O device (as it is
> > > today)
> > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > >
> > > > > > Once you are scheduling multiple guests in a single vhost device,
> you
> > > > > > now create a whole new class of DoS attacks in the best case
> > > scenario.
> > > > >
> > > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > > vhost thread. We are proposing to schedule multiple devices
> belonging
> > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > >
> > > >
> > > > I guess a question then becomes why have multiple devices?
> > >
> > > I assume that there are guests that have multiple vhost devices
> > > (net or scsi/tcm).
> >
> > These are kind of uncommon though.  In fact a kernel thread is not a
> > unit of isolation - cgroups supply isolation.
> > If we had use_cgroups kind of like use_mm, we could thinkably
> > do work for multiple VMs on the same thread.
> >
> >
> > > We can also extend the approach to consider
> > > multiqueue devices, so we can create 1 vhost thread shared for all the
> > > queues,
> > > 1 vhost thread for each queue or a few threads for multiple queues. We
> > > could also share 

Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> Hi,
> 
> Razya is out for a few days, so I will try to answer the questions as well
> as I can:
> 
> "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
> 
> > From: "Michael S. Tsirkin" 
> > To: Abel Gordon/Haifa/IBM@IBMIL,
> > Cc: Anthony Liguori , abel.gor...@gmail.com,
> > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
> > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
> > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
> > Haifa/IBM@IBMIL
> > Date: 27/11/2013 01:08 AM
> > Subject: Re: Elvis upstreaming plan
> >
> > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > >
> > >
> > > Anthony Liguori  wrote on 26/11/2013 08:05:00
> PM:
> > >
> > > >
> > > > Razya Ladelsky  writes:
> > > >
> 
> > >
> > > That's why we are proposing to implement a mechanism that will enable
> > > the management stack to configure 1 thread per I/O device (as it is
> today)
> > > or 1 thread for many I/O devices (belonging to the same VM).
> > >
> > > > Once you are scheduling multiple guests in a single vhost device, you
> > > > now create a whole new class of DoS attacks in the best case
> scenario.
> > >
> > > Again, we are NOT proposing to schedule multiple guests in a single
> > > vhost thread. We are proposing to schedule multiple devices belonging
> > > to the same guest in a single (or multiple) vhost thread/s.
> > >
> >
> > I guess a question then becomes why have multiple devices?
> 
> If you mean "why serve multiple devices from a single thread" the answer is
> that we cannot rely on the Linux scheduler which has no knowledge of I/O
> queues to do a decent job of scheduling I/O.  The idea is to take over the
> I/O scheduling responsibilities from the kernel's thread scheduler with a
> more efficient I/O scheduler inside each vhost thread.  So by combining all
> of the I/O devices from the same guest (disks, network cards, etc) in a
> single I/O thread, it allows us to provide better scheduling by giving us
> more knowledge of the nature of the work.  So now instead of relying on the
> linux scheduler to perform context switches between multiple vhost threads,
> we have a single thread context in which we can do the I/O scheduling more
> efficiently.  We can closely monitor the performance needs of each queue of
> each device inside the vhost thread which gives us much more information
> than relying on the kernel's thread scheduler.
> This does not expose any additional opportunities for attacks (DoS or
> other) than are already available since all of the I/O traffic belongs to a
> single guest.
> You can make the argument that with low I/O loads this mechanism may not
> make much difference.  However when you try to maximize the utilization of
> your hardware (such as in a commercial scenario) this technique can gain
> you a large benefit.
> 
> Regards,
> 
> Joel Nider
> Virtualization Research
> IBM Research and Development
> Haifa Research Lab

So all this would sound more convincing if we had sharing between VMs.
When it's only a single VM it's somehow less convincing, isn't it?
Of course if we would bypass a scheduler like this it becomes harder to
enforce cgroup limits.
But it might be easier to give scheduler the info it needs to do what we
need.  Would an API that basically says "run this kthread right now"
do the trick?


>   
>   
>   
>   
>   
>   
>  Phone: 972-4-829-6326 | Mobile: 972-54-3155635  (Embedded image 
> moved to file: 
>  E-mail: jo...@il.ibm.com  
> pic39571.gif)IBM 
>   
>   
>   
>   
> 
> 
> 
> 
> > > > > Hi all,
> > > > >
> > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > ELVIS slides:
> > > https://driv

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Jason Wang  wrote on 27/11/2013 04:49:20 AM:

>
> On 11/24/2013 05:22 PM, Razya Ladelsky wrote:
> > Hi all,
> >
> > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > ELVIS slides:
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> >
> >
> > According to the discussions that took place at the forum, upstreaming
> > some of the Elvis approaches seems to be a good idea, which we would
like
> > to pursue.
> >
> > Our plan for the first patches is the following:
> >
> > 1.Shared vhost thread between mutiple devices
> > This patch creates a worker thread and worker queue shared across
multiple
> > virtio devices
> > We would like to modify the patch posted in
> > https://github.com/abelg/virtual_io_acceleration/commit/
> 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > to limit a vhost thread to serve multiple devices only if they belong
to
> > the same VM as Paolo suggested to avoid isolation or cgroups concerns.
> >
> > Another modification is related to the creation and removal of vhost
> > threads, which will be discussed next.
> >
> > 2. Sysfs mechanism to add and remove vhost threads
> > This patch allows us to add and remove vhost threads dynamically.
> >
> > A simpler way to control the creation of vhost threads is statically
> > determining the maximum number of virtio devices per worker via a
kernel
> > module parameter (which is the way the previously mentioned patch is
> > currently implemented)
>
> Any chance we can re-use the cwmq instead of inventing another
> mechanism? Looks like there're lots of function duplication here. Bandan
> has an RFC to do this.

Thanks for the suggestion. We should certainly take a look at Bandan's
patches which I guess are:

http://www.mail-archive.com/kvm@vger.kernel.org/msg96603.html

My only concern here is that we may not be able to easily implement
our polling mechanism and heuristics with cwmq.

> >
> > I'd like to ask for advice here about the more preferable way to go:
> > Although having the sysfs mechanism provides more flexibility, it may
be a
> > good idea to start with a simple static parameter, and have the first
> > patches as simple as possible. What do you think?
> >
> > 3.Add virtqueue polling mode to vhost
> > Have the vhost thread poll the virtqueues with high I/O rate for new
> > buffers , and avoid asking the guest to kick us.
> > https://github.com/abelg/virtual_io_acceleration/commit/
> 26616133fafb7855cc80fac070b0572fd1aaf5d0
>
> Maybe we can make poll_stop_idle adaptive which may help the light load
> case. Consider guest is often slow than vhost, if we just have one or
> two vms, polling too much may waste cpu in this case.

Yes, make polling adaptive based on the amount of wasted cycles (cycles
we did polling but didn't find new work) and I/O rate is a very good idea.
Note we already measure and expose these values but we do not use them
to adapt the polling mechanism.

Having said that, note that adaptive polling may be a bit tricky.
Remember that the cycles we waste polling in the vhost thread actually
improves the performance of the vcpu threads because the guest is no longer

require to kick (pio==exit) the host when vhost does polling. So even if
we waste cycles in the vhost thread, we are saving cycles in the
vcpu thread and improving performance.

> > 4. vhost statistics
> > This patch introduces a set of statistics to monitor different
performance
> > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > statistics are exposed using debugfs and can be easily displayed with a

> > Python script (vhost_stat, based on the old kvm_stats)
> > https://github.com/abelg/virtual_io_acceleration/commit/
> ac14206ea56939ecc3608dc5f978b86fa322e7b0
>
> How about using trace points instead? Besides statistics, it can also
> help more in debugging.

Yep, we just had a discussion with Gleb about this :)

> >
> > 5. Add heuristics to improve I/O scheduling
> > This patch enhances the round-robin mechanism with a set of heuristics
to
> > decide when to leave a virtqueue and proceed to the next.
> > https://github.com/abelg/virtual_io_acceleration/commit/
> f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> >
> > This patch improves the handling of the requests by the vhost thread,
but
> > could perhaps be delayed to a
> > later time , and not submitted as one of the first Elvis patches.
> > I'd love to hear some comments about whether this patch needs to be
part
> > of the first submission.
> >
> > Any other feedback on this plan will be appreciated,
> > Thank you,
> > Razya
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord.

Re: Elvis upstreaming plan

2013-11-27 Thread Gleb Natapov
On Wed, Nov 27, 2013 at 11:33:19AM +0200, Abel Gordon wrote:
> 
> 
> Gleb Natapov  wrote on 27/11/2013 11:21:59 AM:
> 
> 
> > On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote:
> > >
> > >
> > > Gleb Natapov  wrote on 27/11/2013 09:35:01 AM:
> > >
> > > > On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
> > > > > > 4. vhost statistics
> > > > > > This patch introduces a set of statistics to monitor different
> > > > performance
> > > > > > metrics of vhost and our polling and I/O scheduling mechanisms.
> The
> > > > > > statistics are exposed using debugfs and can be easily displayed
> with
> > > a
> > > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > >
> > > > > How about using trace points instead? Besides statistics, it can
> also
> > > > > help more in debugging.
> > > > Definitely. kvm_stats has moved to ftrace long time ago.
> > > >
> > >
> > > We should use trace points for debugging information  but IMHO we
> should
> > > have a dedicated (and different) mechanism to expose data that can be
> > > easily consumed by a user-space (policy) application to control how
> many
> > > vhost threads we need or any other vhost feature we may introduce
> > > (e.g. polling). That's why we proposed something like vhost_stat
> > > based on sysfs.
> > >
> > > This is not like kvm_stat that can be replaced with tracepoints. Here
> > > we will like to expose data to "control" the system. So I would
> > > say what we are trying to do something that resembles the ksm interface
> > > implemented under /sys/kernel/mm/ksm/
> > There are control operation and there are performance/statistic
> > gathering operations use /sys for former and ftrace for later. The fact
> > that you need /sys interface for other things does not mean you can
> > abuse it for statistics too.
> 
> Agree. Any statistics that we add for debugging purposes should be
> implemented
> using tracepoints. But control and related data interfaces (that are not
> for
> debugging purposes) should be in sysfs. Look for example at
Yes things that are not for statistics only and part of control interface
that management will use should not use ftrace (I do not think adding
more knobs is a good idea, but this is for vhost maintainer to decide),
but ksm predates ftrace, so some things below could have been implemented
as ftrace points.

>  /sys/kernel/mm/ksm/full_scans
>  /sys/kernel/mm/ksm/pages_shared
>  /sys/kernel/mm/ksm/pages_sharing
>  /sys/kernel/mm/ksm/pages_to_scan
>  /sys/kernel/mm/ksm/pages_unshared
>  /sys/kernel/mm/ksm/pages_volatile
> 

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


"Michael S. Tsirkin"  wrote on 27/11/2013 11:21:00 AM:

>
> On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
> >
> >
> > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
> >
> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > Anthony Liguori  wrote on 26/11/2013
08:05:00
> > PM:
> > > >
> > > > >
> > > > > Razya Ladelsky  writes:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team,
which
> > > > > > developed Elvis, presented by Abel Gordon at the last KVM
forum:
> > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > ELVIS slides:
> > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > >
> > > > > >
> > > > > > According to the discussions that took place at the forum,
> > upstreaming
> > > > > > some of the Elvis approaches seems to be a good idea, which we
> > would
> > > > like
> > > > > > to pursue.
> > > > > >
> > > > > > Our plan for the first patches is the following:
> > > > > >
> > > > > > 1.Shared vhost thread between mutiple devices
> > > > > > This patch creates a worker thread and worker queue shared
across
> > > > multiple
> > > > > > virtio devices
> > > > > > We would like to modify the patch posted in
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > to limit a vhost thread to serve multiple devices only if they
> > belong
> > > > to
> > > > > > the same VM as Paolo suggested to avoid isolation or cgroups
> > concerns.
> > > > > >
> > > > > > Another modification is related to the creation and removal of
> > vhost
> > > > > > threads, which will be discussed next.
> > > > >
> > > > > I think this is an exceptionally bad idea.
> > > > >
> > > > > We shouldn't throw away isolation without exhausting every other
> > > > > possibility.
> > > >
> > > > Seems you have missed the important details here.
> > > > Anthony, we are aware you are concerned about isolation
> > > > and you believe we should not share a single vhost thread across
> > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > so we will serve multiple virtio devices using a single vhost
thread
> > > > "only if the devices belong to the same VM". This series of patches
> > > > will not allow two different VMs to share the same vhost thread.
> > > > So, I don't see why this will be throwing away isolation and why
> > > > this could be a "exceptionally bad idea".
> > > >
> > > > By the way, I remember that during the KVM forum a similar
> > > > approach of having a single data plane thread for many devices
> > > > was discussed
> > > > > We've seen very positive results from adding threads.  We should
also
> > > > > look at scheduling.
> > > >
> > > > ...and we have also seen exceptionally negative results from
> > > > adding threads, both for vhost and data-plane. If you have lot of
idle
> > > > time/cores
> > > > then it makes sense to run multiple threads. But IMHO in many
scenarios
> > you
> > > > don't have lot of idle time/cores.. and if you have them you would
> > probably
> > > > prefer to run more VMs/VCPUshosting a single SMP VM when you
have
> > > > enough physical cores to run all the VCPU threads and the I/O
threads
> > is
> > > > not a
> > > > realistic scenario.
> > > >
> > > > That's why we are proposing to implement a mechanism that will
enable
> > > > the management stack to configure 1 thread per I/O device (as it is
> > today)
> > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > >
> > > > > Once you are scheduling multiple guests in a single vhost device,
you
> > > > > now create a whole new class of DoS attacks in the best case
> > scenario.
> > > >
> > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > vhost thread. We are proposing to schedule multiple devices
belonging
> > > > to the same guest in a single (or multiple) vhost thread/s.
> > > >
> > >
> > > I guess a question then becomes why have multiple devices?
> >
> > I assume that there are guests that have multiple vhost devices
> > (net or scsi/tcm).
>
> These are kind of uncommon though.  In fact a kernel thread is not a
> unit of isolation - cgroups supply isolation.
> If we had use_cgroups kind of like use_mm, we could thinkably
> do work for multiple VMs on the same thread.
>
>
> > We can also extend the approach to consider
> > multiqueue devices, so we can create 1 vhost thread shared for all the
> > queues,
> > 1 vhost thread for each queue or a few threads for multiple queues. We
> > could also share a thread across multiple queues even if they do not
belong
> > to the same device.
> >
> > Remember the experiments Shirley Ma did with the split
> > tx/rx ? If we have a control interface we could support both
> > approaches: different threads or a single thread.
>
>
> I'm a bit concerned about interface managing specifi

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Gleb Natapov  wrote on 27/11/2013 11:21:59 AM:


> On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote:
> >
> >
> > Gleb Natapov  wrote on 27/11/2013 09:35:01 AM:
> >
> > > On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
> > > > > 4. vhost statistics
> > > > > This patch introduces a set of statistics to monitor different
> > > performance
> > > > > metrics of vhost and our polling and I/O scheduling mechanisms.
The
> > > > > statistics are exposed using debugfs and can be easily displayed
with
> > a
> > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > >
> > > > How about using trace points instead? Besides statistics, it can
also
> > > > help more in debugging.
> > > Definitely. kvm_stats has moved to ftrace long time ago.
> > >
> >
> > We should use trace points for debugging information  but IMHO we
should
> > have a dedicated (and different) mechanism to expose data that can be
> > easily consumed by a user-space (policy) application to control how
many
> > vhost threads we need or any other vhost feature we may introduce
> > (e.g. polling). That's why we proposed something like vhost_stat
> > based on sysfs.
> >
> > This is not like kvm_stat that can be replaced with tracepoints. Here
> > we will like to expose data to "control" the system. So I would
> > say what we are trying to do something that resembles the ksm interface
> > implemented under /sys/kernel/mm/ksm/
> There are control operation and there are performance/statistic
> gathering operations use /sys for former and ftrace for later. The fact
> that you need /sys interface for other things does not mean you can
> abuse it for statistics too.

Agree. Any statistics that we add for debugging purposes should be
implemented
using tracepoints. But control and related data interfaces (that are not
for
debugging purposes) should be in sysfs. Look for example at
 /sys/kernel/mm/ksm/full_scans
 /sys/kernel/mm/ksm/pages_shared
 /sys/kernel/mm/ksm/pages_sharing
 /sys/kernel/mm/ksm/pages_to_scan
 /sys/kernel/mm/ksm/pages_unshared
 /sys/kernel/mm/ksm/pages_volatile


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Gleb Natapov
On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote:
> 
> 
> Gleb Natapov  wrote on 27/11/2013 09:35:01 AM:
> 
> > On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
> > > > 4. vhost statistics
> > > > This patch introduces a set of statistics to monitor different
> > performance
> > > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > > statistics are exposed using debugfs and can be easily displayed with
> a
> > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > >
> > > How about using trace points instead? Besides statistics, it can also
> > > help more in debugging.
> > Definitely. kvm_stats has moved to ftrace long time ago.
> >
> 
> We should use trace points for debugging information  but IMHO we should
> have a dedicated (and different) mechanism to expose data that can be
> easily consumed by a user-space (policy) application to control how many
> vhost threads we need or any other vhost feature we may introduce
> (e.g. polling). That's why we proposed something like vhost_stat
> based on sysfs.
> 
> This is not like kvm_stat that can be replaced with tracepoints. Here
> we will like to expose data to "control" the system. So I would
> say what we are trying to do something that resembles the ksm interface
> implemented under /sys/kernel/mm/ksm/
There are control operation and there are performance/statistic
gathering operations use /sys for former and ftrace for later. The fact
that you need /sys interface for other things does not mean you can
abuse it for statistics too.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


Gleb Natapov  wrote on 27/11/2013 09:35:01 AM:

> On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
> > > 4. vhost statistics
> > > This patch introduces a set of statistics to monitor different
> performance
> > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > statistics are exposed using debugfs and can be easily displayed with
a
> > > Python script (vhost_stat, based on the old kvm_stats)
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> ac14206ea56939ecc3608dc5f978b86fa322e7b0
> >
> > How about using trace points instead? Besides statistics, it can also
> > help more in debugging.
> Definitely. kvm_stats has moved to ftrace long time ago.
>

We should use trace points for debugging information  but IMHO we should
have a dedicated (and different) mechanism to expose data that can be
easily consumed by a user-space (policy) application to control how many
vhost threads we need or any other vhost feature we may introduce
(e.g. polling). That's why we proposed something like vhost_stat
based on sysfs.

This is not like kvm_stat that can be replaced with tracepoints. Here
we will like to expose data to "control" the system. So I would
say what we are trying to do something that resembles the ksm interface
implemented under /sys/kernel/mm/ksm/

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-27 Thread Michael S. Tsirkin
On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
> 
> 
> "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
> 
> > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > >
> > >
> > > Anthony Liguori  wrote on 26/11/2013 08:05:00
> PM:
> > >
> > > >
> > > > Razya Ladelsky  writes:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > ELVIS slides:
> > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > >
> > > > >
> > > > > According to the discussions that took place at the forum,
> upstreaming
> > > > > some of the Elvis approaches seems to be a good idea, which we
> would
> > > like
> > > > > to pursue.
> > > > >
> > > > > Our plan for the first patches is the following:
> > > > >
> > > > > 1.Shared vhost thread between mutiple devices
> > > > > This patch creates a worker thread and worker queue shared across
> > > multiple
> > > > > virtio devices
> > > > > We would like to modify the patch posted in
> > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > to limit a vhost thread to serve multiple devices only if they
> belong
> > > to
> > > > > the same VM as Paolo suggested to avoid isolation or cgroups
> concerns.
> > > > >
> > > > > Another modification is related to the creation and removal of
> vhost
> > > > > threads, which will be discussed next.
> > > >
> > > > I think this is an exceptionally bad idea.
> > > >
> > > > We shouldn't throw away isolation without exhausting every other
> > > > possibility.
> > >
> > > Seems you have missed the important details here.
> > > Anthony, we are aware you are concerned about isolation
> > > and you believe we should not share a single vhost thread across
> > > multiple VMs.  That's why Razya proposed to change the patch
> > > so we will serve multiple virtio devices using a single vhost thread
> > > "only if the devices belong to the same VM". This series of patches
> > > will not allow two different VMs to share the same vhost thread.
> > > So, I don't see why this will be throwing away isolation and why
> > > this could be a "exceptionally bad idea".
> > >
> > > By the way, I remember that during the KVM forum a similar
> > > approach of having a single data plane thread for many devices
> > > was discussed
> > > > We've seen very positive results from adding threads.  We should also
> > > > look at scheduling.
> > >
> > > ...and we have also seen exceptionally negative results from
> > > adding threads, both for vhost and data-plane. If you have lot of idle
> > > time/cores
> > > then it makes sense to run multiple threads. But IMHO in many scenarios
> you
> > > don't have lot of idle time/cores.. and if you have them you would
> probably
> > > prefer to run more VMs/VCPUshosting a single SMP VM when you have
> > > enough physical cores to run all the VCPU threads and the I/O threads
> is
> > > not a
> > > realistic scenario.
> > >
> > > That's why we are proposing to implement a mechanism that will enable
> > > the management stack to configure 1 thread per I/O device (as it is
> today)
> > > or 1 thread for many I/O devices (belonging to the same VM).
> > >
> > > > Once you are scheduling multiple guests in a single vhost device, you
> > > > now create a whole new class of DoS attacks in the best case
> scenario.
> > >
> > > Again, we are NOT proposing to schedule multiple guests in a single
> > > vhost thread. We are proposing to schedule multiple devices belonging
> > > to the same guest in a single (or multiple) vhost thread/s.
> > >
> >
> > I guess a question then becomes why have multiple devices?
> 
> I assume that there are guests that have multiple vhost devices
> (net or scsi/tcm).

These are kind of uncommon though.  In fact a kernel thread is not a
unit of isolation - cgroups supply isolation.
If we had use_cgroups kind of like use_mm, we could thinkably
do work for multiple VMs on the same thread.


> We can also extend the approach to consider
> multiqueue devices, so we can create 1 vhost thread shared for all the
> queues,
> 1 vhost thread for each queue or a few threads for multiple queues. We
> could also share a thread across multiple queues even if they do not belong
> to the same device.
> 
> Remember the experiments Shirley Ma did with the split
> tx/rx ? If we have a control interface we could support both
> approaches: different threads or a single thread.


I'm a bit concerned about interface managing specific
threads being so low level.
What exactly is it that management knows that makes it
efficient to group threads together?
That host is over-committed so we should use less CPU?
I'd like the interface to express that knowledge.


> >
> >
> > > >
> > > > > 2. Sysfs mechanism to add and remove 

Re: Elvis upstreaming plan

2013-11-27 Thread Abel Gordon


"Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:

> On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> >
> >
> > Anthony Liguori  wrote on 26/11/2013 08:05:00
PM:
> >
> > >
> > > Razya Ladelsky  writes:
> > >
> > > > Hi all,
> > > >
> > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > ELVIS slides:
> > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > >
> > > >
> > > > According to the discussions that took place at the forum,
upstreaming
> > > > some of the Elvis approaches seems to be a good idea, which we
would
> > like
> > > > to pursue.
> > > >
> > > > Our plan for the first patches is the following:
> > > >
> > > > 1.Shared vhost thread between mutiple devices
> > > > This patch creates a worker thread and worker queue shared across
> > multiple
> > > > virtio devices
> > > > We would like to modify the patch posted in
> > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > to limit a vhost thread to serve multiple devices only if they
belong
> > to
> > > > the same VM as Paolo suggested to avoid isolation or cgroups
concerns.
> > > >
> > > > Another modification is related to the creation and removal of
vhost
> > > > threads, which will be discussed next.
> > >
> > > I think this is an exceptionally bad idea.
> > >
> > > We shouldn't throw away isolation without exhausting every other
> > > possibility.
> >
> > Seems you have missed the important details here.
> > Anthony, we are aware you are concerned about isolation
> > and you believe we should not share a single vhost thread across
> > multiple VMs.  That's why Razya proposed to change the patch
> > so we will serve multiple virtio devices using a single vhost thread
> > "only if the devices belong to the same VM". This series of patches
> > will not allow two different VMs to share the same vhost thread.
> > So, I don't see why this will be throwing away isolation and why
> > this could be a "exceptionally bad idea".
> >
> > By the way, I remember that during the KVM forum a similar
> > approach of having a single data plane thread for many devices
> > was discussed
> > > We've seen very positive results from adding threads.  We should also
> > > look at scheduling.
> >
> > ...and we have also seen exceptionally negative results from
> > adding threads, both for vhost and data-plane. If you have lot of idle
> > time/cores
> > then it makes sense to run multiple threads. But IMHO in many scenarios
you
> > don't have lot of idle time/cores.. and if you have them you would
probably
> > prefer to run more VMs/VCPUshosting a single SMP VM when you have
> > enough physical cores to run all the VCPU threads and the I/O threads
is
> > not a
> > realistic scenario.
> >
> > That's why we are proposing to implement a mechanism that will enable
> > the management stack to configure 1 thread per I/O device (as it is
today)
> > or 1 thread for many I/O devices (belonging to the same VM).
> >
> > > Once you are scheduling multiple guests in a single vhost device, you
> > > now create a whole new class of DoS attacks in the best case
scenario.
> >
> > Again, we are NOT proposing to schedule multiple guests in a single
> > vhost thread. We are proposing to schedule multiple devices belonging
> > to the same guest in a single (or multiple) vhost thread/s.
> >
>
> I guess a question then becomes why have multiple devices?

I assume that there are guests that have multiple vhost devices
(net or scsi/tcm). We can also extend the approach to consider
multiqueue devices, so we can create 1 vhost thread shared for all the
queues,
1 vhost thread for each queue or a few threads for multiple queues. We
could also share a thread across multiple queues even if they do not belong
to the same device.

Remember the experiments Shirley Ma did with the split
tx/rx ? If we have a control interface we could support both
approaches: different threads or a single thread.

>
>
> > >
> > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > This patch allows us to add and remove vhost threads dynamically.
> > > >
> > > > A simpler way to control the creation of vhost threads is
statically
> > > > determining the maximum number of virtio devices per worker via a
> > kernel
> > > > module parameter (which is the way the previously mentioned patch
is
> > > > currently implemented)
> > > >
> > > > I'd like to ask for advice here about the more preferable way to
go:
> > > > Although having the sysfs mechanism provides more flexibility, it
may
> > be a
> > > > good idea to start with a simple static parameter, and have the
first
> > > > patches as simple as possible. What do you think?
> > > >
> > > > 3.Add virtqueue polling mode to vhost
> > > > Have the vhost thread poll the virtqueues with high I/O rate

Re: Elvis upstreaming plan

2013-11-26 Thread Joel Nider


Gleb Natapov  wrote on 27/11/2013 09:35:01 AM:

> From: Gleb Natapov 
> To: Jason Wang ,
> Cc: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@vger.kernel.org,
> anth...@codemonkey.ws, "Michael S. Tsirkin" ,
> pbonz...@redhat.com, as...@redhat.com, digitale...@google.com,
> abel.gor...@gmail.com, Abel Gordon/Haifa/IBM@IBMIL, Eran Raichstein/
> Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, b...@redhat.com
> Date: 27/11/2013 11:35 AM
> Subject: Re: Elvis upstreaming plan
>
> On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
> > > 4. vhost statistics
> > > This patch introduces a set of statistics to monitor different
> performance
> > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > statistics are exposed using debugfs and can be easily displayed with
a
> > > Python script (vhost_stat, based on the old kvm_stats)
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> ac14206ea56939ecc3608dc5f978b86fa322e7b0
> >
> > How about using trace points instead? Besides statistics, it can also
> > help more in debugging.
> Definitely. kvm_stats has moved to ftrace long time ago.
>
> --
>  Gleb.
>

Ok - we will look at this newer mechanism.

Joel Nider
Virtualization Research
IBM Research and Development
Haifa Research Lab






 Phone: 972-4-829-6326 | Mobile: 972-54-3155635  (Embedded image moved 
to file: 
 E-mail: jo...@il.ibm.com  
pic56195.gif)IBM 





<>

Re: Elvis upstreaming plan

2013-11-26 Thread Joel Nider
Hi,

Razya is out for a few days, so I will try to answer the questions as well
as I can:

"Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:

> From: "Michael S. Tsirkin" 
> To: Abel Gordon/Haifa/IBM@IBMIL,
> Cc: Anthony Liguori , abel.gor...@gmail.com,
> as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
> IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
> IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
> Haifa/IBM@IBMIL
> Date: 27/11/2013 01:08 AM
> Subject: Re: Elvis upstreaming plan
>
> On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> >
> >
> > Anthony Liguori  wrote on 26/11/2013 08:05:00
PM:
> >
> > >
> > > Razya Ladelsky  writes:
> > >

> >
> > That's why we are proposing to implement a mechanism that will enable
> > the management stack to configure 1 thread per I/O device (as it is
today)
> > or 1 thread for many I/O devices (belonging to the same VM).
> >
> > > Once you are scheduling multiple guests in a single vhost device, you
> > > now create a whole new class of DoS attacks in the best case
scenario.
> >
> > Again, we are NOT proposing to schedule multiple guests in a single
> > vhost thread. We are proposing to schedule multiple devices belonging
> > to the same guest in a single (or multiple) vhost thread/s.
> >
>
> I guess a question then becomes why have multiple devices?

If you mean "why serve multiple devices from a single thread" the answer is
that we cannot rely on the Linux scheduler which has no knowledge of I/O
queues to do a decent job of scheduling I/O.  The idea is to take over the
I/O scheduling responsibilities from the kernel's thread scheduler with a
more efficient I/O scheduler inside each vhost thread.  So by combining all
of the I/O devices from the same guest (disks, network cards, etc) in a
single I/O thread, it allows us to provide better scheduling by giving us
more knowledge of the nature of the work.  So now instead of relying on the
linux scheduler to perform context switches between multiple vhost threads,
we have a single thread context in which we can do the I/O scheduling more
efficiently.  We can closely monitor the performance needs of each queue of
each device inside the vhost thread which gives us much more information
than relying on the kernel's thread scheduler.
This does not expose any additional opportunities for attacks (DoS or
other) than are already available since all of the I/O traffic belongs to a
single guest.
You can make the argument that with low I/O loads this mechanism may not
make much difference.  However when you try to maximize the utilization of
your hardware (such as in a commercial scenario) this technique can gain
you a large benefit.

Regards,

Joel Nider
Virtualization Research
IBM Research and Development
Haifa Research Lab






 Phone: 972-4-829-6326 | Mobile: 972-54-3155635  (Embedded image moved 
to file: 
 E-mail: jo...@il.ibm.com  
pic31578.gif)IBM 








> > > > Hi all,
> > > >
> > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > ELVIS slides:
> > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > >
> > > >
> > > > According to the discussions that took place at the forum,
upstreaming
> > > > some of the Elvis approaches seems to be a good idea, which we
would
> > like
> > > > to pursue.
> > > >
> > > > Our plan for the first patches is the following:
> > > >
> > > > 1.Shared vhost thread between mutiple devices
> > > > This patch creates a worker thread and worker queue shared across
> > multiple
> > > > virtio devices
> > > > We would like to modify the patch posted in
> > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > to limit a vhost thread to serve multiple devices only if they
belong
> > to
> > > > the same VM as Paolo s

Re: Elvis upstreaming plan

2013-11-26 Thread Gleb Natapov
On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
> > 4. vhost statistics
> > This patch introduces a set of statistics to monitor different performance 
> > metrics of vhost and our polling and I/O scheduling mechanisms. The 
> > statistics are exposed using debugfs and can be easily displayed with a 
> > Python script (vhost_stat, based on the old kvm_stats)
> > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
> 
> How about using trace points instead? Besides statistics, it can also
> help more in debugging.
Definitely. kvm_stats has moved to ftrace long time ago.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-26 Thread Jason Wang
On 11/24/2013 05:22 PM, Razya Ladelsky wrote:
> Hi all,
>
> I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
> developed Elvis, presented by Abel Gordon at the last KVM forum: 
> ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
> ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
>
>
> According to the discussions that took place at the forum, upstreaming 
> some of the Elvis approaches seems to be a good idea, which we would like 
> to pursue.
>
> Our plan for the first patches is the following: 
>
> 1.Shared vhost thread between mutiple devices 
> This patch creates a worker thread and worker queue shared across multiple 
> virtio devices 
> We would like to modify the patch posted in
> https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
>  
> to limit a vhost thread to serve multiple devices only if they belong to 
> the same VM as Paolo suggested to avoid isolation or cgroups concerns.
>
> Another modification is related to the creation and removal of vhost 
> threads, which will be discussed next.
>
> 2. Sysfs mechanism to add and remove vhost threads 
> This patch allows us to add and remove vhost threads dynamically.
>
> A simpler way to control the creation of vhost threads is statically 
> determining the maximum number of virtio devices per worker via a kernel 
> module parameter (which is the way the previously mentioned patch is 
> currently implemented)

Any chance we can re-use the cwmq instead of inventing another
mechanism? Looks like there're lots of function duplication here. Bandan
has an RFC to do this.
>
> I'd like to ask for advice here about the more preferable way to go:
> Although having the sysfs mechanism provides more flexibility, it may be a 
> good idea to start with a simple static parameter, and have the first 
> patches as simple as possible. What do you think?
>
> 3.Add virtqueue polling mode to vhost 
> Have the vhost thread poll the virtqueues with high I/O rate for new 
> buffers , and avoid asking the guest to kick us.
> https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0

Maybe we can make poll_stop_idle adaptive which may help the light load
case. Consider guest is often slow than vhost, if we just have one or
two vms, polling too much may waste cpu in this case.
> 4. vhost statistics
> This patch introduces a set of statistics to monitor different performance 
> metrics of vhost and our polling and I/O scheduling mechanisms. The 
> statistics are exposed using debugfs and can be easily displayed with a 
> Python script (vhost_stat, based on the old kvm_stats)
> https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0

How about using trace points instead? Besides statistics, it can also
help more in debugging.
>
> 5. Add heuristics to improve I/O scheduling 
> This patch enhances the round-robin mechanism with a set of heuristics to 
> decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
>
> This patch improves the handling of the requests by the vhost thread, but 
> could perhaps be delayed to a 
> later time , and not submitted as one of the first Elvis patches.
> I'd love to hear some comments about whether this patch needs to be part 
> of the first submission.
>
> Any other feedback on this plan will be appreciated,
> Thank you,
> Razya
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-26 Thread Bandan Das
Razya Ladelsky  writes:

> Hi all,
>
> I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
> developed Elvis, presented by Abel Gordon at the last KVM forum: 
> ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
> ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
>
>
> According to the discussions that took place at the forum, upstreaming 
> some of the Elvis approaches seems to be a good idea, which we would like 
> to pursue.
>
> Our plan for the first patches is the following: 
>
> 1.Shared vhost thread between mutiple devices 
> This patch creates a worker thread and worker queue shared across multiple 
> virtio devices 
> We would like to modify the patch posted in
> https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
>  
> to limit a vhost thread to serve multiple devices only if they belong to 
> the same VM as Paolo suggested to avoid isolation or cgroups concerns.
>
> Another modification is related to the creation and removal of vhost 
> threads, which will be discussed next.
>
> 2. Sysfs mechanism to add and remove vhost threads 
> This patch allows us to add and remove vhost threads dynamically.
>
> A simpler way to control the creation of vhost threads is statically 
> determining the maximum number of virtio devices per worker via a kernel 
> module parameter (which is the way the previously mentioned patch is 
> currently implemented)

Does the sysfs interface aim to let the _user_ control the maximum number of 
devices per vhost thread or/and let the user create and  destroy 
worker threads at will ?

Setting the limit on the number of devices makes sense but I am not sure
if there is any reason to actually expose an interface to create or destroy 
workers. Also, it might be worthwhile to think if it's better to just let 
the worker thread stay around (hoping it might be used again in 
the future) rather then destroying it..

> I'd like to ask for advice here about the more preferable way to go:
> Although having the sysfs mechanism provides more flexibility, it may be a 
> good idea to start with a simple static parameter, and have the first 
> patches as simple as possible. What do you think?

I am actually inclined more towards a static limit. I think that in a 
typical setup, the user will set this for his/her environment just once 
at load time and forget about it.

Bandan

> 3.Add virtqueue polling mode to vhost 
> Have the vhost thread poll the virtqueues with high I/O rate for new 
> buffers , and avoid asking the guest to kick us.
> https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
>
> 4. vhost statistics
> This patch introduces a set of statistics to monitor different performance 
> metrics of vhost and our polling and I/O scheduling mechanisms. The 
> statistics are exposed using debugfs and can be easily displayed with a 
> Python script (vhost_stat, based on the old kvm_stats)
> https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
>
>
> 5. Add heuristics to improve I/O scheduling 
> This patch enhances the round-robin mechanism with a set of heuristics to 
> decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
>
> This patch improves the handling of the requests by the vhost thread, but 
> could perhaps be delayed to a 
> later time , and not submitted as one of the first Elvis patches.
> I'd love to hear some comments about whether this patch needs to be part 
> of the first submission.
>
> Any other feedback on this plan will be appreciated,
> Thank you,
> Razya
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-26 Thread Michael S. Tsirkin
On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> 
> 
> Anthony Liguori  wrote on 26/11/2013 08:05:00 PM:
> 
> >
> > Razya Ladelsky  writes:
> >
> > > Hi all,
> > >
> > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > ELVIS slides:
> https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > >
> > >
> > > According to the discussions that took place at the forum, upstreaming
> > > some of the Elvis approaches seems to be a good idea, which we would
> like
> > > to pursue.
> > >
> > > Our plan for the first patches is the following:
> > >
> > > 1.Shared vhost thread between mutiple devices
> > > This patch creates a worker thread and worker queue shared across
> multiple
> > > virtio devices
> > > We would like to modify the patch posted in
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > to limit a vhost thread to serve multiple devices only if they belong
> to
> > > the same VM as Paolo suggested to avoid isolation or cgroups concerns.
> > >
> > > Another modification is related to the creation and removal of vhost
> > > threads, which will be discussed next.
> >
> > I think this is an exceptionally bad idea.
> >
> > We shouldn't throw away isolation without exhausting every other
> > possibility.
> 
> Seems you have missed the important details here.
> Anthony, we are aware you are concerned about isolation
> and you believe we should not share a single vhost thread across
> multiple VMs.  That's why Razya proposed to change the patch
> so we will serve multiple virtio devices using a single vhost thread
> "only if the devices belong to the same VM". This series of patches
> will not allow two different VMs to share the same vhost thread.
> So, I don't see why this will be throwing away isolation and why
> this could be a "exceptionally bad idea".
> 
> By the way, I remember that during the KVM forum a similar
> approach of having a single data plane thread for many devices
> was discussed
> > We've seen very positive results from adding threads.  We should also
> > look at scheduling.
> 
> ...and we have also seen exceptionally negative results from
> adding threads, both for vhost and data-plane. If you have lot of idle
> time/cores
> then it makes sense to run multiple threads. But IMHO in many scenarios you
> don't have lot of idle time/cores.. and if you have them you would probably
> prefer to run more VMs/VCPUshosting a single SMP VM when you have
> enough physical cores to run all the VCPU threads and the I/O threads is
> not a
> realistic scenario.
> 
> That's why we are proposing to implement a mechanism that will enable
> the management stack to configure 1 thread per I/O device (as it is today)
> or 1 thread for many I/O devices (belonging to the same VM).
> 
> > Once you are scheduling multiple guests in a single vhost device, you
> > now create a whole new class of DoS attacks in the best case scenario.
> 
> Again, we are NOT proposing to schedule multiple guests in a single
> vhost thread. We are proposing to schedule multiple devices belonging
> to the same guest in a single (or multiple) vhost thread/s.
> 

I guess a question then becomes why have multiple devices?


> >
> > > 2. Sysfs mechanism to add and remove vhost threads
> > > This patch allows us to add and remove vhost threads dynamically.
> > >
> > > A simpler way to control the creation of vhost threads is statically
> > > determining the maximum number of virtio devices per worker via a
> kernel
> > > module parameter (which is the way the previously mentioned patch is
> > > currently implemented)
> > >
> > > I'd like to ask for advice here about the more preferable way to go:
> > > Although having the sysfs mechanism provides more flexibility, it may
> be a
> > > good idea to start with a simple static parameter, and have the first
> > > patches as simple as possible. What do you think?
> > >
> > > 3.Add virtqueue polling mode to vhost
> > > Have the vhost thread poll the virtqueues with high I/O rate for new
> > > buffers , and avoid asking the guest to kick us.
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> >
> > Ack on this.
> 
> :)
> 
> Regards,
> Abel.
> 
> >
> > Regards,
> >
> > Anthony Liguori
> >
> > > 4. vhost statistics
> > > This patch introduces a set of statistics to monitor different
> performance
> > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > statistics are exposed using debugfs and can be easily displayed with a
> 
> > > Python script (vhost_stat, based on the old kvm_stats)
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > >
> > >
> > > 5. Add heuristics to improve I/O scheduling
> > > This patch enhances

Re: Elvis upstreaming plan

2013-11-26 Thread Abel Gordon


Anthony Liguori  wrote on 26/11/2013 08:05:00 PM:

>
> Razya Ladelsky  writes:
>
> > Hi all,
> >
> > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > ELVIS slides:
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> >
> >
> > According to the discussions that took place at the forum, upstreaming
> > some of the Elvis approaches seems to be a good idea, which we would
like
> > to pursue.
> >
> > Our plan for the first patches is the following:
> >
> > 1.Shared vhost thread between mutiple devices
> > This patch creates a worker thread and worker queue shared across
multiple
> > virtio devices
> > We would like to modify the patch posted in
> > https://github.com/abelg/virtual_io_acceleration/commit/
> 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > to limit a vhost thread to serve multiple devices only if they belong
to
> > the same VM as Paolo suggested to avoid isolation or cgroups concerns.
> >
> > Another modification is related to the creation and removal of vhost
> > threads, which will be discussed next.
>
> I think this is an exceptionally bad idea.
>
> We shouldn't throw away isolation without exhausting every other
> possibility.

Seems you have missed the important details here.
Anthony, we are aware you are concerned about isolation
and you believe we should not share a single vhost thread across
multiple VMs.  That's why Razya proposed to change the patch
so we will serve multiple virtio devices using a single vhost thread
"only if the devices belong to the same VM". This series of patches
will not allow two different VMs to share the same vhost thread.
So, I don't see why this will be throwing away isolation and why
this could be a "exceptionally bad idea".

By the way, I remember that during the KVM forum a similar
approach of having a single data plane thread for many devices
was discussed

> We've seen very positive results from adding threads.  We should also
> look at scheduling.

...and we have also seen exceptionally negative results from
adding threads, both for vhost and data-plane. If you have lot of idle
time/cores
then it makes sense to run multiple threads. But IMHO in many scenarios you
don't have lot of idle time/cores.. and if you have them you would probably
prefer to run more VMs/VCPUshosting a single SMP VM when you have
enough physical cores to run all the VCPU threads and the I/O threads is
not a
realistic scenario.

That's why we are proposing to implement a mechanism that will enable
the management stack to configure 1 thread per I/O device (as it is today)
or 1 thread for many I/O devices (belonging to the same VM).

> Once you are scheduling multiple guests in a single vhost device, you
> now create a whole new class of DoS attacks in the best case scenario.

Again, we are NOT proposing to schedule multiple guests in a single
vhost thread. We are proposing to schedule multiple devices belonging
to the same guest in a single (or multiple) vhost thread/s.

>
> > 2. Sysfs mechanism to add and remove vhost threads
> > This patch allows us to add and remove vhost threads dynamically.
> >
> > A simpler way to control the creation of vhost threads is statically
> > determining the maximum number of virtio devices per worker via a
kernel
> > module parameter (which is the way the previously mentioned patch is
> > currently implemented)
> >
> > I'd like to ask for advice here about the more preferable way to go:
> > Although having the sysfs mechanism provides more flexibility, it may
be a
> > good idea to start with a simple static parameter, and have the first
> > patches as simple as possible. What do you think?
> >
> > 3.Add virtqueue polling mode to vhost
> > Have the vhost thread poll the virtqueues with high I/O rate for new
> > buffers , and avoid asking the guest to kick us.
> > https://github.com/abelg/virtual_io_acceleration/commit/
> 26616133fafb7855cc80fac070b0572fd1aaf5d0
>
> Ack on this.

:)

Regards,
Abel.

>
> Regards,
>
> Anthony Liguori
>
> > 4. vhost statistics
> > This patch introduces a set of statistics to monitor different
performance
> > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > statistics are exposed using debugfs and can be easily displayed with a

> > Python script (vhost_stat, based on the old kvm_stats)
> > https://github.com/abelg/virtual_io_acceleration/commit/
> ac14206ea56939ecc3608dc5f978b86fa322e7b0
> >
> >
> > 5. Add heuristics to improve I/O scheduling
> > This patch enhances the round-robin mechanism with a set of heuristics
to
> > decide when to leave a virtqueue and proceed to the next.
> > https://github.com/abelg/virtual_io_acceleration/commit/
> f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> >
> > This patch improves the handling of the requests by the vhost thread,
but
> > could perhaps be delayed to a
> > later time , and not submitted as

Re: Elvis upstreaming plan

2013-11-26 Thread Anthony Liguori
Razya Ladelsky  writes:

> Hi all,
>
> I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
> developed Elvis, presented by Abel Gordon at the last KVM forum: 
> ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
> ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
>
>
> According to the discussions that took place at the forum, upstreaming 
> some of the Elvis approaches seems to be a good idea, which we would like 
> to pursue.
>
> Our plan for the first patches is the following: 
>
> 1.Shared vhost thread between mutiple devices 
> This patch creates a worker thread and worker queue shared across multiple 
> virtio devices 
> We would like to modify the patch posted in
> https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
>  
> to limit a vhost thread to serve multiple devices only if they belong to 
> the same VM as Paolo suggested to avoid isolation or cgroups concerns.
>
> Another modification is related to the creation and removal of vhost 
> threads, which will be discussed next.

I think this is an exceptionally bad idea.

We shouldn't throw away isolation without exhausting every other
possibility.

We've seen very positive results from adding threads.  We should also
look at scheduling.

Once you are scheduling multiple guests in a single vhost device, you
now create a whole new class of DoS attacks in the best case scenario.

> 2. Sysfs mechanism to add and remove vhost threads 
> This patch allows us to add and remove vhost threads dynamically.
>
> A simpler way to control the creation of vhost threads is statically 
> determining the maximum number of virtio devices per worker via a kernel 
> module parameter (which is the way the previously mentioned patch is 
> currently implemented)
>
> I'd like to ask for advice here about the more preferable way to go:
> Although having the sysfs mechanism provides more flexibility, it may be a 
> good idea to start with a simple static parameter, and have the first 
> patches as simple as possible. What do you think?
>
> 3.Add virtqueue polling mode to vhost 
> Have the vhost thread poll the virtqueues with high I/O rate for new 
> buffers , and avoid asking the guest to kick us.
> https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0

Ack on this.

Regards,

Anthony Liguori

> 4. vhost statistics
> This patch introduces a set of statistics to monitor different performance 
> metrics of vhost and our polling and I/O scheduling mechanisms. The 
> statistics are exposed using debugfs and can be easily displayed with a 
> Python script (vhost_stat, based on the old kvm_stats)
> https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
>
>
> 5. Add heuristics to improve I/O scheduling 
> This patch enhances the round-robin mechanism with a set of heuristics to 
> decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
>
> This patch improves the handling of the requests by the vhost thread, but 
> could perhaps be delayed to a 
> later time , and not submitted as one of the first Elvis patches.
> I'd love to hear some comments about whether this patch needs to be part 
> of the first submission.
>
> Any other feedback on this plan will be appreciated,
> Thank you,
> Razya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-26 Thread Stefan Hajnoczi
On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote:
> 5. Add heuristics to improve I/O scheduling 
> This patch enhances the round-robin mechanism with a set of heuristics to 
> decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d

This patch should probably do something portable instead of relying on
x86-only rdtscll().

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-25 Thread Razya Ladelsky
"Michael S. Tsirkin"  wrote on 24/11/2013 12:26:15 PM:

> From: "Michael S. Tsirkin" 
> To: Razya Ladelsky/Haifa/IBM@IBMIL, 
> Cc: kvm@vger.kernel.org, anth...@codemonkey.ws, g...@redhat.com, 
> pbonz...@redhat.com, as...@redhat.com, jasow...@redhat.com, 
> digitale...@google.com, abel.gor...@gmail.com, Abel Gordon/Haifa/
> IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL
> Date: 24/11/2013 12:22 PM
> Subject: Re: Elvis upstreaming plan
> 
> On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote:
> > Hi all,
> > 
> > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
> > developed Elvis, presented by Abel Gordon at the last KVM forum: 
> > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
> > ELVIS slides: 
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
> > 
> > 
> > According to the discussions that took place at the forum, upstreaming 

> > some of the Elvis approaches seems to be a good idea, which we would 
like 
> > to pursue.
> > 
> > Our plan for the first patches is the following: 
> > 
> > 1.Shared vhost thread between mutiple devices 
> > This patch creates a worker thread and worker queue shared across 
multiple 
> > virtio devices 
> > We would like to modify the patch posted in
> > https://github.com/abelg/virtual_io_acceleration/commit/
> 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 
> > to limit a vhost thread to serve multiple devices only if they belong 
to 
> > the same VM as Paolo suggested to avoid isolation or cgroups concerns.
> > 
> > Another modification is related to the creation and removal of vhost 
> > threads, which will be discussed next.
> >
> > 2. Sysfs mechanism to add and remove vhost threads 
> > This patch allows us to add and remove vhost threads dynamically.
> > 
> > A simpler way to control the creation of vhost threads is statically 
> > determining the maximum number of virtio devices per worker via a 
kernel 
> > module parameter (which is the way the previously mentioned patch is 
> > currently implemented)
> > 
> > I'd like to ask for advice here about the more preferable way to go:
> > Although having the sysfs mechanism provides more flexibility, it may 
be a 
> > good idea to start with a simple static parameter, and have the first 
> > patches as simple as possible. What do you think?
> > 
> > 3.Add virtqueue polling mode to vhost 
> > Have the vhost thread poll the virtqueues with high I/O rate for new 
> > buffers , and avoid asking the guest to kick us.
> > https://github.com/abelg/virtual_io_acceleration/commit/
> 26616133fafb7855cc80fac070b0572fd1aaf5d0
> > 
> > 4. vhost statistics
> > This patch introduces a set of statistics to monitor different 
performance 
> > metrics of vhost and our polling and I/O scheduling mechanisms. The 
> > statistics are exposed using debugfs and can be easily displayed with 
a 
> > Python script (vhost_stat, based on the old kvm_stats)
> > https://github.com/abelg/virtual_io_acceleration/commit/
> ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > 
> > 
> > 5. Add heuristics to improve I/O scheduling 
> > This patch enhances the round-robin mechanism with a set of heuristics 
to 
> > decide when to leave a virtqueue and proceed to the next.
> > https://github.com/abelg/virtual_io_acceleration/commit/
> f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > 
> > This patch improves the handling of the requests by the vhost thread, 
but 
> > could perhaps be delayed to a 
> > later time , and not submitted as one of the first Elvis patches.
> > I'd love to hear some comments about whether this patch needs to be 
part 
> > of the first submission.
> > 
> > Any other feedback on this plan will be appreciated,
> > Thank you,
> > Razya
> 
> 
> How about we start with the stats patch?
> This will make it easier to evaluate the other patches.
> 

Hi Michael,
Thank you for your quick reply.
Our plan was to send all these patches that contain the Elvis code.
We can start with the stats patch, however, many of the statistics there 
are related to the features that the other patches provide...
B.T.W. If you a chance to look at the rest of the patches,
I'd really appreciate your comments,
Thank you very much,
Razya


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Elvis upstreaming plan

2013-11-24 Thread Michael S. Tsirkin
On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote:
> Hi all,
> 
> I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
> developed Elvis, presented by Abel Gordon at the last KVM forum: 
> ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
> ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
> 
> 
> According to the discussions that took place at the forum, upstreaming 
> some of the Elvis approaches seems to be a good idea, which we would like 
> to pursue.
> 
> Our plan for the first patches is the following: 
> 
> 1.Shared vhost thread between mutiple devices 
> This patch creates a worker thread and worker queue shared across multiple 
> virtio devices 
> We would like to modify the patch posted in
> https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
>  
> to limit a vhost thread to serve multiple devices only if they belong to 
> the same VM as Paolo suggested to avoid isolation or cgroups concerns.
> 
> Another modification is related to the creation and removal of vhost 
> threads, which will be discussed next.
>
> 2. Sysfs mechanism to add and remove vhost threads 
> This patch allows us to add and remove vhost threads dynamically.
> 
> A simpler way to control the creation of vhost threads is statically 
> determining the maximum number of virtio devices per worker via a kernel 
> module parameter (which is the way the previously mentioned patch is 
> currently implemented)
> 
> I'd like to ask for advice here about the more preferable way to go:
> Although having the sysfs mechanism provides more flexibility, it may be a 
> good idea to start with a simple static parameter, and have the first 
> patches as simple as possible. What do you think?
> 
> 3.Add virtqueue polling mode to vhost 
> Have the vhost thread poll the virtqueues with high I/O rate for new 
> buffers , and avoid asking the guest to kick us.
> https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
> 
> 4. vhost statistics
> This patch introduces a set of statistics to monitor different performance 
> metrics of vhost and our polling and I/O scheduling mechanisms. The 
> statistics are exposed using debugfs and can be easily displayed with a 
> Python script (vhost_stat, based on the old kvm_stats)
> https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
> 
> 
> 5. Add heuristics to improve I/O scheduling 
> This patch enhances the round-robin mechanism with a set of heuristics to 
> decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> 
> This patch improves the handling of the requests by the vhost thread, but 
> could perhaps be delayed to a 
> later time , and not submitted as one of the first Elvis patches.
> I'd love to hear some comments about whether this patch needs to be part 
> of the first submission.
> 
> Any other feedback on this plan will be appreciated,
> Thank you,
> Razya


How about we start with the stats patch?
This will make it easier to evaluate the other patches.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html