Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-08 Thread Sridhar Samudrala

On 12/7/2011 3:02 AM, Jason Wang wrote:

On 12/06/2011 11:42 PM, Sridhar Samudrala wrote:

On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com  
wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wangjasow...@redhat.com
wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?


Not sure it could publish the best mapping but the idea is to make 
sure the
packets of a flow were handled by the same guest vcpu and may be 
the same

vhost thread in order to eliminate the packet reordering and lock
contention. But this assumption does not take the bouncing of vhost 
or vcpu

threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.


Yes. Also the current model of  a vhost thread per VM's interface 
doesn't help with packet steering

all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per-CPU that can handle 
packets to/from physical NIC's
TX/RX queues. Currently we have a single vhost thread for a VM's i/f 
that handles all the packets from

various flows coming from a multi-queue physical NIC.


Even if we have per-cpu workthread, only one socket is used to queue 
the packet then, so a multiple queue(sockets) tap/macvtap is still 
needed.
I think so.  We need per-cpu tap/macvtap sockets along with per-cpu 
vhost threads.

This will parallelize all the way from physical NIC to vhost.

Thanks
Sridhar



Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Stefan Hajnoczi
On Wed, Dec 7, 2011 at 3:03 AM, Jason Wang jasow...@redhat.com wrote:
 On 12/06/2011 09:15 PM, Stefan Hajnoczi wrote:

 On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com  wrote:

 On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

 On Tue, Dec 6, 2011 at 6:33 AM, Jason Wangjasow...@redhat.com
  wrote:

 On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

 On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

 The vcpus are just threads and may not be bound to physical CPUs, so
 what is the big picture here?  Is the guest even in the position to
 set the best queue mappings today?


 Not sure it could publish the best mapping but the idea is to make sure
 the
 packets of a flow were handled by the same guest vcpu and may be the same
 vhost thread in order to eliminate the packet reordering and lock
 contention. But this assumption does not take the bouncing of vhost or
 vcpu
 threads which would also affect the result.

 Okay, this is why I'd like to know what the big picture here is.  What
 solution are you proposing?  How are we going to have everything from
 guest application, guest kernel, host threads, and host NIC driver
 play along so we get the right steering up the entire stack.  I think
 there needs to be an answer to that before changing virtio-net to add
 any steering mechanism.


 Consider the complexity of the host nic each with their own steering
 features,  this series make the first step with minimal effort to try to let
 guest driver and host tap/macvtap co-operate like what physical nic does.
 There may be other method, but performance numbers is also needed to give
 the answer.

I agree that performance results for this need to be shown.

My original point is really that it's not a good idea to take
individual steps without a good big picture because this will change
the virtio-net device specification.  If this turns out to be a dead
end then hosts will need to continue to support the interface forever
(legacy guests could still try to use it).  So please first explain
what the full stack picture is going to look like and how you think it
will lead to better performance.  You don't need to have all the code
or evidence, but just enough explanation so we see where this is all
going.

Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Jason Wang

On 12/06/2011 11:42 PM, Sridhar Samudrala wrote:

On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com  wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wangjasow...@redhat.com
wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?


Not sure it could publish the best mapping but the idea is to make 
sure the
packets of a flow were handled by the same guest vcpu and may be the 
same

vhost thread in order to eliminate the packet reordering and lock
contention. But this assumption does not take the bouncing of vhost 
or vcpu

threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.


Yes. Also the current model of  a vhost thread per VM's interface 
doesn't help with packet steering

all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per-CPU that can handle 
packets to/from physical NIC's
TX/RX queues. Currently we have a single vhost thread for a VM's i/f 
that handles all the packets from

various flows coming from a multi-queue physical NIC.


Even if we have per-CPU worker threads, only one socket is used to queue the packets, so a multi-queue (multi-socket) tap/macvtap is still needed.


Thanks
Sridhar





Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Jason Wang

On 12/07/2011 07:10 AM, Sridhar Samudrala wrote:

On 12/6/2011 8:14 AM, Michael S. Tsirkin wrote:

On Tue, Dec 06, 2011 at 07:42:54AM -0800, Sridhar Samudrala wrote:

On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com   
wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason 
Wangjasow...@redhat.com wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?
Not sure it could publish the best mapping but the idea is to make 
sure the
packets of a flow were handled by the same guest vcpu and may be 
the same

vhost thread in order to eliminate the packet reordering and lock
contention. But this assumption does not take the bouncing of 
vhost or vcpu

threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.



Yes. Also the current model of  a vhost thread per VM's interface
doesn't help with packet steering
all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per-CPU that can handle
packets to/from physical NIC's
TX/RX queues.
Currently we have a single vhost thread for a VM's i/f
that handles all the packets from
various flows coming from a multi-queue physical NIC.

Thanks
Sridhar

It's not hard to try that:
1. revert c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
this will convert our thread to a workqueue
2. convert the workqueue to a per-cpu one

It didn't work that well in the past, but YMMV
Yes. I tried this before we went ahead with per-interface vhost 
threading model.

At that time, per-cpu vhost  showed a regression with a single-VM and
per-vq vhost showed good performance improvements upto 8 VMs.

So  just making it per-cpu would not be enough. I think we may need a way
to schedule vcpu threads on the same cpu-socket as vhost.

Another aspect we need to look into is the splitting of vhost thread 
into separate
threads for TX and RX. Shirley is doing some work in this area and she 
is seeing
perf. improvements as long as TX and RX threads are on the same 
cpu-socket.


I emulated this with my multi-queue series in the past; it looks like it damages single-stream performance, especially guest TX.


On the surface I'd say a single thread makes some sense
as long as guest uses a single queue.

But this may not be scalable long term when we want to support a large 
number of VMs each

having multiple virtio-net interfaces with multiple queues.

Thanks
Sridhar





Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Jason Wang

On 12/07/2011 05:08 PM, Stefan Hajnoczi wrote:
[...]

  Consider the complexity of the host nic each with their own steering
  features,  this series make the first step with minimal effort to try to let
  guest driver and host tap/macvtap co-operate like what physical nic does.
  There may be other method, but performance numbers is also needed to give
  the answer.

I agree that performance results for this need to be shown.

My original point is really that it's not a good idea to take
individual steps without a good big picture because this will change
the virtio-net device specification.  If this turns out to be a dead
end then hosts will need to continue to support the interface forever
(legacy guests could still try to use it).  So please first explain
what the full stack picture is going to look like and how you think it
will lead to better performance.  You don't need to have all the code
or evidence, but just enough explanation so we see where this is all
going.

I think I mentioned too little in the cover message.

There are not many changes from Krishna's series except the method of 
choosing an rx virtqueue. Since the original series uses different hash 
methods in the host (rxhash) and the guest (txhash), a different virtqueue 
could be chosen for a flow, which could lead packets of one flow to be 
handled by different vhost threads and vcpus. This may hurt performance.


This series tries to let one vhost thread process the packets of a flow, 
and also lets the packets be delivered directly to a vcpu local to the 
thread that processes the data. This is done by letting the guest tell 
the host the desired queue from which it wants to receive the packets of 
a given flow.


So passing the hash from host to guest is needed to get the same hash on 
both sides. A guest-programmable hash-to-queue table is then introduced, 
and the guest co-operates with the host through accelerated RFS in the 
guest.
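
As a rough illustration of the host side (the names below are made up for 
illustration; this is not code from the series or from the existing 
tap/macvtap):

/* Hypothetical sketch: how a host tap/macvtap backend might consult the
 * guest-programmed hash-to-queue table when picking the rx queue for an
 * incoming packet.  tap_dev, flow_table and tap_select_rxq() are
 * illustrative names only.
 */
#include <linux/skbuff.h>

#define TAP_HASH_MASK 0xFF	/* 256-entry table, as in the guest driver */

struct tap_dev {
	u16 flow_table[TAP_HASH_MASK + 1];	/* hash -> rx queue index */
	unsigned int numqueues;
};

static u16 tap_select_rxq(struct tap_dev *tap, struct sk_buff *skb)
{
	u32 rxhash = skb_get_rxhash(skb);	/* same hash the guest sees */

	if (!rxhash)
		return 0;			/* no hash: fall back to queue 0 */

	/* Honour the guest's mapping so one flow stays on one queue,
	 * i.e. one vhost thread and (ideally) one vcpu.
	 */
	return tap->flow_table[rxhash & TAP_HASH_MASK] % tap->numqueues;
}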







Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Stefan Hajnoczi
On Wed, Dec 7, 2011 at 12:10 PM, Jason Wang jasow...@redhat.com wrote:
 On 12/07/2011 05:08 PM, Stefan Hajnoczi wrote:
 [...]

   Consider the complexity of the host nic each with their own steering
   features,  this series make the first step with minimal effort to try
  to let
   guest driver and host tap/macvtap co-operate like what physical nic
  does.
   There may be other method, but performance numbers is also needed to
  give
   the answer.

 I agree that performance results for this need to be shown.

 My original point is really that it's not a good idea to take
 individual steps without a good big picture because this will change
 the virtio-net device specification.  If this turns out to be a dead
 end then hosts will need to continue to support the interface forever
 (legacy guests could still try to use it).  So please first explain
 what the full stack picture is going to look like and how you think it
 will lead to better performance.  You don't need to have all the code
 or evidence, but just enough explanation so we see where this is all
 going.

 I think I mention too little in the cover message.

 There's no much changes with Krishna's series except the method that
 choosing a rx virtqueue. Since original series use different hash methods in
 host (rxhash) and guest (txhash), a different virtqueue were chose for a
 flow which could lead packets of a flow to be handled by different vhost
 thread and vcpu. This may damage the performance.

 This series tries to let one vhost thread to process the packets of a flow
 and also let the packets to be sent directly to a vcpu local to the thread
 process the data. This is done by letting guest tell the desired queue form
 which it want to receive the pakcet of a dedicated flow.

 So passing the hash from host to guest is needed to get the same hash in the
 two sides. Then a guest programmable hash to queue table were introduced and
 guest co-operate with the host through accelerate RFS in guest.

Thanks for the context, that helps me understand it a little better.

Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Stefan Hajnoczi
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang jasow...@redhat.com wrote:
 On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

 On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com  wrote:

 +static int virtnet_set_fd(struct net_device *dev, u32 pfn)
 +{
 +       struct virtnet_info *vi = netdev_priv(dev);
 +       struct virtio_device *vdev = vi->vdev;
 +
 +       if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
 +               vdev->config->set(vdev,
 +                                 offsetof(struct virtio_net_config_fd, addr),
 +                                 &pfn, sizeof(u32));

 Please use the virtio model (i.e. virtqueues) instead of shared
 memory.  Mapping a page breaks the virtio abstraction.


 Using control virtqueue is more suitable but there's are also some problems:

 One problem is the interface,  if we use control virtqueue, we need a
 interface between the backend and tap/macvtap to change the flow mapping.
 But qemu and vhost_net only know about the file descriptor, more
 informations or interfaces need to be exposed in order to let ethtool or
 ioctl work.

QEMU could map a shared page with tap/macvtap.  The difference
would be that the guest-host interface is still virtio and QEMU
pokes values into the shared page on behalf of the guest.

 Another problem is the delay introduced by ctrl vq, as the ctrl vq would be
 used in the critical path in guest and it use busy wait to get the response,
 the delay is not neglectable.

Then you need to find a better way of doing this.  Can the host
automatically associate the flow from the tx virtqueue packets are
transmitted on?  Does it make sense to add a virtio_net_hdr field that
updates the queue mapping?
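
For illustration only, such a field might look something like the sketch 
below; the flag value and field names are assumptions, not anything 
defined by the virtio spec or this series:

/* Purely illustrative layout for a per-packet steering hint; the
 * rxq_hint field and VIRTIO_NET_HDR_F_RXQ_HINT flag are assumptions.
 */
#include <linux/virtio_net.h>

#define VIRTIO_NET_HDR_F_RXQ_HINT	4	/* hypothetical flag value */

struct virtio_net_hdr_steer {
	struct virtio_net_hdr hdr;	/* existing per-packet header */
	__u16 rxq_hint;			/* guest's preferred rx queue for this flow */
};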

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?

Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Jason Wang

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 6:33 AM, Jason Wangjasow...@redhat.com  wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang <jasow...@redhat.com> wrote:

+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared
memory.  Mapping a page breaks the virtio abstraction.


Using control virtqueue is more suitable but there's are also some problems:

One problem is the interface,  if we use control virtqueue, we need a
interface between the backend and tap/macvtap to change the flow mapping.
But qemu and vhost_net only know about the file descriptor, more
informations or interfaces need to be exposed in order to let ethtool or
ioctl work.

QEMU could provide map a shared page with tap/macvtap.  The difference
would be that the guest-host interface is still virtio and QEMU
pokes values into the shared page on behalf of the guest.


This makes sense.


Another problem is the delay introduced by ctrl vq, as the ctrl vq would be
used in the critical path in guest and it use busy wait to get the response,
the delay is not neglectable.

Then you need to find a better way of doing this.  Can the host
automatically associate the flow from the tx virtqueue packets are
transmitted on?  Does it make sense to add a virtio_net_hdr field that
updates the queue mapping?


It can, but it cannot properly handle the packet re-ordering caused by 
the movement of guest applications among guest cpus. One more problem 
with virtio_net_hdr is that we need to build an empty packet when there 
is no other packet to send.


One solution is to introduce a non-blocking command for the ctrl vq.

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?


I'm not sure it could publish the best mapping, but the idea is to make 
sure the packets of a flow are handled by the same guest vcpu, and 
possibly the same vhost thread, in order to eliminate packet reordering 
and lock contention. But this assumption does not take into account the 
bouncing of vhost or vcpu threads, which would also affect the result.


Anyway, the mapping from the guest is an important reference.

Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Stefan Hajnoczi
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang jasow...@redhat.com wrote:
 On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

 On Tue, Dec 6, 2011 at 6:33 AM, Jason Wangjasow...@redhat.com  wrote:

 On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

 On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:
 The vcpus are just threads and may not be bound to physical CPUs, so
 what is the big picture here?  Is the guest even in the position to
 set the best queue mappings today?


 Not sure it could publish the best mapping but the idea is to make sure the
 packets of a flow were handled by the same guest vcpu and may be the same
 vhost thread in order to eliminate the packet reordering and lock
 contention. But this assumption does not take the bouncing of vhost or vcpu
 threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.

Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Sridhar Samudrala

On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com  wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang <jasow...@redhat.com> wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?


Not sure it could publish the best mapping but the idea is to make sure the
packets of a flow were handled by the same guest vcpu and may be the same
vhost thread in order to eliminate the packet reordering and lock
contention. But this assumption does not take the bouncing of vhost or vcpu
threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.


Yes. Also, the current model of a vhost thread per VM interface doesn't 
help with packet steering all the way from the guest to the host 
physical NIC.

I think we need to have per-CPU vhost thread(s) that can handle packets 
to/from the physical NIC's TX/RX queues. Currently we have a single 
vhost thread per VM interface that handles all the packets from the 
various flows coming from a multi-queue physical NIC.

Thanks
Sridhar



Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Michael S. Tsirkin
On Tue, Dec 06, 2011 at 07:42:54AM -0800, Sridhar Samudrala wrote:
 On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
 On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com  wrote:
 On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang <jasow...@redhat.com> wrote:
 On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
 On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
   wrote:
 The vcpus are just threads and may not be bound to physical CPUs, so
 what is the big picture here?  Is the guest even in the position to
 set the best queue mappings today?
 
 Not sure it could publish the best mapping but the idea is to make sure the
 packets of a flow were handled by the same guest vcpu and may be the same
 vhost thread in order to eliminate the packet reordering and lock
 contention. But this assumption does not take the bouncing of vhost or vcpu
 threads which would also affect the result.
 Okay, this is why I'd like to know what the big picture here is.  What
 solution are you proposing?  How are we going to have everything from
 guest application, guest kernel, host threads, and host NIC driver
 play along so we get the right steering up the entire stack.  I think
 there needs to be an answer to that before changing virtio-net to add
 any steering mechanism.
 
 
 Yes. Also the current model of  a vhost thread per VM's interface
 doesn't help with packet steering
 all the way from the guest to the host physical NIC.
 
 I think we need to have vhost thread(s) per-CPU that can handle
 packets to/from physical NIC's
 TX/RX queues.
 Currently we have a single vhost thread for a VM's i/f
 that handles all the packets from
 various flows coming from a multi-queue physical NIC.
 
 Thanks
 Sridhar

It's not hard to try that:
1. revert c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
   this will convert our thread to a workqueue
2. convert the workqueue to a per-cpu one

It didn't work that well in the past, but YMMV
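
A rough sketch of step 2, using only stock workqueue APIs (the vhost_wq 
name and the queue-on-the-kicking-CPU policy are assumptions for 
illustration, not the code from the reverted commit):

/* Per-cpu work submission sketch: a bound workqueue plus queue_work_on()
 * so each virtqueue's work runs on the CPU that kicked it.
 */
#include <linux/workqueue.h>
#include <linux/smp.h>

static struct workqueue_struct *vhost_wq;

static int vhost_wq_init(void)
{
	/* A bound workqueue: work queued with queue_work_on() runs on the
	 * CPU it was queued for.
	 */
	vhost_wq = alloc_workqueue("vhost", 0, 0);
	return vhost_wq ? 0 : -ENOMEM;
}

static void vhost_work_queue_percpu(struct work_struct *work)
{
	/* Handle the virtqueue on the CPU that kicked it, so packets stay
	 * close to the vcpu that produced them.
	 */
	queue_work_on(smp_processor_id(), vhost_wq, work);
}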

On the surface I'd say a single thread makes some sense
as long as guest uses a single queue.

-- 
MST


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Ben Hutchings
On Tue, 2011-12-06 at 15:25 +0800, Jason Wang wrote:
 On 12/06/2011 04:42 AM, Ben Hutchings wrote:
[...]
  This is not a proper implementation of ndo_rx_flow_steer.  If you steer
  a flow by changing the RSS table this can easily cause packet reordering
  in other flows.  The filtering should be more precise, ideally matching
  exactly a single flow by e.g. VID and IP 5-tuple.
 
  I think you need to add a second hash table which records exactly which
  flow is supposed to be steered.  Also, you must call
  rps_may_expire_flow() to check whether an entry in this table may be
  replaced; otherwise you can cause packet reordering in the flow that was
  previously being steered.
 
  Finally, this function must return the table index it assigned, so that
  rps_may_expire_flow() works.
 
 Thanks for the explanation, how about document this briefly in scaling.txt?
[...]

I believe scaling.txt is intended for users/administrators, not
developers.

The documentation for implementers of accelerated RFS is in the comment
for struct net_device_ops and the commit message adding it.  But I
really should improve that comment.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Sridhar Samudrala

On 12/6/2011 8:14 AM, Michael S. Tsirkin wrote:

On Tue, Dec 06, 2011 at 07:42:54AM -0800, Sridhar Samudrala wrote:

On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com   wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 6:33 AM, Jason Wangjasow...@redhat.com wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?

Not sure it could publish the best mapping but the idea is to make sure the
packets of a flow were handled by the same guest vcpu and may be the same
vhost thread in order to eliminate the packet reordering and lock
contention. But this assumption does not take the bouncing of vhost or vcpu
threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.



Yes. Also the current model of  a vhost thread per VM's interface
doesn't help with packet steering
all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per-CPU that can handle
packets to/from physical NIC's
TX/RX queues.
Currently we have a single vhost thread for a VM's i/f
that handles all the packets from
various flows coming from a multi-queue physical NIC.

Thanks
Sridhar

It's not hard to try that:
1. revert c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
this will convert our thread to a workqueue
2. convert the workqueue to a per-cpu one

It didn't work that well in the past, but YMMV
Yes. I tried this before we went ahead with the per-interface vhost 
threading model.

At that time, per-cpu vhost showed a regression with a single VM, and
per-vq vhost showed good performance improvements up to 8 VMs.

So just making it per-cpu would not be enough. I think we may need a way
to schedule vcpu threads on the same cpu-socket as vhost.

Another aspect we need to look into is splitting the vhost thread into 
separate threads for TX and RX. Shirley is doing some work in this area 
and she is seeing perf improvements as long as the TX and RX threads are 
on the same cpu-socket.


On the surface I'd say a single thread makes some sense
as long as guest uses a single queue.

But this may not be scalable long term when we want to support a large 
number of VMs, each having multiple virtio-net interfaces with multiple 
queues.

Thanks
Sridhar



Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Jason Wang

On 12/06/2011 09:15 PM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com  wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang <jasow...@redhat.com> wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?


Not sure it could publish the best mapping but the idea is to make sure the
packets of a flow were handled by the same guest vcpu and may be the same
vhost thread in order to eliminate the packet reordering and lock
contention. But this assumption does not take the bouncing of vhost or vcpu
threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.


Considering the complexity of host nics, each with their own steering 
features, this series makes a first step with minimal effort to try to 
let the guest driver and host tap/macvtap co-operate the way a physical 
nic does. There may be other methods, but performance numbers are also 
needed to give the answer.


Stefan


[net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Jason Wang
In order to let the packets of a flow be passed to the desired
guest cpu, we can co-operate with the device by programming the flow
director, which is just a hash-to-queue table.

This kind of co-operation is done through the accelerated RFS support:
a device-specific flow steering method, virtnet_fd(), is used to modify
the flow director based on the RFS mapping. The desired queue is
calculated through a reverse mapping of the irq affinity table. In order
to parallelize the ingress path, the irq affinity of each rx queue is
also provided by the driver.

In addition to accelerated RFS, we can also use the guest scheduler to
balance the TX load and reduce lock contention on the egress path, so
processor_id() is used for tx queue selection.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c   |  165 +++-
 include/linux/virtio_net.h |6 ++
 2 files changed, 169 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0d871f8..89bb5e7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,10 @@
 #include <linux/scatterlist.h>
 #include <linux/if_vlan.h>
 #include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/cpu_rmap.h>
+#include <linux/interrupt.h>
+#include <linux/cpumask.h>
 
 static int napi_weight = 128;
 module_param(napi_weight, int, 0444);
@@ -40,6 +44,7 @@ module_param(gso, bool, 0444);
 
 #define VIRTNET_SEND_COMMAND_SG_MAX	2
 #define VIRTNET_DRIVER_VERSION "1.0.0"
+#define TAP_HASH_MASK 0xFF
 
 struct virtnet_send_stats {
struct u64_stats_sync syncp;
@@ -89,6 +94,9 @@ struct receive_queue {
 
/* Active rx statistics */
struct virtnet_recv_stats __percpu *stats;
+
+   /* FIXME: per vector instead of per queue ?? */
+   cpumask_var_t affinity_mask;
 };
 
 struct virtnet_info {
@@ -110,6 +118,11 @@ struct virtnet_info {
 
/* Host will pass rxhash to us. */
bool has_rxhash;
+
+   /* A page of flow director */
+   struct page *fd_page;
+
+   cpumask_var_t affinity_mask;
 };
 
 struct skb_vnet_hdr {
@@ -386,6 +399,7 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 	if (vi->has_rxhash)
 		skb->rxhash = hdr->rhdr.rxhash;
 
+	skb_record_rx_queue(skb, rq->vq->queue_index / 2);
 	netif_receive_skb(skb);
 	return;
 
@@ -722,6 +736,19 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));
+	}
+	return 0;
+}
+
 static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
struct virtnet_info *vi = netdev_priv(dev);
@@ -1017,6 +1044,39 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
return 0;
 }
 
+#ifdef CONFIG_RFS_ACCEL
+
+int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
+	       u16 rxq_index, u32 flow_id)
+{
+	struct virtnet_info *vi = netdev_priv(net_dev);
+	u16 *table = NULL;
+
+	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
+		return -EPROTONOSUPPORT;
+
+	table = kmap_atomic(vi->fd_page);
+	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
+	kunmap_atomic(table);
+
+	return 0;
+}
+#endif
+
+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+		  smp_processor_id();
+
+	/* As we make use of the accelerate rfs which let the scheduler to
+	 * balance the load, it make sense to choose the tx queue also based on
+	 * the processor id?
+	 */
+	while (unlikely(txq >= dev->real_num_tx_queues))
+		txq -= dev->real_num_tx_queues;
+	return txq;
+}
+
 static const struct net_device_ops virtnet_netdev = {
.ndo_open= virtnet_open,
.ndo_stop= virtnet_close,
@@ -1028,9 +1088,13 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_get_stats64 = virtnet_stats,
.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
+   .ndo_select_queue= virtnet_select_queue,
 #ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = virtnet_netpoll,
 #endif
+#ifdef CONFIG_RFS_ACCEL
+   .ndo_rx_flow_steer   = virtnet_fd,
+#endif
 };
 
 static void virtnet_update_status(struct virtnet_info *vi)
@@ -1272,12 +1336,76 @@ static int virtnet_setup_vqs(struct virtnet_info *vi)
 

Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Stefan Hajnoczi
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang jasow...@redhat.com wrote:
 +static int virtnet_set_fd(struct net_device *dev, u32 pfn)
 +{
 +       struct virtnet_info *vi = netdev_priv(dev);
 +       struct virtio_device *vdev = vi->vdev;
 +
 +       if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
 +               vdev->config->set(vdev,
 +                                 offsetof(struct virtio_net_config_fd, addr),
 +                                 &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared
memory.  Mapping a page breaks the virtio abstraction.

Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Ben Hutchings
On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote:
 In order to let the packets of a flow to be passed to the desired
 guest cpu, we can co-operate with devices through programming the flow
 director which was just a hash to queue table.
 
 This kinds of co-operation is done through the accelerate RFS support,
 a device specific flow sterring method virtnet_fd() is used to modify
 the flow director based on rfs mapping. The desired queue were
 calculated through reverse mapping of the irq affinity table. In order
 to parallelize the ingress path, irq affinity of rx queue were also
 provides by the driver.
 
 In addition to accelerate RFS, we can also use the guest scheduler to
 balance the load of TX and reduce the lock contention on egress path,
 so the processor_id() were used to tx queue selection.
[...]
 +#ifdef CONFIG_RFS_ACCEL
 +
 +int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
 +	       u16 rxq_index, u32 flow_id)
 +{
 +	struct virtnet_info *vi = netdev_priv(net_dev);
 +	u16 *table = NULL;
 +
 +	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
 +		return -EPROTONOSUPPORT;

Why only IPv4?

 +	table = kmap_atomic(vi->fd_page);
 +	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
 +	kunmap_atomic(table);
 +
 +	return 0;
 +}
 +#endif

This is not a proper implementation of ndo_rx_flow_steer.  If you steer
a flow by changing the RSS table this can easily cause packet reordering
in other flows.  The filtering should be more precise, ideally matching
exactly a single flow by e.g. VID and IP 5-tuple.

I think you need to add a second hash table which records exactly which
flow is supposed to be steered.  Also, you must call
rps_may_expire_flow() to check whether an entry in this table may be
replaced; otherwise you can cause packet reordering in the flow that was
previously being steered.

Finally, this function must return the table index it assigned, so that
rps_may_expire_flow() works.
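
To sketch the shape of that, in the context of drivers/net/virtio_net.c 
(the per-slot bookkeeping below is invented for illustration; only 
rps_may_expire_flow() is the real helper being referred to):

/* Skeleton of the pattern described above, not the patch's code. */
#include <linux/netdevice.h>

#ifdef CONFIG_RFS_ACCEL
struct virtnet_flow {
	u32 flow_id;	/* RFS flow id currently steered via this slot */
	u16 rxq;	/* rx queue it was steered to */
	bool in_use;
};

static struct virtnet_flow flow_tbl[TAP_HASH_MASK + 1];

static int virtnet_rx_flow_steer(struct net_device *dev,
				 const struct sk_buff *skb,
				 u16 rxq_index, u32 flow_id)
{
	u16 idx = skb->rxhash & TAP_HASH_MASK;	/* filter/table slot */
	struct virtnet_flow *fl = &flow_tbl[idx];

	/* Don't replace a slot that still steers a live flow, otherwise
	 * that flow may be reordered.
	 */
	if (fl->in_use &&
	    !rps_may_expire_flow(dev, fl->rxq, fl->flow_id, idx))
		return -EBUSY;

	fl->flow_id = flow_id;
	fl->rxq = rxq_index;
	fl->in_use = true;
	/* ... update the host-visible hash-to-queue table here ... */

	/* Return the slot index so the stack can later query it through
	 * rps_may_expire_flow().
	 */
	return idx;
}
#endif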

 +static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
 +{
 +	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
 +		  smp_processor_id();
 +
 +	/* As we make use of the accelerate rfs which let the scheduler to
 +	 * balance the load, it make sense to choose the tx queue also based on
 +	 * the processor id?
 +	 */
 +	while (unlikely(txq >= dev->real_num_tx_queues))
 +		txq -= dev->real_num_tx_queues;
 +	return txq;
 +}
[...]

Don't do this, let XPS handle it.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Jason Wang

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com  wrote:

+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared
memory.  Mapping a page breaks the virtio abstraction.


Using the control virtqueue is more suitable, but there are also some problems:

One problem is the interface: if we use the control virtqueue, we need 
an interface between the backend and tap/macvtap to change the flow 
mapping. But qemu and vhost_net only know about the file descriptor; 
more information or interfaces need to be exposed in order to let 
ethtool or ioctl work.


Another problem is the delay introduced by the ctrl vq: as the ctrl vq 
would be used in the critical path in the guest, and it uses busy 
waiting to get the response, the delay is not negligible.



Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Jason Wang

On 12/06/2011 04:42 AM, Ben Hutchings wrote:

On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote:

In order to let the packets of a flow to be passed to the desired
guest cpu, we can co-operate with devices through programming the flow
director which was just a hash to queue table.

This kinds of co-operation is done through the accelerate RFS support,
a device specific flow sterring method virtnet_fd() is used to modify
the flow director based on rfs mapping. The desired queue were
calculated through reverse mapping of the irq affinity table. In order
to parallelize the ingress path, irq affinity of rx queue were also
provides by the driver.

In addition to accelerate RFS, we can also use the guest scheduler to
balance the load of TX and reduce the lock contention on egress path,
so the processor_id() were used to tx queue selection.

[...]

+#ifdef CONFIG_RFS_ACCEL
+
+int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
+	       u16 rxq_index, u32 flow_id)
+{
+	struct virtnet_info *vi = netdev_priv(net_dev);
+	u16 *table = NULL;
+
+	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
+		return -EPROTONOSUPPORT;

Why only IPv4?


Oops, IPv6 should work also.

+	table = kmap_atomic(vi->fd_page);
+	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
+	kunmap_atomic(table);
+
+	return 0;
+}
+#endif

This is not a proper implementation of ndo_rx_flow_steer.  If you steer
a flow by changing the RSS table this can easily cause packet reordering
in other flows.  The filtering should be more precise, ideally matching
exactly a single flow by e.g. VID and IP 5-tuple.

I think you need to add a second hash table which records exactly which
flow is supposed to be steered.  Also, you must call
rps_may_expire_flow() to check whether an entry in this table may be
replaced; otherwise you can cause packet reordering in the flow that was
previously being steered.

Finally, this function must return the table index it assigned, so that
rps_may_expire_flow() works.


Thanks for the explanation. How about documenting this briefly in scaling.txt?

+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+		  smp_processor_id();
+
+	/* As we make use of the accelerate rfs which let the scheduler to
+	 * balance the load, it make sense to choose the tx queue also based on
+	 * the processor id?
+	 */
+	while (unlikely(txq >= dev->real_num_tx_queues))
+		txq -= dev->real_num_tx_queues;
+	return txq;
+}

[...]

Don't do this, let XPS handle it.

Ben.


