Re: [PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions

2017-04-24 Thread Michael S. Tsirkin
On Thu, Apr 20, 2017 at 11:34:57AM -0400, Vlad Yasevich wrote:
> > - For 1.1, do we really want something like the vnet header? AFAIK, it is
> > not used by modern NICs. Would it be better to pack all metadata into the
> > descriptor itself? This may need some changes in tun/macvtap, but it looks
> > more PCIe friendly.
> 
> That would really be ideal and I've looked at this.

We already have at least 16 unused bits in the used ring
(the head index is 16 bits, but we use a 32-bit field for it).
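
To make the point concrete: in the split ring, each used element stores
the head of the descriptor chain in a 32-bit id field, while the queue
size is capped so that a head index always fits in 16 bits. A minimal
sketch of how the upper half could carry per-buffer metadata follows.
The used_id_* helpers and the choice of the upper 16 bits are
assumptions for illustration only; nothing in the virtio spec or in this
thread defines them, and endianness conversion is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as a split-ring used element: two 32-bit fields. */
struct used_elem {
	uint32_t id;	/* head of the used descriptor chain */
	uint32_t len;	/* bytes written into the buffer */
};

static_assert(sizeof(struct used_elem) == 8, "used element is 8 bytes");

/* Hypothetical split of id: low 16 bits = head, high 16 bits = metadata. */
#define USED_ID_HEAD_MASK	UINT32_C(0xffff)
#define USED_ID_META_SHIFT	16

/* Combine a 16-bit head index and 16 bits of metadata into one id value. */
static inline uint32_t used_id_pack(uint16_t head, uint16_t meta)
{
	return ((uint32_t)meta << USED_ID_META_SHIFT) | head;
}

/* Recover the descriptor head index from a packed id. */
static inline uint16_t used_id_head(uint32_t id)
{
	return (uint16_t)(id & USED_ID_HEAD_MASK);
}

/* Recover the metadata bits from a packed id. */
static inline uint16_t used_id_meta(uint32_t id)
{
	return (uint16_t)(id >> USED_ID_META_SHIFT);
}
```

Sixteen bits is enough for a handful of flags or a VLAN TCI, but not for
every extension discussed in this thread, so this would complement
rather than replace a header or descriptor change.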

-- 
MST


Re: [PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions

2017-04-23 Thread Jason Wang



On 2017-04-21 21:08, Vlad Yasevich wrote:
> On 04/21/2017 12:05 AM, Jason Wang wrote:
>> On 2017-04-20 23:34, Vlad Yasevich wrote:
>>> On 04/17/2017 11:01 PM, Jason Wang wrote:
>>>> On 2017-04-16 00:38, Vladislav Yasevich wrote:
>>>>> Currently the virtio-net header is a fixed size, and adding things to
>>>>> it is rather difficult.  This series attempts to add the infrastructure
>>>>> as well as some extensions that try to resolve some deficiencies we
>>>>> currently have.
>>>>>
>>>>> First, the vnet header only has space for 16 flags.  This may not be
>>>>> enough in the future.  The extensions will provide space for 32 possible
>>>>> extension flags and 32 possible extensions.  These flags will be carried
>>>>> in the first pseudo extension header, the presence of which will be
>>>>> determined by the flag in the virtio-net header.
>>>>>
>>>>> The extensions themselves will immediately follow the extension header.
>>>>> They will be added to the packet in the same order as they appear in the
>>>>> extension flags.  No padding is placed between the extensions, and any
>>>>> extensions negotiated but not used by a given packet will convert to
>>>>> trailing padding.
>>>>
>>>> Do we need explicit padding (e.g. an extension) which could be controlled
>>>> by each side?
>>>
>>> I don't think so.  The size of the vnet header is set based on the
>>> extensions negotiated.  The one part I am not crazy about is that when a
>>> packet does not use any extensions, the data is still placed after the
>>> entire vnet header, which essentially adds a lot of padding.  However,
>>> that's really no different than if we simply grew the vnet header.
>>>
>>> The other thing I've tried before is putting extensions into their own sg
>>> buffer, but that made it slower.
>>
>> Yes.
>>
>>>>> For example:
>>>>> | vnet mrg hdr | ext hdr | ext 1 | ext 2 | ext 5 | .. pad .. | packet data |
>>>>
>>>> Just some rough thoughts:
>>>>
>>>> - Would it be better to use TLV instead of a bitmap here? One advantage
>>>>   of TLV is that the length is not limited by the length of the bitmap.
>>>
>>> But the disadvantage is that we add at least 4 bytes per extension of just
>>> TL data.  That makes this thing even longer.
>>
>> Yes, and it looks like the length is still limited by e.g. the length of T.
>
> Not only that, but it is also limited by skb->cb as a whole.  So putting
> extensions into a TLV style means we have fewer extensions for now, until we
> get rid of the skb->cb usage.
>
>>>> - For 1.1, do we really want something like the vnet header? AFAIK, it is
>>>>   not used by modern NICs. Would it be better to pack all metadata into
>>>>   the descriptor itself? This may need some changes in tun/macvtap, but
>>>>   it looks more PCIe friendly.
>>>
>>> That would really be ideal and I've looked at this.  There are small
>>> issues of exposing the 'net metadata' of the descriptor to taps so they
>>> can be filled in.  The alternative is to use a different control structure
>>> for the tap->qemu|vhost channel (that can be implementation specific) and
>>> have qemu|vhost populate the 'net metadata' of the descriptor.
>>
>> Yes, this needs some thought. For vhost, things look a little bit easier;
>> we can probably use msg_control.
>
> We can use msg_control in qemu as well, can't we?

AFAIK, it needs some changes since we don't export the socket to userspace.

> It really is a question of who is doing the work and the number of copies.
>
> I can take a closer look at how it would look if we extended the descriptor
> with type-specific data.  I don't know if other users of virtio would
> benefit from it?

Not sure, but we can have a common descriptor header followed by
device-specific metadata. This probably needs some prototype benchmarking
to see the benefits first.

Thanks


Re: [PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions

2017-04-21 Thread Vlad Yasevich
On 04/21/2017 12:05 AM, Jason Wang wrote:
> 
> 
> On 2017-04-20 23:34, Vlad Yasevich wrote:
>> On 04/17/2017 11:01 PM, Jason Wang wrote:
>>>
>>> On 2017-04-16 00:38, Vladislav Yasevich wrote:
>>>> Currently the virtio-net header is a fixed size, and adding things to
>>>> it is rather difficult.  This series attempts to add the infrastructure
>>>> as well as some extensions that try to resolve some deficiencies we
>>>> currently have.
>>>>
>>>> First, the vnet header only has space for 16 flags.  This may not be
>>>> enough in the future.  The extensions will provide space for 32 possible
>>>> extension flags and 32 possible extensions.  These flags will be carried
>>>> in the first pseudo extension header, the presence of which will be
>>>> determined by the flag in the virtio-net header.
>>>>
>>>> The extensions themselves will immediately follow the extension header.
>>>> They will be added to the packet in the same order as they appear in the
>>>> extension flags.  No padding is placed between the extensions, and any
>>>> extensions negotiated but not used by a given packet will convert to
>>>> trailing padding.
>>> Do we need explicit padding (e.g. an extension) which could be controlled
>>> by each side?
>> I don't think so.  The size of the vnet header is set based on the
>> extensions negotiated.  The one part I am not crazy about is that when a
>> packet does not use any extensions, the data is still placed after the
>> entire vnet header, which essentially adds a lot of padding.  However,
>> that's really no different than if we simply grew the vnet header.
>>
>> The other thing I've tried before is putting extensions into their own sg
>> buffer, but that made it slower.
>
> Yes.
>
>>
>>>> For example:
>>>> | vnet mrg hdr | ext hdr | ext 1 | ext 2 | ext 5 | .. pad .. | packet data |
>>> Just some rough thoughts:
>>>
>>> - Would it be better to use TLV instead of a bitmap here? One advantage
>>>   of TLV is that the length is not limited by the length of the bitmap.
>> But the disadvantage is that we add at least 4 bytes per extension of just
>> TL data.  That makes this thing even longer.
>
> Yes, and it looks like the length is still limited by e.g. the length of T.

Not only that, but it is also limited by skb->cb as a whole.  So putting
extensions into a TLV style means we have fewer extensions for now, until we
get rid of the skb->cb usage.

>
>>
>>> - For 1.1, do we really want something like the vnet header? AFAIK, it is
>>>   not used by modern NICs. Would it be better to pack all metadata into
>>>   the descriptor itself? This may need some changes in tun/macvtap, but
>>>   it looks more PCIe friendly.
>> That would really be ideal and I've looked at this.  There are small
>> issues of exposing the 'net metadata' of the descriptor to taps so they
>> can be filled in.  The alternative is to use a different control structure
>> for the tap->qemu|vhost channel (that can be implementation specific) and
>> have qemu|vhost populate the 'net metadata' of the descriptor.
>
> Yes, this needs some thought. For vhost, things look a little bit easier;
> we can probably use msg_control.
>

We can use msg_control in qemu as well, can't we?  It really is a question of
who is doing the work and the number of copies.

I can take a closer look at how it would look if we extended the descriptor
with type-specific data.  I don't know if other users of virtio would benefit
from it?

-vlad
> Thanks
>
>> Thanks
>> -vlad
>>
>>> Thanks
>>>
>>>> Extensions proposed in this series are:
>>>> - IPv6 fragment id extension
>>>>   * Currently, the guest-generated fragment id is discarded and the
>>>>     host generates an IPv6 fragment id if the packet has to be
>>>>     fragmented.  The code attempts to add time-based perturbation to
>>>>     id generation to make it harder to guess the next fragment id to
>>>>     be used.  However, doing this on the host may result in less
>>>>     perturbation (due to different timing) and might make id guessing
>>>>     easier.  Ideally, the ids generated by the guest should be used.
>>>>     One could also argue that we are "violating" the IPv6 protocol
>>>>     under a _strict_ interpretation of the spec.
>>>>
>>>> - VLAN header acceleration
>>>>   * Currently virtio does not do vlan header acceleration and instead
>>>>     uses software tagging.  One of the first things that the host will
>>>>     do is strip the vlan header out.  When passing the packet to a
>>>>     guest, the vlan header is re-inserted into the packet.  We can
>>>>     skip all that work if we can pass the vlan data in accelerated
>>>>     format.  Then the host will not do any extra work.  However, so
>>>>     far, this yielded a very small perf bump (only ~1%).  I am
>>>>     still looking into this.
>>>>
>>>> - UDP tunnel offload
>>>>   * 

Re: [PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions

2017-04-20 Thread Jason Wang



On 2017-04-20 23:34, Vlad Yasevich wrote:
> On 04/17/2017 11:01 PM, Jason Wang wrote:
>> On 2017-04-16 00:38, Vladislav Yasevich wrote:
>>> Currently the virtio-net header is a fixed size, and adding things to
>>> it is rather difficult.  This series attempts to add the infrastructure
>>> as well as some extensions that try to resolve some deficiencies we
>>> currently have.
>>>
>>> First, the vnet header only has space for 16 flags.  This may not be
>>> enough in the future.  The extensions will provide space for 32 possible
>>> extension flags and 32 possible extensions.  These flags will be carried
>>> in the first pseudo extension header, the presence of which will be
>>> determined by the flag in the virtio-net header.
>>>
>>> The extensions themselves will immediately follow the extension header.
>>> They will be added to the packet in the same order as they appear in the
>>> extension flags.  No padding is placed between the extensions, and any
>>> extensions negotiated but not used by a given packet will convert to
>>> trailing padding.
>>
>> Do we need explicit padding (e.g. an extension) which could be controlled
>> by each side?
>
> I don't think so.  The size of the vnet header is set based on the
> extensions negotiated.  The one part I am not crazy about is that when a
> packet does not use any extensions, the data is still placed after the
> entire vnet header, which essentially adds a lot of padding.  However,
> that's really no different than if we simply grew the vnet header.
>
> The other thing I've tried before is putting extensions into their own sg
> buffer, but that made it slower.

Yes.

>>> For example:
>>>    | vnet mrg hdr | ext hdr | ext 1 | ext 2 | ext 5 | .. pad .. | packet data |
>>
>> Just some rough thoughts:
>>
>> - Would it be better to use TLV instead of a bitmap here? One advantage
>>   of TLV is that the length is not limited by the length of the bitmap.
>
> But the disadvantage is that we add at least 4 bytes per extension of just
> TL data.  That makes this thing even longer.

Yes, and it looks like the length is still limited by e.g. the length of T.

>> - For 1.1, do we really want something like the vnet header? AFAIK, it is
>>   not used by modern NICs. Would it be better to pack all metadata into
>>   the descriptor itself? This may need some changes in tun/macvtap, but
>>   it looks more PCIe friendly.
>
> That would really be ideal and I've looked at this.  There are small
> issues of exposing the 'net metadata' of the descriptor to taps so they
> can be filled in.  The alternative is to use a different control structure
> for the tap->qemu|vhost channel (that can be implementation specific) and
> have qemu|vhost populate the 'net metadata' of the descriptor.

Yes, this needs some thought. For vhost, things look a little bit
easier; we can probably use msg_control.

Thanks

> Thanks
> -vlad
>
>> Thanks
>>
>>> Extensions proposed in this series are:
>>>    - IPv6 fragment id extension
>>>      * Currently, the guest-generated fragment id is discarded and the
>>>        host generates an IPv6 fragment id if the packet has to be
>>>        fragmented.  The code attempts to add time-based perturbation to
>>>        id generation to make it harder to guess the next fragment id to
>>>        be used.  However, doing this on the host may result in less
>>>        perturbation (due to different timing) and might make id guessing
>>>        easier.  Ideally, the ids generated by the guest should be used.
>>>        One could also argue that we are "violating" the IPv6 protocol
>>>        under a _strict_ interpretation of the spec.
>>>
>>>    - VLAN header acceleration
>>>      * Currently virtio does not do vlan header acceleration and instead
>>>        uses software tagging.  One of the first things that the host will
>>>        do is strip the vlan header out.  When passing the packet to a
>>>        guest, the vlan header is re-inserted into the packet.  We can
>>>        skip all that work if we can pass the vlan data in accelerated
>>>        format.  Then the host will not do any extra work.  However, so
>>>        far, this yielded a very small perf bump (only ~1%).  I am
>>>        still looking into this.
>>>
>>>    - UDP tunnel offload
>>>      * Similar to vlan acceleration, with this extension we can pass
>>>        additional data to the host to support GSO with udp tunnels and
>>>        possibly other encapsulations.  This yields a significant
>>>        performance improvement (still testing remote checksum code).
>>>
>>> An additional extension that is unfinished (due to still testing for any
>>> side-effects) is checksum passthrough to support drivers that set
>>> CHECKSUM_COMPLETE.  This would eliminate the need for guests to compute
>>> the software checksum.
>>>
>>> This series only takes care of virtio net.  I have additional patches for
>>> the host side (vhost and tap/macvtap as well as qemu), but wanted to get
>>> feedback on the general approach first.
>>>
>>> Vladislav Yasevich (6):
>>>   virtio-net: Remove the use the padded vnet_header structure
>>>   virtio-net: make header length handling uniform
>>>   virtio_net: Add basic skeleton for handling vnet header extensions.
>>>   virtio-net: Add support for IPv6 fragment id vnet header e

Re: [PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions

2017-04-20 Thread Vlad Yasevich
On 04/17/2017 11:01 PM, Jason Wang wrote:
> 
> 
> On 2017-04-16 00:38, Vladislav Yasevich wrote:
>> Currently the virtio-net header is a fixed size, and adding things to
>> it is rather difficult.  This series attempts to add the infrastructure
>> as well as some extensions that try to resolve some deficiencies we
>> currently have.
>>
>> First, the vnet header only has space for 16 flags.  This may not be
>> enough in the future.  The extensions will provide space for 32 possible
>> extension flags and 32 possible extensions.  These flags will be carried
>> in the first pseudo extension header, the presence of which will be
>> determined by the flag in the virtio-net header.
>>
>> The extensions themselves will immediately follow the extension header.
>> They will be added to the packet in the same order as they appear in the
>> extension flags.  No padding is placed between the extensions, and any
>> extensions negotiated but not used by a given packet will convert to
>> trailing padding.
>
> Do we need explicit padding (e.g. an extension) which could be controlled
> by each side?

I don't think so.  The size of the vnet header is set based on the extensions
negotiated.  The one part I am not crazy about is that when a packet does not
use any extensions, the data is still placed after the entire vnet header,
which essentially adds a lot of padding.  However, that's really no different
than if we simply grew the vnet header.

The other thing I've tried before is putting extensions into their own sg
buffer, but that made it slower.

>
>>
>> For example:
>>   | vnet mrg hdr | ext hdr | ext 1 | ext 2 | ext 5 | .. pad .. | packet data |
>
> Just some rough thoughts:
>
> - Would it be better to use TLV instead of a bitmap here? One advantage
>   of TLV is that the length is not limited by the length of the bitmap.

But the disadvantage is that we add at least 4 bytes per extension of just TL
data.  That makes this thing even longer.
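
The size trade-off can be sketched as follows. This is only an
illustration: struct vnet_tlv and both helper functions are hypothetical
names, and the 2-byte type plus 2-byte length split is an assumption
chosen to match the "at least 4 bytes" figure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLV prefix: 2-byte type + 2-byte length = 4 bytes of "TL". */
struct vnet_tlv {
	uint16_t type;
	uint16_t len;
};

/* Bitmap encoding: one shared 32-bit flags word, then the bare payloads. */
static inline size_t bitmap_encoded_len(const size_t *payload, size_t n)
{
	size_t total = sizeof(uint32_t);

	assert(payload != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += payload[i];
	return total;
}

/* TLV encoding: every extension pays for its own type/length prefix. */
static inline size_t tlv_encoded_len(const size_t *payload, size_t n)
{
	size_t total = 0;

	assert(payload != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += sizeof(struct vnet_tlv) + payload[i];
	return total;
}
```

With three extensions of 4, 4 and 8 bytes, the bitmap form needs 20 bytes
and the TLV form 28, so the TLV overhead grows with the number of
extensions while the bitmap overhead stays fixed at one flags word.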

> - For 1.1, do we really want something like the vnet header? AFAIK, it is
>   not used by modern NICs. Would it be better to pack all metadata into
>   the descriptor itself? This may need some changes in tun/macvtap, but
>   it looks more PCIe friendly.

That would really be ideal and I've looked at this.  There are small issues of
exposing the 'net metadata' of the descriptor to taps so they can be filled
in.  The alternative is to use a different control structure for the
tap->qemu|vhost channel (that can be implementation specific) and have
qemu|vhost populate the 'net metadata' of the descriptor.

Thanks
-vlad

>
> Thanks
>
>>
>> Extensions proposed in this series are:
>>   - IPv6 fragment id extension
>>     * Currently, the guest-generated fragment id is discarded and the
>>       host generates an IPv6 fragment id if the packet has to be
>>       fragmented.  The code attempts to add time-based perturbation to
>>       id generation to make it harder to guess the next fragment id to
>>       be used.  However, doing this on the host may result in less
>>       perturbation (due to different timing) and might make id guessing
>>       easier.  Ideally, the ids generated by the guest should be used.
>>       One could also argue that we are "violating" the IPv6 protocol
>>       under a _strict_ interpretation of the spec.
>>
>>   - VLAN header acceleration
>>     * Currently virtio does not do vlan header acceleration and instead
>>       uses software tagging.  One of the first things that the host will
>>       do is strip the vlan header out.  When passing the packet to a
>>       guest, the vlan header is re-inserted into the packet.  We can
>>       skip all that work if we can pass the vlan data in accelerated
>>       format.  Then the host will not do any extra work.  However, so
>>       far, this yielded a very small perf bump (only ~1%).  I am
>>       still looking into this.
>>
>>   - UDP tunnel offload
>>     * Similar to vlan acceleration, with this extension we can pass
>>       additional data to the host to support GSO with udp tunnels and
>>       possibly other encapsulations.  This yields a significant
>>       performance improvement (still testing remote checksum code).
>>
>> An additional extension that is unfinished (due to still testing for any
>> side-effects) is checksum passthrough to support drivers that set
>> CHECKSUM_COMPLETE.  This would eliminate the need for guests to compute
>> the software checksum.
>>
>> This series only takes care of virtio net.  I have additional patches for
>> the host side (vhost and tap/macvtap as well as qemu), but wanted to get
>> feedback on the general approach first.
>>
>> Vladislav Yasevich (6):
>>   virtio-net: Remove the use the padded vnet_header structure
>>   virtio-net: make header length handling uniform
>>   virtio_net: Add basic skeleton for handling vnet header extensions.
>>   virtio-net: Add support for IPv6 fragment id vnet header extension.
>>   virtio-net: Add support for vlan ac

Re: [PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions

2017-04-17 Thread Jason Wang



On 2017-04-16 00:38, Vladislav Yasevich wrote:
> Currently the virtio-net header is a fixed size, and adding things to
> it is rather difficult.  This series attempts to add the infrastructure
> as well as some extensions that try to resolve some deficiencies we
> currently have.
>
> First, the vnet header only has space for 16 flags.  This may not be
> enough in the future.  The extensions will provide space for 32 possible
> extension flags and 32 possible extensions.  These flags will be carried
> in the first pseudo extension header, the presence of which will be
> determined by the flag in the virtio-net header.
>
> The extensions themselves will immediately follow the extension header.
> They will be added to the packet in the same order as they appear in the
> extension flags.  No padding is placed between the extensions, and any
> extensions negotiated but not used by a given packet will convert to
> trailing padding.

Do we need explicit padding (e.g. an extension) which could be
controlled by each side?

> For example:
>   | vnet mrg hdr | ext hdr | ext 1 | ext 2 | ext 5 | .. pad .. | packet data |

Just some rough thoughts:

- Would it be better to use TLV instead of a bitmap here? One advantage of
  TLV is that the length is not limited by the length of the bitmap.
- For 1.1, do we really want something like the vnet header? AFAIK, it is
  not used by modern NICs. Would it be better to pack all metadata into
  the descriptor itself? This may need some changes in tun/macvtap, but
  it looks more PCIe friendly.

Thanks

> Extensions proposed in this series are:
>   - IPv6 fragment id extension
>     * Currently, the guest-generated fragment id is discarded and the
>       host generates an IPv6 fragment id if the packet has to be
>       fragmented.  The code attempts to add time-based perturbation to
>       id generation to make it harder to guess the next fragment id to
>       be used.  However, doing this on the host may result in less
>       perturbation (due to different timing) and might make id guessing
>       easier.  Ideally, the ids generated by the guest should be used.
>       One could also argue that we are "violating" the IPv6 protocol
>       under a _strict_ interpretation of the spec.
>
>   - VLAN header acceleration
>     * Currently virtio does not do vlan header acceleration and instead
>       uses software tagging.  One of the first things that the host will
>       do is strip the vlan header out.  When passing the packet to a
>       guest, the vlan header is re-inserted into the packet.  We can
>       skip all that work if we can pass the vlan data in accelerated
>       format.  Then the host will not do any extra work.  However, so
>       far, this yielded a very small perf bump (only ~1%).  I am
>       still looking into this.
>
>   - UDP tunnel offload
>     * Similar to vlan acceleration, with this extension we can pass
>       additional data to the host to support GSO with udp tunnels and
>       possibly other encapsulations.  This yields a significant
>       performance improvement (still testing remote checksum code).
>
> An additional extension that is unfinished (due to still testing for any
> side-effects) is checksum passthrough to support drivers that set
> CHECKSUM_COMPLETE.  This would eliminate the need for guests to compute
> the software checksum.
>
> This series only takes care of virtio net.  I have additional patches for
> the host side (vhost and tap/macvtap as well as qemu), but wanted to get
> feedback on the general approach first.
>
> Vladislav Yasevich (6):
>    virtio-net: Remove the use the padded vnet_header structure
>    virtio-net: make header length handling uniform
>    virtio_net: Add basic skeleton for handling vnet header extensions.
>    virtio-net: Add support for IPv6 fragment id vnet header extension.
>    virtio-net: Add support for vlan acceleration vnet header extension.
>    virtio-net: Add support for UDP tunnel offload and extension.
>
>   drivers/net/virtio_net.c| 132 +---
>   include/linux/skbuff.h  |   5 ++
>   include/linux/virtio_net.h  |  91 ++-
>   include/uapi/linux/virtio_net.h |  38 
>   4 files changed, 242 insertions(+), 24 deletions(-)





[PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions

2017-04-15 Thread Vladislav Yasevich
Currently the virtio-net header is a fixed size, and adding things to
it is rather difficult.  This series attempts to add the infrastructure
as well as some extensions that try to resolve some deficiencies we
currently have.

First, the vnet header only has space for 16 flags.  This may not be
enough in the future.  The extensions will provide space for 32 possible
extension flags and 32 possible extensions.  These flags will be carried
in the first pseudo extension header, the presence of which will be
determined by the flag in the virtio-net header.

The extensions themselves will immediately follow the extension header.
They will be added to the packet in the same order as they appear in the
extension flags.  No padding is placed between the extensions, and any
extensions negotiated but not used by a given packet will convert to
trailing padding.

For example:
 | vnet mrg hdr | ext hdr | ext 1 | ext 2 | ext 5 | .. pad .. | packet data |
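
The layout rules above can be sketched in a few lines of C. Everything
here is hypothetical and for illustration only: struct vnet_ext_hdr, the
vnet_ext_size table, its byte counts for bits 1, 2 and 5, and the
assumption that "the same order as they appear in the extension flags"
means ascending bit order. The structures actually used by the patches
are not shown in this cover letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VNET_EXT_MAX 32

/* Hypothetical pseudo extension header: one flag bit per extension. */
struct vnet_ext_hdr {
	uint32_t flags;
};

/* Hypothetical payload sizes in bytes, indexed by flag bit; 0 = undefined. */
static const uint8_t vnet_ext_size[VNET_EXT_MAX] = {
	[1] = 4,
	[2] = 4,
	[5] = 8,
};

/* Bytes occupied by all extensions whose bits are set in flags. */
static inline size_t vnet_ext_area_len(uint32_t flags)
{
	size_t len = 0;

	for (unsigned int bit = 0; bit < VNET_EXT_MAX; bit++)
		if (flags & (UINT32_C(1) << bit))
			len += vnet_ext_size[bit];
	return len;
}

/*
 * Offset of extension 'bit' from the end of the extension header:
 * the sizes of all lower-numbered extensions present in this packet.
 */
static inline size_t vnet_ext_offset(uint32_t pkt_flags, unsigned int bit)
{
	size_t off = 0;

	assert(bit < VNET_EXT_MAX && (pkt_flags & (UINT32_C(1) << bit)));
	for (unsigned int b = 0; b < bit; b++)
		if (pkt_flags & (UINT32_C(1) << b))
			off += vnet_ext_size[b];
	return off;
}

/* Trailing pad: space reserved for negotiated extensions the packet omits. */
static inline size_t vnet_ext_pad(uint32_t negotiated, uint32_t pkt_flags)
{
	return vnet_ext_area_len(negotiated) -
	       vnet_ext_area_len(pkt_flags & negotiated);
}
```

With extensions 1, 2 and 5 negotiated (flags 0x26), the extension area
is always 16 bytes in this sketch. A packet that carries only extensions
1 and 5 places extension 5 at offset 4 and ends with 4 bytes of padding,
so the packet data starts at the same offset for every packet.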

Extensions proposed in this series are:
 - IPv6 fragment id extension
   * Currently, the guest-generated fragment id is discarded and the
     host generates an IPv6 fragment id if the packet has to be
     fragmented.  The code attempts to add time-based perturbation to
     id generation to make it harder to guess the next fragment id to
     be used.  However, doing this on the host may result in less
     perturbation (due to different timing) and might make id guessing
     easier.  Ideally, the ids generated by the guest should be used.
     One could also argue that we are "violating" the IPv6 protocol
     under a _strict_ interpretation of the spec.

 - VLAN header acceleration
   * Currently virtio does not do vlan header acceleration and instead
     uses software tagging.  One of the first things that the host will
     do is strip the vlan header out.  When passing the packet to a
     guest, the vlan header is re-inserted into the packet.  We can
     skip all that work if we can pass the vlan data in accelerated
     format.  Then the host will not do any extra work.  However, so
     far, this yielded a very small perf bump (only ~1%).  I am
     still looking into this.

 - UDP tunnel offload
   * Similar to vlan acceleration, with this extension we can pass
     additional data to the host to support GSO with udp tunnels and
     possibly other encapsulations.  This yields a significant
     performance improvement (still testing remote checksum code).
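
For the VLAN header acceleration item, the work that software tagging
implies can be sketched as below: the 4-byte 802.1Q tag sits after the
two MAC addresses, so inserting or stripping it means shifting the rest
of the frame. The function names and struct vnet_ext_vlan are
hypothetical; the extension layout used by the patch is not shown in
this cover letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN	12	/* destination + source MAC */
#define VLAN_TAG_LEN	4	/* TPID + TCI */
#define TPID_8021Q	0x8100

/* Hypothetical extension payload carrying the tag out of band. */
struct vnet_ext_vlan {
	uint16_t vlan_tci;
	uint16_t vlan_proto;
};

/*
 * Software tagging: insert a tag after the MAC addresses.
 * The buffer must have room for len + VLAN_TAG_LEN bytes.
 * Returns the new frame length.
 */
static inline size_t vlan_insert_tag(uint8_t *frame, size_t len,
				     uint16_t proto, uint16_t tci)
{
	assert(len >= ETH_ADDRS_LEN);
	memmove(frame + ETH_ADDRS_LEN + VLAN_TAG_LEN,
		frame + ETH_ADDRS_LEN, len - ETH_ADDRS_LEN);
	frame[12] = (uint8_t)(proto >> 8);
	frame[13] = (uint8_t)(proto & 0xff);
	frame[14] = (uint8_t)(tci >> 8);
	frame[15] = (uint8_t)(tci & 0xff);
	return len + VLAN_TAG_LEN;
}

/* Strip the tag, saving it in 'out'. Returns the new frame length. */
static inline size_t vlan_strip_tag(uint8_t *frame, size_t len,
				    struct vnet_ext_vlan *out)
{
	assert(len >= ETH_ADDRS_LEN + VLAN_TAG_LEN);
	out->vlan_proto = (uint16_t)((frame[12] << 8) | frame[13]);
	out->vlan_tci = (uint16_t)((frame[14] << 8) | frame[15]);
	memmove(frame + ETH_ADDRS_LEN,
		frame + ETH_ADDRS_LEN + VLAN_TAG_LEN,
		len - ETH_ADDRS_LEN - VLAN_TAG_LEN);
	return len - VLAN_TAG_LEN;
}
```

Passing the tag in something like struct vnet_ext_vlan would let both
sides skip these two memmove() calls, which is the saving the item
describes.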

An additional extension that is unfinished (due to still testing for any
side-effects) is checksum passthrough to support drivers that set
CHECKSUM_COMPLETE.  This would eliminate the need for guests to compute
the software checksum.
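
For reference, the software checksum in question is the 16-bit one's
complement Internet checksum. The sketch below is a plain, unoptimized
version in the style of RFC 1071, not the kernel's implementation; the
function name inet_csum is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's complement Internet checksum over 'len' bytes.
 * Bytes are summed as big-endian 16-bit words; an odd trailing byte
 * is padded with a zero low byte.
 */
static inline uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

This loop touches every byte of the packet, which is the per-packet cost
a guest would avoid if the checksum were passed through instead.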

This series only takes care of virtio net.  I have additional patches for
the host side (vhost and tap/macvtap as well as qemu), but wanted to get
feedback on the general approach first.

Vladislav Yasevich (6):
  virtio-net: Remove the use the padded vnet_header structure
  virtio-net: make header length handling uniform
  virtio_net: Add basic skeleton for handling vnet header extensions.
  virtio-net: Add support for IPv6 fragment id vnet header extension.
  virtio-net: Add support for vlan acceleration vnet header extension.
  virtio-net: Add support for UDP tunnel offload and extension.

 drivers/net/virtio_net.c| 132 +---
 include/linux/skbuff.h  |   5 ++
 include/linux/virtio_net.h  |  91 ++-
 include/uapi/linux/virtio_net.h |  38 
 4 files changed, 242 insertions(+), 24 deletions(-)

-- 
2.7.4



Vladislav Yasevich (6):
  virtio-net: Remove the use the padded vnet_header structure
  virtio-net: make header length handling uniform
  virtio_net: Add basic skeleton for handling vnet header extensions.
  virtio-net: Add support for IPv6 fragment id vnet header extension.
  virtio-net: Add support for vlan acceleration vnet header extension.
  virtio: Add support for UDP tunnel offload and extension.

 drivers/net/virtio_net.c| 121 ++--
 include/linux/skbuff.h  |   5 ++
 include/linux/virtio_net.h  |  91 +-
 include/uapi/linux/virtio_net.h |  38 +
 4 files changed, 236 insertions(+), 19 deletions(-)

-- 
2.7.4