[virtio-dev] Re: [virtio-comment] 回复:[virtio-dev] [PATCH v8] virtio_net: support for split transport header

2023-02-02 Thread Michael S. Tsirkin
On Thu, Feb 02, 2023 at 04:20:57AM +, Parav Pandit wrote:
> 
> 
> > From: Heng Qi 
> > Sent: Wednesday, February 1, 2023 10:45 PM
> > 
> > 在 2023/2/2 上午9:45, Parav Pandit 写道:
> > >> From: Heng Qi 
> > >> Sent: Wednesday, February 1, 2023 8:18 AM
> > > [..]
> > >
> > >>> Response
> > >>>
> > >>> Page alignment requirements should not come from the virtio spec.
> > >>> There are a variety of cases which may use non page aligned data 
> > >>> buffers.
> > >>> a. A kernel only consumer can use it who doesn't have mmap requirement.
> > >>> b. A VQ accessible directly in user space may also use it without
> > >>> page
> > >> alignment.
> > >>> c. A system with 64k page size, page aligned memory has a fair
> > >>> amount of
> > >> wastage.
> > >>> d. iouring example you pointed, also has non page aligned use.
> > >>>
> > >>> So let the driver deal with alignment restriction, outside of the 
> > >>> virtio spec.
> > >>>
> > >>> In header data split cases, data buffers utilization is more
> > >>> important than the
> > >> tiny header buffers utilization.
> > >>> How about if the headers do not interfere with the data buffers?
> > >>>
> > >>> In other words, say a given RQ has optionally linked to a circular
> > >>> queue of
> > >> header buffers.
> > >>> All header buffers are of the same size, supplied one time.
> > >>> This header size and circular q address is configured one time at RQ
> > >>> creation
> > >> time.
> > >>> With this the device doesn't need to process header buffer size
> > >>> every single
> > >> incoming packet.
> > >>> Data buffers can continue as chains or merged mode can be supported.
> > >>> When the received packet’s header cannot fit, it continues as-is in
> > >>> the data
> > >> buffer.
> > >>> Virtio net hdr as suggest indicates usage of hdr buffer offset/index.
> > >> This is a good direction, thanks a lot for your comments, the new
> > >> split header method might look like this:
> > >> When allocating the receive queue, allocate a circular queue storing
> > >> the header buffers at the driver layer, which is shared with the
> > >> device, and set the length of the header buffer. The length of the
> > >> circular queue is the same as the length of the descriptor table.
> > >>
> > >> 1. When a data packet is successfully split, all virtio-net-hdr +
> > >> transport headers are stored in the header buffer, and
> > >> VIRTIO_NET_HDR_F_SPLIT_HEADER is set in the flag of virtio-net-hdr;
> > >> If the header buffer is not enough to store virtio- net-hdr +
> > >> transport header, then roll back, that is, not split the packet;
> > >>
> > > Right.
> > >
> > >> 2. When a packet is not split by the device for some reason,
> > >> virtio-net-hdr is placed in the header buffer, and the transport
> > >> header and payload are placed in the buffer. At this time, xdp is not
> > >> supported, because xdp requires the transport header to be placed in the
> > first buffer, the header buffer.
> > > I do not understand why the xdp is not supported in this case.
> > > Isn't it the case today, where xdp_prepare_buff() gets the buffer with 
> > > offset
> > after the net hdr?
> > > In the above case you mentioned xdp gets to used the packet buffer where
> > pkt hdr + data is located.
> > 
> > The head buffer and data buffer are not continuous, but xdp requires the 
> > data
> > memory in the linear area to be continuous. When virtio-net-hdr is stored in
> > the header buffer, and the transport header and payload are in the data 
> > buffer,
> > the requirements of xdp are not met at this time. Many xdp kern programs and
> > xdp core layers also require the transport header to be placed in the linear
> > region.
> >
> Lets keep aside a virtio net hdr for a moment.
> Pkt HDR (L2 to L4) is in buffer_type_1.
> Pkt Data (after L4) is in buffer_type_2.
> They are not contiguous.
> In this case XDP doesn’t work.
> Did I understand you correctly?
> If so, there is ongoing work to support multi buffer XDP.
> We should look forward to have that support it in the future in OS.
>  
> > >> Then why
> > >> not start storing packets from the header buffer? If this is the
> > >> case, imagine a
> > >> situation: If all small packets are put into the header buffers, and
> > >> virtio net driver receives packets according to the descriptor table
> > >> and used_ring, but the data buffers are empty at this time, then can't 
> > >> seem
> > to continue?
> > > In future when such mode is extended, vq mechanism will be extended to
> > handle desc table and used ring anyway.
> > 
> > Yes, for example, we can support the descriptor table to allow placing a 
> > buffer
> > with a length of 0 as a placeholder to deal with the above problem.
> > 
> > >
> > >> It seems that
> > >> the operation of circular queue has become a producer-consumer
> > >> problem again.
> > >>
> > > I didn’t follow.
> > >
> > > I tend to imagine that driver consumed meta data (vnet_hdr and used
> > > ring) should be placed together in one buffer and

[virtio-dev] RE: [virtio-comment] 回复:[virtio-dev] [PATCH v8] virtio_net: support for split transport header

2023-02-01 Thread Parav Pandit


> From: Heng Qi 
> Sent: Wednesday, February 1, 2023 10:45 PM
> 
> 在 2023/2/2 上午9:45, Parav Pandit 写道:
> >> From: Heng Qi 
> >> Sent: Wednesday, February 1, 2023 8:18 AM
> > [..]
> >
> >>> Response
> >>>
> >>> Page alignment requirements should not come from the virtio spec.
> >>> There are a variety of cases which may use non page aligned data buffers.
> >>> a. A kernel only consumer can use it who doesn't have mmap requirement.
> >>> b. A VQ accessible directly in user space may also use it without
> >>> page
> >> alignment.
> >>> c. A system with 64k page size, page aligned memory has a fair
> >>> amount of
> >> wastage.
> >>> d. iouring example you pointed, also has non page aligned use.
> >>>
> >>> So let the driver deal with alignment restriction, outside of the virtio 
> >>> spec.
> >>>
> >>> In header data split cases, data buffers utilization is more
> >>> important than the
> >> tiny header buffers utilization.
> >>> How about if the headers do not interfere with the data buffers?
> >>>
> >>> In other words, say a given RQ has optionally linked to a circular
> >>> queue of
> >> header buffers.
> >>> All header buffers are of the same size, supplied one time.
> >>> This header size and circular q address is configured one time at RQ
> >>> creation
> >> time.
> >>> With this the device doesn't need to process header buffer size
> >>> every single
> >> incoming packet.
> >>> Data buffers can continue as chains or merged mode can be supported.
> >>> When the received packet’s header cannot fit, it continues as-is in
> >>> the data
> >> buffer.
> >>> Virtio net hdr as suggest indicates usage of hdr buffer offset/index.
> >> This is a good direction, thanks a lot for your comments, the new
> >> split header method might look like this:
> >> When allocating the receive queue, allocate a circular queue storing
> >> the header buffers at the driver layer, which is shared with the
> >> device, and set the length of the header buffer. The length of the
> >> circular queue is the same as the length of the descriptor table.
> >>
> >> 1. When a data packet is successfully split, all virtio-net-hdr +
> >> transport headers are stored in the header buffer, and
> >> VIRTIO_NET_HDR_F_SPLIT_HEADER is set in the flag of virtio-net-hdr;
> >> If the header buffer is not enough to store virtio- net-hdr +
> >> transport header, then roll back, that is, not split the packet;
> >>
> > Right.
> >
> >> 2. When a packet is not split by the device for some reason,
> >> virtio-net-hdr is placed in the header buffer, and the transport
> >> header and payload are placed in the buffer. At this time, xdp is not
> >> supported, because xdp requires the transport header to be placed in the
> first buffer, the header buffer.
> > I do not understand why the xdp is not supported in this case.
> > Isn't it the case today, where xdp_prepare_buff() gets the buffer with 
> > offset
> after the net hdr?
> > In the above case you mentioned xdp gets to used the packet buffer where
> pkt hdr + data is located.
> 
> The head buffer and data buffer are not continuous, but xdp requires the data
> memory in the linear area to be continuous. When virtio-net-hdr is stored in
> the header buffer, and the transport header and payload are in the data 
> buffer,
> the requirements of xdp are not met at this time. Many xdp kern programs and
> xdp core layers also require the transport header to be placed in the linear
> region.
>
Lets keep aside a virtio net hdr for a moment.
Pkt HDR (L2 to L4) is in buffer_type_1.
Pkt Data (after L4) is in buffer_type_2.
They are not contiguous.
In this case XDP doesn’t work.
Did I understand you correctly?
If so, there is ongoing work to support multi buffer XDP.
We should look forward to have that support it in the future in OS.
 
> >> Then why
> >> not start storing packets from the header buffer? If this is the
> >> case, imagine a
> >> situation: If all small packets are put into the header buffers, and
> >> virtio net driver receives packets according to the descriptor table
> >> and used_ring, but the data buffers are empty at this time, then can't seem
> to continue?
> > In future when such mode is extended, vq mechanism will be extended to
> handle desc table and used ring anyway.
> 
> Yes, for example, we can support the descriptor table to allow placing a 
> buffer
> with a length of 0 as a placeholder to deal with the above problem.
> 
> >
> >> It seems that
> >> the operation of circular queue has become a producer-consumer
> >> problem again.
> >>
> > I didn’t follow.
> >
> > I tend to imagine that driver consumed meta data (vnet_hdr and used
> > ring) should be placed together in one buffer and
> 
> Sorry, I didn't understand what you said, can you explain more clearly?
> Do you mean that
> virtio-ner-hdr and some other metadata are placed in a buffer? How are used
> ring and virtio-net-hdr placed together in one buffer?
>
We can extend virtio net used ring which will consist of,
struct 

[virtio-dev] RE: [virtio-comment] 回复:[virtio-dev] [PATCH v8] virtio_net: support for split transport header

2023-02-01 Thread Parav Pandit

> From: Heng Qi 
> Sent: Wednesday, February 1, 2023 8:18 AM

[..]

> > Response
> >
> > Page alignment requirements should not come from the virtio spec.
> > There are a variety of cases which may use non page aligned data buffers.
> > a. A kernel only consumer can use it who doesn't have mmap requirement.
> > b. A VQ accessible directly in user space may also use it without page
> alignment.
> > c. A system with 64k page size, page aligned memory has a fair amount of
> wastage.
> > d. iouring example you pointed, also has non page aligned use.
> >
> > So let the driver deal with alignment restriction, outside of the virtio 
> > spec.
> >
> > In header data split cases, data buffers utilization is more important than 
> > the
> tiny header buffers utilization.
> > How about if the headers do not interfere with the data buffers?
> >
> > In other words, say a given RQ has optionally linked to a circular queue of
> header buffers.
> > All header buffers are of the same size, supplied one time.
> > This header size and circular q address is configured one time at RQ 
> > creation
> time.
> >
> > With this the device doesn't need to process header buffer size every single
> incoming packet.
> > Data buffers can continue as chains or merged mode can be supported.
> > When the received packet’s header cannot fit, it continues as-is in the data
> buffer.
> > Virtio net hdr as suggest indicates usage of hdr buffer offset/index.
> 
> This is a good direction, thanks a lot for your comments, the new split header
> method might look like this:
> When allocating the receive queue, allocate a circular queue storing the
> header buffers at the driver layer, which is shared with the device, and set 
> the
> length of the header buffer. The length of the circular queue is the same as 
> the
> length of the descriptor table.
> 
> 1. When a data packet is successfully split, all virtio-net-hdr + transport 
> headers
> are stored in the header buffer, and VIRTIO_NET_HDR_F_SPLIT_HEADER is set
> in the flag of virtio-net-hdr; If the header buffer is not enough to store 
> virtio-
> net-hdr + transport header, then roll back, that is, not split the packet;
> 
Right.

> 2. When a packet is not split by the device for some reason, virtio-net-hdr is
> placed in the header buffer, and the transport header and payload are placed
> in the buffer. At this time, xdp is not supported, because xdp requires the
> transport header to be placed in the first buffer, the header buffer. 
I do not understand why the xdp is not supported in this case.
Isn't it the case today, where xdp_prepare_buff() gets the buffer with offset 
after the net hdr?
In the above case you mentioned xdp gets to used the packet buffer where pkt 
hdr + data is located.

> Then why
> not start storing packets from the header buffer? If this is the case, 
> imagine a
> situation: If all small packets are put into the header buffers, and virtio 
> net
> driver receives packets according to the descriptor table and used_ring, but 
> the
> data buffers are empty at this time, then can't seem to continue? 
In future when such mode is extended, vq mechanism will be extended to handle 
desc table and used ring anyway.

> It seems that
> the operation of circular queue has become a producer-consumer problem
> again.
> 
I didn’t follow.

I tend to imagine that driver consumed meta data (vnet_hdr and used ring) 
should be placed together in one buffer and
Actual packet header and data belongs to other buffer.
This way there is clear layering of ownership. But before we explore this 
option, lets understand your above point.

> 3. The meaning of hdr_len in virtio-net-hdr:
>   i. When splitting successfully: hdr_len represents the effective length 
> of the
> header in the header buffer.
>   ii. On unsuccessful split: hdr_len is 0.
> 
> What do you think of this plan? Or is there a better one?
> 
> Thanks.
> 
> >
> > This method has few benefits on perf and buffer efficiency as below.
> > 1. Data buffers can be directly mapped at best utilization 2. Device
> > doesn't need to match up per packet header sizes and descriptor sizes,
> > efficient for device to implement 3. No need to keep reposting the header
> buffers, only its tail index to be updated.
> > Directly gives 50% cycle reduction on buffer traversing on driver side on rx
> path.
> > 4. Ability to share this header buffer queue among multiple RQs if needed.
> > 5. In the future there may be an extension to place tiny whole packets that
> can fit in the header buffer also to contain the rest of the data.
> > 6. Device can always fall back to place packet header in data buffer
> > when header buffer is not available or smaller than newer protocol 7.
> > Because the header buffer comes virtually contiguous memory and not
> > intermixed with data buffers, there isn't small per header allocations 8. 
> > Also
> works in both chained and merged mode 9. memory utilization for an RQ of
> depth 256, with 4K page size 

[virtio-dev] RE: [virtio-comment] 回复:[virtio-dev] [PATCH v8] virtio_net: support for split transport header

2023-01-31 Thread Parav Pandit
Hi Heng Qi,

Sorry for the joining this conversation little late.
Your email has very useful summary.

Unfortunately, non-text content (HTML content) doesn’t get achieved.
So, changing the format to text to capture your useful comments.
If you can change your email client settings to be text mode, it will be easier 
to converse.

We have equal interest in having efficient split hdr support and do together 
with you.
Please find response "response" tag below at the end of email to avoid top 
posting.

From: virtio-comm...@lists.oasis-open.org  
On Behalf Of hengqi
Sent: Tuesday, January 31, 2023 4:23 AM
To: virtio-dev ; virtio-comment 

Cc: Michael S. Tsirkin ; Jason Wang ; 
Cornelia Huck ; Kangjie Xu ; 
Xuan Zhuo 
Subject: [virtio-comment] 回复:[virtio-dev] [PATCH v8] virtio_net: support for 
split transport header

Hi, all.

Split header is a technique with important applications, such as Eric 
(https://lwn.net/Articles/754681/)
and Jonathan Lemon (https://lore.kernel.org/io-uring/20221007211713.170714-1- 
mailto:jonathan.le...@gmail.com/T/#m678770d1fa7040fd76ed35026b93dfcbf25f6196)
realize the zero-copy technology respectively, they all have one thing in 
common that the header and
the payload need to be in separate buffers, and Eric's method requires the 
payload to be page-aligned.

We implemented zero-copy on the virtio-net driver according to Eric's method. 
The commands and
environment are as follows:
# environment
VM1<>vhost-user<->OVS<->vhost-user<>VM2
cpu Model name: Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
kernel version 6.0

# commands (linux/tools/testing/selftests/net)
./tcp_mmap -s -z -4 -p 1000 &
./tcp_mmap -H 10.0.0.2 -z -4 -p 1000

The performance data is as follows (implemented according to the split header 
v7 version,
https://lists.oasis-open.org/archives/virtio-dev/202209/msg4.html):
# direct copy
17.6604 s 10.08 s
# zero copy
1.9 GB/s 3.3 GB/s

We discussed a lot before, the core point is the choice of method A and method 
C, we seem
to be unable to reach an agreement on this point, seeing the above summary and 
previous discussion (https://lists.oasis-open.org/archives/virtio-dev 
/202210/msg00017.html),
how can we resolve this conflict and let this important feature continue?
I really need your help. Cc Jason, Michael, Cornelia, Xuan.

Thanks.
--
发件人:Heng Qi <mailto:hen...@linux.alibaba.com>
发送时间:2022年10月20日(星期四) 16:34
收件人:Jason Wang <mailto:jasow...@redhat.com>
抄 送:Michael S. Tsirkin <mailto:m...@redhat.com>; Xuan Zhuo 
<mailto:xuanz...@linux.alibaba.com>; Virtio-Dev 
<mailto:virtio-dev@lists.oasis-open.org>; Kangjie Xu 
<mailto:kangjie...@linux.alibaba.com>
主 题:Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin <mailto:m...@redhat.com> 
>wrote:
> >
> > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin 
><mailto:m...@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > Jason I think the issue with previous proposals is that they 
>conflict
> > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > > >
> > > > >
> > > > > Yes, but I didn't found how it can conflict the any_layout. Device 
>can just
> > > > > to not split the header when the layout doesn't fit for header 
>splitting.
> > > > > (And this seems the case even if we're using buffers).
> > > >
> > > > Well spec says:
> > > >
> > > > indicates to both the device and the driver that no
> > > > assumptions were made about framing.
> > > >
> > > > if device assumes that descriptor boundaries are where
> > > > driver wants packet to be stored that is clearly
> > > > an assumption.
> > >
> > > Yes but what I want to say is, the device can choose to not split the
> > > packet if the framing doesn't fit. Does it still comply with the above
> > > description?
> > >
> > > Thanks
> >
> > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > For example, if driver wants to split the header at some specific
> > offset this is already possible without extra functionality.
> 
> I'm not sure how this would work without the support from the device.
> This probably can only work if:
> 
> 1) the driver know

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-10-20 Thread Heng Qi
On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > Jason I think the issue with previous proposals is that they 
> > > > > > conflict
> > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > > >
> > > > >
> > > > > Yes, but I didn't found how it can conflict the any_layout. Device 
> > > > > can just
> > > > > to not split the header when the layout doesn't fit for header 
> > > > > splitting.
> > > > > (And this seems the case even if we're using buffers).
> > > >
> > > > Well spec says:
> > > >
> > > > indicates to both the device and the driver that no
> > > > assumptions were made about framing.
> > > >
> > > > if device assumes that descriptor boundaries are where
> > > > driver wants packet to be stored that is clearly
> > > > an assumption.
> > >
> > > Yes but what I want to say is, the device can choose to not split the
> > > packet if the framing doesn't fit. Does it still comply with the above
> > > description?
> > >
> > > Thanks
> >
> > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > For example, if driver wants to split the header at some specific
> > offset this is already possible without extra functionality.
> 
> I'm not sure how this would work without the support from the device.
> This probably can only work if:
> 
> 1) the driver know what kind of packet it can receive
> 2) protocol have fixed length of the header
> 
> This is probably not true consider:
> 
> 1) TCP and UDP have different header length
> 2) IPv6 has an variable length of the header
> 
> 
> >
> > Let's keep it that way.
> >
> > Now, let's formulate what are some of the problems with the current way.
> >
> >
> >
> > A- mergeable buffers is even more flexible, since a single packet
> >   is built up of multiple buffers. And in theory device can
> >   choose arbitrary set of buffers to store a packet.
> >   So you could supply a small buffer for headers followed by a bigger
> >   one for payload, in theory even without any changes.
> >   Problem 1: However since this is not how devices currently operate,
> >   a feature bit would be helpful.
> 
> How do we know the bigger buffer is sufficient for the packet? If we
> try to allocate 64K (not sufficient for the future even) it breaks the
> effort of the mergeable buffer:
> 
> header buffer #1
> payload buffer #1
> header buffer #2
> payload buffer #2
> 
> Is the device expected to
> 
> 1) fill payload in header buffer #2, this breaks the effort that we
> want to make payload page aligned
> 2) skip header buffer #2, in this case, the device assumes the framing
> when it breaks any layout
> 
> >
> >   Problem 2: Also, in the past we found it useful to be able to figure out 
> > whether
> >   packet fits in a single buffer without looking at the header.
> >   For this reason, we have this text:
> >
> > If a receive packet is spread over multiple buffers, the device
> > MUST use all buffers but the last (i.e. the first 
> > \field{num_buffers} -
> > 1 buffers) completely up to the full length of each buffer
> > supplied by the driver.
> >
> >   if we want to keep this optimization and allow using a separate
> >   buffer for headers, then I think we could rely on the feature bit
> >   from Problem 1 and just make an exception for the first buffer.
> >   Also num_buffers is then always >= 2, maybe state this to avoid
> >   confusion.
> >
> >
> >
> >
> >
> > B- without mergeable, there's no flexibility. In particular, there can
> > not be uninitialized space between header and data.
> 
> I had two questions
> 
> 1) why is this not a problem of mergeable? There's no guarantee that
> the header is just the length of what the driver allocates for header
> buffer anyhow
> 
> E.g the header length could be smaller than the header buffer, the
> device still needs to skip part of the space in the header buffer.
> 
> 2) it should be the responsibility of the driver to handle the
> uninitialized space, it should do anything that is necessary for
> security, more below
> 


We've talked a bit more about split header so far, but there still seem to
be some issues, so let's recap.

一、 Method Discussion Review

In order to adapt to the Eric's tcp receive interface to achieve zero copy,
header and payload are required to be stored separately, and the payload is
stored in a paged alignment way. Therefore, we have discussed several options
for split header as follows:

1: method A ( depend on the descriptor chain )
| receive buffer| 
|  0th descriptor

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-10-17 Thread Xuan Zhuo



FYI:

A new rx zerocopy idea, which is different from Eric's tcp mmap zerocopy, the
buffer comes from user mode and put to device, no longer requires page 
alignment,
but still depends on split header.

   
https://lore.kernel.org/io-uring/20221007211713.170714-1-jonathan.le...@gmail.com/T/#m678770d1fa7040fd76ed35026b93dfcbf25f6196


Thanks.


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-10-13 Thread Michael S. Tsirkin
On Thu, Oct 13, 2022 at 02:47:55PM +0800, Jason Wang wrote:
> On Wed, Oct 12, 2022 at 1:05 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Oct 12, 2022 at 11:17:30AM +0800, Jason Wang wrote:
> > > On Tue, Oct 11, 2022 at 1:12 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> > > > > On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > > > > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > > > > > Jason I think the issue with previous proposals is that 
> > > > > > > > > > they conflict
> > > > > > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that 
> > > > > > > > > > giving the
> > > > > > > > > > driver flexibility in arranging the packet in memory is 
> > > > > > > > > > benefitial.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, but I didn't found how it can conflict the any_layout. 
> > > > > > > > > Device can just
> > > > > > > > > to not split the header when the layout doesn't fit for 
> > > > > > > > > header splitting.
> > > > > > > > > (And this seems the case even if we're using buffers).
> > > > > > > >
> > > > > > > > Well spec says:
> > > > > > > >
> > > > > > > > indicates to both the device and the driver that no
> > > > > > > > assumptions were made about framing.
> > > > > > > >
> > > > > > > > if device assumes that descriptor boundaries are where
> > > > > > > > driver wants packet to be stored that is clearly
> > > > > > > > an assumption.
> > > > > > >
> > > > > > > Yes but what I want to say is, the device can choose to not split 
> > > > > > > the
> > > > > > > packet if the framing doesn't fit. Does it still comply with the 
> > > > > > > above
> > > > > > > description?
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > > > > > For example, if driver wants to split the header at some specific
> > > > > > offset this is already possible without extra functionality.
> > > > >
> > > > > I'm not sure how this would work without the support from the device.
> > > > > This probably can only work if:
> > > > >
> > > > > 1) the driver know what kind of packet it can receive
> > > > > 2) protocol have fixed length of the header
> > > > >
> > > > > This is probably not true consider:
> > > > >
> > > > > 1) TCP and UDP have different header length
> > > > > 2) IPv6 has an variable length of the header
> > > >
> > > > We currently say:
> > > >
> > > > If a receive packet is spread over multiple buffers, the device
> > > > MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > > > 1 buffers) completely up to the full length of each buffer
> > > > supplied by the driver.
> > > >
> > > > the idea is basically just to lift this requirement for specific
> > > > packets. Things certainly worked before, this is just an
> > > > optimization.
> > >
> > > The problem is, the offset is not fixed and can be determined by the
> > > driver. It's variable so it requires the device to parse the packet to
> > > know.
> > >
> > > Thanks
> >
> > IIUC device parsing packet and splitting the header is exactly what the
> > feature is supposed to do. It's benefitial if device is a hardware one.
> 
> Right, so in any way (even with a mergeable buffer), if the device
> parses and splits, we need to tweak the section you quote above from
> the spec, since there will be some space left in the buffer other than
> the last one?
> 
> Thanks

Exactly. My point was, we know relaxing this does not conflict with
other parts of the spec since we didn't use to require this.

> >
> >
> > > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > Let's keep it that way.
> > > > > >
> > > > > > Now, let's formulate what are some of the problems with the current 
> > > > > > way.
> > > > > >
> > > > > >
> > > > > >
> > > > > > A- mergeable buffers is even more flexible, since a single packet
> > > > > >   is built up of multiple buffers. And in theory device can
> > > > > >   choose arbitrary set of buffers to store a packet.
> > > > > >   So you could supply a small buffer for headers followed by a 
> > > > > > bigger
> > > > > >   one for payload, in theory even without any changes.
> > > > > >   Problem 1: However since this is not how devices currently 
> > > > > > operate,
> > > > > >   a feature bit would be helpful.
> > > > >
> > > > > How do we know the bigger buffer is sufficient for the packet? If we
> > > > > try to allocate 64K (not sufficient for the future even) it breaks the
> > > > > effort of the mergeable buffer:
> > > > >
> > > > > header buffer #1
> > > > > payload buffer #1
> > > > > header buffer #2
> > > > > payload buffer #2
> > > > >
> > > > > Is the device expected to
> > > > >

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-10-13 Thread Jason Wang
On Wed, Oct 12, 2022 at 1:05 PM Michael S. Tsirkin  wrote:
>
> On Wed, Oct 12, 2022 at 11:17:30AM +0800, Jason Wang wrote:
> > On Tue, Oct 11, 2022 at 1:12 AM Michael S. Tsirkin  wrote:
> > >
> > > On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> > > > On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > > > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > > > > Jason I think the issue with previous proposals is that they 
> > > > > > > > > conflict
> > > > > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that 
> > > > > > > > > giving the
> > > > > > > > > driver flexibility in arranging the packet in memory is 
> > > > > > > > > benefitial.
> > > > > > > >
> > > > > > > >
> > > > > > > > Yes, but I didn't found how it can conflict the any_layout. 
> > > > > > > > Device can just
> > > > > > > > to not split the header when the layout doesn't fit for header 
> > > > > > > > splitting.
> > > > > > > > (And this seems the case even if we're using buffers).
> > > > > > >
> > > > > > > Well spec says:
> > > > > > >
> > > > > > > indicates to both the device and the driver that no
> > > > > > > assumptions were made about framing.
> > > > > > >
> > > > > > > if device assumes that descriptor boundaries are where
> > > > > > > driver wants packet to be stored that is clearly
> > > > > > > an assumption.
> > > > > >
> > > > > > Yes but what I want to say is, the device can choose to not split 
> > > > > > the
> > > > > > packet if the framing doesn't fit. Does it still comply with the 
> > > > > > above
> > > > > > description?
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > > > > For example, if driver wants to split the header at some specific
> > > > > offset this is already possible without extra functionality.
> > > >
> > > > I'm not sure how this would work without the support from the device.
> > > > This probably can only work if:
> > > >
> > > > 1) the driver know what kind of packet it can receive
> > > > 2) protocol have fixed length of the header
> > > >
> > > > This is probably not true consider:
> > > >
> > > > 1) TCP and UDP have different header length
> > > > 2) IPv6 has an variable length of the header
> > >
> > > We currently say:
> > >
> > > If a receive packet is spread over multiple buffers, the device
> > > MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > > 1 buffers) completely up to the full length of each buffer
> > > supplied by the driver.
> > >
> > > the idea is basically just to lift this requirement for specific
> > > packets. Things certainly worked before, this is just an
> > > optimization.
> >
> > The problem is, the offset is not fixed and can be determined by the
> > driver. It's variable so it requires the device to parse the packet to
> > know.
> >
> > Thanks
>
> IIUC device parsing packet and splitting the header is exactly what the
> feature is supposed to do. It's benefitial if device is a hardware one.

Right, so in any way (even with a mergeable buffer), if the device
parses and splits, we need to tweak the section you quote above from
the spec, since there will be some space left in the buffer other than
the last one?

Thanks

>
>
> > >
> > >
> > >
> > > >
> > > > >
> > > > > Let's keep it that way.
> > > > >
> > > > > Now, let's formulate what are some of the problems with the current 
> > > > > way.
> > > > >
> > > > >
> > > > >
> > > > > A- mergeable buffers is even more flexible, since a single packet
> > > > >   is built up of multiple buffers. And in theory device can
> > > > >   choose arbitrary set of buffers to store a packet.
> > > > >   So you could supply a small buffer for headers followed by a bigger
> > > > >   one for payload, in theory even without any changes.
> > > > >   Problem 1: However since this is not how devices currently operate,
> > > > >   a feature bit would be helpful.
> > > >
> > > > How do we know the bigger buffer is sufficient for the packet? If we
> > > > try to allocate 64K (not sufficient for the future even) it breaks the
> > > > effort of the mergeable buffer:
> > > >
> > > > header buffer #1
> > > > payload buffer #1
> > > > header buffer #2
> > > > payload buffer #2
> > > >
> > > > Is the device expected to
> > > >
> > > > 1) fill payload in header buffer #2, this breaks the effort that we
> > > > want to make payload page aligned
> > > > 2) skip header buffer #2, in this case, the device assumes the framing
> > > > when it breaks any layout
> > > >
> > > > >
> > > > >   Problem 2: Also, in the past we found it useful to be able to 
> > > > > figure out whether
> > > > >   packet fits in a single buffer without looking at the header.
> > > > >   For this reason, we 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-10-11 Thread Michael S. Tsirkin
On Wed, Oct 12, 2022 at 11:17:30AM +0800, Jason Wang wrote:
> On Tue, Oct 11, 2022 at 1:12 AM Michael S. Tsirkin  wrote:
> >
> > On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> > > On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > > > Jason I think the issue with previous proposals is that they 
> > > > > > > > conflict
> > > > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving 
> > > > > > > > the
> > > > > > > > driver flexibility in arranging the packet in memory is 
> > > > > > > > benefitial.
> > > > > > >
> > > > > > >
> > > > > > > Yes, but I didn't found how it can conflict the any_layout. 
> > > > > > > Device can just
> > > > > > > to not split the header when the layout doesn't fit for header 
> > > > > > > splitting.
> > > > > > > (And this seems the case even if we're using buffers).
> > > > > >
> > > > > > Well spec says:
> > > > > >
> > > > > > indicates to both the device and the driver that no
> > > > > > assumptions were made about framing.
> > > > > >
> > > > > > if device assumes that descriptor boundaries are where
> > > > > > driver wants packet to be stored that is clearly
> > > > > > an assumption.
> > > > >
> > > > > Yes but what I want to say is, the device can choose to not split the
> > > > > packet if the framing doesn't fit. Does it still comply with the above
> > > > > description?
> > > > >
> > > > > Thanks
> > > >
> > > > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > > > For example, if driver wants to split the header at some specific
> > > > offset this is already possible without extra functionality.
> > >
> > > I'm not sure how this would work without the support from the device.
> > > This probably can only work if:
> > >
> > > 1) the driver know what kind of packet it can receive
> > > 2) protocol have fixed length of the header
> > >
> > > This is probably not true consider:
> > >
> > > 1) TCP and UDP have different header length
> > > 2) IPv6 has an variable length of the header
> >
> > We currently say:
> >
> > If a receive packet is spread over multiple buffers, the device
> > MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > 1 buffers) completely up to the full length of each buffer
> > supplied by the driver.
> >
> > the idea is basically just to lift this requirement for specific
> > packets. Things certainly worked before, this is just an
> > optimization.
> 
> The problem is, the offset is not fixed and can be determined by the
> driver. It's variable so it requires the device to parse the packet to
> know.
> 
> Thanks

IIUC device parsing packet and splitting the header is exactly what the
feature is supposed to do. It's benefitial if device is a hardware one.


> >
> >
> >
> > >
> > > >
> > > > Let's keep it that way.
> > > >
> > > > Now, let's formulate what are some of the problems with the current way.
> > > >
> > > >
> > > >
> > > > A- mergeable buffers is even more flexible, since a single packet
> > > >   is built up of multiple buffers. And in theory device can
> > > >   choose arbitrary set of buffers to store a packet.
> > > >   So you could supply a small buffer for headers followed by a bigger
> > > >   one for payload, in theory even without any changes.
> > > >   Problem 1: However since this is not how devices currently operate,
> > > >   a feature bit would be helpful.
> > >
> > > How do we know the bigger buffer is sufficient for the packet? If we
> > > try to allocate 64K (not sufficient for the future even) it breaks the
> > > effort of the mergeable buffer:
> > >
> > > header buffer #1
> > > payload buffer #1
> > > header buffer #2
> > > payload buffer #2
> > >
> > > Is the device expected to
> > >
> > > 1) fill payload in header buffer #2, this breaks the effort that we
> > > want to make payload page aligned
> > > 2) skip header buffer #2, in this case, the device assumes the framing
> > > when it breaks any layout
> > >
> > > >
> > > >   Problem 2: Also, in the past we found it useful to be able to figure 
> > > > out whether
> > > >   packet fits in a single buffer without looking at the header.
> > > >   For this reason, we have this text:
> > > >
> > > > If a receive packet is spread over multiple buffers, the device
> > > > MUST use all buffers but the last (i.e. the first 
> > > > \field{num_buffers} -
> > > > 1 buffers) completely up to the full length of each buffer
> > > > supplied by the driver.
> > > >
> > > >   if we want to keep this optimization and allow using a separate
> > > >   buffer for headers, then I think we could rely on the feature bit
> > > >   from Problem 1 and just make an exception for the first buffer.
> > > >   

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-10-11 Thread Jason Wang
On Tue, Oct 11, 2022 at 1:12 AM Michael S. Tsirkin  wrote:
>
> On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> > On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin  wrote:
> > >
> > > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > > Jason I think the issue with previous proposals is that they 
> > > > > > > conflict
> > > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > > driver flexibility in arranging the packet in memory is 
> > > > > > > benefitial.
> > > > > >
> > > > > >
> > > > > > Yes, but I didn't found how it can conflict the any_layout. Device 
> > > > > > can just
> > > > > > to not split the header when the layout doesn't fit for header 
> > > > > > splitting.
> > > > > > (And this seems the case even if we're using buffers).
> > > > >
> > > > > Well spec says:
> > > > >
> > > > > indicates to both the device and the driver that no
> > > > > assumptions were made about framing.
> > > > >
> > > > > if device assumes that descriptor boundaries are where
> > > > > driver wants packet to be stored that is clearly
> > > > > an assumption.
> > > >
> > > > Yes but what I want to say is, the device can choose to not split the
> > > > packet if the framing doesn't fit. Does it still comply with the above
> > > > description?
> > > >
> > > > Thanks
> > >
> > > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > > For example, if driver wants to split the header at some specific
> > > offset this is already possible without extra functionality.
> >
> > I'm not sure how this would work without the support from the device.
> > This probably can only work if:
> >
> > 1) the driver know what kind of packet it can receive
> > 2) protocol have fixed length of the header
> >
> > This is probably not true consider:
> >
> > 1) TCP and UDP have different header length
> > 2) IPv6 has an variable length of the header
>
> We currently say:
>
> If a receive packet is spread over multiple buffers, the device
> MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> 1 buffers) completely up to the full length of each buffer
> supplied by the driver.
>
> the idea is basically just to lift this requirement for specific
> packets. Things certainly worked before, this is just an
> optimization.

The problem is, the offset is not fixed and can be determined by the
driver. It's variable so it requires the device to parse the packet to
know.

Thanks

>
>
>
> >
> > >
> > > Let's keep it that way.
> > >
> > > Now, let's formulate what are some of the problems with the current way.
> > >
> > >
> > >
> > > A- mergeable buffers is even more flexible, since a single packet
> > >   is built up of multiple buffers. And in theory device can
> > >   choose arbitrary set of buffers to store a packet.
> > >   So you could supply a small buffer for headers followed by a bigger
> > >   one for payload, in theory even without any changes.
> > >   Problem 1: However since this is not how devices currently operate,
> > >   a feature bit would be helpful.
> >
> > How do we know the bigger buffer is sufficient for the packet? If we
> > try to allocate 64K (not sufficient for the future even) it breaks the
> > effort of the mergeable buffer:
> >
> > header buffer #1
> > payload buffer #1
> > header buffer #2
> > payload buffer #2
> >
> > Is the device expected to
> >
> > 1) fill payload in header buffer #2, this breaks the effort that we
> > want to make payload page aligned
> > 2) skip header buffer #2, in this case, the device assumes the framing
> > when it breaks any layout
> >
> > >
> > >   Problem 2: Also, in the past we found it useful to be able to figure 
> > > out whether
> > >   packet fits in a single buffer without looking at the header.
> > >   For this reason, we have this text:
> > >
> > > If a receive packet is spread over multiple buffers, the device
> > > MUST use all buffers but the last (i.e. the first 
> > > \field{num_buffers} -
> > > 1 buffers) completely up to the full length of each buffer
> > > supplied by the driver.
> > >
> > >   if we want to keep this optimization and allow using a separate
> > >   buffer for headers, then I think we could rely on the feature bit
> > >   from Problem 1 and just make an exception for the first buffer.
> > >   Also num_buffers is then always >= 2, maybe state this to avoid
> > >   confusion.
> > >
> > >
> > >
> > >
> > >
> > > B- without mergeable, there's no flexibility. In particular, there can
> > > not be uninitialized space between header and data.
> >
> > I had two questions
> >
> > 1) why is this not a problem of mergeable? There's no guarantee that
> > the header is just the length of what the driver allocates for header
> > buffer anyhow
> >
> > E.g the header 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-10-10 Thread Michael S. Tsirkin
On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > Jason I think the issue with previous proposals is that they 
> > > > > > conflict
> > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > > >
> > > > >
> > > > > Yes, but I didn't found how it can conflict the any_layout. Device 
> > > > > can just
> > > > > to not split the header when the layout doesn't fit for header 
> > > > > splitting.
> > > > > (And this seems the case even if we're using buffers).
> > > >
> > > > Well spec says:
> > > >
> > > > indicates to both the device and the driver that no
> > > > assumptions were made about framing.
> > > >
> > > > if device assumes that descriptor boundaries are where
> > > > driver wants packet to be stored that is clearly
> > > > an assumption.
> > >
> > > Yes but what I want to say is, the device can choose to not split the
> > > packet if the framing doesn't fit. Does it still comply with the above
> > > description?
> > >
> > > Thanks
> >
> > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > For example, if driver wants to split the header at some specific
> > offset this is already possible without extra functionality.
> 
> I'm not sure how this would work without the support from the device.
> This probably can only work if:
> 
> 1) the driver know what kind of packet it can receive
> 2) protocol have fixed length of the header
> 
> This is probably not true consider:
> 
> 1) TCP and UDP have different header length
> 2) IPv6 has an variable length of the header

We currently say:

If a receive packet is spread over multiple buffers, the device
MUST use all buffers but the last (i.e. the first \field{num_buffers} -
1 buffers) completely up to the full length of each buffer
supplied by the driver.

the idea is basically just to lift this requirement for specific
packets. Things certainly worked before, this is just an
optimization.



> 
> >
> > Let's keep it that way.
> >
> > Now, let's formulate what are some of the problems with the current way.
> >
> >
> >
> > A- mergeable buffers is even more flexible, since a single packet
> >   is built up of multiple buffers. And in theory device can
> >   choose arbitrary set of buffers to store a packet.
> >   So you could supply a small buffer for headers followed by a bigger
> >   one for payload, in theory even without any changes.
> >   Problem 1: However since this is not how devices currently operate,
> >   a feature bit would be helpful.
> 
> How do we know the bigger buffer is sufficient for the packet? If we
> try to allocate 64K (not sufficient for the future even) it breaks the
> effort of the mergeable buffer:
> 
> header buffer #1
> payload buffer #1
> header buffer #2
> payload buffer #2
> 
> Is the device expected to
> 
> 1) fill payload in header buffer #2, this breaks the effort that we
> want to make payload page aligned
> 2) skip header buffer #2, in this case, the device assumes the framing
> when it breaks any layout
> 
> >
> >   Problem 2: Also, in the past we found it useful to be able to figure out 
> > whether
> >   packet fits in a single buffer without looking at the header.
> >   For this reason, we have this text:
> >
> > If a receive packet is spread over multiple buffers, the device
> > MUST use all buffers but the last (i.e. the first 
> > \field{num_buffers} -
> > 1 buffers) completely up to the full length of each buffer
> > supplied by the driver.
> >
> >   if we want to keep this optimization and allow using a separate
> >   buffer for headers, then I think we could rely on the feature bit
> >   from Problem 1 and just make an exception for the first buffer.
> >   Also num_buffers is then always >= 2, maybe state this to avoid
> >   confusion.
> >
> >
> >
> >
> >
> > B- without mergeable, there's no flexibility. In particular, there can
> > not be uninitialized space between header and data.
> 
> I had two questions
> 
> 1) why is this not a problem of mergeable? There's no guarantee that
> the header is just the length of what the driver allocates for header
> buffer anyhow
> 
> E.g the header length could be smaller than the header buffer, the
> device still needs to skip part of the space in the header buffer.
> 
> 2) it should be the responsibility of the driver to handle the
> uninitialized space, it should do anything that is necessary for
> security, more below
> 
> > If we had flexibility here, this could be
> > helpful for alignment, security, etc.
> > Unfortunately, our hands are tied:
> >
> >
> > \field{len} is particularly useful
> > 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-10-08 Thread Xuan Zhuo
On Sat, 8 Oct 2022 12:37:45 +0800, Jason Wang  wrote:
> On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > Jason I think the issue with previous proposals is that they 
> > > > > > conflict
> > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > > >
> > > > >
> > > > > Yes, but I didn't found how it can conflict the any_layout. Device 
> > > > > can just
> > > > > to not split the header when the layout doesn't fit for header 
> > > > > splitting.
> > > > > (And this seems the case even if we're using buffers).
> > > >
> > > > Well spec says:
> > > >
> > > > indicates to both the device and the driver that no
> > > > assumptions were made about framing.
> > > >
> > > > if device assumes that descriptor boundaries are where
> > > > driver wants packet to be stored that is clearly
> > > > an assumption.
> > >
> > > Yes but what I want to say is, the device can choose to not split the
> > > packet if the framing doesn't fit. Does it still comply with the above
> > > description?
> > >
> > > Thanks
> >
> > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > For example, if driver wants to split the header at some specific
> > offset this is already possible without extra functionality.
>
> I'm not sure how this would work without the support from the device.
> This probably can only work if:
>
> 1) the driver know what kind of packet it can receive
> 2) protocol have fixed length of the header
>
> This is probably not true consider:
>
> 1) TCP and UDP have different header length
> 2) IPv6 has an variable length of the header
>
>
> >
> > Let's keep it that way.
> >
> > Now, let's formulate what are some of the problems with the current way.
> >
> >
> >
> > A- mergeable buffers is even more flexible, since a single packet
> >   is built up of multiple buffers. And in theory device can
> >   choose arbitrary set of buffers to store a packet.
> >   So you could supply a small buffer for headers followed by a bigger
> >   one for payload, in theory even without any changes.
> >   Problem 1: However since this is not how devices currently operate,
> >   a feature bit would be helpful.
>
> How do we know the bigger buffer is sufficient for the packet? If we
> try to allocate 64K (not sufficient for the future even) it breaks the
> effort of the mergeable buffer:
>
> header buffer #1
> payload buffer #1
> header buffer #2
> payload buffer #2
>
> Is the device expected to
>
> 1) fill payload in header buffer #2, this breaks the effort that we
> want to make payload page aligned
> 2) skip header buffer #2, in this case, the device assumes the framing
> when it breaks any layout
>
> >
> >   Problem 2: Also, in the past we found it useful to be able to figure out 
> > whether
> >   packet fits in a single buffer without looking at the header.
> >   For this reason, we have this text:
> >
> > If a receive packet is spread over multiple buffers, the device
> > MUST use all buffers but the last (i.e. the first 
> > \field{num_buffers} -
> > 1 buffers) completely up to the full length of each buffer
> > supplied by the driver.
> >
> >   if we want to keep this optimization and allow using a separate
> >   buffer for headers, then I think we could rely on the feature bit
> >   from Problem 1 and just make an exception for the first buffer.
> >   Also num_buffers is then always >= 2, maybe state this to avoid
> >   confusion.
> >
> >
> >
> >
> >
> > B- without mergeable, there's no flexibility. In particular, there can
> > not be uninitialized space between header and data.
>
> I had two questions
>
> 1) why is this not a problem of mergeable? There's no guarantee that
> the header is just the length of what the driver allocates for header
> buffer anyhow
>
> E.g the header length could be smaller than the header buffer, the
> device still needs to skip part of the space in the header buffer.
>
> 2) it should be the responsibility of the driver to handle the
> uninitialized space, it should do anything that is necessary for
> security, more below
>
> > If we had flexibility here, this could be
> > helpful for alignment, security, etc.
> > Unfortunately, our hands are tied:
> >
> >
> > \field{len} is particularly useful
> > for drivers using untrusted buffers: if a driver does not know 
> > exactly
> > how much has been written by the device, the driver would have to 
> > zero
> > the buffer in advance to ensure no data leakage occurs.
> >
> > For example, a network driver may hand a received buffer directly to
> > an unprivileged userspace application.  If the network device 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-10-07 Thread Jason Wang
On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin  wrote:
>
> On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > >
> > > >
> > > > Yes, but I didn't found how it can conflict the any_layout. Device can 
> > > > just
> > > > to not split the header when the layout doesn't fit for header 
> > > > splitting.
> > > > (And this seems the case even if we're using buffers).
> > >
> > > Well spec says:
> > >
> > > indicates to both the device and the driver that no
> > > assumptions were made about framing.
> > >
> > > if device assumes that descriptor boundaries are where
> > > driver wants packet to be stored that is clearly
> > > an assumption.
> >
> > Yes but what I want to say is, the device can choose to not split the
> > packet if the framing doesn't fit. Does it still comply with the above
> > description?
> >
> > Thanks
>
> The point of ANY_LAYOUT is to give drivers maximum flexibility.
> For example, if driver wants to split the header at some specific
> offset this is already possible without extra functionality.

I'm not sure how this would work without the support from the device.
This probably can only work if:

1) the driver know what kind of packet it can receive
2) protocol have fixed length of the header

This is probably not true consider:

1) TCP and UDP have different header length
2) IPv6 has an variable length of the header


>
> Let's keep it that way.
>
> Now, let's formulate what are some of the problems with the current way.
>
>
>
> A- mergeable buffers is even more flexible, since a single packet
>   is built up of multiple buffers. And in theory device can
>   choose arbitrary set of buffers to store a packet.
>   So you could supply a small buffer for headers followed by a bigger
>   one for payload, in theory even without any changes.
>   Problem 1: However since this is not how devices currently operate,
>   a feature bit would be helpful.

How do we know the bigger buffer is sufficient for the packet? If we
try to allocate 64K (not sufficient for the future even) it breaks the
effort of the mergeable buffer:

header buffer #1
payload buffer #1
header buffer #2
payload buffer #2

Is the device expected to

1) fill payload in header buffer #2, this breaks the effort that we
want to make payload page aligned
2) skip header buffer #2, in this case, the device assumes the framing
when it breaks any layout

>
>   Problem 2: Also, in the past we found it useful to be able to figure out 
> whether
>   packet fits in a single buffer without looking at the header.
>   For this reason, we have this text:
>
> If a receive packet is spread over multiple buffers, the device
> MUST use all buffers but the last (i.e. the first \field{num_buffers} 
> -
> 1 buffers) completely up to the full length of each buffer
> supplied by the driver.
>
>   if we want to keep this optimization and allow using a separate
>   buffer for headers, then I think we could rely on the feature bit
>   from Problem 1 and just make an exception for the first buffer.
>   Also num_buffers is then always >= 2, maybe state this to avoid
>   confusion.
>
>
>
>
>
> B- without mergeable, there's no flexibility. In particular, there can
> not be uninitialized space between header and data.

I had two questions

1) why is this not a problem of mergeable? There's no guarantee that
the header is just the length of what the driver allocates for header
buffer anyhow

E.g the header length could be smaller than the header buffer, the
device still needs to skip part of the space in the header buffer.

2) it should be the responsibility of the driver to handle the
uninitialized space, it should do anything that is necessary for
security, more below

> If we had flexibility here, this could be
> helpful for alignment, security, etc.
> Unfortunately, our hands are tied:
>
>
> \field{len} is particularly useful
> for drivers using untrusted buffers: if a driver does not know exactly
> how much has been written by the device, the driver would have to zero
> the buffer in advance to ensure no data leakage occurs.
>
> For example, a network driver may hand a received buffer directly to
> an unprivileged userspace application.  If the network device has not
> overwritten the bytes which were in that buffer, this could leak the
> contents of freed memory from other processes to the application.

This should be a bug of the driver. For example, if the driver wants
to implement zerocopy, it must guarantee that every byte was written
by the device before 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-29 Thread Xuan Zhuo
On Thu, 29 Sep 2022 06:06:41 -0400, "Michael S. Tsirkin"  
wrote:
> On Thu, Sep 29, 2022 at 04:24:02PM +0800, Xuan Zhuo wrote:
> > On Thu, 29 Sep 2022 03:04:03 -0400, "Michael S. Tsirkin"  
> > wrote:
> > > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > > Jason I think the issue with previous proposals is that they 
> > > > > > > conflict
> > > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > > driver flexibility in arranging the packet in memory is 
> > > > > > > benefitial.
> > > > > >
> > > > > >
> > > > > > Yes, but I didn't found how it can conflict the any_layout. Device 
> > > > > > can just
> > > > > > to not split the header when the layout doesn't fit for header 
> > > > > > splitting.
> > > > > > (And this seems the case even if we're using buffers).
> > > > >
> > > > > Well spec says:
> > > > >
> > > > > indicates to both the device and the driver that no
> > > > > assumptions were made about framing.
> > > > >
> > > > > if device assumes that descriptor boundaries are where
> > > > > driver wants packet to be stored that is clearly
> > > > > an assumption.
> > > >
> > > > Yes but what I want to say is, the device can choose to not split the
> > > > packet if the framing doesn't fit. Does it still comply with the above
> > > > description?
> > > >
> > > > Thanks
> > >
> > > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > > For example, if driver wants to split the header at some specific
> > > offset this is already possible without extra functionality.
> > >
> > > Let's keep it that way.
> > >
> > > Now, let's formulate what are some of the problems with the current way.
> > >
> > >
> > >
> > > A- mergeable buffers is even more flexible, since a single packet
> > >   is built up of multiple buffers.
> >
> > If I understand correctly, this is our v8.
>
> I think it is, or at least close.
>
> > > And in theory device can
> > >   choose arbitrary set of buffers to store a packet.
> > >   So you could supply a small buffer for headers followed by a bigger
> > >   one for payload, in theory even without any changes.
> >
> > This is very interesting, I did not think of this point.
> > This is helpful to reduce the waste of memory.
>
> Hmm good point. I should add: since we are no longer fully using the
> first buffer the feature has a memory cost and - in case of a cache
> pressure - can degrade performance rather than improve it.
> Thus, allowing flexibility for both devices and drivers at runtime
> rather than fixing things through features thus sounds like a good idea.
>
>
> > >   Problem 1: However since this is not how devices currently operate,
> > >   a feature bit would be helpful.
> > >
> > >   Problem 2: Also, in the past we found it useful to be able to figure 
> > > out whether
> > >   packet fits in a single buffer without looking at the header.
> > >   For this reason, we have this text:
> > >
> > >   If a receive packet is spread over multiple buffers, the device
> > >   MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > >   1 buffers) completely up to the full length of each buffer
> > >   supplied by the driver.
> > >
> > >   if we want to keep this optimization and allow using a separate
> > >   buffer for headers, then I think we could rely on the feature bit
> > >   from Problem 1 and just make an exception for the first buffer.
> > >   Also num_buffers is then always >= 2, maybe state this to avoid
> > >   confusion.
> > >
> >
> > Yes, I think this is feasible.
>
>
> Thinking more about this, a question is what to do about packets without
> split header. I can see several options
> A- add some buffers just for the non split case. without in-order they
>   can just stay available. with in-order they have to be recycled which
>   might be expensive.
>   They will also occupy space in the ring. more memory costs.
>   Don't much like it for above reasons.
>   OTOH then I wanted to work on partial-order anyway. Maybe it's time
>   to prioritize that work.

Yes, I also prefer order processing, although we mess up the order, which is
good in some cases.

>
> B- write all of the packet in the payload buffer and just skip header buffer 
> (e.g. make it 0 size?)
>   payload within packet will be misaligned then. Do we care? maybe not -
>   this was supposed to be an exception. In this case:
>   Problem B: where should virtio net header
>   go then? we can put it in the header buffer still, or we can
>   store it linear with the packet.  or we can leave both options, up to 
> device.
>   what is better might depend on
>   different factors.  any chance of performance testing?


avail ring:
| small buffer | page | small buffer | page | small buffer | page | small 
buffer | page |

In this case, I think it is a good 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-29 Thread Michael S. Tsirkin
On Thu, Sep 29, 2022 at 04:24:02PM +0800, Xuan Zhuo wrote:
> On Thu, 29 Sep 2022 03:04:03 -0400, "Michael S. Tsirkin"  
> wrote:
> > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > Jason I think the issue with previous proposals is that they 
> > > > > > conflict
> > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > > >
> > > > >
> > > > > Yes, but I didn't found how it can conflict the any_layout. Device 
> > > > > can just
> > > > > to not split the header when the layout doesn't fit for header 
> > > > > splitting.
> > > > > (And this seems the case even if we're using buffers).
> > > >
> > > > Well spec says:
> > > >
> > > > indicates to both the device and the driver that no
> > > > assumptions were made about framing.
> > > >
> > > > if device assumes that descriptor boundaries are where
> > > > driver wants packet to be stored that is clearly
> > > > an assumption.
> > >
> > > Yes but what I want to say is, the device can choose to not split the
> > > packet if the framing doesn't fit. Does it still comply with the above
> > > description?
> > >
> > > Thanks
> >
> > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > For example, if driver wants to split the header at some specific
> > offset this is already possible without extra functionality.
> >
> > Let's keep it that way.
> >
> > Now, let's formulate what are some of the problems with the current way.
> >
> >
> >
> > A- mergeable buffers is even more flexible, since a single packet
> >   is built up of multiple buffers.
> 
> If I understand correctly, this is our v8.

I think it is, or at least close.

> > And in theory device can
> >   choose arbitrary set of buffers to store a packet.
> >   So you could supply a small buffer for headers followed by a bigger
> >   one for payload, in theory even without any changes.
> 
> This is very interesting, I did not think of this point.
> This is helpful to reduce the waste of memory.

Hmm good point. I should add: since we are no longer fully using the
first buffer the feature has a memory cost and - in case of a cache
pressure - can degrade performance rather than improve it.
Thus, allowing flexibility for both devices and drivers at runtime
rather than fixing things through features thus sounds like a good idea.


> >   Problem 1: However since this is not how devices currently operate,
> >   a feature bit would be helpful.
> >
> >   Problem 2: Also, in the past we found it useful to be able to figure out 
> > whether
> >   packet fits in a single buffer without looking at the header.
> >   For this reason, we have this text:
> >
> > If a receive packet is spread over multiple buffers, the device
> > MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > 1 buffers) completely up to the full length of each buffer
> > supplied by the driver.
> >
> >   if we want to keep this optimization and allow using a separate
> >   buffer for headers, then I think we could rely on the feature bit
> >   from Problem 1 and just make an exception for the first buffer.
> >   Also num_buffers is then always >= 2, maybe state this to avoid
> >   confusion.
> >
> 
> Yes, I think this is feasible.


Thinking more about this, a question is what to do about packets without
split header. I can see several options
A- add some buffers just for the non split case. without in-order they
  can just stay available. with in-order they have to be recycled which
  might be expensive.
  They will also occupy space in the ring. more memory costs.
  Don't much like it for above reasons.
  OTOH then I wanted to work on partial-order anyway. Maybe it's time
  to prioritize that work.

B- write all of the packet in the payload buffer and just skip header buffer 
(e.g. make it 0 size?)
  payload within packet will be misaligned then. Do we care? maybe not -
  this was supposed to be an exception. In this case:
  Problem B: where should virtio net header
  go then? we can put it in the header buffer still, or we can
  store it linear with the packet.  or we can leave both options, up to device.
  what is better might depend on
  different factors.  any chance of performance testing?

> >
> >
> >
> >
> > B- without mergeable, there's no flexibility. In particular, there can
> > not be uninitialized space between header and data. If we had flexibility 
> > here, this could be
> > helpful for alignment, security, etc.
> > Unfortunately, our hands are tied:
> >
> >
> > \field{len} is particularly useful
> > for drivers using untrusted buffers: if a driver does not know exactly
> > how much has been written by the device, the driver would have to zero
> > the buffer in advance to 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-29 Thread Xuan Zhuo
On Thu, 29 Sep 2022 03:04:03 -0400, "Michael S. Tsirkin"  
wrote:
> On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > >
> > > >
> > > > Yes, but I didn't found how it can conflict the any_layout. Device can 
> > > > just
> > > > to not split the header when the layout doesn't fit for header 
> > > > splitting.
> > > > (And this seems the case even if we're using buffers).
> > >
> > > Well spec says:
> > >
> > > indicates to both the device and the driver that no
> > > assumptions were made about framing.
> > >
> > > if device assumes that descriptor boundaries are where
> > > driver wants packet to be stored that is clearly
> > > an assumption.
> >
> > Yes but what I want to say is, the device can choose to not split the
> > packet if the framing doesn't fit. Does it still comply with the above
> > description?
> >
> > Thanks
>
> The point of ANY_LAYOUT is to give drivers maximum flexibility.
> For example, if driver wants to split the header at some specific
> offset this is already possible without extra functionality.
>
> Let's keep it that way.
>
> Now, let's formulate what are some of the problems with the current way.
>
>
>
> A- mergeable buffers is even more flexible, since a single packet
>   is built up of multiple buffers.

If I understand correctly, this is our v8.

> And in theory device can
>   choose arbitrary set of buffers to store a packet.
>   So you could supply a small buffer for headers followed by a bigger
>   one for payload, in theory even without any changes.

This is very interesting, I did not think of this point.
This is helpful to reduce the waste of memory.

>   Problem 1: However since this is not how devices currently operate,
>   a feature bit would be helpful.
>
>   Problem 2: Also, in the past we found it useful to be able to figure out 
> whether
>   packet fits in a single buffer without looking at the header.
>   For this reason, we have this text:
>
>   If a receive packet is spread over multiple buffers, the device
>   MUST use all buffers but the last (i.e. the first \field{num_buffers} -
>   1 buffers) completely up to the full length of each buffer
>   supplied by the driver.
>
>   if we want to keep this optimization and allow using a separate
>   buffer for headers, then I think we could rely on the feature bit
>   from Problem 1 and just make an exception for the first buffer.
>   Also num_buffers is then always >= 2, maybe state this to avoid
>   confusion.
>

Yes, I think this is feasible.

>
>
>
>
> B- without mergeable, there's no flexibility. In particular, there can
> not be uninitialized space between header and data. If we had flexibility 
> here, this could be
> helpful for alignment, security, etc.
> Unfortunately, our hands are tied:
>
>
>   \field{len} is particularly useful
>   for drivers using untrusted buffers: if a driver does not know exactly
>   how much has been written by the device, the driver would have to zero
>   the buffer in advance to ensure no data leakage occurs.
>
>   For example, a network driver may hand a received buffer directly to
>   an unprivileged userspace application.  If the network device has not
>   overwritten the bytes which were in that buffer, this could leak the
>   contents of freed memory from other processes to the application.


I don't think this is very troublesome, the device can memset the hole by 0.

>
>
> so all buffers have to be initialized completely up to the reported
> used length.
>
> It could be that the guarantee is not relevant in some use-cases.
> We would have to specify that this is an exception to the rule,
> explain that drivers must be careful about information leaks.
> Let's say there's a feature bit that adds uninitialized space
> somewhere. How much was added? We can find out by parsing the
> packet but once you start doing that just to assemble the skb
> you have already lost performance.
> So lots of spec work, some security risks, and unclear performance.
>
>
>
>
> Is above a fair summary?
>
>
>
> If yes I would say let's address A, but make sure we ask drivers
> to clear the new feature bit if there's no mergeable
> (as opposed to e.g. failing probe) so we can add
> support for !mergeable down the road.
>
>
>
>
>
> --
> MST
>

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-29 Thread Michael S. Tsirkin
On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > Jason I think the issue with previous proposals is that they conflict
> > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > driver flexibility in arranging the packet in memory is benefitial.
> > >
> > >
> > > Yes, but I didn't found how it can conflict the any_layout. Device can 
> > > just
> > > to not split the header when the layout doesn't fit for header splitting.
> > > (And this seems the case even if we're using buffers).
> >
> > Well spec says:
> >
> > indicates to both the device and the driver that no
> > assumptions were made about framing.
> >
> > if device assumes that descriptor boundaries are where
> > driver wants packet to be stored that is clearly
> > an assumption.
> 
> Yes but what I want to say is, the device can choose to not split the
> packet if the framing doesn't fit. Does it still comply with the above
> description?
> 
> Thanks

The point of ANY_LAYOUT is to give drivers maximum flexibility.
For example, if driver wants to split the header at some specific
offset this is already possible without extra functionality.

Let's keep it that way.

Now, let's formulate what are some of the problems with the current way.



A- mergeable buffers is even more flexible, since a single packet
  is built up of multiple buffers. And in theory device can
  choose arbitrary set of buffers to store a packet.
  So you could supply a small buffer for headers followed by a bigger
  one for payload, in theory even without any changes.
  Problem 1: However since this is not how devices currently operate,
  a feature bit would be helpful.

  Problem 2: Also, in the past we found it useful to be able to figure out 
whether
  packet fits in a single buffer without looking at the header.
  For this reason, we have this text:

If a receive packet is spread over multiple buffers, the device
MUST use all buffers but the last (i.e. the first \field{num_buffers} -
1 buffers) completely up to the full length of each buffer
supplied by the driver.

  if we want to keep this optimization and allow using a separate
  buffer for headers, then I think we could rely on the feature bit
  from Problem 1 and just make an exception for the first buffer.
  Also num_buffers is then always >= 2, maybe state this to avoid
  confusion.

  



B- without mergeable, there's no flexibility. In particular, there can
not be uninitialized space between header and data. If we had flexibility here, 
this could be
helpful for alignment, security, etc.
Unfortunately, our hands are tied:


\field{len} is particularly useful
for drivers using untrusted buffers: if a driver does not know exactly
how much has been written by the device, the driver would have to zero
the buffer in advance to ensure no data leakage occurs.

For example, a network driver may hand a received buffer directly to
an unprivileged userspace application.  If the network device has not
overwritten the bytes which were in that buffer, this could leak the
contents of freed memory from other processes to the application.


so all buffers have to be initialized completely up to the reported
used length.

It could be that the guarantee is not relevant in some use-cases.
We would have to specify that this is an exception to the rule,
explain that drivers must be careful about information leaks.
Let's say there's a feature bit that adds uninitialized space
somewhere. How much was added? We can find out by parsing the
packet but once you start doing that just to assemble the skb
you have already lost performance.
So lots of spec work, some security risks, and unclear performance.




Is above a fair summary?



If yes I would say let's address A, but make sure we ask drivers
to clear the new feature bit if there's no mergeable
(as opposed to e.g. failing probe) so we can add
support for !mergeable down the road.





-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-28 Thread Jason Wang
On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  wrote:
>
> On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > Jason I think the issue with previous proposals is that they conflict
> > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > driver flexibility in arranging the packet in memory is benefitial.
> >
> >
> > Yes, but I didn't found how it can conflict the any_layout. Device can just
> > to not split the header when the layout doesn't fit for header splitting.
> > (And this seems the case even if we're using buffers).
>
> Well spec says:
>
> indicates to both the device and the driver that no
> assumptions were made about framing.
>
> if device assumes that descriptor boundaries are where
> driver wants packet to be stored that is clearly
> an assumption.

Yes but what I want to say is, the device can choose to not split the
packet if the framing doesn't fit. Does it still comply with the above
description?

Thanks

>
>
> --
> MST
>


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-28 Thread Michael S. Tsirkin
On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > Jason I think the issue with previous proposals is that they conflict
> > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > driver flexibility in arranging the packet in memory is benefitial.
> 
> 
> Yes, but I didn't found how it can conflict the any_layout. Device can just
> to not split the header when the layout doesn't fit for header splitting.
> (And this seems the case even if we're using buffers).

Well spec says:

indicates to both the device and the driver that no
assumptions were made about framing.

if device assumes that descriptor boundaries are where
driver wants packet to be stored that is clearly
an assumption.


-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-27 Thread Michael S. Tsirkin
On Wed, Sep 28, 2022 at 09:43:36AM +0800, Xuan Zhuo wrote:
> References: <1663297006-64248-1-git-send-email-hen...@linux.alibaba.com>
>  <3c3cc916-c605-1ed2-d3ff-d8d8ce668...@redhat.com>
>  <20220920032824.ga125...@h68b04307.sqa.eu95>
>  
>  <1663903426.8765974-1-xuanz...@linux.alibaba.com>
>  
>  <20220923013341-mutt-send-email-...@kernel.org>
>  <619af5b3-4c38-955f-0f7f-f351f5a95...@redhat.com>
> In-Reply-To: <619af5b3-4c38-955f-0f7f-f351f5a95...@redhat.com>
> 
> 
> hi Michael
> 
> I see you reverted to v7, our discussion is here.
> 
> Thanks.

Sure, I know. I responded to v7 because Jason mentioned it.

-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-26 Thread Jason Wang



在 2022/9/23 13:59, Michael S. Tsirkin 写道:

On Fri, Sep 23, 2022 at 12:04:28PM +0800, Jason Wang wrote:

On Fri, Sep 23, 2022 at 11:33 AM Xuan Zhuo  wrote:

On Wed, 21 Sep 2022 14:20:19 +0800, Jason Wang  wrote:

On Tue, Sep 20, 2022 at 11:28 AM Heng Qi  wrote:

On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:

在 2022/9/16 10:56, hengqi 写道:

From: Xuan Zhuo 

The purpose of this feature is to split the transport header and the payload
of the packet.

| receive buffer1(page)| receive buffer2(page) |
|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|payload|
  |<--->|
   ^
   |
max_len

We can use one page for every receive buffer. In this way, we can ensure that
all payloads can be independently in a page, which is very beneficial for
the zerocopy implemented by the upper layer.

Signed-off-by: Xuan Zhuo 
Signed-off-by: Heng Qi 
Reviewed-by: Kangjie Xu 
---
v8:
 1. Do not depend on descriptor chain. @Michael S. Tsirkin
 2. Add \field{offset} and \field{max_len}.
 3. Fix some presentation issues. @Jason Wang
 4. Clarify some paragraphs.

v7:
 1. Fix some presentation issues.
 2. Use "split transport header". @Jason Wang
 3. Clarify some paragraphs. @Cornelia Huck
 4. determine the device what to do if it does not perform header split on 
a packet.

v6:
 1. Fix some syntax issues. @Cornelia Huck
 2. Clarify some paragraphs. @Cornelia Huck
 3. Determine the device what to do if it does not perform header split on 
a packet.

v5:
 1. Determine when hdr_len is credible in the process of rx
 2. Clean up the use of buffers and descriptors
 3. Clarify the meaning of used lenght if the first descriptor is skipped 
in the case of merge

v4:
 1. fix typo @Cornelia Huck @Jason Wang
 2. do not split header for IP fragmentation packet. @Jason Wang

  conformance.tex |  2 ++
  content.tex | 91 +
  2 files changed, 93 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}
  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / 
Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}
  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / 
Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..fad9dea 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
  channel.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
+the transport header and the payload.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and 
VIRTIO_NET_F_MRG_RXBUF.
  \end{description}
  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-23 Thread Xuan Zhuo
On Fri, 23 Sep 2022 07:04:10 -0400, "Michael S. Tsirkin"  
wrote:
> On Fri, Sep 23, 2022 at 06:48:56PM +0800, Xuan Zhuo wrote:
> > On Fri, 23 Sep 2022 06:44:54 -0400, "Michael S. Tsirkin"  
> > wrote:
> > > On Fri, Sep 23, 2022 at 02:57:27PM +0800, Xuan Zhuo wrote:
> > > > > > > Michael doesn't want to use desc chain, it's not just a 
> > > > > > > performance issue. In an
> > > > > > > early email, he mentioned that desc chain may be abandoned in the 
> > > > > > > future. So we
> > > > > > > have been trying not to rely on desc chain.
> > > > > >
> > > > > > This seems to be a very large change which seems a little bit too
> > > > > > early to be considered.
> > > > >
> > > > > I'd like to put it in other terms. Fundamentally devices are not
> > > > > supposed to talk about descriptors at all. Descriptors are
> > > > > a way to describe buffers, and devices should all work in terms
> > > > > of buffers. I am working on cleaning up the spec from confusion
> > > > > and terminology mixups. We have several major sources all over the 
> > > > > spec:
> > > > > - descriptor/buffer used inconsistently
> > > > > - feature negotiated/offered used inconsistently
> > > > > - field exists/valid used inconsistently
> > > > >
> > > > > My way to address the first issue is to make sure devices all work
> > > > > with buffers. And buffers are described by descriptors (makes sense,
> > > > > right?) and made available to device by driver and used by device.
> > > >
> > > > Can we try to keep desc info? I think it's a very important feature for
> > > > virtio-net, and many NICs are designed based on this.
> > > >
> > > > Our current solutions B and C will waste a lot of memory. If the page 
> > > > occupied
> > > > by the header can be quickly reclaimed, then we have to do a copy in 
> > > > the driver.
> > >
> > > Not sure I understand. Can you explain?
> >
> >
> > For example B, we allocate two consecutive pages, the second page is used 
> > for
> > payload. The first page is used to reserve the header. The first page is too
> > wasteful, 4k of space, only the header is saved. In this way, we can copy 
> > away
> > the data of the first page, so that the first page can be quickly reclaimed.
> >
> > Why two consecutive pages? The payload achieves page alignment, and there is
> > still some space in front of the page, so we must allocate two connected 
> > pages.
>
> Are we talking without mergeable here?

I think B(offset) is not dependent on mergeable .

C must depend on mergeable.

Thanks.

>
>
> > >
> > > > >
> > > > > The advantage of this is layering - we can change the way buffers
> > > > > are passed around without changing devices. And, it matches
> > > > > the virtio API nicely.
> > > > >
> > > > > Existing devices are all fine with this - they do not pass any
> > > > > information in the descriptor. Yes, this seems like an option to
> > > > > pass some information around, but I am not convinced it is worth
> > > > > the layering violation.
> > > > >
> > > > > By comparison, ability to write data at an offset seems generally
> > > > > useful, in particular we have a very old issue even without
> > > > > the split header feature where with mergeable buffers
> > > > > if we attempt to align the data in the 1st buffer at a cache line
> > > > > boundary by adding an offset before ETH header, then when it spills
> > > > > over to the second buffer it will be misaligned there. Wastes
> > > > > an extra cache line for such packets. Offsets can allow fixing this.
> > > > >
> > > >
> > > > Scheme B In addition to the memory problem, under this scheme, if we 
> > > > want to
> > > > implement tcp zerocopy, then we must apply for two consecutive pages 
> > > > for a
> > > > buffer. The second page is used to place the payload. The first is used 
> > > > to place
> > > > the header.
> > >
> > >
> > > I just don't understand why there would be a difference. Explanation?
> >
> > I'm concerned that allocating two contiguous pages in large numbers will 
> > cause
> > unknowable problems(such perfermance)
> >
> > Thanks.
> >
> >
> > >
> > > >
> > > > >
> > > > > I don't see architecturally what is wrong with making feature just
> > > > > depend on mergeable buffers for now. We can always allow a combination
> > > > > down the road. Let's just make it clear that if drivers see SPLIT &&
> > > > > !MERGEABLE they should not fail probe, they should instead clear the
> > > > > split header feature.
> > > > >
> > > >
> > > > According to my understanding, for B, it does not depend on mergeable.
> > > >
> > > > If we give up desc info completely, then we prefer B.
> > > >
> > > > Hi Jason, what are your thoughts?
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > If we can't make a feature depend only on mergeable, should we 
> > > > > > > use solution B?
> > > > > > >
> > > > > > >  2. Scheme B ( refer to your suggestion )
> > > > > > >
> > > > > > >  Our rethinking approach is no 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-23 Thread Michael S. Tsirkin
On Fri, Sep 23, 2022 at 06:48:56PM +0800, Xuan Zhuo wrote:
> On Fri, 23 Sep 2022 06:44:54 -0400, "Michael S. Tsirkin"  
> wrote:
> > On Fri, Sep 23, 2022 at 02:57:27PM +0800, Xuan Zhuo wrote:
> > > > > > Michael doesn't want to use desc chain, it's not just a performance 
> > > > > > issue. In an
> > > > > > early email, he mentioned that desc chain may be abandoned in the 
> > > > > > future. So we
> > > > > > have been trying not to rely on desc chain.
> > > > >
> > > > > This seems to be a very large change which seems a little bit too
> > > > > early to be considered.
> > > >
> > > > I'd like to put it in other terms. Fundamentally devices are not
> > > > supposed to talk about descriptors at all. Descriptors are
> > > > a way to describe buffers, and devices should all work in terms
> > > > of buffers. I am working on cleaning up the spec from confusion
> > > > and terminology mixups. We have several major sources all over the spec:
> > > > - descriptor/buffer used inconsistently
> > > > - feature negotiated/offered used inconsistently
> > > > - field exists/valid used inconsistently
> > > >
> > > > My way to address the first issue is to make sure devices all work
> > > > with buffers. And buffers are described by descriptors (makes sense,
> > > > right?) and made available to device by driver and used by device.
> > >
> > > Can we try to keep desc info? I think it's a very important feature for
> > > virtio-net, and many NICs are designed based on this.
> > >
> > > Our current solutions B and C will waste a lot of memory. If the page 
> > > occupied
> > > by the header can be quickly reclaimed, then we have to do a copy in the 
> > > driver.
> >
> > Not sure I understand. Can you explain?
> 
> 
> For example B, we allocate two consecutive pages, the second page is used for
> payload. The first page is used to reserve the header. The first page is too
> wasteful, 4k of space, only the header is saved. In this way, we can copy away
> the data of the first page, so that the first page can be quickly reclaimed.
> 
> Why two consecutive pages? The payload achieves page alignment, and there is
> still some space in front of the page, so we must allocate two connected 
> pages.

Are we talking without mergeable here?


> >
> > > >
> > > > The advantage of this is layering - we can change the way buffers
> > > > are passed around without changing devices. And, it matches
> > > > the virtio API nicely.
> > > >
> > > > Existing devices are all fine with this - they do not pass any
> > > > information in the descriptor. Yes, this seems like an option to
> > > > pass some information around, but I am not convinced it is worth
> > > > the layering violation.
> > > >
> > > > By comparison, ability to write data at an offset seems generally
> > > > useful, in particular we have a very old issue even without
> > > > the split header feature where with mergeable buffers
> > > > if we attempt to align the data in the 1st buffer at a cache line
> > > > boundary by adding an offset before ETH header, then when it spills
> > > > over to the second buffer it will be misaligned there. Wastes
> > > > an extra cache line for such packets. Offsets can allow fixing this.
> > > >
> > >
> > > Scheme B In addition to the memory problem, under this scheme, if we want 
> > > to
> > > implement tcp zerocopy, then we must apply for two consecutive pages for a
> > > buffer. The second page is used to place the payload. The first is used 
> > > to place
> > > the header.
> >
> >
> > I just don't understand why there would be a difference. Explanation?
> 
> I'm concerned that allocating two contiguous pages in large numbers will cause
> unknowable problems(such perfermance)
> 
> Thanks.
> 
> 
> >
> > >
> > > >
> > > > I don't see architecturally what is wrong with making feature just
> > > > depend on mergeable buffers for now. We can always allow a combination
> > > > down the road. Let's just make it clear that if drivers see SPLIT &&
> > > > !MERGEABLE they should not fail probe, they should instead clear the
> > > > split header feature.
> > > >
> > >
> > > According to my understanding, for B, it does not depend on mergeable.
> > >
> > > If we give up desc info completely, then we prefer B.
> > >
> > > Hi Jason, what are your thoughts?
> > >
> > > Thanks.
> > >
> > > >
> > > >
> > > >
> > > > > >
> > > > > > If we can't make a feature depend only on mergeable, should we use 
> > > > > > solution B?
> > > > > >
> > > > > >  2. Scheme B ( refer to your suggestion )
> > > > > >
> > > > > >  Our rethinking approach is no longer based on descriptor chain.
> > > > > >
> > > > > >  We refer to your proposed offset-based scheme as scheme B.
> > > > >
> > > > > The offset seems to be the suggestion of Michael.
> > > > >
> > > > > I think I like the design of v7 for several reasons:
> > > > >
> > > > > 1) easy to reserve head/tailroom without any extension of the spec
> > > > > 2) easy to work with mergeable rx buffer
> > > > > 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-23 Thread Xuan Zhuo
On Fri, 23 Sep 2022 06:44:54 -0400, "Michael S. Tsirkin"  
wrote:
> On Fri, Sep 23, 2022 at 02:57:27PM +0800, Xuan Zhuo wrote:
> > > > > Michael doesn't want to use desc chain, it's not just a performance 
> > > > > issue. In an
> > > > > early email, he mentioned that desc chain may be abandoned in the 
> > > > > future. So we
> > > > > have been trying not to rely on desc chain.
> > > >
> > > > This seems to be a very large change which seems a little bit too
> > > > early to be considered.
> > >
> > > I'd like to put it in other terms. Fundamentally devices are not
> > > supposed to talk about descriptors at all. Descriptors are
> > > a way to describe buffers, and devices should all work in terms
> > > of buffers. I am working on cleaning up the spec from confusion
> > > and terminology mixups. We have several major sources all over the spec:
> > > - descriptor/buffer used inconsistently
> > > - feature negotiated/offered used inconsistently
> > > - field exists/valid used inconsistently
> > >
> > > My way to address the first issue is to make sure devices all work
> > > with buffers. And buffers are described by descriptors (makes sense,
> > > right?) and made available to device by driver and used by device.
> >
> > Can we try to keep desc info? I think it's a very important feature for
> > virtio-net, and many NICs are designed based on this.
> >
> > Our current solutions B and C will waste a lot of memory. If the page 
> > occupied
> > by the header can be quickly reclaimed, then we have to do a copy in the 
> > driver.
>
> Not sure I understand. Can you explain?


For example B, we allocate two consecutive pages, the second page is used for
payload. The first page is used to reserve the header. The first page is too
wasteful, 4k of space, only the header is saved. In this way, we can copy away
the data of the first page, so that the first page can be quickly reclaimed.

Why two consecutive pages? The payload achieves page alignment, and there is
still some space in front of the page, so we must allocate two connected pages.

>
> > >
> > > The advantage of this is layering - we can change the way buffers
> > > are passed around without changing devices. And, it matches
> > > the virtio API nicely.
> > >
> > > Existing devices are all fine with this - they do not pass any
> > > information in the descriptor. Yes, this seems like an option to
> > > pass some information around, but I am not convinced it is worth
> > > the layering violation.
> > >
> > > By comparison, ability to write data at an offset seems generally
> > > useful, in particular we have a very old issue even without
> > > the split header feature where with mergeable buffers
> > > if we attempt to align the data in the 1st buffer at a cache line
> > > boundary by adding an offset before ETH header, then when it spills
> > > over to the second buffer it will be misaligned there. Wastes
> > > an extra cache line for such packets. Offsets can allow fixing this.
> > >
> >
> > Scheme B In addition to the memory problem, under this scheme, if we want to
> > implement tcp zerocopy, then we must apply for two consecutive pages for a
> > buffer. The second page is used to place the payload. The first is used to 
> > place
> > the header.
>
>
> I just don't understand why there would be a difference. Explanation?

I'm concerned that allocating two contiguous pages in large numbers will cause
unknowable problems(such perfermance)

Thanks.


>
> >
> > >
> > > I don't see architecturally what is wrong with making feature just
> > > depend on mergeable buffers for now. We can always allow a combination
> > > down the road. Let's just make it clear that if drivers see SPLIT &&
> > > !MERGEABLE they should not fail probe, they should instead clear the
> > > split header feature.
> > >
> >
> > According to my understanding, for B, it does not depend on mergeable.
> >
> > If we give up desc info completely, then we prefer B.
> >
> > Hi Jason, what are your thoughts?
> >
> > Thanks.
> >
> > >
> > >
> > >
> > > > >
> > > > > If we can't make a feature depend only on mergeable, should we use 
> > > > > solution B?
> > > > >
> > > > >  2. Scheme B ( refer to your suggestion )
> > > > >
> > > > >  Our rethinking approach is no longer based on descriptor chain.
> > > > >
> > > > >  We refer to your proposed offset-based scheme as scheme B.
> > > >
> > > > The offset seems to be the suggestion of Michael.
> > > >
> > > > I think I like the design of v7 for several reasons:
> > > >
> > > > 1) easy to reserve head/tailroom without any extension of the spec
> > > > 2) easy to work with mergeable rx buffer
> > > > 3) it is the model used by modern NIC like e810 [1]
> > > >
> > > > [1] e810 manual 2.4 Figure 10-4 have a nic diagram to demonstrate how
> > > > it works which is similar to v7
> > > >
> > > > Thanks
> > > >
> > > > >  As you suggested, scheme B gives the device a buffer, using 
> > > > > offset to
> > > > >  indicate where to 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-23 Thread Michael S. Tsirkin
On Fri, Sep 23, 2022 at 02:57:27PM +0800, Xuan Zhuo wrote:
> > > > Michael doesn't want to use desc chain, it's not just a performance 
> > > > issue. In an
> > > > early email, he mentioned that desc chain may be abandoned in the 
> > > > future. So we
> > > > have been trying not to rely on desc chain.
> > >
> > > This seems to be a very large change which seems a little bit too
> > > early to be considered.
> >
> > I'd like to put it in other terms. Fundamentally devices are not
> > supposed to talk about descriptors at all. Descriptors are
> > a way to describe buffers, and devices should all work in terms
> > of buffers. I am working on cleaning up the spec from confusion
> > and terminology mixups. We have several major sources all over the spec:
> > - descriptor/buffer used inconsistently
> > - feature negotiated/offered used inconsistently
> > - field exists/valid used inconsistently
> >
> > My way to address the first issue is to make sure devices all work
> > with buffers. And buffers are described by descriptors (makes sense,
> > right?) and made available to device by driver and used by device.
> 
> Can we try to keep desc info? I think it's a very important feature for
> virtio-net, and many NICs are designed based on this.
> 
> Our current solutions B and C will waste a lot of memory. If the page occupied
> by the header can be quickly reclaimed, then we have to do a copy in the 
> driver.

Not sure I understand. Can you explain?

> >
> > The advantage of this is layering - we can change the way buffers
> > are passed around without changing devices. And, it matches
> > the virtio API nicely.
> >
> > Existing devices are all fine with this - they do not pass any
> > information in the descriptor. Yes, this seems like an option to
> > pass some information around, but I am not convinced it is worth
> > the layering violation.
> >
> > By comparison, ability to write data at an offset seems generally
> > useful, in particular we have a very old issue even without
> > the split header feature where with mergeable buffers
> > if we attempt to align the data in the 1st buffer at a cache line
> > boundary by adding an offset before ETH header, then when it spills
> > over to the second buffer it will be misaligned there. Wastes
> > an extra cache line for such packets. Offsets can allow fixing this.
> >
> 
> Scheme B In addition to the memory problem, under this scheme, if we want to
> implement tcp zerocopy, then we must apply for two consecutive pages for a
> buffer. The second page is used to place the payload. The first is used to 
> place
> the header.


I just don't understand why there would be a difference. Explanation?

> 
> >
> > I don't see architecturally what is wrong with making feature just
> > depend on mergeable buffers for now. We can always allow a combination
> > down the road. Let's just make it clear that if drivers see SPLIT &&
> > !MERGEABLE they should not fail probe, they should instead clear the
> > split header feature.
> >
> 
> According to my understanding, for B, it does not depend on mergeable.
> 
> If we give up desc info completely, then we prefer B.
> 
> Hi Jason, what are your thoughts?
> 
> Thanks.
> 
> >
> >
> >
> > > >
> > > > If we can't make a feature depend only on mergeable, should we use 
> > > > solution B?
> > > >
> > > >  2. Scheme B ( refer to your suggestion )
> > > >
> > > >  Our rethinking approach is no longer based on descriptor chain.
> > > >
> > > >  We refer to your proposed offset-based scheme as scheme B.
> > >
> > > The offset seems to be the suggestion of Michael.
> > >
> > > I think I like the design of v7 for several reasons:
> > >
> > > 1) easy to reserve head/tailroom without any extension of the spec
> > > 2) easy to work with mergeable rx buffer
> > > 3) it is the model used by modern NIC like e810 [1]
> > >
> > > [1] e810 manual 2.4 Figure 10-4 have a nic diagram to demonstrate how
> > > it works which is similar to v7
> > >
> > > Thanks
> > >
> > > >  As you suggested, scheme B gives the device a buffer, using offset 
> > > > to
> > > >  indicate where to place the payload. Like this:
> > > >
> > > >  .. 
> > > >
> > > >  But how to apply for this buffer?
> > > >  Since we want the payload to be placed on a separate page, the 
> > > > method
> > > >  we consider is to directly alloc two pages from driver of 
> > > > contiguous memory.
> > > >
> > > >  Then the beginning of this contiguous memory is used to store the 
> > > > headroom,
> > > >  and the contiguous memory after the headroom is directly handed 
> > > > over to the device.
> > > >  Similar to the following:
> > > >
> > > >  [-- receive buffer(2 pages) 
> > > > --]
> > > >  [<-- second page 
> > > > >]
> > > >  [<-> ..<   payload   
> > > >   >]
> > > > ^^
> > > > ||
> > 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-22 Thread Michael S. Tsirkin
On Fri, Sep 23, 2022 at 12:04:28PM +0800, Jason Wang wrote:
> On Fri, Sep 23, 2022 at 11:33 AM Xuan Zhuo  wrote:
> >
> > On Wed, 21 Sep 2022 14:20:19 +0800, Jason Wang  wrote:
> > > On Tue, Sep 20, 2022 at 11:28 AM Heng Qi  wrote:
> > > >
> > > > On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:
> > > > >
> > > > > 在 2022/9/16 10:56, hengqi 写道:
> > > > > >From: Xuan Zhuo 
> > > > > >
> > > > > >The purpose of this feature is to split the transport header and the 
> > > > > >payload
> > > > > >of the packet.
> > > > > >
> > > > > >| receive buffer1(page)| receive 
> > > > > >buffer2(page) |
> > > > > >|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|
> > > > > >payload|
> > > > > >  |<--->|
> > > > > >   ^
> > > > > >   |
> > > > > >max_len
> > > > > >
> > > > > >We can use one page for every receive buffer. In this way, we can 
> > > > > >ensure that
> > > > > >all payloads can be independently in a page, which is very 
> > > > > >beneficial for
> > > > > >the zerocopy implemented by the upper layer.
> > > > > >
> > > > > >Signed-off-by: Xuan Zhuo 
> > > > > >Signed-off-by: Heng Qi 
> > > > > >Reviewed-by: Kangjie Xu 
> > > > > >---
> > > > > >v8:
> > > > > > 1. Do not depend on descriptor chain. @Michael S. Tsirkin
> > > > > > 2. Add \field{offset} and \field{max_len}.
> > > > > > 3. Fix some presentation issues. @Jason Wang
> > > > > > 4. Clarify some paragraphs.
> > > > > >
> > > > > >v7:
> > > > > > 1. Fix some presentation issues.
> > > > > > 2. Use "split transport header". @Jason Wang
> > > > > > 3. Clarify some paragraphs. @Cornelia Huck
> > > > > > 4. determine the device what to do if it does not perform 
> > > > > > header split on a packet.
> > > > > >
> > > > > >v6:
> > > > > > 1. Fix some syntax issues. @Cornelia Huck
> > > > > > 2. Clarify some paragraphs. @Cornelia Huck
> > > > > > 3. Determine the device what to do if it does not perform 
> > > > > > header split on a packet.
> > > > > >
> > > > > >v5:
> > > > > > 1. Determine when hdr_len is credible in the process of rx
> > > > > > 2. Clean up the use of buffers and descriptors
> > > > > > 3. Clarify the meaning of used lenght if the first descriptor 
> > > > > > is skipped in the case of merge
> > > > > >
> > > > > >v4:
> > > > > > 1. fix typo @Cornelia Huck @Jason Wang
> > > > > > 2. do not split header for IP fragmentation packet. @Jason Wang
> > > > > >
> > > > > >  conformance.tex |  2 ++
> > > > > >  content.tex | 91 
> > > > > > +
> > > > > >  2 files changed, 93 insertions(+)
> > > > > >
> > > > > >diff --git a/conformance.tex b/conformance.tex
> > > > > >index 2b86fc6..4e2b82e 100644
> > > > > >--- a/conformance.tex
> > > > > >+++ b/conformance.tex
> > > > > >@@ -150,6 +150,7 @@ \section{Conformance 
> > > > > >Targets}\label{sec:Conformance / Conformance Targets}
> > > > > >  \item \ref{drivernormative:Device Types / Network Device / Device 
> > > > > > Operation / Control Virtqueue / Offloads State Configuration / 
> > > > > > Setting Offloads State}
> > > > > >  \item \ref{drivernormative:Device Types / Network Device / Device 
> > > > > > Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > > > >  \item \ref{drivernormative:Device Types / Network Device / Device 
> > > > > > Operation / Control Virtqueue / Notifications Coalescing}
> > > > > >+\item \ref{drivernormative:Device Types / Network Device / Device 
> > > > > >Operation / Control Virtqueue / Split Transport Header}
> > > > > >  \end{itemize}
> > > > > >  \conformance{\subsection}{Block Driver 
> > > > > > Conformance}\label{sec:Conformance / Driver Conformance / Block 
> > > > > > Driver Conformance}
> > > > > >@@ -415,6 +416,7 @@ \section{Conformance 
> > > > > >Targets}\label{sec:Conformance / Conformance Targets}
> > > > > >  \item \ref{devicenormative:Device Types / Network Device / Device 
> > > > > > Operation / Control Virtqueue / Automatic receive steering in 
> > > > > > multiqueue mode}
> > > > > >  \item \ref{devicenormative:Device Types / Network Device / Device 
> > > > > > Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS 
> > > > > > processing}
> > > > > >  \item \ref{devicenormative:Device Types / Network Device / Device 
> > > > > > Operation / Control Virtqueue / Notifications Coalescing}
> > > > > >+\item \ref{devicenormative:Device Types / Network Device / Device 
> > > > > >Operation / Control Virtqueue / Split Transport Header}
> > > > > >  \end{itemize}
> > > > > >  \conformance{\subsection}{Block Device 
> > > > > > Conformance}\label{sec:Conformance / Device Conformance / Block 
> > > > > > Device Conformance}
> > > > > >diff --git a/content.tex b/content.tex
> > > > > >index e863709..fad9dea 100644
> > 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-22 Thread Jason Wang
On Fri, Sep 23, 2022 at 11:33 AM Xuan Zhuo  wrote:
>
> On Wed, 21 Sep 2022 14:20:19 +0800, Jason Wang  wrote:
> > On Tue, Sep 20, 2022 at 11:28 AM Heng Qi  wrote:
> > >
> > > On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:
> > > >
> > > > 在 2022/9/16 10:56, hengqi 写道:
> > > > >From: Xuan Zhuo 
> > > > >
> > > > >The purpose of this feature is to split the transport header and the 
> > > > >payload
> > > > >of the packet.
> > > > >
> > > > >| receive buffer1(page)| receive 
> > > > >buffer2(page) |
> > > > >|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|
> > > > >payload|
> > > > >  |<--->|
> > > > >   ^
> > > > >   |
> > > > >max_len
> > > > >
> > > > >We can use one page for every receive buffer. In this way, we can 
> > > > >ensure that
> > > > >all payloads can be independently in a page, which is very beneficial 
> > > > >for
> > > > >the zerocopy implemented by the upper layer.
> > > > >
> > > > >Signed-off-by: Xuan Zhuo 
> > > > >Signed-off-by: Heng Qi 
> > > > >Reviewed-by: Kangjie Xu 
> > > > >---
> > > > >v8:
> > > > > 1. Do not depend on descriptor chain. @Michael S. Tsirkin
> > > > > 2. Add \field{offset} and \field{max_len}.
> > > > > 3. Fix some presentation issues. @Jason Wang
> > > > > 4. Clarify some paragraphs.
> > > > >
> > > > >v7:
> > > > > 1. Fix some presentation issues.
> > > > > 2. Use "split transport header". @Jason Wang
> > > > > 3. Clarify some paragraphs. @Cornelia Huck
> > > > > 4. determine the device what to do if it does not perform header 
> > > > > split on a packet.
> > > > >
> > > > >v6:
> > > > > 1. Fix some syntax issues. @Cornelia Huck
> > > > > 2. Clarify some paragraphs. @Cornelia Huck
> > > > > 3. Determine the device what to do if it does not perform header 
> > > > > split on a packet.
> > > > >
> > > > >v5:
> > > > > 1. Determine when hdr_len is credible in the process of rx
> > > > > 2. Clean up the use of buffers and descriptors
> > > > > 3. Clarify the meaning of used lenght if the first descriptor is 
> > > > > skipped in the case of merge
> > > > >
> > > > >v4:
> > > > > 1. fix typo @Cornelia Huck @Jason Wang
> > > > > 2. do not split header for IP fragmentation packet. @Jason Wang
> > > > >
> > > > >  conformance.tex |  2 ++
> > > > >  content.tex | 91 
> > > > > +
> > > > >  2 files changed, 93 insertions(+)
> > > > >
> > > > >diff --git a/conformance.tex b/conformance.tex
> > > > >index 2b86fc6..4e2b82e 100644
> > > > >--- a/conformance.tex
> > > > >+++ b/conformance.tex
> > > > >@@ -150,6 +150,7 @@ \section{Conformance 
> > > > >Targets}\label{sec:Conformance / Conformance Targets}
> > > > >  \item \ref{drivernormative:Device Types / Network Device / Device 
> > > > > Operation / Control Virtqueue / Offloads State Configuration / 
> > > > > Setting Offloads State}
> > > > >  \item \ref{drivernormative:Device Types / Network Device / Device 
> > > > > Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > > >  \item \ref{drivernormative:Device Types / Network Device / Device 
> > > > > Operation / Control Virtqueue / Notifications Coalescing}
> > > > >+\item \ref{drivernormative:Device Types / Network Device / Device 
> > > > >Operation / Control Virtqueue / Split Transport Header}
> > > > >  \end{itemize}
> > > > >  \conformance{\subsection}{Block Driver 
> > > > > Conformance}\label{sec:Conformance / Driver Conformance / Block 
> > > > > Driver Conformance}
> > > > >@@ -415,6 +416,7 @@ \section{Conformance 
> > > > >Targets}\label{sec:Conformance / Conformance Targets}
> > > > >  \item \ref{devicenormative:Device Types / Network Device / Device 
> > > > > Operation / Control Virtqueue / Automatic receive steering in 
> > > > > multiqueue mode}
> > > > >  \item \ref{devicenormative:Device Types / Network Device / Device 
> > > > > Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS 
> > > > > processing}
> > > > >  \item \ref{devicenormative:Device Types / Network Device / Device 
> > > > > Operation / Control Virtqueue / Notifications Coalescing}
> > > > >+\item \ref{devicenormative:Device Types / Network Device / Device 
> > > > >Operation / Control Virtqueue / Split Transport Header}
> > > > >  \end{itemize}
> > > > >  \conformance{\subsection}{Block Device 
> > > > > Conformance}\label{sec:Conformance / Device Conformance / Block 
> > > > > Device Conformance}
> > > > >diff --git a/content.tex b/content.tex
> > > > >index e863709..fad9dea 100644
> > > > >--- a/content.tex
> > > > >+++ b/content.tex
> > > > >@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types 
> > > > >/ Network Device / Feature bits
> > > > >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > > > >  

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-21 Thread Jason Wang
On Wed, Sep 21, 2022 at 2:20 PM Jason Wang  wrote:
>
> On Tue, Sep 20, 2022 at 11:28 AM Heng Qi  wrote:
> >
> > On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:
> > >
> > > 在 2022/9/16 10:56, hengqi 写道:
> > > >From: Xuan Zhuo 
> > > >
> > > >The purpose of this feature is to split the transport header and the 
> > > >payload
> > > >of the packet.
> > > >
> > > >| receive buffer1(page)| receive 
> > > >buffer2(page) |
> > > >|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|payload  
> > > >  |
> > > >  |<--->|
> > > >   ^
> > > >   |
> > > >max_len
> > > >
> > > >We can use one page for every receive buffer. In this way, we can ensure 
> > > >that
> > > >all payloads can be independently in a page, which is very beneficial for
> > > >the zerocopy implemented by the upper layer.
> > > >
> > > >Signed-off-by: Xuan Zhuo 
> > > >Signed-off-by: Heng Qi 
> > > >Reviewed-by: Kangjie Xu 
> > > >---
> > > >v8:
> > > > 1. Do not depend on descriptor chain. @Michael S. Tsirkin
> > > > 2. Add \field{offset} and \field{max_len}.
> > > > 3. Fix some presentation issues. @Jason Wang
> > > > 4. Clarify some paragraphs.
> > > >
> > > >v7:
> > > > 1. Fix some presentation issues.
> > > > 2. Use "split transport header". @Jason Wang
> > > > 3. Clarify some paragraphs. @Cornelia Huck
> > > > 4. determine the device what to do if it does not perform header 
> > > > split on a packet.
> > > >
> > > >v6:
> > > > 1. Fix some syntax issues. @Cornelia Huck
> > > > 2. Clarify some paragraphs. @Cornelia Huck
> > > > 3. Determine the device what to do if it does not perform header 
> > > > split on a packet.
> > > >
> > > >v5:
> > > > 1. Determine when hdr_len is credible in the process of rx
> > > > 2. Clean up the use of buffers and descriptors
> > > > 3. Clarify the meaning of used lenght if the first descriptor is 
> > > > skipped in the case of merge
> > > >
> > > >v4:
> > > > 1. fix typo @Cornelia Huck @Jason Wang
> > > > 2. do not split header for IP fragmentation packet. @Jason Wang
> > > >
> > > >  conformance.tex |  2 ++
> > > >  content.tex | 91 
> > > > +
> > > >  2 files changed, 93 insertions(+)
> > > >
> > > >diff --git a/conformance.tex b/conformance.tex
> > > >index 2b86fc6..4e2b82e 100644
> > > >--- a/conformance.tex
> > > >+++ b/conformance.tex
> > > >@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance 
> > > >/ Conformance Targets}
> > > >  \item \ref{drivernormative:Device Types / Network Device / Device 
> > > > Operation / Control Virtqueue / Offloads State Configuration / Setting 
> > > > Offloads State}
> > > >  \item \ref{drivernormative:Device Types / Network Device / Device 
> > > > Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > >  \item \ref{drivernormative:Device Types / Network Device / Device 
> > > > Operation / Control Virtqueue / Notifications Coalescing}
> > > >+\item \ref{drivernormative:Device Types / Network Device / Device 
> > > >Operation / Control Virtqueue / Split Transport Header}
> > > >  \end{itemize}
> > > >  \conformance{\subsection}{Block Driver 
> > > > Conformance}\label{sec:Conformance / Driver Conformance / Block Driver 
> > > > Conformance}
> > > >@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance 
> > > >/ Conformance Targets}
> > > >  \item \ref{devicenormative:Device Types / Network Device / Device 
> > > > Operation / Control Virtqueue / Automatic receive steering in 
> > > > multiqueue mode}
> > > >  \item \ref{devicenormative:Device Types / Network Device / Device 
> > > > Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS 
> > > > processing}
> > > >  \item \ref{devicenormative:Device Types / Network Device / Device 
> > > > Operation / Control Virtqueue / Notifications Coalescing}
> > > >+\item \ref{devicenormative:Device Types / Network Device / Device 
> > > >Operation / Control Virtqueue / Split Transport Header}
> > > >  \end{itemize}
> > > >  \conformance{\subsection}{Block Device 
> > > > Conformance}\label{sec:Conformance / Device Conformance / Block Device 
> > > > Conformance}
> > > >diff --git a/content.tex b/content.tex
> > > >index e863709..fad9dea 100644
> > > >--- a/content.tex
> > > >+++ b/content.tex
> > > >@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
> > > >Network Device / Feature bits
> > > >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > > >  channel.
> > > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports 
> > > >splitting
> > > >+the transport header and the payload.
> > > >+
> > > >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications 
> > > > coalescing.
> > > >  

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-21 Thread Jason Wang
On Tue, Sep 20, 2022 at 11:28 AM Heng Qi  wrote:
>
> On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:
> >
> > 在 2022/9/16 10:56, hengqi 写道:
> > >From: Xuan Zhuo 
> > >
> > >The purpose of this feature is to split the transport header and the 
> > >payload
> > >of the packet.
> > >
> > >| receive buffer1(page)| receive 
> > >buffer2(page) |
> > >|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|payload
> > >|
> > >  |<--->|
> > >   ^
> > >   |
> > >max_len
> > >
> > >We can use one page for every receive buffer. In this way, we can ensure 
> > >that
> > >all payloads can be independently in a page, which is very beneficial for
> > >the zerocopy implemented by the upper layer.
> > >
> > >Signed-off-by: Xuan Zhuo 
> > >Signed-off-by: Heng Qi 
> > >Reviewed-by: Kangjie Xu 
> > >---
> > >v8:
> > > 1. Do not depend on descriptor chain. @Michael S. Tsirkin
> > > 2. Add \field{offset} and \field{max_len}.
> > > 3. Fix some presentation issues. @Jason Wang
> > > 4. Clarify some paragraphs.
> > >
> > >v7:
> > > 1. Fix some presentation issues.
> > > 2. Use "split transport header". @Jason Wang
> > > 3. Clarify some paragraphs. @Cornelia Huck
> > > 4. determine the device what to do if it does not perform header 
> > > split on a packet.
> > >
> > >v6:
> > > 1. Fix some syntax issues. @Cornelia Huck
> > > 2. Clarify some paragraphs. @Cornelia Huck
> > > 3. Determine the device what to do if it does not perform header 
> > > split on a packet.
> > >
> > >v5:
> > > 1. Determine when hdr_len is credible in the process of rx
> > > 2. Clean up the use of buffers and descriptors
> > > 3. Clarify the meaning of used lenght if the first descriptor is 
> > > skipped in the case of merge
> > >
> > >v4:
> > > 1. fix typo @Cornelia Huck @Jason Wang
> > > 2. do not split header for IP fragmentation packet. @Jason Wang
> > >
> > >  conformance.tex |  2 ++
> > >  content.tex | 91 
> > > +
> > >  2 files changed, 93 insertions(+)
> > >
> > >diff --git a/conformance.tex b/conformance.tex
> > >index 2b86fc6..4e2b82e 100644
> > >--- a/conformance.tex
> > >+++ b/conformance.tex
> > >@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
> > >Conformance Targets}
> > >  \item \ref{drivernormative:Device Types / Network Device / Device 
> > > Operation / Control Virtqueue / Offloads State Configuration / Setting 
> > > Offloads State}
> > >  \item \ref{drivernormative:Device Types / Network Device / Device 
> > > Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > >  \item \ref{drivernormative:Device Types / Network Device / Device 
> > > Operation / Control Virtqueue / Notifications Coalescing}
> > >+\item \ref{drivernormative:Device Types / Network Device / Device 
> > >Operation / Control Virtqueue / Split Transport Header}
> > >  \end{itemize}
> > >  \conformance{\subsection}{Block Driver 
> > > Conformance}\label{sec:Conformance / Driver Conformance / Block Driver 
> > > Conformance}
> > >@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
> > >Conformance Targets}
> > >  \item \ref{devicenormative:Device Types / Network Device / Device 
> > > Operation / Control Virtqueue / Automatic receive steering in multiqueue 
> > > mode}
> > >  \item \ref{devicenormative:Device Types / Network Device / Device 
> > > Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS 
> > > processing}
> > >  \item \ref{devicenormative:Device Types / Network Device / Device 
> > > Operation / Control Virtqueue / Notifications Coalescing}
> > >+\item \ref{devicenormative:Device Types / Network Device / Device 
> > >Operation / Control Virtqueue / Split Transport Header}
> > >  \end{itemize}
> > >  \conformance{\subsection}{Block Device 
> > > Conformance}\label{sec:Conformance / Device Conformance / Block Device 
> > > Conformance}
> > >diff --git a/content.tex b/content.tex
> > >index e863709..fad9dea 100644
> > >--- a/content.tex
> > >+++ b/content.tex
> > >@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
> > >Network Device / Feature bits
> > >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > >  channel.
> > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> > >+the transport header and the payload.
> > >+
> > >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications 
> > > coalescing.
> > >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > >@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit 
> > >requirements}\label{sec:Device Types / Network Device
> > >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> > >  \item[VIRTIO_NET_F_RSC_EXT] Requires 

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-19 Thread Jason Wang



在 2022/9/16 10:56, hengqi 写道:

From: Xuan Zhuo 

The purpose of this feature is to split the transport header and the payload
of the packet.

| receive buffer1(page)| receive buffer2(page) |
|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|payload|
  |<--->|
   ^
   |
max_len

We can use one page for every receive buffer. In this way, we can ensure that
all payloads can be independently in a page, which is very beneficial for
the zerocopy implemented by the upper layer.

Signed-off-by: Xuan Zhuo 
Signed-off-by: Heng Qi 
Reviewed-by: Kangjie Xu 
---
v8:
1. Do not depend on descriptor chain. @Michael S. Tsirkin
2. Add \field{offset} and \field{max_len}.
3. Fix some presentation issues. @Jason Wang
4. Clarify some paragraphs.

v7:
1. Fix some presentation issues.
2. Use "split transport header". @Jason Wang
3. Clarify some paragraphs. @Cornelia Huck
4. determine the device what to do if it does not perform header split 
on a packet.

v6:
1. Fix some syntax issues. @Cornelia Huck
2. Clarify some paragraphs. @Cornelia Huck
3. Determine the device what to do if it does not perform header split 
on a packet.

v5:
1. Determine when hdr_len is credible in the process of rx
2. Clean up the use of buffers and descriptors
3. Clarify the meaning of used lenght if the first descriptor is 
skipped in the case of merge

v4:
1. fix typo @Cornelia Huck @Jason Wang
2. do not split header for IP fragmentation packet. @Jason Wang

  conformance.tex |  2 ++
  content.tex | 91 +
  2 files changed, 93 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}
  
  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}

@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}
  
  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}

diff --git a/content.tex b/content.tex
index e863709..fad9dea 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
  channel.
  
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting

+the transport header and the payload.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
  
  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.

@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and 
VIRTIO_NET_F_MRG_RXBUF.
  \end{description}
  
  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}

@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
  #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
  #define VIRTIO_NET_HDR_F_DATA_VALID2
  #define VIRTIO_NET_HDR_F_RSC_INFO  4
+#define 

[virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-09-15 Thread hengqi
From: Xuan Zhuo 

The purpose of this feature is to split the transport header and the payload
of the packet.

| receive buffer1(page)| receive buffer2(page) |
|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|payload|
 |<--->|
  ^
  |
   max_len

We can use one page for every receive buffer. In this way, we can ensure that
all payloads can be independently in a page, which is very beneficial for
the zerocopy implemented by the upper layer.

Signed-off-by: Xuan Zhuo 
Signed-off-by: Heng Qi 
Reviewed-by: Kangjie Xu 
---
v8:
1. Do not depend on descriptor chain. @Michael S. Tsirkin
2. Add \field{offset} and \field{max_len}.
3. Fix some presentation issues. @Jason Wang
4. Clarify some paragraphs.

v7:
1. Fix some presentation issues.
2. Use "split transport header". @Jason Wang
3. Clarify some paragraphs. @Cornelia Huck
4. determine the device what to do if it does not perform header split 
on a packet.

v6:
1. Fix some syntax issues. @Cornelia Huck
2. Clarify some paragraphs. @Cornelia Huck
3. Determine the device what to do if it does not perform header split 
on a packet.

v5:
1. Determine when hdr_len is credible in the process of rx
2. Clean up the use of buffers and descriptors
3. Clarify the meaning of used lenght if the first descriptor is 
skipped in the case of merge

v4:
1. fix typo @Cornelia Huck @Jason Wang
2. do not split header for IP fragmentation packet. @Jason Wang

 conformance.tex |  2 ++
 content.tex | 91 +
 2 files changed, 93 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / 
Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / 
Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..fad9dea 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
 channel.
 
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
+the transport header and the payload.
+
 \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
 
 \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
 \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
 \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and 
VIRTIO_NET_F_MRG_RXBUF.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM1
 #define VIRTIO_NET_HDR_F_DATA_VALID2
 #define VIRTIO_NET_HDR_F_RSC_INFO  4
+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
 u8 flags;
 #define VIRTIO_NET_HDR_GSO_NONE