On Wed, 24 Apr 2024 11:50:44 +0800, Jason Wang <jasow...@redhat.com> wrote:
> On Wed, Apr 24, 2024 at 10:58 AM Xuan Zhuo <xuanz...@linux.alibaba.com> wrote:
> >
> > On Wed, 24 Apr 2024 10:45:49 +0800, Jason Wang <jasow...@redhat.com> wrote:
> > > On Wed, Apr 24, 2024 at 10:42 AM Xuan Zhuo <xuanz...@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 24 Apr 2024 10:34:56 +0800, Jason Wang <jasow...@redhat.com> wrote:
> > > > > On Wed, Apr 24, 2024 at 9:10 AM Xuan Zhuo <xuanz...@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Wed, 24 Apr 2024 08:43:21 +0800, Jason Wang <jasow...@redhat.com> wrote:
> > > > > > > On Tue, Apr 23, 2024 at 8:38 PM Xuan Zhuo <xuanz...@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, 23 Apr 2024 12:36:42 +0800, Jason Wang <jasow...@redhat.com> wrote:
> > > > > > > > > On Mon, Apr 22, 2024 at 3:24 PM Xuan Zhuo <xuanz...@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > In big mode, pre-mapping DMA is beneficial because if the pages are not
> > > > > > > > > > used, we can reuse them without needing to unmap and remap.
> > > > > > > > > >
> > > > > > > > > > We require space to store the DMA address. I use the page.dma_addr to
> > > > > > > > > > store the DMA address from the pp structure inside the page.
> > > > > > > > > >
> > > > > > > > > > Every page retrieved from get_a_page() is mapped, and its DMA address is
> > > > > > > > > > stored in page.dma_addr. When a page is returned to the chain, we check
> > > > > > > > > > the DMA status; if it is not mapped (potentially having been unmapped),
> > > > > > > > > > we remap it before returning it to the chain.
> > > > > > > > > >
> > > > > > > > > > Based on the following points, we do not use page pool to manage these
> > > > > > > > > > pages:
> > > > > > > > > >
> > > > > > > > > > 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
> > > > > > > > > >    we can only prevent the page pool from performing DMA operations, and
> > > > > > > > > >    let the driver perform DMA operations on the allocated pages.
> > > > > > > > > > 2. But when the page pool releases the page, we have no chance to
> > > > > > > > > >    execute dma unmap.
> > > > > > > > > > 3. A solution to #2 is to execute dma unmap every time before putting
> > > > > > > > > >    the page back to the page pool. (This is actually a waste, we don't
> > > > > > > > > >    execute unmap so frequently.)
> > > > > > > > > > 4. But there is another problem, we still need to use page.dma_addr to
> > > > > > > > > >    save the dma address. Using page.dma_addr while using page pool is
> > > > > > > > > >    unsafe behavior.
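To make the remap-on-return step described above easier to follow, here is a
minimal sketch (not the code from this series). It assumes the
page_chain_set_dma()/page_chain_get_dma() helpers from the patch (the getter is
not visible in the quoted hunk), and it assumes the unmap path stores
DMA_MAPPING_ERROR as the "currently unmapped" marker; virtqueue_dma_map_single_attrs()
and virtqueue_dma_mapping_error() are the existing virtio core premapping wrappers.

static int page_chain_map_if_needed(struct receive_queue *rq, struct page *p)
{
        dma_addr_t addr;

        /* still mapped from a previous round: reuse the mapping as-is */
        if (page_chain_get_dma(p) != DMA_MAPPING_ERROR)
                return 0;

        /* map through the virtio core so the right DMA device/API is used */
        addr = virtqueue_dma_map_single_attrs(rq->vq, page_address(p),
                                              PAGE_SIZE, DMA_FROM_DEVICE, 0);
        if (virtqueue_dma_mapping_error(rq->vq, addr))
                return -ENOMEM;

        page_chain_set_dma(p, addr);
        return 0;
}

A caller would invoke this right before linking the page back into rq->pages,
so an unused page keeps its old mapping and never goes through unmap/remap.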
> > > > > > > > > >
> > > > > > > > > > More:
> > > > > > > > > >
> > > > > > > > > >     https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9n7hqyi+m...@mail.gmail.com/
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanz...@linux.alibaba.com>
> > > > > > > > > > ---
> > > > > > > > > >  drivers/net/virtio_net.c | 123 ++++++++++++++++++++++++++++++++++-----
> > > > > > > > > >  1 file changed, 108 insertions(+), 15 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > index 2c7a67ad4789..d4f5e65b247e 100644
> > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > @@ -439,6 +439,81 @@ skb_vnet_common_hdr(struct sk_buff *skb)
> > > > > > > > > >         return (struct virtio_net_common_hdr *)skb->cb;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > > > > > > +{
> > > > > > > > > > +       sg->dma_address = addr;
> > > > > > > > > > +       sg->length = len;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +/* For pages submitted to the ring, we need to record its dma for unmap.
> > > > > > > > > > + * Here, we use the page.dma_addr and page.pp_magic to store the dma
> > > > > > > > > > + * address.
> > > > > > > > > > + */
> > > > > > > > > > +static void page_chain_set_dma(struct page *p, dma_addr_t addr)
> > > > > > > > > > +{
> > > > > > > > > > +       if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
> > > > > > > > >
> > > > > > > > > Need a macro like PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA.
> > > > > > > > >
> > > > > > > > > > +               p->dma_addr = lower_32_bits(addr);
> > > > > > > > > > +               p->pp_magic = upper_32_bits(addr);
> > > > > > > > >
> > > > > > > > > And this uses three fields on page_pool which I'm not sure the other
> > > > > > > > > maintainers are happy with. For example, re-using pp_magic might be
> > > > > > > > > dangerous. See c07aea3ef4d40 ("mm: add a signature in struct page").
> > > > > > > > >
> > > > > > > > > I think a safer way is to reuse page pool, for example introducing
> > > > > > > > > a new flag with dma callbacks?
> > > > > > > >
> > > > > > > > If we use page pool, how can we chain the pages allocated for a
> > > > > > > > packet?
> > > > > > >
> > > > > > > I'm not sure I get this, it is chained via the descriptor flag.
> > > > > > >
> > > > > >
> > > > > > In the big mode, we will commit many pages to the virtio core by
> > > > > > virtqueue_add_inbuf().
> > > > > >
> > > > > > By virtqueue_get_buf_ctx(), we got the data. That is the first page.
> > > > > > Other pages are chained by the "private".
> > > > > >
> > > > > > If we use the page pool, how can we chain the pages?
> > > > > > After virtqueue_add_inbuf(), we need to get the pages to fill the skb.
> > > > >
> > > > > Right, technically it could be solved by providing helpers in the
> > > > > virtio core, but considering it's an optimization for big mode which
> > > > > is not popular, it's not worth the bother.
> > > > >
> > > > > >
> > > > > > You know the "private" can not be used.
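Since the quoted hunk is cut off right after the 32-bit store, here is how the
store/load pair presumably fits together. The else branch and the getter are
reconstructed from the description above for illustration (they are not the
submitted code); the double 16-bit shift copies the trick page_pool itself uses
to avoid a shift-count warning when dma_addr_t is only 32 bits wide.

static void page_chain_set_dma(struct page *p, dma_addr_t addr)
{
        if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
                /* 32-bit arch with 64-bit DMA: split across two words */
                p->dma_addr = lower_32_bits(addr);
                p->pp_magic = upper_32_bits(addr);
        } else {
                p->dma_addr = addr;
        }
}

static dma_addr_t page_chain_get_dma(struct page *p)
{
        if (sizeof(dma_addr_t) > sizeof(unsigned long))
                return ((dma_addr_t)p->pp_magic << 16 << 16) | p->dma_addr;

        return (dma_addr_t)p->dma_addr;
}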
> > > > > > > >
> > > > > > > > If the pp struct inside the page is not safe, how about:
> > > > > > > >
> > > > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > > > >                         /**
> > > > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > > > >                          * by the page owner.
> > > > > > > >                          */
> > > > > > > >                         union {
> > > > > > > >                                 struct list_head lru;
> > > > > > > >
> > > > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > > > >                                 struct {
> > > > > > > >                                         /* Always even, to negate PageTail */
> > > > > > > >                                         void *__filler;
> > > > > > > >                                         /* Count page's or folio's mlocks */
> > > > > > > >                                         unsigned int mlock_count;
> > > > > > > >                                 };
> > > > > > > >
> > > > > > > >                                 /* Or, free page */
> > > > > > > >                                 struct list_head buddy_list;
> > > > > > > >                                 struct list_head pcp_list;
> > > > > > > >                         };
> > > > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > > >                         struct address_space *mapping;
> > > > > > > >                         union {
> > > > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > > > >                         };
> > > > > > > >                         /**
> > > > > > > >                          * @private: Mapping-private opaque data.
> > > > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > > > >                          */
> > > > > > > >                         unsigned long private;
> > > > > > > >                 };
> > > > > > > >
> > > > > > > > Or, we can map the private space of the page as a new structure.
> > > > > > >
> > > > > > > It could be a way. But such allocation might be huge if we are using
> > > > > > > indirect descriptors or I may miss something.
> > > > > >
> > > > > > No. We only need to store the "chain next" and the dma as this patch set did.
> > > > > > The size of the private space inside the page is 20(32bit)/40(64bit) bytes.
> > > > > > That is enough for us.
> > > > > >
> > > > > > If you worry about the change of the pp structure, we can use the "private" as
> > > > > > origin and use the "struct list_head lru" to store the dma.
> > > > >
> > > > > This looks even worse, as it uses fields belonging to different
> > > > > structures in the union.
> > > >
> > > > I mean we do not use the fields from the pp structure inside the page,
> > > > if we worry about the change of the pp structure.
> > > >
> > > > I mean use the "private" and "lru"; these are in the same structure.
> > > >
> > > > I think this is a good way.
> > > >
> > > > Thanks.
> > >
> > > See this:
> > >
> > > https://lore.kernel.org/netdev/20210411114307.5087f958@carbon/
> >
> > I think that is because the page pool will share the page with
> > the skbs. I'm not entirely sure.
> >
> > In our case, virtio-net fully owns the page. After the page is referenced by skb,
> > virtio-net no longer references the page. I don't think there is any problem
> > here.
>
> Well, in the rx path, though the page is allocated by the virtio-net,
> unlike the page pool, those pages are not freed by virtio-net. So it
> may leave things in the page structure which is problematic. I don't
> think we can introduce a virtio-net specific hook for kfree_skb() in
> this case. That's why I think leveraging the page pool is better.
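To make "map the private space of the page as a new structure" concrete, here
is a hedged sketch; the structure and helper names are invented for
illustration, not taken from the series. The usable span is the "Page cache
and anonymous pages" members from @lru through @private quoted above, i.e. the
20 (32-bit) / 40 (64-bit) bytes mentioned, and overlaying it is only safe while
virtio-net exclusively owns the page.

struct virtnet_page_chain {
        struct page *next;      /* chain link, replaces the bare @private use */
        dma_addr_t dma;         /* premapped DMA address of this page */
};

static struct virtnet_page_chain *page_chain(struct page *p)
{
        /* must fit in the span from @lru up to and including @private */
        BUILD_BUG_ON(sizeof(struct virtnet_page_chain) >
                     offsetof(struct page, private) + sizeof(unsigned long) -
                     offsetof(struct page, lru));
        return (struct virtnet_page_chain *)&p->lru;
}

A user would then write page_chain(p)->next / page_chain(p)->dma instead of
open-coding page->private, which keeps all the reused words inside one variant
of the union.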
> For reusing the page pool: maybe we can reuse __pp_mapping_pad for
> virtio-net specific use cases like chaining, and clear it in
> page_pool_clear_pp_info(). And we need to make sure we don't break
> things like TCP RX zerocopy, since mapping is aliased with
> __pp_mapping_pad at first glance.
>
> >
> > The key is: whoever owns the page can use the page private space
> > (20/40 bytes).
> >
> > Is that right?
>
> I'm not saying we can't investigate in this direction. But it needs
> more comments from mm guys and we need to evaluate the price we pay
> for that.
>
> The motivation is to drop the fallback code when pre-mapping is not
> supported, to improve the maintainability of the code and ease the
> AF_XDP support for virtio-net. But it turns out to be not easy.
>
> Considering the rx fallback code we need to maintain is not too huge,
> maybe we can leave it as is, for example forbid AF_XDP in big mode.
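Purely as an illustration of this direction (not an existing page pool API):
the pad is spelled _pp_mapping_pad in current mm_types.h, and a driver-side
chaining helper plus the matching clear that page_pool_clear_pp_info() would
need might look roughly like the sketch below. Because the pad aliases
page->mapping, it has to be zeroed again before anything like TCP RX zerocopy
can look at the page.

/* Hypothetical helpers, for illustration only. */
static inline void page_pool_set_chain(struct page *page, struct page *next)
{
        /* aliases page->mapping in the other union variant */
        page->_pp_mapping_pad = (unsigned long)next;
}

static inline struct page *page_pool_get_chain(struct page *page)
{
        return (struct page *)page->_pp_mapping_pad;
}

/* ...and page_pool_clear_pp_info() in net/core/page_pool.c would then also
 * have to do:
 *
 *         page->_pp_mapping_pad = 0;
 */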
I see. Thanks.

>
> Thanks
> >
> > Thanks.
> > > >
> > > > Thanks
> > > >