On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanz...@linux.alibaba.com> wrote:
>
> On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasow...@redhat.com> wrote:
> > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanz...@linux.alibaba.com> wrote:
> > >
> > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasow...@redhat.com> wrote:
> > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanz...@linux.alibaba.com> wrote:
> > > > >
> > > > > Now, we chain the pages of big mode by the page's private variable.
> > > > > But a subsequent patch aims to make the big mode to support
> > > > > premapped mode. This requires additional space to store the dma addr.
> > > > >
> > > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > > variable for storing the DMA addr.
> > > > >
> > > > >         struct {        /* Page cache and anonymous pages */
> > > > >                 /**
> > > > >                  * @lru: Pageout list, eg. active_list protected by
> > > > >                  * lruvec->lru_lock. Sometimes used as a generic list
> > > > >                  * by the page owner.
> > > > >                  */
> > > > >                 union {
> > > > >                         struct list_head lru;
> > > > >
> > > > >                         /* Or, for the Unevictable "LRU list" slot */
> > > > >                         struct {
> > > > >                                 /* Always even, to negate PageTail */
> > > > >                                 void *__filler;
> > > > >                                 /* Count page's or folio's mlocks */
> > > > >                                 unsigned int mlock_count;
> > > > >                         };
> > > > >
> > > > >                         /* Or, free page */
> > > > >                         struct list_head buddy_list;
> > > > >                         struct list_head pcp_list;
> > > > >                 };
> > > > >                 /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > >                 struct address_space *mapping;
> > > > >                 union {
> > > > >                         pgoff_t index;          /* Our offset within mapping. */
> > > > >                         unsigned long share;    /* share count for fsdax */
> > > > >                 };
> > > > >                 /**
> > > > >                  * @private: Mapping-private opaque data.
> > > > >                  * Usually used for buffer_heads if PagePrivate.
> > > > >                  * Used for swp_entry_t if PageSwapCache.
> > > > >                  * Indicates order in the buddy system if PageBuddy.
> > > > >                  */
> > > > >                 unsigned long private;
> > > > >         };
> > > > >
> > > > > But within the page pool struct, we have a variable called
> > > > > dma_addr that is appropriate for storing dma addr.
> > > > > And that struct is used by netstack. That works to our advantage.
> > > > >
> > > > >         struct {        /* page_pool used by netstack */
> > > > >                 /**
> > > > >                  * @pp_magic: magic value to avoid recycling non
> > > > >                  * page_pool allocated pages.
> > > > >                  */
> > > > >                 unsigned long pp_magic;
> > > > >                 struct page_pool *pp;
> > > > >                 unsigned long _pp_mapping_pad;
> > > > >                 unsigned long dma_addr;
> > > > >                 atomic_long_t pp_ref_count;
> > > > >         };
> > > > >
> > > > > On the other side, we should use variables from the same sub-struct.
> > > > > So this patch replaces the "private" with "pp".
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanz...@linux.alibaba.com>
> > > > > ---
> > > >
> > > > Instead of doing a customized version of page pool, can we simply
> > > > switch to use page pool for big mode instead? Then we don't need to
> > > > bother the dma stuffs.
> > >
> > >
> > > The page pool needs to do the dma by the DMA APIs.
> > > So we can not use the page pool directly.
> >
> > I found this:
> >
> > #define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA
> >                                 * map/unmap
> >
> > It seems to work here?
>
>
> I have studied the page pool mechanism and believe that we cannot use it
> directly. We can make the page pool to bypass the DMA operations.
> This allows us to handle DMA within virtio-net for pages allocated from the
> page pool. Furthermore, we can utilize page pool helpers to associate the DMA
> address to the page.
>
> However, the critical issue pertains to unmapping. Ideally, we want to return
> the mapped pages to the page pool and reuse them. In doing so, we can omit the
> unmapping and remapping steps.
>
> Currently, there's a caveat: when the page pool cache is full, it disconnects
> and releases the pages. When the pool hits its capacity, pages are
> relinquished without a chance for unmapping.

Technically, when the ptr_ring is full there could be a fallback, but it
would require expensive synchronization between producer and consumer.
For virtio-net that might not be a problem, because add/get is already
synchronized. (This might be relaxed in the future; we have already seen
a requirement for that in the past for virtio-blk.)

> If we were to unmap pages each time before
> returning them to the pool, we would negate the benefits of bypassing the
> mapping and unmapping process altogether.

Yes, but the problem with this approach is that it creates a corner
exception where dma_addr is used outside the page pool.

Maybe for big mode it doesn't matter too much if there's no performance
improvement.

Thanks

>
> Thanks.
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
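
To make the full-cache case above concrete, here is a small user-space
sketch. It is only a toy model and not the kernel's page_pool or ptr_ring
code: toy_page, toy_pool, TOY_RING_SIZE, toy_driver_unmap() and the
on_release hook are all hypothetical names made up for illustration. It
assumes the driver owns the mapping and stores the address in the page,
and it shows what happens when the pool drops a page because its cache is
full, with and without a fallback that gives the driver a chance to unmap.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 2   /* hypothetical cache capacity */

/* Toy stand-in for struct page: only the field the driver would fill in. */
struct toy_page {
	unsigned long dma_addr;   /* nonzero while the driver's mapping is live */
};

struct toy_pool {
	struct toy_page *ring[TOY_RING_SIZE];
	size_t count;
	unsigned int leaked;      /* pages dropped while still mapped */
	/* Hypothetical hook: lets the owner unmap before the page is dropped. */
	void (*on_release)(struct toy_page *page);
};

/* Driver-side unmap, modelled as clearing the stored address. */
static void toy_driver_unmap(struct toy_page *page)
{
	page->dma_addr = 0;
}

/* Return a page to the pool. Returns 1 if it was cached, 0 if dropped. */
static int toy_pool_put(struct toy_pool *pool, struct toy_page *page)
{
	assert(pool != NULL && page != NULL);

	if (pool->count < TOY_RING_SIZE) {
		/* Cached: the mapping stays live and can be reused as-is. */
		pool->ring[pool->count++] = page;
		return 1;
	}

	/* Cache full: the page leaves the pool. */
	if (pool->on_release)
		pool->on_release(page);
	if (page->dma_addr != 0)
		pool->leaked++;   /* nobody unmapped it */
	return 0;
}
```

With on_release left NULL, the third page put into a two-slot pool is
dropped with its mapping still live, which is the caveat described in the
quoted text. Setting on_release to toy_driver_unmap models the fallback;
in the real code that path is where the producer/consumer synchronization
cost would come in, which this sketch does not model.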