Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-30 Thread Christoph Hellwig
On Thu, Apr 20, 2023 at 06:42:17PM +0200, Alexander Lobakin wrote:
> When there's no recycling of pages, then yes. And since recycling is
> done asynchronously, sometimes new allocations happen either way.
> Anyway, that was roughly a couple of years ago, right when you introduced
> dma_alloc_noncoherent(). Things might've changed since then.
> I could try again while next is closed (i.e. starting this Sunday), the
> only thing I'd like to mention: Page Pool allocates pages via
> alloc_pages_bulk_array_node(). Bulking helps a lot (and PP uses bulks of
> 16 IIRC), explicit node setting helps when Rx queues are distributed
> between several nodes. We can then have one struct device for several nodes.
> As I can see, there's now no function to allocate in bulks and no
> explicit node setting option (e.g. mlx5 works around this using
> set_dev_node() + allocate + set_dev_node(orig_node)). Could such options
> be added in near future? That would help a lot switching to the
> functions intended for use when DMA mappings can stay for a long time.
> From what I see from the code, that shouldn't be a problem (except for
> non-direct DMA cases, where we'd need to introduce new callbacks or
> extend the existing ones).

So the node hint is something we can trivially pass through, and
something the mlx5 maintainers should have done from the beginning
instead of this nasty hack.  Patches gladly accepted.

An alloc_pages_bulk_array_node-like allocator also seems doable, we
just need to make sure it has a decent fallback as I don't think
we can wire it up to all the crazy legacy iommu drivers.
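The mlx5 workaround mentioned above can be sketched roughly as follows. This is a hedged illustration, not the actual mlx5 code: the function name is hypothetical, but set_dev_node(), dev_to_node() and dma_alloc_noncoherent() are the real kernel APIs being discussed:

```c
/* Sketch of the mlx5-style workaround: temporarily override the device's
 * NUMA node so the allocation lands on the node the Rx queue lives on,
 * then restore the original node. A dedicated node-hint parameter in the
 * DMA API would make this hack unnecessary. */
static void *alloc_rx_buf_on_node(struct device *dev, size_t size,
				  dma_addr_t *dma, int node)
{
	int orig_node = dev_to_node(dev);
	void *buf;

	set_dev_node(dev, node);
	buf = dma_alloc_noncoherent(dev, size, dma, DMA_FROM_DEVICE,
				    GFP_KERNEL);
	set_dev_node(dev, orig_node);

	return buf;
}
```

A node-aware bulk variant, as requested in the thread, would amount to doing this once per batch of 16 pages rather than per page.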
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-30 Thread Christoph Hellwig
On Tue, Apr 25, 2023 at 04:12:05AM -0400, Michael S. Tsirkin wrote:
> In theory, absolutely. In practice modern virtio devices are ok,
> the reason we are stuck supporting old legacy ones is because legacy
> devices are needed to run old guests. And then people turn
> around and run a new guest on the same device,
> for example because they switch back and forth e.g.
> for data recovery? Or because whoever is selling the
> host wants to opt for maximum compatibility.
> 
> Teaching all of linux to sometimes use dma and sometimes not
> is a lot of work, and for limited benefit of these legacy systems.

It's not like virtio is the only case where blindly assuming
you can use DMA operations in a higher layer is the problem...


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-25 Thread Michael S. Tsirkin
On Thu, Apr 20, 2023 at 09:18:21AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 20, 2023 at 05:11:48PM +0800, Xuan Zhuo wrote:
> > I know that the current design of DMA API only supports some physical 
> > hardware,
> > but can it be modified or expanded?
> 
> I think the important point is that for some cases there is no need
> to dma map at all, and upper layers should be fine with that by just
> doing the dma mapping in helpers called by the driver.
> 
> The virtio drivers then check if platform_access is set, then call the
> generic dma mapping helper, or if not just allocate memory using
> alloc_pages and also skip all the sync calls.

In theory, absolutely. In practice modern virtio devices are ok,
the reason we are stuck supporting old legacy ones is because legacy
devices are needed to run old guests. And then people turn
around and run a new guest on the same device,
for example because they switch back and forth e.g.
for data recovery? Or because whoever is selling the
host wants to opt for maximum compatibility.

Teaching all of linux to sometimes use dma and sometimes not
is a lot of work, and for limited benefit of these legacy systems.
We do it in a limited number of cases but generally
making DMA itself DTRT sounds more attractive.

So special DMA ops for these makes some sense: yes the
firmware described DMA is wrong on these boxes but
buggy firmware is not so unusual, is it?
Given virtio devices actually are on a virtual bus (the virtio bus)
sticking the fake DMA ops on this bus seems to make sense
as a way to express this quirk.

No?
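What MST proposes here could be sketched like this. This is a hypothetical illustration of the quirk mechanism, not code from any patch; virtio_has_feature() and set_dma_ops() are real kernel interfaces, but the ops table and helper are made up:

```c
/* Hypothetical sketch: install "fake" identity DMA ops on legacy virtio
 * devices at the virtio-bus level, so generic code can call the DMA API
 * unconditionally while legacy devices keep using bare physical
 * addresses and skip all syncs. */
static const struct dma_map_ops virtio_legacy_dma_ops = {
	/* .map_page etc. would return the physical address unchanged
	 * and make all sync operations no-ops. */
};

static void virtio_setup_dma_quirk(struct virtio_device *vdev)
{
	if (!virtio_has_feature(vdev, VIRTIO_F_ACCESS_PLATFORM))
		set_dma_ops(&vdev->dev, &virtio_legacy_dma_ops);
}
```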

-- 
MST



Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-24 Thread Xuan Zhuo
On Mon, 24 Apr 2023 17:28:01 +0200, Alexander Lobakin 
 wrote:
> From: Jakub Kicinski 
> Date: Fri, 21 Apr 2023 06:50:59 -0700
>
> > On Fri, 21 Apr 2023 15:31:04 +0800 Xuan Zhuo wrote:
> >> I am not particularly familiar with dma-bufs. I want to know if this 
> >> mechanism
> >> can solve the problem of virtio-net.
> >>
> >> I saw this framework, allowing the driver do something inside the ops of
> >> dma-bufs.
> >>
> >> If so, is it possible to propose a new patch based on dma-bufs?
> >
> > I haven't looked in detail, maybe Olek has? AFAIU you'd need to rework
>
> Oh no, not me. I suck at dma-bufs, tried to understand them several
> times with no progress :D My knowledge is limited to "ok, if it's
> DMA + userspace, then it's likely dma-buf" :smile_with_tear:
>
> > uAPI of XSK to allow user to pass in a dma-buf region rather than just
> > a user VA. So it may be a larger effort but architecturally it may be
> > the right solution.
> >
>
> I'm curious whether this could be done without tons of work. Switching
> Page Pool to dma_alloc_noncoherent() is simpler :D But, as I wrote
> above, we need to extend DMA API first to provide bulk allocations and
> NUMA-aware allocations.
> Can't we provide a shim for back-compat, i.e. if a program passes just a
> user VA, create a dma-buf in the kernel already?


Yes, I think so too. If that's the case, the workload may be much smaller. Let
me try it.

Thanks.


>
> Thanks,
> Olek


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-22 Thread Xuan Zhuo
On Fri, 21 Apr 2023 06:50:59 -0700, Jakub Kicinski  wrote:
> On Fri, 21 Apr 2023 15:31:04 +0800 Xuan Zhuo wrote:
> > I am not particularly familiar with dma-bufs. I want to know if this 
> > mechanism
> > can solve the problem of virtio-net.
> >
> > I saw this framework, allowing the driver do something inside the ops of
> > dma-bufs.
> >
> > If so, is it possible to propose a new patch based on dma-bufs?
>
> I haven't looked in detail, maybe Olek has? AFAIU you'd need to rework
> uAPI of XSK to allow user to pass in a dma-buf region rather than just
> a user VA.

This seems to be a big job. Could this patch be accepted first?

Thanks.

> So it may be a larger effort but architecturally it may be
> the right solution.


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-21 Thread Xuan Zhuo
On Thu, 20 Apr 2023 07:13:49 -0700, Jakub Kicinski  wrote:
> On Wed, 19 Apr 2023 23:19:22 -0700 Christoph Hellwig wrote:
> > > In this case yes, pinned user memory, it gets sliced up into MTU sized
> > > chunks, fed into an Rx queue of a device, and user can see packets
> > > without any copies.
> >
> > How long is the life time of these mappings?  Because dma_map_*
> > assumes a temporary mapping and not one that is pinned basically
> > forever.
>
> Yeah, this one is "for ever".
>
> > > Quite similar use case #2 is upcoming io_uring / "direct placement"
> > > patches (former from Meta, latter for Google) which will try to receive
> > > just the TCP data into pinned user memory.
> >
> > I don't think we can just long term pin user memory here.  E.g. for
> > confidential computing cases we can't even ever do DMA straight to
> > userspace.  I had that conversation with Meta's block folks who
> > want to do something similar with io_uring and the only option is
> > an allocator for memory that is known DMAable, e.g. through dma-bufs.
> >
> > You guys really all need to get together and come up with a scheme
> > that actually works instead of piling these hacks over hacks.
>
> Okay, that simplifies various aspects. We'll just used dma-bufs from
> the start in the new APIs.


I am not particularly familiar with dma-bufs. I want to know if this mechanism
can solve the problem of virtio-net.

I saw this framework, allowing the driver do something inside the ops of
dma-bufs.

If so, is it possible to propose a new patch based on dma-bufs?

Thanks.






Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-20 Thread Christoph Hellwig
On Thu, Apr 20, 2023 at 05:11:48PM +0800, Xuan Zhuo wrote:
> I know that the current design of DMA API only supports some physical 
> hardware,
> but can it be modified or expanded?

I think the important point is that for some cases there is no need
to dma map at all, and upper layers should be fine with that by just
doing the dma mapping in helpers called by the driver.

The virtio drivers then check if platform_access is set, then call the
generic dma mapping helper, or if not just allocate memory using
alloc_pages and also skip all the sync calls.


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-20 Thread Christoph Hellwig
On Thu, Apr 20, 2023 at 03:59:39PM +0200, Alexander Lobakin wrote:
> Hmm, currently almost all Ethernet drivers map Rx pages once and then
> just recycle them, keeping the original DMA mapping. Which means pages
> can have the same first mapping for very long time, often even for the
> lifetime of the struct device. Same for XDP sockets, the lifetime of DMA
> mappings equals the lifetime of sockets.
> Does it mean we'd better review that approach and try switching to
> dma_alloc_*() family (non-coherent/caching in our case)?

Yes, exactly.  dma_alloc_noncoherent can be used exactly as alloc_pages
+ dma_map_* by the driver (including the dma_sync_* calls on reuse), but
has a huge number of advantages.
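The substitution Christoph suggests could look roughly like this on the driver side. A hedged sketch only, not the actual Page Pool or XSK implementation; the helper names are made up, but the dma_alloc_noncoherent()/dma_sync_single_*()/dma_free_noncoherent() calls are the real API:

```c
/* Allocate the long-lived Rx buffer with dma_alloc_noncoherent()
 * instead of alloc_pages() + dma_map_page(); one allocation + mapping
 * for the buffer's whole lifetime. */
static void *rx_buf_alloc(struct device *dev, dma_addr_t *dma)
{
	return dma_alloc_noncoherent(dev, PAGE_SIZE, dma, DMA_FROM_DEVICE,
				     GFP_KERNEL);
}

static void rx_buf_recycle(struct device *dev, dma_addr_t dma)
{
	/* On each recycle, before handing the buffer back to the device. */
	dma_sync_single_for_device(dev, dma, PAGE_SIZE, DMA_FROM_DEVICE);
}

static void rx_buf_complete(struct device *dev, dma_addr_t dma)
{
	/* After the device wrote a frame, before the CPU reads it. */
	dma_sync_single_for_cpu(dev, dma, PAGE_SIZE, DMA_FROM_DEVICE);
}

static void rx_buf_free(struct device *dev, void *buf, dma_addr_t dma)
{
	/* Unmap + free once, at queue/device teardown. */
	dma_free_noncoherent(dev, PAGE_SIZE, buf, dma, DMA_FROM_DEVICE);
}
```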

> Also, I remember I tried to do that for one my driver, but the thing
> that all those functions zero the whole page(s) before returning them to
> the driver ruins the performance -- we don't need to zero buffers for
> receiving packets and spend a ton of cycles on it (esp. in cases when 4k
> gets zeroed each time, but your main body of traffic is 64-byte frames).

Hmm, the single zeroing when doing the initial allocation shows up
in these profiles?


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-20 Thread Xuan Zhuo
On Wed, 19 Apr 2023 23:19:22 -0700, Christoph Hellwig  
wrote:
> On Wed, Apr 19, 2023 at 09:45:06AM -0700, Jakub Kicinski wrote:
> > > Can you explain what the actual use case is?
> > >
> > > From the original patchset I suspect it is dma mapping something very
> > > long term and then maybe doing syncs on it as needed?
> >
> > In this case yes, pinned user memory, it gets sliced up into MTU sized
> > chunks, fed into an Rx queue of a device, and user can see packets
> > without any copies.
>
> How long is the life time of these mappings?  Because dma_map_*
> assumes a temporary mapping and not one that is pinned basically
> forever.
>
> > Quite similar use case #2 is upcoming io_uring / "direct placement"
> > patches (former from Meta, latter for Google) which will try to receive
> > just the TCP data into pinned user memory.
>
> I don't think we can just long term pin user memory here.  E.g. for
> confidential computing cases we can't even ever do DMA straight to
> userspace.  I had that conversation with Meta's block folks who
> > want to do something similar with io_uring and the only option is
> > an allocator for memory that is known DMAable, e.g. through dma-bufs.
>
> You guys really all need to get together and come up with a scheme
> that actually works instead of piling these hacks over hacks.

I think the cases Jakub mentioned are new requirements. From an implementation
perspective, compared to the patch I submitted, expanding the DMA API to be
compatible with some special hardware would be a good solution.

I know that the current design of DMA API only supports some physical hardware,
but can it be modified or expanded?

Still the previous idea: can we add a new ops (not dma_ops) to struct device,
configured by the driver, so that the DMA API can be compatible with some
special hardware?


Thanks.






Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-19 Thread Christoph Hellwig
On Wed, Apr 19, 2023 at 09:45:06AM -0700, Jakub Kicinski wrote:
> > Can you explain what the actual use case is?
> > 
> > From the original patchset I suspect it is dma mapping something very
> > long term and then maybe doing syncs on it as needed?
> 
> In this case yes, pinned user memory, it gets sliced up into MTU sized
> chunks, fed into an Rx queue of a device, and user can see packets
> without any copies.

How long is the life time of these mappings?  Because dma_map_*
assumes a temporary mapping and not one that is pinned basically
forever.

> Quite similar use case #2 is upcoming io_uring / "direct placement"
> patches (former from Meta, latter for Google) which will try to receive
> just the TCP data into pinned user memory.

I don't think we can just long term pin user memory here.  E.g. for
confidential computing cases we can't even ever do DMA straight to
userspace.  I had that conversation with Meta's block folks who
want to do something similar with io_uring and the only option is
an allocator for memory that is known DMAable, e.g. through dma-bufs.

You guys really all need to get together and come up with a scheme
that actually works instead of piling these hacks over hacks.


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-19 Thread Christoph Hellwig
On Wed, Apr 19, 2023 at 03:14:48PM +0200, Alexander Lobakin wrote:
> >>> dma addresses and thus dma mappings are completely driver specific.
> >>> Upper layers have no business looking at them.
> 
> Here it's not an "upper layer". XSk core doesn't look at them or pass
> them between several drivers.

Same for upper layers :)  They just do abstract operations that can sit
on top of a variety of drivers.

> It maps DMA solely via the struct device
> passed from the driver and then just gets-sets addresses for this driver
> only. Just like Page Pool does for regular Rx buffers. This got moved to
> the XSk core to not repeat the same code pattern in each driver.

Which assumes that:

 a) a DMA mapping needs to be done at all
 b) it can be done using a struct device exposed to it
 c) that DMA mapping is actually at the same granularity that it
operates on

all of which might not be true.

> > From the original patchset I suspect it is dma mapping something very
> > long term and then maybe doing syncs on it as needed?
> 
> As I mentioned, XSk provides some handy wrappers to map DMA for drivers.
> Previously, XSk was supported by real hardware drivers only, but here
> the developer tries to add support to virtio-net. I suspect he needs to
> use DMA mapping functions different from which the regular driver use.

Yes. For actual hardware virtio and some more complex virtualized
setups it works just like real hardware.  For legacy virtio there is
no DMA mapping involved at all.  Because of that, all DMA mapping needs
to be done inside of virtio.

> So this is far from dma_map_ops, the author picked wrong name :D
> And correct, for XSk we map one big piece of memory only once and then
> reuse it for buffers, no inflight map/unmap on hotpath (only syncs when
> needed). So this mapping is longterm and is stored in XSk core structure
> assigned to the driver which this mapping was done for.
> I think Jakub thinks of something similar, but for the "regular" Rx/Tx,
> not only XDP sockets :)

FYI, dma_map_* is not intended for long term mappings; it can lead
to starvation issues.  You need to use dma_alloc_* instead.  And
"you" in that case is, as I said, the driver, not an upper layer.
If it's just helper called by drivers and never from core code,
that's of course fine.


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-19 Thread Christoph Hellwig
On Wed, Apr 19, 2023 at 03:22:39PM +0200, Alexander Lobakin wrote:
> If DMA syncs are not needed on your x86_64 DMA-coherent system, it
> doesn't mean we all don't need it.

If the DMA isn't actually a DMA (as in the virtio case, or other
cases that instead have to do their own dma mapping at much lower
layers) syncs generally don't make sense.

> Instead of filling pointers with
> "default" callbacks, you could instead avoid indirect calls at all when
> no custom DMA ops are specified. Pls see how for example Christoph did
> that for direct DMA. It would cost only one if-else for case without
> custom DMA ops here instead of an indirect call each time.

So yes, I think the abstraction here should not be another layer of
DMA ops, but the option to DMA map or not at all.


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-19 Thread Xuan Zhuo
On Wed, 19 Apr 2023 15:22:39 +0200, Alexander Lobakin 
 wrote:
> From: Xuan Zhuo 
> Date: Mon, 17 Apr 2023 11:27:50 +0800
>
> > The purpose of this patch is to allow driver pass the own dma_ops to
> > xsk.
> >
> > This is to cope with the scene of virtio-net. If virtio does not have
> > VIRTIO_F_ACCESS_PLATFORM, then virtio cannot use DMA API. In this case,
> > XSK cannot use DMA API directly to achieve DMA address. Based on this
> > scene, we must let XSK support driver to use the driver's dma_ops.
> >
> > On the other hand, the implementation of XSK as a highlevel code
> > should put the underlying operation of DMA to the driver layer.
> > The driver layer determines the implementation of the final DMA. XSK
> > should not make such assumptions. Everything will be simplified if DMA
> > is done at the driver level.
> >
> > More is here:
> > 
> > https://lore.kernel.org/virtualization/1681265026.6082013-1-xuanz...@linux.alibaba.com/
> >
> > Signed-off-by: Xuan Zhuo 
>
> [...]
>
> >  struct xsk_buff_pool {
> > /* Members only used in the control path first. */
> > struct device *dev;
> > @@ -85,6 +102,7 @@ struct xsk_buff_pool {
> >  * sockets share a single cq when the same netdev and queue id is 
> > shared.
> >  */
> > spinlock_t cq_lock;
> > +   struct xsk_dma_ops dma_ops;
>
> Why full struct, not a const pointer? You'll have indirect calls either
> way, copying the full struct won't reclaim you much performance.
>
> > struct xdp_buff_xsk *free_heads[];
> >  };
> >
>
> [...]
>
> > @@ -424,18 +426,29 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct 
> > device *dev,
> > return 0;
> > }
> >
> > +   if (!dma_ops) {
> > +   pool->dma_ops.map_page = dma_map_page_attrs;
> > +   pool->dma_ops.mapping_error = dma_mapping_error;
> > +   pool->dma_ops.need_sync = dma_need_sync;
> > +   pool->dma_ops.sync_single_range_for_device = 
> > dma_sync_single_range_for_device;
> > +   pool->dma_ops.sync_single_range_for_cpu = 
> > dma_sync_single_range_for_cpu;
> > +   dma_ops = &pool->dma_ops;
> > +   } else {
> > +   pool->dma_ops = *dma_ops;
> > +   }
>
> If DMA syncs are not needed on your x86_64 DMA-coherent system, it
> doesn't mean we all don't need it. Instead of filling pointers with
> "default" callbacks, you could instead avoid indirect calls at all when
> no custom DMA ops are specified. Pls see how for example Christoph did
> that for direct DMA. It would cost only one if-else for case without
> custom DMA ops here instead of an indirect call each time.
>
> (I *could* suggest using INDIRECT_CALL_WRAPPER(), but I won't, since
>  it's more expensive than direct checking and I feel like it's more
>  appropriate to check directly here)

OK, I will fix it in next version.

Thanks.




>
> > +
> > dma_map = xp_create_dma_map(dev, pool->netdev, nr_pages, pool->umem);
> > if (!dma_map)
> > return -ENOMEM;
> [...]
>
> Thanks,
> Olek


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-19 Thread Xuan Zhuo
On Wed, 19 Apr 2023 15:14:48 +0200, Alexander Lobakin 
 wrote:
> From: Christoph Hellwig 
> Date: Tue, 18 Apr 2023 22:16:53 -0700
>
> > On Mon, Apr 17, 2023 at 11:19:47PM -0700, Jakub Kicinski wrote:
> >>> You can't just do dma mapping outside the driver, because there are
> >>> drivers that do not require DMA mapping at all.  virtio is an example,
> >>> but all the classic s390 drivers and some other odd virtualization
> >>> ones are others.
> >>
> >> What bus are the classic s390 on (in terms of the device model)?
> >
> > I think most of them are based on struct ccw_device, but I'll let the
> > s390 maintainers fill in.
> >
> > Another interesting case that isn't really relevant for your networking
> > guys, but that caused us problems is RDMA.  For hardware RDMA devices
> > it wants the ULPs to DMA map, but it turns out we have various software
> > drivers that map to network drivers that do their own DMA mapping
> > at a much lower layer and after potentially splitting packets or
> > even mangling them.
> >
> >>
>  I don't think it's reasonable to be bubbling up custom per-subsystem
>  DMA ops into all of them for the sake of virtio.
> >>>
> >>> dma addresses and thus dma mappings are completely driver specific.
> >>> Upper layers have no business looking at them.
>
> Here it's not an "upper layer". XSk core doesn't look at them or pass
> them between several drivers. It maps DMA solely via the struct device
> passed from the driver and then just gets-sets addresses for this driver
> only. Just like Page Pool does for regular Rx buffers. This got moved to
> the XSk core to not repeat the same code pattern in each driver.
>
> >>
> >> Damn, that's unfortunate. Thinking aloud -- that means that if we want
> >> to continue to pull memory management out of networking drivers to
> >> improve it for all, cross-optimize with the rest of the stack and
> >> allow various upcoming forms of zero copy -- then we need to add an
> >> equivalent of dma_ops and DMA API locally in networking?
>
> Managing DMA addresses is totally fine as long as you don't try to pass
> mapped addresses between different drivers :D Page Pool already does
> that and I don't see a problem in that in general.
>
> >
> > Can you explain what the actual use case is?
> >
> > From the original patchset I suspect it is dma mapping something very
> > long term and then maybe doing syncs on it as needed?
>
> As I mentioned, XSk provides some handy wrappers to map DMA for drivers.
> Previously, XSk was supported by real hardware drivers only, but here
> the developer tries to add support to virtio-net. I suspect he needs to
> use DMA mapping functions different from which the regular driver use.
> So this is far from dma_map_ops, the author picked wrong name :D


Yes, dma_ops caused some misunderstandings, which is indeed a wrong name.

Thanks.


> And correct, for XSk we map one big piece of memory only once and then
> reuse it for buffers, no inflight map/unmap on hotpath (only syncs when
> needed). So this mapping is longterm and is stored in XSk core structure
> assigned to the driver which this mapping was done for.
> I think Jakub thinks of something similar, but for the "regular" Rx/Tx,
> not only XDP sockets :)
>
> Thanks,
> Olek


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-18 Thread Christoph Hellwig
On Mon, Apr 17, 2023 at 11:19:47PM -0700, Jakub Kicinski wrote:
> > You can't just do dma mapping outside the driver, because there are
> > drivers that do not require DMA mapping at all.  virtio is an example,
> > but all the classic s390 drivers and some other odd virtualization
> > ones are others.
> 
> What bus are the classic s390 on (in terms of the device model)?

I think most of them are based on struct ccw_device, but I'll let the
s390 maintainers fill in.

Another interesting case that isn't really relevant for your networking
guys, but that caused us problems is RDMA.  For hardware RDMA devices
it wants the ULPs to DMA map, but it turns out we have various software
drivers that map to network drivers that do their own DMA mapping
at a much lower layer and after potentially splitting packets or
even mangling them.

> 
> > > I don't think it's reasonable to be bubbling up custom per-subsystem
> > > DMA ops into all of them for the sake of virtio.  
> > 
> > dma addresses and thus dma mappings are completely driver specific.
> > Upper layers have no business looking at them.
> 
> Damn, that's unfortunate. Thinking aloud -- that means that if we want 
> to continue to pull memory management out of networking drivers to
> improve it for all, cross-optimize with the rest of the stack and
> allow various upcoming forms of zero copy -- then we need to add an
> equivalent of dma_ops and DMA API locally in networking?

Can you explain what the actual use case is?

From the original patchset I suspect it is dma mapping something very
long term and then maybe doing syncs on it as needed?


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-17 Thread Christoph Hellwig
On Mon, Apr 17, 2023 at 07:54:00PM -0700, Jakub Kicinski wrote:
> AF_XDP, io_uring, and increasing number of pinned memory / zero copy
> implementations need to do DMA mapping outside the drivers.

You can't just do dma mapping outside the driver, because there are
drivers that do not require DMA mapping at all.  virtio is an example,
but all the classic s390 drivers and some other odd virtualization
ones are others.

> I don't think it's reasonable to be bubbling up custom per-subsystem
> DMA ops into all of them for the sake of virtio.

dma addresses and thus dma mappings are completely driver specific.
Upper layers have no business looking at them.


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-17 Thread Xuan Zhuo
On Mon, 17 Apr 2023 18:19:50 -0700, Jakub Kicinski  wrote:
> On Tue, 18 Apr 2023 09:07:30 +0800 Jason Wang wrote:
> > > > Would you mind explaining this a bit more to folks like me who are not
> > > > familiar with VirtIO?  DMA API is supposed to hide the DMA mapping
> > > > details from the stack, why is it not sufficient here.
> >
> > The reason is that legacy virtio devices don't use DMA (vring_use_dma_api()).
> >
> > AF_XDP assumes DMA via the netdev's device, which doesn't work in this
> > case. We need a way to make it work.
>
> Can we not push this down to be bus level? virtio has its own bus it
> can plug in whatever magic it wants into dma ops.

It is actually not possible.

[1] https://lore.kernel.org/virtualization/zducdeylqawqv...@infradead.org/

>
> Doesn't have to be super fast for af_xdp's sake - for af_xdp dma mapping
> is on the control path. You can keep using the if (vring_use_dma_api())
> elsewhere for now if there is a perf concern.

Sorry, I don't particularly understand this passage.

Now, the question is that if vring_use_dma_api() is false, then we cannot use
the DMA API in AF_XDP.

The good news is that, except for some sync operations, everything is in the
control path. I think the effect on performance is very small, because in most
cases the sync is unnecessary.


>
> Otherwise it really seems like we're bubbling up a virtio hack into
> generic code :(

Can we view the purpose of this as handing the DMA operations back to the
driver? Although I don't know whether other drivers have similar
requirements.

Thanks.





Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-17 Thread Xuan Zhuo
On Mon, 17 Apr 2023 11:57:53 -0700, Jakub Kicinski  wrote:
> On Mon, 17 Apr 2023 11:56:10 -0700 Jakub Kicinski wrote:
> > > May misunderstand, here the "dma_ops" is not the "dma_ops" of DMA API.
> > >
> > > I mean the callbacks for xsk to do dma.
> > >
> > > Maybe, I should rename it in the next version.
> >
> > Would you mind explaining this a bit more to folks like me who are not
> > familiar with VirtIO?  DMA API is supposed to hide the DMA mapping
> > details from the stack, why is it not sufficient here.
>
> Umm.. also it'd help to post the user of the API in the same series.
> I only see the XSK changes, maybe if the virtio changes were in
> the same series I could answer my own question.


This [1] is the similar code. This is the early version. But the idea is
similar to this patch.


[1] 
https://lore.kernel.org/virtualization/20230202110058.130695-1-xuanz...@linux.alibaba.com/


Thanks.


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-17 Thread Jason Wang
On Tue, Apr 18, 2023 at 2:58 AM Jakub Kicinski  wrote:
>
> On Mon, 17 Apr 2023 11:56:10 -0700 Jakub Kicinski wrote:
> > > May misunderstand, here the "dma_ops" is not the "dma_ops" of DMA API.
> > >
> > > I mean the callbacks for xsk to do dma.
> > >
> > > Maybe, I should rename it in the next version.
> >
> > Would you mind explaining this a bit more to folks like me who are not
> > familiar with VirtIO?  DMA API is supposed to hide the DMA mapping
> > details from the stack, why is it not sufficient here.

The reason is that legacy virtio devices don't use DMA (vring_use_dma_api()).

AF_XDP assumes DMA via the netdev's device, which doesn't work in this case.
We need a way to make it work.

Thanks

>
> Umm.. also it'd help to post the user of the API in the same series.
> I only see the XSK changes, maybe if the virtio changes were in
> the same series I could answer my own question.
>


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-16 Thread Xuan Zhuo
On Mon, 17 Apr 2023 02:43:32 -0400, "Michael S. Tsirkin"  
wrote:
> On Mon, Apr 17, 2023 at 11:27:50AM +0800, Xuan Zhuo wrote:
> > @@ -532,9 +545,9 @@ struct xdp_buff *xp_alloc(struct xsk_buff_pool *pool)
> > xskb->xdp.data_meta = xskb->xdp.data;
> >
> > if (pool->dma_need_sync) {
> > -   dma_sync_single_range_for_device(pool->dev, xskb->dma, 0,
> > -pool->frame_len,
> > -DMA_BIDIRECTIONAL);
> > +   pool->dma_ops.sync_single_range_for_device(pool->dev, 
> > xskb->dma, 0,
> > +  pool->frame_len,
> > +  DMA_BIDIRECTIONAL);
> > }
> > return &xskb->xdp;
> >  }
> > @@ -670,15 +683,15 @@ EXPORT_SYMBOL(xp_raw_get_dma);
> >
> >  void xp_dma_sync_for_cpu_slow(struct xdp_buff_xsk *xskb)
> >  {
> > -   dma_sync_single_range_for_cpu(xskb->pool->dev, xskb->dma, 0,
> > - xskb->pool->frame_len, DMA_BIDIRECTIONAL);
> > +   xskb->pool->dma_ops.sync_single_range_for_cpu(xskb->pool->dev, 
> > xskb->dma, 0,
> > + xskb->pool->frame_len, 
> > DMA_BIDIRECTIONAL);
> >  }
> >  EXPORT_SYMBOL(xp_dma_sync_for_cpu_slow);
> >
> >  void xp_dma_sync_for_device_slow(struct xsk_buff_pool *pool, dma_addr_t 
> > dma,
> >  size_t size)
> >  {
> > -   dma_sync_single_range_for_device(pool->dev, dma, 0,
> > -size, DMA_BIDIRECTIONAL);
> > +   pool->dma_ops.sync_single_range_for_device(pool->dev, dma, 0,
> > +  size, DMA_BIDIRECTIONAL);
> >  }
> >  EXPORT_SYMBOL(xp_dma_sync_for_device_slow);
>
> So you add an indirect function call on data path? Won't this be costly?


Yes, this may introduce some cost. The good news is that in some cases
the sync is unnecessary: dma_need_sync is then false and the indirect
call is never taken.

Thanks.


>
> > --
> > 2.32.0.3.g01195cf9f
>


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-16 Thread kernel test robot
Hi Xuan,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:
https://github.com/intel-lab-lkp/linux/commits/Xuan-Zhuo/xsk-introduce-xsk_dma_ops/20230417-112903
patch link:
https://lore.kernel.org/r/20230417032750.7086-1-xuanzhuo%40linux.alibaba.com
patch subject: [PATCH net-next] xsk: introduce xsk_dma_ops
config: mips-randconfig-r021-20230416 
(https://download.01.org/0day-ci/archive/20230417/202304171441.ezrwcnsh-...@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 
9638da200e00bd069e6dd63604e14cbafede9324)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install mips cross compiling tool for clang build
# apt-get install binutils-mipsel-linux-gnu
# 
https://github.com/intel-lab-lkp/linux/commit/28e766603a33761d7bd1fdd3e107595408319f7d
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Xuan-Zhuo/xsk-introduce-xsk_dma_ops/20230417-112903
git checkout 28e766603a33761d7bd1fdd3e107595408319f7d
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
O=build_dir ARCH=mips olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
O=build_dir ARCH=mips SHELL=/bin/bash net/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot 
| Link: 
https://lore.kernel.org/oe-kbuild-all/202304171441.ezrwcnsh-...@intel.com/

All errors (new ones prefixed by >>):

>> net/xdp/xsk_buff_pool.c:430:26: error: incompatible function pointer types 
>> assigning to 'dma_addr_t (*)(struct device *, struct page *, unsigned long, 
>> size_t, enum dma_data_direction, unsigned long)' (aka 'unsigned int 
>> (*)(struct device *, struct page *, unsigned long, unsigned int, enum 
>> dma_data_direction, unsigned long)') from 'dma_addr_t (struct device *, 
>> struct page *, size_t, size_t, enum dma_data_direction, unsigned long)' (aka 
>> 'unsigned int (struct device *, struct page *, unsigned int, unsigned int, 
>> enum dma_data_direction, unsigned long)') 
>> [-Wincompatible-function-pointer-types]
   pool->dma_ops.map_page = dma_map_page_attrs;
  ^ ~~
   1 error generated.


vim +430 net/xdp/xsk_buff_pool.c

   409  
   410  int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
   411 struct xsk_dma_ops *dma_ops,
   412 unsigned long attrs, struct page **pages, u32 nr_pages)
   413  {
   414  struct xsk_dma_map *dma_map;
   415  dma_addr_t dma;
   416  int err;
   417  u32 i;
   418  
   419  dma_map = xp_find_dma_map(pool);
   420  if (dma_map) {
   421  err = xp_init_dma_info(pool, dma_map);
   422  if (err)
   423  return err;
   424  
   425  refcount_inc(&dma_map->users);
   426  return 0;
   427  }
   428  
   429  if (!dma_ops) {
 > 430  pool->dma_ops.map_page = dma_map_page_attrs;
   431  pool->dma_ops.mapping_error = dma_mapping_error;
   432  pool->dma_ops.need_sync = dma_need_sync;
   433  pool->dma_ops.sync_single_range_for_device = 
dma_sync_single_range_for_device;
   434  pool->dma_ops.sync_single_range_for_cpu = 
dma_sync_single_range_for_cpu;
   435  dma_ops = &pool->dma_ops;
   436  } else {
   437  pool->dma_ops = *dma_ops;
   438  }
   439  
   440  dma_map = xp_create_dma_map(dev, pool->netdev, nr_pages, 
pool->umem);
   441  if (!dma_map)
   442  return -ENOMEM;
   443  
   444  for (i = 0; i < dma_map->dma_pages_cnt; i++) {
   445  dma = dma_ops->map_page(dev, pages[i], 0, PAGE_SIZE,
   446  DMA_BIDIRECTIONAL, attrs);
   447  if (dma_ops->mapping_error(dev, dma)) {
   448  __xp_dma_unmap(dma_map, dma_ops, attrs);
   449  return -ENOMEM;
   450  }
   451  if (dma_ops->need_sync(dev, dma))
   452  dma_map->dma_need_sync = true;
   453  dma_map->dma_pages[i] = dma;
   454  }
   455  
   456  if (pool->unaligned)
   457  xp_check_dma_contiguity(dma_map);
   458  
   459  err = xp_init_dma_info(pool, dma_map);
   

Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-16 Thread Michael S. Tsirkin
On Mon, Apr 17, 2023 at 11:27:50AM +0800, Xuan Zhuo wrote:
> @@ -532,9 +545,9 @@ struct xdp_buff *xp_alloc(struct xsk_buff_pool *pool)
>   xskb->xdp.data_meta = xskb->xdp.data;
>  
>   if (pool->dma_need_sync) {
> - dma_sync_single_range_for_device(pool->dev, xskb->dma, 0,
> -  pool->frame_len,
> -  DMA_BIDIRECTIONAL);
> + pool->dma_ops.sync_single_range_for_device(pool->dev, 
> xskb->dma, 0,
> +pool->frame_len,
> +DMA_BIDIRECTIONAL);
>   }
>   return &xskb->xdp;
>  }
> @@ -670,15 +683,15 @@ EXPORT_SYMBOL(xp_raw_get_dma);
>  
>  void xp_dma_sync_for_cpu_slow(struct xdp_buff_xsk *xskb)
>  {
> - dma_sync_single_range_for_cpu(xskb->pool->dev, xskb->dma, 0,
> -   xskb->pool->frame_len, DMA_BIDIRECTIONAL);
> + xskb->pool->dma_ops.sync_single_range_for_cpu(xskb->pool->dev, 
> xskb->dma, 0,
> +   xskb->pool->frame_len, 
> DMA_BIDIRECTIONAL);
>  }
>  EXPORT_SYMBOL(xp_dma_sync_for_cpu_slow);
>  
>  void xp_dma_sync_for_device_slow(struct xsk_buff_pool *pool, dma_addr_t dma,
>size_t size)
>  {
> - dma_sync_single_range_for_device(pool->dev, dma, 0,
> -  size, DMA_BIDIRECTIONAL);
> + pool->dma_ops.sync_single_range_for_device(pool->dev, dma, 0,
> +size, DMA_BIDIRECTIONAL);
>  }
>  EXPORT_SYMBOL(xp_dma_sync_for_device_slow);

So you add an indirect function call on data path? Won't this be costly?

> -- 
> 2.32.0.3.g01195cf9f



Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-16 Thread kernel test robot
Hi Xuan,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:
https://github.com/intel-lab-lkp/linux/commits/Xuan-Zhuo/xsk-introduce-xsk_dma_ops/20230417-112903
patch link:
https://lore.kernel.org/r/20230417032750.7086-1-xuanzhuo%40linux.alibaba.com
patch subject: [PATCH net-next] xsk: introduce xsk_dma_ops
config: i386-randconfig-a011-20230417 
(https://download.01.org/0day-ci/archive/20230417/202304171427.uaryn9jl-...@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project 
f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/intel-lab-lkp/linux/commit/28e766603a33761d7bd1fdd3e107595408319f7d
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Xuan-Zhuo/xsk-introduce-xsk_dma_ops/20230417-112903
git checkout 28e766603a33761d7bd1fdd3e107595408319f7d
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
O=build_dir ARCH=i386 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
O=build_dir ARCH=i386 SHELL=/bin/bash net/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot 
| Link: 
https://lore.kernel.org/oe-kbuild-all/202304171427.uaryn9jl-...@intel.com/

All errors (new ones prefixed by >>):

>> net/xdp/xsk_buff_pool.c:430:26: error: incompatible function pointer types 
>> assigning to 'dma_addr_t (*)(struct device *, struct page *, unsigned long, 
>> size_t, enum dma_data_direction, unsigned long)' (aka 'unsigned int 
>> (*)(struct device *, struct page *, unsigned long, unsigned int, enum 
>> dma_data_direction, unsigned long)') from 'dma_addr_t (struct device *, 
>> struct page *, size_t, size_t, enum dma_data_direction, unsigned long)' (aka 
>> 'unsigned int (struct device *, struct page *, unsigned int, unsigned int, 
>> enum dma_data_direction, unsigned long)') 
>> [-Werror,-Wincompatible-function-pointer-types]
   pool->dma_ops.map_page = dma_map_page_attrs;
  ^ ~~
   1 error generated.


vim +430 net/xdp/xsk_buff_pool.c

   409  
   410  int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
   411 struct xsk_dma_ops *dma_ops,
   412 unsigned long attrs, struct page **pages, u32 nr_pages)
   413  {
   414  struct xsk_dma_map *dma_map;
   415  dma_addr_t dma;
   416  int err;
   417  u32 i;
   418  
   419  dma_map = xp_find_dma_map(pool);
   420  if (dma_map) {
   421  err = xp_init_dma_info(pool, dma_map);
   422  if (err)
   423  return err;
   424  
   425  refcount_inc(&dma_map->users);
   426  return 0;
   427  }
   428  
   429  if (!dma_ops) {
 > 430  pool->dma_ops.map_page = dma_map_page_attrs;
   431  pool->dma_ops.mapping_error = dma_mapping_error;
   432  pool->dma_ops.need_sync = dma_need_sync;
   433  pool->dma_ops.sync_single_range_for_device = 
dma_sync_single_range_for_device;
   434  pool->dma_ops.sync_single_range_for_cpu = 
dma_sync_single_range_for_cpu;
   435  dma_ops = &pool->dma_ops;
   436  } else {
   437  pool->dma_ops = *dma_ops;
   438  }
   439  
   440  dma_map = xp_create_dma_map(dev, pool->netdev, nr_pages, 
pool->umem);
   441  if (!dma_map)
   442  return -ENOMEM;
   443  
   444  for (i = 0; i < dma_map->dma_pages_cnt; i++) {
   445  dma = dma_ops->map_page(dev, pages[i], 0, PAGE_SIZE,
   446  DMA_BIDIRECTIONAL, attrs);
   447  if (dma_ops->mapping_error(dev, dma)) {
   448  __xp_dma_unmap(dma_map, dma_ops, attrs);
   449  return -ENOMEM;
   450  }
   451  if (dma_ops->need_sync(dev, dma))
   452  dma_map->dma_need_sync = true;
   453  dma_map->dma_pages[i] = dma;
   454  }
   455  
   456  if (pool->unaligned)
   457  xp_check_dma_contiguity(dma_map);
   458  
   459  err = xp_init_dma_info(pool, dma_map);
   460  if (err) {
   461  __xp_dma_unmap(dma_map, dma_ops, attrs);
   462

Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-16 Thread Xuan Zhuo
On Sun, 16 Apr 2023 21:24:32 -0700, Christoph Hellwig  
wrote:
> On Mon, Apr 17, 2023 at 11:27:50AM +0800, Xuan Zhuo wrote:
> > The purpose of this patch is to allow a driver to pass its own dma_ops
> > to xsk.
>
> Drivers have no business passing around dma_ops, or even knowing about
> them.


There may be a misunderstanding: the "dma_ops" here is not the dma_ops
of the DMA API.

I mean the callbacks xsk uses to do DMA.

Maybe I should rename it in the next version.

Thanks.




Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-16 Thread Christoph Hellwig
On Mon, Apr 17, 2023 at 11:27:50AM +0800, Xuan Zhuo wrote:
> The purpose of this patch is to allow a driver to pass its own dma_ops
> to xsk.

Drivers have no business passing around dma_ops, or even knowing about
them.


[PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-16 Thread Xuan Zhuo
The purpose of this patch is to allow a driver to pass its own dma_ops
to xsk.

This is to cope with the virtio-net case. If virtio does not negotiate
VIRTIO_F_ACCESS_PLATFORM, then virtio cannot use the DMA API, so XSK
cannot obtain DMA addresses through the DMA API directly. For this
case, XSK must let the driver supply its own dma_ops.

On the other hand, XSK is high-level code and should delegate the
underlying DMA operations to the driver layer, which determines the
final DMA implementation; XSK should not make assumptions about it.
Everything is simplified if DMA is done at the driver level.

More is here:

https://lore.kernel.org/virtualization/1681265026.6082013-1-xuanz...@linux.alibaba.com/

Signed-off-by: Xuan Zhuo 
---
 include/net/xdp_sock_drv.h  | 20 +++-
 include/net/xsk_buff_pool.h | 19 +++
 net/xdp/xsk_buff_pool.c | 47 +++--
 3 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 9c0d860609ba..181583ff6a26 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -67,7 +67,17 @@ static inline int xsk_pool_dma_map(struct xsk_buff_pool 
*pool,
 {
struct xdp_umem *umem = pool->umem;
 
-   return xp_dma_map(pool, dev, attrs, umem->pgs, umem->npgs);
+   return xp_dma_map(pool, dev, NULL, attrs, umem->pgs, umem->npgs);
+}
+
+static inline int xsk_pool_dma_map_with_ops(struct xsk_buff_pool *pool,
+   struct device *dev,
+   struct xsk_dma_ops *dma_ops,
+   unsigned long attrs)
+{
+   struct xdp_umem *umem = pool->umem;
+
+   return xp_dma_map(pool, dev, dma_ops, attrs, umem->pgs, umem->npgs);
 }
 
 static inline dma_addr_t xsk_buff_xdp_get_dma(struct xdp_buff *xdp)
@@ -226,6 +236,14 @@ static inline int xsk_pool_dma_map(struct xsk_buff_pool 
*pool,
return 0;
 }
 
+static inline int xsk_pool_dma_map_with_ops(struct xsk_buff_pool *pool,
+   struct device *dev,
+   struct xsk_dma_ops *dma_ops,
+   unsigned long attrs)
+{
+   return 0;
+}
+
 static inline dma_addr_t xsk_buff_xdp_get_dma(struct xdp_buff *xdp)
 {
return 0;
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 3e952e569418..1299b9d12484 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -43,6 +43,23 @@ struct xsk_dma_map {
bool dma_need_sync;
 };
 
+struct xsk_dma_ops {
+   dma_addr_t (*map_page)(struct device *dev, struct page *page,
+  unsigned long offset, size_t size,
+  enum dma_data_direction dir, unsigned long 
attrs);
+   void (*unmap_page)(struct device *dev, dma_addr_t dma_handle,
+  size_t size, enum dma_data_direction dir,
+  unsigned long attrs);
+   int (*mapping_error)(struct device *dev, dma_addr_t dma_addr);
+   bool (*need_sync)(struct device *dev, dma_addr_t dma);
+   void (*sync_single_range_for_cpu)(struct device *dev, dma_addr_t addr,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir);
+   void (*sync_single_range_for_device)(struct device *dev, dma_addr_t 
addr,
+unsigned long offset, size_t size,
+enum dma_data_direction dir);
+};
+
 struct xsk_buff_pool {
/* Members only used in the control path first. */
struct device *dev;
@@ -85,6 +102,7 @@ struct xsk_buff_pool {
 * sockets share a single cq when the same netdev and queue id is 
shared.
 */
spinlock_t cq_lock;
+   struct xsk_dma_ops dma_ops;
struct xdp_buff_xsk *free_heads[];
 };
 
@@ -131,6 +149,7 @@ static inline void xp_init_xskb_dma(struct xdp_buff_xsk 
*xskb, struct xsk_buff_p
 /* AF_XDP ZC drivers, via xdp_sock_buff.h */
 void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq);
 int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
+  struct xsk_dma_ops *dma_ops,
   unsigned long attrs, struct page **pages, u32 nr_pages);
 void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs);
 struct xdp_buff *xp_alloc(struct xsk_buff_pool *pool);
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index b2df1e0f8153..646090cae8ec 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -328,7 +328,8 @@ static void xp_destroy_dma_map(struct xsk_dma_map *dma_map)
kfree(dma_map);
 }
 
-static void __xp_dma_unmap(struct xsk_dma_map *dma_map, unsigned long attrs)
+static void __xp_dma_un