Re: DPAA2 triggers, [PATCH] dma debug: report -EEXIST errors in add_dma_entry

2021-10-01 Thread Ioana Ciornei
On Fri, Oct 01, 2021 at 06:19:59AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 14, 2021 at 03:45:06PM +0000, Ioana Ciornei wrote:
> > [  245.927020] fsl_dpaa2_eth dpni.3: scather-gather idx 0 P=20a7320000 N=20a7320 D=20a7320000 L=30 DMA_BIDIRECTIONAL dma map error check not applicable
> > [  245.927048] fsl_dpaa2_eth dpni.3: scather-gather idx 1 P=20a7320030 N=20a7320 D=20a7320030 L=5a8 DMA_BIDIRECTIONAL dma map error check not applicable
> > [  245.927062] DMA-API: cacheline tracking EEXIST, overlapping mappings aren't supported
> > 
> > The first line is the dump of the dma_debug_entry which is already present
> > in the radix tree and the second one is the entry which just triggered
> > the EEXIST.
> > 
> > As we can see, they are not actually overlapping, at least from my
> > understanding. The first one starts at 0x20a7320000 with a size 0x30
> > and the second one at 0x20a7320030.
> 
> They overlap the cache lines.  Which means if you use this driver
> on a system that is not dma coherent you will corrupt data.

This is a driver of an integrated ethernet controller which is DMA
coherent.

I added a print just to make sure of this:

--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -567,6 +567,7 @@ static void add_dma_entry(struct dma_debug_entry *entry)
 		pr_err("cacheline tracking ENOMEM, dma-debug disabled\n");
 		global_disable = true;
 	} else if (rc == -EEXIST) {
+		pr_err("dev_is_dma_coherent(%s) = %d\n", dev_name(entry->dev), dev_is_dma_coherent(entry->dev));
 		err_printk(entry->dev, entry,
 			"cacheline tracking EEXIST, overlapping mappings aren't supported\n");
 	}


[   85.852218] DMA-API: dev_is_dma_coherent(dpni.3) = 1
[   85.858891] ------------[ cut here ]------------
[   85.858893] DMA-API: fsl_dpaa2_eth dpni.3: cacheline tracking EEXIST, overlapping mappings aren't supported
[   85.858901] WARNING: CPU: 13 PID: 1046 at kernel/dma/debug.c:571 add_dma_entry+0x330/0x390
[   85.858911] Modules linked in:
[   85.858915] CPU: 13 PID: 1046 Comm: iperf3 Not tainted 5.15.0-rc2-00478-g34286ba6a164-dirty #1275
[   85.858919] Hardware name: NXP Layerscape LX2160ARDB (DT)


Should a DMA-coherent device really generate this kind of warning?
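
To make the question concrete, here is a minimal userspace model of the behaviour being asked about: report the cacheline collision only when the device is not DMA coherent. This is a hypothetical policy written for illustration, not what kernel/dma/debug.c does; `struct model_dev` and `should_report_collision()` are invented names.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; the kernel's struct device is far richer. */
struct model_dev {
	const char *name;
	bool dma_coherent;	/* the value dev_is_dma_coherent() printed in the log above */
};

/*
 * Hypothetical policy, NOT the behaviour of kernel/dma/debug.c:
 * report a cacheline collision only for devices that are not DMA coherent.
 */
static bool should_report_collision(const struct model_dev *dev, int rc)
{
	if (rc != -EEXIST)
		return false;	/* no collision was detected at all */
	return !dev->dma_coherent;
}
```

Under this model, dpni.3 (coherent) would stay silent on -EEXIST, while a non-coherent device would still be reported.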

Ioana
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: DPAA2 triggers, [PATCH] dma debug: report -EEXIST errors in add_dma_entry

2021-09-14 Thread Ioana Ciornei
On Wed, Sep 08, 2021 at 10:33:26PM -0500, Jeremy Linton wrote:
> +DPAA2, netdev maintainers
> Hi,
> 
> On 5/18/21 7:54 AM, Hamza Mahfooz wrote:
> > Since overlapping mappings are not supported by the DMA API, we should
> > report an error if active_cacheline_insert returns -EEXIST.
> 
> It seems this patch found a victim. I was trying to run iperf3 on a
> honeycomb (5.14.0, fedora 35) and the console is blasting this error message
> at 100% cpu. So, I changed it to a WARN_ONCE() to get the call trace, which
> is attached below.
> 
> 
> [  151.839693] cacheline tracking EEXIST, overlapping mappings aren't supported
> ...
> [  151.924397] Hardware name: SolidRun Ltd. SolidRun CEX7 Platform, BIOS EDK II Aug  9 2021
> [  151.932481] pstate: 4045 (nZcv daif +PAN -UAO -TCO BTYPE=--)
> [  151.938483] pc : add_dma_entry+0x218/0x240
> [  151.942575] lr : add_dma_entry+0x218/0x240
> [  151.94] sp : 8000101e2f20
> [  151.949975] x29: 8000101e2f20 x28: af317ac85000 x27:
> 3d0366ecb3a0
> [  151.957116] x26: 0400 x25: 0001 x24:
> af317bbe8908
> [  151.964257] x23: 0001 x22: af317bbe8810 x21:
> 
> [  151.971397] x20: 82e48000 x19: af317be6e000 x18:
> 
> [  151.978537] x17: 646574726f707075 x16: 732074276e657261 x15:
> 8000901e2c2f
> [  151.985676] x14:  x13:  x12:
> 
> [  151.992816] x11: af317bb4c4c0 x10: e000 x9 :
> af3179708060
> [  151.56] x8 : dfff x7 : af317bb4c4c0 x6 :
> 0001
> [  152.007096] x5 : 3d0a9af66e30 x4 :  x3 :
> 0027
> [  152.014236] x2 : 0023 x1 : 3d0360aac000 x0 :
> 0040
> [  152.021376] Call trace:
> [  152.023816]  add_dma_entry+0x218/0x240
> [  152.027561]  debug_dma_map_sg+0x118/0x17c
> [  152.031566]  dma_map_sg_attrs+0x70/0xb0
> [  152.035397]  dpaa2_eth_build_sg_fd+0xac/0x2f0 [fsl_dpaa2_eth]
> [  152.041150]  __dpaa2_eth_tx+0x3ec/0x570 [fsl_dpaa2_eth]
> [  152.046377]  dpaa2_eth_tx+0x74/0x110 [fsl_dpaa2_eth]
> [  152.051342]  dev_hard_start_xmit+0xe8/0x1a4
> [  152.055523]  sch_direct_xmit+0x8c/0x1e0
> [  152.059355]  __dev_xmit_skb+0x484/0x6a0
> [  152.063186]  __dev_queue_xmit+0x380/0x744
> [  152.067190]  dev_queue_xmit+0x20/0x2c
> [  152.070848]  neigh_hh_output+0xb4/0x130
> [  152.074679]  ip_finish_output2+0x494/0x8f0
> [  152.078770]  __ip_finish_output+0x12c/0x230
> [  152.082948]  ip_finish_output+0x40/0xe0
> [  152.086778]  ip_output+0xe4/0x2d4
> [  152.090088]  __ip_queue_xmit+0x1b4/0x5c0
> [  152.094006]  ip_queue_xmit+0x20/0x30
> [  152.097576]  __tcp_transmit_skb+0x3b8/0x7b4
> [  152.101755]  tcp_write_xmit+0x350/0x8e0
> [  152.105586]  __tcp_push_pending_frames+0x48/0x110
> [  152.110286]  tcp_rcv_established+0x338/0x690
> [  152.114550]  tcp_v4_do_rcv+0x1c0/0x29c
> [  152.118294]  tcp_v4_rcv+0xd14/0xe3c
> [  152.121777]  ip_protocol_deliver_rcu+0x88/0x340
> [  152.126302]  ip_local_deliver_finish+0xc0/0x184
> [  152.130827]  ip_local_deliver+0x7c/0x23c
> [  152.134744]  ip_rcv_finish+0xb4/0x100
> [  152.138400]  ip_rcv+0x54/0x210
> [  152.141449]  deliver_skb+0x74/0xdc
> [  152.144846]  __netif_receive_skb_core.constprop.0+0x250/0x81c
> [  152.150588]  __netif_receive_skb_list_core+0x94/0x264
> [  152.155635]  netif_receive_skb_list_internal+0x1d0/0x3bc
> [  152.160942]  netif_receive_skb_list+0x38/0x70
> [  152.165295]  dpaa2_eth_poll+0x168/0x350 [fsl_dpaa2_eth]
> [  152.170521]  __napi_poll.constprop.0+0x40/0x19c
> [  152.175047]  net_rx_action+0x2c4/0x360
> [  152.178792]  __do_softirq+0x1b0/0x394
> [  152.182450]  run_ksoftirqd+0x68/0xa0
> [  152.186023]  smpboot_thread_fn+0x13c/0x270
> [  152.190115]  kthread+0x138/0x140
>

I got some time to look at this and I am not sure if it's an actual
problem or not.

First of all, I added some more debug prints when any overlapping
happens so that I can actually see the entries.

[  245.927020] fsl_dpaa2_eth dpni.3: scather-gather idx 0 P=20a7320000 N=20a7320 D=20a7320000 L=30 DMA_BIDIRECTIONAL dma map error check not applicable
[  245.927048] fsl_dpaa2_eth dpni.3: scather-gather idx 1 P=20a7320030 N=20a7320 D=20a7320030 L=5a8 DMA_BIDIRECTIONAL dma map error check not applicable
[  245.927062] DMA-API: cacheline tracking EEXIST, overlapping mappings aren't supported

The first line is the dump of the dma_debug_entry which is already present
in the radix tree and the second one is the entry which just triggered
the EEXIST.

As we can see, they are not actually overlapping, at least from my
understanding. The first one starts at 0x20a7320000 with a size 0x30
and the second one at 0x20a7320030.
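
A minimal sketch of why dma-debug can still report a collision for these two entries. dma-debug tracks active mappings by cacheline number rather than by byte range, so two adjacent buffers can collide without sharing a byte. The sketch assumes 4 KiB pages and 64-byte cache lines (neither is stated in this thread) and assumes the first entry sits at offset 0 of page frame 0x20a7320; `struct mapping` and the helper names are illustrative stand-ins, not the kernel's definitions.

```c
#include <stdint.h>

/* Assumed values for illustration: 4 KiB pages, 64-byte cache lines. */
#define PAGE_SHIFT			12
#define L1_CACHE_SHIFT			6
#define CACHELINE_PER_PAGE_SHIFT	(PAGE_SHIFT - L1_CACHE_SHIFT)

/* Hypothetical stand-in for the few dma_debug_entry fields used here. */
struct mapping {
	uint64_t pfn;		/* page frame number (N= in the dump) */
	uint64_t offset;	/* byte offset within the page */
	uint64_t size;		/* length in bytes (L= in the dump) */
};

/* Index of the cache line that holds the first byte of the mapping. */
static uint64_t cacheline_number(const struct mapping *m)
{
	return (m->pfn << CACHELINE_PER_PAGE_SHIFT) +
	       (m->offset >> L1_CACHE_SHIFT);
}

/* Plain byte-range overlap test on half-open ranges [start, start + size). */
static int bytes_overlap(const struct mapping *a, const struct mapping *b)
{
	uint64_t a_start = (a->pfn << PAGE_SHIFT) + a->offset;
	uint64_t b_start = (b->pfn << PAGE_SHIFT) + b->offset;

	return a_start < b_start + b->size && b_start < a_start + a->size;
}
```

With the first entry as { pfn 0x20a7320, offset 0x0, size 0x30 } and the second as { pfn 0x20a7320, offset 0x30, size 0x5a8 }, bytes_overlap() returns 0 but both entries map to the same cacheline number, because offsets 0x0 and 0x30 both fall inside the first 64 bytes of the page.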

I wanted to see where these mappings are originating so I added some
traces around the dma_[un]map_single, dma_[un]map_sg operations in
dpaa2-eth.

I can see the following:
 - There are two S/G skbs being sent one after another (no cleanup of
   the Tx 

Re: DPAA2 triggers, [PATCH] dma debug: report -EEXIST errors in add_dma_entry

2021-09-10 Thread Ioana Ciornei
On Wed, Sep 08, 2021 at 10:33:26PM -0500, Jeremy Linton wrote:
> +DPAA2, netdev maintainers
> Hi,
> 
> On 5/18/21 7:54 AM, Hamza Mahfooz wrote:
> > Since overlapping mappings are not supported by the DMA API, we should
> > report an error if active_cacheline_insert returns -EEXIST.
> 
> It seems this patch found a victim. I was trying to run iperf3 on a
> honeycomb (5.14.0, fedora 35) and the console is blasting this error message
> at 100% cpu. So, I changed it to a WARN_ONCE() to get the call trace, which
> is attached below.
> 

Thanks for the report.

I don't have access to hardware at the moment to actually see what's
happening since I'm on vacation.  I'll work on it in a few days.

Ioana