On Wed, 21 Oct 2020, at 08:40, Benjamin Herrenschmidt wrote:
> On Tue, 2020-10-20 at 21:49 +0200, Arnd Bergmann wrote:
> > On Tue, Oct 20, 2020 at 11:37 AM Dylan Hung <dylan_h...@aspeedtech.com>
> > wrote:
> > > > +1 @first is system memory from dma_alloc_coherent(), right?
> > > >
> > > > You shouldn't have to do this. Is coherent DMA memory broken on your
> > > > platform?
> > >
> > > It is about the arbitration on the DRAM controller. There are two queues
> > > in the DRAM controller: one for CPU accesses and the other for the HW
> > > engines.
> > > When the CPU issues a store, the DRAM controller just acknowledges the
> > > CPU's request and pushes it into the queue. The CPU then triggers the
> > > HW MAC engine, and the engine starts to fetch the DMA memory.
> > > But since the CPU's request may still be sitting in the queue, the HW
> > > engine may fetch stale data.
>
> Actually, I take back what I said earlier, the above seems to imply
> this is more generic.
>
> Dylan, please confirm: does this affect *all* DMA-capable devices? If
> yes, then it's unfortunately a really bad design bug in your chips,
> and the proper fix is indeed to make dma_wmb() do a dummy read of some
> sort (from what address, though? would any dummy non-cacheable page
> do?) to force the data out, as *all* drivers will potentially be
> affected.
>
> I was under the impression that it was a specific timing issue in the
> vhub and ethernet parts, but if it's more generic then it needs to be
> fixed globally.
>
We see a similar issue in the XDMA engine, where it can transfer stale data
to the host. I think the driver ended up using memcpy_toio() to work around
that, despite using a DMA reserved memory region.
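
To make the ordering problem described above concrete, here is a rough
userspace model of it in C. It is only a sketch, not driver code: the
struct and every function name are hypothetical, and it assumes (as the
dummy-read suggestion does) that a CPU read goes through the same queue as
CPU stores and therefore forces earlier stores out to DRAM. It also models
the worst case, where the store is always still queued when the engine
fetches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all names below are hypothetical, not kernel APIs. */
#define QUEUE_DEPTH 8
#define DRAM_WORDS  16

struct toy_store {
	size_t idx;
	uint32_t val;
};

struct toy_dram {
	uint32_t cells[DRAM_WORDS];            /* what the HW engines see */
	struct toy_store pending[QUEUE_DEPTH]; /* CPU-side queue */
	size_t npending;
};

/* Commit every queued CPU store to the DRAM cells, in order. */
void toy_drain_cpu_queue(struct toy_dram *d)
{
	for (size_t i = 0; i < d->npending; i++)
		d->cells[d->pending[i].idx] = d->pending[i].val;
	d->npending = 0;
}

/* CPU store: acknowledged immediately, but only queued. */
void toy_cpu_store(struct toy_dram *d, size_t idx, uint32_t val)
{
	assert(idx < DRAM_WORDS);
	if (d->npending == QUEUE_DEPTH)
		toy_drain_cpu_queue(d);
	d->pending[d->npending].idx = idx;
	d->pending[d->npending].val = val;
	d->npending++;
}

/* CPU load: assumed to be ordered behind earlier stores in the queue. */
uint32_t toy_cpu_load(struct toy_dram *d, size_t idx)
{
	assert(idx < DRAM_WORDS);
	toy_drain_cpu_queue(d);
	return d->cells[idx];
}

/* HW engine fetch: uses the other queue, sees only committed cells. */
uint32_t toy_hw_fetch(const struct toy_dram *d, size_t idx)
{
	assert(idx < DRAM_WORDS);
	return d->cells[idx];
}

/*
 * Write a descriptor word, optionally read it back, then "kick" the
 * engine. Returns the value the engine fetches.
 */
uint32_t toy_write_then_kick(struct toy_dram *d, size_t idx, uint32_t val,
			     int readback)
{
	toy_cpu_store(d, idx, val);
	if (readback)
		(void)toy_cpu_load(d, idx); /* dummy read forces the store out */
	return toy_hw_fetch(d, idx);
}
```

On a zero-initialised struct toy_dram, toy_write_then_kick() with
readback == 0 returns the old contents of the cell, while readback == 1
returns the value just written. Whether a read-back behaves this way on the
real DRAM controller is exactly the open question in this thread.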