> -----Original Message-----
> From: Jakub Kicinski [mailto:[email protected]]
> Sent: Tuesday, October 20, 2020 3:01 AM
> To: Joel Stanley <[email protected]>
> Cc: Dylan Hung <[email protected]>; Benjamin Herrenschmidt
> <[email protected]>; David S . Miller <[email protected]>;
> [email protected]; Linux Kernel Mailing List
> <[email protected]>; Po-Yu Chuang <[email protected]>;
> linux-aspeed <[email protected]>; OpenBMC Maillist
> <[email protected]>; BMC-SW <[email protected]>
> Subject: Re: [PATCH] net: ftgmac100: Fix missing TX-poll issue
>
> On Mon, 19 Oct 2020 08:57:03 +0000 Joel Stanley wrote:
> > > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
> > > b/drivers/net/ethernet/faraday/ftgmac100.c
> > > index 00024dd41147..9a99a87f29f3 100644
> > > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > > @@ -804,7 +804,8 @@ static netdev_tx_t ftgmac100_hard_start_xmit(struct sk_buff *skb,
> > >  	 * before setting the OWN bit on the first descriptor.
> > >  	 */
> > >  	dma_wmb();
> > > -	first->txdes0 = cpu_to_le32(f_ctl_stat);
> > > +	WRITE_ONCE(first->txdes0, cpu_to_le32(f_ctl_stat));
> > > +	READ_ONCE(first->txdes0);
> >
> > I understand what you're trying to do here, but I'm not sure that this
> > is the correct way to go about it.
> >
> > It does cause the compiler to produce a store and then a load.
Yes, the load instruction here is to guarantee that the previous store has
actually been written to physical memory.

>
> +1 @first is system memory from dma_alloc_coherent(), right?
>
> You shouldn't have to do this. Is coherent DMA memory broken on your
> platform?

It is about arbitration in the DRAM controller. There are two queues in
the DRAM controller: one for CPU accesses and the other for the HW engines.
When the CPU issues a store, the DRAM controller just acknowledges the CPU's
request and pushes it into the CPU queue. The CPU then triggers the HW MAC
engine, which starts to fetch the DMA memory. But since the CPU's store may
still be sitting in the queue, the HW engine may fetch stale data.
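
To illustrate the race described above, here is a toy user-space model in C.
It is only a sketch, not driver code and not a description of the real
controller: all names are hypothetical, OWN_BIT is a placeholder value, and
the model assumes that a CPU load is served behind earlier CPU stores in the
same queue, so that a completed load implies those stores have reached DRAM.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS   8   /* size of the modelled DRAM, in 32-bit words */
#define QUEUE_DEPTH 4   /* depth of the modelled CPU request queue */
#define OWN_BIT     0x80000000u /* placeholder for the descriptor OWN bit */

/* Toy DRAM controller; every name here is hypothetical. */
struct dram_model {
	uint32_t mem[MEM_WORDS];      /* what is actually in DRAM */
	uint32_t q_addr[QUEUE_DEPTH]; /* queued CPU stores: word index */
	uint32_t q_data[QUEUE_DEPTH]; /* queued CPU stores: value */
	int q_len;                    /* number of queued stores */
};

/* Commit all queued CPU stores to DRAM, oldest first. */
static void drain_cpu_queue(struct dram_model *d)
{
	for (int i = 0; i < d->q_len; i++)
		d->mem[d->q_addr[i]] = d->q_data[i];
	d->q_len = 0;
}

/* CPU store: acknowledged immediately, but only queued. */
static void cpu_store(struct dram_model *d, uint32_t idx, uint32_t val)
{
	assert(idx < MEM_WORDS);
	if (d->q_len == QUEUE_DEPTH)
		drain_cpu_queue(d); /* a full queue has to make progress */
	d->q_addr[d->q_len] = idx;
	d->q_data[d->q_len] = val;
	d->q_len++;
}

/* CPU load: ASSUMED to be served behind earlier stores in the same queue. */
static uint32_t cpu_load(struct dram_model *d, uint32_t idx)
{
	assert(idx < MEM_WORDS);
	drain_cpu_queue(d);
	return d->mem[idx];
}

/* HW engine fetch: uses the other queue, so it sees only DRAM contents. */
static uint32_t hw_fetch(const struct dram_model *d, uint32_t idx)
{
	assert(idx < MEM_WORDS);
	return d->mem[idx];
}

/* Store the descriptor word, then let the engine fetch straight away. */
static uint32_t fetch_without_readback(void)
{
	struct dram_model d = { 0 };

	cpu_store(&d, 0, OWN_BIT);
	return hw_fetch(&d, 0);
}

/* Store, read the word back, and only then let the engine fetch. */
static uint32_t fetch_with_readback(void)
{
	struct dram_model d = { 0 };

	cpu_store(&d, 0, OWN_BIT);
	(void)cpu_load(&d, 0);
	return hw_fetch(&d, 0);
}
```

In this model fetch_without_readback() returns 0, i.e. the engine sees the
stale descriptor word, while fetch_with_readback() returns OWN_BIT.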

