On 07.12.2018 14:00, Anssi Hannula wrote: > On 6.12.2018 16:14, claudiu.bez...@microchip.com wrote: >> Hi Anssi, > > Hi! > >> On 05.12.2018 16:00, Anssi Hannula wrote: >>> On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote: >>>> On 30.11.2018 20:21, Anssi Hannula wrote: >>>>> When reading buffer descriptors on RX or on TX completion, an >>>>> RX_USED/TX_USED bit is checked first to ensure that the descriptor has >>>>> been populated. However, there are no memory barriers to ensure that the >>>>> data protected by the RX_USED/TX_USED bit is up-to-date with respect to >>>>> that bit. >>>>> >>>>> Fix that by adding DMA read memory barriers on those paths. >>>>> >>>>> I did not observe any actual issues caused by these being missing, >>>>> though. >>>>> >>>>> Tested on a ZynqMP based system. >>>>> >>>>> Signed-off-by: Anssi Hannula <anssi.hann...@bitwise.fi> >>>>> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver") >>>>> Cc: Nicolas Ferre <nicolas.fe...@microchip.com> >>>>> --- >>>>> drivers/net/ethernet/cadence/macb_main.c | 20 ++++++++++++++++---- >>>>> 1 file changed, 16 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c >>>>> b/drivers/net/ethernet/cadence/macb_main.c >>>>> index 430b7a0f5436..c93baa8621d5 100644 >>>>> --- a/drivers/net/ethernet/cadence/macb_main.c >>>>> +++ b/drivers/net/ethernet/cadence/macb_main.c >>>>> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue >>>>> *queue) >>>>> >>>>> /* First, update TX stats if needed */ >>>>> if (skb) { >>>>> + /* Ensure all of desc is at least as up-to-date >>>>> + * as ctrl (TX_USED bit) >>>>> + */ >>>>> + dma_rmb(); >>>>> + >>>> Is this necessary? Wouldn't previous rmb() take care of this? At this time >>>> data specific to this descriptor was completed. The TX descriptors for next >>>> data to be send is updated under a locked spinlock. >>> The previous rmb() is before the TX_USED check, so my understanding is >>> that the following could happen in theory: >> We are using this IP on and ARM architecture, so, with regards to rmb(), I >> understand from [1] that dsb completes when: >> "All explicit memory accesses before this instruction complete. >> All Cache, Branch predictor and TLB maintenance operations before this >> instruction complete." >> >>> 1. rmb(). >> According to [1] this should end after all previous instructions (loads, >> stores) ends. >> >>> 2. Reads are reordered so that TX timestamp is read first - no barriers >>> are crossed. >> But, as per [1], no onward instruction will be reached until all >> instruction prior to dsb ends, so, after rmb() all descriptor's members >> should be updated, right? > > The descriptor that triggered the TX interrupt should be visible now, yes. > However, the controller may be writing to any other descriptors at the > same time as the loop is processing through them, as there are multiple > TX buffers.
Yes, but there is the "rmb()" after "desc = macb_tx_desc(queue, tail);" at the beginning of loop intended for that... In the 2nd loop you are operating with the same descriptor which was read in the first loop. > >>> 3. HW writes timestamp and sets TX_USED (or they become visible). >> I expect hardware to set TX_USED and timestamp before raising TX complete >> interrupt. If so, there should be no on-flight updates of this descriptor, >> right? Hardware raised a TX_USED bit read interrupt when it reads a >> descriptor like this and hangs TX. > > For the first iteration of the loop, that is correct - there should be > no in-flight writes from controller as it already raised the interrupt. > However, the following iterations of the loop process descriptors that > may or may not have the interrupt raised yet, and therefore may still > have in-flight writes. I expect the hardware to give up ownership of the descriptor just at the end of TX operation, so that the word containing TX_USED to be the last one updated. If you reads the descriptor and TX_USED is not there the first loop breaks, queue->tx_tail is updated with the last processed descriptor in the queue (see "queue->tx_tail = tail;") then the next interrupt will start processing with this last one descriptor. > >>> 4. Code checks TX_USED. >>> 5. Code operates on timestamp that is actually garbage. >>> >>> I'm not 100% sure that there isn't some lighter/cleaner way to do this >>> than dma_rmb(), though. >> If you still think this scenario could happen why not calling a dsb in >> gem_ptp_do_timestamp(). I feel like that is a proper place to call it. > > OK, I will move it there. Unless we arrive at a conclusion that it is > unnecessary altogether, of course :) > >> Moreover, there is bit 32 of desc->ctrl which tells you if a valid >> timestamp was placed in the descriptor. But, again, I expect the timestamp >> and TX_USED to be set by hardware before raising TX complete interrupt. > > Yes, but since my concern is that without barriers in between, > desc->ctrl might be read after ts_1/ts_2, so that bit might be seen as > set even though ts_1 is not yet an actual timestamp. And per above, all > this may occur before the TX complete interrupt is raised for the > descriptor in question. If so, why not placing the barrier when reading timestamps? From my point of view the place of "dma_rmb()" in this patch doesn't guarantees that ts1/ts2 were read after the execution of barrier (correct me if I'm wrong). > > I agree that this TX case seems somewhat unlikely to be triggered in > real life at least at present, though, as the ts_1/ts_2 read is so far > after the desc->ctrl read, and behind function calls, so unlikely to be > reordered before desc->ctrl read. > But the similar RX cases below seem more problematic as the racy loads > in question are right after each other in code. > >> [1] >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGHHIE.html >> >>>>> if (gem_ptp_do_txstamp(queue, skb, desc) == 0) { >>>>> /* skb now belongs to timestamp buffer >>>>> * and will be removed later >>>>> @@ -1000,12 +1005,16 @@ static int gem_rx(struct macb_queue *queue, int >>>>> budget) >>>>> rmb(); >>>>> >>>>> rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false; >>>>> - addr = macb_get_addr(bp, desc); >>>>> - ctrl = desc->ctrl; >>>>> >>>>> if (!rxused) >>>>> break; >>>>> >>>>> + /* Ensure other data is at least as up-to-date as rxused */ >>>>> + dma_rmb(); >>>> Same here, wouldn't previous rmb() should do this job? >>> The scenario I'm concerned about here (and in the last hunk) is: >>> >>> 1. rmb(). >>> 2. ctrl is read (i.e. ctrl read reordered before addr read). >> Same here with regards to [1]. All prior loads, stores should be finished >> when dsb ends. > > Yes, but the problematic loads are *after* the dsb. > >>> 3. HW updates to ctrl and addr become visible. >>> 4. RX_USED check. >>> 5. code operates on garbage ctrl. >> If this is happen then the data will be read on next interrupt. > > As long as the RX_USED bit is set, the code in gem_rx() will either drop > the frame or process it, depending on if it seems valid or not. Yes, I wanted to say that if the RX_USED is not set when first reading ctrl then the frame will not be processed at that moment but on the next interrupt. It will > not be processed again on next interrupt. > > I'll try to rephrase what I meant: > > 1. rmb(). This does nothing for loads that occur after it. > 2. ctrl is loaded. It does not contain valid data yet. > 3. HW receives a new frame and writes ctrl, addr, and they become > visible to processor. > 4. addr is loaded. It contains the RX_USED=1 as written in step 3. > 5. Code does the RX_USED check, it sees 1 and continues processing the > frame. > 6. Code operates on ctrl, but as it was loaded before step 3, it > contains old data. > > The old ctrl may e.g. cause the frame to be dropped due to the > RX_SOF/RX_EOF check. Yes, agree, but placing dma_rmb() before "ctrl = desc->ctrl" will not solve this case, right? What you want is to have ctrl loaded after you execute next instruction which refers to it. According to [1] you will have to place the dma_rmb() after the "ctrl = desc->ctrl" instruction to ensure next time you refer to it, it contains the proper value. > > >> >> dma_rmb() is a dmb. According to [1]: >> "Data Memory Barrier acts as a memory barrier. It ensures that all explicit >> memory accesses that appear in program order before the DMB instruction are >> observed before any explicit memory accesses that appear in program order >> after the DMB instruction. It does not affect the ordering of any other >> instructions executing on the processor." >> >> and your code is: >> >> /* Ensure other data is at least as up-to-date as rxused */ >> dma_rmb(); >> >> addr = macb_get_addr(bp, desc); >> ctrl = desc->ctrl; >> >> I understand from this that you want to wait for instructions prior to >> dma_rmb() to be finished? > > I want the load of "desc->addr" on the "rxused = (desc->addr & > MACB_BIT(RX_USED)) ? true : false;" line to be observed before the later > loads, such as "desc->ctrl". > > >>> I think it may be OK to move the earlier rmb() outside the loop so that >>> there is an rmb() only before and after the RX loop, as I don't at least >>> immediately see any hard requirement to do it on each loop pass (unlike >>> the added dma_rmb()). But my intent was to fix issues instead of >>> optimization so I didn't look too closely into that. >> But you said you did not see any issues with the code as it was previously. > > I meant that I have not observed any issues in practice (though I > haven't really tried to), but I do see theoretical issues which this > patch addresses. > > Especially the issues addressed by the two RX hunks seem entirely > possible - the gem_rx() one requires the compiler to just reorder the > two adjacent loads, and the macb_rx() one does not even require any > reordering - desc->ctrl is read before desc->addr in the code. In gem_rx() I agree that a barrier might be necessary, but placing it after desc->addr will not guarantee you that the load of addr and ctrl will be consistent. According to [1] the barrier should be after reading of addr and ctrl. Thank you, Claudiu Beznea > >> Thank you, >> Claudiu Beznea >> >>>>> + >>>>> + addr = macb_get_addr(bp, desc); >>>>> + ctrl = desc->ctrl; >>>>> + >>>>> queue->rx_tail++; >>>>> count++; >>>>> >>>>> @@ -1180,11 +1189,14 @@ static int macb_rx(struct macb_queue *queue, int >>>>> budget) >>>>> /* Make hw descriptor updates visible to CPU */ >>>>> rmb(); >>>>> >>>>> - ctrl = desc->ctrl; >>>>> - >>>>> if (!(desc->addr & MACB_BIT(RX_USED))) >>>>> break; >>>>> >>>>> + /* Ensure other data is at least as up-to-date as addr */ >>>>> + dma_rmb(); >>>> Ditto >>>> >>>>> + >>>>> + ctrl = desc->ctrl; >>>>> + >>>>> if (ctrl & MACB_BIT(RX_SOF)) { >>>>> if (first_frag != -1) >>>>> discard_partial_frame(queue, first_frag, tail); >>>>> >