[PATCH 0/3] net: macb: DMA race condition fixes
Hi all, Here are a couple of race condition fixes for the macb driver. The first two are issues observed on real HW. Anssi Hannula (3): net: macb: fix random memory corruption on RX with 64-bit DMA net: macb: fix dropped RX frames due to a race net: macb: add missing barriers when reading buffers drivers/net/ethernet/cadence/macb_main.c | 34 +- 1 file changed, 28 insertions(+), 6 deletions(-) -- Anssi Hannula / Bitwise Oy
[PATCH 2/3] net: macb: fix dropped RX frames due to a race
Bit RX_USED set to 0 in the address field allows the controller to write data to the receive buffer descriptor. The driver does not ensure the ctrl field is ready (cleared) when the controller sees the RX_USED=0 written by the driver. The ctrl field might only be cleared after the controller has already updated it according to a newly received frame, causing the frame to be discarded in gem_rx() due to unexpected ctrl field contents. A message is logged when the above scenario occurs: macb ff0b.ethernet eth0: not whole frame pointed by descriptor Fix the issue by ensuring that when the controller sees RX_USED=0 the ctrl field is already cleared. This issue was observed on a ZynqMP based system. Signed-off-by: Anssi Hannula Fixes: 4df95131ea80 ("net/macb: change RX path for GEM") Cc: Nicolas Ferre --- drivers/net/ethernet/cadence/macb_main.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index 0bc2aab7be40..430b7a0f5436 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -935,14 +935,19 @@ static void gem_rx_refill(struct macb_queue *queue) if (entry == bp->rx_ring_size - 1) paddr |= MACB_BIT(RX_WRAP); - macb_set_addr(bp, desc, paddr); desc->ctrl = 0; + /* Setting addr clears RX_USED and allows reception, +* make sure ctrl is cleared first to avoid a race. +*/ + dma_wmb(); + macb_set_addr(bp, desc, paddr); /* properly align Ethernet header */ skb_reserve(skb, NET_IP_ALIGN); } else { - desc->addr &= ~MACB_BIT(RX_USED); desc->ctrl = 0; + dma_wmb(); + desc->addr &= ~MACB_BIT(RX_USED); } } -- 2.17.2
[PATCH 1/3] net: macb: fix random memory corruption on RX with 64-bit DMA
64-bit DMA addresses are split in upper and lower halves that are written in separate fields on GEM. For RX, bit 0 of the address is used as the ownership bit (RX_USED). When the RX_USED bit is unset the controller is allowed to write data to the buffer. The driver does not guarantee that the controller already sees the upper half when the RX_USED bit is cleared, possibly resulting in the controller writing an incoming frame to an address with an incorrect upper half and therefore possibly corrupting unrelated system memory. Fix that by adding the necessary DMA memory barrier between the writes. This corruption was observed on a ZynqMP based system. Signed-off-by: Anssi Hannula Fixes: fff8019a08b6 ("net: macb: Add 64 bit addressing support for GEM") Cc: Nicolas Ferre Cc: Harini Katakam Cc: Michal Simek --- drivers/net/ethernet/cadence/macb_main.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index d8c7ca037ae3..0bc2aab7be40 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -682,6 +682,11 @@ static void macb_set_addr(struct macb *bp, struct macb_dma_desc *desc, dma_addr_ if (bp->hw_dma_cap & HW_DMA_CAP_64B) { desc_64 = macb_64b_desc(bp, desc); desc_64->addrh = upper_32_bits(addr); + /* The low bits of RX address contain the RX_USED bit, clearing +* of which allows packet RX. Make sure the high bits are also +* visible to HW at that point. +*/ + dma_wmb(); } #endif desc->addr = lower_32_bits(addr); -- 2.17.2
[PATCH 3/3] net: macb: add missing barriers when reading buffers
When reading buffer descriptors on RX or on TX completion, an RX_USED/TX_USED bit is checked first to ensure that the descriptor has been populated. However, there are no memory barriers to ensure that the data protected by the RX_USED/TX_USED bit is up-to-date with respect to that bit. Fix that by adding DMA read memory barriers on those paths. I did not observe any actual issues caused by these being missing, though. Tested on a ZynqMP based system. Signed-off-by: Anssi Hannula Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver") Cc: Nicolas Ferre --- drivers/net/ethernet/cadence/macb_main.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index 430b7a0f5436..c93baa8621d5 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue *queue) /* First, update TX stats if needed */ if (skb) { + /* Ensure all of desc is at least as up-to-date +* as ctrl (TX_USED bit) +*/ + dma_rmb(); + if (gem_ptp_do_txstamp(queue, skb, desc) == 0) { /* skb now belongs to timestamp buffer * and will be removed later @@ -1000,12 +1005,16 @@ static int gem_rx(struct macb_queue *queue, int budget) rmb(); rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false; - addr = macb_get_addr(bp, desc); - ctrl = desc->ctrl; if (!rxused) break; + /* Ensure other data is at least as up-to-date as rxused */ + dma_rmb(); + + addr = macb_get_addr(bp, desc); + ctrl = desc->ctrl; + queue->rx_tail++; count++; @@ -1180,11 +1189,14 @@ static int macb_rx(struct macb_queue *queue, int budget) /* Make hw descriptor updates visible to CPU */ rmb(); - ctrl = desc->ctrl; - if (!(desc->addr & MACB_BIT(RX_USED))) break; + /* Ensure other data is at least as up-to-date as addr */ + dma_rmb(); + + ctrl = desc->ctrl; + if (ctrl & MACB_BIT(RX_SOF)) { if (first_frag != -1) discard_partial_frame(queue, first_frag, tail); -- 2.17.2
[PATCH 0/2] net: phy: fixes to PHY_HALTED handling
Hi all, Here are a couple of fixes for PHY_HALTED/PHY_RESUMING handling. On a related note, it feels a bit strange that AFAICS phydevs will only be put to powerdown state (via PHY_HALTED) after the network interface has been brought up and down once. If the ethernet interface is never brought up, the phydev remains powered on (in PHY_READY). This is because the phydev is only put to PHY_HALTED from phy_stop() and phy_error(), and phy_stop() seems to be normally called only from .ndo_stop(). But I didn't touch that now as I wasn't sure what is the intent there. Anssi Hannula (2): net: phy: suspend phydev on PHY_HALTED even if there is no link net: phy: ensure autoneg is configured when resuming a phydev drivers/net/phy/phy.c | 13 +++-- include/linux/phy.h | 2 ++ 2 files changed, 13 insertions(+), 2 deletions(-) -- Anssi Hannula / Bitwise Oy
[PATCH 1/2] net: phy: suspend phydev on PHY_HALTED even if there is no link
When the phydev is put to PHY_HALTED (by e.g. phy_stop()) the state machine suspends the phydev to save power. However, this is wrongly inside a phydev->link check, causing the phydev not to be suspended if there was no link at the time of stopping it. Fix that by setting do_suspend regardless of whether there was a link or not. Signed-off-by: Anssi Hannula Fixes: be9dad1f9f26 ("net: phy: suspend phydev when going to HALTED") Cc: Sebastian Hesselbarth --- drivers/net/phy/phy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index 376a0d8a2b61..d46858694db9 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -956,8 +956,8 @@ void phy_state_machine(struct work_struct *work) if (phydev->link) { phydev->link = 0; phy_link_down(phydev, true); - do_suspend = true; } + do_suspend = true; break; } -- 2.17.2
[PATCH 2/2] net: phy: ensure autoneg is configured when resuming a phydev
When a PHY_HALTED phydev is resumed by phy_start(), it is set to PHY_RESUMING to wait for the link to come up. However, if the phydev was put to PHY_HALTED (by e.g. phy_stop()) before autonegotiation was ever started by phy_state_machine(), autonegotiation remains unconfigured, i.e. phy_config_aneg() is never called. Fix that by going to PHY_UP instead of PHY_RESUMING if autonegotiation has never been configured. Signed-off-by: Anssi Hannula --- This doesn't feel as clean is I'd like to, though. Maybe there is a better way? drivers/net/phy/phy.c | 11 ++- include/linux/phy.h | 2 ++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index d46858694db9..7a650cce7599 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -553,6 +553,8 @@ int phy_start_aneg(struct phy_device *phydev) if (err < 0) goto out_unlock; + phydev->autoneg_configured = 1; + if (phydev->state != PHY_HALTED) { if (AUTONEG_ENABLE == phydev->autoneg) { err = phy_check_link_status(phydev); @@ -893,7 +895,14 @@ void phy_start(struct phy_device *phydev) break; } - phydev->state = PHY_RESUMING; + /* if autoneg/forcing was never configured, go back to PHY_UP +* to make that happen +*/ + if (!phydev->autoneg_configured) + phydev->state = PHY_UP; + else + phydev->state = PHY_RESUMING; + break; default: break; diff --git a/include/linux/phy.h b/include/linux/phy.h index 8f927246acdb..b306d5ed9d09 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -389,6 +389,8 @@ struct phy_device { unsigned loopback_enabled:1; unsigned autoneg:1; + /* autoneg has been configured at least once */ + unsigned autoneg_configured:1; /* The most recently read link state */ unsigned link:1; -- 2.17.2
Re: [PATCH 2/3] net: macb: fix dropped RX frames due to a race
Hi, On 3.12.2018 6:52, Harini Katakam wrote: > Hi Anssi, > On Fri, Nov 30, 2018 at 11:53 PM Anssi Hannula > wrote: >> Bit RX_USED set to 0 in the address field allows the controller to write >> data to the receive buffer descriptor. >> >> The driver does not ensure the ctrl field is ready (cleared) when the >> controller sees the RX_USED=0 written by the driver. The ctrl field might >> only be cleared after the controller has already updated it according to >> a newly received frame, causing the frame to be discarded in gem_rx() due >> to unexpected ctrl field contents. >> >> A message is logged when the above scenario occurs: >> >> macb ff0b.ethernet eth0: not whole frame pointed by descriptor >> >> Fix the issue by ensuring that when the controller sees RX_USED=0 the >> ctrl field is already cleared. >> >> This issue was observed on a ZynqMP based system. >> > Thanks for the patch. > Could you please describe the test in which this behavior was observed? Sure. The testcase I used for the patches is: - RT_FULL kernel, - CPU-bound SCHED_FF RT priority 15 process (with rcutree.kthread_prio=20 to avoid RCU starvation), - Pyropus memtester running for 3GB (system has 4GB memory), - "ping -f -l 5000 -s 100" running from a PC. The "not whole frame pointed by descriptor" issue occurs within minutes and the RX memory corruption within an hour. I did not try to reduce the testcase to a minimum. Both were also observed using real production loads (that of course do not have CPU-bound RT tasks). > Were you able to confirm that this was because of the ctrl field being > cleared late? This error can also be observed under stress when RX UBR > is observed. I observed that the issue occurred without this patch, and didn't occur after applying this patch (individually), but I didn't check it further than that. If you have anything you'd like me to test, let me know. > I understand it makes sense to clear ctrl field before setting RX used bit. > But I'm trying to understand if a dmb is necessary in the receive data path. A barrier is needed as otherwise writes to the ctrl field and addr fields may be freely reordered as they are just regular writes to memory, as far as I've understood it, potentially undoing the effect of changing the order. If you mean the third patch, same issue with reads - they may be freely reordered (by e.g. the compiler) as they are just regular reads. But I don't claim to be an expert on these so please correct me if I'm wrong :) -- Anssi Hannula / Bitwise Oy
Re: [PATCH 2/2] net: phy: ensure autoneg is configured when resuming a phydev
On 1.12.2018 0:16, Heiner Kallweit wrote: > On 30.11.2018 19:45, Anssi Hannula wrote: >> When a PHY_HALTED phydev is resumed by phy_start(), it is set to >> PHY_RESUMING to wait for the link to come up. >> >> However, if the phydev was put to PHY_HALTED (by e.g. phy_stop()) before >> autonegotiation was ever started by phy_state_machine(), autonegotiation >> remains unconfigured, i.e. phy_config_aneg() is never called. >> > phy_stop() should only be called once the PHY was started. But it's > right that we don't have full control over it because misbehaving > drivers may call phy_stop() even if the PHY hasn't been started yet. Having run phy_start() does not guarantee that the phy_state_machine() has been run at least once, though. I was able to reproduce the issue by calling phy_start(); phy_stop(). Then on the next phy_start() autoneg is not configured. > I think phy_error() and phy_stop() should be extended to set the state > to PHY_HALTED only if the PHY is in a started state, means: > don't touch the state if it's DOWN, READY, or HALTED. > In case of phy_error() it's more a precaution, because it's used within > phylib only and AFAICS it can be triggered from a started state only. This wouldn't fix the issue that occurs when phy_stop() is called right after phy_start(), though, as PHY is in UP state then. > > So yes, there is a theoretical issue. But as you wrote already, I think > there's a better way to deal with it. > > For checking whether PHY is in a started state you could use the helper > which is being discussed here: > https://marc.info/?l=linux-netdev&m=154360368104946&w=2 > > >> Fix that by going to PHY_UP instead of PHY_RESUMING if autonegotiation >> has never been configured. >> >> Signed-off-by: Anssi Hannula >> --- >> >> This doesn't feel as clean is I'd like to, though. Maybe there is a >> better way? >> >> >> drivers/net/phy/phy.c | 11 ++- >> include/linux/phy.h | 2 ++ >> 2 files changed, 12 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c >> index d46858694db9..7a650cce7599 100644 >> --- a/drivers/net/phy/phy.c >> +++ b/drivers/net/phy/phy.c >> @@ -553,6 +553,8 @@ int phy_start_aneg(struct phy_device *phydev) >> if (err < 0) >> goto out_unlock; >> >> +phydev->autoneg_configured = 1; >> + >> if (phydev->state != PHY_HALTED) { >> if (AUTONEG_ENABLE == phydev->autoneg) { >> err = phy_check_link_status(phydev); >> @@ -893,7 +895,14 @@ void phy_start(struct phy_device *phydev) >> break; >> } >> >> -phydev->state = PHY_RESUMING; >> +/* if autoneg/forcing was never configured, go back to PHY_UP >> + * to make that happen >> + */ >> +if (!phydev->autoneg_configured) >> +phydev->state = PHY_UP; >> +else >> +phydev->state = PHY_RESUMING; >> + >> break; >> default: >> break; >> diff --git a/include/linux/phy.h b/include/linux/phy.h >> index 8f927246acdb..b306d5ed9d09 100644 >> --- a/include/linux/phy.h >> +++ b/include/linux/phy.h >> @@ -389,6 +389,8 @@ struct phy_device { >> unsigned loopback_enabled:1; >> >> unsigned autoneg:1; >> +/* autoneg has been configured at least once */ >> +unsigned autoneg_configured:1; >> /* The most recently read link state */ >> unsigned link:1; >> >> -- Anssi Hannula / Bitwise Oy
Re: [PATCH 2/2] net: phy: ensure autoneg is configured when resuming a phydev
On 3.12.2018 23:41, Heiner Kallweit wrote: > On 03.12.2018 11:43, Anssi Hannula wrote: >> On 1.12.2018 0:16, Heiner Kallweit wrote: >>> On 30.11.2018 19:45, Anssi Hannula wrote: >>>> When a PHY_HALTED phydev is resumed by phy_start(), it is set to >>>> PHY_RESUMING to wait for the link to come up. >>>> >>>> However, if the phydev was put to PHY_HALTED (by e.g. phy_stop()) before >>>> autonegotiation was ever started by phy_state_machine(), autonegotiation >>>> remains unconfigured, i.e. phy_config_aneg() is never called. >>>> >>> phy_stop() should only be called once the PHY was started. But it's >>> right that we don't have full control over it because misbehaving >>> drivers may call phy_stop() even if the PHY hasn't been started yet. >> Having run phy_start() does not guarantee that the phy_state_machine() >> has been run at least once, though. >> >> I was able to reproduce the issue by calling phy_start(); phy_stop(). >> Then on the next phy_start() autoneg is not configured. >> > Right, phy_start() schedules the state machine run, so there is a small > window where this can happen. Did you experience this in any real-life > scenario? No, I encountered this while tinkering with phy_start/stop() to look how they interact with suspending phydevs as I wanted to suspend unused ones. So I don't think there is any rush to fix this one. >>> I think phy_error() and phy_stop() should be extended to set the state >>> to PHY_HALTED only if the PHY is in a started state, means: >>> don't touch the state if it's DOWN, READY, or HALTED. >>> In case of phy_error() it's more a precaution, because it's used within >>> phylib only and AFAICS it can be triggered from a started state only. >> This wouldn't fix the issue that occurs when phy_stop() is called right >> after phy_start(), though, as PHY is in UP state then. >> > After having removed state AN I was thinking already whether we really > need to have separate states READY and HALTED. I think we wouldn't > have this scenario if phy_stop() and phy_error() would set the state > to READY. Merging the two states requires some upfront work, but I have > some ideas in the back of my mind. > Overall this should be cleaner than the proposed workaround. Yeah, sounds better indeed. >>> So yes, there is a theoretical issue. But as you wrote already, I think >>> there's a better way to deal with it. >>> >>> For checking whether PHY is in a started state you could use the helper >>> which is being discussed here: >>> https://marc.info/?l=linux-netdev&m=154360368104946&w=2 >>> >>> >>>> Fix that by going to PHY_UP instead of PHY_RESUMING if autonegotiation >>>> has never been configured. >>>> >>>> Signed-off-by: Anssi Hannula >>>> --- >>>> >>>> This doesn't feel as clean is I'd like to, though. Maybe there is a >>>> better way? >>>> >>>> >>>> drivers/net/phy/phy.c | 11 ++- >>>> include/linux/phy.h | 2 ++ >>>> 2 files changed, 12 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c >>>> index d46858694db9..7a650cce7599 100644 >>>> --- a/drivers/net/phy/phy.c >>>> +++ b/drivers/net/phy/phy.c >>>> @@ -553,6 +553,8 @@ int phy_start_aneg(struct phy_device *phydev) >>>>if (err < 0) >>>>goto out_unlock; >>>> >>>> + phydev->autoneg_configured = 1; >>>> + >>>>if (phydev->state != PHY_HALTED) { >>>>if (AUTONEG_ENABLE == phydev->autoneg) { >>>>err = phy_check_link_status(phydev); >>>> @@ -893,7 +895,14 @@ void phy_start(struct phy_device *phydev) >>>>break; >>>>} >>>> >>>> - phydev->state = PHY_RESUMING; >>>> + /* if autoneg/forcing was never configured, go back to PHY_UP >>>> + * to make that happen >>>> + */ >>>> + if (!phydev->autoneg_configured) >>>> + phydev->state = PHY_UP; >>>> + else >>>> + phydev->state = PHY_RESUMING; >>>> + >>>>break; >>>>default: >>>>break; >>>> diff --git a/include/linux/phy.h b/include/linux/phy.h >>>> index 8f927246acdb..b306d5ed9d09 100644 >>>> --- a/include/linux/phy.h >>>> +++ b/include/linux/phy.h >>>> @@ -389,6 +389,8 @@ struct phy_device { >>>>unsigned loopback_enabled:1; >>>> >>>>unsigned autoneg:1; >>>> + /* autoneg has been configured at least once */ >>>> + unsigned autoneg_configured:1; >>>>/* The most recently read link state */ >>>>unsigned link:1; >>>> >>>> -- Anssi Hannula / Bitwise Oy
Re: [PATCH 1/3] net: macb: fix random memory corruption on RX with 64-bit DMA
On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote: > Hi Anssi, Hi, and thanks for looking at these. > Few comments... Otherwise I tested this series on a SAMA5D2 Xplained and > SAMA5D4 Xplained under heavy traffic and it seems to behave OK. > > Thank you, > Claudiu Beznea > > On 30.11.2018 20:21, Anssi Hannula wrote: >> 64-bit DMA addresses are split in upper and lower halves that are >> written in separate fields on GEM. For RX, bit 0 of the address is used >> as the ownership bit (RX_USED). When the RX_USED bit is unset the >> controller is allowed to write data to the buffer. >> >> The driver does not guarantee that the controller already sees the upper >> half when the RX_USED bit is cleared, possibly resulting in the >> controller writing an incoming frame to an address with an incorrect >> upper half and therefore possibly corrupting unrelated system memory. >> >> Fix that by adding the necessary DMA memory barrier between the writes.> >> This corruption was observed on a ZynqMP based system. >> >> Signed-off-by: Anssi Hannula >> Fixes: fff8019a08b6 ("net: macb: Add 64 bit addressing support for GEM") >> Cc: Nicolas Ferre >> Cc: Harini Katakam >> Cc: Michal Simek >> --- >> drivers/net/ethernet/cadence/macb_main.c | 5 + >> 1 file changed, 5 insertions(+) >> >> diff --git a/drivers/net/ethernet/cadence/macb_main.c >> b/drivers/net/ethernet/cadence/macb_main.c >> index d8c7ca037ae3..0bc2aab7be40 100644 >> --- a/drivers/net/ethernet/cadence/macb_main.c >> +++ b/drivers/net/ethernet/cadence/macb_main.c >> @@ -682,6 +682,11 @@ static void macb_set_addr(struct macb *bp, struct >> macb_dma_desc *desc, dma_addr_ >> if (bp->hw_dma_cap & HW_DMA_CAP_64B) { >> desc_64 = macb_64b_desc(bp, desc); >> desc_64->addrh = upper_32_bits(addr); >> +/* The low bits of RX address contain the RX_USED bit, clearing >> + * of which allows packet RX. Make sure the high bits are also >> + * visible to HW at that point. >> + */ >> +dma_wmb(); > I think a wmb() would fit better here so that on ARM to also force the > flushing of caches not affected by dmb() by calling arm_heavy_mb(). Hmm, if we want to simply ensure ordering of the two writes (upper half, lower half) to DMA memory, isn't dma_wmb() exactly for that purpose? This situation seems to match the dma_wmb() example in Documentation/memory-barriers.txt where data is updated before ownership bit. If dma_wmb() = dmb(oshst) is not enough on ARM for this purpose, should not dma_wmb() be fixed then? Or maybe I'm missing some difference why dma_wmb() is not enough here but would be OK on some other case (e.g. in the memory-barriers.txt example)? > Thank you, > Claudiu Beznea > >> } >> #endif >> desc->addr = lower_32_bits(addr); >> -- Anssi Hannula / Bitwise Oy +358 503803997
Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers
On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote: > > On 30.11.2018 20:21, Anssi Hannula wrote: >> When reading buffer descriptors on RX or on TX completion, an >> RX_USED/TX_USED bit is checked first to ensure that the descriptor has >> been populated. However, there are no memory barriers to ensure that the >> data protected by the RX_USED/TX_USED bit is up-to-date with respect to >> that bit. >> >> Fix that by adding DMA read memory barriers on those paths. >> >> I did not observe any actual issues caused by these being missing, >> though. >> >> Tested on a ZynqMP based system. >> >> Signed-off-by: Anssi Hannula >> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver") >> Cc: Nicolas Ferre >> --- >> drivers/net/ethernet/cadence/macb_main.c | 20 >> 1 file changed, 16 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/net/ethernet/cadence/macb_main.c >> b/drivers/net/ethernet/cadence/macb_main.c >> index 430b7a0f5436..c93baa8621d5 100644 >> --- a/drivers/net/ethernet/cadence/macb_main.c >> +++ b/drivers/net/ethernet/cadence/macb_main.c >> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue *queue) >> >> /* First, update TX stats if needed */ >> if (skb) { >> +/* Ensure all of desc is at least as up-to-date >> + * as ctrl (TX_USED bit) >> + */ >> +dma_rmb(); >> + > Is this necessary? Wouldn't previous rmb() take care of this? At this time > data specific to this descriptor was completed. The TX descriptors for next > data to be send is updated under a locked spinlock. The previous rmb() is before the TX_USED check, so my understanding is that the following could happen in theory: 1. rmb(). 2. Reads are reordered so that TX timestamp is read first - no barriers are crossed. 3. HW writes timestamp and sets TX_USED (or they become visible). 4. Code checks TX_USED. 5. Code operates on timestamp that is actually garbage. I'm not 100% sure that there isn't some lighter/cleaner way to do this than dma_rmb(), though. >> if (gem_ptp_do_txstamp(queue, skb, desc) == 0) { >> /* skb now belongs to timestamp buffer >> * and will be removed later >> @@ -1000,12 +1005,16 @@ static int gem_rx(struct macb_queue *queue, int >> budget) >> rmb(); >> >> rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false; >> -addr = macb_get_addr(bp, desc); >> -ctrl = desc->ctrl; >> >> if (!rxused) >> break; >> >> +/* Ensure other data is at least as up-to-date as rxused */ >> +dma_rmb(); > Same here, wouldn't previous rmb() should do this job? The scenario I'm concerned about here (and in the last hunk) is: 1. rmb(). 2. ctrl is read (i.e. ctrl read reordered before addr read). 3. HW updates to ctrl and addr become visible. 4. RX_USED check. 5. code operates on garbage ctrl. I think it may be OK to move the earlier rmb() outside the loop so that there is an rmb() only before and after the RX loop, as I don't at least immediately see any hard requirement to do it on each loop pass (unlike the added dma_rmb()). But my intent was to fix issues instead of optimization so I didn't look too closely into that. >> + >> +addr = macb_get_addr(bp, desc); >> +ctrl = desc->ctrl; >> + >> queue->rx_tail++; >> count++; >> >> @@ -1180,11 +1189,14 @@ static int macb_rx(struct macb_queue *queue, int >> budget) >> /* Make hw descriptor updates visible to CPU */ >> rmb(); >> >> - ctrl = desc->ctrl; >> - >> if (!(desc->addr & MACB_BIT(RX_USED))) >> break; >> >> +/* Ensure other data is at least as up-to-date as addr */ >> +dma_rmb(); > Ditto > >> + >> +ctrl = desc->ctrl; >> + >> if (ctrl & MACB_BIT(RX_SOF)) { >> if (first_frag != -1) >> discard_partial_frame(queue, first_frag, tail); >> -- Anssi Hannula / Bitwise Oy +358 503803997
Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers
On 6.12.2018 16:14, claudiu.bez...@microchip.com wrote: > Hi Anssi, Hi! > On 05.12.2018 16:00, Anssi Hannula wrote: >> On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote: >>> On 30.11.2018 20:21, Anssi Hannula wrote: >>>> When reading buffer descriptors on RX or on TX completion, an >>>> RX_USED/TX_USED bit is checked first to ensure that the descriptor has >>>> been populated. However, there are no memory barriers to ensure that the >>>> data protected by the RX_USED/TX_USED bit is up-to-date with respect to >>>> that bit. >>>> >>>> Fix that by adding DMA read memory barriers on those paths. >>>> >>>> I did not observe any actual issues caused by these being missing, >>>> though. >>>> >>>> Tested on a ZynqMP based system. >>>> >>>> Signed-off-by: Anssi Hannula >>>> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver") >>>> Cc: Nicolas Ferre >>>> --- >>>> drivers/net/ethernet/cadence/macb_main.c | 20 >>>> 1 file changed, 16 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c >>>> b/drivers/net/ethernet/cadence/macb_main.c >>>> index 430b7a0f5436..c93baa8621d5 100644 >>>> --- a/drivers/net/ethernet/cadence/macb_main.c >>>> +++ b/drivers/net/ethernet/cadence/macb_main.c >>>> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue >>>> *queue) >>>> >>>>/* First, update TX stats if needed */ >>>>if (skb) { >>>> + /* Ensure all of desc is at least as up-to-date >>>> + * as ctrl (TX_USED bit) >>>> + */ >>>> + dma_rmb(); >>>> + >>> Is this necessary? Wouldn't previous rmb() take care of this? At this time >>> data specific to this descriptor was completed. The TX descriptors for next >>> data to be send is updated under a locked spinlock. >> The previous rmb() is before the TX_USED check, so my understanding is >> that the following could happen in theory: > We are using this IP on and ARM architecture, so, with regards to rmb(), I > understand from [1] that dsb completes when: > "All explicit memory accesses before this instruction complete. > All Cache, Branch predictor and TLB maintenance operations before this > instruction complete." > >> 1. rmb(). > According to [1] this should end after all previous instructions (loads, > stores) ends. > >> 2. Reads are reordered so that TX timestamp is read first - no barriers >> are crossed. > But, as per [1], no onward instruction will be reached until all > instruction prior to dsb ends, so, after rmb() all descriptor's members > should be updated, right? The descriptor that triggered the TX interrupt should be visible now, yes. However, the controller may be writing to any other descriptors at the same time as the loop is processing through them, as there are multiple TX buffers. >> 3. HW writes timestamp and sets TX_USED (or they become visible). > I expect hardware to set TX_USED and timestamp before raising TX complete > interrupt. If so, there should be no on-flight updates of this descriptor, > right? Hardware raised a TX_USED bit read interrupt when it reads a > descriptor like this and hangs TX. For the first iteration of the loop, that is correct - there should be no in-flight writes from controller as it already raised the interrupt. However, the following iterations of the loop process descriptors that may or may not have the interrupt raised yet, and therefore may still have in-flight writes. >> 4. Code checks TX_USED. >> 5. Code operates on timestamp that is actually garbage. >> >> I'm not 100% sure that there isn't some lighter/cleaner way to do this >> than dma_rmb(), though. > If you still think this scenario could happen why not calling a dsb in > gem_ptp_do_timestamp(). I feel like that is a proper place to call it. OK, I will move it there. Unless we arrive at a conclusion that it is unnecessary altogether, of course :) > Moreover, there is bit 32 of desc->ctrl which tells you if a valid > timestamp was placed in the descriptor. But, again, I expect the timestamp > and TX_USED to be set by hardware before raising TX complete interrupt. Yes, but since my concern is that without barriers in between, desc->ctrl might be read after ts_1/ts_2, so th
Re: [PATCH 0/3] net: macb: DMA race condition fixes
On 5.12.2018 22:35, David Miller wrote: > From: Anssi Hannula > Date: Fri, 30 Nov 2018 20:21:34 +0200 > >> Here are a couple of race condition fixes for the macb driver. The first >> two are issues observed on real HW. > It looks like there is still an active discussion about the memory > barriers in patch #3 being excessive. > > Once that is sorted out to everyone's satisfaction, would you > please repost this series with appropriate ACKs, reviewed-by's, > tested-by's, etc. added? > > Thank you. OK, I'll do that once everything is sorted. Nicolas, were Claudiu's tests the ones you wanted to wait for or are we waiting for more tests? -- Anssi Hannula / Bitwise Oy +358 503803997
Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers
On 10.12.2018 12:34, claudiu.bez...@microchip.com wrote: > On 07.12.2018 14:00, Anssi Hannula wrote: >> On 6.12.2018 16:14, claudiu.bez...@microchip.com wrote: >>> Hi Anssi, >> Hi! >> >>> On 05.12.2018 16:00, Anssi Hannula wrote: >>>> On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote: >>>>> On 30.11.2018 20:21, Anssi Hannula wrote: >>>>>> When reading buffer descriptors on RX or on TX completion, an >>>>>> RX_USED/TX_USED bit is checked first to ensure that the descriptor has >>>>>> been populated. However, there are no memory barriers to ensure that the >>>>>> data protected by the RX_USED/TX_USED bit is up-to-date with respect to >>>>>> that bit. >>>>>> >>>>>> Fix that by adding DMA read memory barriers on those paths. >>>>>> >>>>>> I did not observe any actual issues caused by these being missing, >>>>>> though. >>>>>> >>>>>> Tested on a ZynqMP based system. >>>>>> >>>>>> Signed-off-by: Anssi Hannula >>>>>> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver") >>>>>> Cc: Nicolas Ferre >>>>>> --- >>>>>> drivers/net/ethernet/cadence/macb_main.c | 20 >>>>>> 1 file changed, 16 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c >>>>>> b/drivers/net/ethernet/cadence/macb_main.c >>>>>> index 430b7a0f5436..c93baa8621d5 100644 >>>>>> --- a/drivers/net/ethernet/cadence/macb_main.c >>>>>> +++ b/drivers/net/ethernet/cadence/macb_main.c >>>>>> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue >>>>>> *queue) >>>>>> >>>>>> /* First, update TX stats if needed */ >>>>>> if (skb) { >>>>>> +/* Ensure all of desc is at least as >>>>>> up-to-date >>>>>> + * as ctrl (TX_USED bit) >>>>>> + */ >>>>>> +dma_rmb(); >>>>>> + >>>>> Is this necessary? Wouldn't previous rmb() take care of this? At this time >>>>> data specific to this descriptor was completed. The TX descriptors for >>>>> next >>>>> data to be send is updated under a locked spinlock. >>>> The previous rmb() is before the TX_USED check, so my understanding is >>>> that the following could happen in theory: >>> We are using this IP on and ARM architecture, so, with regards to rmb(), I >>> understand from [1] that dsb completes when: >>> "All explicit memory accesses before this instruction complete. >>> All Cache, Branch predictor and TLB maintenance operations before this >>> instruction complete." >>> >>>> 1. rmb(). >>> According to [1] this should end after all previous instructions (loads, >>> stores) ends. >>> >>>> 2. Reads are reordered so that TX timestamp is read first - no barriers >>>> are crossed. >>> But, as per [1], no onward instruction will be reached until all >>> instruction prior to dsb ends, so, after rmb() all descriptor's members >>> should be updated, right? >> The descriptor that triggered the TX interrupt should be visible now, yes. >> However, the controller may be writing to any other descriptors at the >> same time as the loop is processing through them, as there are multiple >> TX buffers. > Yes, but there is the "rmb()" after "desc = macb_tx_desc(queue, tail);" at > the beginning of loop intended for that... In the 2nd loop you are > operating with the same descriptor which was read in the first loop. I'm concerned about the 2nd iteration of the first for loop in macb_tx_interrupt(). That operates on a different descriptor due to tail++, and that different descriptor may have in-flight writes from controller as the interrupt may or may not already be raised for it. Or am I missing something? >>>> 3. HW writes timestamp and sets TX_USED (or they become visible). >>> I expect hardware to set TX_USED and timestamp before raising TX complete >>> interrupt. If so, there should be no on-flight updates of this descrip
[PATCH 3/3 v2] net: macb: add missing barriers when reading descriptors
When reading buffer descriptors on RX or on TX completion, an RX_USED/TX_USED bit is checked first to ensure that the descriptors have been populated, i.e. the ownership has been transferred. However, there are no memory barriers to ensure that the data protected by the RX_USED/TX_USED bit is up-to-date with respect to that bit. Specifically: - TX timestamp descriptors may be loaded before ctrl is loaded for the TX_USED check, which is racy as the descriptors may be updated between the loads, causing old timestamp descriptor data to be used. - RX ctrl may be loaded before addr is loaded for the RX_USED check, which is racy as a new frame may be written between the loads, causing old ctrl descriptor data to be used. This issue exists for both macb_rx() and gem_rx() variants. Fix the races by adding DMA read memory barriers on those paths and reordering the reads in macb_rx(). I have not observed any actual problems in practice caused by these being missing, though. Tested on a ZynqMP based system. Signed-off-by: Anssi Hannula Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver") Cc: Nicolas Ferre --- The discussion is still ongoing on whether I have used the barriers correctly here, but in the meantime, here's an updated version of the patch. v2: - moved the timestamp protection barrier closer to the timestamp reads - removed unnecessary move of the addr assignment in gem_rx() to keep the patch minimal for maximum clarity - clarified commit message and comments drivers/net/ethernet/cadence/macb_main.c | 13 ++--- drivers/net/ethernet/cadence/macb_ptp.c | 2 ++ 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index 430b7a0f5436..10a2bb44612b 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -1001,11 +1001,15 @@ static int gem_rx(struct macb_queue *queue, int budget) rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false; addr = macb_get_addr(bp, desc); - ctrl = desc->ctrl; if (!rxused) break; + /* Ensure ctrl is at least as up-to-date as rxused */ + dma_rmb(); + + ctrl = desc->ctrl; + queue->rx_tail++; count++; @@ -1180,11 +1184,14 @@ static int macb_rx(struct macb_queue *queue, int budget) /* Make hw descriptor updates visible to CPU */ rmb(); - ctrl = desc->ctrl; - if (!(desc->addr & MACB_BIT(RX_USED))) break; + /* Ensure ctrl is at least as up-to-date as addr */ + dma_rmb(); + + ctrl = desc->ctrl; + if (ctrl & MACB_BIT(RX_SOF)) { if (first_frag != -1) discard_partial_frame(queue, first_frag, tail); diff --git a/drivers/net/ethernet/cadence/macb_ptp.c b/drivers/net/ethernet/cadence/macb_ptp.c index cd5296b84229..a6dc47edc4cf 100644 --- a/drivers/net/ethernet/cadence/macb_ptp.c +++ b/drivers/net/ethernet/cadence/macb_ptp.c @@ -319,6 +319,8 @@ int gem_ptp_txstamp(struct macb_queue *queue, struct sk_buff *skb, desc_ptp = macb_ptp_desc(queue->bp, desc); tx_timestamp = &queue->tx_timestamps[head]; tx_timestamp->skb = skb; + /* ensure ts_1/ts_2 is loaded after ctrl (TX_USED check) */ + dma_rmb(); tx_timestamp->desc_ptp.ts_1 = desc_ptp->ts_1; tx_timestamp->desc_ptp.ts_2 = desc_ptp->ts_2; /* move head */ -- 2.17.2
Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers
On 12.12.2018 12:58, claudiu.bez...@microchip.com wrote: > > On 11.12.2018 15:21, Anssi Hannula wrote: >> On 10.12.2018 12:34, claudiu.bez...@microchip.com wrote: >>> On 07.12.2018 14:00, Anssi Hannula wrote: >>>> On 6.12.2018 16:14, claudiu.bez...@microchip.com wrote: >>>>> Hi Anssi, >>>> Hi! >>>> >>>>> On 05.12.2018 16:00, Anssi Hannula wrote: >>>>>> On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote: >>>>>>> On 30.11.2018 20:21, Anssi Hannula wrote: >>>>>>>> When reading buffer descriptors on RX or on TX completion, an >>>>>>>> RX_USED/TX_USED bit is checked first to ensure that the descriptor has >>>>>>>> been populated. However, there are no memory barriers to ensure that >>>>>>>> the >>>>>>>> data protected by the RX_USED/TX_USED bit is up-to-date with respect to >>>>>>>> that bit. >>>>>>>> >>>>>>>> Fix that by adding DMA read memory barriers on those paths. >>>>>>>> >>>>>>>> I did not observe any actual issues caused by these being missing, >>>>>>>> though. >>>>>>>> >>>>>>>> Tested on a ZynqMP based system. >>>>>>>> >>>>>>>> Signed-off-by: Anssi Hannula >>>>>>>> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver") >>>>>>>> Cc: Nicolas Ferre >>>>>>>> --- >>>>>>>> drivers/net/ethernet/cadence/macb_main.c | 20 >>>>>>>> 1 file changed, 16 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c >>>>>>>> b/drivers/net/ethernet/cadence/macb_main.c >>>>>>>> index 430b7a0f5436..c93baa8621d5 100644 >>>>>>>> --- a/drivers/net/ethernet/cadence/macb_main.c >>>>>>>> +++ b/drivers/net/ethernet/cadence/macb_main.c >>>>>>>> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue >>>>>>>> *queue) >>>>>>>> >>>>>>>>/* First, update TX stats if needed */ >>>>>>>>if (skb) { >>>>>>>> + /* Ensure all of desc is at least as >>>>>>>> up-to-date >>>>>>>> + * as ctrl (TX_USED bit) >>>>>>>> + */ >>>>>>>> + dma_rmb(); >>>>>>>> + >>>>>>> Is this necessary? Wouldn't previous rmb() take care of this? At this >>>>>>> time >>>>>>> data specific to this descriptor was completed. The TX descriptors for >>>>>>> next >>>>>>> data to be send is updated under a locked spinlock. >>>>>> The previous rmb() is before the TX_USED check, so my understanding is >>>>>> that the following could happen in theory: >>>>> We are using this IP on and ARM architecture, so, with regards to rmb(), I >>>>> understand from [1] that dsb completes when: >>>>> "All explicit memory accesses before this instruction complete. >>>>> All Cache, Branch predictor and TLB maintenance operations before this >>>>> instruction complete." >>>>> >>>>>> 1. rmb(). >>>>> According to [1] this should end after all previous instructions (loads, >>>>> stores) ends. >>>>> >>>>>> 2. Reads are reordered so that TX timestamp is read first - no barriers >>>>>> are crossed. >>>>> But, as per [1], no onward instruction will be reached until all >>>>> instruction prior to dsb ends, so, after rmb() all descriptor's members >>>>> should be updated, right? >>>> The descriptor that triggered the TX interrupt should be visible now, yes. >>>> However, the controller may be writing to any other descriptors at the >>>> same time as the loop is processing through them, as there are multiple >>>> TX buffers. >>> Yes, but there is the "rmb()" after "desc = macb_tx_desc(q
Re: [PATCH v3 net-next 04/12] net: ethernet: Use phy_set_max_speed() to limit advertised speed
Hi, On 12.9.2018 2:53, Andrew Lunn wrote: > Many Ethernet MAC drivers want to limit the PHY to only advertise a > maximum speed of 100Mbs or 1Gbps. Rather than using a mask, make use > of the helper function phy_set_max_speed(). But what if the PHY does not support 1Gbps in the first place? This now adds 1Gbps to ->supported and ->advertising even on PHYs that do not support 1Gbps or have the max speed limited to e.g. 100M via the "max-speed" device tree property, breaking things. Unless I'm missing something on how this is supposed to work now :) > Signed-off-by: Andrew Lunn > Reviewed-by: Florian Fainelli > --- > drivers/net/ethernet/8390/ax88796.c | 4 +--- > drivers/net/ethernet/aeroflex/greth.c | 4 ++-- > drivers/net/ethernet/agere/et131x.c | 12 ++-- > drivers/net/ethernet/allwinner/sun4i-emac.c | 3 +-- > drivers/net/ethernet/altera/altera_tse_main.c | 5 + > drivers/net/ethernet/amd/au1000_eth.c | 12 +--- > drivers/net/ethernet/broadcom/bcm63xx_enet.c | 10 ++ > drivers/net/ethernet/broadcom/genet/bcmmii.c | 2 +- > drivers/net/ethernet/broadcom/sb1250-mac.c| 11 ++- > drivers/net/ethernet/broadcom/tg3.c | 8 > drivers/net/ethernet/cadence/macb_main.c | 4 ++-- > drivers/net/ethernet/cortina/gemini.c | 2 +- > drivers/net/ethernet/dnet.c | 4 ++-- > drivers/net/ethernet/ethoc.c | 5 + > drivers/net/ethernet/freescale/fec_main.c | 4 ++-- > drivers/net/ethernet/freescale/ucc_geth.c | 7 +-- > drivers/net/ethernet/lantiq_etop.c| 11 ++- > drivers/net/ethernet/mediatek/mtk_eth_soc.c | 4 ++-- > drivers/net/ethernet/nxp/lpc_eth.c| 3 +-- > drivers/net/ethernet/rdc/r6040.c | 12 ++-- > drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c | 4 ++-- > drivers/net/ethernet/smsc/smsc911x.c | 5 +++-- > drivers/net/ethernet/smsc/smsc9420.c | 5 +++-- > drivers/net/ethernet/socionext/sni_ave.c | 6 ++ > drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +-- > drivers/net/ethernet/toshiba/tc35815.c| 2 +- > drivers/net/ethernet/xilinx/xilinx_emaclite.c | 3 +-- > drivers/staging/mt7621-eth/mdio.c | 2 +- > 28 files changed, 47 insertions(+), 110 deletions(-) [...] > diff --git a/drivers/net/ethernet/cadence/macb_main.c > b/drivers/net/ethernet/cadence/macb_main.c > index 16e4ef7d7185..bd4095c3a031 100644 > --- a/drivers/net/ethernet/cadence/macb_main.c > +++ b/drivers/net/ethernet/cadence/macb_main.c > @@ -544,9 +544,9 @@ static int macb_mii_probe(struct net_device *dev) > > /* mask with MAC supported features */ > if (macb_is_gem(bp) && bp->caps & MACB_CAPS_GIGABIT_MODE_AVAILABLE) > - phydev->supported &= PHY_GBIT_FEATURES; > + phy_set_max_speed(phydev, SPEED_1000); > else > - phydev->supported &= PHY_BASIC_FEATURES; > + phy_set_max_speed(phydev, SPEED_100); > > if (bp->caps & MACB_CAPS_NO_GIGABIT_HALF) > phydev->supported &= ~SUPPORTED_1000baseT_Half; [...] -- Anssi Hannula / Bitwise Oy
[PATCH] net: macb: do not disable MDIO bus when closing interface
macb_close() calls macb_reset_hw() which zeroes NCR register, including the MPE (Management Port Enable) bit. This will prevent accessing any other PHYs for other Ethernet MACs on the MDIO bus which is still registered. Fix that by keeping the MPE bit set. Signed-off-by: Anssi Hannula --- drivers/net/ethernet/cadence/macb_main.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index dc09f9a8a49b..3ca98fc32144 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -2030,12 +2030,13 @@ static void macb_reset_hw(struct macb *bp) unsigned int q; /* Disable RX and TX (XXX: Should we halt the transmission -* more gracefully?) +* more gracefully?) but keep management port open since there +* may be other users of the mdio bus */ - macb_writel(bp, NCR, 0); + macb_writel(bp, NCR, MACB_BIT(MPE)); /* Clear the stats registers (XXX: Update stats first?) */ - macb_writel(bp, NCR, MACB_BIT(CLRSTAT)); + macb_writel(bp, NCR, MACB_BIT(CLRSTAT) | MACB_BIT(MPE)); /* Clear all status flags */ macb_writel(bp, TSR, -1); -- 2.16.3
Re: [PATCH] net: macb: do not disable MDIO bus when closing interface
On 9.8.2018 11:26, Claudiu Beznea wrote: > On 08.08.2018 15:19, Anssi Hannula wrote: >> macb_close() calls macb_reset_hw() which zeroes NCR register, including >> the MPE (Management Port Enable) bit. >> >> This will prevent accessing any other PHYs for other Ethernet MACs on >> the MDIO bus which is still registered. >> >> Fix that by keeping the MPE bit set. >> >> Signed-off-by: Anssi Hannula >> --- >> drivers/net/ethernet/cadence/macb_main.c | 7 --- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/net/ethernet/cadence/macb_main.c >> b/drivers/net/ethernet/cadence/macb_main.c >> index dc09f9a8a49b..3ca98fc32144 100644 >> --- a/drivers/net/ethernet/cadence/macb_main.c >> +++ b/drivers/net/ethernet/cadence/macb_main.c >> @@ -2030,12 +2030,13 @@ static void macb_reset_hw(struct macb *bp) >> unsigned int q; >> >> /* Disable RX and TX (XXX: Should we halt the transmission >> - * more gracefully?) >> + * more gracefully?) but keep management port open since there >> + * may be other users of the mdio bus >> */ >> -macb_writel(bp, NCR, 0); >> +macb_writel(bp, NCR, MACB_BIT(MPE)); > Would be better to read the NCR and clear only RX and TX bits, something like: > val = macb_readl(bp, NCR); > > /* Disable TX and RX. */ > val &= ~(MACB_BIT(TE) | MACB_BIT(RE)); > > /* Clear statistics */ > val |= MACB_BIT(CLRSTAT); > > macb_writel(bp, NCR, val); > > MPE should have been enabled by previous operations. macb_reset_hw() is called in init path too, though, so maybe clearing all bits is intentional / wanted to get the controller to a known state, even though the comment only mentions TX/RX? If so, maybe the below would be better? But if you think just clearing TE/RE is better I'll just do it like you suggested. val = macb_readl(bp, NCR); /* Keep only MDIO port active */ val &= MACB_BIT(MPE); /* Clear statistics */ val |= MACB_BIT(CLRSTAT); macb_writel(bp, NCR, val); > Thank you, > Claudiu Beznea > >> >> /* Clear the stats registers (XXX: Update stats first?) */ >> -macb_writel(bp, NCR, MACB_BIT(CLRSTAT)); >> +macb_writel(bp, NCR, MACB_BIT(CLRSTAT) | MACB_BIT(MPE)); >> >> /* Clear all status flags */ >> macb_writel(bp, TSR, -1); >> -- Anssi Hannula / Bitwise Oy +358 503803997
Re: [PATCH] net: macb: do not disable MDIO bus when closing interface
On 9.8.2018 18:14, Andrew Lunn wrote: > Hi Anssi Hi! >> macb_reset_hw() is called in init path too, though, so maybe clearing >> all bits is intentional / wanted to get the controller to a known state, >> even though the comment only mentions TX/RX? > You need to be careful here. Once of_mdiobus_register() is called, the > MDIO should be usable. If you happen to have an Ethernet switch on the > bus, it could be probed then. The DSA driver will start using the bus. > Or if you have a second PHY, connected to some other MAC, it could be > used by the other MAC. This all happens in the macb_probe function. > > Sometime later, the interface will be up'ed. At this point macb_open() > is called, which calls macb_init_hw(), which calls > macb_reset_hw(). What you don't want happening is changes to the NCR > at this point breaking an MDIO transaction which might be going on. > > Ideally, the MPE should be enabled before of_mdiobus_register(), and > left alone until mdiobus_unregister() is called in macb_remove(). Yep, fixing the use case of having PHYs of other MACs is why I wrote the patch :) Currently the reset code disables MPE while other MACs are using PHYs on the bus. -- Anssi Hannula / Bitwise Oy +358 503803997
[PATCH] net: macb: do not disable MDIO bus at open/close time
macb_reset_hw() is called from macb_close() and indirectly from macb_open(). macb_reset_hw() zeroes the NCR register, including the MPE (Management Port Enable) bit. This will prevent accessing any other PHYs for other Ethernet MACs on the MDIO bus, which remains registered at macb_reset_hw() time, until macb_init_hw() is called from macb_open() which sets the MPE bit again. I.e. currently the MDIO bus has a short disruption at open time and is disabled at close time until the interface is opened again. Fix that by only touching the RE and TE bits when enabling and disabling RX/TX. Fixes: 6c36a7074436 ("macb: Use generic PHY layer") Signed-off-by: Anssi Hannula --- Claudiu Beznea wrote: > On 10.08.2018 09:22, Anssi Hannula wrote: >> >> macb_reset_hw() is called in init path too, > > I only see it in macb_close() and macb_open() called from macb_init_hw(). Yeah, macb_init_hw() is what I meant :) drivers/net/ethernet/cadence/macb_main.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index dc09f9a8a49b..6501e9b3785a 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -2028,14 +2028,17 @@ static void macb_reset_hw(struct macb *bp) { struct macb_queue *queue; unsigned int q; + u32 ctrl = macb_readl(bp, NCR); /* Disable RX and TX (XXX: Should we halt the transmission * more gracefully?) */ - macb_writel(bp, NCR, 0); + ctrl &= ~(MACB_BIT(RE) | MACB_BIT(TE)); /* Clear the stats registers (XXX: Update stats first?) */ - macb_writel(bp, NCR, MACB_BIT(CLRSTAT)); + ctrl |= MACB_BIT(CLRSTAT); + + macb_writel(bp, NCR, ctrl); /* Clear all status flags */ macb_writel(bp, TSR, -1); @@ -2170,6 +2173,7 @@ static void macb_init_hw(struct macb *bp) unsigned int q; u32 config; + u32 ctrl; macb_reset_hw(bp); macb_set_hwaddr(bp); @@ -2223,7 +2227,9 @@ static void macb_init_hw(struct macb *bp) } /* Enable TX and RX */ - macb_writel(bp, NCR, MACB_BIT(RE) | MACB_BIT(TE) | MACB_BIT(MPE)); + ctrl = macb_readl(bp, NCR); + ctrl |= MACB_BIT(RE) | MACB_BIT(TE); + macb_writel(bp, NCR, ctrl); } /* The hash address register is 64 bits long and takes up two -- 2.17.1
[PATCH v2] net: macb: do not disable MDIO bus at open/close time
macb_reset_hw() is called from macb_close() and indirectly from macb_open(). macb_reset_hw() zeroes the NCR register, including the MPE (Management Port Enable) bit. This will prevent accessing any other PHYs for other Ethernet MACs on the MDIO bus, which remains registered at macb_reset_hw() time, until macb_init_hw() is called from macb_open() which sets the MPE bit again. I.e. currently the MDIO bus has a short disruption at open time and is disabled at close time until the interface is opened again. Fix that by only touching the RE and TE bits when enabling and disabling RX/TX. v2: Make macb_init_hw() NCR write a single statement. Fixes: 6c36a7074436 ("macb: Use generic PHY layer") Signed-off-by: Anssi Hannula --- drivers/net/ethernet/cadence/macb_main.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index dc09f9a8a49b..225a7c8bad2d 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -2028,14 +2028,17 @@ static void macb_reset_hw(struct macb *bp) { struct macb_queue *queue; unsigned int q; + u32 ctrl = macb_readl(bp, NCR); /* Disable RX and TX (XXX: Should we halt the transmission * more gracefully?) */ - macb_writel(bp, NCR, 0); + ctrl &= ~(MACB_BIT(RE) | MACB_BIT(TE)); /* Clear the stats registers (XXX: Update stats first?) */ - macb_writel(bp, NCR, MACB_BIT(CLRSTAT)); + ctrl |= MACB_BIT(CLRSTAT); + + macb_writel(bp, NCR, ctrl); /* Clear all status flags */ macb_writel(bp, TSR, -1); @@ -2223,7 +2226,7 @@ static void macb_init_hw(struct macb *bp) } /* Enable TX and RX */ - macb_writel(bp, NCR, MACB_BIT(RE) | MACB_BIT(TE) | MACB_BIT(MPE)); + macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(RE) | MACB_BIT(TE)); } /* The hash address register is 64 bits long and takes up two -- 2.17.1
[PATCH 1/2] net: xilinx_emaclite: fix receive buffer overflow
xilinx_emaclite looks at the received data to try to determine the Ethernet packet length but does not properly clamp it if proto_type == ETH_P_IP or 1500 < proto_type <= 1518, causing a buffer overflow and a panic via skb_panic() as the length exceeds the allocated skb size. Fix those cases. Also add an additional unconditional check with WARN_ON() at the end. Signed-off-by: Anssi Hannula Fixes: bb81b2ddfa19 ("net: add Xilinx emac lite device driver") --- drivers/net/ethernet/xilinx/xilinx_emaclite.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c b/drivers/net/ethernet/xilinx/xilinx_emaclite.c index e3070fd8..bd6470f 100644 --- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c +++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c @@ -369,7 +369,7 @@ static int xemaclite_send_data(struct net_local *drvdata, u8 *data, * * Return: Total number of bytes received */ -static u16 xemaclite_recv_data(struct net_local *drvdata, u8 *data) +static u16 xemaclite_recv_data(struct net_local *drvdata, u8 *data, int maxlen) { void __iomem *addr; u16 length, proto_type; @@ -409,7 +409,7 @@ static u16 xemaclite_recv_data(struct net_local *drvdata, u8 *data) /* Check if received ethernet frame is a raw ethernet frame * or an IP packet or an ARP packet */ - if (proto_type > (ETH_FRAME_LEN + ETH_FCS_LEN)) { + if (proto_type > ETH_DATA_LEN) { if (proto_type == ETH_P_IP) { length = ((ntohl(__raw_readl(addr + @@ -417,6 +417,7 @@ static u16 xemaclite_recv_data(struct net_local *drvdata, u8 *data) XEL_RXBUFF_OFFSET)) >> XEL_HEADER_SHIFT) & XEL_RPLR_LENGTH_MASK); + length = min_t(u16, length, ETH_DATA_LEN); length += ETH_HLEN + ETH_FCS_LEN; } else if (proto_type == ETH_P_ARP) @@ -429,6 +430,9 @@ static u16 xemaclite_recv_data(struct net_local *drvdata, u8 *data) /* Use the length in the frame, plus the header and trailer */ length = proto_type + ETH_HLEN + ETH_FCS_LEN; + if (WARN_ON(length > maxlen)) + length = maxlen; + /* Read from the EmacLite device */ xemaclite_aligned_read((u32 __force *) (addr + XEL_RXBUFF_OFFSET), data, length); @@ -603,7 +607,7 @@ static void xemaclite_rx_handler(struct net_device *dev) skb_reserve(skb, 2); - len = xemaclite_recv_data(lp, (u8 *) skb->data); + len = xemaclite_recv_data(lp, (u8 *) skb->data, len); if (!len) { dev->stats.rx_errors++; -- 2.8.3
[PATCH 2/2] net: xilinx_emaclite: fix freezes due to unordered I/O
The xilinx_emaclite uses __raw_writel and __raw_readl for register accesses. Those functions do not imply any kind of memory barriers and they may be reordered. The driver does not seem to take that into account, though, and the driver does not satisfy the ordering requirements of the hardware. For clear examples, see xemaclite_mdio_write() and xemaclite_mdio_read() which try to set MDIO address before initiating the transaction. I'm seeing system freezes with the driver with GCC 5.4 and current Linux kernels on Zynq-7000 SoC immediately when trying to use the interface. In commit 123c1407af87 ("net: emaclite: Do not use microblaze and ppc IO functions") the driver was switched from non-generic in_be32/out_be32 (memory barriers, big endian) to __raw_readl/__raw_writel (no memory barriers, native endian), so apparently the device follows system endianness and the driver was originally written with the assumption of memory barriers. Rather than try to hunt for each case of missing barrier, just switch the driver to use iowrite32/ioread32/iowrite32be/ioread32be depending on endianness instead. Tested on little-endian Zynq-7000 ARM SoC FPGA. Signed-off-by: Anssi Hannula Fixes: 123c1407af87 ("net: emaclite: Do not use microblaze and ppc IO functions") --- Not sure what would be the correct way to fix this. This way seems to work, at least, but maybe something else would be better? drivers/net/ethernet/xilinx/xilinx_emaclite.c | 116 ++ 1 file changed, 62 insertions(+), 54 deletions(-) diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c b/drivers/net/ethernet/xilinx/xilinx_emaclite.c index bd6470f..69e31ce 100644 --- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c +++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c @@ -100,6 +100,14 @@ /* BUFFER_ALIGN(adr) calculates the number of bytes to the next alignment. */ #define BUFFER_ALIGN(adr) ((ALIGNMENT - ((u32) adr)) % ALIGNMENT) +#ifdef __BIG_ENDIAN +#define xemaclite_readlioread32be +#define xemaclite_writel iowrite32be +#else +#define xemaclite_readlioread32 +#define xemaclite_writel iowrite32 +#endif + /** * struct net_local - Our private per device data * @ndev: instance of the network device @@ -156,15 +164,15 @@ static void xemaclite_enable_interrupts(struct net_local *drvdata) u32 reg_data; /* Enable the Tx interrupts for the first Buffer */ - reg_data = __raw_readl(drvdata->base_addr + XEL_TSR_OFFSET); - __raw_writel(reg_data | XEL_TSR_XMIT_IE_MASK, -drvdata->base_addr + XEL_TSR_OFFSET); + reg_data = xemaclite_readl(drvdata->base_addr + XEL_TSR_OFFSET); + xemaclite_writel(reg_data | XEL_TSR_XMIT_IE_MASK, +drvdata->base_addr + XEL_TSR_OFFSET); /* Enable the Rx interrupts for the first buffer */ - __raw_writel(XEL_RSR_RECV_IE_MASK, drvdata->base_addr + XEL_RSR_OFFSET); + xemaclite_writel(XEL_RSR_RECV_IE_MASK, drvdata->base_addr + XEL_RSR_OFFSET); /* Enable the Global Interrupt Enable */ - __raw_writel(XEL_GIER_GIE_MASK, drvdata->base_addr + XEL_GIER_OFFSET); + xemaclite_writel(XEL_GIER_GIE_MASK, drvdata->base_addr + XEL_GIER_OFFSET); } /** @@ -179,17 +187,17 @@ static void xemaclite_disable_interrupts(struct net_local *drvdata) u32 reg_data; /* Disable the Global Interrupt Enable */ - __raw_writel(XEL_GIER_GIE_MASK, drvdata->base_addr + XEL_GIER_OFFSET); + xemaclite_writel(XEL_GIER_GIE_MASK, drvdata->base_addr + XEL_GIER_OFFSET); /* Disable the Tx interrupts for the first buffer */ - reg_data = __raw_readl(drvdata->base_addr + XEL_TSR_OFFSET); - __raw_writel(reg_data & (~XEL_TSR_XMIT_IE_MASK), -drvdata->base_addr + XEL_TSR_OFFSET); + reg_data = xemaclite_readl(drvdata->base_addr + XEL_TSR_OFFSET); + xemaclite_writel(reg_data & (~XEL_TSR_XMIT_IE_MASK), +drvdata->base_addr + XEL_TSR_OFFSET); /* Disable the Rx interrupts for the first buffer */ - reg_data = __raw_readl(drvdata->base_addr + XEL_RSR_OFFSET); - __raw_writel(reg_data & (~XEL_RSR_RECV_IE_MASK), -drvdata->base_addr + XEL_RSR_OFFSET); + reg_data = xemaclite_readl(drvdata->base_addr + XEL_RSR_OFFSET); + xemaclite_writel(reg_data & (~XEL_RSR_RECV_IE_MASK), +drvdata->base_addr + XEL_RSR_OFFSET); } /** @@ -321,7 +329,7 @@ static int xemaclite_send_data(struct net_local *drvdata, u8 *data, byte_count = ETH_FRAME_LEN; /* Check if the expected buffer is available */ - reg_data = __raw_readl(addr + XEL_TSR_OFFSET); + reg_data = xemaclite_readl(addr + XEL_TSR_OFFSET); if ((reg_data & (XEL_TSR_XMIT_BUSY_MASK | XEL_T
Re: [PATCH 1/2] net: xilinx_emaclite: fix receive buffer overflow
On 14.2.2017 22:12, David Miller wrote: > From: Anssi Hannula > Date: Tue, 14 Feb 2017 19:11:44 +0200 > >> xilinx_emaclite looks at the received data to try to determine the >> Ethernet packet length but does not properly clamp it if >> proto_type == ETH_P_IP or 1500 < proto_type <= 1518, causing a buffer >> overflow and a panic via skb_panic() as the length exceeds the allocated >> skb size. >> >> Fix those cases. >> >> Also add an additional unconditional check with WARN_ON() at the end. >> >> Signed-off-by: Anssi Hannula >> Fixes: bb81b2ddfa19 ("net: add Xilinx emac lite device driver") > Why does this driver do all of this crazy stuff parsing the packet > headers? > > It should be able to just read the length provided by the device > at XEL_RPLR_OFFSET and just use that. Looks like XEL_RPLR_OFFSET == XEL_HEADER_OFFSET + XEL_RXBUFF_OFFSET and that is where the driver reads the on-wire Type/Length field. Looking through the product guide [1] I don't see the actual receive packet length provided anywhere, so I guess that is why the crazy stuff is done. [1] https://www.xilinx.com/support/documentation/ip_documentation/axi_ethernetlite/v3_0/pg135-axi-ethernetlite.pdf -- Anssi Hannula / Bitwise Oy +358503803997
[PATCH v2 3/3] net: macb: add missing barriers when reading descriptors
When reading buffer descriptors on RX or on TX completion, an RX_USED/TX_USED bit is checked first to ensure that the descriptors have been populated, i.e. the ownership has been transferred. However, there are no memory barriers to ensure that the data protected by the RX_USED/TX_USED bit is up-to-date with respect to that bit. Specifically: - TX timestamp descriptors may be loaded before ctrl is loaded for the TX_USED check, which is racy as the descriptors may be updated between the loads, causing old timestamp descriptor data to be used. - RX ctrl may be loaded before addr is loaded for the RX_USED check, which is racy as a new frame may be written between the loads, causing old ctrl descriptor data to be used. This issue exists for both macb_rx() and gem_rx() variants. Fix the races by adding DMA read memory barriers on those paths and reordering the reads in macb_rx(). I have not observed any actual problems in practice caused by these being missing, though. Tested on a ZynqMP based system. Signed-off-by: Anssi Hannula Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver") Cc: Nicolas Ferre --- drivers/net/ethernet/cadence/macb_main.c | 13 ++--- drivers/net/ethernet/cadence/macb_ptp.c | 2 ++ 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index 430b7a0f5436..10a2bb44612b 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -1001,11 +1001,15 @@ static int gem_rx(struct macb_queue *queue, int budget) rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false; addr = macb_get_addr(bp, desc); - ctrl = desc->ctrl; if (!rxused) break; + /* Ensure ctrl is at least as up-to-date as rxused */ + dma_rmb(); + + ctrl = desc->ctrl; + queue->rx_tail++; count++; @@ -1180,11 +1184,14 @@ static int macb_rx(struct macb_queue *queue, int budget) /* Make hw descriptor updates visible to CPU */ rmb(); - ctrl = desc->ctrl; - if (!(desc->addr & MACB_BIT(RX_USED))) break; + /* Ensure ctrl is at least as up-to-date as addr */ + dma_rmb(); + + ctrl = desc->ctrl; + if (ctrl & MACB_BIT(RX_SOF)) { if (first_frag != -1) discard_partial_frame(queue, first_frag, tail); diff --git a/drivers/net/ethernet/cadence/macb_ptp.c b/drivers/net/ethernet/cadence/macb_ptp.c index cd5296b84229..a6dc47edc4cf 100644 --- a/drivers/net/ethernet/cadence/macb_ptp.c +++ b/drivers/net/ethernet/cadence/macb_ptp.c @@ -319,6 +319,8 @@ int gem_ptp_txstamp(struct macb_queue *queue, struct sk_buff *skb, desc_ptp = macb_ptp_desc(queue->bp, desc); tx_timestamp = &queue->tx_timestamps[head]; tx_timestamp->skb = skb; + /* ensure ts_1/ts_2 is loaded after ctrl (TX_USED check) */ + dma_rmb(); tx_timestamp->desc_ptp.ts_1 = desc_ptp->ts_1; tx_timestamp->desc_ptp.ts_2 = desc_ptp->ts_2; /* move head */ -- 2.17.2
[PATCH v2 1/3] net: macb: fix random memory corruption on RX with 64-bit DMA
64-bit DMA addresses are split in upper and lower halves that are written in separate fields on GEM. For RX, bit 0 of the address is used as the ownership bit (RX_USED). When the RX_USED bit is unset the controller is allowed to write data to the buffer. The driver does not guarantee that the controller already sees the upper half when the RX_USED bit is cleared, possibly resulting in the controller writing an incoming frame to an address with an incorrect upper half and therefore possibly corrupting unrelated system memory. Fix that by adding the necessary DMA memory barrier between the writes. This corruption was observed on a ZynqMP based system. Signed-off-by: Anssi Hannula Fixes: fff8019a08b6 ("net: macb: Add 64 bit addressing support for GEM") Acked-by: Harini Katakam Tested-by: Claudiu Beznea Cc: Nicolas Ferre Cc: Michal Simek --- drivers/net/ethernet/cadence/macb_main.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index d8c7ca037ae3..0bc2aab7be40 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -682,6 +682,11 @@ static void macb_set_addr(struct macb *bp, struct macb_dma_desc *desc, dma_addr_ if (bp->hw_dma_cap & HW_DMA_CAP_64B) { desc_64 = macb_64b_desc(bp, desc); desc_64->addrh = upper_32_bits(addr); + /* The low bits of RX address contain the RX_USED bit, clearing +* of which allows packet RX. Make sure the high bits are also +* visible to HW at that point. +*/ + dma_wmb(); } #endif desc->addr = lower_32_bits(addr); -- 2.17.2
[PATCH v2 2/3] net: macb: fix dropped RX frames due to a race
Bit RX_USED set to 0 in the address field allows the controller to write data to the receive buffer descriptor. The driver does not ensure the ctrl field is ready (cleared) when the controller sees the RX_USED=0 written by the driver. The ctrl field might only be cleared after the controller has already updated it according to a newly received frame, causing the frame to be discarded in gem_rx() due to unexpected ctrl field contents. A message is logged when the above scenario occurs: macb ff0b.ethernet eth0: not whole frame pointed by descriptor Fix the issue by ensuring that when the controller sees RX_USED=0 the ctrl field is already cleared. This issue was observed on a ZynqMP based system. Signed-off-by: Anssi Hannula Fixes: 4df95131ea80 ("net/macb: change RX path for GEM") Tested-by: Claudiu Beznea Cc: Nicolas Ferre --- drivers/net/ethernet/cadence/macb_main.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index 0bc2aab7be40..430b7a0f5436 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -935,14 +935,19 @@ static void gem_rx_refill(struct macb_queue *queue) if (entry == bp->rx_ring_size - 1) paddr |= MACB_BIT(RX_WRAP); - macb_set_addr(bp, desc, paddr); desc->ctrl = 0; + /* Setting addr clears RX_USED and allows reception, +* make sure ctrl is cleared first to avoid a race. +*/ + dma_wmb(); + macb_set_addr(bp, desc, paddr); /* properly align Ethernet header */ skb_reserve(skb, NET_IP_ALIGN); } else { - desc->addr &= ~MACB_BIT(RX_USED); desc->ctrl = 0; + dma_wmb(); + desc->addr &= ~MACB_BIT(RX_USED); } } -- 2.17.2
[PATCH v2 0/3] net: macb: DMA race condition fixes
Hi all, Here are a couple of race condition fixes for the macb driver. The first two are for issues observed at runtime on real HW. v2: - added received Tested-bys and Acked-bys to the first two patches - in patch 3/3, moved the timestamp protection barrier closer to the timestamp reads - in patch 3/3, removed unnecessary move of the addr assignment in gem_rx() to keep the patch minimal for maximum clarity - in patch 3/3, clarified commit message and comments The 3/3 is the same one I improperly sent last week as a standalone patch. Anssi Hannula (3): net: macb: fix random memory corruption on RX with 64-bit DMA net: macb: fix dropped RX frames due to a race net: macb: add missing barriers when reading descriptors drivers/net/ethernet/cadence/macb_main.c | 27 +++- drivers/net/ethernet/cadence/macb_ptp.c | 2 ++ 2 files changed, 24 insertions(+), 5 deletions(-) -- Anssi Hannula / Bitwise Oy