[PATCH 0/3] net: macb: DMA race condition fixes

2018-11-30 Thread Anssi Hannula
Hi all,

Here are a couple of race condition fixes for the macb driver. The first
two are issues observed on real HW.

Anssi Hannula (3):
  net: macb: fix random memory corruption on RX with 64-bit DMA
  net: macb: fix dropped RX frames due to a race
  net: macb: add missing barriers when reading buffers

 drivers/net/ethernet/cadence/macb_main.c | 34 +-
 1 file changed, 28 insertions(+), 6 deletions(-)

-- 
Anssi Hannula / Bitwise Oy




[PATCH 2/3] net: macb: fix dropped RX frames due to a race

2018-11-30 Thread Anssi Hannula
Bit RX_USED set to 0 in the address field allows the controller to write
data to the receive buffer descriptor.

The driver does not ensure the ctrl field is ready (cleared) when the
controller sees the RX_USED=0 written by the driver. The ctrl field might
only be cleared after the controller has already updated it according to
a newly received frame, causing the frame to be discarded in gem_rx() due
to unexpected ctrl field contents.

A message is logged when the above scenario occurs:

  macb ff0b.ethernet eth0: not whole frame pointed by descriptor

Fix the issue by ensuring that when the controller sees RX_USED=0 the
ctrl field is already cleared.

This issue was observed on a ZynqMP based system.

Signed-off-by: Anssi Hannula 
Fixes: 4df95131ea80 ("net/macb: change RX path for GEM")
Cc: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb_main.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index 0bc2aab7be40..430b7a0f5436 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -935,14 +935,19 @@ static void gem_rx_refill(struct macb_queue *queue)
 
if (entry == bp->rx_ring_size - 1)
paddr |= MACB_BIT(RX_WRAP);
-   macb_set_addr(bp, desc, paddr);
desc->ctrl = 0;
+   /* Setting addr clears RX_USED and allows reception,
+* make sure ctrl is cleared first to avoid a race.
+*/
+   dma_wmb();
+   macb_set_addr(bp, desc, paddr);
 
/* properly align Ethernet header */
skb_reserve(skb, NET_IP_ALIGN);
} else {
-   desc->addr &= ~MACB_BIT(RX_USED);
desc->ctrl = 0;
+   dma_wmb();
+   desc->addr &= ~MACB_BIT(RX_USED);
}
}
 
-- 
2.17.2



[PATCH 1/3] net: macb: fix random memory corruption on RX with 64-bit DMA

2018-11-30 Thread Anssi Hannula
64-bit DMA addresses are split in upper and lower halves that are
written in separate fields on GEM. For RX, bit 0 of the address is used
as the ownership bit (RX_USED). When the RX_USED bit is unset the
controller is allowed to write data to the buffer.

The driver does not guarantee that the controller already sees the upper
half when the RX_USED bit is cleared, possibly resulting in the
controller writing an incoming frame to an address with an incorrect
upper half and therefore possibly corrupting unrelated system memory.

Fix that by adding the necessary DMA memory barrier between the writes.

This corruption was observed on a ZynqMP based system.

Signed-off-by: Anssi Hannula 
Fixes: fff8019a08b6 ("net: macb: Add 64 bit addressing support for GEM")
Cc: Nicolas Ferre 
Cc: Harini Katakam 
Cc: Michal Simek 
---
 drivers/net/ethernet/cadence/macb_main.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index d8c7ca037ae3..0bc2aab7be40 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -682,6 +682,11 @@ static void macb_set_addr(struct macb *bp, struct 
macb_dma_desc *desc, dma_addr_
if (bp->hw_dma_cap & HW_DMA_CAP_64B) {
desc_64 = macb_64b_desc(bp, desc);
desc_64->addrh = upper_32_bits(addr);
+   /* The low bits of RX address contain the RX_USED bit, clearing
+* of which allows packet RX. Make sure the high bits are also
+* visible to HW at that point.
+*/
+   dma_wmb();
}
 #endif
desc->addr = lower_32_bits(addr);
-- 
2.17.2



[PATCH 3/3] net: macb: add missing barriers when reading buffers

2018-11-30 Thread Anssi Hannula
When reading buffer descriptors on RX or on TX completion, an
RX_USED/TX_USED bit is checked first to ensure that the descriptor has
been populated. However, there are no memory barriers to ensure that the
data protected by the RX_USED/TX_USED bit is up-to-date with respect to
that bit.

Fix that by adding DMA read memory barriers on those paths.

I did not observe any actual issues caused by these being missing,
though.

Tested on a ZynqMP based system.

Signed-off-by: Anssi Hannula 
Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
Cc: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb_main.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index 430b7a0f5436..c93baa8621d5 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue *queue)
 
/* First, update TX stats if needed */
if (skb) {
+   /* Ensure all of desc is at least as up-to-date
+* as ctrl (TX_USED bit)
+*/
+   dma_rmb();
+
if (gem_ptp_do_txstamp(queue, skb, desc) == 0) {
/* skb now belongs to timestamp buffer
 * and will be removed later
@@ -1000,12 +1005,16 @@ static int gem_rx(struct macb_queue *queue, int budget)
rmb();
 
rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
-   addr = macb_get_addr(bp, desc);
-   ctrl = desc->ctrl;
 
if (!rxused)
break;
 
+   /* Ensure other data is at least as up-to-date as rxused */
+   dma_rmb();
+
+   addr = macb_get_addr(bp, desc);
+   ctrl = desc->ctrl;
+
queue->rx_tail++;
count++;
 
@@ -1180,11 +1189,14 @@ static int macb_rx(struct macb_queue *queue, int budget)
/* Make hw descriptor updates visible to CPU */
rmb();
 
-   ctrl = desc->ctrl;
-
if (!(desc->addr & MACB_BIT(RX_USED)))
break;
 
+   /* Ensure other data is at least as up-to-date as addr */
+   dma_rmb();
+
+   ctrl = desc->ctrl;
+
if (ctrl & MACB_BIT(RX_SOF)) {
if (first_frag != -1)
discard_partial_frame(queue, first_frag, tail);
-- 
2.17.2



[PATCH 0/2] net: phy: fixes to PHY_HALTED handling

2018-11-30 Thread Anssi Hannula
Hi all,

Here are a couple of fixes for PHY_HALTED/PHY_RESUMING handling.


On a related note, it feels a bit strange that AFAICS phydevs will only
be put to powerdown state (via PHY_HALTED) after the network interface
has been brought up and down once. If the ethernet interface is never
brought up, the phydev remains powered on (in PHY_READY).

This is because the phydev is only put to PHY_HALTED from phy_stop() and
phy_error(), and phy_stop() seems to be normally called only from
.ndo_stop().

But I didn't touch that now as I wasn't sure what is the intent there.


Anssi Hannula (2):
  net: phy: suspend phydev on PHY_HALTED even if there is no link
  net: phy: ensure autoneg is configured when resuming a phydev

 drivers/net/phy/phy.c | 13 +++--
 include/linux/phy.h   |  2 ++
 2 files changed, 13 insertions(+), 2 deletions(-)

-- 
Anssi Hannula / Bitwise Oy



[PATCH 1/2] net: phy: suspend phydev on PHY_HALTED even if there is no link

2018-11-30 Thread Anssi Hannula
When the phydev is put to PHY_HALTED (by e.g. phy_stop()) the state
machine suspends the phydev to save power.

However, this is wrongly inside a phydev->link check, causing the phydev
not to be suspended if there was no link at the time of stopping it.

Fix that by setting do_suspend regardless of whether there was a link or
not.

Signed-off-by: Anssi Hannula 
Fixes: be9dad1f9f26 ("net: phy: suspend phydev when going to HALTED")
Cc: Sebastian Hesselbarth 
---
 drivers/net/phy/phy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 376a0d8a2b61..d46858694db9 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -956,8 +956,8 @@ void phy_state_machine(struct work_struct *work)
if (phydev->link) {
phydev->link = 0;
phy_link_down(phydev, true);
-   do_suspend = true;
}
+   do_suspend = true;
break;
}
 
-- 
2.17.2



[PATCH 2/2] net: phy: ensure autoneg is configured when resuming a phydev

2018-11-30 Thread Anssi Hannula
When a PHY_HALTED phydev is resumed by phy_start(), it is set to
PHY_RESUMING to wait for the link to come up.

However, if the phydev was put to PHY_HALTED (by e.g. phy_stop()) before
autonegotiation was ever started by phy_state_machine(), autonegotiation
remains unconfigured, i.e. phy_config_aneg() is never called.

Fix that by going to PHY_UP instead of PHY_RESUMING if autonegotiation
has never been configured.

Signed-off-by: Anssi Hannula 
---

This doesn't feel as clean is I'd like to, though. Maybe there is a
better way?


 drivers/net/phy/phy.c | 11 ++-
 include/linux/phy.h   |  2 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d46858694db9..7a650cce7599 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -553,6 +553,8 @@ int phy_start_aneg(struct phy_device *phydev)
if (err < 0)
goto out_unlock;
 
+   phydev->autoneg_configured = 1;
+
if (phydev->state != PHY_HALTED) {
if (AUTONEG_ENABLE == phydev->autoneg) {
err = phy_check_link_status(phydev);
@@ -893,7 +895,14 @@ void phy_start(struct phy_device *phydev)
break;
}
 
-   phydev->state = PHY_RESUMING;
+   /* if autoneg/forcing was never configured, go back to PHY_UP
+* to make that happen
+*/
+   if (!phydev->autoneg_configured)
+   phydev->state = PHY_UP;
+   else
+   phydev->state = PHY_RESUMING;
+
break;
default:
break;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 8f927246acdb..b306d5ed9d09 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -389,6 +389,8 @@ struct phy_device {
unsigned loopback_enabled:1;
 
unsigned autoneg:1;
+   /* autoneg has been configured at least once */
+   unsigned autoneg_configured:1;
/* The most recently read link state */
unsigned link:1;
 
-- 
2.17.2



Re: [PATCH 2/3] net: macb: fix dropped RX frames due to a race

2018-12-03 Thread Anssi Hannula
Hi,

On 3.12.2018 6:52, Harini Katakam wrote:
> Hi Anssi,
> On Fri, Nov 30, 2018 at 11:53 PM Anssi Hannula  
> wrote:
>> Bit RX_USED set to 0 in the address field allows the controller to write
>> data to the receive buffer descriptor.
>>
>> The driver does not ensure the ctrl field is ready (cleared) when the
>> controller sees the RX_USED=0 written by the driver. The ctrl field might
>> only be cleared after the controller has already updated it according to
>> a newly received frame, causing the frame to be discarded in gem_rx() due
>> to unexpected ctrl field contents.
>>
>> A message is logged when the above scenario occurs:
>>
>>   macb ff0b.ethernet eth0: not whole frame pointed by descriptor
>>
>> Fix the issue by ensuring that when the controller sees RX_USED=0 the
>> ctrl field is already cleared.
>>
>> This issue was observed on a ZynqMP based system.
>>
> Thanks for the patch.
> Could you please describe the test in which this behavior was observed?

Sure. The testcase I used for the patches is:

- RT_FULL kernel,
- CPU-bound SCHED_FF RT priority 15 process (with
rcutree.kthread_prio=20 to avoid RCU starvation),
- Pyropus memtester running for 3GB (system has 4GB memory),
- "ping -f -l 5000 -s 100" running from a PC.

The "not whole frame pointed by descriptor" issue occurs within minutes
and the RX memory corruption within an hour. I did not try to reduce the
testcase to a minimum.

Both were also observed using real production loads (that of course do
not have CPU-bound RT tasks).

> Were you able to confirm that this was because of the ctrl field being
> cleared late? This error can also be observed under stress when RX UBR
> is observed.

I observed that the issue occurred without this patch, and didn't occur
after applying this patch (individually), but I didn't check it further
than that. If you have anything you'd like me to test, let me know.

> I understand it makes sense to clear ctrl field before setting RX used bit.
> But I'm trying to understand if a dmb is necessary in the receive data path.

A barrier is needed as otherwise writes to the ctrl field and addr
fields may be freely reordered as they are just regular writes to
memory, as far as I've understood it, potentially undoing the effect of
changing the order.

If you mean the third patch, same issue with reads - they may be freely
reordered (by e.g. the compiler) as they are just regular reads.

But I don't claim to be an expert on these so please correct me if I'm
wrong :)

-- 
Anssi Hannula / Bitwise Oy



Re: [PATCH 2/2] net: phy: ensure autoneg is configured when resuming a phydev

2018-12-03 Thread Anssi Hannula
On 1.12.2018 0:16, Heiner Kallweit wrote:
> On 30.11.2018 19:45, Anssi Hannula wrote:
>> When a PHY_HALTED phydev is resumed by phy_start(), it is set to
>> PHY_RESUMING to wait for the link to come up.
>>
>> However, if the phydev was put to PHY_HALTED (by e.g. phy_stop()) before
>> autonegotiation was ever started by phy_state_machine(), autonegotiation
>> remains unconfigured, i.e. phy_config_aneg() is never called.
>>
> phy_stop() should only be called once the PHY was started. But it's
> right that we don't have full control over it because misbehaving
> drivers may call phy_stop() even if the PHY hasn't been started yet.

Having run phy_start() does not guarantee that the phy_state_machine()
has been run at least once, though.

I was able to reproduce the issue by calling phy_start(); phy_stop().
Then on the next phy_start() autoneg is not configured.

> I think phy_error() and phy_stop() should be extended to set the state
> to PHY_HALTED only if the PHY is in a started state, means:
> don't touch the state if it's DOWN, READY, or HALTED.
> In case of phy_error() it's more a precaution, because it's used within
> phylib only and AFAICS it can be triggered from a started state only.

This wouldn't fix the issue that occurs when phy_stop() is called right
after phy_start(), though, as PHY is in UP state then.

>
> So yes, there is a theoretical issue. But as you wrote already, I think
> there's a better way to deal with it.
>
> For checking whether PHY is in a started state you could use the helper
> which is being discussed here:
> https://marc.info/?l=linux-netdev&m=154360368104946&w=2
>
>
>> Fix that by going to PHY_UP instead of PHY_RESUMING if autonegotiation
>> has never been configured.
>>
>> Signed-off-by: Anssi Hannula 
>> ---
>>
>> This doesn't feel as clean is I'd like to, though. Maybe there is a
>> better way?
>>
>>
>>  drivers/net/phy/phy.c | 11 ++-
>>  include/linux/phy.h   |  2 ++
>>  2 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>> index d46858694db9..7a650cce7599 100644
>> --- a/drivers/net/phy/phy.c
>> +++ b/drivers/net/phy/phy.c
>> @@ -553,6 +553,8 @@ int phy_start_aneg(struct phy_device *phydev)
>>  if (err < 0)
>>  goto out_unlock;
>>  
>> +phydev->autoneg_configured = 1;
>> +
>>  if (phydev->state != PHY_HALTED) {
>>  if (AUTONEG_ENABLE == phydev->autoneg) {
>>  err = phy_check_link_status(phydev);
>> @@ -893,7 +895,14 @@ void phy_start(struct phy_device *phydev)
>>  break;
>>  }
>>  
>> -phydev->state = PHY_RESUMING;
>> +/* if autoneg/forcing was never configured, go back to PHY_UP
>> + * to make that happen
>> + */
>> +if (!phydev->autoneg_configured)
>> +phydev->state = PHY_UP;
>> +else
>> +phydev->state = PHY_RESUMING;
>> +
>>  break;
>>  default:
>>  break;
>> diff --git a/include/linux/phy.h b/include/linux/phy.h
>> index 8f927246acdb..b306d5ed9d09 100644
>> --- a/include/linux/phy.h
>> +++ b/include/linux/phy.h
>> @@ -389,6 +389,8 @@ struct phy_device {
>>  unsigned loopback_enabled:1;
>>  
>>  unsigned autoneg:1;
>> +/* autoneg has been configured at least once */
>> +unsigned autoneg_configured:1;
>>  /* The most recently read link state */
>>  unsigned link:1;
>>  
>>

-- 
Anssi Hannula / Bitwise Oy



Re: [PATCH 2/2] net: phy: ensure autoneg is configured when resuming a phydev

2018-12-04 Thread Anssi Hannula
On 3.12.2018 23:41, Heiner Kallweit wrote:
> On 03.12.2018 11:43, Anssi Hannula wrote:
>> On 1.12.2018 0:16, Heiner Kallweit wrote:
>>> On 30.11.2018 19:45, Anssi Hannula wrote:
>>>> When a PHY_HALTED phydev is resumed by phy_start(), it is set to
>>>> PHY_RESUMING to wait for the link to come up.
>>>>
>>>> However, if the phydev was put to PHY_HALTED (by e.g. phy_stop()) before
>>>> autonegotiation was ever started by phy_state_machine(), autonegotiation
>>>> remains unconfigured, i.e. phy_config_aneg() is never called.
>>>>
>>> phy_stop() should only be called once the PHY was started. But it's
>>> right that we don't have full control over it because misbehaving
>>> drivers may call phy_stop() even if the PHY hasn't been started yet.
>> Having run phy_start() does not guarantee that the phy_state_machine()
>> has been run at least once, though.
>>
>> I was able to reproduce the issue by calling phy_start(); phy_stop().
>> Then on the next phy_start() autoneg is not configured.
>>
> Right, phy_start() schedules the state machine run, so there is a small
> window where this can happen. Did you experience this in any real-life
> scenario?

No, I encountered this while tinkering with phy_start/stop() to look how
they interact with suspending phydevs as I wanted to suspend unused ones.

So I don't think there is any rush to fix this one.

>>> I think phy_error() and phy_stop() should be extended to set the state
>>> to PHY_HALTED only if the PHY is in a started state, means:
>>> don't touch the state if it's DOWN, READY, or HALTED.
>>> In case of phy_error() it's more a precaution, because it's used within
>>> phylib only and AFAICS it can be triggered from a started state only.
>> This wouldn't fix the issue that occurs when phy_stop() is called right
>> after phy_start(), though, as PHY is in UP state then.
>>
> After having removed state AN I was thinking already whether we really
> need to have separate states READY and HALTED. I think we wouldn't
> have this scenario if phy_stop() and phy_error() would set the state
> to READY. Merging the two states requires some upfront work, but I have
> some ideas in the back of my mind.
> Overall this should be cleaner than the proposed workaround.

Yeah, sounds better indeed.

>>> So yes, there is a theoretical issue. But as you wrote already, I think
>>> there's a better way to deal with it.
>>>
>>> For checking whether PHY is in a started state you could use the helper
>>> which is being discussed here:
>>> https://marc.info/?l=linux-netdev&m=154360368104946&w=2
>>>
>>>
>>>> Fix that by going to PHY_UP instead of PHY_RESUMING if autonegotiation
>>>> has never been configured.
>>>>
>>>> Signed-off-by: Anssi Hannula 
>>>> ---
>>>>
>>>> This doesn't feel as clean is I'd like to, though. Maybe there is a
>>>> better way?
>>>>
>>>>
>>>>  drivers/net/phy/phy.c | 11 ++-
>>>>  include/linux/phy.h   |  2 ++
>>>>  2 files changed, 12 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>>>> index d46858694db9..7a650cce7599 100644
>>>> --- a/drivers/net/phy/phy.c
>>>> +++ b/drivers/net/phy/phy.c
>>>> @@ -553,6 +553,8 @@ int phy_start_aneg(struct phy_device *phydev)
>>>>if (err < 0)
>>>>goto out_unlock;
>>>>  
>>>> +  phydev->autoneg_configured = 1;
>>>> +
>>>>if (phydev->state != PHY_HALTED) {
>>>>if (AUTONEG_ENABLE == phydev->autoneg) {
>>>>err = phy_check_link_status(phydev);
>>>> @@ -893,7 +895,14 @@ void phy_start(struct phy_device *phydev)
>>>>break;
>>>>}
>>>>  
>>>> -  phydev->state = PHY_RESUMING;
>>>> +  /* if autoneg/forcing was never configured, go back to PHY_UP
>>>> +   * to make that happen
>>>> +   */
>>>> +  if (!phydev->autoneg_configured)
>>>> +  phydev->state = PHY_UP;
>>>> +  else
>>>> +  phydev->state = PHY_RESUMING;
>>>> +
>>>>break;
>>>>default:
>>>>break;
>>>> diff --git a/include/linux/phy.h b/include/linux/phy.h
>>>> index 8f927246acdb..b306d5ed9d09 100644
>>>> --- a/include/linux/phy.h
>>>> +++ b/include/linux/phy.h
>>>> @@ -389,6 +389,8 @@ struct phy_device {
>>>>unsigned loopback_enabled:1;
>>>>  
>>>>unsigned autoneg:1;
>>>> +  /* autoneg has been configured at least once */
>>>> +  unsigned autoneg_configured:1;
>>>>/* The most recently read link state */
>>>>unsigned link:1;
>>>>  
>>>>

-- 
Anssi Hannula / Bitwise Oy



Re: [PATCH 1/3] net: macb: fix random memory corruption on RX with 64-bit DMA

2018-12-05 Thread Anssi Hannula
On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote:
> Hi Anssi,

Hi, and thanks for looking at these.

> Few comments... Otherwise I tested this series on a SAMA5D2 Xplained and
> SAMA5D4 Xplained under heavy traffic and it seems to behave OK.
>
> Thank you,
> Claudiu Beznea
>
> On 30.11.2018 20:21, Anssi Hannula wrote:
>> 64-bit DMA addresses are split in upper and lower halves that are
>> written in separate fields on GEM. For RX, bit 0 of the address is used
>> as the ownership bit (RX_USED). When the RX_USED bit is unset the
>> controller is allowed to write data to the buffer.
>>
>> The driver does not guarantee that the controller already sees the upper
>> half when the RX_USED bit is cleared, possibly resulting in the
>> controller writing an incoming frame to an address with an incorrect
>> upper half and therefore possibly corrupting unrelated system memory.
>>
>> Fix that by adding the necessary DMA memory barrier between the writes.>
>> This corruption was observed on a ZynqMP based system.
>>
>> Signed-off-by: Anssi Hannula 
>> Fixes: fff8019a08b6 ("net: macb: Add 64 bit addressing support for GEM")
>> Cc: Nicolas Ferre 
>> Cc: Harini Katakam 
>> Cc: Michal Simek 
>> ---
>>  drivers/net/ethernet/cadence/macb_main.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c 
>> b/drivers/net/ethernet/cadence/macb_main.c
>> index d8c7ca037ae3..0bc2aab7be40 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -682,6 +682,11 @@ static void macb_set_addr(struct macb *bp, struct 
>> macb_dma_desc *desc, dma_addr_
>>  if (bp->hw_dma_cap & HW_DMA_CAP_64B) {
>>  desc_64 = macb_64b_desc(bp, desc);
>>  desc_64->addrh = upper_32_bits(addr);
>> +/* The low bits of RX address contain the RX_USED bit, clearing
>> + * of which allows packet RX. Make sure the high bits are also
>> + * visible to HW at that point.
>> + */
>> +dma_wmb();
> I think a wmb() would fit better here so that on ARM to also force the
> flushing of caches not affected by dmb() by calling arm_heavy_mb().

Hmm, if we want to simply ensure ordering of the two writes (upper half,
lower half) to DMA memory, isn't dma_wmb() exactly for that purpose?
This situation seems to match the dma_wmb() example in
Documentation/memory-barriers.txt where data is updated before ownership
bit.

If dma_wmb() = dmb(oshst) is not enough on ARM for this purpose, should
not dma_wmb() be fixed then?
Or maybe I'm missing some difference why dma_wmb() is not enough here
but would be OK on some other case (e.g. in the memory-barriers.txt
example)?

> Thank you,
> Claudiu Beznea
>
>>  }
>>  #endif
>>  desc->addr = lower_32_bits(addr);
>>

-- 
Anssi Hannula / Bitwise Oy
+358 503803997



Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers

2018-12-05 Thread Anssi Hannula
On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote:
>
> On 30.11.2018 20:21, Anssi Hannula wrote:
>> When reading buffer descriptors on RX or on TX completion, an
>> RX_USED/TX_USED bit is checked first to ensure that the descriptor has
>> been populated. However, there are no memory barriers to ensure that the
>> data protected by the RX_USED/TX_USED bit is up-to-date with respect to
>> that bit.
>>
>> Fix that by adding DMA read memory barriers on those paths.
>>
>> I did not observe any actual issues caused by these being missing,
>> though.
>>
>> Tested on a ZynqMP based system.
>>
>> Signed-off-by: Anssi Hannula 
>> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
>> Cc: Nicolas Ferre 
>> ---
>>  drivers/net/ethernet/cadence/macb_main.c | 20 
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c 
>> b/drivers/net/ethernet/cadence/macb_main.c
>> index 430b7a0f5436..c93baa8621d5 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue *queue)
>>  
>>  /* First, update TX stats if needed */
>>  if (skb) {
>> +/* Ensure all of desc is at least as up-to-date
>> + * as ctrl (TX_USED bit)
>> + */
>> +dma_rmb();
>> +
> Is this necessary? Wouldn't previous rmb() take care of this? At this time
> data specific to this descriptor was completed. The TX descriptors for next
> data to be send is updated under a locked spinlock.

The previous rmb() is before the TX_USED check, so my understanding is
that the following could happen in theory:

1. rmb().
2. Reads are reordered so that TX timestamp is read first - no barriers
are crossed.
3. HW writes timestamp and sets TX_USED (or they become visible).
4. Code checks TX_USED.
5. Code operates on timestamp that is actually garbage.

I'm not 100% sure that there isn't some lighter/cleaner way to do this
than dma_rmb(), though.

>>  if (gem_ptp_do_txstamp(queue, skb, desc) == 0) {
>>  /* skb now belongs to timestamp buffer
>>   * and will be removed later
>> @@ -1000,12 +1005,16 @@ static int gem_rx(struct macb_queue *queue, int 
>> budget)
>>  rmb();
>>  
>>  rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
>> -addr = macb_get_addr(bp, desc);
>> -ctrl = desc->ctrl;
>>  
>>  if (!rxused)
>>  break;
>>  
>> +/* Ensure other data is at least as up-to-date as rxused */
>> +dma_rmb();
> Same here, wouldn't previous rmb() should do this job?

The scenario I'm concerned about here (and in the last hunk) is:

1. rmb().
2. ctrl is read (i.e. ctrl read reordered before addr read).
3. HW updates to ctrl and addr become visible.
4. RX_USED check.
5. code operates on garbage ctrl.

I think it may be OK to move the earlier rmb() outside the loop so that
there is an rmb() only before and after the RX loop, as I don't at least
immediately see any hard requirement to do it on each loop pass (unlike
the added dma_rmb()). But my intent was to fix issues instead of
optimization so I didn't look too closely into that.

>> +
>> +addr = macb_get_addr(bp, desc);
>> +ctrl = desc->ctrl;
>> +
>>  queue->rx_tail++;
>>  count++;
>>  
>> @@ -1180,11 +1189,14 @@ static int macb_rx(struct macb_queue *queue, int 
>> budget)
>>  /* Make hw descriptor updates visible to CPU */
>>  rmb();
>>  
>> -    ctrl = desc->ctrl;
>> -
>>  if (!(desc->addr & MACB_BIT(RX_USED)))
>>  break;
>>  
>> +/* Ensure other data is at least as up-to-date as addr */
>> +dma_rmb();
> Ditto
>
>> +
>> +ctrl = desc->ctrl;
>> +
>>  if (ctrl & MACB_BIT(RX_SOF)) {
>>  if (first_frag != -1)
>>  discard_partial_frame(queue, first_frag, tail);
>>

-- 
Anssi Hannula / Bitwise Oy
+358 503803997



Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers

2018-12-07 Thread Anssi Hannula
On 6.12.2018 16:14, claudiu.bez...@microchip.com wrote:
> Hi Anssi,

Hi!

> On 05.12.2018 16:00, Anssi Hannula wrote:
>> On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote:
>>> On 30.11.2018 20:21, Anssi Hannula wrote:
>>>> When reading buffer descriptors on RX or on TX completion, an
>>>> RX_USED/TX_USED bit is checked first to ensure that the descriptor has
>>>> been populated. However, there are no memory barriers to ensure that the
>>>> data protected by the RX_USED/TX_USED bit is up-to-date with respect to
>>>> that bit.
>>>>
>>>> Fix that by adding DMA read memory barriers on those paths.
>>>>
>>>> I did not observe any actual issues caused by these being missing,
>>>> though.
>>>>
>>>> Tested on a ZynqMP based system.
>>>>
>>>> Signed-off-by: Anssi Hannula 
>>>> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
>>>> Cc: Nicolas Ferre 
>>>> ---
>>>>  drivers/net/ethernet/cadence/macb_main.c | 20 
>>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c 
>>>> b/drivers/net/ethernet/cadence/macb_main.c
>>>> index 430b7a0f5436..c93baa8621d5 100644
>>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>>> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue 
>>>> *queue)
>>>>  
>>>>/* First, update TX stats if needed */
>>>>if (skb) {
>>>> +  /* Ensure all of desc is at least as up-to-date
>>>> +   * as ctrl (TX_USED bit)
>>>> +   */
>>>> +  dma_rmb();
>>>> +
>>> Is this necessary? Wouldn't previous rmb() take care of this? At this time
>>> data specific to this descriptor was completed. The TX descriptors for next
>>> data to be send is updated under a locked spinlock.
>> The previous rmb() is before the TX_USED check, so my understanding is
>> that the following could happen in theory:
> We are using this IP on and ARM architecture, so, with regards to rmb(), I
> understand from [1] that dsb completes when:
> "All explicit memory accesses before this instruction complete.
> All Cache, Branch predictor and TLB maintenance operations before this
> instruction complete."
>
>> 1. rmb().
> According to [1] this should end after all previous instructions (loads,
> stores) ends.
>
>> 2. Reads are reordered so that TX timestamp is read first - no barriers
>> are crossed.
> But, as per [1], no onward instruction will be reached until all
> instruction prior to dsb ends, so, after rmb() all descriptor's members
> should be updated, right?

The descriptor that triggered the TX interrupt should be visible now, yes.
However, the controller may be writing to any other descriptors at the
same time as the loop is processing through them, as there are multiple
TX buffers.

>> 3. HW writes timestamp and sets TX_USED (or they become visible).
> I expect hardware to set TX_USED and timestamp before raising TX complete
> interrupt. If so, there should be no on-flight updates of this descriptor,
> right? Hardware raised a TX_USED bit read interrupt when it reads a
> descriptor like this and hangs TX.

For the first iteration of the loop, that is correct - there should be
no in-flight writes from controller as it already raised the interrupt.
However, the following iterations of the loop process descriptors that
may or may not have the interrupt raised yet, and therefore may still
have in-flight writes.

>> 4. Code checks TX_USED.
>> 5. Code operates on timestamp that is actually garbage.
>>
>> I'm not 100% sure that there isn't some lighter/cleaner way to do this
>> than dma_rmb(), though.
> If you still think this scenario could happen why not calling a dsb in
> gem_ptp_do_timestamp(). I feel like that is a proper place to call it.

OK, I will move it there. Unless we arrive at a conclusion that it is
unnecessary altogether, of course :)

> Moreover, there is bit 32 of desc->ctrl which tells you if a valid
> timestamp was placed in the descriptor. But, again, I expect the timestamp
> and TX_USED to be set by hardware before raising TX complete interrupt.

Yes, but since my concern is that without barriers in between,
desc->ctrl might be read after ts_1/ts_2, so th

Re: [PATCH 0/3] net: macb: DMA race condition fixes

2018-12-07 Thread Anssi Hannula
On 5.12.2018 22:35, David Miller wrote:
> From: Anssi Hannula 
> Date: Fri, 30 Nov 2018 20:21:34 +0200
>
>> Here are a couple of race condition fixes for the macb driver. The first
>> two are issues observed on real HW.
> It looks like there is still an active discussion about the memory
> barriers in patch #3 being excessive.
>
> Once that is sorted out to everyone's satisfaction, would you
> please repost this series with appropriate ACKs, reviewed-by's,
> tested-by's, etc. added?
>
> Thank you.


OK, I'll do that once everything is sorted.

Nicolas, were Claudiu's tests the ones you wanted to wait for or are we
waiting for more tests?

-- 
Anssi Hannula / Bitwise Oy
+358 503803997



Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers

2018-12-11 Thread Anssi Hannula
On 10.12.2018 12:34, claudiu.bez...@microchip.com wrote:
> On 07.12.2018 14:00, Anssi Hannula wrote:
>> On 6.12.2018 16:14, claudiu.bez...@microchip.com wrote:
>>> Hi Anssi,
>> Hi!
>>
>>> On 05.12.2018 16:00, Anssi Hannula wrote:
>>>> On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote:
>>>>> On 30.11.2018 20:21, Anssi Hannula wrote:
>>>>>> When reading buffer descriptors on RX or on TX completion, an
>>>>>> RX_USED/TX_USED bit is checked first to ensure that the descriptor has
>>>>>> been populated. However, there are no memory barriers to ensure that the
>>>>>> data protected by the RX_USED/TX_USED bit is up-to-date with respect to
>>>>>> that bit.
>>>>>>
>>>>>> Fix that by adding DMA read memory barriers on those paths.
>>>>>>
>>>>>> I did not observe any actual issues caused by these being missing,
>>>>>> though.
>>>>>>
>>>>>> Tested on a ZynqMP based system.
>>>>>>
>>>>>> Signed-off-by: Anssi Hannula 
>>>>>> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
>>>>>> Cc: Nicolas Ferre 
>>>>>> ---
>>>>>>  drivers/net/ethernet/cadence/macb_main.c | 20 
>>>>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c 
>>>>>> b/drivers/net/ethernet/cadence/macb_main.c
>>>>>> index 430b7a0f5436..c93baa8621d5 100644
>>>>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>>>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>>>>> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue 
>>>>>> *queue)
>>>>>>  
>>>>>>  /* First, update TX stats if needed */
>>>>>>  if (skb) {
>>>>>> +/* Ensure all of desc is at least as 
>>>>>> up-to-date
>>>>>> + * as ctrl (TX_USED bit)
>>>>>> + */
>>>>>> +dma_rmb();
>>>>>> +
>>>>> Is this necessary? Wouldn't previous rmb() take care of this? At this time
>>>>> data specific to this descriptor was completed. The TX descriptors for 
>>>>> next
>>>>> data to be send is updated under a locked spinlock.
>>>> The previous rmb() is before the TX_USED check, so my understanding is
>>>> that the following could happen in theory:
>>> We are using this IP on and ARM architecture, so, with regards to rmb(), I
>>> understand from [1] that dsb completes when:
>>> "All explicit memory accesses before this instruction complete.
>>> All Cache, Branch predictor and TLB maintenance operations before this
>>> instruction complete."
>>>
>>>> 1. rmb().
>>> According to [1] this should end after all previous instructions (loads,
>>> stores) ends.
>>>
>>>> 2. Reads are reordered so that TX timestamp is read first - no barriers
>>>> are crossed.
>>> But, as per [1], no onward instruction will be reached until all
>>> instruction prior to dsb ends, so, after rmb() all descriptor's members
>>> should be updated, right?
>> The descriptor that triggered the TX interrupt should be visible now, yes.
>> However, the controller may be writing to any other descriptors at the
>> same time as the loop is processing through them, as there are multiple
>> TX buffers.
> Yes, but there is the "rmb()" after "desc = macb_tx_desc(queue, tail);" at
> the beginning of loop intended for that... In the 2nd loop you are
> operating with the same descriptor which was read in the first loop.

I'm concerned about the 2nd iteration of the first for loop in
macb_tx_interrupt().
That operates on a different descriptor due to tail++, and that
different descriptor may have in-flight writes from controller as the
interrupt may or may not already be raised for it.

Or am I missing something?

>>>> 3. HW writes timestamp and sets TX_USED (or they become visible).
>>> I expect hardware to set TX_USED and timestamp before raising TX complete
>>> interrupt. If so, there should be no on-flight updates of this descrip

[PATCH 3/3 v2] net: macb: add missing barriers when reading descriptors

2018-12-12 Thread Anssi Hannula
When reading buffer descriptors on RX or on TX completion, an
RX_USED/TX_USED bit is checked first to ensure that the descriptors have
been populated, i.e. the ownership has been transferred. However, there
are no memory barriers to ensure that the data protected by the
RX_USED/TX_USED bit is up-to-date with respect to that bit.

Specifically:

- TX timestamp descriptors may be loaded before ctrl is loaded for the
  TX_USED check, which is racy as the descriptors may be updated between
  the loads, causing old timestamp descriptor data to be used.

- RX ctrl may be loaded before addr is loaded for the RX_USED check,
  which is racy as a new frame may be written between the loads, causing
  old ctrl descriptor data to be used.
  This issue exists for both macb_rx() and gem_rx() variants.

Fix the races by adding DMA read memory barriers on those paths and
reordering the reads in macb_rx().

I have not observed any actual problems in practice caused by these
being missing, though.

Tested on a ZynqMP based system.

Signed-off-by: Anssi Hannula 
Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
Cc: Nicolas Ferre 
---

The discussion is still ongoing on whether I have used the barriers
correctly here, but in the meantime, here's an updated version of the
patch.

v2:
- moved the timestamp protection barrier closer to the timestamp reads
- removed unnecessary move of the addr assignment in gem_rx() to keep
  the patch minimal for maximum clarity
- clarified commit message and comments


 drivers/net/ethernet/cadence/macb_main.c | 13 ++---
 drivers/net/ethernet/cadence/macb_ptp.c  |  2 ++
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index 430b7a0f5436..10a2bb44612b 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1001,11 +1001,15 @@ static int gem_rx(struct macb_queue *queue, int budget)
 
rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
addr = macb_get_addr(bp, desc);
-   ctrl = desc->ctrl;
 
if (!rxused)
break;
 
+   /* Ensure ctrl is at least as up-to-date as rxused */
+   dma_rmb();
+
+   ctrl = desc->ctrl;
+
queue->rx_tail++;
count++;
 
@@ -1180,11 +1184,14 @@ static int macb_rx(struct macb_queue *queue, int budget)
/* Make hw descriptor updates visible to CPU */
rmb();
 
-   ctrl = desc->ctrl;
-
if (!(desc->addr & MACB_BIT(RX_USED)))
break;
 
+   /* Ensure ctrl is at least as up-to-date as addr */
+   dma_rmb();
+
+   ctrl = desc->ctrl;
+
if (ctrl & MACB_BIT(RX_SOF)) {
if (first_frag != -1)
discard_partial_frame(queue, first_frag, tail);
diff --git a/drivers/net/ethernet/cadence/macb_ptp.c 
b/drivers/net/ethernet/cadence/macb_ptp.c
index cd5296b84229..a6dc47edc4cf 100644
--- a/drivers/net/ethernet/cadence/macb_ptp.c
+++ b/drivers/net/ethernet/cadence/macb_ptp.c
@@ -319,6 +319,8 @@ int gem_ptp_txstamp(struct macb_queue *queue, struct 
sk_buff *skb,
desc_ptp = macb_ptp_desc(queue->bp, desc);
tx_timestamp = &queue->tx_timestamps[head];
tx_timestamp->skb = skb;
+   /* ensure ts_1/ts_2 is loaded after ctrl (TX_USED check) */
+   dma_rmb();
tx_timestamp->desc_ptp.ts_1 = desc_ptp->ts_1;
tx_timestamp->desc_ptp.ts_2 = desc_ptp->ts_2;
/* move head */
-- 
2.17.2



Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers

2018-12-12 Thread Anssi Hannula
On 12.12.2018 12:58, claudiu.bez...@microchip.com wrote:
>
> On 11.12.2018 15:21, Anssi Hannula wrote:
>> On 10.12.2018 12:34, claudiu.bez...@microchip.com wrote:
>>> On 07.12.2018 14:00, Anssi Hannula wrote:
>>>> On 6.12.2018 16:14, claudiu.bez...@microchip.com wrote:
>>>>> Hi Anssi,
>>>> Hi!
>>>>
>>>>> On 05.12.2018 16:00, Anssi Hannula wrote:
>>>>>> On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote:
>>>>>>> On 30.11.2018 20:21, Anssi Hannula wrote:
>>>>>>>> When reading buffer descriptors on RX or on TX completion, an
>>>>>>>> RX_USED/TX_USED bit is checked first to ensure that the descriptor has
>>>>>>>> been populated. However, there are no memory barriers to ensure that 
>>>>>>>> the
>>>>>>>> data protected by the RX_USED/TX_USED bit is up-to-date with respect to
>>>>>>>> that bit.
>>>>>>>>
>>>>>>>> Fix that by adding DMA read memory barriers on those paths.
>>>>>>>>
>>>>>>>> I did not observe any actual issues caused by these being missing,
>>>>>>>> though.
>>>>>>>>
>>>>>>>> Tested on a ZynqMP based system.
>>>>>>>>
>>>>>>>> Signed-off-by: Anssi Hannula 
>>>>>>>> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
>>>>>>>> Cc: Nicolas Ferre 
>>>>>>>> ---
>>>>>>>>  drivers/net/ethernet/cadence/macb_main.c | 20 
>>>>>>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c 
>>>>>>>> b/drivers/net/ethernet/cadence/macb_main.c
>>>>>>>> index 430b7a0f5436..c93baa8621d5 100644
>>>>>>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>>>>>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>>>>>>> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue 
>>>>>>>> *queue)
>>>>>>>>  
>>>>>>>>/* First, update TX stats if needed */
>>>>>>>>if (skb) {
>>>>>>>> +  /* Ensure all of desc is at least as 
>>>>>>>> up-to-date
>>>>>>>> +   * as ctrl (TX_USED bit)
>>>>>>>> +   */
>>>>>>>> +  dma_rmb();
>>>>>>>> +
>>>>>>> Is this necessary? Wouldn't previous rmb() take care of this? At this 
>>>>>>> time
>>>>>>> data specific to this descriptor was completed. The TX descriptors for 
>>>>>>> next
>>>>>>> data to be send is updated under a locked spinlock.
>>>>>> The previous rmb() is before the TX_USED check, so my understanding is
>>>>>> that the following could happen in theory:
>>>>> We are using this IP on and ARM architecture, so, with regards to rmb(), I
>>>>> understand from [1] that dsb completes when:
>>>>> "All explicit memory accesses before this instruction complete.
>>>>> All Cache, Branch predictor and TLB maintenance operations before this
>>>>> instruction complete."
>>>>>
>>>>>> 1. rmb().
>>>>> According to [1] this should end after all previous instructions (loads,
>>>>> stores) ends.
>>>>>
>>>>>> 2. Reads are reordered so that TX timestamp is read first - no barriers
>>>>>> are crossed.
>>>>> But, as per [1], no onward instruction will be reached until all
>>>>> instruction prior to dsb ends, so, after rmb() all descriptor's members
>>>>> should be updated, right?
>>>> The descriptor that triggered the TX interrupt should be visible now, yes.
>>>> However, the controller may be writing to any other descriptors at the
>>>> same time as the loop is processing through them, as there are multiple
>>>> TX buffers.
>>> Yes, but there is the "rmb()" after "desc = macb_tx_desc(q

Re: [PATCH v3 net-next 04/12] net: ethernet: Use phy_set_max_speed() to limit advertised speed

2018-11-22 Thread Anssi Hannula
Hi,

On 12.9.2018 2:53, Andrew Lunn wrote:
> Many Ethernet MAC drivers want to limit the PHY to only advertise a
> maximum speed of 100Mbs or 1Gbps. Rather than using a mask, make use
> of the helper function phy_set_max_speed().

But what if the PHY does not support 1Gbps in the first place?

This now adds 1Gbps to ->supported and ->advertising even on PHYs that
do not support 1Gbps or have the max speed limited to e.g. 100M via the
"max-speed" device tree property, breaking things.

Unless I'm missing something on how this is supposed to work now :)

> Signed-off-by: Andrew Lunn 
> Reviewed-by: Florian Fainelli 
> ---
>  drivers/net/ethernet/8390/ax88796.c   |  4 +---
>  drivers/net/ethernet/aeroflex/greth.c |  4 ++--
>  drivers/net/ethernet/agere/et131x.c   | 12 ++--
>  drivers/net/ethernet/allwinner/sun4i-emac.c   |  3 +--
>  drivers/net/ethernet/altera/altera_tse_main.c |  5 +
>  drivers/net/ethernet/amd/au1000_eth.c | 12 +---
>  drivers/net/ethernet/broadcom/bcm63xx_enet.c  | 10 ++
>  drivers/net/ethernet/broadcom/genet/bcmmii.c  |  2 +-
>  drivers/net/ethernet/broadcom/sb1250-mac.c| 11 ++-
>  drivers/net/ethernet/broadcom/tg3.c   |  8 
>  drivers/net/ethernet/cadence/macb_main.c  |  4 ++--
>  drivers/net/ethernet/cortina/gemini.c |  2 +-
>  drivers/net/ethernet/dnet.c   |  4 ++--
>  drivers/net/ethernet/ethoc.c  |  5 +
>  drivers/net/ethernet/freescale/fec_main.c |  4 ++--
>  drivers/net/ethernet/freescale/ucc_geth.c |  7 +--
>  drivers/net/ethernet/lantiq_etop.c| 11 ++-
>  drivers/net/ethernet/mediatek/mtk_eth_soc.c   |  4 ++--
>  drivers/net/ethernet/nxp/lpc_eth.c|  3 +--
>  drivers/net/ethernet/rdc/r6040.c  | 12 ++--
>  drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c   |  4 ++--
>  drivers/net/ethernet/smsc/smsc911x.c  |  5 +++--
>  drivers/net/ethernet/smsc/smsc9420.c  |  5 +++--
>  drivers/net/ethernet/socionext/sni_ave.c  |  6 ++
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |  3 +--
>  drivers/net/ethernet/toshiba/tc35815.c|  2 +-
>  drivers/net/ethernet/xilinx/xilinx_emaclite.c |  3 +--
>  drivers/staging/mt7621-eth/mdio.c |  2 +-
>  28 files changed, 47 insertions(+), 110 deletions(-)
[...]
> diff --git a/drivers/net/ethernet/cadence/macb_main.c 
> b/drivers/net/ethernet/cadence/macb_main.c
> index 16e4ef7d7185..bd4095c3a031 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -544,9 +544,9 @@ static int macb_mii_probe(struct net_device *dev)
>  
>   /* mask with MAC supported features */
>   if (macb_is_gem(bp) && bp->caps & MACB_CAPS_GIGABIT_MODE_AVAILABLE)
> - phydev->supported &= PHY_GBIT_FEATURES;
> + phy_set_max_speed(phydev, SPEED_1000);
>   else
> - phydev->supported &= PHY_BASIC_FEATURES;
> + phy_set_max_speed(phydev, SPEED_100);
>  
>   if (bp->caps & MACB_CAPS_NO_GIGABIT_HALF)
>   phydev->supported &= ~SUPPORTED_1000baseT_Half;

[...]

-- 
Anssi Hannula / Bitwise Oy



[PATCH] net: macb: do not disable MDIO bus when closing interface

2018-08-08 Thread Anssi Hannula
macb_close() calls macb_reset_hw() which zeroes NCR register, including
the MPE (Management Port Enable) bit.

This will prevent accessing any other PHYs for other Ethernet MACs on
the MDIO bus which is still registered.

Fix that by keeping the MPE bit set.

Signed-off-by: Anssi Hannula 
---
 drivers/net/ethernet/cadence/macb_main.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index dc09f9a8a49b..3ca98fc32144 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2030,12 +2030,13 @@ static void macb_reset_hw(struct macb *bp)
unsigned int q;
 
/* Disable RX and TX (XXX: Should we halt the transmission
-* more gracefully?)
+* more gracefully?) but keep management port open since there
+* may be other users of the mdio bus
 */
-   macb_writel(bp, NCR, 0);
+   macb_writel(bp, NCR, MACB_BIT(MPE));
 
/* Clear the stats registers (XXX: Update stats first?) */
-   macb_writel(bp, NCR, MACB_BIT(CLRSTAT));
+   macb_writel(bp, NCR, MACB_BIT(CLRSTAT) | MACB_BIT(MPE));
 
/* Clear all status flags */
macb_writel(bp, TSR, -1);
-- 
2.16.3



Re: [PATCH] net: macb: do not disable MDIO bus when closing interface

2018-08-09 Thread Anssi Hannula
On 9.8.2018 11:26, Claudiu Beznea wrote:
> On 08.08.2018 15:19, Anssi Hannula wrote:
>> macb_close() calls macb_reset_hw() which zeroes NCR register, including
>> the MPE (Management Port Enable) bit.
>>
>> This will prevent accessing any other PHYs for other Ethernet MACs on
>> the MDIO bus which is still registered.
>>
>> Fix that by keeping the MPE bit set.
>>
>> Signed-off-by: Anssi Hannula 
>> ---
>>  drivers/net/ethernet/cadence/macb_main.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c 
>> b/drivers/net/ethernet/cadence/macb_main.c
>> index dc09f9a8a49b..3ca98fc32144 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -2030,12 +2030,13 @@ static void macb_reset_hw(struct macb *bp)
>>  unsigned int q;
>>  
>>  /* Disable RX and TX (XXX: Should we halt the transmission
>> - * more gracefully?)
>> + * more gracefully?) but keep management port open since there
>> + * may be other users of the mdio bus
>>   */
>> -macb_writel(bp, NCR, 0);
>> +macb_writel(bp, NCR, MACB_BIT(MPE));
> Would be better to read the NCR and clear only RX and TX bits, something like:
>   val = macb_readl(bp, NCR);
>
>   /* Disable TX and RX. */
>   val &= ~(MACB_BIT(TE) | MACB_BIT(RE));
>
>   /* Clear statistics */
>   val |= MACB_BIT(CLRSTAT);
>
>   macb_writel(bp, NCR, val);
>
> MPE should have been enabled by previous operations.

macb_reset_hw() is called in init path too, though, so maybe clearing
all bits is intentional / wanted to get the controller to a known state,
even though the comment only mentions TX/RX?

If so, maybe the below would be better? But if you think just clearing
TE/RE is better I'll just do it like you suggested.

    val = macb_readl(bp, NCR);

    /* Keep only MDIO port active */
    val &= MACB_BIT(MPE);

    /* Clear statistics */
    val |= MACB_BIT(CLRSTAT);

    macb_writel(bp, NCR, val);



> Thank you,
> Claudiu Beznea
>
>>  
>>  /* Clear the stats registers (XXX: Update stats first?) */
>> -macb_writel(bp, NCR, MACB_BIT(CLRSTAT));
>> +macb_writel(bp, NCR, MACB_BIT(CLRSTAT) | MACB_BIT(MPE));
>>  
>>  /* Clear all status flags */
>>  macb_writel(bp, TSR, -1);
>>

-- 
Anssi Hannula / Bitwise Oy
+358 503803997



Re: [PATCH] net: macb: do not disable MDIO bus when closing interface

2018-08-09 Thread Anssi Hannula
On 9.8.2018 18:14, Andrew Lunn wrote:
> Hi Anssi

Hi!

>> macb_reset_hw() is called in init path too, though, so maybe clearing
>> all bits is intentional / wanted to get the controller to a known state,
>> even though the comment only mentions TX/RX?
> You need to be careful here. Once of_mdiobus_register() is called, the
> MDIO should be usable. If you happen to have an Ethernet switch on the
> bus, it could be probed then. The DSA driver will start using the bus.
> Or if you have a second PHY, connected to some other MAC, it could be
> used by the other MAC.  This all happens in the macb_probe function.
>
> Sometime later, the interface will be up'ed. At this point macb_open()
> is called, which calls macb_init_hw(), which calls
> macb_reset_hw(). What you don't want happening is changes to the NCR
> at this point breaking an MDIO transaction which might be going on.
>
> Ideally, the MPE should be enabled before of_mdiobus_register(), and
> left alone until mdiobus_unregister() is called in macb_remove().

Yep, fixing the use case of having PHYs of other MACs is why I wrote the
patch :)

Currently the reset code disables MPE while other MACs are using PHYs on
the bus.

-- 
Anssi Hannula / Bitwise Oy
+358 503803997



[PATCH] net: macb: do not disable MDIO bus at open/close time

2018-08-20 Thread Anssi Hannula
macb_reset_hw() is called from macb_close() and indirectly from
macb_open(). macb_reset_hw() zeroes the NCR register, including the MPE
(Management Port Enable) bit.

This will prevent accessing any other PHYs for other Ethernet MACs on
the MDIO bus, which remains registered at macb_reset_hw() time, until
macb_init_hw() is called from macb_open() which sets the MPE bit again.

I.e. currently the MDIO bus has a short disruption at open time and is
disabled at close time until the interface is opened again.

Fix that by only touching the RE and TE bits when enabling and disabling
RX/TX.

Fixes: 6c36a7074436 ("macb: Use generic PHY layer")
Signed-off-by: Anssi Hannula 
---

Claudiu Beznea wrote:
> On 10.08.2018 09:22, Anssi Hannula wrote:
>>
>> macb_reset_hw() is called in init path too,
>
> I only see it in macb_close() and macb_open() called from macb_init_hw().

Yeah, macb_init_hw() is what I meant :)


 drivers/net/ethernet/cadence/macb_main.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index dc09f9a8a49b..6501e9b3785a 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2028,14 +2028,17 @@ static void macb_reset_hw(struct macb *bp)
 {
struct macb_queue *queue;
unsigned int q;
+   u32 ctrl = macb_readl(bp, NCR);
 
/* Disable RX and TX (XXX: Should we halt the transmission
 * more gracefully?)
 */
-   macb_writel(bp, NCR, 0);
+   ctrl &= ~(MACB_BIT(RE) | MACB_BIT(TE));
 
/* Clear the stats registers (XXX: Update stats first?) */
-   macb_writel(bp, NCR, MACB_BIT(CLRSTAT));
+   ctrl |= MACB_BIT(CLRSTAT);
+
+   macb_writel(bp, NCR, ctrl);
 
/* Clear all status flags */
macb_writel(bp, TSR, -1);
@@ -2170,6 +2173,7 @@ static void macb_init_hw(struct macb *bp)
unsigned int q;
 
u32 config;
+   u32 ctrl;
 
macb_reset_hw(bp);
macb_set_hwaddr(bp);
@@ -2223,7 +2227,9 @@ static void macb_init_hw(struct macb *bp)
}
 
/* Enable TX and RX */
-   macb_writel(bp, NCR, MACB_BIT(RE) | MACB_BIT(TE) | MACB_BIT(MPE));
+   ctrl = macb_readl(bp, NCR);
+   ctrl |= MACB_BIT(RE) | MACB_BIT(TE);
+   macb_writel(bp, NCR, ctrl);
 }
 
 /* The hash address register is 64 bits long and takes up two
-- 
2.17.1



[PATCH v2] net: macb: do not disable MDIO bus at open/close time

2018-08-23 Thread Anssi Hannula
macb_reset_hw() is called from macb_close() and indirectly from
macb_open(). macb_reset_hw() zeroes the NCR register, including the MPE
(Management Port Enable) bit.

This will prevent accessing any other PHYs for other Ethernet MACs on
the MDIO bus, which remains registered at macb_reset_hw() time, until
macb_init_hw() is called from macb_open() which sets the MPE bit again.

I.e. currently the MDIO bus has a short disruption at open time and is
disabled at close time until the interface is opened again.

Fix that by only touching the RE and TE bits when enabling and disabling
RX/TX.

v2: Make macb_init_hw() NCR write a single statement.

Fixes: 6c36a7074436 ("macb: Use generic PHY layer")
Signed-off-by: Anssi Hannula 
---
 drivers/net/ethernet/cadence/macb_main.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index dc09f9a8a49b..225a7c8bad2d 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2028,14 +2028,17 @@ static void macb_reset_hw(struct macb *bp)
 {
struct macb_queue *queue;
unsigned int q;
+   u32 ctrl = macb_readl(bp, NCR);
 
/* Disable RX and TX (XXX: Should we halt the transmission
 * more gracefully?)
 */
-   macb_writel(bp, NCR, 0);
+   ctrl &= ~(MACB_BIT(RE) | MACB_BIT(TE));
 
/* Clear the stats registers (XXX: Update stats first?) */
-   macb_writel(bp, NCR, MACB_BIT(CLRSTAT));
+   ctrl |= MACB_BIT(CLRSTAT);
+
+   macb_writel(bp, NCR, ctrl);
 
/* Clear all status flags */
macb_writel(bp, TSR, -1);
@@ -2223,7 +2226,7 @@ static void macb_init_hw(struct macb *bp)
}
 
/* Enable TX and RX */
-   macb_writel(bp, NCR, MACB_BIT(RE) | MACB_BIT(TE) | MACB_BIT(MPE));
+   macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(RE) | MACB_BIT(TE));
 }
 
 /* The hash address register is 64 bits long and takes up two
-- 
2.17.1



[PATCH 1/2] net: xilinx_emaclite: fix receive buffer overflow

2017-02-14 Thread Anssi Hannula
xilinx_emaclite looks at the received data to try to determine the
Ethernet packet length but does not properly clamp it if
proto_type == ETH_P_IP or 1500 < proto_type <= 1518, causing a buffer
overflow and a panic via skb_panic() as the length exceeds the allocated
skb size.

Fix those cases.

Also add an additional unconditional check with WARN_ON() at the end.

Signed-off-by: Anssi Hannula 
Fixes: bb81b2ddfa19 ("net: add Xilinx emac lite device driver")
---

 drivers/net/ethernet/xilinx/xilinx_emaclite.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c 
b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
index e3070fd8..bd6470f 100644
--- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c
+++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
@@ -369,7 +369,7 @@ static int xemaclite_send_data(struct net_local *drvdata, 
u8 *data,
  *
  * Return: Total number of bytes received
  */
-static u16 xemaclite_recv_data(struct net_local *drvdata, u8 *data)
+static u16 xemaclite_recv_data(struct net_local *drvdata, u8 *data, int maxlen)
 {
void __iomem *addr;
u16 length, proto_type;
@@ -409,7 +409,7 @@ static u16 xemaclite_recv_data(struct net_local *drvdata, 
u8 *data)
 
/* Check if received ethernet frame is a raw ethernet frame
 * or an IP packet or an ARP packet */
-   if (proto_type > (ETH_FRAME_LEN + ETH_FCS_LEN)) {
+   if (proto_type > ETH_DATA_LEN) {
 
if (proto_type == ETH_P_IP) {
length = ((ntohl(__raw_readl(addr +
@@ -417,6 +417,7 @@ static u16 xemaclite_recv_data(struct net_local *drvdata, 
u8 *data)
XEL_RXBUFF_OFFSET)) >>
XEL_HEADER_SHIFT) &
XEL_RPLR_LENGTH_MASK);
+   length = min_t(u16, length, ETH_DATA_LEN);
length += ETH_HLEN + ETH_FCS_LEN;
 
} else if (proto_type == ETH_P_ARP)
@@ -429,6 +430,9 @@ static u16 xemaclite_recv_data(struct net_local *drvdata, 
u8 *data)
/* Use the length in the frame, plus the header and trailer */
length = proto_type + ETH_HLEN + ETH_FCS_LEN;
 
+   if (WARN_ON(length > maxlen))
+   length = maxlen;
+
/* Read from the EmacLite device */
xemaclite_aligned_read((u32 __force *) (addr + XEL_RXBUFF_OFFSET),
data, length);
@@ -603,7 +607,7 @@ static void xemaclite_rx_handler(struct net_device *dev)
 
skb_reserve(skb, 2);
 
-   len = xemaclite_recv_data(lp, (u8 *) skb->data);
+   len = xemaclite_recv_data(lp, (u8 *) skb->data, len);
 
if (!len) {
dev->stats.rx_errors++;
-- 
2.8.3



[PATCH 2/2] net: xilinx_emaclite: fix freezes due to unordered I/O

2017-02-14 Thread Anssi Hannula
The xilinx_emaclite uses __raw_writel and __raw_readl for register
accesses. Those functions do not imply any kind of memory barriers and
they may be reordered.

The driver does not seem to take that into account, though, and the
driver does not satisfy the ordering requirements of the hardware.
For clear examples, see xemaclite_mdio_write() and xemaclite_mdio_read()
which try to set MDIO address before initiating the transaction.

I'm seeing system freezes with the driver with GCC 5.4 and current
Linux kernels on Zynq-7000 SoC immediately when trying to use the
interface.

In commit 123c1407af87 ("net: emaclite: Do not use microblaze and ppc
IO functions") the driver was switched from non-generic
in_be32/out_be32 (memory barriers, big endian) to
__raw_readl/__raw_writel (no memory barriers, native endian), so
apparently the device follows system endianness and the driver was
originally written with the assumption of memory barriers.

Rather than try to hunt for each case of missing barrier, just switch
the driver to use iowrite32/ioread32/iowrite32be/ioread32be depending
on endianness instead.

Tested on little-endian Zynq-7000 ARM SoC FPGA.

Signed-off-by: Anssi Hannula 
Fixes: 123c1407af87 ("net: emaclite: Do not use microblaze and ppc IO
functions")

---

Not sure what would be the correct way to fix this. This way seems to
work, at least, but maybe something else would be better?

 drivers/net/ethernet/xilinx/xilinx_emaclite.c | 116 ++
 1 file changed, 62 insertions(+), 54 deletions(-)

diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c 
b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
index bd6470f..69e31ce 100644
--- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c
+++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
@@ -100,6 +100,14 @@
 /* BUFFER_ALIGN(adr) calculates the number of bytes to the next alignment. */
 #define BUFFER_ALIGN(adr) ((ALIGNMENT - ((u32) adr)) % ALIGNMENT)
 
+#ifdef __BIG_ENDIAN
+#define xemaclite_readlioread32be
+#define xemaclite_writel   iowrite32be
+#else
+#define xemaclite_readlioread32
+#define xemaclite_writel   iowrite32
+#endif
+
 /**
  * struct net_local - Our private per device data
  * @ndev:  instance of the network device
@@ -156,15 +164,15 @@ static void xemaclite_enable_interrupts(struct net_local 
*drvdata)
u32 reg_data;
 
/* Enable the Tx interrupts for the first Buffer */
-   reg_data = __raw_readl(drvdata->base_addr + XEL_TSR_OFFSET);
-   __raw_writel(reg_data | XEL_TSR_XMIT_IE_MASK,
-drvdata->base_addr + XEL_TSR_OFFSET);
+   reg_data = xemaclite_readl(drvdata->base_addr + XEL_TSR_OFFSET);
+   xemaclite_writel(reg_data | XEL_TSR_XMIT_IE_MASK,
+drvdata->base_addr + XEL_TSR_OFFSET);
 
/* Enable the Rx interrupts for the first buffer */
-   __raw_writel(XEL_RSR_RECV_IE_MASK, drvdata->base_addr + XEL_RSR_OFFSET);
+   xemaclite_writel(XEL_RSR_RECV_IE_MASK, drvdata->base_addr + 
XEL_RSR_OFFSET);
 
/* Enable the Global Interrupt Enable */
-   __raw_writel(XEL_GIER_GIE_MASK, drvdata->base_addr + XEL_GIER_OFFSET);
+   xemaclite_writel(XEL_GIER_GIE_MASK, drvdata->base_addr + 
XEL_GIER_OFFSET);
 }
 
 /**
@@ -179,17 +187,17 @@ static void xemaclite_disable_interrupts(struct net_local 
*drvdata)
u32 reg_data;
 
/* Disable the Global Interrupt Enable */
-   __raw_writel(XEL_GIER_GIE_MASK, drvdata->base_addr + XEL_GIER_OFFSET);
+   xemaclite_writel(XEL_GIER_GIE_MASK, drvdata->base_addr + 
XEL_GIER_OFFSET);
 
/* Disable the Tx interrupts for the first buffer */
-   reg_data = __raw_readl(drvdata->base_addr + XEL_TSR_OFFSET);
-   __raw_writel(reg_data & (~XEL_TSR_XMIT_IE_MASK),
-drvdata->base_addr + XEL_TSR_OFFSET);
+   reg_data = xemaclite_readl(drvdata->base_addr + XEL_TSR_OFFSET);
+   xemaclite_writel(reg_data & (~XEL_TSR_XMIT_IE_MASK),
+drvdata->base_addr + XEL_TSR_OFFSET);
 
/* Disable the Rx interrupts for the first buffer */
-   reg_data = __raw_readl(drvdata->base_addr + XEL_RSR_OFFSET);
-   __raw_writel(reg_data & (~XEL_RSR_RECV_IE_MASK),
-drvdata->base_addr + XEL_RSR_OFFSET);
+   reg_data = xemaclite_readl(drvdata->base_addr + XEL_RSR_OFFSET);
+   xemaclite_writel(reg_data & (~XEL_RSR_RECV_IE_MASK),
+drvdata->base_addr + XEL_RSR_OFFSET);
 }
 
 /**
@@ -321,7 +329,7 @@ static int xemaclite_send_data(struct net_local *drvdata, 
u8 *data,
byte_count = ETH_FRAME_LEN;
 
/* Check if the expected buffer is available */
-   reg_data = __raw_readl(addr + XEL_TSR_OFFSET);
+   reg_data = xemaclite_readl(addr + XEL_TSR_OFFSET);
if ((reg_data & (XEL_TSR_XMIT_BUSY_MASK |
 XEL_T

Re: [PATCH 1/2] net: xilinx_emaclite: fix receive buffer overflow

2017-02-15 Thread Anssi Hannula
On 14.2.2017 22:12, David Miller wrote:
> From: Anssi Hannula 
> Date: Tue, 14 Feb 2017 19:11:44 +0200
>
>> xilinx_emaclite looks at the received data to try to determine the
>> Ethernet packet length but does not properly clamp it if
>> proto_type == ETH_P_IP or 1500 < proto_type <= 1518, causing a buffer
>> overflow and a panic via skb_panic() as the length exceeds the allocated
>> skb size.
>>
>> Fix those cases.
>>
>> Also add an additional unconditional check with WARN_ON() at the end.
>>
>> Signed-off-by: Anssi Hannula 
>> Fixes: bb81b2ddfa19 ("net: add Xilinx emac lite device driver")
> Why does this driver do all of this crazy stuff parsing the packet
> headers?
>
> It should be able to just read the length provided by the device
> at XEL_RPLR_OFFSET and just use that.

Looks like XEL_RPLR_OFFSET == XEL_HEADER_OFFSET + XEL_RXBUFF_OFFSET and
that is where the driver reads the on-wire Type/Length field.

Looking through the product guide [1] I don't see the actual receive
packet length provided anywhere, so I guess that is why the crazy stuff
is done.

[1]
https://www.xilinx.com/support/documentation/ip_documentation/axi_ethernetlite/v3_0/pg135-axi-ethernetlite.pdf

-- 
Anssi Hannula / Bitwise Oy
+358503803997



[PATCH v2 3/3] net: macb: add missing barriers when reading descriptors

2018-12-17 Thread Anssi Hannula
When reading buffer descriptors on RX or on TX completion, an
RX_USED/TX_USED bit is checked first to ensure that the descriptors have
been populated, i.e. the ownership has been transferred. However, there
are no memory barriers to ensure that the data protected by the
RX_USED/TX_USED bit is up-to-date with respect to that bit.

Specifically:

- TX timestamp descriptors may be loaded before ctrl is loaded for the
  TX_USED check, which is racy as the descriptors may be updated between
  the loads, causing old timestamp descriptor data to be used.

- RX ctrl may be loaded before addr is loaded for the RX_USED check,
  which is racy as a new frame may be written between the loads, causing
  old ctrl descriptor data to be used.
  This issue exists for both macb_rx() and gem_rx() variants.

Fix the races by adding DMA read memory barriers on those paths and
reordering the reads in macb_rx().

I have not observed any actual problems in practice caused by these
being missing, though.

Tested on a ZynqMP based system.

Signed-off-by: Anssi Hannula 
Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
Cc: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb_main.c | 13 ++---
 drivers/net/ethernet/cadence/macb_ptp.c  |  2 ++
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index 430b7a0f5436..10a2bb44612b 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1001,11 +1001,15 @@ static int gem_rx(struct macb_queue *queue, int budget)
 
rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false;
addr = macb_get_addr(bp, desc);
-   ctrl = desc->ctrl;
 
if (!rxused)
break;
 
+   /* Ensure ctrl is at least as up-to-date as rxused */
+   dma_rmb();
+
+   ctrl = desc->ctrl;
+
queue->rx_tail++;
count++;
 
@@ -1180,11 +1184,14 @@ static int macb_rx(struct macb_queue *queue, int budget)
/* Make hw descriptor updates visible to CPU */
rmb();
 
-   ctrl = desc->ctrl;
-
if (!(desc->addr & MACB_BIT(RX_USED)))
break;
 
+   /* Ensure ctrl is at least as up-to-date as addr */
+   dma_rmb();
+
+   ctrl = desc->ctrl;
+
if (ctrl & MACB_BIT(RX_SOF)) {
if (first_frag != -1)
discard_partial_frame(queue, first_frag, tail);
diff --git a/drivers/net/ethernet/cadence/macb_ptp.c 
b/drivers/net/ethernet/cadence/macb_ptp.c
index cd5296b84229..a6dc47edc4cf 100644
--- a/drivers/net/ethernet/cadence/macb_ptp.c
+++ b/drivers/net/ethernet/cadence/macb_ptp.c
@@ -319,6 +319,8 @@ int gem_ptp_txstamp(struct macb_queue *queue, struct 
sk_buff *skb,
desc_ptp = macb_ptp_desc(queue->bp, desc);
tx_timestamp = &queue->tx_timestamps[head];
tx_timestamp->skb = skb;
+   /* ensure ts_1/ts_2 is loaded after ctrl (TX_USED check) */
+   dma_rmb();
tx_timestamp->desc_ptp.ts_1 = desc_ptp->ts_1;
tx_timestamp->desc_ptp.ts_2 = desc_ptp->ts_2;
/* move head */
-- 
2.17.2



[PATCH v2 1/3] net: macb: fix random memory corruption on RX with 64-bit DMA

2018-12-17 Thread Anssi Hannula
64-bit DMA addresses are split in upper and lower halves that are
written in separate fields on GEM. For RX, bit 0 of the address is used
as the ownership bit (RX_USED). When the RX_USED bit is unset the
controller is allowed to write data to the buffer.

The driver does not guarantee that the controller already sees the upper
half when the RX_USED bit is cleared, possibly resulting in the
controller writing an incoming frame to an address with an incorrect
upper half and therefore possibly corrupting unrelated system memory.

Fix that by adding the necessary DMA memory barrier between the writes.

This corruption was observed on a ZynqMP based system.

Signed-off-by: Anssi Hannula 
Fixes: fff8019a08b6 ("net: macb: Add 64 bit addressing support for GEM")
Acked-by: Harini Katakam 
Tested-by: Claudiu Beznea 
Cc: Nicolas Ferre 
Cc: Michal Simek 
---
 drivers/net/ethernet/cadence/macb_main.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index d8c7ca037ae3..0bc2aab7be40 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -682,6 +682,11 @@ static void macb_set_addr(struct macb *bp, struct 
macb_dma_desc *desc, dma_addr_
if (bp->hw_dma_cap & HW_DMA_CAP_64B) {
desc_64 = macb_64b_desc(bp, desc);
desc_64->addrh = upper_32_bits(addr);
+   /* The low bits of RX address contain the RX_USED bit, clearing
+* of which allows packet RX. Make sure the high bits are also
+* visible to HW at that point.
+*/
+   dma_wmb();
}
 #endif
desc->addr = lower_32_bits(addr);
-- 
2.17.2



[PATCH v2 2/3] net: macb: fix dropped RX frames due to a race

2018-12-17 Thread Anssi Hannula
Bit RX_USED set to 0 in the address field allows the controller to write
data to the receive buffer descriptor.

The driver does not ensure the ctrl field is ready (cleared) when the
controller sees the RX_USED=0 written by the driver. The ctrl field might
only be cleared after the controller has already updated it according to
a newly received frame, causing the frame to be discarded in gem_rx() due
to unexpected ctrl field contents.

A message is logged when the above scenario occurs:

  macb ff0b.ethernet eth0: not whole frame pointed by descriptor

Fix the issue by ensuring that when the controller sees RX_USED=0 the
ctrl field is already cleared.

This issue was observed on a ZynqMP based system.

Signed-off-by: Anssi Hannula 
Fixes: 4df95131ea80 ("net/macb: change RX path for GEM")
Tested-by: Claudiu Beznea 
Cc: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb_main.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index 0bc2aab7be40..430b7a0f5436 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -935,14 +935,19 @@ static void gem_rx_refill(struct macb_queue *queue)
 
if (entry == bp->rx_ring_size - 1)
paddr |= MACB_BIT(RX_WRAP);
-   macb_set_addr(bp, desc, paddr);
desc->ctrl = 0;
+   /* Setting addr clears RX_USED and allows reception,
+* make sure ctrl is cleared first to avoid a race.
+*/
+   dma_wmb();
+   macb_set_addr(bp, desc, paddr);
 
/* properly align Ethernet header */
skb_reserve(skb, NET_IP_ALIGN);
} else {
-   desc->addr &= ~MACB_BIT(RX_USED);
desc->ctrl = 0;
+   dma_wmb();
+   desc->addr &= ~MACB_BIT(RX_USED);
}
}
 
-- 
2.17.2



[PATCH v2 0/3] net: macb: DMA race condition fixes

2018-12-17 Thread Anssi Hannula
Hi all,

Here are a couple of race condition fixes for the macb driver. The first
two are for issues observed at runtime on real HW.

v2:
- added received Tested-bys and Acked-bys to the first two patches
- in patch 3/3, moved the timestamp protection barrier closer to the
  timestamp reads
- in patch 3/3, removed unnecessary move of the addr assignment in
  gem_rx() to keep the patch minimal for maximum clarity
- in patch 3/3, clarified commit message and comments

The 3/3 is the same one I improperly sent last week as a standalone
patch.

Anssi Hannula (3):
  net: macb: fix random memory corruption on RX with 64-bit DMA
  net: macb: fix dropped RX frames due to a race
  net: macb: add missing barriers when reading descriptors

 drivers/net/ethernet/cadence/macb_main.c | 27 +++-
 drivers/net/ethernet/cadence/macb_ptp.c  |  2 ++
 2 files changed, 24 insertions(+), 5 deletions(-)

-- 
Anssi Hannula / Bitwise Oy