Re: [PATCH net-next] r8169: add module param for control of ASPM disable

2018-02-01 Thread Francois Romieu
net-next is closed anyway.

-- 
Ueimor


Re: [PATCH net-next] r8169: add module param for control of ASPM disable

2018-02-01 Thread Francois Romieu
net-next is closed anyway.

-- 
Ueimor


Re: [PATCH net-next] r8169: add module param for control of ASPM disable

2018-02-01 Thread Francois Romieu
Chunhao Lin  :
[...]
> @@ -5878,6 +5881,20 @@ static void rtl_pcie_state_l2l3_enable(struct 
> rtl8169_private *tp, bool enable)
>   RTL_W8(Config3, data);
>  }
>  
> +static void rtl_hw_internal_aspm_clkreq_enable(struct rtl8169_private *tp,
> + bool enable)
> +{
> + void __iomem *ioaddr = tp->mmio_addr;
> +
> + if (enable) {
> + RTL_W8(Config2, RTL_R8(Config2) | ClkReqEn);
> + RTL_W8(Config5, RTL_R8(Config5) | ASPM_en);
> + } else {
> + RTL_W8(Config2, RTL_R8(Config2) & ~ClkReqEn);
> + RTL_W8(Config5, RTL_R8(Config5) & ~ASPM_en);
> + }
> +}

s/enable(..., false)/disable()/

static void rtl_hw_internal_aspm_clkreq_enable(truct rtl8169_private *tp)
{
void __iomem *ioaddr = tp->mmio_addr;

RTL_W8(Config2, RTL_R8(Config2) | ClkReqEn);
RTL_W8(Config5, RTL_R8(Config5) | ASPM_en);
}

static void rtl_hw_internal_aspm_clkreq_disable(truct rtl8169_private *tp)
{
void __iomem *ioaddr = tp->mmio_addr;

RTL_W8(Config2, RTL_R8(Config2) & ~ClkReqEn);
RTL_W8(Config5, RTL_R8(Config5) & ~ASPM_en);
}

If you really want to factor something out, you may use helpers that set
or clear bits according to the 3-uple (tp, register, bits) but
foo_enable(..., false) is pointlessly convoluted.

-- 
Ueimor


Re: [PATCH net-next] r8169: add module param for control of ASPM disable

2018-02-01 Thread Francois Romieu
Chunhao Lin  :
[...]
> @@ -5878,6 +5881,20 @@ static void rtl_pcie_state_l2l3_enable(struct 
> rtl8169_private *tp, bool enable)
>   RTL_W8(Config3, data);
>  }
>  
> +static void rtl_hw_internal_aspm_clkreq_enable(struct rtl8169_private *tp,
> + bool enable)
> +{
> + void __iomem *ioaddr = tp->mmio_addr;
> +
> + if (enable) {
> + RTL_W8(Config2, RTL_R8(Config2) | ClkReqEn);
> + RTL_W8(Config5, RTL_R8(Config5) | ASPM_en);
> + } else {
> + RTL_W8(Config2, RTL_R8(Config2) & ~ClkReqEn);
> + RTL_W8(Config5, RTL_R8(Config5) & ~ASPM_en);
> + }
> +}

s/enable(..., false)/disable()/

static void rtl_hw_internal_aspm_clkreq_enable(truct rtl8169_private *tp)
{
void __iomem *ioaddr = tp->mmio_addr;

RTL_W8(Config2, RTL_R8(Config2) | ClkReqEn);
RTL_W8(Config5, RTL_R8(Config5) | ASPM_en);
}

static void rtl_hw_internal_aspm_clkreq_disable(truct rtl8169_private *tp)
{
void __iomem *ioaddr = tp->mmio_addr;

RTL_W8(Config2, RTL_R8(Config2) & ~ClkReqEn);
RTL_W8(Config5, RTL_R8(Config5) & ~ASPM_en);
}

If you really want to factor something out, you may use helpers that set
or clear bits according to the 3-uple (tp, register, bits) but
foo_enable(..., false) is pointlessly convoluted.

-- 
Ueimor


Re: net: r8169: a question of memory barrier in the r8169 driver

2018-01-19 Thread Francois Romieu
Peter Zijlstra <pet...@infradead.org> :
> On Fri, Jan 19, 2018 at 02:11:18AM +0100, Francois Romieu wrote:
> > Peter Zijlstra <pet...@infradead.org> :
> > [...]
> > > There is only 1 variable afaict. Memory barriers need at least 2 in
> > > order to be able to do _anything_.
> > 
> > I don't get your point: why don't {cur_tx, dirty_tx} qualify as said
> > two variables ?
> 
> There wasn't any cur_tx in the code you provided.

/* A skbuff with nr_frags needs nr_frags+1 entries in the tx queue */
#define TX_FRAGS_READY_FOR(tp,nr_frags) \
(TX_SLOTS_AVAIL(tp) >= (nr_frags + 1))

#define TX_SLOTS_AVAIL(tp) \
(tp->dirty_tx + NUM_TX_DESC - tp->cur_tx)

Both are also used in rtl_tx.

I don't get your point. Even a single variable is scattered through
the system.

-- 
Ueimor


Re: net: r8169: a question of memory barrier in the r8169 driver

2018-01-19 Thread Francois Romieu
Peter Zijlstra  :
> On Fri, Jan 19, 2018 at 02:11:18AM +0100, Francois Romieu wrote:
> > Peter Zijlstra  :
> > [...]
> > > There is only 1 variable afaict. Memory barriers need at least 2 in
> > > order to be able to do _anything_.
> > 
> > I don't get your point: why don't {cur_tx, dirty_tx} qualify as said
> > two variables ?
> 
> There wasn't any cur_tx in the code you provided.

/* A skbuff with nr_frags needs nr_frags+1 entries in the tx queue */
#define TX_FRAGS_READY_FOR(tp,nr_frags) \
(TX_SLOTS_AVAIL(tp) >= (nr_frags + 1))

#define TX_SLOTS_AVAIL(tp) \
(tp->dirty_tx + NUM_TX_DESC - tp->cur_tx)

Both are also used in rtl_tx.

I don't get your point. Even a single variable is scattered through
the system.

-- 
Ueimor


Re: net: r8169: a question of memory barrier in the r8169 driver

2018-01-19 Thread Francois Romieu
Jia-Ju Bai <baijiaju1...@gmail.com> :
> 
> On 2018/1/19 9:11, Francois Romieu wrote:
> > Jia-Ju Bai <baijiaju1...@gmail.com> :
> > [...]
> > > The function rtl8169_start_xmit reads tp->dirty_tx in TX_FRAGS_READY_FOR:
> > >  if (unlikely(!TX_FRAGS_READY_FOR(tp, skb_shinfo(skb)->nr_frags))) {
> > >  netif_err(tp, drv, dev, "BUG! Tx Ring full when queue awake!\n");
> > >  goto err_stop_0;
> > >  }
> > > But there is no memory barrier around this code.
> > > 
> > > Is there a possible data race here?
> > This code would not even be needed if rtl8169_start_xmit was only your
> > usual ndo_start_xmit handler: Realtek {ab / re}used it for GSO handling
> > (see r8169_csum_workaround).
> > 
> > If the test is not a no-op in this GSO context, it's racy.
> > 
> 
> Thanks for reply.
> I didn't clearly understand your meaning...

It's fine.

> I wonder whether there is a possible data race and whether a "smp_mb" is
> needed before this code?
> By the way, do you mean that this code can be removed?

This code may be removed in a driver that properly stops itself its
tx queueing in the ndo_start_xmit handler (I would still keep it as
a bug detection helper but it's just a matter of taste). That's what
the r8169 driver used to aim at.

However, since e974604b453e87f8d864371786375d3d511fdf56, there is a piece
of code where the r8169 driver iteratively uses its own ndo_start_xmit
(without even checking its return value) in r8169_csum_workaround.
It is racy. Now, let's forget races for a few seconds: how is
r8169_csum_workaround supposed to work at all given that it does not care
if (the "unlikely(...)" test in) rtl8169_start_xmit succeeds or not ?

rtl8169_start_xmit can leave the skb as-is or map it to hardware descriptors
(whence late release in rtl_tx). net/core/dev.c::dev_hard_start_xmit cares.
r8169_csum_workaround doesn't.

-- 
Ueimor


Re: net: r8169: a question of memory barrier in the r8169 driver

2018-01-19 Thread Francois Romieu
Jia-Ju Bai  :
> 
> On 2018/1/19 9:11, Francois Romieu wrote:
> > Jia-Ju Bai  :
> > [...]
> > > The function rtl8169_start_xmit reads tp->dirty_tx in TX_FRAGS_READY_FOR:
> > >  if (unlikely(!TX_FRAGS_READY_FOR(tp, skb_shinfo(skb)->nr_frags))) {
> > >  netif_err(tp, drv, dev, "BUG! Tx Ring full when queue awake!\n");
> > >  goto err_stop_0;
> > >  }
> > > But there is no memory barrier around this code.
> > > 
> > > Is there a possible data race here?
> > This code would not even be needed if rtl8169_start_xmit was only your
> > usual ndo_start_xmit handler: Realtek {ab / re}used it for GSO handling
> > (see r8169_csum_workaround).
> > 
> > If the test is not a no-op in this GSO context, it's racy.
> > 
> 
> Thanks for reply.
> I didn't clearly understand your meaning...

It's fine.

> I wonder whether there is a possible data race and whether a "smp_mb" is
> needed before this code?
> By the way, do you mean that this code can be removed?

This code may be removed in a driver that properly stops itself its
tx queueing in the ndo_start_xmit handler (I would still keep it as
a bug detection helper but it's just a matter of taste). That's what
the r8169 driver used to aim at.

However, since e974604b453e87f8d864371786375d3d511fdf56, there is a piece
of code where the r8169 driver iteratively uses its own ndo_start_xmit
(without even checking its return value) in r8169_csum_workaround.
It is racy. Now, let's forget races for a few seconds: how is
r8169_csum_workaround supposed to work at all given that it does not care
if (the "unlikely(...)" test in) rtl8169_start_xmit succeeds or not ?

rtl8169_start_xmit can leave the skb as-is or map it to hardware descriptors
(whence late release in rtl_tx). net/core/dev.c::dev_hard_start_xmit cares.
r8169_csum_workaround doesn't.

-- 
Ueimor


Re: net: r8169: a question of memory barrier in the r8169 driver

2018-01-18 Thread Francois Romieu
Jia-Ju Bai  :
[...]
> The function rtl8169_start_xmit reads tp->dirty_tx in TX_FRAGS_READY_FOR:
> if (unlikely(!TX_FRAGS_READY_FOR(tp, skb_shinfo(skb)->nr_frags))) {
> netif_err(tp, drv, dev, "BUG! Tx Ring full when queue awake!\n");
> goto err_stop_0;
> }
> But there is no memory barrier around this code.
> 
> Is there a possible data race here?

This code would not even be needed if rtl8169_start_xmit was only your
usual ndo_start_xmit handler: Realtek {ab / re}used it for GSO handling
(see r8169_csum_workaround).

If the test is not a no-op in this GSO context, it's racy.

-- 
Ueimor


Re: net: r8169: a question of memory barrier in the r8169 driver

2018-01-18 Thread Francois Romieu
Jia-Ju Bai  :
[...]
> The function rtl8169_start_xmit reads tp->dirty_tx in TX_FRAGS_READY_FOR:
> if (unlikely(!TX_FRAGS_READY_FOR(tp, skb_shinfo(skb)->nr_frags))) {
> netif_err(tp, drv, dev, "BUG! Tx Ring full when queue awake!\n");
> goto err_stop_0;
> }
> But there is no memory barrier around this code.
> 
> Is there a possible data race here?

This code would not even be needed if rtl8169_start_xmit was only your
usual ndo_start_xmit handler: Realtek {ab / re}used it for GSO handling
(see r8169_csum_workaround).

If the test is not a no-op in this GSO context, it's racy.

-- 
Ueimor


Re: net: r8169: a question of memory barrier in the r8169 driver

2018-01-18 Thread Francois Romieu
Peter Zijlstra  :
[...]
> There is only 1 variable afaict. Memory barriers need at least 2 in
> order to be able to do _anything_.

I don't get your point: why don't {cur_tx, dirty_tx} qualify as said
two variables ?

-- 
Ueimor



Re: net: r8169: a question of memory barrier in the r8169 driver

2018-01-18 Thread Francois Romieu
Peter Zijlstra  :
[...]
> There is only 1 variable afaict. Memory barriers need at least 2 in
> order to be able to do _anything_.

I don't get your point: why don't {cur_tx, dirty_tx} qualify as said
two variables ?

-- 
Ueimor



Re: [PATCH 1/2] r8169: reinstate ALDPS for power saving

2017-11-15 Thread Francois Romieu
David Miller  :
[...]
> The amount of coverage this change is going to get is very small as
> well, meaning an even greater chance of regressions.

Yes.

> Therefore the only acceptable way to handle this is to have
> a white-list, specific chips that have been explicitly tested
> and are known to work with this feature, rather than the other
> way around.
> 
> Furthermore, you're not even checking the chip version, you're
> checking instead whether the firmware is loaded or not.  That
> doesn't seem like a safe way to guard this at all.

Actually the chip specific xyz_hw_phy_config methods call the relevant
aldps enabling helper _but_ the 8168evl dedicated xyz_hw_phy_config
doesn't. The firmware loaded check is just a distraction for the
busy reviewer.

-- 
Ueimor



Re: [PATCH 1/2] r8169: reinstate ALDPS for power saving

2017-11-15 Thread Francois Romieu
David Miller  :
[...]
> The amount of coverage this change is going to get is very small as
> well, meaning an even greater chance of regressions.

Yes.

> Therefore the only acceptable way to handle this is to have
> a white-list, specific chips that have been explicitly tested
> and are known to work with this feature, rather than the other
> way around.
> 
> Furthermore, you're not even checking the chip version, you're
> checking instead whether the firmware is loaded or not.  That
> doesn't seem like a safe way to guard this at all.

Actually the chip specific xyz_hw_phy_config methods call the relevant
aldps enabling helper _but_ the 8168evl dedicated xyz_hw_phy_config
doesn't. The firmware loaded check is just a distraction for the
busy reviewer.

-- 
Ueimor



Re: linux-next: Signed-off-by missing for commit in the net-next tree

2017-11-01 Thread Francois Romieu
Kirill Smelkov  :
[...]
> I was keeping you in To and Cc all the time but got no reply at all since my
> first posting from ~ 1 month ago.

I thought it was longer than that. Sorry for the frustrating excess delay.

As Eric already said there is no problem and I am perfectly fine with
the current attribution of this code.

Use of errno.h::ELNRNG is really unusual but it's a different topic.

-- 
Ueimor


Re: linux-next: Signed-off-by missing for commit in the net-next tree

2017-11-01 Thread Francois Romieu
Kirill Smelkov  :
[...]
> I was keeping you in To and Cc all the time but got no reply at all since my
> first posting from ~ 1 month ago.

I thought it was longer than that. Sorry for the frustrating excess delay.

As Eric already said there is no problem and I am perfectly fine with
the current attribution of this code.

Use of errno.h::ELNRNG is really unusual but it's a different topic.

-- 
Ueimor


Re: r8169 Wake-on-LAN causes immediate ACPI GPE wakeup

2017-10-05 Thread Francois Romieu
Daniel Drake  :
[...]
> Also, is there a standard behaviour defined for ethernet drivers
> regarding wake-on-LAN? r8169 appears to enable wake-on-LAN by default
> if it believes the hardware is capable of it,

If so it isn't its designed behavior.

The r8169 driver does not enable specific WoL event source (unicast packet,
link, etc.). It should keep the current settings unless one of those holds:
- explicit wol config from userspace (obviously :o) )
- runtime pm requires different settings to resume. The change should
  be temporary (save before suspend, restore after resume).

The device is supposed to require both an event source + Config1.PMEnable.

A problem may happen if some event source bit is already set while
Config1.PMEnable is not. The driver has been forcing Config1.PMEnable
since 5d06a99f543e734ceb53bbc9e550537be97f0c49. One may thus experience
transition from inconsistent wol settings to enabled ones (if you want
to dig it, check beforehand if Config1.PMEnable is really read-write or
hardwired to 1).

Whatever it seems rather orthogonal with your wake over wifi style instant
resume symptom.

-- 
Ueimor


Re: r8169 Wake-on-LAN causes immediate ACPI GPE wakeup

2017-10-05 Thread Francois Romieu
Daniel Drake  :
[...]
> Also, is there a standard behaviour defined for ethernet drivers
> regarding wake-on-LAN? r8169 appears to enable wake-on-LAN by default
> if it believes the hardware is capable of it,

If so it isn't its designed behavior.

The r8169 driver does not enable specific WoL event source (unicast packet,
link, etc.). It should keep the current settings unless one of those holds:
- explicit wol config from userspace (obviously :o) )
- runtime pm requires different settings to resume. The change should
  be temporary (save before suspend, restore after resume).

The device is supposed to require both an event source + Config1.PMEnable.

A problem may happen if some event source bit is already set while
Config1.PMEnable is not. The driver has been forcing Config1.PMEnable
since 5d06a99f543e734ceb53bbc9e550537be97f0c49. One may thus experience
transition from inconsistent wol settings to enabled ones (if you want
to dig it, check beforehand if Config1.PMEnable is really read-write or
hardwired to 1).

Whatever it seems rather orthogonal with your wake over wifi style instant
resume symptom.

-- 
Ueimor


Re: [PATCH v3 net-next 2/2] wan: dscc4: convert to plain DMA API

2017-08-11 Thread Francois Romieu
David Miller  :
[...]
> Oops, this will need to be sent as a relative fixup as I've applied these
> two patches to net-next, sorry Francois.

No problem. It works perfectly this way.

-- 
Ueimor


Re: [PATCH v3 net-next 2/2] wan: dscc4: convert to plain DMA API

2017-08-11 Thread Francois Romieu
David Miller  :
[...]
> Oops, this will need to be sent as a relative fixup as I've applied these
> two patches to net-next, sorry Francois.

No problem. It works perfectly this way.

-- 
Ueimor


Re: [PATCH v3 net-next 2/2] wan: dscc4: convert to plain DMA API

2017-08-11 Thread Francois Romieu
Alexey Khoroshilov  :
[...]
> diff --git a/drivers/net/wan/dscc4.c b/drivers/net/wan/dscc4.c
> index 8480dbf..a043fb1 100644
> --- a/drivers/net/wan/dscc4.c
> +++ b/drivers/net/wan/dscc4.c
[...]
> @@ -506,8 +506,9 @@ static void dscc4_release_ring(struct dscc4_dev_priv 
> *dpriv)
>   skbuff = dpriv->rx_skbuff;
>   for (i = 0; i < RX_RING_SIZE; i++) {
>   if (*skbuff) {
> - pci_unmap_single(pdev, le32_to_cpu(rx_fd->data),
> - RX_MAX(HDLC_MAX_MRU), PCI_DMA_FROMDEVICE);
> + dma_unmap_single(d, le32_to_cpu(rx_fd->data),
> +  RX_MAX(HDLC_MAX_MRU),
> +  DMA_FROM_DEVICE);

 RX_MAX(HDLC_MAX_MRU), DMA_FROM_DEVICE);

[...]
> @@ -664,8 +665,8 @@ static inline void dscc4_rx_skb(struct dscc4_dev_priv 
> *dpriv,
>   goto refill;
>   }
>   pkt_len = TO_SIZE(le32_to_cpu(rx_fd->state2));
> - pci_unmap_single(pdev, le32_to_cpu(rx_fd->data),
> -  RX_MAX(HDLC_MAX_MRU), PCI_DMA_FROMDEVICE);
> + dma_unmap_single(d, le32_to_cpu(rx_fd->data),
> +  RX_MAX(HDLC_MAX_MRU), DMA_FROM_DEVICE);

dma_unmap_single(d, le32_to_cpu(rx_fd->data), RX_MAX(HDLC_MAX_MRU),
 DMA_FROM_DEVICE);

[...]
> @@ -782,8 +783,8 @@ static int dscc4_init_one(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>  
>   rc = -ENOMEM;
>  
> - priv->iqcfg = (__le32 *) pci_alloc_consistent(pdev,
> - IRQ_RING_SIZE*sizeof(__le32), >iqcfg_dma);
> + priv->iqcfg = (__le32 *)dma_alloc_coherent(>dev,
> + IRQ_RING_SIZE*sizeof(__le32), >iqcfg_dma, GFP_KERNEL);

- the cast can go away
- please replace >dev with a local variable

priv->iqcfg = dma_alloc_coherent(d, IRQ_RING_SIZE*sizeof(__le32),
 >iqcfg_dma, GFP_KERNEL);

Same thing for iqtx and iqrx (beware of copy + tx/rx mismatch).

Thanks.

-- 
Ueimor


Re: [PATCH v3 net-next 2/2] wan: dscc4: convert to plain DMA API

2017-08-11 Thread Francois Romieu
Alexey Khoroshilov  :
[...]
> diff --git a/drivers/net/wan/dscc4.c b/drivers/net/wan/dscc4.c
> index 8480dbf..a043fb1 100644
> --- a/drivers/net/wan/dscc4.c
> +++ b/drivers/net/wan/dscc4.c
[...]
> @@ -506,8 +506,9 @@ static void dscc4_release_ring(struct dscc4_dev_priv 
> *dpriv)
>   skbuff = dpriv->rx_skbuff;
>   for (i = 0; i < RX_RING_SIZE; i++) {
>   if (*skbuff) {
> - pci_unmap_single(pdev, le32_to_cpu(rx_fd->data),
> - RX_MAX(HDLC_MAX_MRU), PCI_DMA_FROMDEVICE);
> + dma_unmap_single(d, le32_to_cpu(rx_fd->data),
> +  RX_MAX(HDLC_MAX_MRU),
> +  DMA_FROM_DEVICE);

 RX_MAX(HDLC_MAX_MRU), DMA_FROM_DEVICE);

[...]
> @@ -664,8 +665,8 @@ static inline void dscc4_rx_skb(struct dscc4_dev_priv 
> *dpriv,
>   goto refill;
>   }
>   pkt_len = TO_SIZE(le32_to_cpu(rx_fd->state2));
> - pci_unmap_single(pdev, le32_to_cpu(rx_fd->data),
> -  RX_MAX(HDLC_MAX_MRU), PCI_DMA_FROMDEVICE);
> + dma_unmap_single(d, le32_to_cpu(rx_fd->data),
> +  RX_MAX(HDLC_MAX_MRU), DMA_FROM_DEVICE);

dma_unmap_single(d, le32_to_cpu(rx_fd->data), RX_MAX(HDLC_MAX_MRU),
 DMA_FROM_DEVICE);

[...]
> @@ -782,8 +783,8 @@ static int dscc4_init_one(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>  
>   rc = -ENOMEM;
>  
> - priv->iqcfg = (__le32 *) pci_alloc_consistent(pdev,
> - IRQ_RING_SIZE*sizeof(__le32), >iqcfg_dma);
> + priv->iqcfg = (__le32 *)dma_alloc_coherent(>dev,
> + IRQ_RING_SIZE*sizeof(__le32), >iqcfg_dma, GFP_KERNEL);

- the cast can go away
- please replace >dev with a local variable

priv->iqcfg = dma_alloc_coherent(d, IRQ_RING_SIZE*sizeof(__le32),
 >iqcfg_dma, GFP_KERNEL);

Same thing for iqtx and iqrx (beware of copy + tx/rx mismatch).

Thanks.

-- 
Ueimor


Re: [PATCH net-next v2] wan: dscc4: add checks for dma mapping errors

2017-08-08 Thread Francois Romieu
Alexey Khoroshilov  :
[...]
> diff --git a/drivers/net/wan/dscc4.c b/drivers/net/wan/dscc4.c
> index 799830f..6a9ffac 100644
> --- a/drivers/net/wan/dscc4.c
> +++ b/drivers/net/wan/dscc4.c
> @@ -518,23 +518,31 @@ static void dscc4_release_ring(struct dscc4_dev_priv 
> *dpriv)
>  static inline int try_get_rx_skb(struct dscc4_dev_priv *dpriv,
>struct net_device *dev)
>  {
> + struct pci_dev *pdev = dpriv->pci_priv->pdev;
>   unsigned int dirty = dpriv->rx_dirty%RX_RING_SIZE;
>   struct RxFD *rx_fd = dpriv->rx_fd + dirty;
>   const int len = RX_MAX(HDLC_MAX_MRU);

For the edification of the masses, you may follow a strict inverted
xmas tree style (aka longer lines first as long as it does not bug).

[...]
> @@ -1147,14 +1155,22 @@ static netdev_tx_t dscc4_start_xmit(struct sk_buff 
> *skb,
>   struct dscc4_dev_priv *dpriv = dscc4_priv(dev);
>   struct dscc4_pci_priv *ppriv = dpriv->pci_priv;
>   struct TxFD *tx_fd;
> + dma_addr_t addr;
>   int next;
>  
> + addr = pci_map_single(ppriv->pdev, skb->data, skb->len,
> +   PCI_DMA_TODEVICE);

Use a local struct pci_dev *pdev and it fits on a single line.

At some point it will probably be converted to plain dma api and use a 'd' dev.

[...]
> @@ -1887,16 +1903,22 @@ static struct sk_buff *dscc4_init_dummy_skb(struct 
> dscc4_dev_priv *dpriv)
>  
>   skb = dev_alloc_skb(DUMMY_SKB_SIZE);
>   if (skb) {
> + struct pci_dev *pdev = dpriv->pci_priv->pdev;
>   int last = dpriv->tx_dirty%TX_RING_SIZE;
>   struct TxFD *tx_fd = dpriv->tx_fd + last;
> + dma_addr_t addr;
>  
>   skb->len = DUMMY_SKB_SIZE;
>   skb_copy_to_linear_data(skb, version,
>   strlen(version) % DUMMY_SKB_SIZE);
>   tx_fd->state = FrameEnd | TO_STATE_TX(DUMMY_SKB_SIZE);
> - tx_fd->data = cpu_to_le32(pci_map_single(dpriv->pci_priv->pdev,
> -  skb->data, DUMMY_SKB_SIZE,
> -  PCI_DMA_TODEVICE));
> + addr = pci_map_single(pdev, skb->data, DUMMY_SKB_SIZE,
> +   PCI_DMA_TODEVICE);
> + if (pci_dma_mapping_error(pdev, addr)) {
> + dev_kfree_skb_any(skb);
> + return NULL;
> + }
> + tx_fd->data = cpu_to_le32(addr);
>   dpriv->tx_skbuff[last] = skb;
>   }
>   return skb;

It isn't technically wrong but please don't update tx_fd before the mapping
succeeds. It will look marginally better.

Thanks.

-- 
Ueimor


Re: [PATCH net-next v2] wan: dscc4: add checks for dma mapping errors

2017-08-08 Thread Francois Romieu
Alexey Khoroshilov  :
[...]
> diff --git a/drivers/net/wan/dscc4.c b/drivers/net/wan/dscc4.c
> index 799830f..6a9ffac 100644
> --- a/drivers/net/wan/dscc4.c
> +++ b/drivers/net/wan/dscc4.c
> @@ -518,23 +518,31 @@ static void dscc4_release_ring(struct dscc4_dev_priv 
> *dpriv)
>  static inline int try_get_rx_skb(struct dscc4_dev_priv *dpriv,
>struct net_device *dev)
>  {
> + struct pci_dev *pdev = dpriv->pci_priv->pdev;
>   unsigned int dirty = dpriv->rx_dirty%RX_RING_SIZE;
>   struct RxFD *rx_fd = dpriv->rx_fd + dirty;
>   const int len = RX_MAX(HDLC_MAX_MRU);

For the edification of the masses, you may follow a strict inverted
xmas tree style (aka longer lines first as long as it does not bug).

[...]
> @@ -1147,14 +1155,22 @@ static netdev_tx_t dscc4_start_xmit(struct sk_buff 
> *skb,
>   struct dscc4_dev_priv *dpriv = dscc4_priv(dev);
>   struct dscc4_pci_priv *ppriv = dpriv->pci_priv;
>   struct TxFD *tx_fd;
> + dma_addr_t addr;
>   int next;
>  
> + addr = pci_map_single(ppriv->pdev, skb->data, skb->len,
> +   PCI_DMA_TODEVICE);

Use a local struct pci_dev *pdev and it fits on a single line.

At some point it will probably be converted to plain dma api and use a 'd' dev.

[...]
> @@ -1887,16 +1903,22 @@ static struct sk_buff *dscc4_init_dummy_skb(struct 
> dscc4_dev_priv *dpriv)
>  
>   skb = dev_alloc_skb(DUMMY_SKB_SIZE);
>   if (skb) {
> + struct pci_dev *pdev = dpriv->pci_priv->pdev;
>   int last = dpriv->tx_dirty%TX_RING_SIZE;
>   struct TxFD *tx_fd = dpriv->tx_fd + last;
> + dma_addr_t addr;
>  
>   skb->len = DUMMY_SKB_SIZE;
>   skb_copy_to_linear_data(skb, version,
>   strlen(version) % DUMMY_SKB_SIZE);
>   tx_fd->state = FrameEnd | TO_STATE_TX(DUMMY_SKB_SIZE);
> - tx_fd->data = cpu_to_le32(pci_map_single(dpriv->pci_priv->pdev,
> -  skb->data, DUMMY_SKB_SIZE,
> -  PCI_DMA_TODEVICE));
> + addr = pci_map_single(pdev, skb->data, DUMMY_SKB_SIZE,
> +   PCI_DMA_TODEVICE);
> + if (pci_dma_mapping_error(pdev, addr)) {
> + dev_kfree_skb_any(skb);
> + return NULL;
> + }
> + tx_fd->data = cpu_to_le32(addr);
>   dpriv->tx_skbuff[last] = skb;
>   }
>   return skb;

It isn't technically wrong but please don't update tx_fd before the mapping
succeeds. It will look marginally better.

Thanks.

-- 
Ueimor


Re: [PATCH] wan: dscc4: add checks for dma mapping errors

2017-08-07 Thread Francois Romieu
Alexey Khoroshilov  :
> The driver does not check if mapping dma memory succeed.
> The patch adds the checks and failure handling.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Alexey Khoroshilov 

Please amend your subject line as:

Subject: [PATCH net-next v2 1/1] dscc4: add checks for dma mapping errors.

Rationale: davem is not supposed to guess the branch the patch should be
applied to.

[...]
> diff --git a/drivers/net/wan/dscc4.c b/drivers/net/wan/dscc4.c
> index 799830ffcae2..1a94f0a95b2c 100644
> --- a/drivers/net/wan/dscc4.c
> +++ b/drivers/net/wan/dscc4.c
> @@ -522,19 +522,27 @@ static inline int try_get_rx_skb(struct dscc4_dev_priv 
> *dpriv,
>   struct RxFD *rx_fd = dpriv->rx_fd + dirty;
>   const int len = RX_MAX(HDLC_MAX_MRU);
>   struct sk_buff *skb;
> - int ret = 0;
> + dma_addr_t addr;
>  
>   skb = dev_alloc_skb(len);
>   dpriv->rx_skbuff[dirty] = skb;
> - if (skb) {
> - skb->protocol = hdlc_type_trans(skb, dev);
> - rx_fd->data = cpu_to_le32(pci_map_single(dpriv->pci_priv->pdev,
> -   skb->data, len, PCI_DMA_FROMDEVICE));
> - } else {
> - rx_fd->data = 0;
> - ret = -1;
> - }
> - return ret;
> + if (!skb)
> + goto err_out;
> +
> + skb->protocol = hdlc_type_trans(skb, dev);
> + addr = pci_map_single(dpriv->pci_priv->pdev,
> +   skb->data, len, PCI_DMA_FROMDEVICE);
> + if (pci_dma_mapping_error(dpriv->pci_priv->pdev, addr))
> + goto err_free_skb;

Nit: please use a local 'struct pci_dev *pdev = dpriv->pci_priv->pdev;'

[...]
> @@ -1147,14 +1155,22 @@ static netdev_tx_t dscc4_start_xmit(struct sk_buff 
> *skb,
>   struct dscc4_dev_priv *dpriv = dscc4_priv(dev);
>   struct dscc4_pci_priv *ppriv = dpriv->pci_priv;
>   struct TxFD *tx_fd;
> + dma_addr_t addr;
>   int next;
>  
> + addr = pci_map_single(ppriv->pdev, skb->data, skb->len,
> +   PCI_DMA_TODEVICE);
> + if (pci_dma_mapping_error(ppriv->pdev, addr)) {
> + dev_kfree_skb_any(skb);
> + dev->stats.tx_errors++;

It should read 'dev->stats.tx_dropped++'.

-- 
Ueimor


Re: [PATCH] wan: dscc4: add checks for dma mapping errors

2017-08-07 Thread Francois Romieu
Alexey Khoroshilov  :
> The driver does not check if mapping dma memory succeed.
> The patch adds the checks and failure handling.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Alexey Khoroshilov 

Please amend your subject line as:

Subject: [PATCH net-next v2 1/1] dscc4: add checks for dma mapping errors.

Rationale: davem is not supposed to guess the branch the patch should be
applied to.

[...]
> diff --git a/drivers/net/wan/dscc4.c b/drivers/net/wan/dscc4.c
> index 799830ffcae2..1a94f0a95b2c 100644
> --- a/drivers/net/wan/dscc4.c
> +++ b/drivers/net/wan/dscc4.c
> @@ -522,19 +522,27 @@ static inline int try_get_rx_skb(struct dscc4_dev_priv 
> *dpriv,
>   struct RxFD *rx_fd = dpriv->rx_fd + dirty;
>   const int len = RX_MAX(HDLC_MAX_MRU);
>   struct sk_buff *skb;
> - int ret = 0;
> + dma_addr_t addr;
>  
>   skb = dev_alloc_skb(len);
>   dpriv->rx_skbuff[dirty] = skb;
> - if (skb) {
> - skb->protocol = hdlc_type_trans(skb, dev);
> - rx_fd->data = cpu_to_le32(pci_map_single(dpriv->pci_priv->pdev,
> -   skb->data, len, PCI_DMA_FROMDEVICE));
> - } else {
> - rx_fd->data = 0;
> - ret = -1;
> - }
> - return ret;
> + if (!skb)
> + goto err_out;
> +
> + skb->protocol = hdlc_type_trans(skb, dev);
> + addr = pci_map_single(dpriv->pci_priv->pdev,
> +   skb->data, len, PCI_DMA_FROMDEVICE);
> + if (pci_dma_mapping_error(dpriv->pci_priv->pdev, addr))
> + goto err_free_skb;

Nit: please use a local 'struct pci_dev *pdev = dpriv->pci_priv->pdev;'

[...]
> @@ -1147,14 +1155,22 @@ static netdev_tx_t dscc4_start_xmit(struct sk_buff 
> *skb,
>   struct dscc4_dev_priv *dpriv = dscc4_priv(dev);
>   struct dscc4_pci_priv *ppriv = dpriv->pci_priv;
>   struct TxFD *tx_fd;
> + dma_addr_t addr;
>   int next;
>  
> + addr = pci_map_single(ppriv->pdev, skb->data, skb->len,
> +   PCI_DMA_TODEVICE);
> + if (pci_dma_mapping_error(ppriv->pdev, addr)) {
> + dev_kfree_skb_any(skb);
> + dev->stats.tx_errors++;

It should read 'dev->stats.tx_dropped++'.

-- 
Ueimor


Re: [PATCH V2 net-next 01/21] net-next/hinic: Initialize hw interface

2017-07-25 Thread Francois Romieu
Aviad Krawczyk  :
[...]
> module_pci_driver - is not used in other drivers in the same segments, it
> is necessary ?

/me checks... Ok, there seems to be some overenthusiastic copy'paste.

See drivers/net/ethernet/intel/ixgb/ixgb_main.c:
[...]
/**
 * ixgb_init_module - Driver Registration Routine
 *
 * ixgb_init_module is the first routine called when the driver is
 * loaded. All it does is register with the PCI subsystem.
 **/

static int __init
ixgb_init_module(void)
{
pr_info("%s - version %s\n", ixgb_driver_string, ixgb_driver_version);
pr_info("%s\n", ixgb_copyright);

return pci_register_driver(_driver);
}

module_init(ixgb_init_module);

/**
 * ixgb_exit_module - Driver Exit Cleanup Routine
 *
 * ixgb_exit_module is called just before the driver is removed
 * from memory.
 **/

static void __exit
ixgb_exit_module(void)
{
pci_unregister_driver(_driver);
}

module_exit(ixgb_exit_module);

Driver version ought to be fed through ethtool, if ever. Copyright message
mildly contributes to a better world. So the whole stuff above could be:

module_pci_driver(ixgb_driver);

-- 
Ueimor


Re: [PATCH V2 net-next 01/21] net-next/hinic: Initialize hw interface

2017-07-25 Thread Francois Romieu
Aviad Krawczyk  :
[...]
> module_pci_driver - is not used in other drivers in the same segments, it
> is necessary ?

/me checks... Ok, there seems to be some overenthusiastic copy'paste.

See drivers/net/ethernet/intel/ixgb/ixgb_main.c:
[...]
/**
 * ixgb_init_module - Driver Registration Routine
 *
 * ixgb_init_module is the first routine called when the driver is
 * loaded. All it does is register with the PCI subsystem.
 **/

static int __init
ixgb_init_module(void)
{
pr_info("%s - version %s\n", ixgb_driver_string, ixgb_driver_version);
pr_info("%s\n", ixgb_copyright);

return pci_register_driver(_driver);
}

module_init(ixgb_init_module);

/**
 * ixgb_exit_module - Driver Exit Cleanup Routine
 *
 * ixgb_exit_module is called just before the driver is removed
 * from memory.
 **/

static void __exit
ixgb_exit_module(void)
{
pci_unregister_driver(_driver);
}

module_exit(ixgb_exit_module);

Driver version ought to be fed through ethtool, if ever. Copyright message
mildly contributes to a better world. So the whole stuff above could be:

module_pci_driver(ixgb_driver);

-- 
Ueimor


Re: [PATCH V2 net-next 01/21] net-next/hinic: Initialize hw interface

2017-07-24 Thread Francois Romieu
Aviad Krawczyk  :
[...]
> hinic_remove - If insmod failed and someone calls rmmod, we will get a
> crash because the resource are already free. Therefore we test if the
> device exists, please tell me if you meant to something different

The module won't even proceed through the pci_driver remove method if
the probe method failed. See drivers/pci/bus.c::pci_bus_add_device and
track 'dev->is_added'. You don't need to believe me: try it.

I have no idea where your crash comes from but something does not
look quite right.

(use module_pci_driver() to save some boilerplate code btw)

[...]
> The priv data is in type void * because the
> caller can use any struct that it wants, like the priv data in Linux
> (netdev, irq, tasklet, work..) -

I disagree. A driver is a piece of glue between hardware and software.
It fills a kernel's contract. It is not supposed to introduce opaque
data (even if it's hard to resist).

> we can change it but if we will pass different struct
> in the future, we will have to change the prototype of the functions.

It's fine. If I do something wrong - and at some point I will - I'd
rather have it detected at compile time. Nobody wants to waste precious
hardware lab testing time because of excess void *.

> According to the other void *:
> The wq struct is used for cmdq, sq and rq. Therefore the wqe is in type
> void *. There are 4 operations get_wqe, write_wqe, read_wqe and put_wqe - 
> there
> is no option that one function will be fed with a wrong pointer because the 
> caller
> should use what it got in get_wqe function.
> 
> When something is used as multiple types, it can be used as void * or union.
> Usually, I would prefer union. But, in this case if we will use union, maybe
> there is a chance of using the wrong wqe type in the wrong work queue type.

union * will at least catch being fed a wrong type. void * won't notice.

Let's take a practical example: hinic_sq_get_sges.

void hinic_sq_get_sges(void *wqe, struct hinic_sge *sges, int nr_sges)
   ^
{
struct hinic_sq_wqe *sq_wqe = (struct hinic_sq_wqe *)wqe;


static void free_all_tx_skbs(struct hinic_txq *txq)
{
struct hinic_dev *nic_dev = netdev_priv(txq->netdev);
struct hinic_sq *sq = txq->sq;
struct hinic_sq_wqe *wqe;
[...]
hinic_sq_get_sges(wqe, txq->free_sges, nr_sges);


static int free_tx_poll(struct napi_struct *napi, int budget)
{
[...]
struct hinic_sq_wqe *wqe;
[...]
hinic_sq_get_sges(wqe, txq->free_sges, nr_sges);


Why is it:

void hinic_sq_get_sges(void *wqe, ...

instead of:

void hinic_sq_get_sges(struct hinic_sq_wqe *wqe, ...

Because of a future change ?

-- 
Ueimor


Re: [PATCH V2 net-next 01/21] net-next/hinic: Initialize hw interface

2017-07-24 Thread Francois Romieu
Aviad Krawczyk  :
[...]
> hinic_remove - If insmod failed and someone calls rmmod, we will get a
> crash because the resource are already free. Therefore we test if the
> device exists, please tell me if you meant to something different

The module won't even proceed through the pci_driver remove method if
the probe method failed. See drivers/pci/bus.c::pci_bus_add_device and
track 'dev->is_added'. You don't need to believe me: try it.

I have no idea where your crash comes from but something does not
look quite right.

(use module_pci_driver() to save some boilerplate code btw)

[...]
> The priv data is in type void * because the
> caller can use any struct that it wants, like the priv data in Linux
> (netdev, irq, tasklet, work..) -

I disagree. A driver is a piece of glue between hardware and software.
It fills a kernel's contract. It is not supposed to introduce opaque
data (even if it's hard to resist).

> we can change it but if we will pass different struct
> in the future, we will have to change the prototype of the functions.

It's fine. If I do something wrong - and at some point I will - I'd
rather have it detected at compile time. Nobody wants to waste precious
hardware lab testing time because of excess void *.

> According to the other void *:
> The wq struct is used for cmdq, sq and rq. Therefore the wqe is in type
> void *. There are 4 operations get_wqe, write_wqe, read_wqe and put_wqe - 
> there
> is no option that one function will be fed with a wrong pointer because the 
> caller
> should use what it got in get_wqe function.
> 
> When something is used as multiple types, it can be used as void * or union.
> Usually, I would prefer union. But, in this case if we will use union, maybe
> there is a chance of using the wrong wqe type in the wrong work queue type.

union * will at least catch being fed a wrong type. void * won't notice.

Let's take a practical example: hinic_sq_get_sges.

void hinic_sq_get_sges(void *wqe, struct hinic_sge *sges, int nr_sges)
   ^
{
struct hinic_sq_wqe *sq_wqe = (struct hinic_sq_wqe *)wqe;


static void free_all_tx_skbs(struct hinic_txq *txq)
{
struct hinic_dev *nic_dev = netdev_priv(txq->netdev);
struct hinic_sq *sq = txq->sq;
struct hinic_sq_wqe *wqe;
[...]
hinic_sq_get_sges(wqe, txq->free_sges, nr_sges);


static int free_tx_poll(struct napi_struct *napi, int budget)
{
[...]
struct hinic_sq_wqe *wqe;
[...]
hinic_sq_get_sges(wqe, txq->free_sges, nr_sges);


Why is it:

void hinic_sq_get_sges(void *wqe, ...

instead of:

void hinic_sq_get_sges(struct hinic_sq_wqe *wqe, ...

Because of a future change ?

-- 
Ueimor


Re: [PATCH V2 net-next 01/21] net-next/hinic: Initialize hw interface

2017-07-19 Thread Francois Romieu
Aviad Krawczyk  :
[...]
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c 
> b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
> new file mode 100644
> index 000..fbc9de4
> --- /dev/null
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
[...]
> +/**
> + * hinic_init_hwdev - Initialize the NIC HW
> + * @hwdev: the NIC HW device that is returned from the initialization
> + * @pdev: the NIC pci device
> + *
> + * Return 0 - Success, negative - Failure
> + *
> + * Initialize the NIC HW device and return a pointer to it in the first arg
> + **/

Return a pointer and use ERR_PTR / IS_ERR ?

> +int hinic_init_hwdev(struct hinic_hwdev **hwdev, struct pci_dev *pdev)
> +{
> + struct hinic_pfhwdev *pfhwdev;
> + struct hinic_hwif *hwif;
> + int err;
> +
> + hwif = devm_kzalloc(>dev, sizeof(*hwif), GFP_KERNEL);
> + if (!hwif)
> + return -ENOMEM;
> +
> + err = hinic_init_hwif(hwif, pdev);
> + if (err) {
> + dev_err(>dev, "Failed to init HW interface\n");
> + return err;
> + }
> +
> + if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) {
> + dev_err(>dev, "Unsupported PCI Function type\n");
> + err = -EFAULT;
> + goto func_type_err;
> + }
> +
> + pfhwdev = devm_kzalloc(>dev, sizeof(*pfhwdev), GFP_KERNEL);
> + if (!pfhwdev) {
> + err = -ENOMEM;
> + goto pfhwdev_alloc_err;

Intel, Mellanox, Broadcom, Amazon and friends use "err_xyz" labels.

Please consider using the same style.

[...]
> +void hinic_free_hwdev(struct hinic_hwdev *hwdev)
> +{
> + struct hinic_hwif *hwif = hwdev->hwif;
> + struct pci_dev *pdev = hwif->pdev;
> + struct hinic_pfhwdev *pfhwdev;
> +
> + if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) {
> + dev_err(>dev, "unsupported PCI Function type\n");
> + return;
> + }

If it succeeded in hinic_init_hwdev, how could it fail here ?

If it failed in hinic_init_hwdev, hinic_free_hwdev should not even
be called.

-> remove ?

[...]
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c 
> b/drivers/net/ethernet/huawei/hinic/hinic_main.c
> new file mode 100644
> index 000..c61c769
> --- /dev/null
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
[...]
> +static void hinic_remove(struct pci_dev *pdev)
> +{
> + struct net_device *netdev = pci_get_drvdata(pdev);
> + struct hinic_dev *nic_dev;
> +
> + if (!netdev)
> + return;

Your driver is flawed if this test can ever succeed.

[...]
> +static int __init hinic_init(void)
> +{
> + return pci_register_driver(_driver);
> +}
> +
> +static void __exit hinic_exit(void)
> +{
> + pci_unregister_driver(_driver);
> +}

Use module_pci_driver(hinic_driver).

Remove hinic_init() and hinic_exit().

> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_pci_id_tbl.h 
> b/drivers/net/ethernet/huawei/hinic/hinic_pci_id_tbl.h
> new file mode 100644
> index 000..1d92617
> --- /dev/null
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_pci_id_tbl.h
[...]
> +#ifndef HINIC_PCI_ID_TBL_H
> +#define HINIC_PCI_ID_TBL_H
> +
> +#ifndef PCI_VENDOR_ID_HUAWEI
> +#define PCI_VENDOR_ID_HUAWEI0x19e5
> +#endif

Useless: it duplicates include/linux/pci_ids.h

> +
> +#ifndef PCI_DEVICE_ID_HI1822_PF
> +#define PCI_DEVICE_ID_HI1822_PF 0x1822
> +#endif

Please move it to the .c file where it is actually used.


Extra:

grep -E 'void\ \*' drivers/net/ethernet/huawei/hinic/* makes me nervous.

At some point one function will be fed with a wrong pointer and the
compiler won't notice it.

For instance hinic_sq_read_wqe is only called with  There's no
reason to declare it using a 'void **' argument.

-- 
Ueimor


Re: [PATCH V2 net-next 01/21] net-next/hinic: Initialize hw interface

2017-07-19 Thread Francois Romieu
Aviad Krawczyk  :
[...]
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c 
> b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
> new file mode 100644
> index 000..fbc9de4
> --- /dev/null
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
[...]
> +/**
> + * hinic_init_hwdev - Initialize the NIC HW
> + * @hwdev: the NIC HW device that is returned from the initialization
> + * @pdev: the NIC pci device
> + *
> + * Return 0 - Success, negative - Failure
> + *
> + * Initialize the NIC HW device and return a pointer to it in the first arg
> + **/

Return a pointer and use ERR_PTR / IS_ERR ?

> +int hinic_init_hwdev(struct hinic_hwdev **hwdev, struct pci_dev *pdev)
> +{
> + struct hinic_pfhwdev *pfhwdev;
> + struct hinic_hwif *hwif;
> + int err;
> +
> + hwif = devm_kzalloc(>dev, sizeof(*hwif), GFP_KERNEL);
> + if (!hwif)
> + return -ENOMEM;
> +
> + err = hinic_init_hwif(hwif, pdev);
> + if (err) {
> + dev_err(>dev, "Failed to init HW interface\n");
> + return err;
> + }
> +
> + if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) {
> + dev_err(>dev, "Unsupported PCI Function type\n");
> + err = -EFAULT;
> + goto func_type_err;
> + }
> +
> + pfhwdev = devm_kzalloc(>dev, sizeof(*pfhwdev), GFP_KERNEL);
> + if (!pfhwdev) {
> + err = -ENOMEM;
> + goto pfhwdev_alloc_err;

Intel, Mellanox, Broadcom, Amazon and friends use "err_xyz" labels.

Please consider using the same style.

[...]
> +void hinic_free_hwdev(struct hinic_hwdev *hwdev)
> +{
> + struct hinic_hwif *hwif = hwdev->hwif;
> + struct pci_dev *pdev = hwif->pdev;
> + struct hinic_pfhwdev *pfhwdev;
> +
> + if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) {
> + dev_err(>dev, "unsupported PCI Function type\n");
> + return;
> + }

If it succeeded in hinic_init_hwdev, how could it fail here ?

If it failed in hinic_init_hwdev, hinic_free_hwdev should not even
be called.

-> remove ?

[...]
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c 
> b/drivers/net/ethernet/huawei/hinic/hinic_main.c
> new file mode 100644
> index 000..c61c769
> --- /dev/null
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
[...]
> +static void hinic_remove(struct pci_dev *pdev)
> +{
> + struct net_device *netdev = pci_get_drvdata(pdev);
> + struct hinic_dev *nic_dev;
> +
> + if (!netdev)
> + return;

Your driver is flawed if this test can ever succeed.

[...]
> +static int __init hinic_init(void)
> +{
> + return pci_register_driver(_driver);
> +}
> +
> +static void __exit hinic_exit(void)
> +{
> + pci_unregister_driver(_driver);
> +}

Use module_pci_driver(hinic_driver).

Remove hinic_init() and hinic_exit().

> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_pci_id_tbl.h 
> b/drivers/net/ethernet/huawei/hinic/hinic_pci_id_tbl.h
> new file mode 100644
> index 000..1d92617
> --- /dev/null
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_pci_id_tbl.h
[...]
> +#ifndef HINIC_PCI_ID_TBL_H
> +#define HINIC_PCI_ID_TBL_H
> +
> +#ifndef PCI_VENDOR_ID_HUAWEI
> +#define PCI_VENDOR_ID_HUAWEI0x19e5
> +#endif

Useless: it duplicates include/linux/pci_ids.h

> +
> +#ifndef PCI_DEVICE_ID_HI1822_PF
> +#define PCI_DEVICE_ID_HI1822_PF 0x1822
> +#endif

Please move it to the .c file where it is actually used.


Extra:

grep -E 'void\ \*' drivers/net/ethernet/huawei/hinic/* makes me nervous.

At some point one function will be fed with a wrong pointer and the
compiler won't notice it.

For instance hinic_sq_read_wqe is only called with  There's no
reason to declare it using a 'void **' argument.

-- 
Ueimor


Re: [PATCH net] sky2: Do not deadlock on sky2_hw_down

2017-05-25 Thread Francois Romieu
David Miller  :
> From: Joshua Emele 
> Date: Wed, 24 May 2017 15:43:18 -0700
[...]
> > The sky2_hw_down uses sky2_tx_complete to free pending frames stuck in
> > the HW queue. Because sky2_hw_down can be called from a process context,
> > the call to u64_stats_update_begin can result in deadlock.
> > 
> > Because the statistics do not require update as part of the sky2_hw_down
> > sequence, prevent the update to avoid the deadlock.
> 
> I disagree.  Taking the interface down should cause events in the
> statistics to be lost.
>
> You're going to have to find a way to fix this without eliding
> the stats increments.

NAPI processing is already disabled at this stage in the device close()
path (and sky2_netpoll() uses napi_schedule).

It's possible to add a conditionnal bh or irq disabling instruction to
silent the warning but it should not be needed at all.

-- 
Ueimor


Re: [PATCH net] sky2: Do not deadlock on sky2_hw_down

2017-05-25 Thread Francois Romieu
David Miller  :
> From: Joshua Emele 
> Date: Wed, 24 May 2017 15:43:18 -0700
[...]
> > The sky2_hw_down uses sky2_tx_complete to free pending frames stuck in
> > the HW queue. Because sky2_hw_down can be called from a process context,
> > the call to u64_stats_update_begin can result in deadlock.
> > 
> > Because the statistics do not require update as part of the sky2_hw_down
> > sequence, prevent the update to avoid the deadlock.
> 
> I disagree.  Taking the interface down should cause events in the
> statistics to be lost.
>
> You're going to have to find a way to fix this without eliding
> the stats increments.

NAPI processing is already disabled at this stage in the device close()
path (and sky2_netpoll() uses napi_schedule).

It's possible to add a conditionnal bh or irq disabling instruction to
silent the warning but it should not be needed at all.

-- 
Ueimor


Re: [PATCH v2] net: natsemi: ns83820: add checks for dma mapping error

2017-04-22 Thread Francois Romieu
Alexey Khoroshilov  :
[...]
> diff --git a/drivers/net/ethernet/natsemi/ns83820.c 
> b/drivers/net/ethernet/natsemi/ns83820.c
> index 729095db3e08..dfc64e1e31f9 100644
> --- a/drivers/net/ethernet/natsemi/ns83820.c
> +++ b/drivers/net/ethernet/natsemi/ns83820.c
[...]
> @@ -1183,6 +1193,32 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
> sk_buff *skb,
>   netif_start_queue(ndev);
>  
>   return NETDEV_TX_OK;
> +
> +dma_error:
> + do {
> + free_idx = (free_idx + NR_TX_DESC - 1) % NR_TX_DESC;
> + desc = dev->tx_descs + (free_idx * DESC_SIZE);
> + cmdsts = le32_to_cpu(desc[DESC_CMDSTS]);
> + len = cmdsts & CMDSTS_LEN_MASK;
> + buf = desc_addr_get(desc + DESC_BUFPTR);
> + if (desc == first_desc)
> + pci_unmap_single(dev->pci_dev,
> + buf,
> + len,
> + PCI_DMA_TODEVICE);
> + else
> + pci_unmap_page(dev->pci_dev,
> + buf,
> + len,
> + PCI_DMA_TODEVICE);

(use tabs + spaces to indent: code should line up right after the parenthesis)

(premature line breaks imho)

(nevermind, both can be avoided :o) )

> + desc[DESC_CMDSTS] = cpu_to_le32(0);
> + mb();
> + } while (desc != first_desc);
> +
> +dma_error_first:
> + dev_kfree_skb_any(skb);
> + ndev->stats.tx_errors++;
^ -> should be tx_dropped
> + return NETDEV_TX_OK;
>  }

You only need a single test in the error loop if you mimic the map loop.
Something like:

diff --git a/drivers/net/ethernet/natsemi/ns83820.c 
b/drivers/net/ethernet/natsemi/ns83820.c
index 729095d..5e2dbc9 100644
--- a/drivers/net/ethernet/natsemi/ns83820.c
+++ b/drivers/net/ethernet/natsemi/ns83820.c
@@ -1160,9 +1160,11 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
sk_buff *skb,
 
buf = skb_frag_dma_map(>pci_dev->dev, frag, 0,
   skb_frag_size(frag), DMA_TO_DEVICE);
+   if (dma_mapping_error(>pci_dev->dev, buf))
+   goto err_unmap_frags;
dprintk("frag: buf=%08Lx  page=%08lx offset=%08lx\n",
(long long)buf, (long) page_to_pfn(frag->page),
frag->page_offset);
len = skb_frag_size(frag);
frag++;
nr_frags--;
@@ -1181,8 +1184,27 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
sk_buff *skb,
/* Check again: we may have raced with a tx done irq */
if (stopped && (dev->tx_done_idx != tx_done_idx) && start_tx_okay(dev))
netif_start_queue(ndev);
-
+out:
return NETDEV_TX_OK;
+
+err_unmap_frags:
+   while (1) {
+   buf = desc_addr_get(desc + DESC_BUFPTR);
+   if (!--nr_frags)
+   break;
+
+   pci_unmap_page(dev->pci_dev, buf, len, PCI_DMA_TODEVICE);
+
+   free_idx = (free_idx - 1) % NR_TX_DESC;
+   desc = dev->tx_descs + (free_idx * DESC_SIZE);
+   len = le32_to_cpu(desc + DESC_CMDSTS) & CMDSTS_LEN_MASK;
+   }
+   pci_unmap_single(dev->pci_dev, buf, len, PCI_DMA_TODEVICE);
+
+err_free_skb:
+   dev_kfree_skb_any(skb);
+   ndev->stats.tx_dropped++;
+   goto out;
 }
 
 static void ns83820_update_stats(struct ns83820 *dev)


Thinking more about it, the driver seems rather unsafe if a failing
start_xmit closely follows a succeeding one. The driver should imho
map frags first *then* plug the remaining hole in the descriptor ring.
Until it does, the implicit assumption about descriptor ownership that
the error unroll loop relies on may be wrong.

-- 
Ueimor


Re: [PATCH v2] net: natsemi: ns83820: add checks for dma mapping error

2017-04-22 Thread Francois Romieu
Alexey Khoroshilov  :
[...]
> diff --git a/drivers/net/ethernet/natsemi/ns83820.c 
> b/drivers/net/ethernet/natsemi/ns83820.c
> index 729095db3e08..dfc64e1e31f9 100644
> --- a/drivers/net/ethernet/natsemi/ns83820.c
> +++ b/drivers/net/ethernet/natsemi/ns83820.c
[...]
> @@ -1183,6 +1193,32 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
> sk_buff *skb,
>   netif_start_queue(ndev);
>  
>   return NETDEV_TX_OK;
> +
> +dma_error:
> + do {
> + free_idx = (free_idx + NR_TX_DESC - 1) % NR_TX_DESC;
> + desc = dev->tx_descs + (free_idx * DESC_SIZE);
> + cmdsts = le32_to_cpu(desc[DESC_CMDSTS]);
> + len = cmdsts & CMDSTS_LEN_MASK;
> + buf = desc_addr_get(desc + DESC_BUFPTR);
> + if (desc == first_desc)
> + pci_unmap_single(dev->pci_dev,
> + buf,
> + len,
> + PCI_DMA_TODEVICE);
> + else
> + pci_unmap_page(dev->pci_dev,
> + buf,
> + len,
> + PCI_DMA_TODEVICE);

(use tabs + spaces to indent: code should line up right after the parenthesis)

(premature line breaks imho)

(nevermind, both can be avoided :o) )

> + desc[DESC_CMDSTS] = cpu_to_le32(0);
> + mb();
> + } while (desc != first_desc);
> +
> +dma_error_first:
> + dev_kfree_skb_any(skb);
> + ndev->stats.tx_errors++;
^ -> should be tx_dropped
> + return NETDEV_TX_OK;
>  }

You only need a single test in the error loop if you mimic the map loop.
Something like:

diff --git a/drivers/net/ethernet/natsemi/ns83820.c 
b/drivers/net/ethernet/natsemi/ns83820.c
index 729095d..5e2dbc9 100644
--- a/drivers/net/ethernet/natsemi/ns83820.c
+++ b/drivers/net/ethernet/natsemi/ns83820.c
@@ -1160,9 +1160,11 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
sk_buff *skb,
 
buf = skb_frag_dma_map(>pci_dev->dev, frag, 0,
   skb_frag_size(frag), DMA_TO_DEVICE);
+   if (dma_mapping_error(>pci_dev->dev, buf))
+   goto err_unmap_frags;
dprintk("frag: buf=%08Lx  page=%08lx offset=%08lx\n",
(long long)buf, (long) page_to_pfn(frag->page),
frag->page_offset);
len = skb_frag_size(frag);
frag++;
nr_frags--;
@@ -1181,8 +1184,27 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
sk_buff *skb,
/* Check again: we may have raced with a tx done irq */
if (stopped && (dev->tx_done_idx != tx_done_idx) && start_tx_okay(dev))
netif_start_queue(ndev);
-
+out:
return NETDEV_TX_OK;
+
+err_unmap_frags:
+   while (1) {
+   buf = desc_addr_get(desc + DESC_BUFPTR);
+   if (!--nr_frags)
+   break;
+
+   pci_unmap_page(dev->pci_dev, buf, len, PCI_DMA_TODEVICE);
+
+   free_idx = (free_idx - 1) % NR_TX_DESC;
+   desc = dev->tx_descs + (free_idx * DESC_SIZE);
+   len = le32_to_cpu(desc + DESC_CMDSTS) & CMDSTS_LEN_MASK;
+   }
+   pci_unmap_single(dev->pci_dev, buf, len, PCI_DMA_TODEVICE);
+
+err_free_skb:
+   dev_kfree_skb_any(skb);
+   ndev->stats.tx_dropped++;
+   goto out;
 }
 
 static void ns83820_update_stats(struct ns83820 *dev)


Thinking more about it, the driver seems rather unsafe if a failing
start_xmit closely follows a succeeding one. The driver should imho
map frags first *then* plug the remaining hole in the descriptor ring.
Until it does, the implicit assumption about descriptor ownership that
the error unroll loop relies on may be wrong.

-- 
Ueimor


Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

2017-04-08 Thread Francois Romieu
David Miller  :
[...]
> One theory is that the interrupt masking isn't working properly
> and interrupts are still arriving and hitting the NAPI state even
> when we are actively polling NAPI.
> 
> And this problem was masked by the locking done here.

Yes.

Ville, can you rule out irq sharing between the 8139 and some other
device ? It's a candidate for unexpected interrupt handler invocation
with older pc, even with properly working hardware.

-- 
Ueimor


Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

2017-04-08 Thread Francois Romieu
David Miller  :
[...]
> One theory is that the interrupt masking isn't working properly
> and interrupts are still arriving and hitting the NAPI state even
> when we are actively polling NAPI.
> 
> And this problem was masked by the locking done here.

Yes.

Ville, can you rule out irq sharing between the 8139 and some other
device ? It's a candidate for unexpected interrupt handler invocation
with older pc, even with properly working hardware.

-- 
Ueimor


Re: net/atm: warning in alloc_tx/__might_sleep

2017-01-10 Thread Francois Romieu
Eric Dumazet  :
> On Tue, Jan 10, 2017 at 9:35 AM, Cong Wang  wrote:
> > On Mon, Jan 9, 2017 at 9:20 AM, Andrey Konovalov  
> > wrote:
> >
> > The fix should be straight-forward. Mind to try the attached patch?
> 
> 
> You forgot to remove schedule() ?

It may be clearer to split alloc_tx in two parts: only the unsleepable
"if (sk_wmem_alloc_get(sk) && !atm_may_send(vcc, size)) {" part of it
contributes to the inner "while (!(skb = alloc_tx(vcc, eff))) {" block.

See net/atm/common.c
[...]
static struct sk_buff *alloc_tx(struct atm_vcc *vcc, unsigned int size)
{
struct sk_buff *skb;
struct sock *sk = sk_atm(vcc);

if (sk_wmem_alloc_get(sk) && !atm_may_send(vcc, size)) {
pr_debug("Sorry: wmem_alloc = %d, size = %d, sndbuf = %d\n",
 sk_wmem_alloc_get(sk), size, sk->sk_sndbuf);
return NULL;
}
while (!(skb = alloc_skb(size, GFP_KERNEL)))
schedule();
pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
atomic_add(skb->truesize, >sk_wmem_alloc);
return skb;
}

The waiting stuff is related to vcc drain but the code makes it look as
if it were also related to skb alloc (it isn't).

It may be obvious for you but it took me a while to figure what the
code is supposed to achieve.

-- 
Ueimor


Re: net/atm: warning in alloc_tx/__might_sleep

2017-01-10 Thread Francois Romieu
Eric Dumazet  :
> On Tue, Jan 10, 2017 at 9:35 AM, Cong Wang  wrote:
> > On Mon, Jan 9, 2017 at 9:20 AM, Andrey Konovalov  
> > wrote:
> >
> > The fix should be straight-forward. Mind to try the attached patch?
> 
> 
> You forgot to remove schedule() ?

It may be clearer to split alloc_tx in two parts: only the unsleepable
"if (sk_wmem_alloc_get(sk) && !atm_may_send(vcc, size)) {" part of it
contributes to the inner "while (!(skb = alloc_tx(vcc, eff))) {" block.

See net/atm/common.c
[...]
static struct sk_buff *alloc_tx(struct atm_vcc *vcc, unsigned int size)
{
struct sk_buff *skb;
struct sock *sk = sk_atm(vcc);

if (sk_wmem_alloc_get(sk) && !atm_may_send(vcc, size)) {
pr_debug("Sorry: wmem_alloc = %d, size = %d, sndbuf = %d\n",
 sk_wmem_alloc_get(sk), size, sk->sk_sndbuf);
return NULL;
}
while (!(skb = alloc_skb(size, GFP_KERNEL)))
schedule();
pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
atomic_add(skb->truesize, >sk_wmem_alloc);
return skb;
}

The waiting stuff is related to vcc drain but the code makes it look as
if it were also related to skb alloc (it isn't).

It may be obvious for you but it took me a while to figure what the
code is supposed to achieve.

-- 
Ueimor


Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-19 Thread Francois Romieu
Pavel Machek  :
[...]
> Considering the memory barriers... is something like this neccessary
> in the via-rhine ?

Yes.

> AFAICT... we need a barrier after making sure that descriptor is no
> longer owned by DMA (to make sure we don't get stale data in rest of
> descriptor)... and we need a barrier before giving the descriptor to
> the dma, to make sure DMA engine sees the complete update?

I would not expect stale data while processing a single transmit
descriptor as the transmit completion does not use the rest of the
descriptor at all in the via-rhine driver. However I agree that transmit
descriptors should be read by the cpu with adequate ordering so the
dma_rmb() should stay.

Same kind of narrative for dma_wmb rhine_rx (s/read/written/ and
s/cpu/device/).

> diff --git a/drivers/net/ethernet/via/via-rhine.c 
> b/drivers/net/ethernet/via/via-rhine.c
> index ba5c542..3806e72 100644
> --- a/drivers/net/ethernet/via/via-rhine.c
> +++ b/drivers/net/ethernet/via/via-rhine.c
[...]
> @@ -2061,6 +2062,7 @@ static int rhine_rx(struct net_device *dev, int limit)
>  
>   if (desc_status & DescOwn)
>   break;
> + dma_rmb();
>  

I agree with your explanation for this one (late vlan processing in a 
different word from the same descriptor).

-- 
Ueimor


Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-19 Thread Francois Romieu
Pavel Machek  :
[...]
> Considering the memory barriers... is something like this neccessary
> in the via-rhine ?

Yes.

> AFAICT... we need a barrier after making sure that descriptor is no
> longer owned by DMA (to make sure we don't get stale data in rest of
> descriptor)... and we need a barrier before giving the descriptor to
> the dma, to make sure DMA engine sees the complete update?

I would not expect stale data while processing a single transmit
descriptor as the transmit completion does not use the rest of the
descriptor at all in the via-rhine driver. However I agree that transmit
descriptors should be read by the cpu with adequate ordering so the
dma_rmb() should stay.

Same kind of narrative for dma_wmb rhine_rx (s/read/written/ and
s/cpu/device/).

> diff --git a/drivers/net/ethernet/via/via-rhine.c 
> b/drivers/net/ethernet/via/via-rhine.c
> index ba5c542..3806e72 100644
> --- a/drivers/net/ethernet/via/via-rhine.c
> +++ b/drivers/net/ethernet/via/via-rhine.c
[...]
> @@ -2061,6 +2062,7 @@ static int rhine_rx(struct net_device *dev, int limit)
>  
>   if (desc_status & DescOwn)
>   break;
> + dma_rmb();
>  

I agree with your explanation for this one (late vlan processing in a 
different word from the same descriptor).

-- 
Ueimor


Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-17 Thread Francois Romieu
Pavel Machek  :
[...]
> Won't this up/down the interface, in a way userspace can observe?

It won't up/down the interface as it doesn't exactly mimic what the
network code does (there's more than rtnl_lock).

For the same reason it's broken if it races with the transmit path: it
can release driver resources while the transmit path uses these.

Btw the points below may not matter/hurt much for a proof a concept
but they would need to be addressed as well:
1) unchecked (and avoidable) extra error paths due to stmmac_release()
2) racy cancel_work_sync. Low probability as it is, an irq + error could
   take place right after cancel_work_sync

Lino, have you considered via-rhine.c since its "move work from irq to
workqueue context" changes that started in
7ab87ff4c770eed71e3777936299292739fcd0fe [*] ?

It's a shameless plug - originated in r8169.c - but it should be rather
close to what the sxgbe and friends require. Thought / opinion ?

[*] Including fixes/changes in:
- 3a5a883a8a663b930908cae4abe5ec913b9b2fd2
- e1efa87241272104d6a12c8b9fcdc4f62634d447
- 810f19bcb862f8889b27e0c9d9eceac9593925dd
- e45af497950a89459a0c4b13ffd91e1729fffef4
- a926592f5e4e900f3fa903298c4619a131e60963
- 559bcac35facfed49ab4f408e162971612dcfdf3

-- 
Ueimor


Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-17 Thread Francois Romieu
Pavel Machek  :
[...]
> Won't this up/down the interface, in a way userspace can observe?

It won't up/down the interface as it doesn't exactly mimic what the
network code does (there's more than rtnl_lock).

For the same reason it's broken if it races with the transmit path: it
can release driver resources while the transmit path uses these.

Btw the points below may not matter/hurt much for a proof a concept
but they would need to be addressed as well:
1) unchecked (and avoidable) extra error paths due to stmmac_release()
2) racy cancel_work_sync. Low probability as it is, an irq + error could
   take place right after cancel_work_sync

Lino, have you considered via-rhine.c since its "move work from irq to
workqueue context" changes that started in
7ab87ff4c770eed71e3777936299292739fcd0fe [*] ?

It's a shameless plug - originated in r8169.c - but it should be rather
close to what the sxgbe and friends require. Thought / opinion ?

[*] Including fixes/changes in:
- 3a5a883a8a663b930908cae4abe5ec913b9b2fd2
- e1efa87241272104d6a12c8b9fcdc4f62634d447
- 810f19bcb862f8889b27e0c9d9eceac9593925dd
- e45af497950a89459a0c4b13ffd91e1729fffef4
- a926592f5e4e900f3fa903298c4619a131e60963
- 559bcac35facfed49ab4f408e162971612dcfdf3

-- 
Ueimor


Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-08 Thread Francois Romieu
Lino Sanfilippo  :
[...]
> OTOH Pavel said that he actually could produce a deadlock. Now I wonder if
> this is caused by that locking scheme (in a way I have not figured out yet)
> or if it is a different issue.

stmmac_tx_err races with stmmac_xmit.

-- 
Ueimor


Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-08 Thread Francois Romieu
Lino Sanfilippo  :
[...]
> OTOH Pavel said that he actually could produce a deadlock. Now I wonder if
> this is caused by that locking scheme (in a way I have not figured out yet)
> or if it is a different issue.

stmmac_tx_err races with stmmac_xmit.

-- 
Ueimor


Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-07 Thread Francois Romieu
Lino Sanfilippo  :
> The driver uses a private lock for synchronization between the xmit
> function and the xmit completion handler, but since the NETIF_F_LLTX flag
> is not set, the xmit function is also called with the xmit_lock held.
> 
> On the other hand the xmit completion handler first takes the private lock
> and (in case that the tx queue has been stopped) the xmit_lock, leading
> to a reverse locking order and the potential danger of a deadlock.

netif_tx_stop_queue is used by:
1. xmit function before releasing lock and returning.
2. sxgbe_restart_tx_queue()
   <- sxgbe_tx_interrupt
   <- sxgbe_reset_all_tx_queues()
  <- sxgbe_tx_timeout()

Given xmit won't be called again until tx queue is enabled, it's not clear
how a deadlock could happen due to #1.

Regardless of deadlocks anywhere else, #2 has some serious problem due to
the lack of exclusion between the tx queue restart handler and the xmit
handler.

-- 
Ueimor


Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-07 Thread Francois Romieu
Lino Sanfilippo  :
> The driver uses a private lock for synchronization between the xmit
> function and the xmit completion handler, but since the NETIF_F_LLTX flag
> is not set, the xmit function is also called with the xmit_lock held.
> 
> On the other hand the xmit completion handler first takes the private lock
> and (in case that the tx queue has been stopped) the xmit_lock, leading
> to a reverse locking order and the potential danger of a deadlock.

netif_tx_stop_queue is used by:
1. xmit function before releasing lock and returning.
2. sxgbe_restart_tx_queue()
   <- sxgbe_tx_interrupt
   <- sxgbe_reset_all_tx_queues()
  <- sxgbe_tx_timeout()

Given xmit won't be called again until tx queue is enabled, it's not clear
how a deadlock could happen due to #1.

Regardless of deadlocks anywhere else, #2 has some serious problem due to
the lack of exclusion between the tx queue restart handler and the xmit
handler.

-- 
Ueimor


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-24 Thread Francois Romieu
Mark Lord  :
[...]
> >From tracing through the powerpc arch code, this is the buffer that
> is being directly DMA'd into.  And the USB layer does an invalidate_dcache
> on that entire buffer before initiating the DMA (confirmed via printk).
> 
> The driver itself NEVER writes anything to that buffer,
> and nobody else has a pointer to it other than the USB host controller,
> so there's nothing else that can write to it either.
> 
> According to the driver writer, the chip should only ever write a fresh
> rx_desc struct at the beginning of a buffer, never ASCII data.
> 
> So how does that buffer end up containing ASCII data from the NFS transfers?

Through aliasing the URB was given a page that contains said (previously)
received file. The ethernet chip/usb host does not write anything in it.
There could be a device or a driver problem but it may not be the real
problem.

So far the analysis focused on "how was this corrupted content written into
this receive buffer page ?". If I read David correctly (?) the "nobody
else has a pointer to it other than the USB host controller" point may be
replaced with "the pointer to it aliases some already used page".

-- 
Ueimor


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-24 Thread Francois Romieu
Mark Lord  :
[...]
> >From tracing through the powerpc arch code, this is the buffer that
> is being directly DMA'd into.  And the USB layer does an invalidate_dcache
> on that entire buffer before initiating the DMA (confirmed via printk).
> 
> The driver itself NEVER writes anything to that buffer,
> and nobody else has a pointer to it other than the USB host controller,
> so there's nothing else that can write to it either.
> 
> According to the driver writer, the chip should only ever write a fresh
> rx_desc struct at the beginning of a buffer, never ASCII data.
> 
> So how does that buffer end up containing ASCII data from the NFS transfers?

Through aliasing the URB was given a page that contains said (previously)
received file. The ethernet chip/usb host does not write anything in it.
There could be a device or a driver problem but it may not be the real
problem.

So far the analysis focused on "how was this corrupted content written into
this receive buffer page ?". If I read David correctly (?) the "nobody
else has a pointer to it other than the USB host controller" point may be
replaced with "the pointer to it aliases some already used page".

-- 
Ueimor


Re: [PATCH net 2/2] r8152: rx descriptor check

2016-11-14 Thread Francois Romieu
Hayes Wang <hayesw...@realtek.com> :
> Francois Romieu [mailto:rom...@fr.zoreil.com]
> > Sent: Friday, November 11, 2016 8:13 PM
> [...]
> > Invalid packet size corrupted receive descriptors in Realtek's device
> > reminds of CVE-2009-4537.
> 
> Do you mean that the driver would get a packet exceed the size
> which is set to RxMaxSize ?

If it was possible to get it wrong once, it should be possible to
get it wrong twice, especially if some part of the hardware design
is recycled. I don't mean anything else.

I won't speculate about some cache consistency issue or some badly
aborted dma transaction to explain the memory corruption.

-- 
Ueimor


Re: [PATCH net 2/2] r8152: rx descriptor check

2016-11-14 Thread Francois Romieu
Hayes Wang  :
> Francois Romieu [mailto:rom...@fr.zoreil.com]
> > Sent: Friday, November 11, 2016 8:13 PM
> [...]
> > Invalid packet size corrupted receive descriptors in Realtek's device
> > reminds of CVE-2009-4537.
> 
> Do you mean that the driver would get a packet exceed the size
> which is set to RxMaxSize ?

If it was possible to get it wrong once, it should be possible to
get it wrong twice, especially if some part of the hardware design
is recycled. I don't mean anything else.

I won't speculate about some cache consistency issue or some badly
aborted dma transaction to explain the memory corruption.

-- 
Ueimor


Re: [PATCH net 2/2] r8152: rx descriptor check

2016-11-11 Thread Francois Romieu
Hayes Wang  :
> For some platforms, the data in memory is not the same with the one
> from the device. That is, the data of memory is unbelievable. The
> check is used to find out this situation.

Invalid packet size corrupted receive descriptors in Realtek's device
reminds of CVE-2009-4537.

Is the silicium of both devices different enough to prevent the same
exploit to happen ?

-- 
Ueimor


Re: [PATCH net 2/2] r8152: rx descriptor check

2016-11-11 Thread Francois Romieu
Hayes Wang  :
> For some platforms, the data in memory is not the same with the one
> from the device. That is, the data of memory is unbelievable. The
> check is used to find out this situation.

Invalid packet size corrupted receive descriptors in Realtek's device
reminds of CVE-2009-4537.

Is the silicium of both devices different enough to prevent the same
exploit to happen ?

-- 
Ueimor


Re: [PATCH v2] r8169: set coherent DMA mask as well as streaming DMA mask

2016-10-14 Thread Francois Romieu
Ard Biesheuvel <ard.biesheu...@linaro.org> :
> PCI devices that are 64-bit DMA capable should set the coherent
> DMA mask as well as the streaming DMA mask. On some architectures,
> these are managed separately, and so the coherent DMA mask will be
> left at its default value of 32 if it is not set explicitly. This
> results in errors such as
> 
>  r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
>  hwdev DMA mask = 0x, dev_addr = 0x0080fbfff000
>  swiotlb: coherent allocation failed for device :02:00.0 size=4096
>  CPU: 0 PID: 1062 Comm: systemd-udevd Not tainted 4.8.0+ #35
>  Hardware name: AMD Seattle/Seattle, BIOS 10:53:24 Oct 13 2016
> 
> on systems without memory that is 32-bit addressable by PCI devices.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

Acked-by: Francois Romieu <rom...@fr.zoreil.com>

Unless someone plans to plug an acenic, a 83820 (pci-e gem board, anyone ?)
on top of a pci <-> pci-e adapter on this kind of motherboard, no other
network driver that uses the pci_... dma api exhibits this mixed 32 / 64 bit
support bug. I haven't checked devices with 32 < mask < 64 nor plain DMA api
converted ones.

-- 
Ueimor


Re: [PATCH v2] r8169: set coherent DMA mask as well as streaming DMA mask

2016-10-14 Thread Francois Romieu
Ard Biesheuvel  :
> PCI devices that are 64-bit DMA capable should set the coherent
> DMA mask as well as the streaming DMA mask. On some architectures,
> these are managed separately, and so the coherent DMA mask will be
> left at its default value of 32 if it is not set explicitly. This
> results in errors such as
> 
>  r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
>  hwdev DMA mask = 0x, dev_addr = 0x0080fbfff000
>  swiotlb: coherent allocation failed for device :02:00.0 size=4096
>  CPU: 0 PID: 1062 Comm: systemd-udevd Not tainted 4.8.0+ #35
>  Hardware name: AMD Seattle/Seattle, BIOS 10:53:24 Oct 13 2016
> 
> on systems without memory that is 32-bit addressable by PCI devices.
> 
> Signed-off-by: Ard Biesheuvel 

Acked-by: Francois Romieu 

Unless someone plans to plug an acenic, a 83820 (pci-e gem board, anyone ?)
on top of a pci <-> pci-e adapter on this kind of motherboard, no other
network driver that uses the pci_... dma api exhibits this mixed 32 / 64 bit
support bug. I haven't checked devices with 32 < mask < 64 nor plain DMA api
converted ones.

-- 
Ueimor


Re: [PATCH] via-velocity: remove null pointer check on array tdinfo->skb_dma

2016-09-08 Thread Francois Romieu
Colin King <colin.k...@canonical.com> :
> From: Colin Ian King <colin.k...@canonical.com>
> 
> tdinfo->skb_dma is a 7 element array of dma_addr_t hence cannot be
> null, so the pull pointer check on tdinfo->skb_dma  is redundant.
> Remove it.
> 
> Signed-off-by: Colin Ian King <colin.k...@canonical.com>

Acked-by: Francois Romieu <rom...@fr.zoreil.com>

-- 
Ueimor


Re: [PATCH] via-velocity: remove null pointer check on array tdinfo->skb_dma

2016-09-08 Thread Francois Romieu
Colin King  :
> From: Colin Ian King 
> 
> tdinfo->skb_dma is a 7 element array of dma_addr_t hence cannot be
> null, so the pull pointer check on tdinfo->skb_dma  is redundant.
> Remove it.
> 
> Signed-off-by: Colin Ian King 

Acked-by: Francois Romieu 

-- 
Ueimor


Re: [PATCH] Networking: Core: netpoll: Fixed a missing spin_unlock

2016-07-31 Thread Francois Romieu
Salil Kapur  :
> I was looking at v3.12. Can we submit patches for stable versions?

It has already been fixed in 3.12.8 as 56399d8b44beae5b80e1eda0350ab6af72baf4d0
("netpoll: Fix missing TXQ unlock and and OOPS.") by davem.

3.12.8 dates back to 2014/01. The current 3.12.x version is 3.12.62.

-- 
Ueimor


Re: [PATCH] Networking: Core: netpoll: Fixed a missing spin_unlock

2016-07-31 Thread Francois Romieu
Salil Kapur  :
> I was looking at v3.12. Can we submit patches for stable versions?

It has already been fixed in 3.12.8 as 56399d8b44beae5b80e1eda0350ab6af72baf4d0
("netpoll: Fix missing TXQ unlock and and OOPS.") by davem.

3.12.8 dates back to 2014/01. The current 3.12.x version is 3.12.62.

-- 
Ueimor


Re: [PATCH] Networking: Core: netpoll: Fixed a missing spin_unlock

2016-07-30 Thread Francois Romieu
Salil Kapur  :
[...]
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index fc75c9e..9124f76 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -386,8 +386,10 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct 
> sk_buff *skb,
>   
> !vlan_hw_offload_capable(netif_skb_features(skb),
>
> skb->vlan_proto)) {
>   skb = __vlan_put_tag(skb, 
> skb->vlan_proto, vlan_tx_tag_get(skb));
> - if (unlikely(!skb))
> + if (unlikely(!skb)) {
> + __netif_tx_unlock(txq);
>   break;
> + }
>   skb->vlan_tci = 0;
>   }
>  

Your kernel is outdated: __vlan_put_tag has disappeared from net/core/netpoll.c
since 62749e2cb3c4a7da3eaa5c01a7e787aebeff8536 ("vlan: rename __vlan_put_tag to
vlan_insert_tag_set_proto") by Jiri Pirko somewhere in 2014.

-- 
Ueimor


Re: [PATCH] Networking: Core: netpoll: Fixed a missing spin_unlock

2016-07-30 Thread Francois Romieu
Salil Kapur  :
[...]
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index fc75c9e..9124f76 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -386,8 +386,10 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct 
> sk_buff *skb,
>   
> !vlan_hw_offload_capable(netif_skb_features(skb),
>
> skb->vlan_proto)) {
>   skb = __vlan_put_tag(skb, 
> skb->vlan_proto, vlan_tx_tag_get(skb));
> - if (unlikely(!skb))
> + if (unlikely(!skb)) {
> + __netif_tx_unlock(txq);
>   break;
> + }
>   skb->vlan_tci = 0;
>   }
>  

Your kernel is outdated: __vlan_put_tag has disappeared from net/core/netpoll.c
since 62749e2cb3c4a7da3eaa5c01a7e787aebeff8536 ("vlan: rename __vlan_put_tag to
vlan_insert_tag_set_proto") by Jiri Pirko somewhere in 2014.

-- 
Ueimor


Re: [Intel-wired-lan] [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Francois Romieu
Eric Dumazet  :
[...]
> I would prefer having a definitive advice from Thomas Gleixner and/or
> others if disable_irq() is forbidden from IRQ path.
> 
> As I said, about all netpoll() methods in net drivers use disable_irq()
> so a lot of patches would be needed.

s/about all/many/

There has been a WARN_ONCE(!irqs_disabled() in netpoll_send_skb_on_dev for
quite some time now but it's apparently screened by too many tests to be
effective. :o/

-- 
Ueimor


Re: [Intel-wired-lan] [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Francois Romieu
Eric Dumazet  :
[...]
> I would prefer having a definitive advice from Thomas Gleixner and/or
> others if disable_irq() is forbidden from IRQ path.
> 
> As I said, about all netpoll() methods in net drivers use disable_irq()
> so a lot of patches would be needed.

s/about all/many/

There has been a WARN_ONCE(!irqs_disabled() in netpoll_send_skb_on_dev for
quite some time now but it's apparently screened by too many tests to be
effective. :o/

-- 
Ueimor


Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-28 Thread Francois Romieu
Hau  :
[...]
> > Either the driver resumes the device so that it can perform requested
> > operation or it signals .set_wol failure when the device is suspended.
> > 
> > If the driver does something else, "spam removal" translates to "silent
> > failure".
> 
> Because "tp->saved_wolopts" will be used to set hardware wol capability in
> rtl8169_runtime_resume().  So I prefer to keep "wol->wolopts" to
> " tp->saved_wolopts " in runtime suspend state and set this to this
> "wol->wolopts" to hardware in in rtl8169_runtime_resume(). 

It would be fine if it could be proven that rtl8169_runtime_resume() will
always be run before software state is lost.

-- 
Ueimor


Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-28 Thread Francois Romieu
Hau  :
[...]
> > Either the driver resumes the device so that it can perform requested
> > operation or it signals .set_wol failure when the device is suspended.
> > 
> > If the driver does something else, "spam removal" translates to "silent
> > failure".
> 
> Because "tp->saved_wolopts" will be used to set hardware wol capability in
> rtl8169_runtime_resume().  So I prefer to keep "wol->wolopts" to
> " tp->saved_wolopts " in runtime suspend state and set this to this
> "wol->wolopts" to hardware in in rtl8169_runtime_resume(). 

It would be fine if it could be proven that rtl8169_runtime_resume() will
always be run before software state is lost.

-- 
Ueimor


Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-27 Thread Francois Romieu
Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 0e62d74..f07604f 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -1749,13 +1749,21 @@ static u32 __rtl8169_get_wol(struct rtl8169_private 
> *tp)
>  static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo 
> *wol)
>  {
>   struct rtl8169_private *tp = netdev_priv(dev);
> + struct pci_dev *pdev = tp->pci_dev;

Nit: you may directly use "struct device *d = >pci_dev->dev;"

-- 
Ueimor


Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-27 Thread Francois Romieu
Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 0e62d74..f07604f 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -1749,13 +1749,21 @@ static u32 __rtl8169_get_wol(struct rtl8169_private 
> *tp)
>  static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo 
> *wol)
>  {
>   struct rtl8169_private *tp = netdev_priv(dev);
> + struct pci_dev *pdev = tp->pci_dev;

Nit: you may directly use "struct device *d = >pci_dev->dev;"

-- 
Ueimor


Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-27 Thread Francois Romieu
Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 0e62d74..f07604f 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
[...]
> @@ -1852,12 +1863,17 @@ static int rtl8169_set_wol(struct net_device *dev, 
> struct ethtool_wolinfo *wol)
>   tp->features |= RTL_FEATURE_WOL;
>   else
>   tp->features &= ~RTL_FEATURE_WOL;
> - __rtl8169_set_wol(tp, wol->wolopts);
> + if (pm_runtime_active(>dev))
> + __rtl8169_set_wol(tp, wol->wolopts);
> + else
> + tp->saved_wolopts = wol->wolopts;
>  
>   rtl_unlock_work(tp);
>  
>   device_set_wakeup_enable(>pci_dev->dev, wol->wolopts);
>  
> + pm_runtime_put_noidle(>dev);
> +
>   return 0;

Either the driver resumes the device so that it can perform requested
operation or it signals .set_wol failure when the device is suspended.

If the driver does something else, "spam removal" translates to
"silent failure".

-- 
Ueimor


Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-27 Thread Francois Romieu
Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 0e62d74..f07604f 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
[...]
> @@ -1852,12 +1863,17 @@ static int rtl8169_set_wol(struct net_device *dev, 
> struct ethtool_wolinfo *wol)
>   tp->features |= RTL_FEATURE_WOL;
>   else
>   tp->features &= ~RTL_FEATURE_WOL;
> - __rtl8169_set_wol(tp, wol->wolopts);
> + if (pm_runtime_active(>dev))
> + __rtl8169_set_wol(tp, wol->wolopts);
> + else
> + tp->saved_wolopts = wol->wolopts;
>  
>   rtl_unlock_work(tp);
>  
>   device_set_wakeup_enable(>pci_dev->dev, wol->wolopts);
>  
> + pm_runtime_put_noidle(>dev);
> +
>   return 0;

Either the driver resumes the device so that it can perform requested
operation or it signals .set_wol failure when the device is suspended.

If the driver does something else, "spam removal" translates to
"silent failure".

-- 
Ueimor


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-20 Thread Francois Romieu
Netanel Belgazal  :
[...]
> > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h 
> > b/drivers/net/ethernet/amazon/ena/ena_com.h
> > new file mode 100644
> > index 000..e49ba43
> > --- /dev/null
> > [...]
> > +static inline void ena_com_update_intr_reg(struct ena_eth_io_intr_reg 
> > *intr_reg,
> > +  u32 rx_delay_interval,
> > +  u32 tx_delay_interval,
> > +  bool unmask)
> > +{
> > +   intr_reg->intr_control = 0;
> > +   intr_reg->intr_control |= rx_delay_interval &
> > +   ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
> > +
> > +   intr_reg->intr_control |=
> > +   (tx_delay_interval << ENA_ETH_IO_INTR_REG_TX_INTR_DELAY_SHIFT)
> > +   & ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
> >   ^^ TX ?
> >
> > Extra: you should not return NETDEV_TX_BUSY in the xmit handler while
> > queueing is still enabled. Please drop packet and return NETDEV_TX_OK.
> Ack.
> 
> Thanks for your review.

Ack/nack regarding use of ..._RX_INTR_DELAY_MASK with ..._TX_INTR_DELAY_SHIFT ?

-- 
Ueimor


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-20 Thread Francois Romieu
Netanel Belgazal  :
[...]
> > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h 
> > b/drivers/net/ethernet/amazon/ena/ena_com.h
> > new file mode 100644
> > index 000..e49ba43
> > --- /dev/null
> > [...]
> > +static inline void ena_com_update_intr_reg(struct ena_eth_io_intr_reg 
> > *intr_reg,
> > +  u32 rx_delay_interval,
> > +  u32 tx_delay_interval,
> > +  bool unmask)
> > +{
> > +   intr_reg->intr_control = 0;
> > +   intr_reg->intr_control |= rx_delay_interval &
> > +   ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
> > +
> > +   intr_reg->intr_control |=
> > +   (tx_delay_interval << ENA_ETH_IO_INTR_REG_TX_INTR_DELAY_SHIFT)
> > +   & ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
> >   ^^ TX ?
> >
> > Extra: you should not return NETDEV_TX_BUSY in the xmit handler while
> > queueing is still enabled. Please drop packet and return NETDEV_TX_OK.
> Ack.
> 
> Thanks for your review.

Ack/nack regarding use of ..._RX_INTR_DELAY_MASK with ..._TX_INTR_DELAY_SHIFT ?

-- 
Ueimor


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-17 Thread Francois Romieu
Netanel Belgazal  :
[...]

More stuff below.

> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> new file mode 100644
> index 000..357e10f
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
[...]
> +static int ena_get_dev_stats(struct ena_com_dev *ena_dev,
> +  struct ena_admin_aq_get_stats_cmd *get_cmd,
> +  struct ena_admin_acq_get_stats_resp *get_resp,

At first sight it should be possible avoid one pointer argument with:

struct ena_something {
struct ena_admin_aq_get_stats_cmd cmd;
struct ena_admin_acq_get_stats_resp resp;

};

[...]
> +int ena_com_get_dev_extended_stats(struct ena_com_dev *ena_dev, char *buff,
> +u32 len)

Where is it used ?

> +{
> + int ret = 0;
> + struct ena_admin_aq_get_stats_cmd get_cmd;
> + struct ena_admin_acq_get_stats_resp get_resp;
> + void *virt_addr;

int rc = -ENOMEM;
char *str;

> + dma_addr_t phys_addr;
> +
> + virt_addr = dma_alloc_coherent(ena_dev->dmadev,
> +len,
> +_addr,
> +GFP_KERNEL | __GFP_ZERO);
> + if (!virt_addr) {
> + ret = -ENOMEM;
> + goto done;
> + }
> + memset(_cmd, 0x0, sizeof(get_cmd));
> + ret = ena_com_mem_addr_set(ena_dev,
> +_cmd.u.control_buffer.address,
> +phys_addr);
> + if (unlikely(ret)) {
> + ena_trc_err("memory address set failed\n");
> + return ret;
> + }
> + get_cmd.u.control_buffer.length = len;
> +
> + get_cmd.device_id = ena_dev->stats_func;
> + get_cmd.queue_idx = ena_dev->stats_queue;
> +
> + ret = ena_get_dev_stats(ena_dev, _cmd, _resp,
> + ENA_ADMIN_GET_STATS_TYPE_EXTENDED);
> + if (ret < 0)
> + goto free_ext_stats_mem;
> +
> + ret = snprintf(buff, len, "%s", (char *)virt_addr);

So ENA_ADMIN_GET_STATS_TYPE_EXTENDED fills a buffer with bytes in the
[0; 127] range, right ?

> +
> +free_ext_stats_mem:
> + dma_free_coherent(ena_dev->dmadev, len, virt_addr, phys_addr);
> +done:
> + return ret;
> +}
> +
> +int ena_com_set_dev_mtu(struct ena_com_dev *ena_dev, int mtu)
> +{
> + struct ena_com_admin_queue *admin_queue;
> + struct ena_admin_set_feat_cmd cmd;
> + struct ena_admin_set_feat_resp resp;
> + int ret = 0;

Should be -ENODEV or left uninitialized.

> +
> + if (unlikely(!ena_dev)) {
> + ena_trc_err("%s : ena_dev is NULL\n", __func__);
> + return -ENODEV;
> + }

Does it mean that ena_com_dev may go away while the net_device is in use ?

> +
> + if (!ena_com_check_supported_feature_id(ena_dev, ENA_ADMIN_MTU)) {
> + ena_trc_info("Feature %d isn't supported\n", ENA_ADMIN_MTU);
> + return -EPERM;
> + }

rc = ena_com_check_supported_feature_id(...
if (rc < 0) {
ena_trc_info(...
goto out;
}
> +
> + memset(, 0x0, sizeof(cmd));
> + admin_queue = _dev->admin_queue;
> +
> + cmd.aq_common_descriptor.opcode = ENA_ADMIN_SET_FEATURE;
> + cmd.aq_common_descriptor.flags = 0;
> + cmd.feat_common.feature_id = ENA_ADMIN_MTU;
> + cmd.u.mtu.mtu = mtu;
> +
> + ret = ena_com_execute_admin_command(admin_queue,
> + (struct ena_admin_aq_entry *),
> + sizeof(cmd),
> + (struct ena_admin_acq_entry *),
> + sizeof(resp));
> +
> + if (unlikely(ret)) {
> + ena_trc_err("Failed to set mtu %d. error: %d\n", mtu, ret);
> + return -EINVAL;

ena_com_execute_admin_command should return a proper status code.

[...]
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h 
b/drivers/net/ethernet/amazon/ena/ena_com.h
new file mode 100644
index 000..e49ba43
--- /dev/null
[...]
+static inline void ena_com_update_intr_reg(struct ena_eth_io_intr_reg 
*intr_reg,
+  u32 rx_delay_interval,
+  u32 tx_delay_interval,
+  bool unmask)
+{
+   intr_reg->intr_control = 0;
+   intr_reg->intr_control |= rx_delay_interval &
+   ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
+
+   intr_reg->intr_control |=
+   (tx_delay_interval << ENA_ETH_IO_INTR_REG_TX_INTR_DELAY_SHIFT)
+   & ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
  ^^ TX ?

Extra: you should not return NETDEV_TX_BUSY in the xmit handler while
queueing is still enabled. Please drop packet and return NETDEV_TX_OK.

-- 
Ueimor


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-17 Thread Francois Romieu
Netanel Belgazal  :
[...]

More stuff below.

> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> new file mode 100644
> index 000..357e10f
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
[...]
> +static int ena_get_dev_stats(struct ena_com_dev *ena_dev,
> +  struct ena_admin_aq_get_stats_cmd *get_cmd,
> +  struct ena_admin_acq_get_stats_resp *get_resp,

At first sight it should be possible avoid one pointer argument with:

struct ena_something {
struct ena_admin_aq_get_stats_cmd cmd;
struct ena_admin_acq_get_stats_resp resp;

};

[...]
> +int ena_com_get_dev_extended_stats(struct ena_com_dev *ena_dev, char *buff,
> +u32 len)

Where is it used ?

> +{
> + int ret = 0;
> + struct ena_admin_aq_get_stats_cmd get_cmd;
> + struct ena_admin_acq_get_stats_resp get_resp;
> + void *virt_addr;

int rc = -ENOMEM;
char *str;

> + dma_addr_t phys_addr;
> +
> + virt_addr = dma_alloc_coherent(ena_dev->dmadev,
> +len,
> +_addr,
> +GFP_KERNEL | __GFP_ZERO);
> + if (!virt_addr) {
> + ret = -ENOMEM;
> + goto done;
> + }
> + memset(_cmd, 0x0, sizeof(get_cmd));
> + ret = ena_com_mem_addr_set(ena_dev,
> +_cmd.u.control_buffer.address,
> +phys_addr);
> + if (unlikely(ret)) {
> + ena_trc_err("memory address set failed\n");
> + return ret;
> + }
> + get_cmd.u.control_buffer.length = len;
> +
> + get_cmd.device_id = ena_dev->stats_func;
> + get_cmd.queue_idx = ena_dev->stats_queue;
> +
> + ret = ena_get_dev_stats(ena_dev, _cmd, _resp,
> + ENA_ADMIN_GET_STATS_TYPE_EXTENDED);
> + if (ret < 0)
> + goto free_ext_stats_mem;
> +
> + ret = snprintf(buff, len, "%s", (char *)virt_addr);

So ENA_ADMIN_GET_STATS_TYPE_EXTENDED fills a buffer with bytes in the
[0; 127] range, right ?

> +
> +free_ext_stats_mem:
> + dma_free_coherent(ena_dev->dmadev, len, virt_addr, phys_addr);
> +done:
> + return ret;
> +}
> +
> +int ena_com_set_dev_mtu(struct ena_com_dev *ena_dev, int mtu)
> +{
> + struct ena_com_admin_queue *admin_queue;
> + struct ena_admin_set_feat_cmd cmd;
> + struct ena_admin_set_feat_resp resp;
> + int ret = 0;

Should be -ENODEV or left uninitialized.

> +
> + if (unlikely(!ena_dev)) {
> + ena_trc_err("%s : ena_dev is NULL\n", __func__);
> + return -ENODEV;
> + }

Does it mean that ena_com_dev may go away while the net_device is in use ?

> +
> + if (!ena_com_check_supported_feature_id(ena_dev, ENA_ADMIN_MTU)) {
> + ena_trc_info("Feature %d isn't supported\n", ENA_ADMIN_MTU);
> + return -EPERM;
> + }

rc = ena_com_check_supported_feature_id(...
if (rc < 0) {
ena_trc_info(...
goto out;
}
> +
> + memset(, 0x0, sizeof(cmd));
> + admin_queue = _dev->admin_queue;
> +
> + cmd.aq_common_descriptor.opcode = ENA_ADMIN_SET_FEATURE;
> + cmd.aq_common_descriptor.flags = 0;
> + cmd.feat_common.feature_id = ENA_ADMIN_MTU;
> + cmd.u.mtu.mtu = mtu;
> +
> + ret = ena_com_execute_admin_command(admin_queue,
> + (struct ena_admin_aq_entry *),
> + sizeof(cmd),
> + (struct ena_admin_acq_entry *),
> + sizeof(resp));
> +
> + if (unlikely(ret)) {
> + ena_trc_err("Failed to set mtu %d. error: %d\n", mtu, ret);
> + return -EINVAL;

ena_com_execute_admin_command should return a proper status code.

[...]
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h 
b/drivers/net/ethernet/amazon/ena/ena_com.h
new file mode 100644
index 000..e49ba43
--- /dev/null
[...]
+static inline void ena_com_update_intr_reg(struct ena_eth_io_intr_reg 
*intr_reg,
+  u32 rx_delay_interval,
+  u32 tx_delay_interval,
+  bool unmask)
+{
+   intr_reg->intr_control = 0;
+   intr_reg->intr_control |= rx_delay_interval &
+   ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
+
+   intr_reg->intr_control |=
+   (tx_delay_interval << ENA_ETH_IO_INTR_REG_TX_INTR_DELAY_SHIFT)
+   & ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
  ^^ TX ?

Extra: you should not return NETDEV_TX_BUSY in the xmit handler while
queueing is still enabled. Please drop packet and return NETDEV_TX_OK.

-- 
Ueimor


Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-16 Thread Francois Romieu
Netanel Belgazal  :
[...]

Very limited review below.

> diff --git a/Documentation/networking/ena.txt 
> b/Documentation/networking/ena.txt
> new file mode 100644
> index 000..528544f
> --- /dev/null
> +++ b/Documentation/networking/ena.txt
> @@ -0,0 +1,330 @@
> +Linux kernel driver for Elastic Network Adapter (ENA) family:
> +=
> +
> +Overview:
> +=
> +The ENA driver provides a modern Ethernet device interface optimized
> +for high performance and low CPU overhead.

Marketing boilerplate. How much of it will still be true in 6 months ?
6 years ?

Hint: stay technical. If in doubt, make it boring. :o)

[...]
> +ENA devices allow high speed and low overhead Ethernet traffic
> +processing by providing a dedicated Tx/Rx queue pair per host CPU, a
> +dedicated MSI-X interrupt vector per Tx/Rx queue pair, adaptive
> +interrupt moderation, and CPU cacheline optimized data placement.

How many queues exactly ?

[...]
> +Memory Allocations:
> +===
> +DMA Coherent buffers are allocated for the following DMA rings:
> +- Tx submission ring (For regular mode; for LLQ mode it is allocated
> +  using kzalloc)
> +- Tx completion ring
> +- Rx submission ring
> +- Rx completion ring
> +- Admin submission ring
> +- Admin completion ring
> +- AENQ ring

Useless documentation (implementation detail).

Nobody will maintain it when the driver changes. Please remove.

[...]
> +MTU:
> +
> +The driver supports an arbitrarily large MTU with a maximum that is
> +negotiated with the device. The driver configures MTU using the
> +SetFeature command (ENA_ADMIN_MTU property). The user can change MTU
> +via ifconfig(8) and ip(8).

Please avoid advertising legacy ifconfig.

[...]
> diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h 
> b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> new file mode 100644
> index 000..8516e5e
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
[...]
> +/* admin commands opcodes */
> +enum ena_admin_aq_opcode {
> + /* create submission queue */
> + ENA_ADMIN_CREATE_SQ = 1,
> +
> + /* destroy submission queue */
> + ENA_ADMIN_DESTROY_SQ = 2,
> +
> + /* create completion queue */
> + ENA_ADMIN_CREATE_CQ = 3,
> +
> + /* destroy completion queue */
> + ENA_ADMIN_DESTROY_CQ = 4,
> +
> + /* get capabilities of particular feature */
> + ENA_ADMIN_GET_FEATURE = 8,
> +
> + /* get capabilities of particular feature */
> + ENA_ADMIN_SET_FEATURE = 9,
> +
> + /* get statistics */
> + ENA_ADMIN_GET_STATS = 11,
> +};

The documentation above simply duplicates the member names -> useless
visual payload.

Please replace space with tab before "=" to line things up.

[...]
> +/* Basic Statistics Command. */
> +struct ena_admin_basic_stats {
> + /* word 0 :  */
> + u32 tx_bytes_low;
> +
> + /* word 1 :  */
> + u32 tx_bytes_high;
> +
> + /* word 2 :  */
> + u32 tx_pkts_low;
> +
> + /* word 3 :  */
> + u32 tx_pkts_high;
> +
> + /* word 4 :  */
> + u32 rx_bytes_low;
> +
> + /* word 5 :  */
> + u32 rx_bytes_high;
> +
> + /* word 6 :  */
> + u32 rx_pkts_low;
> +
> + /* word 7 :  */
> + u32 rx_pkts_high;
> +
> + /* word 8 :  */
> + u32 rx_drops_low;
> +
> + /* word 9 :  */
> + u32 rx_drops_high;
> +};

struct zorg {
u32 ...;
u32 ...;
u32 ...;
u32 ...;

u32 ...;
u32 ...;
u32 ...;
u32 ...;

u32 ...;
u32 ...;
u32 ...;
};

-> No comment needed to figure the offset.
   Comment removed => no need to check if offset starts at word 0 or word 1.

[...]
> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> new file mode 100644
> index 000..357e10f
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
[...]
> +static inline int ena_com_mem_addr_set(struct ena_com_dev *ena_dev,
> +struct ena_common_mem_addr *ena_addr,
> +dma_addr_t addr)
> +{
> + if ((addr & GENMASK_ULL(ena_dev->dma_addr_bits - 1, 0)) != addr) {
> + ena_trc_err("dma address has more bits that the device 
> supports\n");
> + return -EINVAL;
> + }
> +
> + ena_addr->mem_addr_low = (u32)addr;
> + ena_addr->mem_addr_high =
> + ((addr & GENMASK_ULL(ena_dev->dma_addr_bits - 1, 32)) >> 32);

No need to "... & GENMASK" given the test above.

> +
> + return 0;
> +}
> +
> +static int ena_com_admin_init_sq(struct ena_com_admin_queue *queue)
> +{
> + queue->sq.entries =
> + dma_alloc_coherent(queue->q_dmadev,
> +ADMIN_SQ_SIZE(queue->q_depth),
> +>sq.dma_addr,
> +GFP_KERNEL | __GFP_ZERO);

1. Use 

Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-16 Thread Francois Romieu
Netanel Belgazal  :
[...]

Very limited review below.

> diff --git a/Documentation/networking/ena.txt 
> b/Documentation/networking/ena.txt
> new file mode 100644
> index 000..528544f
> --- /dev/null
> +++ b/Documentation/networking/ena.txt
> @@ -0,0 +1,330 @@
> +Linux kernel driver for Elastic Network Adapter (ENA) family:
> +=
> +
> +Overview:
> +=
> +The ENA driver provides a modern Ethernet device interface optimized
> +for high performance and low CPU overhead.

Marketing boilerplate. How much of it will still be true in 6 months ?
6 years ?

Hint: stay technical. If in doubt, make it boring. :o)

[...]
> +ENA devices allow high speed and low overhead Ethernet traffic
> +processing by providing a dedicated Tx/Rx queue pair per host CPU, a
> +dedicated MSI-X interrupt vector per Tx/Rx queue pair, adaptive
> +interrupt moderation, and CPU cacheline optimized data placement.

How many queues exactly ?

[...]
> +Memory Allocations:
> +===
> +DMA Coherent buffers are allocated for the following DMA rings:
> +- Tx submission ring (For regular mode; for LLQ mode it is allocated
> +  using kzalloc)
> +- Tx completion ring
> +- Rx submission ring
> +- Rx completion ring
> +- Admin submission ring
> +- Admin completion ring
> +- AENQ ring

Useless documentation (implementation detail).

Nobody will maintain it when the driver changes. Please remove.

[...]
> +MTU:
> +
> +The driver supports an arbitrarily large MTU with a maximum that is
> +negotiated with the device. The driver configures MTU using the
> +SetFeature command (ENA_ADMIN_MTU property). The user can change MTU
> +via ifconfig(8) and ip(8).

Please avoid advertising legacy ifconfig.

[...]
> diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h 
> b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> new file mode 100644
> index 000..8516e5e
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
[...]
> +/* admin commands opcodes */
> +enum ena_admin_aq_opcode {
> + /* create submission queue */
> + ENA_ADMIN_CREATE_SQ = 1,
> +
> + /* destroy submission queue */
> + ENA_ADMIN_DESTROY_SQ = 2,
> +
> + /* create completion queue */
> + ENA_ADMIN_CREATE_CQ = 3,
> +
> + /* destroy completion queue */
> + ENA_ADMIN_DESTROY_CQ = 4,
> +
> + /* get capabilities of particular feature */
> + ENA_ADMIN_GET_FEATURE = 8,
> +
> + /* get capabilities of particular feature */
> + ENA_ADMIN_SET_FEATURE = 9,
> +
> + /* get statistics */
> + ENA_ADMIN_GET_STATS = 11,
> +};

The documentation above simply duplicates the member names -> useless
visual payload.

Please replace space with tab before "=" to line things up.

[...]
> +/* Basic Statistics Command. */
> +struct ena_admin_basic_stats {
> + /* word 0 :  */
> + u32 tx_bytes_low;
> +
> + /* word 1 :  */
> + u32 tx_bytes_high;
> +
> + /* word 2 :  */
> + u32 tx_pkts_low;
> +
> + /* word 3 :  */
> + u32 tx_pkts_high;
> +
> + /* word 4 :  */
> + u32 rx_bytes_low;
> +
> + /* word 5 :  */
> + u32 rx_bytes_high;
> +
> + /* word 6 :  */
> + u32 rx_pkts_low;
> +
> + /* word 7 :  */
> + u32 rx_pkts_high;
> +
> + /* word 8 :  */
> + u32 rx_drops_low;
> +
> + /* word 9 :  */
> + u32 rx_drops_high;
> +};

struct zorg {
u32 ...;
u32 ...;
u32 ...;
u32 ...;

u32 ...;
u32 ...;
u32 ...;
u32 ...;

u32 ...;
u32 ...;
u32 ...;
};

-> No comment needed to figure the offset.
   Comment removed => no need to check if offset starts at word 0 or word 1.

[...]
> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> new file mode 100644
> index 000..357e10f
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
[...]
> +static inline int ena_com_mem_addr_set(struct ena_com_dev *ena_dev,
> +struct ena_common_mem_addr *ena_addr,
> +dma_addr_t addr)
> +{
> + if ((addr & GENMASK_ULL(ena_dev->dma_addr_bits - 1, 0)) != addr) {
> + ena_trc_err("dma address has more bits that the device 
> supports\n");
> + return -EINVAL;
> + }
> +
> + ena_addr->mem_addr_low = (u32)addr;
> + ena_addr->mem_addr_high =
> + ((addr & GENMASK_ULL(ena_dev->dma_addr_bits - 1, 32)) >> 32);

No need to "... & GENMASK" given the test above.

> +
> + return 0;
> +}
> +
> +static int ena_com_admin_init_sq(struct ena_com_admin_queue *queue)
> +{
> + queue->sq.entries =
> + dma_alloc_coherent(queue->q_dmadev,
> +ADMIN_SQ_SIZE(queue->q_depth),
> +>sq.dma_addr,
> +GFP_KERNEL | __GFP_ZERO);

1. Use dma_zalloc_coherent().

2. Style

Re: r8169: Unconditionally disabling ASPM

2016-05-15 Thread Francois Romieu
Paul Menzel  :
[...]
> As over five years have passed now, do you think that is still needed?
> I wonder why no module parameter was added back then, where users could
> enable ASPM if it works on their systems? Because there is no such
> situation and it always fails?

It was enabled again (d64ec841517a25f6d468bde9f67e5b4cffdc67c7) then
disabled (4521e1a94279ce610d3f9b7945c17d581f804242). It's closer
to 3.5 years :o)

Module parameters are frowned upon.

Lin, is there some interest in selectively [*] enabling (or disabling)
ASPM support in the r8169 driver or will it be unreliable ?

[*] Based on DMI information for instance.

-- 
Ueimor


Re: r8169: Unconditionally disabling ASPM

2016-05-15 Thread Francois Romieu
Paul Menzel  :
[...]
> As over five years have passed now, do you think that is still needed?
> I wonder why no module parameter was added back then, where users could
> enable ASPM if it works on their systems? Because there is no such
> situation and it always fails?

It was enabled again (d64ec841517a25f6d468bde9f67e5b4cffdc67c7) then
disabled (4521e1a94279ce610d3f9b7945c17d581f804242). It's closer
to 3.5 years :o)

Module parameters are frowned upon.

Lin, is there some interest in selectively [*] enabling (or disabling)
ASPM support in the r8169 driver or will it be unreliable ?

[*] Based on DMI information for instance.

-- 
Ueimor


Re: [PATCH net-next 2/6] atl1c: remove private tx lock

2016-04-25 Thread Francois Romieu
Florian Westphal <f...@strlen.de> :
> Francois Romieu <rom...@fr.zoreil.com> wrote:
[...]
> > Play it safe and keep the implicit local_irq_{save / restore} call ?
> > 
> > It may not be needed but it will help avoiding any unexpected regression
> > report pointing at the NETDEV_TX_LOCKED removal change.
> 
> I thought about that but it doesn't prevent the irq handler from
> running on another CPU, so leaving it around seemed like cargo culting
> to me...

I don't mind removing it in a different patch at all. I'd rather see
the commit history underline that it's unrelated to whatever
NETDEV_TX_LOCKED / LLTX change.

> I don't have an atl1c, but the atl1e in my laptop seems to work fine
> with the (similar) change.
> 
> If you disagree I can respin with local_irq_save of course, but, if
> 'playing it safe' is main goal then its simpler to convert
> spin_trylock_irqsave to spin_lock_irqsave.

Your call, really.

-- 
Ueimor


Re: [PATCH net-next 2/6] atl1c: remove private tx lock

2016-04-25 Thread Francois Romieu
Florian Westphal  :
> Francois Romieu  wrote:
[...]
> > Play it safe and keep the implicit local_irq_{save / restore} call ?
> > 
> > It may not be needed but it will help avoiding any unexpected regression
> > report pointing at the NETDEV_TX_LOCKED removal change.
> 
> I thought about that but it doesn't prevent the irq handler from
> running on another CPU, so leaving it around seemed like cargo culting
> to me...

I don't mind removing it in a different patch at all. I'd rather see
the commit history underline that it's unrelated to whatever
NETDEV_TX_LOCKED / LLTX change.

> I don't have an atl1c, but the atl1e in my laptop seems to work fine
> with the (similar) change.
> 
> If you disagree I can respin with local_irq_save of course, but, if
> 'playing it safe' is main goal then its simpler to convert
> spin_trylock_irqsave to spin_lock_irqsave.

Your call, really.

-- 
Ueimor


Re: [PATCH net-next 2/6] atl1c: remove private tx lock

2016-04-25 Thread Francois Romieu
Florian Westphal  :
[...]
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c 
> b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index d0084d4..a3200ea 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
[...]
> @@ -2217,16 +2215,10 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff 
> *skb,
>   }
>  
>   tpd_req = atl1c_cal_tpd_req(skb);
> - if (!spin_trylock_irqsave(>tx_lock, flags)) {
> - if (netif_msg_pktdata(adapter))
> - dev_info(>pdev->dev, "tx locked\n");
> - return NETDEV_TX_LOCKED;
> - }
>  
>   if (atl1c_tpd_avail(adapter, type) < tpd_req) {
>   /* no enough descriptor, just stop queue */
>   netif_stop_queue(netdev);
> - spin_unlock_irqrestore(>tx_lock, flags);
>   return NETDEV_TX_BUSY;
>   }
>

Play it safe and keep the implicit local_irq_{save / restore} call ?

It may not be needed but it will help avoiding any unexpected regression
report pointing at the NETDEV_TX_LOCKED removal change.

-- 
Ueimor


Re: [PATCH net-next 2/6] atl1c: remove private tx lock

2016-04-25 Thread Francois Romieu
Florian Westphal  :
[...]
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c 
> b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index d0084d4..a3200ea 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
[...]
> @@ -2217,16 +2215,10 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff 
> *skb,
>   }
>  
>   tpd_req = atl1c_cal_tpd_req(skb);
> - if (!spin_trylock_irqsave(>tx_lock, flags)) {
> - if (netif_msg_pktdata(adapter))
> - dev_info(>pdev->dev, "tx locked\n");
> - return NETDEV_TX_LOCKED;
> - }
>  
>   if (atl1c_tpd_avail(adapter, type) < tpd_req) {
>   /* no enough descriptor, just stop queue */
>   netif_stop_queue(netdev);
> - spin_unlock_irqrestore(>tx_lock, flags);
>   return NETDEV_TX_BUSY;
>   }
>

Play it safe and keep the implicit local_irq_{save / restore} call ?

It may not be needed but it will help avoiding any unexpected regression
report pointing at the NETDEV_TX_LOCKED removal change.

-- 
Ueimor


Re: [PATCH] net: sxgbe: fix error paths in sxgbe_platform_probe()

2016-03-27 Thread Francois Romieu
Rasmus Villemoes <li...@rasmusvillemoes.dk> :
> We need to use post-decrement to ensure that irq_dispose_mapping is
> also called on priv->rxq[0]->irq_no; moreover, if one of the above for
> loops failed already at i==0 (so we reach one of these labels with
> that value of i), we'll enter an essentially infinite loop of
> out-of-bounds accesses.
> 
> Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>

(ok, i is signed)

Reviewed-by: Francois Romieu <rom...@fr.zoreil.com>

Someone messed with my review on 2014/03/25 and got it wrong. :o/

Two years after the initial submission, there is zero change regarding use
of sxgbe_core_ops for extension or manageability. The extra indirection is
ripe for removal during next net-next.

-- 
Ueimor


Re: [PATCH] net: sxgbe: fix error paths in sxgbe_platform_probe()

2016-03-27 Thread Francois Romieu
Rasmus Villemoes  :
> We need to use post-decrement to ensure that irq_dispose_mapping is
> also called on priv->rxq[0]->irq_no; moreover, if one of the above for
> loops failed already at i==0 (so we reach one of these labels with
> that value of i), we'll enter an essentially infinite loop of
> out-of-bounds accesses.
> 
> Signed-off-by: Rasmus Villemoes 

(ok, i is signed)

Reviewed-by: Francois Romieu 

Someone messed with my review on 2014/03/25 and got it wrong. :o/

Two years after the initial submission, there is zero change regarding use
of sxgbe_core_ops for extension or manageability. The extra indirection is
ripe for removal during next net-next.

-- 
Ueimor


Re: SYN flooding on port 80 + DMAR:[DMA Write] faults

2016-03-12 Thread Francois Romieu
Toralf Förster  :
> Today my server (64 bit hardened Gentoo kernel) was faced a SYN-flood attack.
> I do wonder if the DMAR events points to an issue in the kernel ?

Please send a compressed log including all 'fault addr' lines as well
as the (module probe time) XID line from the r8169 driver.

-- 
Ueimor


Re: SYN flooding on port 80 + DMAR:[DMA Write] faults

2016-03-12 Thread Francois Romieu
Toralf Förster  :
> Today my server (64 bit hardened Gentoo kernel) was faced a SYN-flood attack.
> I do wonder if the DMAR events points to an issue in the kernel ?

Please send a compressed log including all 'fault addr' lines as well
as the (module probe time) XID line from the r8169 driver.

-- 
Ueimor


Re: [PATCH net] r8169:Remove unnecessary phy reset for pcie nic when setting link spped.

2016-03-09 Thread Francois Romieu
Hau  :
[...] 
> Unless pcie nic has bug, pcie nic does not need to reset phy to let phy link 
> on.
> 
> There is a counter for phy speed down. If phy is in link down state, this
> counter will start to count down. When it count to 0, phy will speed down.
> Reset phy will reset this counter and prevent phy from speed down.

Thanks for the clarification.

-- 
Ueimor


Re: [PATCH net] r8169:Remove unnecessary phy reset for pcie nic when setting link spped.

2016-03-09 Thread Francois Romieu
Hau  :
[...] 
> Unless pcie nic has bug, pcie nic does not need to reset phy to let phy link 
> on.
> 
> There is a counter for phy speed down. If phy is in link down state, this
> counter will start to count down. When it count to 0, phy will speed down.
> Reset phy will reset this counter and prevent phy from speed down.

Thanks for the clarification.

-- 
Ueimor


Re: [PATCH net] r8169:Remove unnecessary phy reset for pcie nic when setting link spped.

2016-03-08 Thread Francois Romieu
Chunhao Lin  :
> For pcie nic, after setting link speed and thers is no link  driver does not 
> need
> to do phy reset untill link up.
> 
> For some pcie nics, to do this will also reset phy speed down counter and 
> prevent
> phy from auto speed down.
> 
> This patch fix the issue reported in following link.
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1547151
> 
> Signed-off-by: Chunhao Lin 
> ---
>  drivers/net/ethernet/realtek/r8169.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index dd2cf37..94f08f1 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -1999,7 +1999,8 @@ static int rtl8169_set_speed(struct net_device *dev,
>   goto out;
>  
>   if (netif_running(dev) && (autoneg == AUTONEG_ENABLE) &&
> - (advertising & ADVERTISED_1000baseT_Full)) {
> + (advertising & ADVERTISED_1000baseT_Full) &&
> + !pci_is_pcie(tp->pci_dev)) {
>   mod_timer(>timer, jiffies + RTL8169_PHY_TIMEOUT);
>   }
>  out:

Can you clarify:
- actually this patch does not care about the link at all. So when there's
  link no phy reset is needed either, right ?
- does "this" in "to do this" means that
  1. phy reset prevents phy from auto speed down
  2. avoiding phy reset prevents phy from auto speed down
  I would expect 1. from the rtl_wol_pll_power_down + rtl_speed_down +
  rtl8169_set_speed combo (i.e. we want the driver to allow auto speed down)
  but it's a bit ambiguous.

-- 
Ueimor


Re: [PATCH net] r8169:Remove unnecessary phy reset for pcie nic when setting link spped.

2016-03-08 Thread Francois Romieu
Chunhao Lin  :
> For pcie nic, after setting link speed and thers is no link  driver does not 
> need
> to do phy reset untill link up.
> 
> For some pcie nics, to do this will also reset phy speed down counter and 
> prevent
> phy from auto speed down.
> 
> This patch fix the issue reported in following link.
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1547151
> 
> Signed-off-by: Chunhao Lin 
> ---
>  drivers/net/ethernet/realtek/r8169.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index dd2cf37..94f08f1 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -1999,7 +1999,8 @@ static int rtl8169_set_speed(struct net_device *dev,
>   goto out;
>  
>   if (netif_running(dev) && (autoneg == AUTONEG_ENABLE) &&
> - (advertising & ADVERTISED_1000baseT_Full)) {
> + (advertising & ADVERTISED_1000baseT_Full) &&
> + !pci_is_pcie(tp->pci_dev)) {
>   mod_timer(>timer, jiffies + RTL8169_PHY_TIMEOUT);
>   }
>  out:

Can you clarify:
- actually this patch does not care about the link at all. So when there's
  link no phy reset is needed either, right ?
- does "this" in "to do this" means that
  1. phy reset prevents phy from auto speed down
  2. avoiding phy reset prevents phy from auto speed down
  I would expect 1. from the rtl_wol_pll_power_down + rtl_speed_down +
  rtl8169_set_speed combo (i.e. we want the driver to allow auto speed down)
  but it's a bit ambiguous.

-- 
Ueimor


Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-28 Thread Francois Romieu
Mike Galbraith  :
[...]
> Hrm, relatively new + tasklet woes rings a bell.  Ah, that..
> 
> 
> What's worse is that at the point where this code was written it was
> already well known that tasklets are a steaming pile of crap and
> should die.
> 
> 
> Source thereof https://lwn.net/Articles/588457/

tasklets are ingrained in the dmaengine API (see 
Documentation/dmaengine/client.txt
and drivers/dma/virt-dma.h::vchan_cookie_complete).

Moving everything to irq context or handling his own sub-{jiffy/ms} timer
while losing async dma doesn't exactly smell like roses either. :o(

-- 
Ueimor


Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-28 Thread Francois Romieu
Mike Galbraith  :
[...]
> Hrm, relatively new + tasklet woes rings a bell.  Ah, that..
> 
> 
> What's worse is that at the point where this code was written it was
> already well known that tasklets are a steaming pile of crap and
> should die.
> 
> 
> Source thereof https://lwn.net/Articles/588457/

tasklets are ingrained in the dmaengine API (see 
Documentation/dmaengine/client.txt
and drivers/dma/virt-dma.h::vchan_cookie_complete).

Moving everything to irq context or handling his own sub-{jiffy/ms} timer
while losing async dma doesn't exactly smell like roses either. :o(

-- 
Ueimor


Re: [PATCH net 1/3] r8169:fix nic sometimes doesn't work after changing the mac address.

2016-02-26 Thread Francois Romieu
Chunhao Lin  :
> When there is no AC power, NIC doesn't work after changing mac address.
> Please refer to following link.
> http://www.spinics.net/lists/netdev/msg356572.html
> 
> This issue is caused by runtime power management. When there is no AC power, 
> if we
> put NIC down (ifconfig down), the driver will be put in runtime suspend state 
> and
> device will in D3 state. In this time, driver cannot access hardware 
> regisers. So
> if you set new mac address during this time, it will not work. And then, after
> resume, the NIC will keep using the old mac address and so the network will 
> not
> work normally.
> 
> In this patch I add detecting runtime pm state when setting mac address. If
> driver is in runtime suspend, I will skip setting mac address and  set the new
> mac address during runtime resume.

Instead of taking the device out of suspended mode to perform the required
action, the driver is moving to a model where 1) said action may be scheduled
to a later time - or result from past time work - and 2) rpm handler must
handle a lot of pm unrelated work.

rtl8169_ethtool_ops.{get_wol, get_regs, get_settings} aren't even fixed
yet (what about the .set_xyz handlers ?).

I can't help thinking that the driver should return to a state where it
stupidly does what it is asked to. No software caching, plain device
access, resume when needed, suspend as "suspend" instead of suspend as
"anticipate whatever may happen to avoid waking up".

-- 
Ueimor


Re: [PATCH net 1/3] r8169:fix nic sometimes doesn't work after changing the mac address.

2016-02-26 Thread Francois Romieu
Chunhao Lin  :
> When there is no AC power, NIC doesn't work after changing mac address.
> Please refer to following link.
> http://www.spinics.net/lists/netdev/msg356572.html
> 
> This issue is caused by runtime power management. When there is no AC power, 
> if we
> put NIC down (ifconfig down), the driver will be put in runtime suspend state 
> and
> device will in D3 state. In this time, driver cannot access hardware 
> regisers. So
> if you set new mac address during this time, it will not work. And then, after
> resume, the NIC will keep using the old mac address and so the network will 
> not
> work normally.
> 
> In this patch I add detecting runtime pm state when setting mac address. If
> driver is in runtime suspend, I will skip setting mac address and  set the new
> mac address during runtime resume.

Instead of taking the device out of suspended mode to perform the required
action, the driver is moving to a model where 1) said action may be scheduled
to a later time - or result from past time work - and 2) rpm handler must
handle a lot of pm unrelated work.

rtl8169_ethtool_ops.{get_wol, get_regs, get_settings} aren't even fixed
yet (what about the .set_xyz handlers ?).

I can't help thinking that the driver should return to a state where it
stupidly does what it is asked to. No software caching, plain device
access, resume when needed, suspend as "suspend" instead of suspend as
"anticipate whatever may happen to avoid waking up".

-- 
Ueimor


Re: [PATCH net v4] r8169:fix "rtl_counters_cond == 1 (loop: 1000, delay: 10)" log spam.

2016-02-24 Thread Francois Romieu
Chunhao Lin  :
[...]

Fine with me.

Is there any chance for the set of chipset dependent registers that
are safe to be read when in D3 state to be documented ?

-- 
Ueimor


Re: [PATCH net v4] r8169:fix "rtl_counters_cond == 1 (loop: 1000, delay: 10)" log spam.

2016-02-24 Thread Francois Romieu
Chunhao Lin  :
[...]

Fine with me.

Is there any chance for the set of chipset dependent registers that
are safe to be read when in D3 state to be documented ?

-- 
Ueimor


Re: [PATCH net v3] r8169:fix "rtl_counters_cond == 1 (loop: 1000, delay: 10)" log spam.

2016-02-23 Thread Francois Romieu
Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 537974c..404be51 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
[...]
> @@ -7853,6 +7859,11 @@ static int rtl8169_runtime_suspend(struct device 
> *device)
>  
>   rtl8169_net_suspend(dev);
>  
> + /* Update counters before going runtime suspend */
> + if (netif_running(dev))

This test is useless (always true):

- rtl_open
  [...]
  pm_runtime_get_sync(>dev);
  [...]
  tp->TxDescArray = blah
  [...]

- rtl8169_close
  [...]
  pm_runtime_get_sync(>dev);
  [...]
  tp->TxDescArray = NULL;

- rtl8169_runtime_suspend
  [...]
  if (!tp->TxDescArray)
  return 0;

(the implicit smp barriers are mildly obvious, ok)

-- 
Ueimor


Re: [PATCH net v3] r8169:fix "rtl_counters_cond == 1 (loop: 1000, delay: 10)" log spam.

2016-02-23 Thread Francois Romieu
Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 537974c..404be51 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
[...]
> @@ -7853,6 +7859,11 @@ static int rtl8169_runtime_suspend(struct device 
> *device)
>  
>   rtl8169_net_suspend(dev);
>  
> + /* Update counters before going runtime suspend */
> + if (netif_running(dev))

This test is useless (always true):

- rtl_open
  [...]
  pm_runtime_get_sync(>dev);
  [...]
  tp->TxDescArray = blah
  [...]

- rtl8169_close
  [...]
  pm_runtime_get_sync(>dev);
  [...]
  tp->TxDescArray = NULL;

- rtl8169_runtime_suspend
  [...]
  if (!tp->TxDescArray)
  return 0;

(the implicit smp barriers are mildly obvious, ok)

-- 
Ueimor


Re: [PATCH net v2] r8169:fix "rtl_counters_cond == 1 (loop: 1000, delay: 10)" log spam.

2016-02-22 Thread Francois Romieu
Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 537974c..a645f8d 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -7730,10 +7730,13 @@ rtl8169_get_stats64(struct net_device *dev, struct 
> rtnl_link_stats64 *stats)
>  {
>   struct rtl8169_private *tp = netdev_priv(dev);
>   void __iomem *ioaddr = tp->mmio_addr;
> + struct pci_dev *pdev = tp->pci_dev;

+   struct device *d = >pci_dev->dev;

(the patch does not use pdev alone)

[...]
> @@ -7761,7 +7764,8 @@ rtl8169_get_stats64(struct net_device *dev, struct 
> rtnl_link_stats64 *stats)
>* Fetch additonal counter values missing in stats collected by driver
>* from tally counters.
>*/
> - rtl8169_update_counters(dev);
> + if (pm_runtime_active(>dev))
> + rtl8169_update_counters(dev);

pm_runtime_active() won't change after pm_runtime_get_noresume(). You may
set a boolean active = pm_runtime_active(d) before testing netif_running().

[...]
> @@ -7842,6 +7848,12 @@ static int rtl8169_runtime_suspend(struct device 
> *device)
>   struct pci_dev *pdev = to_pci_dev(device);
>   struct net_device *dev = pci_get_drvdata(pdev);
>   struct rtl8169_private *tp = netdev_priv(dev);
> + void __iomem *ioaddr = tp->mmio_addr;
> +
> + /* Update counters before going runtime suspend */
> + if (netif_running(dev))
> + rtl8169_rx_missed(dev, ioaddr);
> + rtl8169_update_counters(dev);
>  
>   if (!tp->TxDescArray)
>   return 0;

Nits:

- the tp->TxDescArray test provides the required synchronization: see
  rtl8169_{open/close} and their pm_runtime_{get / put}.

- ioaddr is not really needed : tp->mmio_addr appears only once and it does
  not mess the 72..80 cols limit.

- even if the device can only be automatically runtime suspended some time
  after a link down event, you may address davem's point regarding stats
  reliability and move rtl8169_rx_missed + rtl8169_update_counters after
  rtl8169_net_suspend.

-- 
Ueimor


Re: [PATCH net v2] r8169:fix "rtl_counters_cond == 1 (loop: 1000, delay: 10)" log spam.

2016-02-22 Thread Francois Romieu
Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 537974c..a645f8d 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -7730,10 +7730,13 @@ rtl8169_get_stats64(struct net_device *dev, struct 
> rtnl_link_stats64 *stats)
>  {
>   struct rtl8169_private *tp = netdev_priv(dev);
>   void __iomem *ioaddr = tp->mmio_addr;
> + struct pci_dev *pdev = tp->pci_dev;

+   struct device *d = >pci_dev->dev;

(the patch does not use pdev alone)

[...]
> @@ -7761,7 +7764,8 @@ rtl8169_get_stats64(struct net_device *dev, struct 
> rtnl_link_stats64 *stats)
>* Fetch additonal counter values missing in stats collected by driver
>* from tally counters.
>*/
> - rtl8169_update_counters(dev);
> + if (pm_runtime_active(>dev))
> + rtl8169_update_counters(dev);

pm_runtime_active() won't change after pm_runtime_get_noresume(). You may
set a boolean active = pm_runtime_active(d) before testing netif_running().

[...]
> @@ -7842,6 +7848,12 @@ static int rtl8169_runtime_suspend(struct device 
> *device)
>   struct pci_dev *pdev = to_pci_dev(device);
>   struct net_device *dev = pci_get_drvdata(pdev);
>   struct rtl8169_private *tp = netdev_priv(dev);
> + void __iomem *ioaddr = tp->mmio_addr;
> +
> + /* Update counters before going runtime suspend */
> + if (netif_running(dev))
> + rtl8169_rx_missed(dev, ioaddr);
> + rtl8169_update_counters(dev);
>  
>   if (!tp->TxDescArray)
>   return 0;

Nits:

- the tp->TxDescArray test provides the required synchronization: see
  rtl8169_{open/close} and their pm_runtime_{get / put}.

- ioaddr is not really needed : tp->mmio_addr appears only once and it does
  not mess the 72..80 cols limit.

- even if the device can only be automatically runtime suspended some time
  after a link down event, you may address davem's point regarding stats
  reliability and move rtl8169_rx_missed + rtl8169_update_counters after
  rtl8169_net_suspend.

-- 
Ueimor


  1   2   3   4   5   6   7   8   9   10   >