from:"Francois Romieu"

Re: [PATCH v2 net] r8169: fix NAPI handling under high load

2018-10-19 Thread Francois Romieu

Eric Dumazet  :
> On 10/18/2018 03:59 PM, Francois Romieu wrote:
> > Eric Dumazet  :
> > [...]
> >> One has to wonder why rtl8169_poll(), which might be called in a loop 
> >> under DOS,
> >> has to call rtl_ack_events() ?
> > 
> > So as to cover a wider temporal range before any event can trigger an
> > extra irq. I was more worried about irq cost than about IO cost (and
> > I still am).
> > 
> Normally the IRQ would not be enabled under DOS.

Yes.

My concern was not the DOS situation when NAPI runs at full speed.
As far as I was able to experiment with it, the driver did not seem
too bad here.

The location of the ack targets the interim situation where the IRQ
rate can increase before NAPI kicks in. By increasing the time range
whose events can be acked, the maximum irq rate should be lowered.

-- 
Ueimor

Re: [PATCH v2 net] r8169: fix NAPI handling under high load

2018-10-18 Thread Francois Romieu

Eric Dumazet  :
[...]
> One has to wonder why rtl8169_poll(), which might be called in a loop under 
> DOS,
> has to call rtl_ack_events() ?

So as to cover a wider temporal range before any event can trigger an
extra irq. I was more worried about irq cost than about IO cost (and
I still am).

-- 
Ueimor

Re: [PATCH net] r8169: fix NAPI handling under high load

2018-10-17 Thread Francois Romieu

Holger Hoffstätte  :
[...]
> I continued to use the BQL patch in my private tree after it was reverted
> and also had occasional timeouts, but *only* after I started playing
> with ethtool to change offload settings. Without offloads or the BQL patch
> everything has been rock-solid since then.
> The other weird problem was that timeouts would occur on an otherwise
> *completely idle* system. Since that occasionally borked my NFS server
> over night I ultimately removed BQL as well. Rock-solid since then.

The bug will induce delayed rx processing when a spike of "load" is
followed by an idle period. I do not see how bql would matter per se though.

-- 
Ueimor

Re: [PATCH net] r8169: fix NAPI handling under high load

2018-10-17 Thread Francois Romieu

Heiner Kallweit  :
[...]
> This issue has been there more or less forever (at least it exists in
> 3.16 already), so I can't provide a "Fixes" tag. 

Hardly forever. It fixes da78dbff2e05630921c551dbbc70a4b7981a8fff.

-- 
Ueimor

Re: [PATCH net-next] r8169: perform reset synchronously in __rtl8169_resume

2018-05-22 Thread Francois Romieu

David Miller  :
> From: Heiner Kallweit 
> Date: Mon, 21 May 2018 19:01:19 +0200
> 
> > The driver uses pm_runtime_get_sync() in few places and relies on the
> > device being fully runtime-resumed after this call. So far however
> > the runtime resume callback triggers an asynchronous reset. 
> > Avoid this and perform the reset synchronously.
[...]
> Given what we know about ->ndo_open() and the checks by the callers of
> __rtl8169_resume(), the netif_running() test seems superfluous or
> wrong.
> 
> Either way you need to resolve this somehow.

Actually I do not see why the asynchronous reset would be a problem.

Aforementioned places that use pm_runtime_get_sync() are rtl_{open/close}

1. I understand that rtl_open needs to reach synchronously something like
   a D0 state but it does not need anything else, whence the current no-op
   in the driver runtime suspend/resume handlers (that I should have
   remembered btw).

2. rtl_close() needs the same D0 state to perform the hw counters update
   but it should neither need nor care about rtl_reset_work. rtl_close
   even disables itself the deferred work queue right after the hw counter
   update.

If it's also worth making the code more palatable, more explicit symmetry
between rtl8169_net_suspend and __rtl8169_resume would be welcome.

-- 
Ueimor

Re: [PATCH net-next] r8169: perform reset synchronously in __rtl8169_resume

2018-05-21 Thread Francois Romieu

Heiner Kallweit  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 75dfac024..1eb4f625a 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -7327,9 +7327,9 @@ static void __rtl8169_resume(struct net_device *dev)
>   rtl_lock_work(tp);
>   napi_enable(>napi);
>   set_bit(RTL_FLAG_TASK_ENABLED, tp->wk.flags);
> + if (netif_running(dev))
> + rtl_reset_work(tp);
>   rtl_unlock_work(tp);
> -
> - rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
>  }
>  
>  static int rtl8169_resume(struct device *device)

netif_running() returns true before rtl_open(): issuing rtl_reset_work()
synchronously against uninitialized rings right at the start of rtl_open()
won't work as expected.

pm_runtime_get_sync() could be issued later in rtl_open() (but I would
not mind something different with runtime_{resume/suspend} in open/close).

-- 
Ueimor

Re: [PATCH net] r8169: fix setting driver_data after register_netdev

2018-03-26 Thread Francois Romieu

Andrew Lunn  :
[...]
> How about rtl8169_get_wol() and rtl8169_set_wol(). And
> rtl8169_get_ethtool_stats().

rtl8169_get_wol does not depend on dev->driver_data. Neither does
rtl8169_set_wol() nor rtl8169_get_ethtool_stats().

> Basically anything which makes use of run time power management
> could be invoked as soon as parts of register_netdev() have been
> called.

Ok, it can crash through rtl_open and check_link_status.

If rtl_open can be called that early, rtl_init_one::rtl8168_driver_start()
may also be executed a bit late.

-- 
Ueimor

Re: [PATCH net] r8169: fix setting driver_data after register_netdev

2018-03-25 Thread Francois Romieu

Heiner Kallweit  :
> pci_set_drvdata() is called only after registering the net_device,
> therefore we could run into a NPE if one of the functions using
> driver_data is called before it's set.
> 
> Fix this by calling pci_set_drvdata() before registering the
> net_device.
> 
> This fix is a candidate for stable. As far as I can see the
> bug has been there in kernel version 3.2 already, therefore
> I can't provide a reference which commit is fixed by it.

It does not sound convincing.

Please tell which functions are supposed to crash.

Suspend / resume ones ? Anything else ?

-- 
Ueimor

Re: [PATCH net-next] r8169: improve interrupt handling

2018-02-23 Thread Francois Romieu

Heiner Kallweit  :
[...]
> Last but not least it enables a feature which was (I presume accidently)
> disabled before. There are members of the RTL8169 family supporting MSI
> (e.g. RTL8169SB), however MSI never got enabled because RTL_CFG_0 was
> missing flag RTL_FEATURE_MSI.
> An indicator for "accidently" is that statement "cfg2 |= MSIEnable;"

The reality is more simple: it could had been removed.

> in rtl_try_msi() is dead code. cfg2 is used for chip versions <= 06
> only and for all these chip versions RTL_FEATURE_MSI isn't set.

On purpose:
1. mostly untested
2. MSI without MSI-X does not buy much
3. wrt 2., ok, it kills (yucky) plain old shared PCI irq (remember those ?)
   but I didn't end feeling that good about realtek MSI support on older
   chipsets to enable it any further

[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 96db3283e..4730db990 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
[...]
> -static unsigned rtl_try_msi(struct rtl8169_private *tp,
> - const struct rtl_cfg_info *cfg)
> +static void rtl_alloc_irq(struct rtl8169_private *tp)
>  {
[...]
> + ret = pci_alloc_irq_vectors(tp->pci_dev, 1, 1, PCI_IRQ_ALL_TYPES);
> + if (ret < 0) {
> + netif_err(tp, drv, tp->dev, "failed to allocate irq!\n");
> + return;
[...]
> @@ -8497,9 +8495,7 @@ static int rtl_init_one(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>   chipset = tp->mac_version;
>   tp->txd_version = rtl_chip_infos[chipset].txd_version;
>  
> - RTL_W8(Cfg9346, Cfg9346_Unlock);
> - tp->features |= rtl_try_msi(tp, cfg);
> - RTL_W8(Cfg9346, Cfg9346_Lock);
> + rtl_alloc_irq(tp);

Happily proceeding after error. :o/

-- 
Ueimor

Re: [PATCH net-next] r8169: add module param for control of ASPM disable

2018-02-01 Thread Francois Romieu

net-next is closed anyway.

-- 
Ueimor

Re: [PATCH net-next] r8169: add module param for control of ASPM disable

2018-02-01 Thread Francois Romieu

Chunhao Lin  :
[...]
> @@ -5878,6 +5881,20 @@ static void rtl_pcie_state_l2l3_enable(struct 
> rtl8169_private *tp, bool enable)
>   RTL_W8(Config3, data);
>  }
>  
> +static void rtl_hw_internal_aspm_clkreq_enable(struct rtl8169_private *tp,
> + bool enable)
> +{
> + void __iomem *ioaddr = tp->mmio_addr;
> +
> + if (enable) {
> + RTL_W8(Config2, RTL_R8(Config2) | ClkReqEn);
> + RTL_W8(Config5, RTL_R8(Config5) | ASPM_en);
> + } else {
> + RTL_W8(Config2, RTL_R8(Config2) & ~ClkReqEn);
> + RTL_W8(Config5, RTL_R8(Config5) & ~ASPM_en);
> + }
> +}

s/enable(..., false)/disable()/

static void rtl_hw_internal_aspm_clkreq_enable(truct rtl8169_private *tp)
{
void __iomem *ioaddr = tp->mmio_addr;

RTL_W8(Config2, RTL_R8(Config2) | ClkReqEn);
RTL_W8(Config5, RTL_R8(Config5) | ASPM_en);
}

static void rtl_hw_internal_aspm_clkreq_disable(truct rtl8169_private *tp)
{
void __iomem *ioaddr = tp->mmio_addr;

RTL_W8(Config2, RTL_R8(Config2) & ~ClkReqEn);
RTL_W8(Config5, RTL_R8(Config5) & ~ASPM_en);
}

If you really want to factor something out, you may use helpers that set
or clear bits according to the 3-uple (tp, register, bits) but
foo_enable(..., false) is pointlessly convoluted.

-- 
Ueimor

[PATCH net 1/1] r8169: fix memory corruption on retrieval of hardware statistics.

2018-01-25 Thread Francois Romieu

Hardware statistics retrieval hurts in tight invocation loops.

Avoid extraneous write and enforce strict ordering of writes targeted to
the tally counters dump area address registers.

Signed-off-by: Francois Romieu <rom...@fr.zoreil.com>
Tested-by: Oliver Freyermuth <o.freyerm...@googlemail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 272c5962e4f7..8e91274174f1 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -2235,19 +2235,14 @@ static bool rtl8169_do_counters(struct net_device *dev, 
u32 counter_cmd)
void __iomem *ioaddr = tp->mmio_addr;
dma_addr_t paddr = tp->counters_phys_addr;
u32 cmd;
-   bool ret;
 
RTL_W32(CounterAddrHigh, (u64)paddr >> 32);
+   RTL_R32(CounterAddrHigh);
cmd = (u64)paddr & DMA_BIT_MASK(32);
RTL_W32(CounterAddrLow, cmd);
RTL_W32(CounterAddrLow, cmd | counter_cmd);
 
-   ret = rtl_udelay_loop_wait_low(tp, _counters_cond, 10, 1000);
-
-   RTL_W32(CounterAddrLow, 0);
-   RTL_W32(CounterAddrHigh, 0);
-
-   return ret;
+   return rtl_udelay_loop_wait_low(tp, _counters_cond, 10, 1000);
 }
 
 static bool rtl8169_reset_counters(struct net_device *dev)
-- 
2.14.3

Re: Memory corruption with r8169 across several device revisions and kernels

2018-01-23 Thread Francois Romieu

Oliver Freyermuth  :
[...]
> This looks like it could very well match the structure found in memory,
> so something would be broken related to rtl8169_do_counters, in the DMA
> transfer. 
> 
> Does this help - can I provide more info? I get the feeling this affects
> many tens of thousands of systems and just has been hidden due to network
> stats being read rarely... 

It helps. Can you try the snippet below ?

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 272c5962e4f7..8531b41e3397 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -2238,16 +2238,12 @@ static bool rtl8169_do_counters(struct net_device *dev, 
u32 counter_cmd)
bool ret;
 
RTL_W32(CounterAddrHigh, (u64)paddr >> 32);
+   RTL_R32(CounterAddrHigh);
cmd = (u64)paddr & DMA_BIT_MASK(32);
RTL_W32(CounterAddrLow, cmd);
RTL_W32(CounterAddrLow, cmd | counter_cmd);
 
-   ret = rtl_udelay_loop_wait_low(tp, _counters_cond, 10, 1000);
-
-   RTL_W32(CounterAddrLow, 0);
-   RTL_W32(CounterAddrHigh, 0);
-
-   return ret;
+   return rtl_udelay_loop_wait_low(tp, _counters_cond, 10, 1000);
 }
 
 static bool rtl8169_reset_counters(struct net_device *dev)

Re: Memory corruption with r8169 across several device revisions and kernels

2018-01-21 Thread Francois Romieu

Oliver Freyermuth <o.freyerm...@googlemail.com> :
> Am 21.01.2018 um 21:48 schrieb Francois Romieu:
> > Oliver Freyermuth <o.freyerm...@googlemail.com> :
[...]
> > Is it an AMD based system ?
> > 
> 
> No, all the systems on which I have observed this up to now are Intel-based. 
> Two Haswell and one Sandy Bridge system. 

Ok.

You said:

Oliver Freyermuth <o.freyerm...@googlemail.com> :
[...]
> The values found in overwritten memory match those contained in
> /proc/self/net/dev for the realtek ethernet device.

Are you able to retrieve the layout ? That is, does it appear to match:

- r8169 hardware stats DMA buffer ?
  TxOk, RxOk, TxErr, RxErr, ...

- rtnl_link_stats ?
  rx_packets, tx_packets, rx_bytes, tx_bytes, ...

or something else ?

-- 
Ueimor

Re: Memory corruption with r8169 across several device revisions and kernels

2018-01-21 Thread Francois Romieu

Oliver Freyermuth  :
[...]

Is it an AMD based system ?

-- 
Ueimor

Re: net: r8169: a question of memory barrier in the r8169 driver

2018-01-19 Thread Francois Romieu

Jia-Ju Bai <baijiaju1...@gmail.com> :
> 
> On 2018/1/19 9:11, Francois Romieu wrote:
> > Jia-Ju Bai <baijiaju1...@gmail.com> :
> > [...]
> > > The function rtl8169_start_xmit reads tp->dirty_tx in TX_FRAGS_READY_FOR:
> > >  if (unlikely(!TX_FRAGS_READY_FOR(tp, skb_shinfo(skb)->nr_frags))) {
> > >  netif_err(tp, drv, dev, "BUG! Tx Ring full when queue awake!\n");
> > >  goto err_stop_0;
> > >  }
> > > But there is no memory barrier around this code.
> > > 
> > > Is there a possible data race here?
> > This code would not even be needed if rtl8169_start_xmit was only your
> > usual ndo_start_xmit handler: Realtek {ab / re}used it for GSO handling
> > (see r8169_csum_workaround).
> > 
> > If the test is not a no-op in this GSO context, it's racy.
> > 
> 
> Thanks for reply.
> I didn't clearly understand your meaning...

It's fine.

> I wonder whether there is a possible data race and whether a "smp_mb" is
> needed before this code?
> By the way, do you mean that this code can be removed?

This code may be removed in a driver that properly stops itself its
tx queueing in the ndo_start_xmit handler (I would still keep it as
a bug detection helper but it's just a matter of taste). That's what
the r8169 driver used to aim at.

However, since e974604b453e87f8d864371786375d3d511fdf56, there is a piece
of code where the r8169 driver iteratively uses its own ndo_start_xmit
(without even checking its return value) in r8169_csum_workaround.
It is racy. Now, let's forget races for a few seconds: how is
r8169_csum_workaround supposed to work at all given that it does not care
if (the "unlikely(...)" test in) rtl8169_start_xmit succeeds or not ?

rtl8169_start_xmit can leave the skb as-is or map it to hardware descriptors
(whence late release in rtl_tx). net/core/dev.c::dev_hard_start_xmit cares.
r8169_csum_workaround doesn't.

-- 
Ueimor

Re: net: r8169: a question of memory barrier in the r8169 driver

2018-01-19 Thread Francois Romieu

Peter Zijlstra <pet...@infradead.org> :
> On Fri, Jan 19, 2018 at 02:11:18AM +0100, Francois Romieu wrote:
> > Peter Zijlstra <pet...@infradead.org> :
> > [...]
> > > There is only 1 variable afaict. Memory barriers need at least 2 in
> > > order to be able to do _anything_.
> > 
> > I don't get your point: why don't {cur_tx, dirty_tx} qualify as said
> > two variables ?
> 
> There wasn't any cur_tx in the code you provided.

/* A skbuff with nr_frags needs nr_frags+1 entries in the tx queue */
#define TX_FRAGS_READY_FOR(tp,nr_frags) \
(TX_SLOTS_AVAIL(tp) >= (nr_frags + 1))

#define TX_SLOTS_AVAIL(tp) \
(tp->dirty_tx + NUM_TX_DESC - tp->cur_tx)

Both are also used in rtl_tx.

I don't get your point. Even a single variable is scattered through
the system.

-- 
Ueimor

Re: net: r8169: a question of memory barrier in the r8169 driver

2018-01-18 Thread Francois Romieu

Jia-Ju Bai  :
[...]
> The function rtl8169_start_xmit reads tp->dirty_tx in TX_FRAGS_READY_FOR:
> if (unlikely(!TX_FRAGS_READY_FOR(tp, skb_shinfo(skb)->nr_frags))) {
> netif_err(tp, drv, dev, "BUG! Tx Ring full when queue awake!\n");
> goto err_stop_0;
> }
> But there is no memory barrier around this code.
> 
> Is there a possible data race here?

This code would not even be needed if rtl8169_start_xmit was only your
usual ndo_start_xmit handler: Realtek {ab / re}used it for GSO handling
(see r8169_csum_workaround).

If the test is not a no-op in this GSO context, it's racy.

-- 
Ueimor

Re: net: r8169: a question of memory barrier in the r8169 driver

2018-01-18 Thread Francois Romieu

Peter Zijlstra  :
[...]
> There is only 1 variable afaict. Memory barriers need at least 2 in
> order to be able to do _anything_.

I don't get your point: why don't {cur_tx, dirty_tx} qualify as said
two variables ?

-- 
Ueimor

Re: [PATCH 1/2] r8169: reinstate ALDPS for power saving

2017-11-15 Thread Francois Romieu

David Miller  :
[...]
> The amount of coverage this change is going to get is very small as
> well, meaning an even greater chance of regressions.

Yes.

> Therefore the only acceptable way to handle this is to have
> a white-list, specific chips that have been explicitly tested
> and are known to work with this feature, rather than the other
> way around.
> 
> Furthermore, you're not even checking the chip version, you're
> checking instead whether the firmware is loaded or not.  That
> doesn't seem like a safe way to guard this at all.

Actually the chip specific xyz_hw_phy_config methods call the relevant
aldps enabling helper _but_ the 8168evl dedicated xyz_hw_phy_config
doesn't. The firmware loaded check is just a distraction for the
busy reviewer.

-- 
Ueimor

Re: linux-next: Signed-off-by missing for commit in the net-next tree

2017-11-01 Thread Francois Romieu

Kirill Smelkov  :
[...]
> I was keeping you in To and Cc all the time but got no reply at all since my
> first posting from ~ 1 month ago.

I thought it was longer than that. Sorry for the frustrating excess delay.

As Eric already said there is no problem and I am perfectly fine with
the current attribution of this code.

Use of errno.h::ELNRNG is really unusual but it's a different topic.

-- 
Ueimor

Re: r8169 Wake-on-LAN causes immediate ACPI GPE wakeup

2017-10-05 Thread Francois Romieu

Daniel Drake  :
[...]
> Also, is there a standard behaviour defined for ethernet drivers
> regarding wake-on-LAN? r8169 appears to enable wake-on-LAN by default
> if it believes the hardware is capable of it,

If so it isn't its designed behavior.

The r8169 driver does not enable specific WoL event source (unicast packet,
link, etc.). It should keep the current settings unless one of those holds:
- explicit wol config from userspace (obviously :o) )
- runtime pm requires different settings to resume. The change should
  be temporary (save before suspend, restore after resume).

The device is supposed to require both an event source + Config1.PMEnable.

A problem may happen if some event source bit is already set while
Config1.PMEnable is not. The driver has been forcing Config1.PMEnable
since 5d06a99f543e734ceb53bbc9e550537be97f0c49. One may thus experience
transition from inconsistent wol settings to enabled ones (if you want
to dig it, check beforehand if Config1.PMEnable is really read-write or
hardwired to 1).

Whatever it seems rather orthogonal with your wake over wifi style instant
resume symptom.

-- 
Ueimor

Re: [PATCH 2/2] ip_tunnel: add mpls over gre encapsulation

2017-09-21 Thread Francois Romieu

Amine Kherbouche  :
[...]
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 36ea2ad..060ed07 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
[...]
> @@ -39,6 +40,40 @@ static int one = 1;
>  static int label_limit = (1 << 20) - 1;
>  static int ttl_max = 255;
>  
> +size_t ipgre_mpls_encap_hlen(struct ip_tunnel_encap *e)
> +{
> + return sizeof(struct mpls_shim_hdr);
> +}
> +
> +int ipgre_mpls_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
> + u8 *protocol, struct flowi4 *fl4)
> +{
> + return 0;
> +}
> +
> +static const struct ip_tunnel_encap_ops mpls_iptun_ops = {
> + .encap_hlen = ipgre_mpls_encap_hlen,
> + .build_header = ipgre_mpls_build_header,
> +};

Nit: af_mpls.c uses tab before '=' in such places.

> +
> +int ipgre_tunnel_encap_add_mpls_ops(void)
> +{
> + int ret;
> +
> + ret = ip_tunnel_encap_add_ops(_iptun_ops, TUNNEL_ENCAP_MPLS);

ip_tunnel_encap_add_ops is CONFIG_NET_IP_TUNNEL dependant.

Afaics CONFIG_MPLS does not enforce it.

[...]
> @@ -2486,6 +2521,7 @@ static int __init mpls_init(void)
> 0);
>   rtnl_register(PF_MPLS, RTM_GETNETCONF, mpls_netconf_get_devconf,
> mpls_netconf_dump_devconf, 0);
> + ipgre_tunnel_encap_add_mpls_ops();
>   err = 0;
>  out:
>   return err;

ipgre_tunnel_encap_add_mpls_ops status return code is not checked.

-- 
Ueimor

Re: broken vlan support on Realtek RTL8111/8168/8411 rev 9

2017-09-11 Thread Francois Romieu

Benoit Panizzon  :
[...]
> It's not marked as fixed but:
> 
> when I try to turn offloading off I get:
> 
> # /sbin/ethtool -K eth0 rx-vlan-offload off
> ethtool: bad command line argument(s)
> For more information run ethtool -h
> 
> Could anyone give me a hint how to further debug the problem or disable
> vlan offloading completely?

ethtool -K eth0 rxvlan off

For my reward, please send a complete dmesg where the messages from
the vanilla r8169 module appear for the rev 09 card (r81..f ?). I won't
dig it right now.

-- 
Ueimor

Re: [PATCH v3 net-next 2/2] wan: dscc4: convert to plain DMA API

2017-08-11 Thread Francois Romieu

David Miller  :
[...]
> Oops, this will need to be sent as a relative fixup as I've applied these
> two patches to net-next, sorry Francois.

No problem. It works perfectly this way.

-- 
Ueimor

Re: [PATCH v3 net-next 2/2] wan: dscc4: convert to plain DMA API

2017-08-11 Thread Francois Romieu

Alexey Khoroshilov  :
[...]
> diff --git a/drivers/net/wan/dscc4.c b/drivers/net/wan/dscc4.c
> index 8480dbf..a043fb1 100644
> --- a/drivers/net/wan/dscc4.c
> +++ b/drivers/net/wan/dscc4.c
[...]
> @@ -506,8 +506,9 @@ static void dscc4_release_ring(struct dscc4_dev_priv 
> *dpriv)
>   skbuff = dpriv->rx_skbuff;
>   for (i = 0; i < RX_RING_SIZE; i++) {
>   if (*skbuff) {
> - pci_unmap_single(pdev, le32_to_cpu(rx_fd->data),
> - RX_MAX(HDLC_MAX_MRU), PCI_DMA_FROMDEVICE);
> + dma_unmap_single(d, le32_to_cpu(rx_fd->data),
> +  RX_MAX(HDLC_MAX_MRU),
> +  DMA_FROM_DEVICE);

 RX_MAX(HDLC_MAX_MRU), DMA_FROM_DEVICE);

[...]
> @@ -664,8 +665,8 @@ static inline void dscc4_rx_skb(struct dscc4_dev_priv 
> *dpriv,
>   goto refill;
>   }
>   pkt_len = TO_SIZE(le32_to_cpu(rx_fd->state2));
> - pci_unmap_single(pdev, le32_to_cpu(rx_fd->data),
> -  RX_MAX(HDLC_MAX_MRU), PCI_DMA_FROMDEVICE);
> + dma_unmap_single(d, le32_to_cpu(rx_fd->data),
> +  RX_MAX(HDLC_MAX_MRU), DMA_FROM_DEVICE);

dma_unmap_single(d, le32_to_cpu(rx_fd->data), RX_MAX(HDLC_MAX_MRU),
 DMA_FROM_DEVICE);

[...]
> @@ -782,8 +783,8 @@ static int dscc4_init_one(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>  
>   rc = -ENOMEM;
>  
> - priv->iqcfg = (__le32 *) pci_alloc_consistent(pdev,
> - IRQ_RING_SIZE*sizeof(__le32), >iqcfg_dma);
> + priv->iqcfg = (__le32 *)dma_alloc_coherent(>dev,
> + IRQ_RING_SIZE*sizeof(__le32), >iqcfg_dma, GFP_KERNEL);

- the cast can go away
- please replace >dev with a local variable

priv->iqcfg = dma_alloc_coherent(d, IRQ_RING_SIZE*sizeof(__le32),
 >iqcfg_dma, GFP_KERNEL);

Same thing for iqtx and iqrx (beware of copy + tx/rx mismatch).

Thanks.

-- 
Ueimor

Re: skb allocation from interrupt handler?

2017-08-09 Thread Francois Romieu

Murali Karicheri  :
[...]
> The internal memory or FIFO can store only up to 3 MTU sized packets. So that 
> has to
> be processed before PRU gets another packets to send to CPU. So per above, 
> it is not ideal to run NAPI for this scenario, right? Also for NetCP we use
> about 128 descriptors with MTU size buffers to handle 1Gbps Ethernet link.
> Based on that roughly we would need at least 10-12 buffers in the FIFO.
> 
> Currently we have a NAPI implementation in use that gives throughput of 
> 95Mbps for
> MTU sized packets, but our UDP iperf tests shows less than 1% packet loss for 
> an
> offered traffic of 95Mbps with MTU sized packets.  This is not good for 
> industrial
> network using HSR/PRP protocol for network redundancy. We need to have zero 
> packet
> loss for MTU sized packets at 95Mbps throughput. That is the problem 
> description.

Imvho you should instrument the kernel to figure where the excess latency that
prevents NAPI processing to take place within 125 us of physical packet 
reception
comes from.

> As an experiment, I have moved the packet processing to irq handler to see if 
> we 
> can take advantage of CPU cycle to processing the packet instead of NAPI
> and to check if the firmware encounters buffer overflow. The result is 
> positive 
> with no buffer overflow seen at the firmware and no packet loss in the iperf 
> test.
> But we want to do more testing as an experiment and ran into a uart console 
> locks
> up after running traffic for about 2 minutes. So I tried enabling the DEBUG 
> HACK 
> options to get some clue on what is happening and ran into the trace I shared 
> earlier. So what function can I use to allocate SKB from interrupt handler ?

Is your design also so tight on memory that you can't even refill your own
software skb pool from some non-irq context then only swap buffers in the
irq handler ?

-- 
Ueimor

Re: [PATCH net-next v2] wan: dscc4: add checks for dma mapping errors

2017-08-08 Thread Francois Romieu

Alexey Khoroshilov  :
[...]
> diff --git a/drivers/net/wan/dscc4.c b/drivers/net/wan/dscc4.c
> index 799830f..6a9ffac 100644
> --- a/drivers/net/wan/dscc4.c
> +++ b/drivers/net/wan/dscc4.c
> @@ -518,23 +518,31 @@ static void dscc4_release_ring(struct dscc4_dev_priv 
> *dpriv)
>  static inline int try_get_rx_skb(struct dscc4_dev_priv *dpriv,
>struct net_device *dev)
>  {
> + struct pci_dev *pdev = dpriv->pci_priv->pdev;
>   unsigned int dirty = dpriv->rx_dirty%RX_RING_SIZE;
>   struct RxFD *rx_fd = dpriv->rx_fd + dirty;
>   const int len = RX_MAX(HDLC_MAX_MRU);

For the edification of the masses, you may follow a strict inverted
xmas tree style (aka longer lines first as long as it does not bug).

[...]
> @@ -1147,14 +1155,22 @@ static netdev_tx_t dscc4_start_xmit(struct sk_buff 
> *skb,
>   struct dscc4_dev_priv *dpriv = dscc4_priv(dev);
>   struct dscc4_pci_priv *ppriv = dpriv->pci_priv;
>   struct TxFD *tx_fd;
> + dma_addr_t addr;
>   int next;
>  
> + addr = pci_map_single(ppriv->pdev, skb->data, skb->len,
> +   PCI_DMA_TODEVICE);

Use a local struct pci_dev *pdev and it fits on a single line.

At some point it will probably be converted to plain dma api and use a 'd' dev.

[...]
> @@ -1887,16 +1903,22 @@ static struct sk_buff *dscc4_init_dummy_skb(struct 
> dscc4_dev_priv *dpriv)
>  
>   skb = dev_alloc_skb(DUMMY_SKB_SIZE);
>   if (skb) {
> + struct pci_dev *pdev = dpriv->pci_priv->pdev;
>   int last = dpriv->tx_dirty%TX_RING_SIZE;
>   struct TxFD *tx_fd = dpriv->tx_fd + last;
> + dma_addr_t addr;
>  
>   skb->len = DUMMY_SKB_SIZE;
>   skb_copy_to_linear_data(skb, version,
>   strlen(version) % DUMMY_SKB_SIZE);
>   tx_fd->state = FrameEnd | TO_STATE_TX(DUMMY_SKB_SIZE);
> - tx_fd->data = cpu_to_le32(pci_map_single(dpriv->pci_priv->pdev,
> -  skb->data, DUMMY_SKB_SIZE,
> -  PCI_DMA_TODEVICE));
> + addr = pci_map_single(pdev, skb->data, DUMMY_SKB_SIZE,
> +   PCI_DMA_TODEVICE);
> + if (pci_dma_mapping_error(pdev, addr)) {
> + dev_kfree_skb_any(skb);
> + return NULL;
> + }
> + tx_fd->data = cpu_to_le32(addr);
>   dpriv->tx_skbuff[last] = skb;
>   }
>   return skb;

It isn't technically wrong but please don't update tx_fd before the mapping
succeeds. It will look marginally better.

Thanks.

-- 
Ueimor

Re: [PATCH] wan: dscc4: add checks for dma mapping errors

2017-08-07 Thread Francois Romieu

Alexey Khoroshilov  :
> The driver does not check if mapping dma memory succeed.
> The patch adds the checks and failure handling.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Alexey Khoroshilov 

Please amend your subject line as:

Subject: [PATCH net-next v2 1/1] dscc4: add checks for dma mapping errors.

Rationale: davem is not supposed to guess the branch the patch should be
applied to.

[...]
> diff --git a/drivers/net/wan/dscc4.c b/drivers/net/wan/dscc4.c
> index 799830ffcae2..1a94f0a95b2c 100644
> --- a/drivers/net/wan/dscc4.c
> +++ b/drivers/net/wan/dscc4.c
> @@ -522,19 +522,27 @@ static inline int try_get_rx_skb(struct dscc4_dev_priv 
> *dpriv,
>   struct RxFD *rx_fd = dpriv->rx_fd + dirty;
>   const int len = RX_MAX(HDLC_MAX_MRU);
>   struct sk_buff *skb;
> - int ret = 0;
> + dma_addr_t addr;
>  
>   skb = dev_alloc_skb(len);
>   dpriv->rx_skbuff[dirty] = skb;
> - if (skb) {
> - skb->protocol = hdlc_type_trans(skb, dev);
> - rx_fd->data = cpu_to_le32(pci_map_single(dpriv->pci_priv->pdev,
> -   skb->data, len, PCI_DMA_FROMDEVICE));
> - } else {
> - rx_fd->data = 0;
> - ret = -1;
> - }
> - return ret;
> + if (!skb)
> + goto err_out;
> +
> + skb->protocol = hdlc_type_trans(skb, dev);
> + addr = pci_map_single(dpriv->pci_priv->pdev,
> +   skb->data, len, PCI_DMA_FROMDEVICE);
> + if (pci_dma_mapping_error(dpriv->pci_priv->pdev, addr))
> + goto err_free_skb;

Nit: please use a local 'struct pci_dev *pdev = dpriv->pci_priv->pdev;'

[...]
> @@ -1147,14 +1155,22 @@ static netdev_tx_t dscc4_start_xmit(struct sk_buff 
> *skb,
>   struct dscc4_dev_priv *dpriv = dscc4_priv(dev);
>   struct dscc4_pci_priv *ppriv = dpriv->pci_priv;
>   struct TxFD *tx_fd;
> + dma_addr_t addr;
>   int next;
>  
> + addr = pci_map_single(ppriv->pdev, skb->data, skb->len,
> +   PCI_DMA_TODEVICE);
> + if (pci_dma_mapping_error(ppriv->pdev, addr)) {
> + dev_kfree_skb_any(skb);
> + dev->stats.tx_errors++;

It should read 'dev->stats.tx_dropped++'.

-- 
Ueimor

Re: [PATCH V2 net-next 01/21] net-next/hinic: Initialize hw interface

2017-07-25 Thread Francois Romieu

Aviad Krawczyk  :
[...]
> module_pci_driver - is not used in other drivers in the same segments, it
> is necessary ?

/me checks... Ok, there seems to be some overenthusiastic copy'paste.

See drivers/net/ethernet/intel/ixgb/ixgb_main.c:
[...]
/**
 * ixgb_init_module - Driver Registration Routine
 *
 * ixgb_init_module is the first routine called when the driver is
 * loaded. All it does is register with the PCI subsystem.
 **/

static int __init
ixgb_init_module(void)
{
pr_info("%s - version %s\n", ixgb_driver_string, ixgb_driver_version);
pr_info("%s\n", ixgb_copyright);

return pci_register_driver(_driver);
}

module_init(ixgb_init_module);

/**
 * ixgb_exit_module - Driver Exit Cleanup Routine
 *
 * ixgb_exit_module is called just before the driver is removed
 * from memory.
 **/

static void __exit
ixgb_exit_module(void)
{
pci_unregister_driver(_driver);
}

module_exit(ixgb_exit_module);

Driver version ought to be fed through ethtool, if ever. Copyright message
mildly contributes to a better world. So the whole stuff above could be:

module_pci_driver(ixgb_driver);

-- 
Ueimor

Re: [PATCH V2 net-next 01/21] net-next/hinic: Initialize hw interface

2017-07-24 Thread Francois Romieu

Aviad Krawczyk  :
[...]
> hinic_remove - If insmod failed and someone calls rmmod, we will get a
> crash because the resource are already free. Therefore we test if the
> device exists, please tell me if you meant to something different

The module won't even proceed through the pci_driver remove method if
the probe method failed. See drivers/pci/bus.c::pci_bus_add_device and
track 'dev->is_added'. You don't need to believe me: try it.

I have no idea where your crash comes from but something does not
look quite right.

(use module_pci_driver() to save some boilerplate code btw)

[...]
> The priv data is in type void * because the
> caller can use any struct that it wants, like the priv data in Linux
> (netdev, irq, tasklet, work..) -

I disagree. A driver is a piece of glue between hardware and software.
It fills a kernel's contract. It is not supposed to introduce opaque
data (even if it's hard to resist).

> we can change it but if we will pass different struct
> in the future, we will have to change the prototype of the functions.

It's fine. If I do something wrong - and at some point I will - I'd
rather have it detected at compile time. Nobody wants to waste precious
hardware lab testing time because of excess void *.

> According to the other void *:
> The wq struct is used for cmdq, sq and rq. Therefore the wqe is in type
> void *. There are 4 operations get_wqe, write_wqe, read_wqe and put_wqe - 
> there
> is no option that one function will be fed with a wrong pointer because the 
> caller
> should use what it got in get_wqe function.
> 
> When something is used as multiple types, it can be used as void * or union.
> Usually, I would prefer union. But, in this case if we will use union, maybe
> there is a chance of using the wrong wqe type in the wrong work queue type.

union * will at least catch being fed a wrong type. void * won't notice.

Let's take a practical example: hinic_sq_get_sges.

void hinic_sq_get_sges(void *wqe, struct hinic_sge *sges, int nr_sges)
   ^
{
struct hinic_sq_wqe *sq_wqe = (struct hinic_sq_wqe *)wqe;


static void free_all_tx_skbs(struct hinic_txq *txq)
{
struct hinic_dev *nic_dev = netdev_priv(txq->netdev);
struct hinic_sq *sq = txq->sq;
struct hinic_sq_wqe *wqe;
[...]
hinic_sq_get_sges(wqe, txq->free_sges, nr_sges);


static int free_tx_poll(struct napi_struct *napi, int budget)
{
[...]
struct hinic_sq_wqe *wqe;
[...]
hinic_sq_get_sges(wqe, txq->free_sges, nr_sges);


Why is it:

void hinic_sq_get_sges(void *wqe, ...

instead of:

void hinic_sq_get_sges(struct hinic_sq_wqe *wqe, ...

Because of a future change ?

-- 
Ueimor

Re: [PATCH V2 net-next 01/21] net-next/hinic: Initialize hw interface

2017-07-19 Thread Francois Romieu

Aviad Krawczyk  :
[...]
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c 
> b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
> new file mode 100644
> index 000..fbc9de4
> --- /dev/null
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
[...]
> +/**
> + * hinic_init_hwdev - Initialize the NIC HW
> + * @hwdev: the NIC HW device that is returned from the initialization
> + * @pdev: the NIC pci device
> + *
> + * Return 0 - Success, negative - Failure
> + *
> + * Initialize the NIC HW device and return a pointer to it in the first arg
> + **/

Return a pointer and use ERR_PTR / IS_ERR ?

> +int hinic_init_hwdev(struct hinic_hwdev **hwdev, struct pci_dev *pdev)
> +{
> + struct hinic_pfhwdev *pfhwdev;
> + struct hinic_hwif *hwif;
> + int err;
> +
> + hwif = devm_kzalloc(>dev, sizeof(*hwif), GFP_KERNEL);
> + if (!hwif)
> + return -ENOMEM;
> +
> + err = hinic_init_hwif(hwif, pdev);
> + if (err) {
> + dev_err(>dev, "Failed to init HW interface\n");
> + return err;
> + }
> +
> + if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) {
> + dev_err(>dev, "Unsupported PCI Function type\n");
> + err = -EFAULT;
> + goto func_type_err;
> + }
> +
> + pfhwdev = devm_kzalloc(>dev, sizeof(*pfhwdev), GFP_KERNEL);
> + if (!pfhwdev) {
> + err = -ENOMEM;
> + goto pfhwdev_alloc_err;

Intel, Mellanox, Broadcom, Amazon and friends use "err_xyz" labels.

Please consider using the same style.

[...]
> +void hinic_free_hwdev(struct hinic_hwdev *hwdev)
> +{
> + struct hinic_hwif *hwif = hwdev->hwif;
> + struct pci_dev *pdev = hwif->pdev;
> + struct hinic_pfhwdev *pfhwdev;
> +
> + if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) {
> + dev_err(>dev, "unsupported PCI Function type\n");
> + return;
> + }

If it succeeded in hinic_init_hwdev, how could it fail here ?

If it failed in hinic_init_hwdev, hinic_free_hwdev should not even
be called.

-> remove ?

[...]
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c 
> b/drivers/net/ethernet/huawei/hinic/hinic_main.c
> new file mode 100644
> index 000..c61c769
> --- /dev/null
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
[...]
> +static void hinic_remove(struct pci_dev *pdev)
> +{
> + struct net_device *netdev = pci_get_drvdata(pdev);
> + struct hinic_dev *nic_dev;
> +
> + if (!netdev)
> + return;

Your driver is flawed if this test can ever succeed.

[...]
> +static int __init hinic_init(void)
> +{
> + return pci_register_driver(_driver);
> +}
> +
> +static void __exit hinic_exit(void)
> +{
> + pci_unregister_driver(_driver);
> +}

Use module_pci_driver(hinic_driver).

Remove hinic_init() and hinic_exit().

> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_pci_id_tbl.h 
> b/drivers/net/ethernet/huawei/hinic/hinic_pci_id_tbl.h
> new file mode 100644
> index 000..1d92617
> --- /dev/null
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_pci_id_tbl.h
[...]
> +#ifndef HINIC_PCI_ID_TBL_H
> +#define HINIC_PCI_ID_TBL_H
> +
> +#ifndef PCI_VENDOR_ID_HUAWEI
> +#define PCI_VENDOR_ID_HUAWEI0x19e5
> +#endif

Useless: it duplicates include/linux/pci_ids.h

> +
> +#ifndef PCI_DEVICE_ID_HI1822_PF
> +#define PCI_DEVICE_ID_HI1822_PF 0x1822
> +#endif

Please move it to the .c file where it is actually used.


Extra:

grep -E 'void\ \*' drivers/net/ethernet/huawei/hinic/* makes me nervous.

At some point one function will be fed with a wrong pointer and the
compiler won't notice it.

For instance hinic_sq_read_wqe is only called with  There's no
reason to declare it using a 'void **' argument.

-- 
Ueimor

Re: [PATCH net] sky2: Do not deadlock on sky2_hw_down

2017-05-25 Thread Francois Romieu

David Miller  :
> From: Joshua Emele 
> Date: Wed, 24 May 2017 15:43:18 -0700
[...]
> > The sky2_hw_down uses sky2_tx_complete to free pending frames stuck in
> > the HW queue. Because sky2_hw_down can be called from a process context,
> > the call to u64_stats_update_begin can result in deadlock.
> > 
> > Because the statistics do not require update as part of the sky2_hw_down
> > sequence, prevent the update to avoid the deadlock.
> 
> I disagree.  Taking the interface down should cause events in the
> statistics to be lost.
>
> You're going to have to find a way to fix this without eliding
> the stats increments.

NAPI processing is already disabled at this stage in the device close()
path (and sky2_netpoll() uses napi_schedule).

It's possible to add a conditionnal bh or irq disabling instruction to
silent the warning but it should not be needed at all.

-- 
Ueimor

Re: [PATCH v2] net: natsemi: ns83820: add checks for dma mapping error

2017-04-22 Thread Francois Romieu

Alexey Khoroshilov  :
[...]
> diff --git a/drivers/net/ethernet/natsemi/ns83820.c 
> b/drivers/net/ethernet/natsemi/ns83820.c
> index 729095db3e08..dfc64e1e31f9 100644
> --- a/drivers/net/ethernet/natsemi/ns83820.c
> +++ b/drivers/net/ethernet/natsemi/ns83820.c
[...]
> @@ -1183,6 +1193,32 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
> sk_buff *skb,
>   netif_start_queue(ndev);
>  
>   return NETDEV_TX_OK;
> +
> +dma_error:
> + do {
> + free_idx = (free_idx + NR_TX_DESC - 1) % NR_TX_DESC;
> + desc = dev->tx_descs + (free_idx * DESC_SIZE);
> + cmdsts = le32_to_cpu(desc[DESC_CMDSTS]);
> + len = cmdsts & CMDSTS_LEN_MASK;
> + buf = desc_addr_get(desc + DESC_BUFPTR);
> + if (desc == first_desc)
> + pci_unmap_single(dev->pci_dev,
> + buf,
> + len,
> + PCI_DMA_TODEVICE);
> + else
> + pci_unmap_page(dev->pci_dev,
> + buf,
> + len,
> + PCI_DMA_TODEVICE);

(use tabs + spaces to indent: code should line up right after the parenthesis)

(premature line breaks imho)

(nevermind, both can be avoided :o) )

> + desc[DESC_CMDSTS] = cpu_to_le32(0);
> + mb();
> + } while (desc != first_desc);
> +
> +dma_error_first:
> + dev_kfree_skb_any(skb);
> + ndev->stats.tx_errors++;
^ -> should be tx_dropped
> + return NETDEV_TX_OK;
>  }

You only need a single test in the error loop if you mimic the map loop.
Something like:

diff --git a/drivers/net/ethernet/natsemi/ns83820.c 
b/drivers/net/ethernet/natsemi/ns83820.c
index 729095d..5e2dbc9 100644
--- a/drivers/net/ethernet/natsemi/ns83820.c
+++ b/drivers/net/ethernet/natsemi/ns83820.c
@@ -1160,9 +1160,11 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
sk_buff *skb,
 
buf = skb_frag_dma_map(>pci_dev->dev, frag, 0,
   skb_frag_size(frag), DMA_TO_DEVICE);
+   if (dma_mapping_error(>pci_dev->dev, buf))
+   goto err_unmap_frags;
dprintk("frag: buf=%08Lx  page=%08lx offset=%08lx\n",
(long long)buf, (long) page_to_pfn(frag->page),
frag->page_offset);
len = skb_frag_size(frag);
frag++;
nr_frags--;
@@ -1181,8 +1184,27 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
sk_buff *skb,
/* Check again: we may have raced with a tx done irq */
if (stopped && (dev->tx_done_idx != tx_done_idx) && start_tx_okay(dev))
netif_start_queue(ndev);
-
+out:
return NETDEV_TX_OK;
+
+err_unmap_frags:
+   while (1) {
+   buf = desc_addr_get(desc + DESC_BUFPTR);
+   if (!--nr_frags)
+   break;
+
+   pci_unmap_page(dev->pci_dev, buf, len, PCI_DMA_TODEVICE);
+
+   free_idx = (free_idx - 1) % NR_TX_DESC;
+   desc = dev->tx_descs + (free_idx * DESC_SIZE);
+   len = le32_to_cpu(desc + DESC_CMDSTS) & CMDSTS_LEN_MASK;
+   }
+   pci_unmap_single(dev->pci_dev, buf, len, PCI_DMA_TODEVICE);
+
+err_free_skb:
+   dev_kfree_skb_any(skb);
+   ndev->stats.tx_dropped++;
+   goto out;
 }
 
 static void ns83820_update_stats(struct ns83820 *dev)


Thinking more about it, the driver seems rather unsafe if a failing
start_xmit closely follows a succeeding one. The driver should imho
map frags first *then* plug the remaining hole in the descriptor ring.
Until it does, the implicit assumption about descriptor ownership that
the error unroll loop relies on may be wrong.

-- 
Ueimor

Re: r8169: Long link becomes ready times

2017-04-22 Thread Francois Romieu

Paul Menzel  :
[...]
> The ASRock E350M1 has a Realtek ethernet controller.
> 
> It takes almost three seconds for the link to become ready. This is
> noticeable after resume from suspend, where the user wants to continue
> working but first has to wait for the network.
> 
> This test is done with Linux 4.10.
[...]
> The test below is done, removing the module, and then inserting it.
> 
> ```
> Apr 22 10:56:11.919311 myasrocke350m1 kernel: r8169 :03:00.0 eth0: 
> RTL8168e/8111e at 0xf82ad000, bc:5f:f4:c8:d3:98, XID 0c20 IRQ 26
> Apr 22 10:56:11.920631 myasrocke350m1 kernel: r8169 :03:00.0 eth0: jumbo 
> features [frames: 9200 bytes, tx checksumming: ko]
> Apr 22 10:56:11.967396 myasrocke350m1 kernel: r8169 :03:00.0 eth6: 
> renamed from eth0
> Apr 22 10:56:12.064323 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_UP): 
> eth6: link is not ready
> Apr 22 10:56:12.179106 myasrocke350m1 kernel: r8169 :03:00.0: firmware: 
> direct-loading firmware rtl_nic/rtl8168e-2.fw
> Apr 22 10:56:12.247858 myasrocke350m1 kernel: r8169 :03:00.0 eth6: link 
> down
> Apr 22 10:56:12.248593 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_UP): 
> eth6: link is not ready
> Apr 22 10:56:14.992108 myasrocke350m1 kernel: r8169 :03:00.0 eth6: link up
> Apr 22 10:56:14.993299 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): 
> eth6: link becomes ready
> ```
> 
> Is it possible to get this well below one second?

Gross as it is, the link detection is already irq driven. Most currently
used delays - see grep -E '(sleep|delay)' drivers/.../r8169.c - are
supposed to be busy waiting loops with a moderate (well...) delay per
loop. The iteration bound is not expected to make a difference.

So, unless there is a big crawling sleep/delay hidden somewhere, this part
of the r8169 driver should not induce huge delays.

Realtek does not communicate hardware documentation, neither programming
specification nor known bugs. Some phy related pieces of material may be
found on their site but you would have to experiment a lot to check if
things can behave differently.

I also experience ~3s link down / link up transition with a 8168c when
connected to a 3Com 4200G. Same 2~3s figures with an intel 82578dc.
It looks similar with a 82574l.

I've never aimed at well below one second (500 ms ?) reliable autoneg.
The phy man may have a different vision.

-- 
Ueimor

Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

2017-04-08 Thread Francois Romieu

David Miller  :
[...]
> One theory is that the interrupt masking isn't working properly
> and interrupts are still arriving and hitting the NAPI state even
> when we are actively polling NAPI.
> 
> And this problem was masked by the locking done here.

Yes.

Ville, can you rule out irq sharing between the 8139 and some other
device ? It's a candidate for unexpected interrupt handler invocation
with older pc, even with properly working hardware.

-- 
Ueimor

Re: [PATCH v2 net] net: solve a NAPI race

2017-02-28 Thread Francois Romieu

David Miller  :
> From: Eric Dumazet 
> Date: Mon, 27 Feb 2017 08:44:14 -0800
> 
> > Any point doing a napi_schedule() not from device hard irq handler
> > is subject to the race for NIC using some kind of edge trigger
> > interrupts.
> > 
> > Since we do not provide a ndo to disable device interrupts, the
> > following can happen.
> 
> Ok, now I understand.
> 
> I think even without considering the race you are trying to solve,
> this situation is really dangerous.
> 
> I am sure that every ->poll() handler out there was written by an
> author who completely assumed that if they are executing then the
> device's interrupts for that NAPI instance are disabled.  And this is
> with very few, if any, exceptions.

Shareable pci irq used to remind author that such an assumption was
not always right. Otoh it was still manageable as long as level only
triggered irq were involved.

-- 
Ueimor

Re: [Patch net] atm: remove an unnecessary loop

2017-01-13 Thread Francois Romieu

Francois Romieu <rom...@fr.zoreil.com> :
[...]

Now with a proper error code. Have a nice night.

diff --git a/net/atm/common.c b/net/atm/common.c
index a3ca922..e20d040 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -62,21 +62,16 @@ static void vcc_remove_socket(struct sock *sk)
write_unlock_irq(_sklist_lock);
 }
 
-static struct sk_buff *alloc_tx(struct atm_vcc *vcc, unsigned int size)
+static bool vcc_tx_ready(struct atm_vcc *vcc, unsigned int size)
 {
-   struct sk_buff *skb;
struct sock *sk = sk_atm(vcc);
 
if (sk_wmem_alloc_get(sk) && !atm_may_send(vcc, size)) {
pr_debug("Sorry: wmem_alloc = %d, size = %d, sndbuf = %d\n",
 sk_wmem_alloc_get(sk), size, sk->sk_sndbuf);
-   return NULL;
+   return false;
}
-   while (!(skb = alloc_skb(size, GFP_KERNEL)))
-   schedule();
-   pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
-   atomic_add(skb->truesize, >sk_wmem_alloc);
-   return skb;
+   return true;
 }
 
 static void vcc_sock_destruct(struct sock *sk)
@@ -606,7 +601,7 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, 
size_t size)
eff = (size+3) & ~3; /* align to word boundary */
prepare_to_wait(sk_sleep(sk), , TASK_INTERRUPTIBLE);
error = 0;
-   while (!(skb = alloc_tx(vcc, eff))) {
+   while (!vcc_tx_ready(vcc, eff)) {
if (m->msg_flags & MSG_DONTWAIT) {
error = -EAGAIN;
break;
@@ -628,6 +623,15 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, 
size_t size)
finish_wait(sk_sleep(sk), );
if (error)
goto out;
+
+   skb = alloc_skb(eff, GFP_KERNEL);
+   if (!skb) {
+   error = -ENOMEM;
+   goto out;
+   }
+   pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
+   atomic_add(skb->truesize, >sk_wmem_alloc);
+
skb->dev = NULL; /* for paths shared with net_device interfaces */
ATM_SKB(skb)->atm_options = vcc->atm_options;
if (!copy_from_iter_full(skb_put(skb, size), size, >msg_iter)) {

Re: [Patch net] atm: remove an unnecessary loop

2017-01-13 Thread Francois Romieu

Cong Wang <xiyou.wangc...@gmail.com> :
> On Fri, Jan 13, 2017 at 5:23 AM, Francois Romieu <rom...@fr.zoreil.com> wrote:
[...]
> > alloc_skb() does not need to be in the "while" loop.
> 
> This is exactly what I describe in my changelog, don't know
> why you want to repeat it...

Because it is still hidden in a while loop. 

You turned the alloc from a two level deep "while" loop to a one level
one. I want it at zero level. alloc_skb(..., GFP_KERNEL) fails ?
So let it be done (see patch in other message).

[...]
> Please don't expect me to fix many things in one patch, let's
> fix each of them separately, agreed?

I am not convinced that several patches are needed to get the whole
picture right.

-- 
Ueimor

Re: [Patch net] atm: remove an unnecessary loop

2017-01-13 Thread Francois Romieu

Cong Wang  :
[...]
> If you can justify API is not broken by doing that, I am more than happy
> to do it, as I already stated in the latter patch:
> 
> "Of course, the logic itself is suspicious, other sendmsg()
> could handle skb allocation failure very well, not sure
> why ATM has to wait for a successful one here. But probably
> it is too late to change since the errno and behavior is
> visible to user-space. So just leave the logic as it is."
> 
> For some reason, no one reads that patch. :-/

Believe it or not but I actually read it.

It changes the logic : the original code would have been unable to
escape the while loop on memory failure. Fine, I don't mind the change.
Actually I believe that these two patches are too shy (and backport
unefficient). Instead of trying to reformulate why, here's what I have
in mind. Uncompiled, caveat emptor, etc.

I'll do a (slow) build and test on saturday's night with a pair of
iphase 5575.

diff --git a/net/atm/common.c b/net/atm/common.c
index a3ca922..67f76f3 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -62,21 +62,16 @@ static void vcc_remove_socket(struct sock *sk)
write_unlock_irq(_sklist_lock);
 }
 
-static struct sk_buff *alloc_tx(struct atm_vcc *vcc, unsigned int size)
+static bool vcc_tx_ready(struct atm_vcc *vcc, unsigned int size)
 {
-   struct sk_buff *skb;
struct sock *sk = sk_atm(vcc);
 
if (sk_wmem_alloc_get(sk) && !atm_may_send(vcc, size)) {
pr_debug("Sorry: wmem_alloc = %d, size = %d, sndbuf = %d\n",
 sk_wmem_alloc_get(sk), size, sk->sk_sndbuf);
-   return NULL;
+   return false;
}
-   while (!(skb = alloc_skb(size, GFP_KERNEL)))
-   schedule();
-   pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
-   atomic_add(skb->truesize, >sk_wmem_alloc);
-   return skb;
+   return true;
 }
 
 static void vcc_sock_destruct(struct sock *sk)
@@ -606,7 +601,7 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, 
size_t size)
eff = (size+3) & ~3; /* align to word boundary */
prepare_to_wait(sk_sleep(sk), , TASK_INTERRUPTIBLE);
error = 0;
-   while (!(skb = alloc_tx(vcc, eff))) {
+   while (!vcc_tx_ready(vcc, eff)) {
if (m->msg_flags & MSG_DONTWAIT) {
error = -EAGAIN;
break;
@@ -628,6 +623,13 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, 
size_t size)
finish_wait(sk_sleep(sk), );
if (error)
goto out;
+
+   skb = alloc_skb(eff, GFP_KERNEL);
+   if (!skb)
+   goto out;
+   pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
+   atomic_add(skb->truesize, >sk_wmem_alloc);
+
skb->dev = NULL; /* for paths shared with net_device interfaces */
ATM_SKB(skb)->atm_options = vcc->atm_options;
if (!copy_from_iter_full(skb_put(skb, size), size, >msg_iter)) {

-- 
Ueimor

Re: [Patch net] atm: remove an unnecessary loop

2017-01-13 Thread Francois Romieu

Cong Wang  :
[...]
> alloc_skb(GFP_KERNEL) itself is sleeping, so the new wait api is still
> needed.

The task state change warning is the symptom.

The deeply nested alloc_skb is the problem.

Diagnosis: nesting is wrong. It makes zero sense. Fix it and the
implicit task state change problem automagically goes away.

alloc_skb() does not need to be in the "while" loop.

alloc_skb() does not need to be in the {prepare_to_wait/add_wait_queue ...
finish_wait/remove_wait_queue} block.

alloc_tx() is not correctly named: given its original content, it deserves
to be called something like:

"wait_for_decent_tx_drain_and_alloc_by_hand_coz_i_dont_trust_the_mm_subsystem_and_i_dont_know_what_i_want"

I claim that:
- alloc_tx() should only perform the "wait_for_decent_tx_drain" part
- alloc_skb() ought to be done directly in vcc_sendmsg
- alloc_skb() failure can be handled gracefully in vcc_sendmsg
- alloc_skb() may use a (m->msg_flags & MSG_DONTWAIT) dependant
  GFP_{KERNEL / ATOMIC} flag
- most of it can be done in a longterm maintenance pain minimizing
  way. Call it a side-effect: I don't claim that it *must* be done
  this way.

-- 
Ueimor

Re: [Patch net] atm: remove an unnecessary loop

2017-01-12 Thread Francois Romieu

Cong Wang  :
[...]
> diff --git a/net/atm/common.c b/net/atm/common.c
> index a3ca922..7ec3bbc 100644
> --- a/net/atm/common.c
> +++ b/net/atm/common.c
> @@ -72,10 +72,11 @@ static struct sk_buff *alloc_tx(struct atm_vcc *vcc, 
> unsigned int size)
>sk_wmem_alloc_get(sk), size, sk->sk_sndbuf);
>   return NULL;
>   }
> - while (!(skb = alloc_skb(size, GFP_KERNEL)))
> - schedule();
> - pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
> - atomic_add(skb->truesize, >sk_wmem_alloc);
> + skb = alloc_skb(size, GFP_KERNEL);
> + if (skb) {
> + pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
> + atomic_add(skb->truesize, >sk_wmem_alloc);
> + }
>   return skb;
>  }

Were alloc_skb moved one level up in the call stack, there would be
no need to use the new wait api in the subsequent page, thus easing
pre 3.19 longterm kernel maintenance (at least those on korg page).

But it tastes a tad bit too masochistic.

-- 
Ueimor

Re: net/atm: warning in alloc_tx/__might_sleep

2017-01-10 Thread Francois Romieu

Eric Dumazet  :
> On Tue, Jan 10, 2017 at 9:35 AM, Cong Wang  wrote:
> > On Mon, Jan 9, 2017 at 9:20 AM, Andrey Konovalov  
> > wrote:
> >
> > The fix should be straight-forward. Mind to try the attached patch?
> 
> 
> You forgot to remove schedule() ?

It may be clearer to split alloc_tx in two parts: only the unsleepable
"if (sk_wmem_alloc_get(sk) && !atm_may_send(vcc, size)) {" part of it
contributes to the inner "while (!(skb = alloc_tx(vcc, eff))) {" block.

See net/atm/common.c
[...]
static struct sk_buff *alloc_tx(struct atm_vcc *vcc, unsigned int size)
{
struct sk_buff *skb;
struct sock *sk = sk_atm(vcc);

if (sk_wmem_alloc_get(sk) && !atm_may_send(vcc, size)) {
pr_debug("Sorry: wmem_alloc = %d, size = %d, sndbuf = %d\n",
 sk_wmem_alloc_get(sk), size, sk->sk_sndbuf);
return NULL;
}
while (!(skb = alloc_skb(size, GFP_KERNEL)))
schedule();
pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
atomic_add(skb->truesize, >sk_wmem_alloc);
return skb;
}

The waiting stuff is related to vcc drain but the code makes it look as
if it were also related to skb alloc (it isn't).

It may be obvious for you but it took me a while to figure what the
code is supposed to achieve.

-- 
Ueimor

Re: driver r8169 suddenly failed

2016-12-27 Thread Francois Romieu

Robert Grasso  :
[...]
> So, what is your opinion :
> - should I broaden my request for help to other teams than yours (kernel
> maintainers) ?

If I had to untangle this mess, I would check that my router is not
configured with an empty dhcp range. Then I would put each and every
interface facing it in promiscuous (tcpdump) capture mode until one
of those is able to negotiate a dhcp lease. I would thereafter replace
it with the r8169 interface and compare the traffic (+ ethtool byte/packet
counters).

-- 
Ueimor

Re: driver r8169 suddenly failed

2016-12-26 Thread Francois Romieu

Robert Grasso  :
[dhcp snafu]
> First of all, can you confirm that I am doing right in posting to you
> (addresses found in README.Debian) ?

It isn't completely wrong.

> If I do, can you help ? I am not very proficient with Ethernet, and I am not
> able to figure out what my provider changed : their hotline is
> underqualified, they are just able to tell that "the signal on the line is
> ok".

You're spoiled. It is more than decent for a company whose core business
used to sell cable TV.

> But if you want me to run various tests, try new versions, I would be
> glad to do so : I would appreciate if I could salvage this Shuttle.

Please try a recent stable vanilla kernel and send a complete dmesg
from boot. I need to identify the specific 816x chipset.

Then record some traffic:

# touch gonzo.pcap && tshark -w gonzo.pcap -i eth1

It should only exhibit small outgoing packets but, well, one never knows.

Check the leds activity to be sure that the network interfaces have not
been renumbered.

Use ethtool to check that tso is disabled. If it isn't, disable it.

-- 
Ueimor

Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-19 Thread Francois Romieu

Pavel Machek  :
[...]
> Considering the memory barriers... is something like this neccessary
> in the via-rhine ?

Yes.

> AFAICT... we need a barrier after making sure that descriptor is no
> longer owned by DMA (to make sure we don't get stale data in rest of
> descriptor)... and we need a barrier before giving the descriptor to
> the dma, to make sure DMA engine sees the complete update?

I would not expect stale data while processing a single transmit
descriptor as the transmit completion does not use the rest of the
descriptor at all in the via-rhine driver. However I agree that transmit
descriptors should be read by the cpu with adequate ordering so the
dma_rmb() should stay.

Same kind of narrative for dma_wmb rhine_rx (s/read/written/ and
s/cpu/device/).

> diff --git a/drivers/net/ethernet/via/via-rhine.c 
> b/drivers/net/ethernet/via/via-rhine.c
> index ba5c542..3806e72 100644
> --- a/drivers/net/ethernet/via/via-rhine.c
> +++ b/drivers/net/ethernet/via/via-rhine.c
[...]
> @@ -2061,6 +2062,7 @@ static int rhine_rx(struct net_device *dev, int limit)
>  
>   if (desc_status & DescOwn)
>   break;
> + dma_rmb();
>  

I agree with your explanation for this one (late vlan processing in a 
different word from the same descriptor).

-- 
Ueimor

Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-17 Thread Francois Romieu

Pavel Machek  :
[...]
> Won't this up/down the interface, in a way userspace can observe?

It won't up/down the interface as it doesn't exactly mimic what the
network code does (there's more than rtnl_lock).

For the same reason it's broken if it races with the transmit path: it
can release driver resources while the transmit path uses these.

Btw the points below may not matter/hurt much for a proof a concept
but they would need to be addressed as well:
1) unchecked (and avoidable) extra error paths due to stmmac_release()
2) racy cancel_work_sync. Low probability as it is, an irq + error could
   take place right after cancel_work_sync

Lino, have you considered via-rhine.c since its "move work from irq to
workqueue context" changes that started in
7ab87ff4c770eed71e3777936299292739fcd0fe [*] ?

It's a shameless plug - originated in r8169.c - but it should be rather
close to what the sxgbe and friends require. Thought / opinion ?

[*] Including fixes/changes in:
- 3a5a883a8a663b930908cae4abe5ec913b9b2fd2
- e1efa87241272104d6a12c8b9fcdc4f62634d447
- 810f19bcb862f8889b27e0c9d9eceac9593925dd
- e45af497950a89459a0c4b13ffd91e1729fffef4
- a926592f5e4e900f3fa903298c4619a131e60963
- 559bcac35facfed49ab4f408e162971612dcfdf3

-- 
Ueimor

Re: [PATCH] net:ethernet:samsung:initialize cur_rx_qnum

2016-12-09 Thread Francois Romieu

Rayagond Kokatanur  :
> This patch initialize the cur_rx_qnum upon occurence of rx interrupt,
> without this initialization driver will not work with multiple rx queues
> configurations.
> 
> NOTE: This patch is not tested on actual hw.

(your patch should include a Signed-off-by)

Imho the driver needs more changes to support multiple rx queues.

- rx interrupt for queue A -> priv->cur_rx_qnum = A
- rx interrupt for queue B -> priv->cur_rx_qnum = B
- rx napi processing   -> Err...

Please start turning priv->cur_rx_qnum into a SXGBE_RX_QUEUES sized bitmap.

-- 
Ueimor

Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-08 Thread Francois Romieu

Lino Sanfilippo  :
[...]
> OTOH Pavel said that he actually could produce a deadlock. Now I wonder if
> this is caused by that locking scheme (in a way I have not figured out yet)
> or if it is a different issue.

stmmac_tx_err races with stmmac_xmit.

-- 
Ueimor

Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-07 Thread Francois Romieu

Lino Sanfilippo  :
> The driver uses a private lock for synchronization between the xmit
> function and the xmit completion handler, but since the NETIF_F_LLTX flag
> is not set, the xmit function is also called with the xmit_lock held.
> 
> On the other hand the xmit completion handler first takes the private lock
> and (in case that the tx queue has been stopped) the xmit_lock, leading
> to a reverse locking order and the potential danger of a deadlock.

netif_tx_stop_queue is used by:
1. xmit function before releasing lock and returning.
2. sxgbe_restart_tx_queue()
   <- sxgbe_tx_interrupt
   <- sxgbe_reset_all_tx_queues()
  <- sxgbe_tx_timeout()

Given xmit won't be called again until tx queue is enabled, it's not clear
how a deadlock could happen due to #1.

Regardless of deadlocks anywhere else, #2 has some serious problem due to
the lack of exclusion between the tx queue restart handler and the xmit
handler.

-- 
Ueimor

Re: [PATCH] net: pch_gbe: Fix TX RX descriptor accesses for big endian systems

2016-12-07 Thread Francois Romieu

Hassan Naveed  :
> Fix pch_gbe driver for ethernet operations for a big endian CPU.
> Values written to and read from transmit and receive descriptors
> in the pch_gbe driver are byte swapped from the perspective of a
> big endian CPU, since the ethernet controller always operates in
> little endian mode. Rectify this by appropriately byte swapping
> these descriptor field values in the driver software.

You should also use __le{16/32} types in struct pch_gbe_{rx/tx}_desc.

-- 
Ueimor

Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-24 Thread Francois Romieu

Mark Lord  :
[...]
> >From tracing through the powerpc arch code, this is the buffer that
> is being directly DMA'd into.  And the USB layer does an invalidate_dcache
> on that entire buffer before initiating the DMA (confirmed via printk).
> 
> The driver itself NEVER writes anything to that buffer,
> and nobody else has a pointer to it other than the USB host controller,
> so there's nothing else that can write to it either.
> 
> According to the driver writer, the chip should only ever write a fresh
> rx_desc struct at the beginning of a buffer, never ASCII data.
> 
> So how does that buffer end up containing ASCII data from the NFS transfers?

Through aliasing the URB was given a page that contains said (previously)
received file. The ethernet chip/usb host does not write anything in it.
There could be a device or a driver problem but it may not be the real
problem.

So far the analysis focused on "how was this corrupted content written into
this receive buffer page ?". If I read David correctly (?) the "nobody
else has a pointer to it other than the USB host controller" point may be
replaced with "the pointer to it aliases some already used page".

-- 
Ueimor

Re: [PATCH net 2/2] r8152: rx descriptor check

2016-11-14 Thread Francois Romieu

Hayes Wang <hayesw...@realtek.com> :
> Francois Romieu [mailto:rom...@fr.zoreil.com]
> > Sent: Friday, November 11, 2016 8:13 PM
> [...]
> > Invalid packet size corrupted receive descriptors in Realtek's device
> > reminds of CVE-2009-4537.
> 
> Do you mean that the driver would get a packet exceed the size
> which is set to RxMaxSize ?

If it was possible to get it wrong once, it should be possible to
get it wrong twice, especially if some part of the hardware design
is recycled. I don't mean anything else.

I won't speculate about some cache consistency issue or some badly
aborted dma transaction to explain the memory corruption.

-- 
Ueimor

Re: [PATCH net 2/2] r8152: rx descriptor check

2016-11-11 Thread Francois Romieu

Hayes Wang  :
> For some platforms, the data in memory is not the same with the one
> from the device. That is, the data of memory is unbelievable. The
> check is used to find out this situation.

Invalid packet size corrupted receive descriptors in Realtek's device
reminds of CVE-2009-4537.

Is the silicium of both devices different enough to prevent the same
exploit to happen ?

-- 
Ueimor

Re: [PATCH v2] r8169: set coherent DMA mask as well as streaming DMA mask

2016-10-14 Thread Francois Romieu

Ard Biesheuvel <ard.biesheu...@linaro.org> :
> PCI devices that are 64-bit DMA capable should set the coherent
> DMA mask as well as the streaming DMA mask. On some architectures,
> these are managed separately, and so the coherent DMA mask will be
> left at its default value of 32 if it is not set explicitly. This
> results in errors such as
> 
>  r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
>  hwdev DMA mask = 0x, dev_addr = 0x0080fbfff000
>  swiotlb: coherent allocation failed for device :02:00.0 size=4096
>  CPU: 0 PID: 1062 Comm: systemd-udevd Not tainted 4.8.0+ #35
>  Hardware name: AMD Seattle/Seattle, BIOS 10:53:24 Oct 13 2016
> 
> on systems without memory that is 32-bit addressable by PCI devices.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

Acked-by: Francois Romieu <rom...@fr.zoreil.com>

Unless someone plans to plug an acenic, a 83820 (pci-e gem board, anyone ?)
on top of a pci <-> pci-e adapter on this kind of motherboard, no other
network driver that uses the pci_... dma api exhibits this mixed 32 / 64 bit
support bug. I haven't checked devices with 32 < mask < 64 nor plain DMA api
converted ones.

-- 
Ueimor

Re: [Intel-wired-lan] [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-13 Thread Francois Romieu

Rustad, Mark D  :
> Alexei Starovoitov  wrote:
[...]
> > the point that it's only used virtualized, since PCI (not PCIE) have
> > been long dead.
> 
> My point is precisely the opposite. It is a real device, it exists in real
> systems and it is used in those systems. I worked on embedded systems that
> ran Linux and used e1000 devices. I am sure they are still out there because
> customers are still paying for support of those systems.

Old PCI is not the bulk of my professional hardware but it's still used here,
both for networking and video. No embedded systems. Mostly dumb file serving
ranging from quad core to i5 and xeon. Recent kernels. 

> The day is coming when all the motherboards with PCI(-X) will be gone, but I
> think it is still at least a few years off.

Add some time for the PCI <-> PCIe adapters to disappear as well. :o)

-- 
Ueimor

Re: [PATCH] via-velocity: remove null pointer check on array tdinfo->skb_dma

2016-09-08 Thread Francois Romieu

Colin King <colin.k...@canonical.com> :
> From: Colin Ian King <colin.k...@canonical.com>
> 
> tdinfo->skb_dma is a 7 element array of dma_addr_t hence cannot be
> null, so the pull pointer check on tdinfo->skb_dma  is redundant.
> Remove it.
> 
> Signed-off-by: Colin Ian King <colin.k...@canonical.com>

Acked-by: Francois Romieu <rom...@fr.zoreil.com>

-- 
Ueimor

Re: [PATCH] Networking: Core: netpoll: Fixed a missing spin_unlock

2016-07-31 Thread Francois Romieu

Salil Kapur  :
> I was looking at v3.12. Can we submit patches for stable versions?

It has already been fixed in 3.12.8 as 56399d8b44beae5b80e1eda0350ab6af72baf4d0
("netpoll: Fix missing TXQ unlock and and OOPS.") by davem.

3.12.8 dates back to 2014/01. The current 3.12.x version is 3.12.62.

-- 
Ueimor

Re: [PATCH] Networking: Core: netpoll: Fixed a missing spin_unlock

2016-07-30 Thread Francois Romieu

Salil Kapur  :
[...]
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index fc75c9e..9124f76 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -386,8 +386,10 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct 
> sk_buff *skb,
>   
> !vlan_hw_offload_capable(netif_skb_features(skb),
>
> skb->vlan_proto)) {
>   skb = __vlan_put_tag(skb, 
> skb->vlan_proto, vlan_tx_tag_get(skb));
> - if (unlikely(!skb))
> + if (unlikely(!skb)) {
> + __netif_tx_unlock(txq);
>   break;
> + }
>   skb->vlan_tci = 0;
>   }
>  

Your kernel is outdated: __vlan_put_tag has disappeared from net/core/netpoll.c
since 62749e2cb3c4a7da3eaa5c01a7e787aebeff8536 ("vlan: rename __vlan_put_tag to
vlan_insert_tag_set_proto") by Jiri Pirko somewhere in 2014.

-- 
Ueimor

Re: [Intel-wired-lan] [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Francois Romieu

Eric Dumazet  :
[...]
> I would prefer having a definitive advice from Thomas Gleixner and/or
> others if disable_irq() is forbidden from IRQ path.
> 
> As I said, about all netpoll() methods in net drivers use disable_irq()
> so a lot of patches would be needed.

s/about all/many/

There has been a WARN_ONCE(!irqs_disabled() in netpoll_send_skb_on_dev for
quite some time now but it's apparently screened by too many tests to be
effective. :o/

-- 
Ueimor

Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-28 Thread Francois Romieu

Hau  :
[...]
> > Either the driver resumes the device so that it can perform requested
> > operation or it signals .set_wol failure when the device is suspended.
> > 
> > If the driver does something else, "spam removal" translates to "silent
> > failure".
> 
> Because "tp->saved_wolopts" will be used to set hardware wol capability in
> rtl8169_runtime_resume().  So I prefer to keep "wol->wolopts" to
> " tp->saved_wolopts " in runtime suspend state and set this to this
> "wol->wolopts" to hardware in in rtl8169_runtime_resume(). 

It would be fine if it could be proven that rtl8169_runtime_resume() will
always be run before software state is lost.

-- 
Ueimor

Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-27 Thread Francois Romieu

Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 0e62d74..f07604f 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -1749,13 +1749,21 @@ static u32 __rtl8169_get_wol(struct rtl8169_private 
> *tp)
>  static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo 
> *wol)
>  {
>   struct rtl8169_private *tp = netdev_priv(dev);
> + struct pci_dev *pdev = tp->pci_dev;

Nit: you may directly use "struct device *d = >pci_dev->dev;"

-- 
Ueimor

Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-27 Thread Francois Romieu

Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 0e62d74..f07604f 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
[...]
> @@ -1852,12 +1863,17 @@ static int rtl8169_set_wol(struct net_device *dev, 
> struct ethtool_wolinfo *wol)
>   tp->features |= RTL_FEATURE_WOL;
>   else
>   tp->features &= ~RTL_FEATURE_WOL;
> - __rtl8169_set_wol(tp, wol->wolopts);
> + if (pm_runtime_active(>dev))
> + __rtl8169_set_wol(tp, wol->wolopts);
> + else
> + tp->saved_wolopts = wol->wolopts;
>  
>   rtl_unlock_work(tp);
>  
>   device_set_wakeup_enable(>pci_dev->dev, wol->wolopts);
>  
> + pm_runtime_put_noidle(>dev);
> +
>   return 0;

Either the driver resumes the device so that it can perform requested
operation or it signals .set_wol failure when the device is suspended.

If the driver does something else, "spam removal" translates to
"silent failure".

-- 
Ueimor

Re: [PATCH net-next 3/9] net: r6040: Utilize skb_put_padto()

2016-07-05 Thread Francois Romieu

Florian Fainelli  :
[...]
> diff --git a/drivers/net/ethernet/rdc/r6040.c 
> b/drivers/net/ethernet/rdc/r6040.c
> index 75776eee36f9..46ed093348da 100644
> --- a/drivers/net/ethernet/rdc/r6040.c
> +++ b/drivers/net/ethernet/rdc/r6040.c
> @@ -815,6 +815,9 @@ static netdev_tx_t r6040_start_xmit(struct sk_buff *skb,
>   void __iomem *ioaddr = lp->base;
>   unsigned long flags;
>  
> + if (skb_put_padto(skb, ETH_ZLEN) < 0)
> + return NETDEV_TX_OK;
> +

Missing dev->stats->tx_dropped++.

-- 
Ueimor

Re: r8169 regression: UDP packets dropped intermittantly

2016-06-22 Thread Francois Romieu

Jonathan Woithe  :
[...]
> to mainline (in which case I'll keep watching out for it)?  Or is the
> out-of-tree workaround mentioned above considered to be the long term
> fix for those who encounter the problem?

It's a workaround. Nothing less, nothing more.

IIRC the ga311 irq signaling was a bit special. I almost surely broke
it at some point.

-- 
Ueimor

Re: r8169 regression: UDP packets dropped intermittantly

2016-06-21 Thread Francois Romieu

Jonathan Woithe  :
[...]
> Is there any chance that this regression can be resolved?  It's been 6
> months since the last contact was received from the list in relation to this
> issue.  If the r8169 driver is to remain broken with respect to UDP traffic
> then we will have no choice but to factor in a change in our standard
> hardware for future systems.  Unfortunately this also means that dozens of
> systems in the field cannot be upgraded to recent kernels since doing so
> will trigger the regression.[1]

If I understood correctly (2015/11/21) you had a working system with a stock
4.2 or 4.3 kernel and the r8169.c from 1e874e041fc7c222cbd85b20c4406070be1f687a
(i.e. da78dbff2e05630921c551dbbc70a4b7981a8fff "r8169: remove work from irq
handler." parent) patched with the snippet below, right ?

If so, while not perfect, it should at least mitigate the "can't upgrade
kernel" part.

--- r8169.c 2015-11-21 23:02:10.435275753 +0100
+++ r8169.c 2015-11-21 23:21:49.429554012 +0100
@@ -29,7 +29,6 @@
 #include 
 #include 
 
-#include 
 #include 
 #include 
 
@@ -1616,7 +1615,7 @@ static int rtl8169_set_features(struct n
else
tp->cp_cmd &= ~RxChkSum;
 
-   if (dev->features & NETIF_F_HW_VLAN_RX)
+   if (dev->features & NETIF_F_HW_VLAN_CTAG_RX)
tp->cp_cmd |= RxVlan;
else
tp->cp_cmd &= ~RxVlan;
@@ -1632,8 +1631,8 @@ static int rtl8169_set_features(struct n
 static inline u32 rtl8169_tx_vlan_tag(struct rtl8169_private *tp,
  struct sk_buff *skb)
 {
-   return (vlan_tx_tag_present(skb)) ?
-   TxVlanTag | swab16(vlan_tx_tag_get(skb)) : 0x00;
+   return (skb_vlan_tag_present(skb)) ?
+   TxVlanTag | swab16(skb_vlan_tag_get(skb)) : 0x00;
 }
 
 static void rtl8169_rx_vlan_tag(struct RxDesc *desc, struct sk_buff *skb)
@@ -1641,7 +1640,7 @@ static void rtl8169_rx_vlan_tag(struct R
u32 opts2 = le32_to_cpu(desc->opts2);
 
if (opts2 & RxVlanTag)
-   __vlan_hwaccel_put_tag(skb, swab16(opts2 & 0x));
+   __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), swab16(opts2 & 
0x));
 
desc->opts2 = 0;
 }
@@ -3508,7 +3507,7 @@ static const struct net_device_ops rtl81
 
 };
 
-static void __devinit rtl_init_mdio_ops(struct rtl8169_private *tp)
+static void rtl_init_mdio_ops(struct rtl8169_private *tp)
 {
struct mdio_ops *ops = >mdio_ops;
 
@@ -3725,7 +3724,7 @@ static void rtl_pll_power_up(struct rtl8
rtl_generic_op(tp, tp->pll_power_ops.up);
 }
 
-static void __devinit rtl_init_pll_power_ops(struct rtl8169_private *tp)
+static void rtl_init_pll_power_ops(struct rtl8169_private *tp)
 {
struct pll_power_ops *ops = >pll_power_ops;
 
@@ -3905,7 +3904,7 @@ static void r8168b_1_hw_jumbo_disable(st
RTL_W8(Config4, RTL_R8(Config4) & ~(1 << 0));
 }
 
-static void __devinit rtl_init_jumbo_ops(struct rtl8169_private *tp)
+static void rtl_init_jumbo_ops(struct rtl8169_private *tp)
 {
struct jumbo_ops *ops = >jumbo_ops;
 
@@ -3971,7 +3970,7 @@ static void rtl_hw_reset(struct rtl8169_
}
 }
 
-static int __devinit
+static int
 rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
const struct rtl_cfg_info *cfg = rtl_cfg_infos + ent->driver_data;
@@ -4137,7 +4136,7 @@ rtl8169_init_one(struct pci_dev *pdev, c
dev->dev_addr[i] = RTL_R8(MAC0 + i);
memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
 
-   SET_ETHTOOL_OPS(dev, _ethtool_ops);
+   dev->ethtool_ops = _ethtool_ops;
dev->watchdog_timeo = RTL8169_TX_TIMEOUT;
dev->irq = pdev->irq;
dev->base_addr = (unsigned long) ioaddr;
@@ -4147,16 +4146,16 @@ rtl8169_init_one(struct pci_dev *pdev, c
/* don't enable SG, IP_CSUM and TSO by default - it might not work
 * properly for all devices */
dev->features |= NETIF_F_RXCSUM |
-   NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
+   NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX;
 
dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
-   NETIF_F_RXCSUM | NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
+   NETIF_F_RXCSUM | NETIF_F_HW_VLAN_CTAG_TX | 
NETIF_F_HW_VLAN_CTAG_RX;
dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
NETIF_F_HIGHDMA;
 
if (tp->mac_version == RTL_GIGA_MAC_VER_05)
/* 8110SCd requires hardware Rx VLAN - disallow toggling */
-   dev->hw_features &= ~NETIF_F_HW_VLAN_RX;
+   dev->hw_features &= ~NETIF_F_HW_VLAN_CTAG_RX;
 
tp->intr_mask = 0x;
tp->hw_start = cfg->hw_start;
@@ -4217,7 +4216,7 @@ err_out_free_dev_1:
goto out;
 }
 
-static void __devexit rtl8169_remove_one(struct pci_dev *pdev)
+static void rtl8169_remove_one(struct pci_dev *pdev)
 {
struct net_device *dev = pci_get_drvdata(pdev);

Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-20 Thread Francois Romieu

Netanel Belgazal  :
[...]
> > diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h 
> > b/drivers/net/ethernet/amazon/ena/ena_com.h
> > new file mode 100644
> > index 000..e49ba43
> > --- /dev/null
> > [...]
> > +static inline void ena_com_update_intr_reg(struct ena_eth_io_intr_reg 
> > *intr_reg,
> > +  u32 rx_delay_interval,
> > +  u32 tx_delay_interval,
> > +  bool unmask)
> > +{
> > +   intr_reg->intr_control = 0;
> > +   intr_reg->intr_control |= rx_delay_interval &
> > +   ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
> > +
> > +   intr_reg->intr_control |=
> > +   (tx_delay_interval << ENA_ETH_IO_INTR_REG_TX_INTR_DELAY_SHIFT)
> > +   & ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
> >   ^^ TX ?
> >
> > Extra: you should not return NETDEV_TX_BUSY in the xmit handler while
> > queueing is still enabled. Please drop packet and return NETDEV_TX_OK.
> Ack.
> 
> Thanks for your review.

Ack/nack regarding use of ..._RX_INTR_DELAY_MASK with ..._TX_INTR_DELAY_SHIFT ?

-- 
Ueimor

Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-17 Thread Francois Romieu

Netanel Belgazal  :
[...]

More stuff below.

> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> new file mode 100644
> index 000..357e10f
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
[...]
> +static int ena_get_dev_stats(struct ena_com_dev *ena_dev,
> +  struct ena_admin_aq_get_stats_cmd *get_cmd,
> +  struct ena_admin_acq_get_stats_resp *get_resp,

At first sight it should be possible avoid one pointer argument with:

struct ena_something {
struct ena_admin_aq_get_stats_cmd cmd;
struct ena_admin_acq_get_stats_resp resp;

};

[...]
> +int ena_com_get_dev_extended_stats(struct ena_com_dev *ena_dev, char *buff,
> +u32 len)

Where is it used ?

> +{
> + int ret = 0;
> + struct ena_admin_aq_get_stats_cmd get_cmd;
> + struct ena_admin_acq_get_stats_resp get_resp;
> + void *virt_addr;

int rc = -ENOMEM;
char *str;

> + dma_addr_t phys_addr;
> +
> + virt_addr = dma_alloc_coherent(ena_dev->dmadev,
> +len,
> +_addr,
> +GFP_KERNEL | __GFP_ZERO);
> + if (!virt_addr) {
> + ret = -ENOMEM;
> + goto done;
> + }
> + memset(_cmd, 0x0, sizeof(get_cmd));
> + ret = ena_com_mem_addr_set(ena_dev,
> +_cmd.u.control_buffer.address,
> +phys_addr);
> + if (unlikely(ret)) {
> + ena_trc_err("memory address set failed\n");
> + return ret;
> + }
> + get_cmd.u.control_buffer.length = len;
> +
> + get_cmd.device_id = ena_dev->stats_func;
> + get_cmd.queue_idx = ena_dev->stats_queue;
> +
> + ret = ena_get_dev_stats(ena_dev, _cmd, _resp,
> + ENA_ADMIN_GET_STATS_TYPE_EXTENDED);
> + if (ret < 0)
> + goto free_ext_stats_mem;
> +
> + ret = snprintf(buff, len, "%s", (char *)virt_addr);

So ENA_ADMIN_GET_STATS_TYPE_EXTENDED fills a buffer with bytes in the
[0; 127] range, right ?

> +
> +free_ext_stats_mem:
> + dma_free_coherent(ena_dev->dmadev, len, virt_addr, phys_addr);
> +done:
> + return ret;
> +}
> +
> +int ena_com_set_dev_mtu(struct ena_com_dev *ena_dev, int mtu)
> +{
> + struct ena_com_admin_queue *admin_queue;
> + struct ena_admin_set_feat_cmd cmd;
> + struct ena_admin_set_feat_resp resp;
> + int ret = 0;

Should be -ENODEV or left uninitialized.

> +
> + if (unlikely(!ena_dev)) {
> + ena_trc_err("%s : ena_dev is NULL\n", __func__);
> + return -ENODEV;
> + }

Does it mean that ena_com_dev may go away while the net_device is in use ?

> +
> + if (!ena_com_check_supported_feature_id(ena_dev, ENA_ADMIN_MTU)) {
> + ena_trc_info("Feature %d isn't supported\n", ENA_ADMIN_MTU);
> + return -EPERM;
> + }

rc = ena_com_check_supported_feature_id(...
if (rc < 0) {
ena_trc_info(...
goto out;
}
> +
> + memset(, 0x0, sizeof(cmd));
> + admin_queue = _dev->admin_queue;
> +
> + cmd.aq_common_descriptor.opcode = ENA_ADMIN_SET_FEATURE;
> + cmd.aq_common_descriptor.flags = 0;
> + cmd.feat_common.feature_id = ENA_ADMIN_MTU;
> + cmd.u.mtu.mtu = mtu;
> +
> + ret = ena_com_execute_admin_command(admin_queue,
> + (struct ena_admin_aq_entry *),
> + sizeof(cmd),
> + (struct ena_admin_acq_entry *),
> + sizeof(resp));
> +
> + if (unlikely(ret)) {
> + ena_trc_err("Failed to set mtu %d. error: %d\n", mtu, ret);
> + return -EINVAL;

ena_com_execute_admin_command should return a proper status code.

[...]
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h 
b/drivers/net/ethernet/amazon/ena/ena_com.h
new file mode 100644
index 000..e49ba43
--- /dev/null
[...]
+static inline void ena_com_update_intr_reg(struct ena_eth_io_intr_reg 
*intr_reg,
+  u32 rx_delay_interval,
+  u32 tx_delay_interval,
+  bool unmask)
+{
+   intr_reg->intr_control = 0;
+   intr_reg->intr_control |= rx_delay_interval &
+   ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
+
+   intr_reg->intr_control |=
+   (tx_delay_interval << ENA_ETH_IO_INTR_REG_TX_INTR_DELAY_SHIFT)
+   & ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
  ^^ TX ?

Extra: you should not return NETDEV_TX_BUSY in the xmit handler while
queueing is still enabled. Please drop packet and return NETDEV_TX_OK.

-- 
Ueimor

Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-16 Thread Francois Romieu

Netanel Belgazal  :
[...]

Very limited review below.

> diff --git a/Documentation/networking/ena.txt 
> b/Documentation/networking/ena.txt
> new file mode 100644
> index 000..528544f
> --- /dev/null
> +++ b/Documentation/networking/ena.txt
> @@ -0,0 +1,330 @@
> +Linux kernel driver for Elastic Network Adapter (ENA) family:
> +=
> +
> +Overview:
> +=
> +The ENA driver provides a modern Ethernet device interface optimized
> +for high performance and low CPU overhead.

Marketing boilerplate. How much of it will still be true in 6 months ?
6 years ?

Hint: stay technical. If in doubt, make it boring. :o)

[...]
> +ENA devices allow high speed and low overhead Ethernet traffic
> +processing by providing a dedicated Tx/Rx queue pair per host CPU, a
> +dedicated MSI-X interrupt vector per Tx/Rx queue pair, adaptive
> +interrupt moderation, and CPU cacheline optimized data placement.

How many queues exactly ?

[...]
> +Memory Allocations:
> +===
> +DMA Coherent buffers are allocated for the following DMA rings:
> +- Tx submission ring (For regular mode; for LLQ mode it is allocated
> +  using kzalloc)
> +- Tx completion ring
> +- Rx submission ring
> +- Rx completion ring
> +- Admin submission ring
> +- Admin completion ring
> +- AENQ ring

Useless documentation (implementation detail).

Nobody will maintain it when the driver changes. Please remove.

[...]
> +MTU:
> +
> +The driver supports an arbitrarily large MTU with a maximum that is
> +negotiated with the device. The driver configures MTU using the
> +SetFeature command (ENA_ADMIN_MTU property). The user can change MTU
> +via ifconfig(8) and ip(8).

Please avoid advertising legacy ifconfig.

[...]
> diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h 
> b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
> new file mode 100644
> index 000..8516e5e
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
[...]
> +/* admin commands opcodes */
> +enum ena_admin_aq_opcode {
> + /* create submission queue */
> + ENA_ADMIN_CREATE_SQ = 1,
> +
> + /* destroy submission queue */
> + ENA_ADMIN_DESTROY_SQ = 2,
> +
> + /* create completion queue */
> + ENA_ADMIN_CREATE_CQ = 3,
> +
> + /* destroy completion queue */
> + ENA_ADMIN_DESTROY_CQ = 4,
> +
> + /* get capabilities of particular feature */
> + ENA_ADMIN_GET_FEATURE = 8,
> +
> + /* get capabilities of particular feature */
> + ENA_ADMIN_SET_FEATURE = 9,
> +
> + /* get statistics */
> + ENA_ADMIN_GET_STATS = 11,
> +};

The documentation above simply duplicates the member names -> useless
visual payload.

Please replace space with tab before "=" to line things up.

[...]
> +/* Basic Statistics Command. */
> +struct ena_admin_basic_stats {
> + /* word 0 :  */
> + u32 tx_bytes_low;
> +
> + /* word 1 :  */
> + u32 tx_bytes_high;
> +
> + /* word 2 :  */
> + u32 tx_pkts_low;
> +
> + /* word 3 :  */
> + u32 tx_pkts_high;
> +
> + /* word 4 :  */
> + u32 rx_bytes_low;
> +
> + /* word 5 :  */
> + u32 rx_bytes_high;
> +
> + /* word 6 :  */
> + u32 rx_pkts_low;
> +
> + /* word 7 :  */
> + u32 rx_pkts_high;
> +
> + /* word 8 :  */
> + u32 rx_drops_low;
> +
> + /* word 9 :  */
> + u32 rx_drops_high;
> +};

struct zorg {
u32 ...;
u32 ...;
u32 ...;
u32 ...;

u32 ...;
u32 ...;
u32 ...;
u32 ...;

u32 ...;
u32 ...;
u32 ...;
};

-> No comment needed to figure the offset.
   Comment removed => no need to check if offset starts at word 0 or word 1.

[...]
> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> new file mode 100644
> index 000..357e10f
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
[...]
> +static inline int ena_com_mem_addr_set(struct ena_com_dev *ena_dev,
> +struct ena_common_mem_addr *ena_addr,
> +dma_addr_t addr)
> +{
> + if ((addr & GENMASK_ULL(ena_dev->dma_addr_bits - 1, 0)) != addr) {
> + ena_trc_err("dma address has more bits that the device 
> supports\n");
> + return -EINVAL;
> + }
> +
> + ena_addr->mem_addr_low = (u32)addr;
> + ena_addr->mem_addr_high =
> + ((addr & GENMASK_ULL(ena_dev->dma_addr_bits - 1, 32)) >> 32);

No need to "... & GENMASK" given the test above.

> +
> + return 0;
> +}
> +
> +static int ena_com_admin_init_sq(struct ena_com_admin_queue *queue)
> +{
> + queue->sq.entries =
> + dma_alloc_coherent(queue->q_dmadev,
> +ADMIN_SQ_SIZE(queue->q_depth),
> +>sq.dma_addr,
> +GFP_KERNEL | __GFP_ZERO);

1. Use

Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer

2016-05-24 Thread Francois Romieu

Lino Sanfilippo  :
[...]
> I dont agree here. A dma_wmb would have an effect to "data" and "info", yes,
> but it would have absolutely no effect to skb_tx_timestamp(), since there
> is no dma access involved here. In fact skb_tx_timestamp() could probably
> be even reordered to appear after the dma_wmb.
>
> Anyway, there is the wmb() directly after the assignment to "info". _This_
> barrier  should ensure that skb_tx_timestamp() (along with a flush of data
> and info to DMA) is executed before "txbd_curr" is advanced.
> This means that the corresponding skb cant be freed prematurely by tx_clean().

The concern here is about sending adequate PTP payload on the network.

skb_tx_timestamp() is used for network clock synchronization. Some extra
information must be transmitted. Be it through direct payload change
or through indirect control, it _is_ related to dma.

Several (most ?) skb_tx_timestamp() misuses blur the picture: CPU vs device
sync is of course way below the radar when the driver crashes because of
plain use-after-free skb_tx_timestamp() :o/

-- 
Ueimor

Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer

2016-05-22 Thread Francois Romieu

Lino Sanfilippo  :
[...]
> --- a/drivers/net/ethernet/arc/emac_main.c
> +++ b/drivers/net/ethernet/arc/emac_main.c
> @@ -159,12 +159,22 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>   unsigned int *txbd_dirty = >txbd_dirty;
>   struct arc_emac_bd *txbd = >txbd[*txbd_dirty];
>   struct buffer_state *tx_buff = >tx_buff[*txbd_dirty];
> - struct sk_buff *skb = tx_buff->skb;
>   unsigned int info = le32_to_cpu(txbd->info);
> + struct sk_buff *skb;
>  

Insert a smp_rmb() here to close one window for an outdated txbd_dirty value
(the "arc_emac_tx_clean wrote txbd_curr and issued smp_wmb" one).

Actually, insert smp_rmb() at the start of arc_emac_tx_clean() as it
does not need to be performed withing the loop and would penalize it.

Given the implicit smp barriers in the non-driver code, I consider
"arc_emac_tx_clean on one CPU does not read latest txbd_dirty value written
by previous arc_emac_tx_clean on different CPU" as utter onanism but
issueing smp_rmb() at the start of arc_emac_tx_clean() nails it as well.

> - if ((info & FOR_EMAC) || !txbd->data || !skb)
> + if (*txbd_dirty == priv->txbd_curr)
>   break;

Ok, it's just the "while (priv->txbd_dirty != priv->txbd_curr) {" loop
in disguise.

>  
> + /* Make sure curr pointer is consistent with info */
> + rmb();
> +
> + info = le32_to_cpu(txbd->info);
> +
> + if (info & FOR_EMAC)
> + break;

With proper ordering + barrier in arc_emac_tx() you can relax it to smp_rmb().

> +
> + skb = tx_buff->skb;
> +
>   if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) {
>   stats->tx_errors++;
>   stats->tx_dropped++;
> @@ -195,8 +205,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>   *txbd_dirty = (*txbd_dirty + 1) % TX_BD_NUM;
>   }
>  
> - /* Ensure that txbd_dirty is visible to tx() before checking
> -  * for queue stopped.
> + /* Ensure that txbd_dirty is visible to tx() and we see the most recent
> +  * value for txbd_curr.
>*/
>   smp_mb();
>  
> @@ -680,35 +690,29 @@ static int arc_emac_tx(struct sk_buff *skb, struct 
> net_device *ndev)
>   dma_unmap_len_set(>tx_buff[*txbd_curr], len, len);
>  
>   priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
> -
> - /* Make sure pointer to data buffer is set */
> - wmb();
> + priv->tx_buff[*txbd_curr].skb = skb;
>  
>   skb_tx_timestamp(skb);
>  
>   *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);

No.

You need dma_wmb() after skb_tx_timestamp() to commit skb_tx_timestamp() [*]
and data = cpu_to_le32(addr).

[*] I doubt anyone want a dma_sync_single_...() here.

>  
> - /* Make sure info word is set */
> + /* 1. Make sure that with respect to tx_clean everything is set up
> +  * properly before we advance txbd_curr.
> +  * 2. Make sure writes to DMA descriptors are completed before we inform
> +  * the hardware.
> +  */
>   wmb();

Yes, either wmb() or smp_wmb() + dma_wmb().

>  
> - priv->tx_buff[*txbd_curr].skb = skb;
> -
>   /* Increment index to point to the next BD */
>   *txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
>  
> - /* Ensure that tx_clean() sees the new txbd_curr before
> -  * checking the queue status. This prevents an unneeded wake
> -  * of the queue in tx_clean().
> + /* Ensure we see the most recent value of txbd_dirty and tx_clean() sees
> +  * the updated value of txbd_curr.
>*/
>   smp_mb();

Nit: s/the most/a/

"a" as in "arc_emac_tx_clean() _is_ racing with arc_emac_tx"

>  
> - if (!arc_emac_tx_avail(priv)) {
> + if (!arc_emac_tx_avail(priv))
>   netif_stop_queue(ndev);
> - /* Refresh tx_dirty */
> - smp_mb();
> - if (arc_emac_tx_avail(priv))
> - netif_start_queue(ndev);
> - }

No.

I may sound like an old record but the revalidation part must be kept.

txbd_dirty may change in the arc_emac_tx_avail.. netif_stop_queue window
(the race requires a different CPU as arc_emac_tx() runs in locally
disabled BH context).

-- 
Ueimor

Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer

2016-05-22 Thread Francois Romieu

Lino Sanfilippo <linosanfili...@gmx.de> :
> On 21.05.2016 21:47, Francois Romieu wrote:
> > Shuyu Wei <wsy2...@gmail.com> :
> > [...]
> >> diff --git a/drivers/net/ethernet/arc/emac_main.c 
> >> b/drivers/net/ethernet/arc/emac_main.c
> >> index a3a9392..c2447b0 100644
> >> --- a/drivers/net/ethernet/arc/emac_main.c
> >> +++ b/drivers/net/ethernet/arc/emac_main.c
> >> @@ -686,6 +686,9 @@ static int arc_emac_tx(struct sk_buff *skb, struct 
> >> net_device *ndev)
> >>  
> >> skb_tx_timestamp(skb);
> >>  
> >> +   /* Make sure timestamp is set */
> >> +   smp_wmb();
> > 
> > Should be dma_wmb() (see davem's message).
> > 
> > It's completely unrelated to SMP.
> > 
> 
> Its also completely unrelated to dma so I doubt that this is what davem meant.

It's related to dma: nobody wants the device to perform dma from memory
while the CPU is writing timestamp.

The device must enforce a commit of any network buffer memory write before
releasing control, i.e. before writing *info.

See 'git log -p --grep=dma_[rw]mb': it appears several times.

> As far as I understood he was referring to the dma descriptor.

If the wmb() between *info = ... and *txbd_curr = ... is replaced by a
dma_wmb(), the device will see a consistent descriptor when if performs
a DMA read after the CPU wrote into the mailbox (arc_reg_set(..., TXPL_MASK)).

Ok, I agree on this one.

However, it doesn't help with the (SMP) requirement that no CPU sees
the txbd_curr write before *info is written by the local CPU. Afaiui
dma_wmb() is too weak for this part. If we don't want wmb() here,
it will have to be dma_wmb() + smp_wmb().

-- 
Ueimor

Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer

2016-05-21 Thread Francois Romieu

Shuyu Wei  :
[...]
> diff --git a/drivers/net/ethernet/arc/emac_main.c 
> b/drivers/net/ethernet/arc/emac_main.c
> index a3a9392..c2447b0 100644
> --- a/drivers/net/ethernet/arc/emac_main.c
> +++ b/drivers/net/ethernet/arc/emac_main.c
> @@ -686,6 +686,9 @@ static int arc_emac_tx(struct sk_buff *skb, struct 
> net_device *ndev)
>  
> skb_tx_timestamp(skb);
>  
> +   /* Make sure timestamp is set */
> +   smp_wmb();

Should be dma_wmb() (see davem's message).

It's completely unrelated to SMP.

-- 
Ueimor

Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer

2016-05-19 Thread Francois Romieu

Lino Sanfilippo  :
[...]
> 2. requires a smp_wmb, while 3. requires a rmb(). AFAIK the mb() implies all 
> we need, 
> the dma_rmb() for 1., the smp_wmb() for 2. and the rmb() for 3. 

A revalidation of tx_dirty is still needed (see below) despite the rmb()
for 3. The rmb() for 3. is close to useless.

> >> 2. On multi processor systems: ensures that txbd_curr is updated (this is 
> >> paired
> >> with the smp_mb() at the end of tx_clean).
> > 
> > Smells like using barrier side-effects to control smp coherency. It isn't
> > the recommended style.
> > 
> 
> As I wrote above, the mb() implies the smp_wmb() so this is a regular pairing
> of smp_wmb() in xmit and smb_mb() in tx_clean, nothing uncommon.

Since you are quoting Documentation/memory-barriers.txt:
[...]
CPU MEMORY BARRIERS
---
[...]
Mandatory barriers should not be used to control SMP effects, since mandatory
barriers impose unnecessary overhead on both SMP and UP systems.

> >> -  if ((info & FOR_EMAC) || !txbd->data || !skb)
> >> +  if (info & FOR_EMAC) {
> >> +  /* Make sure we see consistent values for info, skb
> >> +   * and data.
> >> +   */
> >> +  smp_rmb();
> >>break;
> >> +  }
> > 
> > ?
> > 
> > smp_rmb should appear before the variables you want coherency for.
> 
> I dont think so. Please take a look into the memory barriers documentation.

> There is an example that is pretty close to the situation that we have in 
> this driver:
> 
> http://lxr.free-electrons.com/source/Documentation/memory-barriers.txt#L1819
> 
> In that example the barrier is also _between_ the variables that are to be
> ordered, not before. 

Err, which barrier ?

- dma_rmb() ?
  The device (obviously) set the 'status' member of the descriptor.
  dma_rmb() ensures that device-initiated DMA is complete for the 'data'
  member as well.

- dma_wmb() ?
  It ensures that the updated 'data' member will be set before any
  DMA resulting from the change of descriptor ownership takes place.

- wmb() ?
  It ensures that the previous write to descriptor (coherent memory)
  completes before write is posted to I/O mailbox.

None of these is "pretty close" to the "smp_rmb() before return" situation.

> >> -  skb_tx_timestamp(skb);
> >> +  /* Make sure info is set after data and skb with respect to
> >> +   * other tx_clean().
> >> +   */
> >> +  smp_wmb();
> >>  
> >>*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
> > 
> > Afaik smp_wmb() does not imply wmb(). So priv->txbd[*txbd_curr].data and
> > *info (aka priv->txbd[*txbd_curr].info) are not necessarily written in
> > an orderly manner.
> 
> Right, as I already wrote above I changed the smp barriers to mandatory ones.
> 
> > 
> >>  
> >> -  /* Make sure info word is set */
> >> -  wmb();
> >> -
> >> -  priv->tx_buff[*txbd_curr].skb = skb;
> >> -
> >>/* Increment index to point to the next BD */
> >>*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
> > 
> > With this change it's possible that tx_clean() reads new value for
> > tx_curr and old value (0) for *info.
> 
> Even if this could happen, what is the problem? I cant see an issue
> that results from such a scenario.

tx_clean() misunderstands the 0 in *info as a descriptor updated by the
device. tx_clean() thus kfrees the skb before the device DMAed from it.

[...]
> > Xmit thread| Clean thread
> > 
> > mb();
> > 
> > arc_emac_tx_avail() test with old
> > tx_dirty - tx_clean has not issued
> > any mb yet - and new tx_curr
> > 
> >  smp_mb();
> > 
> >  if (netif_queue_stopped(ndev) && ...
> >  netif_wake_queue(ndev);
> > 
> > netif_stop_queue()
> > 
> > -> queue stopped.
> > 
> 
> Again, the mb() we have now implies the smb_mb() we had before. So nothing
> changed except that we can be sure to see the new value for tx_dirty at
> our first attempt.

Nothing changed except you removed the revalidation part !

The smp_mb() we had before wasn't about seeing tx_dirty in the xmit thread
but about balancing the (read) barrier in the cleaning thread so that the
latter stood a chance to see the new (tx thread updated) tx_curr.

Consider the two lines below:

if (!arc_emac_tx_avail(priv))
netif_stop_queue(ndev);

Nothing prevents a complete run of the cleaning thread between these
two lines. It may or it may not happen but there is one thing sure:
mb() before arc_emac_tx_avail() can't tell the future.

[...]
> New patch is below. 

The arc_emac_tx_clean() change is wrong.

tx_drity revalidation is still needed in arc_emac_tx after netif_stop_queue.

A barrier is still missing in arc_emac_tx between descriptor release and
tx_curr increase.

-- 
Ueimor

Re: rtk8168 driver help needed

2016-05-19 Thread Francois Romieu

Murali Karicheri  :
[...]
> Do you what could be wrong with rtk8168?

Hardly.

What do the device registers (ethtool -d) and device stats (ethtool -S)
look like ?

(please trim useless material from previous mail)

-- 
Ueimor

Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer

2016-05-18 Thread Francois Romieu

Lino Sanfilippo  :
[...]
> what about the (only compile tested) code below?

I may have misunderstood some parts but it nonetheless seems broken.

> The smp_wmb() in tx function combined with the smp_rmb() in tx_clean ensures
> that the CPU running tx_clean sees consistent values for info, data and skb 
> (thus no need to check for validity of all three values any more).
> The mb() fulfills several tasks:
> 1. makes sure that DMA writes to descriptor are completed before the HW is
> informed.

"DMA writes" == "CPU writes" ?

> 2. On multi processor systems: ensures that txbd_curr is updated (this is 
> paired
> with the smp_mb() at the end of tx_clean).

Smells like using barrier side-effects to control smp coherency. It isn't
the recommended style.

> 3. Ensure we see the most recent value for tx_dirty. With this we do not have 
> to
> recheck after we stopped the tx queue.
> 
> 
> --- a/drivers/net/ethernet/arc/emac_main.c
> +++ b/drivers/net/ethernet/arc/emac_main.c
> @@ -162,8 +162,13 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>   struct sk_buff *skb = tx_buff->skb;
>   unsigned int info = le32_to_cpu(txbd->info);
>  
> - if ((info & FOR_EMAC) || !txbd->data || !skb)
> + if (info & FOR_EMAC) {
> + /* Make sure we see consistent values for info, skb
> +  * and data.
> +  */
> + smp_rmb();
>   break;
> + }

?

smp_rmb should appear before the variables you want coherency for.

>  
>   if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) {
>   stats->tx_errors++;
> @@ -679,36 +684,33 @@ static int arc_emac_tx(struct sk_buff *skb, struct 
> net_device *ndev)
>   dma_unmap_addr_set(>tx_buff[*txbd_curr], addr, addr);
>   dma_unmap_len_set(>tx_buff[*txbd_curr], len, len);
>  
> - priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
>  
> - /* Make sure pointer to data buffer is set */
> - wmb();
> + priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
> + priv->tx_buff[*txbd_curr].skb = skb;
>  
> - skb_tx_timestamp(skb);
> + /* Make sure info is set after data and skb with respect to
> +  * other tx_clean().
> +  */
> + smp_wmb();
>  
>   *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);

Afaik smp_wmb() does not imply wmb(). So priv->txbd[*txbd_curr].data and
*info (aka priv->txbd[*txbd_curr].info) are not necessarily written in
an orderly manner.

>  
> - /* Make sure info word is set */
> - wmb();
> -
> - priv->tx_buff[*txbd_curr].skb = skb;
> -
>   /* Increment index to point to the next BD */
>   *txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;

With this change it's possible that tx_clean() reads new value for
tx_curr and old value (0) for *info.

>  
> - /* Ensure that tx_clean() sees the new txbd_curr before
> + /* 1.Ensure that tx_clean() sees the new txbd_curr before
>* checking the queue status. This prevents an unneeded wake
>* of the queue in tx_clean().
> +  * 2.Ensure that all values are written to RAM and to DMA
> +  * before hardware is informed.

(I am not sure what "DMA" is supposed to mean here.)

> +  * 3.Ensure we see the most recent value for tx_dirty.
>*/
> - smp_mb();
> + mb();
>  
> - if (!arc_emac_tx_avail(priv)) {
> + if (!arc_emac_tx_avail(priv))
>   netif_stop_queue(ndev);
> - /* Refresh tx_dirty */
> - smp_mb();
> - if (arc_emac_tx_avail(priv))
> - netif_start_queue(ndev);
> - }

Xmit thread| Clean thread

mb();

arc_emac_tx_avail() test with old
tx_dirty - tx_clean has not issued
any mb yet - and new tx_curr

 smp_mb();

 if (netif_queue_stopped(ndev) && ...
 netif_wake_queue(ndev);

netif_stop_queue()

-> queue stopped.

You can't remove the revalidation step.

arc_emac_tx_avail() is essentially pessimistic. Even if arc_emac_tx_avail()
was "right", there would be a tx_clean window between arc_emac_tx_avail()
and netif_stop_queue().

> +
> + skb_tx_timestamp(skb);

You don't want to issue skb_tx_timestamp after releasing control of the
descriptor (*info = ...): skb may be long gone.

-- 
Ueimor

Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer

2016-05-17 Thread Francois Romieu

David Miller  :
> From: Shuyu Wei 
> Date: Tue, 17 May 2016 23:25:20 +0800
> 
> > diff --git a/drivers/net/ethernet/arc/emac_main.c 
> > b/drivers/net/ethernet/arc/emac_main.c
> > index a3a9392..df3dfef 100644
> > --- a/drivers/net/ethernet/arc/emac_main.c
> > +++ b/drivers/net/ethernet/arc/emac_main.c
> > @@ -153,9 +153,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
> >  {
> > struct arc_emac_priv *priv = netdev_priv(ndev);
> > struct net_device_stats *stats = >stats;
> > -   unsigned int i;
> >  
> > -   for (i = 0; i < TX_BD_NUM; i++) {
> > +   while (priv->txbd_dirty != priv->txbd_curr) {
> > unsigned int *txbd_dirty = >txbd_dirty;
> > struct arc_emac_bd *txbd = >txbd[*txbd_dirty];
> > struct buffer_state *tx_buff = >tx_buff[*txbd_dirty];
> > @@ -685,13 +684,15 @@ static int arc_emac_tx(struct sk_buff *skb, struct 
> > net_device *ndev)
> > wmb();
> >  
> > skb_tx_timestamp(skb);
> > +   priv->tx_buff[*txbd_curr].skb = skb;
> > +
> > +   dma_wmb();
> >  
> > *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
> >  
> > /* Make sure info word is set */
> > wmb();
> >  
> > -   priv->tx_buff[*txbd_curr].skb = skb;
> >  
> > /* Increment index to point to the next BD */
> > *txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
> > 
> 
> These memory barriers do not look correct to me.
> 
> dma_wmb() is about visibility between CPU reads/writes and device
> accesses to a piece of memory.  But what you're concerned about wrt.
> the SKB pointer assignment is CPU to CPU accesses.  Therefore something
> like smp_wmb() would be appropriate.

Something like:

skb_tx_timestamp(skb);

/* CPU write vs device access. Must be done before releasing control
 * of the descriptor (*info).
 */
dma_wmb();

priv->tx_buff[*txbd_curr].skb = skb;

/* CPU arc_emac_tx_clean vs CPU arc_emac_tx. Must be done before
 * index (tx_curr) update. Does not necessarily deserves to be done
 * before releasing control of the descriptor (*info) due to
 * descriptor vs index ordering.
 *
 * FIXME: missing smp_rmb before the while loop in arc_emac_tx_clean.
 */
smp_wmb();

*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);

/* local descriptor (*info) update vs index (tx_curr) update. */
wmb();

*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;

smp_mb();   // The driver alreay contains this one.

The smp_wmb() and wmb() could be made side-by-side once *info is
updated but I don't see the adequate idiom to improve the smp_wmb + wmb
combo. :o/

> And the wmb() looks like it should be a dma_wmb().

I see two points against it:
- it could be too late for skb_tx_timestamp().
- arc_emac_tx_clean must not see an index update before the device
  got a chance to acquire the descriptor. arc_emac_tx_clean can't
  tell the difference between an about-to-be-released descriptor
  and a returned-from-device one.

-- 
Ueimor

Re: [PATCH v2] r8169: default to 64-bit DMA on recent PCIe chips

2016-05-16 Thread Francois Romieu

Ard Biesheuvel  :
[...]
> This is a followup to 'r8169: default to 64-bit DMA on systems without memory
> below 4 GB' [1]. At the request of Francois, this version bases the decision
> whether to use 64-bit DMA by default on whether the device is PCIe and
> sufficiently recent, rather than whether the platform requires 64-bit DMA
> because it does not have any memory below 4 GB to begin with. This is safer,
> since it will prevent the use of such problematic cards on these platforms.

Testing has not been conclusive. It apparently works but I have not been
able to set addresses above 4Gb for the Rx or Tx descriptor rings yet.

-- 
Ueimor

Re: r8169: Unconditionally disabling ASPM

2016-05-15 Thread Francois Romieu

Paul Menzel  :
[...]
> As over five years have passed now, do you think that is still needed?
> I wonder why no module parameter was added back then, where users could
> enable ASPM if it works on their systems? Because there is no such
> situation and it always fails?

It was enabled again (d64ec841517a25f6d468bde9f67e5b4cffdc67c7) then
disabled (4521e1a94279ce610d3f9b7945c17d581f804242). It's closer
to 3.5 years :o)

Module parameters are frowned upon.

Lin, is there some interest in selectively [*] enabling (or disabling)
ASPM support in the r8169 driver or will it be unreliable ?

[*] Based on DMI information for instance.

-- 
Ueimor

Re: [PATCH] ethernet:arc: Fix racing of TX ring buffer

2016-05-15 Thread Francois Romieu

Shuyu Wei  :
[...]
> I still have a question, is it possible that tx_clean() run
> between   priv->tx_buff[*txbd_curr].skb = skb   and   dma_wmb()?

A (previous) run can take place after priv->tx_buff[*txbd_curr].skb and
before *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len).

So, yes, the driver must check in arc_emac_tx_clean() that 1) either
txbd_dirty != txbd_curr or 2) "info" is not consistent with a still-not-used
status word. Please be patient with me and get rid of the useless "i"

diff --git a/drivers/net/ethernet/arc/emac_main.c 
b/drivers/net/ethernet/arc/emac_main.c
index a3a9392..337ea3b 100644
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -153,9 +153,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
 {
struct arc_emac_priv *priv = netdev_priv(ndev);
struct net_device_stats *stats = >stats;
-   unsigned int i;
 
-   for (i = 0; i < TX_BD_NUM; i++) {
+   while (priv->txbd_dirty != priv->txbd_curr) {
unsigned int *txbd_dirty = >txbd_dirty;
struct arc_emac_bd *txbd = >txbd[*txbd_dirty];
struct buffer_state *tx_buff = >tx_buff[*txbd_dirty];

-- 
Ueimor

Re: [PATCH] ethernet:arc: Fix racing of TX ring buffer

2016-05-15 Thread Francois Romieu

Shuyu Wei  :
[...]
> I don't think taking txbd_curr and txbd_dirty only as hints is a good idea.
> That could be a big waste, since tx_clean have to go through all the txbds.

Sorry if my point was not clear: arc_emac_tx_clean() does not need
to change (at least not for the reason given in the commit message) :o)

Current code:

static void arc_emac_tx_clean(struct net_device *ndev)
{
[...]
for (i = 0; i < TX_BD_NUM; i++) {
unsigned int *txbd_dirty = >txbd_dirty;
struct arc_emac_bd *txbd = >txbd[*txbd_dirty];
struct buffer_state *tx_buff = >tx_buff[*txbd_dirty];
struct sk_buff *skb = tx_buff->skb;
unsigned int info = le32_to_cpu(txbd->info);

if ((info & FOR_EMAC) || !txbd->data || !skb)
break;
^

-> the "break" statement prevents reading all txbds. At most one extra
   descriptor is read and this driver isn't in the Mpps business.

> I tried your advice, Tx throughput can only reach 5.52MB/s.

Even with the original code above ?

> Leaving one sent packet in tx_clean is acceptable if we respect to txbd_curr
> and txbd_dirty, since the ignored packet will be cleaned when new packets
> arrive.

There is no reason to leave tx packet roting in the first place. Really.
I doubt it would help bql for one.

Packet may rot because of unexpected hardware behavior and driver should
cope with it when it is diagnosed, sure. However, you don't want the driver
to opens it own unbounded window. Next packet: 10 us, 10 ms, 10 s ?

-- 
Ueimor

Re: [PATCH] ethernet:arc: Fix racing of TX ring buffer

2016-05-14 Thread Francois Romieu

Shuyu Wei  :
> The tail of the ring buffer(txbd_dirty) should never go ahead of the
> head(txbd_curr) or the ring buffer will corrupt. 
> 
> This is the root cause of racing.

No (see below).

It may suffer from some barrier illness though.

> Besides, setting the FOR_EMAC flag should be the last step of modifying
> the buffer descriptor, or possible racing will occur.

(s/Besides//)

Yes. Good catch.

> Signed-off-by: Shuyu Wei 
> ---
> 
> diff --git a/drivers/net/ethernet/arc/emac_main.c 
> b/drivers/net/ethernet/arc/emac_main.c
> index a3a9392..5ece05b 100644
> --- a/drivers/net/ethernet/arc/emac_main.c
> +++ b/drivers/net/ethernet/arc/emac_main.c
> @@ -155,7 +155,7 @@ static void arc_emac_tx_clean(struct net_device *ndev)
> struct net_device_stats *stats = >stats;
> unsigned int i;
>  
> -   for (i = 0; i < TX_BD_NUM; i++) {
> +   for (i = priv->txbd_dirty; i != priv->txbd_curr; i = (i + 1) % 
> TX_BD_NUM) {
> unsigned int *txbd_dirty = >txbd_dirty;
> struct arc_emac_bd *txbd = >txbd[*txbd_dirty];
> struct buffer_state *tx_buff = >tx_buff[*txbd_dirty];

"i" is only used as a loop counter in arc_emac_tx_clean. It is not even
used as an index to dereference an array or whatever. Only "priv->txbd_dirty"
is used.

arc_emac_tx_clean() checks FOR_EMAC, skb, and dirty tx data. It takes care of
clearing those itself. Thus, (memory / io barrier considerations apart) it can
only proceed beyond its own "if ((info & FOR_EMAC) || !txbd->data || !skb)"
check if arc_emac_tx wrote all of those.

Where they are used as loop counters, both TX_BD_NUM and txbd_curr - txbd_dirty
can be considered as hints (please note that unsigned arithmetic can replace
the "%" sludgehammer here).

> @@ -686,12 +686,12 @@ static int arc_emac_tx(struct sk_buff *skb, struct 
> net_device *ndev)
>  
> skb_tx_timestamp(skb);

> +   priv->tx_buff[*txbd_curr].skb = skb;

dma_wmb();

(sync writes to memory before releasing descriptor)

> *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
>  
> /* Make sure info word is set */
> wmb();

arc_emac_tx_clean can run from this point.

txbd_curr is still not set (and it does need to). So, if you insist
on txbd_curr appearing in arc_emac_tx_clean::for(...), it's perfectly
possible to ignore a sent packet.

I ignored arc_reg_set() at the end of arc_emac_tx(). I have no idea
if it is posted nor if it forces the chipset to read the descriptors
(synchronously ?) so part of the sentence above could be wrong.

You have found a big offender in arc_emac_tx() but the arc_emac_tx_clean()
part is imho useless, incorrectly understood or misworded.

-- 
Ueimor

Re: [PATCH] r8169: default to 64-bit DMA on systems without memory below 4 GB

2016-05-12 Thread Francois Romieu

Francois Romieu <rom...@fr.zoreil.com> :
> Ard Biesheuvel <ard.biesheu...@linaro.org> :
> > On 12 May 2016 at 16:09, Francois Romieu <rom...@fr.zoreil.com> wrote:
> [...]
> > By resurrecting 353176888386, I mean the patch that changes the
> > default for PCI express devices, so I think we are in agreement here ?
> 
> Almost 353176888386: please modify it so that there's no change of
> behavior for mac_version < RTL_GIGA_MAC_VER_18.
> 
> It will avoid a bit more than the plain old PCI chipsets, including the
> one that caused 353176888386 reversal as well as a few one that I don't
> feel confortable with due to their vicinity with the original 8169.

The patch below may be of use before refreshing 353176888386:

Btw I have to find a way to force descriptor ring allocation above 4G
because none of my testing has exercized it so far but my tax
paperwork can't wait.

Good luck.

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 94f08f1..ac23240 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -8224,20 +8224,6 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
goto err_out_mwi_2;
}
 
-   tp->cp_cmd = 0;
-
-   if ((sizeof(dma_addr_t) > 4) &&
-   !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && use_dac) {
-   tp->cp_cmd |= PCIDAC;
-   dev->features |= NETIF_F_HIGHDMA;
-   } else {
-   rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
-   if (rc < 0) {
-   netif_err(tp, probe, dev, "DMA configuration failed\n");
-   goto err_out_free_res_3;
-   }
-   }
-
/* ioremap MMIO region */
ioaddr = ioremap(pci_resource_start(pdev, region), R8169_REGS_SIZE);
if (!ioaddr) {
@@ -8253,6 +8239,20 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
/* Identify chip attached to board */
rtl8169_get_mac_version(tp, dev, cfg->default_ver);
 
+   tp->cp_cmd = 0;
+
+   if ((sizeof(dma_addr_t) > 4) &&
+   !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && use_dac) {
+   tp->cp_cmd |= PCIDAC;
+   dev->features |= NETIF_F_HIGHDMA;
+   } else {
+   rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+   if (rc < 0) {
+   netif_err(tp, probe, dev, "DMA configuration failed\n");
+   goto err_out_unmap_4;
+   }
+   }
+
rtl_init_rxcfg(tp);
 
rtl_irq_disable(tp);
@@ -8412,12 +8412,12 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
   >counters_phys_addr, GFP_KERNEL);
if (!tp->counters) {
rc = -ENOMEM;
-   goto err_out_msi_4;
+   goto err_out_msi_5;
}
 
rc = register_netdev(dev);
if (rc < 0)
-   goto err_out_cnt_5;
+   goto err_out_cnt_6;
 
pci_set_drvdata(pdev, dev);
 
@@ -8451,12 +8451,13 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
 out:
return rc;
 
-err_out_cnt_5:
+err_out_cnt_6:
dma_free_coherent(>dev, sizeof(*tp->counters), tp->counters,
  tp->counters_phys_addr);
-err_out_msi_4:
+err_out_msi_5:
netif_napi_del(>napi);
rtl_disable_msi(pdev, tp);
+err_out_unmap_4:
iounmap(ioaddr);
 err_out_free_res_3:
pci_release_regions(pdev);

Re: [PATCH] r8169: default to 64-bit DMA on systems without memory below 4 GB

2016-05-12 Thread Francois Romieu

Ard Biesheuvel <ard.biesheu...@linaro.org> :
> On 12 May 2016 at 16:09, Francois Romieu <rom...@fr.zoreil.com> wrote:
[...]
> By resurrecting 353176888386, I mean the patch that changes the
> default for PCI express devices, so I think we are in agreement here ?

Almost 353176888386: please modify it so that there's no change of
behavior for mac_version < RTL_GIGA_MAC_VER_18.

It will avoid a bit more than the plain old PCI chipsets, including the
one that caused 353176888386 reversal as well as a few one that I don't
feel confortable with due to their vicinity with the original 8169.

[...]
> > A few news from the "pathologically better safe than sorry" squad:
> > I have switched the default on a couple of non-critical production
> > servers that include 8168c (RTL_GIGA_MAC_VER_22). It should give an hint
> > for hardware from 2008 ~ 2009. I'll do some basic sanity testing with
> > different chipsets.
> >
> 
> Thanks for testing that. In the mean time, I will dust off the patch
> and rebase it to today's -next.

8106e (RTL_GIGA_MAC_VER_39) doesn't exhibit any difference of behavior
either.

-- 
Ueimor

Re: [PATCH] r8169: default to 64-bit DMA on systems without memory below 4 GB

2016-05-12 Thread Francois Romieu

> On 12 May 2016 at 01:58, David Miller  wrote:
> > From: Ard Biesheuvel 
> > Date: Wed, 11 May 2016 09:47:49 +0200
[...]
> > I think we should just seriously consider changing the default, it's
> > a really outdated reasoning behind the current default setting.  Maybe
> > relevant a decade ago, but probably not now.
> >
> > And if the card is completely disfunctional in said configuration, the
> > default is definitely wrong.
> 
> The card is indeed completely disfunctional. So we could try to
> resurrect 353176888386 ("r8169: enable 64-bit DMA by default for PCI
> Express devices"), and instead of backing it out again if regressions
> are reported, blacklist the particular chips. This is a much better
> approach, since then we can also print some kind of diagnostic on
> those arm64 systems why such a blacklisted NIC is not supported.

I doubt there will be much *reporting* from broken systems that
include plain old PCI realtek chipsets (r8169.c::RTL_CFG_0). Changing
the default for those is imnsho asking for troubles without clear
benefit (experimental evidence suggests that smsc etherpower II grows
older more easily than plain pci 8169 :o/ ).

I'd rather leave these alone and change the default for the PCI Express
chipsets. Btw, while it does not seem to hurt, they should not need any
CPlusCmd Dual Access Cycle tweak either. Realtek may establish it (Lin ?)

A few news from the "pathologically better safe than sorry" squad:
I have switched the default on a couple of non-critical production
servers that include 8168c (RTL_GIGA_MAC_VER_22). It should give an hint
for hardware from 2008 ~ 2009. I'll do some basic sanity testing with
different chipsets.

-- 
Ueimor

Re: [PATCH] r8169: default to 64-bit DMA on systems without memory below 4 GB

2016-05-11 Thread Francois Romieu

Ard Biesheuvel <ard.biesheu...@linaro.org> :
> On 11 May 2016 at 22:31, Francois Romieu <rom...@fr.zoreil.com> wrote:
[...]
> It has little to do with f*cking legacy 32-bits-only-devices if DRAM
> simply starts at 0x80__. This is on an AMD arm64 chip.

The lack of IOMMU surprizes me.

[...]
> OK, if you prefer. Should I send a v2?

Don't bother unless someones comes with a more substantial request.

-- 
Ueimor

Re: [PATCH] r8169: default to 64-bit DMA on systems without memory below 4 GB

2016-05-11 Thread Francois Romieu

Ard Biesheuvel  :
> The current logic around the 'use_dac' module parameter prevents the
> r81969 driver from being loadable on 64-bit systems without any RAM
> below 4 GB when the parameter is left at its default value.
> So introduce a new default value -1 which indicates that 64-bit DMA
> should be enabled implicitly, but only if setting a 32-bit DMA mask
> has failed earlier. This should prevent any regressions like the ones
> caused by previous attempts to change this code.

I am not a huge fan but if you really need it...

Which current kernel arches do exhibit the interesting
f*ck-legacy-32-bits-only-devices property you just described ?

[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 94f08f1e841c..a49e8a58e539 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
[...]
> @@ -859,7 +859,8 @@ struct rtl8169_private {
>  MODULE_AUTHOR("Realtek and the Linux r8169 crew ");
>  MODULE_DESCRIPTION("RealTek RTL-8169 Gigabit Ethernet driver");
>  module_param(use_dac, int, 0);
> -MODULE_PARM_DESC(use_dac, "Enable PCI DAC. Unsafe on 32 bit PCI slot.");
> +MODULE_PARM_DESC(use_dac,
> + "Enable PCI DAC. Unsafe on 32 bit PCI slot (default -1: enable on 
> 64-bit archs only if needed");

Nit: the parameter is bizarre enough that you could leave the original
description.

-- 
Ueimor

Re: rtk8168 driver help needed

2016-05-07 Thread Francois Romieu

Murali Karicheri  :
[...]
> I am trying to integrate the rtl8168 PCIe card to have Ethernet functional
> on my Keystone EVM.

Which (EVM) one ?

> I purchased the rtl8111c Gib card from Amazon. The Card is detected
> by the RC and I can see it is enumerated and show up when doing lspci command.

What does "the RC" mean ? A different PC ?

> However I can't get the Ethernet port functional.

> Does this need MSI interrupt ?

No.

> I can't see it has requested any.

Yes, something went really, really wrong. See below.

[...]
> [2.303965] PCI host bridge /soc/pcie@2180 ranges:
> [2.309108]   No bus range found for /soc/pcie@2180, using [bus 00-ff]
> [2.316269]IO 0x2325..0x23253fff -> 0x
> [2.321499]   MEM 0x5000..0x5fff -> 0x5000
> [2.331666] keystone-pcie 21801000.pcie: PCI host bridge to bus :00
> [2.338283] pci_bus :00: root bus resource [bus 00-ff]
> [2.343937] pci_bus :00: root bus resource [io  0x-0x3fff]
> [2.350114] pci_bus :00: root bus resource [mem 0x5000-0x5fff]
> [2.357095] pci :00:00.0: [104c:b00b] type 01 class 0x060400
> [2.357665] PCI: bus0: Fast back to back transfers disabled
> [2.363717] pci :01:00.0: [10ec:8168] type 00 class 0x02
> [2.363809] pci :01:00.0: reg 0x10: [io  0x-0x00ff]
> [2.363867] pci :01:00.0: reg 0x18: [mem 0x-0x0fff 64bit]
> [2.363909] pci :01:00.0: reg 0x20: [mem 0x-0x 64bit 
> pref]
> [2.363939] pci :01:00.0: reg 0x30: [mem 0x-0x0001 pref]
> [2.364099] pci :01:00.0: supports D1 D2
> [2.364116] pci :01:00.0: PME# supported from D0 D1 D2 D3hot D3cold
> [2.381251] PCI: bus1: Fast back to back transfers disabled
> [2.386989] pci :00:00.0: BAR 8: assigned [mem 0x5000-0x500f]
> [2.393937] pci :00:00.0: BAR 9: assigned [mem 0x5010-0x501f 
> pref]
> [2.401221] pci :00:00.0: BAR 7: assigned [io  0x1000-0x1fff]
> [2.407320] pci :01:00.0: BAR 6: assigned [mem 0x5010-0x5011 
> pref]
> [2.414597] pci :01:00.0: BAR 4: assigned [mem 0x5012-0x5012 
> 64bit pref]
> [2.422380] pci :01:00.0: BAR 2: assigned [mem 0x5000-0x5fff 
> 64bit]
> [2.429702] pci :01:00.0: BAR 0: assigned [io  0x1000-0x10ff]
> [2.435821] pci :00:00.0: PCI bridge to [bus 01]
> [2.440783] pci :00:00.0:   bridge window [io  0x1000-0x1fff]
> [2.446896] pci :00:00.0:   bridge window [mem 0x5000-0x500f]
> [2.453699] pci :00:00.0:   bridge window [mem 0x5010-0x501f 
> pref]
> [2.461453] pcieport :00:00.0: Signaling PME through PCIe PME interrupt
> [2.468411] pci :01:00.0: Signaling PME through PCIe PME interrupt
> [2.475075] pcie_pme :00:00.0:pcie01: service driver pcie_pme loaded
> [2.475392] aer :00:00.0:pcie02: service driver aer loaded
> [2.475652] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [2.481419] r8169 :01:00.0: enabling device (0140 -> 0143)
> [2.488865] r8169 :01:00.0 eth0: RTL8169 at 0xf0d6a000, 
> 00:00:00:00:00:00, XID  IRQ 286

No need to go further, there is a serious problem here.

Most of the XID bits are read from a mapped register. They're definitely not
expected to be null as the driver requires it to identify a proper chipset
(the common PCI configuration registers only tell that you aren't dealing
with a coffee machine).

Any read returning zeroes would not surprize me.

[...]
> Can someone help me figure out what is missing ?

Hardly at this point. I can only suggest to switch power off, plug
the 8168 device in a x86 (32 or 64 bits) system and check how it behaves.

-- 
Ueimor

Re: [PATCH] netdev: enc28j60 kernel panic fix.

2016-05-06 Thread Francois Romieu

(please don't top post)

David Russell  :
> I kind of thought my patch was at best incomplete.  When you state
> this change silences the bug but does not fix it, what are the
> implications of systems running this patch?  We have some production
> systems using this patch.  They reboot daily, but have been solid.

If my assumption is right it should drop an extra packet here and there.
No leak.

However transmit errors + transmit packets should still match the number
of times the driver calls enc28j60_send_packet (you would have to cook
your own stat to check the latter though).

> In addition, if we sent you a pi and the ethernet controller and a
> small but reasonable sum of money for your labor, would you be able to
> properly fix it ?

I'd rather see you testing my crap. :o)

Pi as multi-core (the expected race needs several cores or a netconsole
style transmit from an irq/bh context) ?

> Short of that, do you have any recommendations on quick overviews of
> the networking stack in the kernel and then documentation on the
> various flags and such?

A tad bit too high-level a question... Plain ctags + printk for a start ?

Does the patch below make a difference ?

Takes longer to crash counts as a difference.

diff --git a/drivers/net/ethernet/microchip/enc28j60.c 
b/drivers/net/ethernet/microchip/enc28j60.c
index 7066954..405fe3f 100644
--- a/drivers/net/ethernet/microchip/enc28j60.c
+++ b/drivers/net/ethernet/microchip/enc28j60.c
@@ -1170,7 +1170,8 @@ static void enc28j60_irq_work_handler(struct work_struct 
*work)
enc28j60_dump_tsv(priv, "Tx Done", tsv);
}
enc28j60_tx_clear(ndev, err);
-   locked_reg_bfclr(priv, EIR, EIR_TXIF);
+   locked_reg_bfclr(priv, EIR, EIR_TXIF | EIR_TXERIF);
+   intflags &= ~EIR_TXERIF;
}
/* TX Error handler */
if ((intflags & EIR_TXERIF) != 0) {
@@ -1190,6 +1191,7 @@ static void enc28j60_irq_work_handler(struct work_struct 
*work)
nolock_reg_bfclr(priv, ECON1, ECON1_TXRST);
nolock_txfifo_init(priv, TXSTART_INIT, TXEND_INIT);
mutex_unlock(>lock);
+   locked_reg_bfclr(priv, EIR, EIR_TXIF | EIR_TXERIF);
/* Transmit Late collision check for retransmit */
if (TSV_GETBIT(tsv, TSV_TXLATECOLLISION)) {
if (netif_msg_tx_err(priv))
@@ -1203,7 +1205,6 @@ static void enc28j60_irq_work_handler(struct work_struct 
*work)
enc28j60_tx_clear(ndev, true);
} else
enc28j60_tx_clear(ndev, true);
-   locked_reg_bfclr(priv, EIR, EIR_TXERIF);
}
/* RX Error handler */
if ((intflags & EIR_RXERIF) != 0) {

Re: [PATCH] netdev: enc28j60 kernel panic fix.

2016-05-05 Thread Francois Romieu

David Russell  :
> When connected directly to another system (not via a switch)
> eventually a condition where a NULL pointer dereference occurs in
> enc28j60_hw_tx() and this patch simply checks for that condition and
> returns gracefully without causing a kernel panic.  I believe, but
> have not investigated this is caused by a packet collision and am not
> sure if the kernel tracks collisions or counts them as errors, so that
> should probably be added if this is what's happening.  I'm also not
> familiar with the linux kernel, so may have fixed this in a less than
> ideal way.

Is it possible for EIR.EIR_TXERIF and EIR.EIR_TXIF to be set for the
same packet ?

If so the driver is intrinsically racy:
- EIR.EIR_TXIF completes transmission, clears tx_skb and enables queueing
  again (see netif_wake_queue in enc28j60_tx_clear)

- insert start_xmit here: tx_skb is set and enc28j60_hw_tx is scheduled
  for late execution (user context work)

- EIR.EIR_EIR.EIR_TXERIF issues same enc28j60_tx_clear and clears tx_skb

- enc28j60_hw_tx is run but tx_skb is NULL

> diff --git a/drivers/net/ethernet/microchip/enc28j60.c
> b/drivers/net/ethernet/microchip/enc28j60.c
> index 86ea17e..36ac65f 100644
> --- a/drivers/net/ethernet/microchip/enc28j60.c
> +++ b/drivers/net/ethernet/microchip/enc28j60.c
> @@ -1233,6 +1233,9 @@ static void enc28j60_irq_work_handler(struct
> work_struct *work)
>   */
>  static void enc28j60_hw_tx(struct enc28j60_net *priv)
>  {
> +   if (!priv->tx_skb)
> +   return;
> +
> if (netif_msg_tx_queued(priv))
> printk(KERN_DEBUG DRV_NAME
> ": Tx Packet Len:%d\n", priv->tx_skb->len);

enc28j60_hw_tx isn't the culprit. It's the victim.

This change silences the bug but it does not fix it at all.

-- 
Ueimor

Re: Cannot use NFS with linux-next 20160429

2016-05-03 Thread Francois Romieu

Fabio Estevam  :
[...]
> Today's next shows some different info:
> 
> [7.606456]   #0: wm8962-audio
> [7.672659] VFS: Mounted root (nfs filesystem) readonly on device 0:14.
> [7.680860] devtmpfs: mounted
> [7.685664] Freeing unused kernel memory: 1024K (c0c0 - c0d0)
> [7.871481]
> [7.873004] =
> [7.877381] [ INFO: inconsistent lock state ]
> [7.881760] 4.6.0-rc6-next-20160503-2-g51d9962 #351 Not tainted
> [7.888043] -
> [7.892419] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> [7.898449] kworker/0:1H/179 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [7.904040]  (>seq#5){+.?...}, at: [] tcp_ack+0x134/0x129c
> [7.911166] {IN-SOFTIRQ-W} state was registered at:
> [7.916061]   [] lock_acquire+0x78/0x98
> [7.920816]   [] tcp_snd_una_update+0x64/0xa8
> [7.926092]   [] tcp_ack+0x134/0x129c
> [7.930668]   [] tcp_rcv_state_process+0x814/0xfc8
> [7.936375]   [] tcp_v4_do_rcv+0x64/0x1c8
> [7.941305]   [] tcp_v4_rcv+0xf00/0xfbc
> [7.946057]   [] ip_local_deliver_finish+0xd4/0x550
> [7.951859]   [] ip_local_deliver+0xcc/0xdc
> [7.956957]   [] ip_rcv_finish+0xc4/0x744
> [7.961881]   [] ip_rcv+0x4c8/0x7a8
> [7.966284]   [] __netif_receive_skb_core+0x514/0x8ec
> [7.972251]   [] __netif_receive_skb+0x2c/0x8c
> [7.977614]   [] netif_receive_skb_internal+0x7c/0x1f0
> [7.983666]   [] napi_gro_receive+0x88/0xdc
> [7.988764]   [] fec_enet_rx_napi+0x390/0x9c8
> [7.994036]   [] net_rx_action+0x148/0x344
> [7.999046]   [] __do_softirq+0x130/0x2bc
[...]
> [8.165859] stack backtrace:
> [8.170247] CPU: 0 PID: 179 Comm: kworker/0:1H Not tainted
> 4.6.0-rc6-next-20160503-2-g51d9962 #351
> [8.179572] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [8.186137] Workqueue: rpciod rpc_async_schedule
> [8.190791] Backtrace:
> [8.193307] [] (dump_backtrace) from [] > 
> (show_stack+0x18/0x1c)
> [8.200894]  r6:6193 r5: r4: r3:eebdc800
> [8.206692] [] (show_stack) from [] 
> (dump_stack+0xb0/0xe8)
> [8.213961] [] (dump_stack) from [] 
> (print_usage_bug+0x268/0x2dc)
> [8.221809]  r8:0004 r7:eebdcd00 r6:eebdc800 r5:c0ae4bbc r4:c0ec6054 
> r3:eebdc800
> [8.229712] [] (print_usage_bug) from [] 
> (mark_lock+0x29c/0x6b0)
> [8.237472]  r10:c016a1c8 r8:0004 r7:eebdc800 r6:1054 r5:eebdcd00 
> r4:0006
> [8.245456] [] (mark_lock) from [] 
> (__lock_acquire+0x550/0x17c8)
> [8.253216]  r10:c0d21d9c r9:02be r8:c0e97784 r7:eebdc800 r6:c153a09c 
> r5:eebdcd00
> [8.261188]  r4:0003 r3:0001
> [8.264837] [] (__lock_acquire) from [] 
> (lock_acquire+0x78/0x98)
> [8.272598]  r10:0001 r9:c0752328 r8:2d738f6b r7:0001 r6:c0752328 
> r5:6113
> [8.280568]  r4:
> [8.283155] [] (lock_acquire) from [] 
> (tcp_snd_una_update+0x64/0xa8)
> [8.291261]  r7: r6:ee6b9500 r5:ee6b9500 r4:ee6b99cc
> [8.297050] [] (tcp_snd_una_update) from [] 
> (tcp_ack+0x134/0x129c)
> [8.304984]  r10:ee6b9570 r9:ee42f9c0 r8:2d738f6b r7:c0d02100 r6:0002 
> r5:ee6b9500
> [8.312956]  r4:0002
> [8.315542] [] (tcp_ack) from [] 
> (tcp_rcv_established+0x140/0x774)
> [8.323477]  r10:ee6b9570 r9:ee42f9c0 r8:c0d6bfb3 r7:c155a080 r6:ee6e9a62 
> r5:ee42f9c0
> [8.331448]  r4:ee6b9500
> [8.334039] [] (tcp_rcv_established) from [] 
> (tcp_v4_do_rcv+0x160/0x1c8)
> [8.342494]  r8:c0d6bfb3 r7:c155a080 r6:eea79600 r5:ee6b9500 r4:ee42f9c0
> [8.349348] [] (tcp_v4_do_rcv) from [] 
> (__release_sock+0x94/0x124)
> [8.357281]  r6: r5:ee6b9500 r4: r3:c075e79c
> [8.363065] [] (__release_sock) from [] 
> (release_sock+0x34/0xa4)

The latter now runs with bh enabled. Ask Eric for a proper
replacement of u64_stats_update_ in tcp_snd_una_update.

(it's 00:30 here)

-- 
Ueimor

[PATCH net-next #2 1/1] pch_gbe: replace private tx ring lock with common netif_tx_lock

2016-04-27 Thread Francois Romieu

pch_gbe_tx_ring.tx_lock is only used in the hard_xmit handler and
in the transmit completion reaper called from NAPI context.

Compile-tested only. Potential victims Cced.

Someone more knowledgeable may check if pch_gbe_tx_queue could
have some use for a mmiowb.

Signed-off-by: Francois Romieu <rom...@fr.zoreil.com>
Cc: Darren Hart <dvh...@infradead.org>
Cc: Andy Cress <andy.cr...@us.kontron.com>
Cc: br...@fossetcon.org

---
 Includes Nikolay's fix.

 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h  |  2 --
 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 10 ++
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
index 2a55d6d..8d710a3 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
@@ -481,7 +481,6 @@ struct pch_gbe_buffer {
 
 /**
  * struct pch_gbe_tx_ring - tx ring information
- * @tx_lock:   spinlock structs
  * @desc:  pointer to the descriptor ring memory
  * @dma:   physical address of the descriptor ring
  * @size:  length of descriptor ring in bytes
@@ -491,7 +490,6 @@ struct pch_gbe_buffer {
  * @buffer_info:   array of buffer information structs
  */
 struct pch_gbe_tx_ring {
-   spinlock_t tx_lock;
struct pch_gbe_tx_desc *desc;
dma_addr_t dma;
unsigned int size;
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index ca4add7..3cd87a4 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -1640,7 +1640,7 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter,
   cleaned_count);
if (cleaned_count > 0)  { /*skip this if nothing cleaned*/
/* Recover from running out of Tx resources in xmit_frame */
-   spin_lock(_ring->tx_lock);
+   netif_tx_lock(adapter->netdev);
if (unlikely(cleaned && (netif_queue_stopped(adapter->netdev
{
netif_wake_queue(adapter->netdev);
@@ -1652,7 +1652,7 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter,
 
netdev_dbg(adapter->netdev, "next_to_clean : %d\n",
   tx_ring->next_to_clean);
-   spin_unlock(_ring->tx_lock);
+   netif_tx_unlock(adapter->netdev);
}
return cleaned;
 }
@@ -1805,7 +1805,6 @@ int pch_gbe_setup_tx_resources(struct pch_gbe_adapter 
*adapter,
 
tx_ring->next_to_use = 0;
tx_ring->next_to_clean = 0;
-   spin_lock_init(_ring->tx_lock);
 
for (desNo = 0; desNo < tx_ring->count; desNo++) {
tx_desc = PCH_GBE_TX_DESC(*tx_ring, desNo);
@@ -2135,13 +2134,9 @@ static int pch_gbe_xmit_frame(struct sk_buff *skb, 
struct net_device *netdev)
 {
struct pch_gbe_adapter *adapter = netdev_priv(netdev);
struct pch_gbe_tx_ring *tx_ring = adapter->tx_ring;
-   unsigned long flags;
-
-   spin_lock_irqsave(_ring->tx_lock, flags);
 
if (unlikely(!PCH_GBE_DESC_UNUSED(tx_ring))) {
netif_stop_queue(netdev);
-   spin_unlock_irqrestore(_ring->tx_lock, flags);
netdev_dbg(netdev,
   "Return : BUSY  next_to use : 0x%08x  next_to clean 
: 0x%08x\n",
   tx_ring->next_to_use, tx_ring->next_to_clean);
@@ -2150,7 +2145,6 @@ static int pch_gbe_xmit_frame(struct sk_buff *skb, struct 
net_device *netdev)
 
/* CRC,ITAG no support */
pch_gbe_tx_queue(adapter, tx_ring, skb);
-   spin_unlock_irqrestore(_ring->tx_lock, flags);
return NETDEV_TX_OK;
 }
 
-- 
2.5.5

Re: [PATCH net-next 1/1] pch_gbe: replace private tx ring lock with common netif_tx_lock

2016-04-27 Thread Francois Romieu

Nikolay Aleksandrov <niko...@cumulusnetworks.com> :
> On 04/27/2016 12:49 AM, Francois Romieu wrote:
[...]
> > @@ -1652,7 +1652,7 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter,
> >  
> > netdev_dbg(adapter->netdev, "next_to_clean : %d\n",
> >tx_ring->next_to_clean);
> > -   spin_unlock(_ring->tx_lock);
> > +   netif_tx_lock(adapter->netdev);
> 
> Shouldn't this be netif_tx_unlock ?

It should. Thanks for reviewing.

-- 
Ueimor

[PATCH net-next 1/1] pch_gbe: replace private tx ring lock with common netif_tx_lock

2016-04-26 Thread Francois Romieu

pch_gbe_tx_ring.tx_lock is only used in the hard_xmit handler and
in the transmit completion reaper called from NAPI context.

Signed-off-by: Francois Romieu <rom...@fr.zoreil.com>
---
 CONFIG_COMPILE_TESTed

 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h  |  2 --
 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 10 ++
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
index 2a55d6d..8d710a3 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
@@ -481,7 +481,6 @@ struct pch_gbe_buffer {
 
 /**
  * struct pch_gbe_tx_ring - tx ring information
- * @tx_lock:   spinlock structs
  * @desc:  pointer to the descriptor ring memory
  * @dma:   physical address of the descriptor ring
  * @size:  length of descriptor ring in bytes
@@ -491,7 +490,6 @@ struct pch_gbe_buffer {
  * @buffer_info:   array of buffer information structs
  */
 struct pch_gbe_tx_ring {
-   spinlock_t tx_lock;
struct pch_gbe_tx_desc *desc;
dma_addr_t dma;
unsigned int size;
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index ca4add7..5c8e2f1 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -1640,7 +1640,7 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter,
   cleaned_count);
if (cleaned_count > 0)  { /*skip this if nothing cleaned*/
/* Recover from running out of Tx resources in xmit_frame */
-   spin_lock(_ring->tx_lock);
+   netif_tx_lock(adapter->netdev);
if (unlikely(cleaned && (netif_queue_stopped(adapter->netdev
{
netif_wake_queue(adapter->netdev);
@@ -1652,7 +1652,7 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter,
 
netdev_dbg(adapter->netdev, "next_to_clean : %d\n",
   tx_ring->next_to_clean);
-   spin_unlock(_ring->tx_lock);
+   netif_tx_lock(adapter->netdev);
}
return cleaned;
 }
@@ -1805,7 +1805,6 @@ int pch_gbe_setup_tx_resources(struct pch_gbe_adapter 
*adapter,
 
tx_ring->next_to_use = 0;
tx_ring->next_to_clean = 0;
-   spin_lock_init(_ring->tx_lock);
 
for (desNo = 0; desNo < tx_ring->count; desNo++) {
tx_desc = PCH_GBE_TX_DESC(*tx_ring, desNo);
@@ -2135,13 +2134,9 @@ static int pch_gbe_xmit_frame(struct sk_buff *skb, 
struct net_device *netdev)
 {
struct pch_gbe_adapter *adapter = netdev_priv(netdev);
struct pch_gbe_tx_ring *tx_ring = adapter->tx_ring;
-   unsigned long flags;
-
-   spin_lock_irqsave(_ring->tx_lock, flags);
 
if (unlikely(!PCH_GBE_DESC_UNUSED(tx_ring))) {
netif_stop_queue(netdev);
-   spin_unlock_irqrestore(_ring->tx_lock, flags);
netdev_dbg(netdev,
   "Return : BUSY  next_to use : 0x%08x  next_to clean 
: 0x%08x\n",
   tx_ring->next_to_use, tx_ring->next_to_clean);
@@ -2150,7 +2145,6 @@ static int pch_gbe_xmit_frame(struct sk_buff *skb, struct 
net_device *netdev)
 
/* CRC,ITAG no support */
pch_gbe_tx_queue(adapter, tx_ring, skb);
-   spin_unlock_irqrestore(_ring->tx_lock, flags);
return NETDEV_TX_OK;
 }
 
-- 
2.5.5

Re: [PATCH net-next 2/6] atl1c: remove private tx lock

2016-04-25 Thread Francois Romieu

Florian Westphal <f...@strlen.de> :
> Francois Romieu <rom...@fr.zoreil.com> wrote:
[...]
> > Play it safe and keep the implicit local_irq_{save / restore} call ?
> > 
> > It may not be needed but it will help avoiding any unexpected regression
> > report pointing at the NETDEV_TX_LOCKED removal change.
> 
> I thought about that but it doesn't prevent the irq handler from
> running on another CPU, so leaving it around seemed like cargo culting
> to me...

I don't mind removing it in a different patch at all. I'd rather see
the commit history underline that it's unrelated to whatever
NETDEV_TX_LOCKED / LLTX change.

> I don't have an atl1c, but the atl1e in my laptop seems to work fine
> with the (similar) change.
> 
> If you disagree I can respin with local_irq_save of course, but, if
> 'playing it safe' is main goal then its simpler to convert
> spin_trylock_irqsave to spin_lock_irqsave.

Your call, really.

-- 
Ueimor

Re: [PATCH net-next 2/6] atl1c: remove private tx lock

2016-04-25 Thread Francois Romieu

Florian Westphal  :
[...]
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c 
> b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index d0084d4..a3200ea 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
[...]
> @@ -2217,16 +2215,10 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff 
> *skb,
>   }
>  
>   tpd_req = atl1c_cal_tpd_req(skb);
> - if (!spin_trylock_irqsave(>tx_lock, flags)) {
> - if (netif_msg_pktdata(adapter))
> - dev_info(>pdev->dev, "tx locked\n");
> - return NETDEV_TX_LOCKED;
> - }
>  
>   if (atl1c_tpd_avail(adapter, type) < tpd_req) {
>   /* no enough descriptor, just stop queue */
>   netif_stop_queue(netdev);
> - spin_unlock_irqrestore(>tx_lock, flags);
>   return NETDEV_TX_BUSY;
>   }
>

Play it safe and keep the implicit local_irq_{save / restore} call ?

It may not be needed but it will help avoiding any unexpected regression
report pointing at the NETDEV_TX_LOCKED removal change.

-- 
Ueimor

Re: [PATCH net-next 2/6] atl1c: remove private tx lock

2016-04-25 Thread Francois Romieu

Florian Westphal  :
[...]
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c 
> b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index d0084d4..a3200ea 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
[...]
> @@ -2217,16 +2215,10 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff 
> *skb,
>   }
>  
>   tpd_req = atl1c_cal_tpd_req(skb);
> - if (!spin_trylock_irqsave(>tx_lock, flags)) {
> - if (netif_msg_pktdata(adapter))
> - dev_info(>pdev->dev, "tx locked\n");
> - return NETDEV_TX_LOCKED;
> - }
>  
>   if (atl1c_tpd_avail(adapter, type) < tpd_req) {
>   /* no enough descriptor, just stop queue */
>   netif_stop_queue(netdev);
> - spin_unlock_irqrestore(>tx_lock, flags);
>   return NETDEV_TX_BUSY;
>   }
>

Play it safe and keep the implicit local_irq_{save / restore} call ?

It may not be needed but it will help avoiding any unexpected regression
report pointing at the NETDEV_TX_LOCKED removal change.

-- 
Ueimor

Re: [PATCH] net: sxgbe: fix error paths in sxgbe_platform_probe()

2016-03-27 Thread Francois Romieu

Rasmus Villemoes <li...@rasmusvillemoes.dk> :
> We need to use post-decrement to ensure that irq_dispose_mapping is
> also called on priv->rxq[0]->irq_no; moreover, if one of the above for
> loops failed already at i==0 (so we reach one of these labels with
> that value of i), we'll enter an essentially infinite loop of
> out-of-bounds accesses.
> 
> Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>

(ok, i is signed)

Reviewed-by: Francois Romieu <rom...@fr.zoreil.com>

Someone messed with my review on 2014/03/25 and got it wrong. :o/

Two years after the initial submission, there is zero change regarding use
of sxgbe_core_ops for extension or manageability. The extra indirection is
ripe for removal during next net-next.

-- 
Ueimor

Re: [PATCH net-next v3.16]r8169: Correct value from speed 10 on MII_BMCR

2016-03-25 Thread Francois Romieu

Phil Sutter  :
[...]
> Your patch submissions are getting better, also good to see you're
> finally using git-send-email. A few things need to be corrected though:
> 

#define BMCR_RESV   0x003f  /* Unused...   */
#define BMCR_SPEED1000  0x0040  /* MSB of Speed (1000) */
#define BMCR_CTST   0x0080  /* Collision test  */
#define BMCR_FULLDPLX   0x0100  /* Full duplex */
#define BMCR_ANRESTART  0x0200  /* Auto negotiation restart*/
#define BMCR_ISOLATE0x0400  /* Isolate data paths from MII */
#define BMCR_PDOWN  0x0800  /* Enable low power state  */
#define BMCR_ANENABLE   0x1000  /* Enable auto negotiation */
#define BMCR_SPEED100   0x2000  /* Select 100Mbps  */
#define BMCR_LOOPBACK   0x4000  /* TXD loopback bits   */

BMCR_SPEED100 apart, *all* these bits are now set.

It does not make much sense.

> Also detailed instructions on how to trigger the problem you are fixing
> for would be good. In detail: Which specific hardware was used, in which
> situation did the problem occur, how did it behave in that situation and
> what was the expected behaviour?

Been there. Such requests are usually left unanswered. :o(

Btw, this stuff targets 3.16 (...) and net-next is still closed.

-- 
Ueimor

Re: SYN flooding on port 80 + DMAR:[DMA Write] faults

2016-03-12 Thread Francois Romieu

Toralf Förster  :
> Today my server (64 bit hardened Gentoo kernel) was faced a SYN-flood attack.
> I do wonder if the DMAR events points to an issue in the kernel ?

Please send a compressed log including all 'fault addr' lines as well
as the (module probe time) XID line from the r8169 driver.

-- 
Ueimor

1 2 3 4 5 6 7 8 9 >

1 - 100 of 800 matches

Mail list logo