date:20170830

net/ipv4: divide error in __tcp_select_window

2017-08-30 Thread idaifish

Hi:
   This bug still seems to be triggerable with the attached PoC on the latest
Ubuntu 16.04 (4.4.0-94-generic).


divide error:  [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 14933 Comm: syz-executor0 Not tainted 4.9.45 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: 880076ab9900 task.stack: 880062ae8000
RIP: 0010:[]  []
__tcp_select_window+0x2f3/0x6b0 net/ipv4/tcp_output.c:2499
RSP: 0018:880062aef6e8  EFLAGS: 00010283
RAX: 00ac RBX:  RCX: c9000195b000
RDX:  RSI: 0436 RDI: 880079add085
RBP: 880062aef728 R08: 1800 R09: 0002
R10:  R11:  R12: 00ac
R13:  R14:  R15: 0001
FS:  7f15c239a700() GS:88007fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2001c000 CR3: 79628000 CR4: 06f0
DR0: 8000 DR1: 8000 DR2: 
DR3:  DR6: 0ff0 DR7: 0600
Stack:
 880062aef6e8 880062aef6e8  880079adca40
  0436 880079adcae8 0436
 880062aef758 8297c36e 0068 
Call Trace:
 [] tcp_cleanup_rbuf+0x43e/0x4f0 net/ipv4/tcp.c:1468
 [] tcp_recvmsg+0xc2f/0x25d0 net/ipv4/tcp.c:1937
 [] inet_recvmsg+0x26e/0x3b0 net/ipv4/af_inet.c:765
 [] sock_recvmsg_nosec+0x8a/0xb0 net/socket.c:723
 [] ___sys_recvmsg+0x229/0x510 net/socket.c:2113
 [] __sys_recvmmsg+0x23e/0x660 net/socket.c:2221
 [] SYSC_recvmmsg net/socket.c:2302 [inline]
 [] SyS_recvmmsg+0xdf/0x180 net/socket.c:2286
 [] entry_SYSCALL_64_fastpath+0x1a/0xa9
Code: ec 7c 1f e8 b0 44 9d fe 44 3b 75 d4 75 c2 e8 a5 44 9d fe 8b 45
d0 44 01 e8 41 39 c4 41 0f 4f dc eb ae e8 91 44 9d fe 44 89 e0 99 <41>
f7 fe 41 0f af c6 89 c3 eb 9a e8 7d 44 9d fe 48 8d bb 91 04
RIP  [] __tcp_select_window+0x2f3/0x6b0
net/ipv4/tcp_output.c:2499
 RSP 
---[ end trace 771dfab907a5c7aa ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
Rebooting in 86400 seconds..
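
A hedged reading of the dump above: the bytes around the marked <41> f7 fe in
the Code: line decode to mov %r12d,%eax; cltd; idiv %r14d; imul %r14d,%eax,
which is the shape of rounding a value down to a multiple of a divisor. R14 is
printed empty in this archived copy, which would be consistent with a zero
divisor if the archive dropped all-zero values; that part is an inference, not
something the report states. The sketch below only illustrates the arithmetic
and a guarded variant. demo_round_window is a hypothetical name, and returning
0 for a non-positive divisor is an assumption of the sketch, not the kernel's
behaviour or its fix.

```c
/* Hypothetical helper, not kernel code: round free_space down to a
 * multiple of mss, refusing a non-positive mss instead of dividing by it. */
static int demo_round_window(int free_space, int mss)
{
	if (mss <= 0)
		return 0;	/* assumed fallback, chosen for the sketch */
	return (free_space / mss) * mss;
}
```

Without the guard, the division in the return statement is the operation that
would trap on a zero mss.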



-- 
Regards,
idaifish
// autogenerated by syzkaller (http://github.com/google/syzkaller)

#define _GNU_SOURCE

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

const int kFailStatus = 67;
const int kRetryStatus = 69;

__attribute__((noreturn)) static void doexit(int status)
{
  volatile unsigned i;
  syscall(__NR_exit_group, status);
  for (i = 0;; i++) {
  }
}

__attribute__((noreturn))

[PATCH] bnx2x: drop packets where gso_size is too big for hardware

2017-08-30 Thread Daniel Axtens

If a bnx2x card is passed a GSO packet with a gso_size larger than
~9700 bytes, it will cause a firmware error that will bring the card
down:

bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = 0x 
0x25e43e47 0x00463e01 0x00010052
bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW Version: 
7_13_1
... (dump of values continues) ...

Detect when gso_size + header length is greater than the maximum
packet size (9700 bytes) and drop the packet.
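
The check described above can be sketched in isolation as follows. This is a
standalone illustration, not the driver code: demo_gso_too_big and
DEMO_MAX_PACKET_SIZE are hypothetical names, and 9700 is simply the figure
quoted in this message.

```c
#include <stdbool.h>

/* Assumed limit, taken from the 9700-byte figure quoted in the message. */
#define DEMO_MAX_PACKET_SIZE 9700u

/* Hypothetical helper: true when one GSO segment plus its headers would
 * exceed the limit, i.e. when the packet should be dropped. */
static bool demo_gso_too_big(unsigned int gso_size, unsigned int hlen)
{
	return gso_size + hlen > DEMO_MAX_PACKET_SIZE;
}
```

A segment that lands exactly on the limit is accepted; only sizes strictly
above it are rejected, matching the "greater than" wording.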

This raises the obvious question - how do we end up with a packet with
a gso_size that's greater than 9700? This has been observed on a
powerpc system when Open vSwitch is forwarding a packet from an
ibmveth device.

ibmveth is a bit special. It's the driver for communication between
virtual machines (aka 'partitions'/LPARs) running under IBM's
proprietary hypervisor on ppc machines. It allows sending very large
packets (up to 64kB) between LPARs. This involves some quite
'interesting' things: for example, when talking TCP, the MSS is stored
in the checksum field (see ibmveth_rx_mss_helper() in ibmveth.c).

Normally on a box like this, there would be a Virtual I/O Server
(VIOS) partition that owns the physical network card. VIOS lets the
AIX partitions know when they're talking to a real network and that
they should drop their MSS. This works fine if VIOS owns the physical
network card.

However, in this case, a Linux partition owns the card (this is known
as a NovaLink setup). The negotiation between VIOS and AIX uses a
non-standard TCP option, so Linux has never supported that.  Instead,
Linux just supports receiving large packets. It doesn't support any
form of messaging/MSS negotiation back to other LPARs.

To get some clarity about where the large MSS was coming from, I asked
Thomas Falcon, the maintainer of ibmveth, for some background:

"In most cases, large segments are an aggregation of smaller packets
by the Virtual I/O Server (VIOS) partition and then are forwarded to
the Linux LPAR / ibmveth driver. These segments can be as large as
64KB. In this case, since the customer is using Novalink, I believe
what is happening is pretty straightforward: the large segments are
created by the AIX partition and then forwarded to the Linux
partition, ... The ibmveth driver doesn't do any aggregation itself
but just ensures the proper bits are set before sending the frame up
to avoid giving the upper layers indigestion."

It is possible to stop AIX from sending these large segments, but it
requires configuration on each LPAR. While ibmveth's behaviour is
admittedly weird, we should fix this here: it shouldn't be possible
for it to cause a firmware panic on another card.

Cc: Thomas Falcon  # ibmveth
Cc: Yuval Mintz  # bnx2x
Thanks-to: Jay Vosburgh  # veth info
Signed-off-by: Daniel Axtens 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h  |  2 ++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  | 33 +++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c |  1 -
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 352beff796ae..b36d54737d70 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -2517,4 +2517,6 @@ void bnx2x_set_rx_ts(struct bnx2x *bp, struct sk_buff 
*skb);
  */
 int bnx2x_vlan_reconfigure_vid(struct bnx2x *bp);
 
+#define MAX_PACKET_SIZE	(9700)
+
 #endif /* bnx2x.h */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 1216c1f1e052..1c5517a9348c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -3742,6 +3742,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
__le16 pkt_size = 0;
struct ethhdr *eth;
u8 mac_type = UNICAST_ADDRESS;
+   unsigned int pkts_compl = 0, bytes_compl = 0;
 
 #ifdef BNX2X_STOP_ON_ERROR
if (unlikely(bp->panic))
@@ -4029,6 +4030,14 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
   skb->len, hlen, skb_headlen(skb),
   skb_shinfo(skb)->gso_size);
 
+   if (unlikely(skb_shinfo(skb)->gso_size + hlen > 
MAX_PACKET_SIZE)) {
+   BNX2X_ERR("reported gso segment size plus headers "
+ "(%d + %d) > MAX_PACKET_SIZE; dropping pkt!",
+ skb_shinfo(skb)->gso_size, hlen);
+
+   goto free_and_drop;
+   }
+
tx_start_bd->bd_flags.as_bitfield |= ETH_TX_BD_FLAGS_SW_LSO;
 
if

[PATCH] net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()

2017-08-30 Thread Andrii

Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv() in net/dccp/ipv6.c,
similar to the handling in net/ipv6/tcp_ipv6.c.

Signed-off-by: Andrii Vladyka 
---

diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 1b58eac..35c2edb 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -30,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "dccp.h"
 #include "ipv6.h"
@@ -597,19 +599,13 @@ static int dccp_v6_do_rcv(struct sock *sk, struct sk_buff 
*skb)
   --ANK (980728)
 */
if (np->rxopt.all)
-   /*
-* FIXME: Add handling of IPV6_PKTOPTIONS skb. See the comments below
-*(wrt ipv6_pktopions) and net/ipv6/tcp_ipv6.c for an example.
-*/
opt_skb = skb_clone(skb, GFP_ATOMIC);
 
if (sk->sk_state == DCCP_OPEN) { /* Fast path */
if (dccp_rcv_established(sk, skb, dccp_hdr(skb), skb->len))
goto reset;
-   if (opt_skb) {
-   /* XXX This is where we would goto ipv6_pktoptions. */
-   __kfree_skb(opt_skb);
-   }
+   if (opt_skb)
+   goto ipv6_pktoptions;
return 0;
}
 
@@ -640,10 +636,8 @@ static int dccp_v6_do_rcv(struct sock *sk, struct sk_buff 
*skb)
 
if (dccp_rcv_state_process(sk, skb, dccp_hdr(skb), skb->len))
goto reset;
-   if (opt_skb) {
-   /* XXX This is where we would goto ipv6_pktoptions. */
-   __kfree_skb(opt_skb);
-   }
+   if (opt_skb)
+   goto ipv6_pktoptions;
return 0;
 
 reset:
@@ -653,6 +647,35 @@ static int dccp_v6_do_rcv(struct sock *sk, struct sk_buff 
*skb)
__kfree_skb(opt_skb);
kfree_skb(skb);
return 0;
+
+/* Handling IPV6_PKTOPTIONS skb the similar
+ * way it's done for net/ipv6/tcp_ipv6.c
+ */
+ipv6_pktoptions:
+   if (!((1 << sk->sk_state) & (DCCPF_CLOSED | DCCPF_LISTEN))) {
+   if (np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo)
+   np->mcast_oif = inet6_iif(opt_skb);
+   if (np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim)
+   np->mcast_hops = ipv6_hdr(opt_skb)->hop_limit;
+   if (np->rxopt.bits.rxflow || np->rxopt.bits.rxtclass)
+   np->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(opt_skb));
+   if (np->repflow)
+   np->flow_label = ip6_flowlabel(ipv6_hdr(opt_skb));
+   if (ipv6_opt_accepted(sk, opt_skb,
+ &DCCP_SKB_CB(opt_skb)->header.h6)) {
+   skb_set_owner_r(opt_skb, sk);
+   memmove(IP6CB(opt_skb),
+   &DCCP_SKB_CB(opt_skb)->header.h6,
+   sizeof(struct inet6_skb_parm));
+   opt_skb = xchg(&np->pktoptions, opt_skb);
+   } else {
+   __kfree_skb(opt_skb);
+   opt_skb = xchg(&np->pktoptions, NULL);
+   }
+   }
+
+   kfree_skb(opt_skb);
+   return 0;
 }
 
 static int dccp_v6_rcv(struct sk_buff *skb)
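
The tail of the ipv6_pktoptions block in the patch above stores the new
options buffer with an exchange and then frees whatever was stored before. A
minimal userspace sketch of that exchange-then-free pattern, using C11 atomics
in place of the kernel's xchg(), might look like this. demo_buf, demo_slot and
demo_publish are hypothetical names, and free() stands in for kfree_skb().

```c
#include <stdatomic.h>
#include <stdlib.h>

struct demo_buf {
	int id;
};

/* Slot holding the most recently accepted buffer (stand-in for the
 * socket's pktoptions pointer). Static storage starts out as NULL. */
static _Atomic(struct demo_buf *) demo_slot;

/* Publish newbuf (which may be NULL) and release the previous occupant. */
static void demo_publish(struct demo_buf *newbuf)
{
	struct demo_buf *old = atomic_exchange(&demo_slot, newbuf);

	free(old);	/* free(NULL) is a no-op, like kfree_skb(NULL) */
}
```

Passing NULL clears the slot and still frees the old buffer, which mirrors the
else branch of the patch.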

Re: [PATCH] net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()

2017-08-30 Thread Andrii


I'll fix and re-send. Thanks.

On 8/31/2017 8:16 AM, David Miller wrote:
> From: Andrii Vladyka
> Date: Wed, 30 Aug 2017 09:04:35 +0300
>
>> +   if (opt_skb)
>
> Trailing whitespace.
>
>> @@ -653,6 +647,36 @@ static int dccp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
>>     __kfree_skb(opt_skb);
>>     kfree_skb(skb);
>>     return 0;
>> +
>
> Likewise.

[PATCH net-next] bridge: add tracepoint in br_fdb_update

2017-08-30 Thread Roopa Prabhu

From: Roopa Prabhu 

This extends the bridge fdb table tracepoints to also cover
learned fdb entries in the br_fdb_update path. Note that,
unlike the other tracepoints, I have placed this one where the fdb
is modified, because this is in the datapath and can generate
a lot of noise in the trace output. br_fdb_update is also called
from added_by_user context in the NTF_USE case, which is already
traced, hence the !added_by_user check.

Signed-off-by: Roopa Prabhu 
---
 include/trace/events/bridge.h | 31 +++
 net/bridge/br_fdb.c   |  5 -
 net/core/net-traces.c |  1 +
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
index 0f1cde0..1bee3e7 100644
--- a/include/trace/events/bridge.h
+++ b/include/trace/events/bridge.h
@@ -92,6 +92,37 @@ TRACE_EVENT(fdb_delete,
  __entry->addr[4], __entry->addr[5], __entry->vid)
 );
 
+TRACE_EVENT(br_fdb_update,
+
+   TP_PROTO(struct net_bridge *br, struct net_bridge_port *source,
+const unsigned char *addr, u16 vid, bool added_by_user),
+
+   TP_ARGS(br, source, addr, vid, added_by_user),
+
+   TP_STRUCT__entry(
+   __string(br_dev, br->dev->name)
+   __string(dev, source->dev->name)
+   __array(unsigned char, addr, ETH_ALEN)
+   __field(u16, vid)
+   __field(bool, added_by_user)
+   ),
+
+   TP_fast_assign(
+   __assign_str(br_dev, br->dev->name);
+   __assign_str(dev, source->dev->name);
+   memcpy(__entry->addr, addr, ETH_ALEN);
+   __entry->vid = vid;
+   __entry->added_by_user = added_by_user;
+   ),
+
+   TP_printk("br_dev %s source %s addr %02x:%02x:%02x:%02x:%02x:%02x vid 
%u added_by_user %d",
+ __get_str(br_dev), __get_str(dev), __entry->addr[0],
+ __entry->addr[1], __entry->addr[2], __entry->addr[3],
+ __entry->addr[4], __entry->addr[5], __entry->vid,
+ __entry->added_by_user)
+);
+
+
 #endif /* _TRACE_BRIDGE_H */
 
 /* This part must be outside protection */
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index be5e1da..4ea5c8b 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -583,8 +583,10 @@ void br_fdb_update(struct net_bridge *br, struct 
net_bridge_port *source,
fdb->updated = now;
if (unlikely(added_by_user))
fdb->added_by_user = 1;
-   if (unlikely(fdb_modified))
+   if (unlikely(fdb_modified)) {
+   trace_br_fdb_update(br, source, addr, vid, 
added_by_user);
fdb_notify(br, fdb, RTM_NEWNEIGH);
+   }
}
} else {
spin_lock(&br->hash_lock);
@@ -593,6 +595,7 @@ void br_fdb_update(struct net_bridge *br, struct 
net_bridge_port *source,
if (fdb) {
if (unlikely(added_by_user))
fdb->added_by_user = 1;
+   trace_br_fdb_update(br, source, addr, vid, 
added_by_user);
fdb_notify(br, fdb, RTM_NEWNEIGH);
}
}
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index 4a0292c..1132820 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -42,6 +42,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(fib6_table_lookup);
 EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_add);
 EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_external_learn_add);
 EXPORT_TRACEPOINT_SYMBOL_GPL(fdb_delete);
+EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_update);
 #endif
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);
-- 
2.1.4

Re: [PATCH] net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()

2017-08-30 Thread David Miller

From: Andrii Vladyka 
Date: Wed, 30 Aug 2017 09:04:35 +0300

> + if (opt_skb) 


Trailing whitespace.

> @@ -653,6 +647,36 @@ static int dccp_v6_do_rcv(struct sock *sk, struct 
> sk_buff *skb)
>   __kfree_skb(opt_skb);
>   kfree_skb(skb);
>   return 0;
> + 
   ^^^

Likewise.

Re: [pull request][net-next 0/3] Mellanox, mlx5 GRE tunnel offloads

2017-08-30 Thread David Miller

From: Saeed Mahameed 
Date: Thu, 31 Aug 2017 02:04:06 +0300

> The following changes provide GRE tunnel offloads for mlx5 ethernet netdevice 
> driver.
> 
> For more details please see tag log message below.
> Please pull and let me know if there's any problem.
> 
> Note: this series doesn't conflict with the ongoing net mlx5 submission.

Looks good, pulled.

Re: [PATCH net-next] liquidio: fix crash in presence of zeroed-out base address regs

2017-08-30 Thread David Miller

From: Felix Manlunas 
Date: Wed, 30 Aug 2017 16:19:53 -0700

> From: Rick Farrington 
> 
> Fix crash in linux PF driver when BARs have been cleared/de-programmed;
> fail early init (prior to mapping BARs) if the BAR0 or
> BAR1 registers are zero.
> 
> This situation can arise when the PF is added to a VM (PCI pass-through),
> then a PF FLR is issued (in the VM).  After this occurs, the BAR registers
> will be zero. If we attempt to load the PF driver in the host
> (after VM has been shutdown), the host can reset.
> 
> Signed-off-by: Rick Farrington 
> Signed-off-by: Raghu Vatsavayi 
> Signed-off-by: Felix Manlunas 

Applied, thanks.

Re: [PATCH net-next] devlink: Maintain consistency in mac field name

2017-08-30 Thread David Miller

From: David Ahern 
Date: Wed, 30 Aug 2017 17:07:30 -0700

> IPv4 name uses "destination ip" as does the IPv6 patch set.
> Make the mac field consistent.
> 
> Signed-off-by: David Ahern 

Applied, thanks.

Re: [PATCH] rtlwifi: btcoex: 23b 1ant: fix duplicated code for different branches

2017-08-30 Thread Larry Finger


On 08/30/2017 08:42 AM, Gustavo A. R. Silva wrote:

Refactor code in order to avoid identical code for different branches.

This issue was detected with the help of Coccinelle.

Addresses-Coverity-ID: 1226788
Signed-off-by: Gustavo A. R. Silva 
---
This issue was reported by Coverity and it was tested by compilation only.
I'm suspicious this may be a copy/paste error. Please, verify.

  .../net/wireless/realtek/rtlwifi/btcoexist/halbtc8723b1ant.c   | 10 ++
  1 file changed, 2 insertions(+), 8 deletions(-)


This change is not correct. When bt_link_info->sco_exist is true, the call 
should be

halbtc8723b1ant_limited_rx(btcoexist,
   NORMAL_EXEC, true,
   false, 0x5);

NACK

I will push the correct patch.

Larry

Re: [GIT] Networking

2017-08-30 Thread Kalle Valo

David Miller  writes:

> From: Kalle Valo 
> Date: Wed, 30 Aug 2017 20:31:31 +0300
>
>> AFAICS the bug was introduced by 9df86e2e702c6 back in 2010. If the bug
>> has been there for 7 years so waiting for a few more weeks should not
>> hurt.
>
> As a maintainer you have a right to handle bug fixing in that way, but
> certainly that is not how I would handle this.
>
> It's easy to validate this fix, it's extremely unlikely to cause
> a regression, and fixes a problem someone actually was able to
> trigger.
>
> Deferring to -next only has the side effect of making people wait
> longer for the fix.

Yeah, you are right there. I did actually ponder which tree I should
commit it to back in July but, for various reasons, decided differently.

-- 
Kalle Valo

Re: [PATCH] rtlwifi: rtl8723be: fix duplicated code for different branches

2017-08-30 Thread Larry Finger


On 08/30/2017 12:04 PM, Gustavo A. R. Silva wrote:

Refactor code in order to avoid identical code for different branches.

Addresses-Coverity-ID: 1248728
Signed-off-by: Gustavo A. R. Silva 


According to Realtek, this change is OK.

Acked-by: Larry Finger 

Thanks,

Larry


---
This issue was reported by Coverity and it was tested by compilation only.
Please, verify if this is not a copy/paste error.
Also, notice this code has been there since 2014.

  drivers/net/wireless/realtek/rtlwifi/rtl8723be/dm.c | 8 ++--
  1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723be/dm.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723be/dm.c
index 131c0d1..15c117e 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723be/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723be/dm.c
@@ -883,12 +883,8 @@ static void 
rtl8723be_dm_txpower_tracking_callback_thermalmeter(
if ((rtldm->power_index_offset[RF90_PATH_A] != 0) &&
(rtldm->txpower_track_control)) {
rtldm->done_txpower = true;
-   if (thermalvalue > rtlefuse->eeprom_thermalmeter)
-   rtl8723be_dm_tx_power_track_set_power(hw, BBSWING, 0,
-index_for_channel);
-   else
-   rtl8723be_dm_tx_power_track_set_power(hw, BBSWING, 0,
-index_for_channel);
+   rtl8723be_dm_tx_power_track_set_power(hw, BBSWING, 0,
+ index_for_channel);
  
  		rtldm->swing_idx_cck_base = rtldm->swing_idx_cck;

rtldm->swing_idx_ofdm_base[RF90_PATH_A] =

Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

2017-08-30 Thread Mike Galbraith

On Wed, 2017-08-30 at 21:10 -0700, Kees Cook wrote:
> On Wed, Aug 30, 2017 at 9:01 PM, Kees Cook  wrote:
> > On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith  wrote:
> >> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
> >>
> >>> Interesting! Can you try with 633547973ffc3 ("net: convert
> >>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
> >>> running haveged will help me trigger this on my system...
> >>
> >> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.
> >
> > Wonderful! Thank you so much for helping track this down.
> >
> > So, it seems that sk_buff.users will need some more special attention
> > before we can convert it to refcount.
> >
> > x86-refcount will saturate with refcount_dec_and_test() if the result
> > is negative. But that would mean at least starting at 0. FULL should
> > have WARNed in this case, so I remain slightly confused why it was
> > missed by FULL.
> 
> Actually, if this is a race condition it's possible that FULL is slow
> enough to miss it...
> 
> I bet something briefly takes the refcount negative, and with
> unchecked atomics, it comes back up positive again during the race.
> FULL may miss the race, and x86-refcount will catch it and saturate...

Hm, I'll go have a stare.. not that that's likely to turn anything up,
memory ordering stares usually inducing a zombie like state.

-Mike

Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it

2017-08-30 Thread Greg KH

On Tue, Aug 29, 2017 at 11:32:58PM +0200, Ondrej Zary wrote:
> On Tuesday 29 August 2017 01:42:08 David Miller wrote:
> > From: Greg Kroah-Hartman 
> > Date: Sun, 27 Aug 2017 17:03:30 +0200
> >
> > > The IRDA code has long been obsolete and broken.  So, to keep people
> > > from trying to use it, and to prevent people from having to maintain it,
> > > let's move it to drivers/staging/ so that we can delete it entirely from
> > > the kernel in a few releases.
> >
> > No objection, I'll apply this to net-next, thanks Greg.
> 
> IRDA works fine in Debian 9 (kernel 4.9) and I use it for simple file 
> transfer. Hope I'm not the only one...
> 
> # irattach /dev/ttyS0 -d tekram -s
> # irdadump
> 21:28:52.830350 xid:cmd aed8eb79 >  S=6 s=0 (14)
> 21:28:52.922368 xid:cmd aed8eb79 >  S=6 s=1 (14)
> 21:28:53.014350 xid:cmd aed8eb79 >  S=6 s=2 (14)
> 21:28:53.106338 xid:cmd aed8eb79 >  S=6 s=3 (14)
> 21:28:53.190276 xid:rsp aed8eb79 < 35d1 S=6 s=3 Nokia 6230i hint=b125 [ 
> PnP Modem Fax Telephony IrCOMM IrOBEX ] (28)
> 21:28:53.198384 xid:cmd aed8eb79 >  S=6 s=4 (14)
> 21:28:53.290382 xid:cmd aed8eb79 >  S=6 s=5 (14)
> 21:28:53.382341 xid:cmd aed8eb79 >  S=6 s=* pentium hint=0400 [ 
> Computer ] (23)
> ^C
> 8 packets received by filter
> 
> $ obexftp -i -l MMC
> Connecting..\done
> Receiving "MMC".../
>   [  ]>
> 
> 
>  user-perm="RWD"/>
>  user-perm="RWD"/>
>  user-perm="RWD"/>
> 
> $ obexftp -i -c MMC -g Image004.jpg
> Connecting..\done
> Sending "MMC"...|done
> Receiving "Image004.jpg"...-done
> Disconnecting..\done

Odd, and is this just an IR device connected to a "real" serial port, or
a specific IRDA device?

thanks,

greg k-h

Re: [PATCH net-next v4 2/2] tcp_diag: report TCP MD5 signing keys and addresses

2017-08-30 Thread Eric Dumazet

On Wed, 2017-08-30 at 18:33 -0700, Ivan Delalande wrote:
> Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to
> processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is
> not possible to retrieve these from the kernel once they have been
> configured on sockets.
> 
> Signed-off-by: Ivan Delalande 
> ---
>  include/uapi/linux/inet_diag.h |   1 +
>  include/uapi/linux/tcp.h   |   9 
>  net/ipv4/tcp_diag.c| 110 
> ++---
>  3 files changed, 114 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
> index 678496897a68..f52ff62bfabe 100644
> --- a/include/uapi/linux/inet_diag.h
> +++ b/include/uapi/linux/inet_diag.h
> @@ -143,6 +143,7 @@ enum {
>   INET_DIAG_MARK,
>   INET_DIAG_BBRINFO,
>   INET_DIAG_CLASS_ID,
> + INET_DIAG_MD5SIG,
>   __INET_DIAG_MAX,
>  };
>  
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 030e594bab45..15c25eccab2b 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -256,4 +256,13 @@ struct tcp_md5sig {
>   __u8tcpm_key[TCP_MD5SIG_MAXKEYLEN]; /* key (binary) */
>  };
>  
> +/* INET_DIAG_MD5SIG */
> +struct tcp_diag_md5sig {
> + __u8tcpm_family;
> + __u8tcpm_prefixlen;
> + __u16   tcpm_keylen;
> + __be32  tcpm_addr[4];
> + __u8tcpm_key[TCP_MD5SIG_MAXKEYLEN];
> +};
> +
>  #endif /* _UAPI_LINUX_TCP_H */
> diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
> index a748c74aa8b7..65d0c34a76ee 100644
> --- a/net/ipv4/tcp_diag.c
> +++ b/net/ipv4/tcp_diag.c
> @@ -16,6 +16,7 @@
>  
>  #include 
>  
> +#include 
>  #include 
>  
>  static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
> @@ -36,6 +37,101 @@ static void tcp_diag_get_info(struct sock *sk, struct 
> inet_diag_msg *r,
>   tcp_get_info(sk, info);
>  }
>  
> +#ifdef CONFIG_TCP_MD5SIG
> +static void tcp_diag_md5sig_fill(struct tcp_diag_md5sig *info,
> +  const struct tcp_md5sig_key *key)
> +{
> + info->tcpm_family = key->family;
> + info->tcpm_prefixlen = key->prefixlen;
> + info->tcpm_keylen = key->keylen;
> + memcpy(info->tcpm_key, key->key, key->keylen);


if (key->keylen < TCP_MD5SIG_MAXKEYLEN), 
then you'll leak sensitive kernel data to user space.

Since I doubt many sockets are using MD5SIG, you could simply do at the
beginning of this function :

memset(info, 0, sizeof(*info));

> +
> + if (key->family == AF_INET) {
> + memset(info->tcpm_addr, 0, sizeof(info->tcpm_addr));

then also remove this memset() since the prior memset would do this
already.

> + info->tcpm_addr[0] = key->addr.a4.s_addr;
> + }
> + #if IS_ENABLED(CONFIG_IPV6)
> + else if (key->family == AF_INET6) {
> + memcpy(&info->tcpm_addr, &key->addr.a6,
> +sizeof(info->tcpm_addr));
> + }
> + #endif
> +}
> +
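
The review comment above, zeroing the whole record before filling it so that
bytes past keylen never reach user space, can be sketched in userspace as
follows. This is an illustration under stated assumptions, not the patch
itself: struct demo_md5sig is a simplified stand-in for struct
tcp_diag_md5sig, DEMO_MAXKEYLEN stands in for TCP_MD5SIG_MAXKEYLEN, and the
length clamp is an extra defensive step added for the demo.

```c
#include <string.h>

#define DEMO_MAXKEYLEN 80	/* stand-in for TCP_MD5SIG_MAXKEYLEN */

/* Hypothetical, simplified stand-in for struct tcp_diag_md5sig. */
struct demo_md5sig {
	unsigned char  family;
	unsigned char  prefixlen;
	unsigned short keylen;
	unsigned int   addr[4];
	unsigned char  key[DEMO_MAXKEYLEN];
};

/* Zero the whole record first so the unused tail of key[] and the unused
 * address words never carry stale memory out to the reader. */
static void demo_md5sig_fill(struct demo_md5sig *info, unsigned char family,
			     unsigned char prefixlen,
			     const unsigned char *key, unsigned short keylen)
{
	memset(info, 0, sizeof(*info));
	if (keylen > DEMO_MAXKEYLEN)
		keylen = DEMO_MAXKEYLEN;	/* defensive clamp, demo only */
	info->family = family;
	info->prefixlen = prefixlen;
	info->keylen = keylen;
	memcpy(info->key, key, keylen);
}
```

With the single memset() at the top, a separate memset() of the address array
for the IPv4 case becomes redundant, which is the second point of the review.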

Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

2017-08-30 Thread Kees Cook

On Wed, Aug 30, 2017 at 9:01 PM, Kees Cook  wrote:
> On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith  wrote:
>> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
>>
>>> Interesting! Can you try with 633547973ffc3 ("net: convert
>>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
>>> running haveged will help me trigger this on my system...
>>
>> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.
>
> Wonderful! Thank you so much for helping track this down.
>
> So, it seems that sk_buff.users will need some more special attention
> before we can convert it to refcount.
>
> x86-refcount will saturate with refcount_dec_and_test() if the result
> is negative. But that would mean at least starting at 0. FULL should
> have WARNed in this case, so I remain slightly confused why it was
> missed by FULL.

Actually, if this is a race condition it's possible that FULL is slow
enough to miss it...

I bet something briefly takes the refcount negative, and with
unchecked atomics, it comes back up positive again during the race.
FULL may miss the race, and x86-refcount will catch it and saturate...

-Kees

-- 
Kees Cook
Pixel Security

Re: [net-next PATCHv6 0/2] net: ethernet: Socionext Netsec

2017-08-30 Thread Florian Fainelli

On August 30, 2017 3:24:17 AM PDT, Jassi Brar  wrote:
>Hello,
>
>The OGMA/Netsec controller is used in latest SoC from
>Socionext/Fujitsu.
>
>I am refreshing the patchset by basically using official name of the IP
>from 'OGMA' to 'Netsec'. And the company is renamed too, from Fujitsu
>to Socionext to better reflect the reality.
>
> I have addressed comments (that could be) on the last revision -->
>https://patchwork.kernel.org/patch/4540651/
>
> Of course, I have scanned changes to the drivers/net/ethernet since
>last submission and integrated whichever applicable and rebased the
>driver on top of last rc.

It does not appear to be at first glance, but I will ask anyway: this is
not yet another variant of stmmac glued just a little bit differently into
the SoC, right?

Will take a closer look at the register set and driver tomorrow. Thanks

-- 
Florian

Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

2017-08-30 Thread Kees Cook

On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith  wrote:
> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
>
>> Interesting! Can you try with 633547973ffc3 ("net: convert
>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
>> running haveged will help me trigger this on my system...
>
> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.

Wonderful! Thank you so much for helping track this down.

So, it seems that sk_buff.users will need some more special attention
before we can convert it to refcount.

x86-refcount will saturate with refcount_dec_and_test() if the result
is negative. But that would mean at least starting at 0. FULL should
have WARNed in this case, so I remain slightly confused why it was
missed by FULL.

Ingo, I'm not sure the best path for this. It seems we need to revert
230cd1279d001 and 633547973ffc3 and then we can restore
ARCH_HAS_REFCOUNT.

-Kees

-- 
Kees Cook
Pixel Security
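
The saturation behaviour discussed in this thread can be illustrated with a
toy, single-threaded model. It is only a sketch of the idea: the real
implementation is atomic and architecture-specific, the demo_ names are
hypothetical, and the INT_MIN / 2 sentinel is a value chosen for this
illustration.

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_SATURATED (INT_MIN / 2)	/* sentinel chosen for this sketch */

struct demo_ref {
	int count;
};

/* Drop one reference. Returns true only on the 1 -> 0 transition.
 * A result below zero pins the counter at DEMO_SATURATED instead of
 * letting a later increment bring it back to a plausible-looking value. */
static bool demo_ref_dec_and_test(struct demo_ref *r)
{
	if (r->count == DEMO_SATURATED)
		return false;
	r->count--;
	if (r->count < 0) {
		r->count = DEMO_SATURATED;
		return false;
	}
	return r->count == 0;
}

/* Take one reference; a saturated counter stays saturated. */
static void demo_ref_inc(struct demo_ref *r)
{
	if (r->count != DEMO_SATURATED)
		r->count++;
}
```

With a plain unchecked counter, a decrement from 0 followed by an increment
ends up back at 0 and the excursion goes unnoticed; in this model the counter
stays pinned, which is the kind of difference the thread speculates about.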

Re: [PATCH net-next] net/ncsi: Define {add,kill}_vid callbacks for !CONFIG_NET_NCSI

2017-08-30 Thread Florian Fainelli

On August 30, 2017 8:38:46 PM PDT, Samuel Mendoza-Jonas  
wrote:
>Patch "net/ncsi: Configure VLAN tag filter" defined two new callback
>functions in include/net/ncsi.h, but neglected the !CONFIG_NET_NCSI
>case. This can cause a build error if these are referenced elsewhere
>without NCSI enabled, for example in ftgmac100:
>
>>>> ERROR: "ncsi_vlan_rx_kill_vid"
>[drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
>>>> ERROR: "ncsi_vlan_rx_add_vid"
>[drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
>
>Add definitions for !CONFIG_NET_NCSI to bring it into line with the
>rest
>of ncsi.h
>
>Signed-off-by: Samuel Mendoza-Jonas 
>---
> include/net/ncsi.h | 8 
> 1 file changed, 8 insertions(+)
>
>diff --git a/include/net/ncsi.h b/include/net/ncsi.h
>index 1f96af46df49..2b13b6b91a4d 100644
>--- a/include/net/ncsi.h
>+++ b/include/net/ncsi.h
>@@ -36,6 +36,14 @@ int ncsi_start_dev(struct ncsi_dev *nd);
> void ncsi_stop_dev(struct ncsi_dev *nd);
> void ncsi_unregister_dev(struct ncsi_dev *nd);
> #else /* !CONFIG_NET_NCSI */
>+int ncsi_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16
>vid)
>+{
>+  return -ENOTTY;

Returning -EOPNOTSUPP would probably be more correct here.

>+}
>+int ncsi_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16
>vid)
>+{
>+  return -ENOTTY;

Likewise.

>+}
>static inline struct ncsi_dev *ncsi_register_dev(struct net_device
>*dev,
>   void (*notifier)(struct ncsi_dev *nd))
> {


-- 
Florian
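
The stub pattern under review, with the suggested error code, can be sketched
outside the kernel as follows. DEMO_CONFIG_FEATURE and the demo_ function
names are hypothetical stand-ins for the Kconfig symbol and the NCSI
callbacks. Marking the stubs static inline is a choice of this sketch, made to
match the existing ncsi_register_dev() stub visible in the quoted header; the
patch as posted does not do so.

```c
#include <errno.h>

/* DEMO_CONFIG_FEATURE is a hypothetical stand-in for a Kconfig symbol;
 * leave it undefined to get the stubs. */
#ifdef DEMO_CONFIG_FEATURE
int demo_vlan_rx_add_vid(unsigned short proto, unsigned short vid);
int demo_vlan_rx_kill_vid(unsigned short proto, unsigned short vid);
#else
/* static inline keeps every file that includes the header from emitting
 * its own external definition of the stub. */
static inline int demo_vlan_rx_add_vid(unsigned short proto,
				       unsigned short vid)
{
	(void)proto;
	(void)vid;
	return -EOPNOTSUPP;
}

static inline int demo_vlan_rx_kill_vid(unsigned short proto,
					unsigned short vid)
{
	(void)proto;
	(void)vid;
	return -EOPNOTSUPP;
}
#endif
```

Callers built without the feature then get a clear "operation not supported"
result rather than a link error.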

linux-next: manual merge of the tip tree with the net-next tree

2017-08-30 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  drivers/net/ethernet/cavium/liquidio/lio_main.c

between commit:

  d1d97ee6e3a8 ("liquidio: moved liquidio_napi_drv_callback to lio_core.c")

from the net-next tree and commit:

  966a967116e6 ("smp: Avoid using two cache lines for struct call_single_data")

from the tip tree.

I fixed it up (I added the below merge fix patch) and can carry the fix
as necessary. This is now fixed as far as linux-next is concerned, but
any non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

From: Stephen Rothwell 
Date: Thu, 31 Aug 2017 13:42:50 +1000
Subject: [PATCH] liquidio: fix for merge with "smp: Avoid using two cache
 lines for struct call_single_data"

Signed-off-by: Stephen Rothwell 
---
 drivers/net/ethernet/cavium/liquidio/lio_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c 
b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 0e7896cdb295..23f6b60030c5 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -613,7 +613,7 @@ static void liquidio_napi_drv_callback(void *arg)
             droq->cpu_id == this_cpu) {
                 napi_schedule_irqoff(&droq->napi);
         } else {
-                struct call_single_data *csd = &droq->csd;
+                call_single_data_t *csd = &droq->csd;
 
                 csd->func = napi_schedule_wrapper;
                 csd->info = &droq->napi;
-- 
2.13.2

-- 
Cheers,
Stephen Rothwell
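
As a rough illustration of why the tip-tree commit switches users to a
typedef, here is a user-space model of the idea named in its title. It
assumes, going by that title, that the typedef aligns the structure to its
own size so that a single instance cannot span two cache lines. The names
csd_model, csd_model_t and straddles_cache_line are hypothetical and exist
only for this sketch; the aligned attribute is a GCC/Clang extension.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel structure; field layout is an
 * assumption made for this sketch only. */
struct csd_model {
	void *llist_next;		/* models the embedded list node */
	void (*func)(void *info);
	void *info;
	unsigned int flags;
};

/* Align the type to its own size (a power of two here), so an object
 * of this type always starts on a multiple of its size. */
typedef struct csd_model csd_model_t
	__attribute__((aligned(sizeof(struct csd_model))));

/* True if an object of `size` bytes placed at `addr` crosses a
 * boundary between two cache lines of `line` bytes. */
static bool straddles_cache_line(size_t addr, size_t size, size_t line)
{
	return addr / line != (addr + size - 1) / line;
}
```

With a 64-byte line, a 32-byte object at offset 48 straddles two lines,
while one at offset 32 (a multiple of its size) does not.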

[PATCH net-next] net/ncsi: Define {add,kill}_vid callbacks for !CONFIG_NET_NCSI

2017-08-30 Thread Samuel Mendoza-Jonas

Patch "net/ncsi: Configure VLAN tag filter" defined two new callback
functions in include/net/ncsi.h, but neglected the !CONFIG_NET_NCSI
case. This can cause a build error if these are referenced elsewhere
without NCSI enabled, for example in ftgmac100:

>>> ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] 
>>> undefined!
>>> ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] 
>>> undefined!

Add definitions for !CONFIG_NET_NCSI to bring it into line with the rest
of ncsi.h

Signed-off-by: Samuel Mendoza-Jonas 
---
 include/net/ncsi.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/net/ncsi.h b/include/net/ncsi.h
index 1f96af46df49..2b13b6b91a4d 100644
--- a/include/net/ncsi.h
+++ b/include/net/ncsi.h
@@ -36,6 +36,14 @@ int ncsi_start_dev(struct ncsi_dev *nd);
 void ncsi_stop_dev(struct ncsi_dev *nd);
 void ncsi_unregister_dev(struct ncsi_dev *nd);
 #else /* !CONFIG_NET_NCSI */
+int ncsi_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid)
+{
+   return -ENOTTY;
+}
+int ncsi_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16 vid)
+{
+   return -ENOTTY;
+}
 static inline struct ncsi_dev *ncsi_register_dev(struct net_device *dev,
void (*notifier)(struct ncsi_dev *nd))
 {
-- 
2.14.1
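
For illustration only, the !CONFIG_NET_NCSI stubs could be sketched as
below. This is a user-space model, not the patch as posted: it marks the
stubs static inline, like the neighbouring ncsi_register_dev() stub in the
hunk above, so that a header included from several files does not emit
duplicate definitions, and it returns -EOPNOTSUPP as suggested in review on
this thread. The be16_model typedef and the forward-declared struct
net_device are stand-ins so the fragment compiles outside the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for kernel types; assumptions made only for this sketch. */
struct net_device;
typedef uint16_t be16_model;	/* models the kernel's __be16 */

/* Stub used when NCSI support is compiled out: report "not supported". */
static inline int ncsi_vlan_rx_add_vid(struct net_device *dev,
				       be16_model proto, uint16_t vid)
{
	(void)dev;
	(void)proto;
	(void)vid;
	return -EOPNOTSUPP;
}

/* Same shape for the removal callback. */
static inline int ncsi_vlan_rx_kill_vid(struct net_device *dev,
					be16_model proto, uint16_t vid)
{
	(void)dev;
	(void)proto;
	(void)vid;
	return -EOPNOTSUPP;
}
```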

Re: [net-next PATCHv6 1/2] dt-bindings: net: Add DT bindings for Socionext Netsec

2017-08-30 Thread Jassi Brar

On Wed, Aug 30, 2017 at 8:49 PM, Andrew Lunn  wrote:
> On Wed, Aug 30, 2017 at 03:55:52PM +0530, Jassi Brar wrote:
>> This patch adds documentation for Device-Tree bindings for the
>> Socionext NetSec Controller driver.
>>
>> Signed-off-by: Jassi Brar 
>> ---
>>  .../devicetree/bindings/net/socionext-netsec.txt   | 46 
>> ++
>>  1 file changed, 46 insertions(+)
>>  create mode 100644 
>> Documentation/devicetree/bindings/net/socionext-netsec.txt
>>
>> diff --git a/Documentation/devicetree/bindings/net/socionext-netsec.txt 
>> b/Documentation/devicetree/bindings/net/socionext-netsec.txt
>> new file mode 100644
>> index 000..12d596c
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/net/socionext-netsec.txt
>> @@ -0,0 +1,46 @@
>> +* Socionext NetSec Ethernet Controller IP
>> +
>> +Required properties:
>> +- compatible: Should be "socionext,netsecv5"
>> +- reg: Address and length of the register sets, the first is the main
>> + registers, then the rdlar and tdlar regions for the SoC
>> +- interrupts: Should contain ethernet controller interrupt
>> +- clocks: phandle to any clocks to be switched by runtime_pm
>> +- phy-mode: See ethernet.txt file in the same directory
>
>> +- max-speed: See ethernet.txt file in the same directory
>> +- max-frame-size: See ethernet.txt file in the same directory, if 9000 or
>> + above jumbo frames are enabled
>> +- local-mac-address: See ethernet.txt file in the same directory
>
> These three are required, not optimal?
>
optional :)

>> +- phy-handle: phandle to select child phy
>> +
>> +Optional properties:
>> +- use-jumbo: Boolean property to suggest if jumbo packets should be used or 
>> not
>> +
>> +For the child phy
>> +
>> +- compatible "ethernet-phy-ieee802.3-c22" is needed
>
> This is normally considered optional. Why require it?
>
Yes, will do.

Thanks.

Re: [PATCH][next] bpf: test_maps: fix typo "conenct" -> "connext"

2017-08-30 Thread Joe Perches

On Wed, 2017-08-30 at 12:47 +0100, Colin King wrote:
> From: Colin Ian King 

connext->connect typo in patch subject

> Trivial fix to typo in printf error message
> 
> Signed-off-by: Colin Ian King 
> ---
>  tools/testing/selftests/bpf/test_maps.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/test_maps.c 
> b/tools/testing/selftests/bpf/test_maps.c
> index 7059bb315a10..8fdaf2e21c8a 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -525,7 +525,7 @@ static void test_sockmap(int tasks, void *data)
>   addr.sin_port = htons(ports[i - 2]);
>   err = connect(sfd[i], (struct sockaddr *)&addr, sizeof(addr));
>   if (err) {
> - printf("failed to conenct\n");
> + printf("failed to connect\n");
>   goto out;
>   }
>   }

Re: [PATCH net] Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"

2017-08-30 Thread David Miller

From: Florian Fainelli 
Date: Wed, 30 Aug 2017 17:49:29 -0700

> This reverts commit 7ad813f208533cebfcc32d3d7474dc1677d1b09a ("net: phy:
> Correctly process PHY_HALTED in phy_stop_machine()") because it is
> creating the possibility for a NULL pointer dereference.
> 
> David Daney provided the following call trace and diagram of events:
> 
> When ndo_stop() is called we call:
> 
>  phy_disconnect()
> +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
> +---> phy_stop_machine()
> |  +---> phy_state_machine()
> |  +> queue_delayed_work(): Work queued.
> +--->phy_detach() implies: phydev->attached_dev = NULL;
> 
> Now at a later time the queued work does:
> 
>  phy_state_machine()
> +>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:
> 
>  CPU 12 Unable to handle kernel paging request at virtual address
> 0048, epc == 80de37ec, ra == 80c7c
> Oops[#1]:
> CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
> Workqueue: events_power_efficient phy_state_machine
> task: 8004021ed100 task.stack: 800409d7
> $ 0   :  84720060 0048 0004
> $ 4   :  0001 0004 
> $ 8   :   98f3 
> $12   : 800409d73fe0 9c00 846547c8 af3b
> $16   : 8004096bab68 8004096babd0  8004096ba800
> $20   :   8109 0008
> $24   : 0061 808637b0
> $28   : 800409d7 800409d73cf0 8000271bd300 80c7804c
> Hi: 002a
> Lo: 003f
> epc   : 80de37ec netif_carrier_off+0xc/0x58
> ra: 80c7804c phy_state_machine+0x48c/0x4f8
> Status: 14009ce3KX SX UX KERNEL EXL IE
> Cause : 0088 (ExcCode 02)
> BadVA : 0048
> PrId  : 000d9501 (Cavium Octeon III)
> Modules linked in:
> Process kworker/12:1 (pid: 1502, threadinfo=800409d7,
> task=8004021ed100, tls=)
> Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00
>  808a1708 800409a54000 8000271bd300
> 8000271bd320 800409a54030 80ff0f00 0001
> 8109 808a1ac0 800402182080 8465
> 800402182080 8465 80ff 800409a54000
> 808a1970  8004099e8000 800402099240
>  808a8598  800408eeeb00
> 800409a54000 810a1d00  800409d73de8
> 800409d73de8 0088 0c009c00 800409d73e08
> 800409d73e08 800402182080 808a84d0 800402182080
> ...
> Call Trace:
> [] netif_carrier_off+0xc/0x58
> [] phy_state_machine+0x48c/0x4f8
> [] process_one_work+0x158/0x368
> [] worker_thread+0x150/0x4c0
> [] kthread+0xc8/0xe0
> [] ret_from_kernel_thread+0x14/0x1c
> 
> The original motivation for this change came from Marc Gonzalez
> indicating that his network driver did not have its adjust_link callback
> executing with phydev->link = 0 while he was expecting it.
> 
> PHYLIB has never made any such guarantee, because phy_stop() merely
> tells the workqueue to move into the PHY_HALTED state, which will happen
> asynchronously.
> 
> Reported-by: Geert Uytterhoeven 
> Reported-by: David Daney 
> Fixes: 7ad813f20853 ("net: phy: Correctly process PHY_HALTED in 
> phy_stop_machine()")
> Signed-off-by: Florian Fainelli 

Applied and queued up for -stable, thanks Florian.

Re: DSA mv88e6xxx RX frame errors and TCP/IP RX failure

2017-08-30 Thread Andrew Lunn

> >/* Report late collisions as a frame error. */
> > if (status & (BD_ENET_RX_NO | BD_ENET_RX_CL))
> > ndev->stats.rx_frame_errors++;
> >
> > I don't see anywhere else frame errors are counted, but it would be
> > good to prove we are looking in the right place.
> >
> 
> Andrew,
> 
> (adding IMX FEC driver maintainer to CC)
> 
> Yes, that's one of them being hit. It looks like ifconfig reports
> 'frame' as the accumulation of a few stats so here are some more
> specifics from /sys/class/net/eth0/statistics:
> 
> root@xenial:/sys/devices/soc0/soc/210.aips-bus/2188000.ethernet/net/eth0/statistics#
> for i in `ls rx_*`; do echo $i:$(cat $i); done
> rx_bytes:103229
> rx_compressed:0
> rx_crc_errors:22
> rx_dropped:0
> rx_errors:22
> rx_fifo_errors:0
> rx_frame_errors:22
> rx_length_errors:22
> rx_missed_errors:0
> rx_nohandler:0
> rx_over_errors:0
> rx_packets:1174
> root@xenial:/sys/devices/soc0/soc/210.aips-bus/2188000.ethernet/net/eth0/statistics#
> ifconfig eth0
> eth0  Link encap:Ethernet  HWaddr 00:D0:12:41:F3:E7
>   inet6 addr: fe80::2d0:12ff:fe41:f3e7/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:1207 errors:22 dropped:0 overruns:0 frame:66
>   TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:106009 (103.5 KiB)  TX bytes:4604 (4.4 KiB)
> 
> Instrumenting fec driver I see the following getting hit:
> 
> status & BD_ENET_RX_LG /* rx_length_errors: Frame too long */
> status & BD_ENET_RX_CR  /* rx_crc_errors: CRC Error */
> status & BD_ENET_RX_CL /* rx_frame_errors: Collision? */
> 
> Is this a frame size issue where the MV88E6176 is sending frames down
> that exceed the MTU because of headers added?

I did fix an issue recently with that. See 

commit fbbeefdd21049fcf9437c809da3828b210577f36
Author: Andrew Lunn 
Date:   Sun Jul 30 19:36:05 2017 +0200

net: fec: Allow reception of frames bigger than 1522 bytes

The FEC Receive Control Register has a 14 bit field indicating the
longest frame that may be received. It is being set to 1522. Frames
longer than this are discarded, but counted as being in error.

When using DSA, frames from the switch have an additional header,
either 4 or 8 bytes if a Marvell switch is used. Thus a full MTU frame
of 1522 bytes received by the switch on a port becomes 1530 bytes when
passed to the host via the FEC interface.

Change the maximum receive size to 2048 - 64, where 64 is the maximum
rx_alignment applied on the receive buffer for AVB capable FEC
cores. Use this value also for the maximum receive buffer size. The
driver is already allocating a receive SKB of 2048 bytes, so this
change should not have any significant effects.

Tested on imx51, imx6, vf610.

Signed-off-by: Andrew Lunn 
Signed-off-by: David S. Miller 


However, this was more of an all-or-nothing problem. All frames with the
full MTU were getting dropped, whereas I think you are only seeing a
few dropped?

Anyway, try cherry picking that patch and see if it helps.

Andrew
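
The arithmetic in the commit message above can be checked with a small
sketch. The numbers (1522, the 4 or 8 byte tag, 2048 and 64) come from that
commit message; the names with a _MODEL suffix and the helper functions are
hypothetical and are not the FEC driver's own identifiers.

```c
#include <stdbool.h>

enum {
	FEC_RX_BUF_SIZE_MODEL  = 2048,	/* receive SKB size named in the commit */
	FEC_RX_ALIGN_MAX_MODEL = 64,	/* maximum rx_alignment on AVB-capable cores */
	FEC_OLD_MAX_FL_MODEL   = 1522,	/* previous maximum frame length */
};

/* New limit described in the commit: 2048 - 64. */
static int fec_new_max_frame_len(void)
{
	return FEC_RX_BUF_SIZE_MODEL - FEC_RX_ALIGN_MAX_MODEL;
}

/* Length seen by the host once the switch has added its DSA tag. */
static int dsa_tagged_len(int frame_len, int tag_len)
{
	return frame_len + tag_len;
}

/* Frames longer than the limit are discarded and counted as errors. */
static bool fec_accepts(int frame_len, int max_frame_len)
{
	return frame_len <= max_frame_len;
}
```

A 1522-byte frame with an 8-byte tag is 1530 bytes, which the old limit
rejects and the new limit of 1984 accepts.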

[PATCH net-next v4 2/2] tcp_diag: report TCP MD5 signing keys and addresses

2017-08-30 Thread Ivan Delalande

Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to
processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is
not possible to retrieve these from the kernel once they have been
configured on sockets.

Signed-off-by: Ivan Delalande 
---
 include/uapi/linux/inet_diag.h |   1 +
 include/uapi/linux/tcp.h   |   9 
 net/ipv4/tcp_diag.c| 110 ++---
 3 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index 678496897a68..f52ff62bfabe 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -143,6 +143,7 @@ enum {
INET_DIAG_MARK,
INET_DIAG_BBRINFO,
INET_DIAG_CLASS_ID,
+   INET_DIAG_MD5SIG,
__INET_DIAG_MAX,
 };
 
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 030e594bab45..15c25eccab2b 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -256,4 +256,13 @@ struct tcp_md5sig {
    __u8    tcpm_key[TCP_MD5SIG_MAXKEYLEN]; /* key (binary) */
 };
 
+/* INET_DIAG_MD5SIG */
+struct tcp_diag_md5sig {
+   __u8    tcpm_family;
+   __u8    tcpm_prefixlen;
+   __u16   tcpm_keylen;
+   __be32  tcpm_addr[4];
+   __u8    tcpm_key[TCP_MD5SIG_MAXKEYLEN];
+};
+
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index a748c74aa8b7..65d0c34a76ee 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -16,6 +16,7 @@
 
 #include 
 
+#include 
 #include 
 
 static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
@@ -36,6 +37,101 @@ static void tcp_diag_get_info(struct sock *sk, struct 
inet_diag_msg *r,
tcp_get_info(sk, info);
 }
 
+#ifdef CONFIG_TCP_MD5SIG
+static void tcp_diag_md5sig_fill(struct tcp_diag_md5sig *info,
+const struct tcp_md5sig_key *key)
+{
+   info->tcpm_family = key->family;
+   info->tcpm_prefixlen = key->prefixlen;
+   info->tcpm_keylen = key->keylen;
+   memcpy(info->tcpm_key, key->key, key->keylen);
+
+   if (key->family == AF_INET) {
+   memset(info->tcpm_addr, 0, sizeof(info->tcpm_addr));
+   info->tcpm_addr[0] = key->addr.a4.s_addr;
+   }
+   #if IS_ENABLED(CONFIG_IPV6)
+   else if (key->family == AF_INET6) {
+   memcpy(&info->tcpm_addr, &key->addr.a6,
+  sizeof(info->tcpm_addr));
+   }
+   #endif
+}
+
+static int tcp_diag_put_md5sig(struct sk_buff *skb,
+  const struct tcp_md5sig_info *md5sig)
+{
+   const struct tcp_md5sig_key *key;
+   struct nlattr *attr;
+   struct tcp_diag_md5sig *info;
+   int md5sig_count = 0;
+
+   hlist_for_each_entry_rcu(key, &md5sig->head, node)
+   md5sig_count++;
+   if (md5sig_count == 0)
+   return 0;
+
+   attr = nla_reserve(skb, INET_DIAG_MD5SIG,
+  md5sig_count * sizeof(struct tcp_diag_md5sig));
+   if (!attr)
+   return -EMSGSIZE;
+
+   info = nla_data(attr);
+   hlist_for_each_entry_rcu(key, &md5sig->head, node) {
+   tcp_diag_md5sig_fill(info++, key);
+   if (--md5sig_count == 0)
+   break;
+   }
+   if (md5sig_count > 0)
+   memset(info, 0, md5sig_count * sizeof(struct tcp_diag_md5sig));
+
+   return 0;
+}
+#endif
+
+static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
+   struct sk_buff *skb)
+{
+#ifdef CONFIG_TCP_MD5SIG
+   if (net_admin) {
+   struct tcp_md5sig_info *md5sig;
+   int err = 0;
+
+   rcu_read_lock();
+   md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info);
+   if (md5sig)
+   err = tcp_diag_put_md5sig(skb, md5sig);
+   rcu_read_unlock();
+   if (err < 0)
+   return err;
+   }
+#endif
+
+   return 0;
+}
+
+static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
+{
+   size_t size = 0;
+
+#ifdef CONFIG_TCP_MD5SIG
+   if (sk_fullsock(sk)) {
+   const struct tcp_md5sig_info *md5sig;
+   const struct tcp_md5sig_key *key;
+
+   rcu_read_lock();
+   md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info);
+   if (md5sig) {
+   hlist_for_each_entry_rcu(key, &md5sig->head, node)
+   size += sizeof(struct tcp_diag_md5sig);
+   }
+   rcu_read_unlock();
+   }
+#endif
+
+   return size;
+}
+
 static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
  const struct inet_diag_req_v2 *r, struct nlattr *bc)
 {
@@ -68,13 +164,15 @@ static int tcp_diag_destroy(struct sk_buff *in_skb,
 #endif
 
 static const struct inet_diag_handler

[PATCH net-next v4 1/2] inet_diag: allow protocols to provide additional data

2017-08-30 Thread Ivan Delalande

Extend inet_diag_handler to allow individual protocols to report
additional data on INET_DIAG_INFO through idiag_get_aux. The size
can be dynamic and is computed by idiag_get_aux_size.

Signed-off-by: Ivan Delalande 
---
 include/linux/inet_diag.h |  7 +++
 net/ipv4/inet_diag.c  | 22 ++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index 65da430e260f..ee251c585854 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -25,6 +25,13 @@ struct inet_diag_handler {
  struct inet_diag_msg *r,
  void *info);
 
+   int (*idiag_get_aux)(struct sock *sk,
+bool net_admin,
+struct sk_buff *skb);
+
+   size_t  (*idiag_get_aux_size)(struct sock *sk,
+ bool net_admin);
+
int (*destroy)(struct sk_buff *in_skb,
   const struct inet_diag_req_v2 *req);
 
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 67325d5832d7..cb7012d1720f 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -93,8 +93,17 @@ void inet_diag_msg_common_fill(struct inet_diag_msg *r, 
struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(inet_diag_msg_common_fill);
 
-static size_t inet_sk_attr_size(void)
+static size_t inet_sk_attr_size(struct sock *sk,
+   const struct inet_diag_req_v2 *req,
+   bool net_admin)
 {
+   const struct inet_diag_handler *handler;
+   size_t aux = 0;
+
+   handler = inet_diag_table[req->sdiag_protocol];
+   if (handler && handler->idiag_get_aux_size)
+   aux = handler->idiag_get_aux_size(sk, net_admin);
+
    return nla_total_size(sizeof(struct tcp_info))
+ nla_total_size(1) /* INET_DIAG_SHUTDOWN */
+ nla_total_size(1) /* INET_DIAG_TOS */
@@ -105,6 +114,7 @@ static size_t inet_sk_attr_size(void)
+ nla_total_size(SK_MEMINFO_VARS * sizeof(u32))
+ nla_total_size(TCP_CA_NAME_MAX)
+ nla_total_size(sizeof(struct tcpvegas_info))
+   + nla_total_size(aux)
+ 64;
 }
 
@@ -260,6 +270,10 @@ int inet_sk_diag_fill(struct sock *sk, struct 
inet_connection_sock *icsk,
 
handler->idiag_get_info(sk, r, info);
 
+   if (ext & (1 << (INET_DIAG_INFO - 1)) && handler->idiag_get_aux)
+   if (handler->idiag_get_aux(sk, net_admin, skb) < 0)
+   goto errout;
+
if (sk->sk_state < TCP_TIME_WAIT) {
union tcp_cc_info info;
size_t sz = 0;
@@ -449,6 +463,7 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo,
const struct nlmsghdr *nlh,
const struct inet_diag_req_v2 *req)
 {
+   bool net_admin = netlink_net_capable(in_skb, CAP_NET_ADMIN);
struct net *net = sock_net(in_skb->sk);
struct sk_buff *rep;
struct sock *sk;
@@ -458,7 +473,7 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo,
if (IS_ERR(sk))
return PTR_ERR(sk);
 
-   rep = nlmsg_new(inet_sk_attr_size(), GFP_KERNEL);
+   rep = nlmsg_new(inet_sk_attr_size(sk, req, net_admin), GFP_KERNEL);
if (!rep) {
err = -ENOMEM;
goto out;
@@ -467,8 +482,7 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo,
err = sk_diag_fill(sk, rep, req,
   sk_user_ns(NETLINK_CB(in_skb).sk),
   NETLINK_CB(in_skb).portid,
-  nlh->nlmsg_seq, 0, nlh,
-  netlink_net_capable(in_skb, CAP_NET_ADMIN));
+  nlh->nlmsg_seq, 0, nlh, net_admin);
if (err < 0) {
WARN_ON(err == -EMSGSIZE);
nlmsg_free(rep);
-- 
2.14.1

[PATCH net-next v4 0/2] report TCP MD5 signing keys and addresses

2017-08-30 Thread Ivan Delalande

Allow userspace to retrieve MD5 signature keys and addresses configured
on TCP sockets through inet_diag.

Thanks to Eric Dumazet and Stephen Hemminger for their useful
explanations and feedback.

v4: - add new struct tcp_diag_md5sig to report the data instead of
  tcp_md5sig to avoid wasting 112 bytes on every tcpm_addr,
- memset tcpm_addr on IPv4 addresses to avoid leaks,
- style fix in inet_diag_dump_one_icsk.

v3: - rename inet_diag_*md5sig in tcp_diag.c to tcp_diag_* for
  consistency,
- don't lock the socket in tcp_diag_put_md5sig,
- add checks on md5sig_count in tcp_diag_put_md5sig to not create
  the netlink attribute if the list is empty, and to avoid overflows
  or memory leaks if the list has changed in the meantime.

v2: - move changes to tcp_diag.c and extend inet_diag_handler to allow
  protocols to provide additional data on INET_DIAG_INFO,
- lock socket before calling tcp_diag_put_md5sig.


I also have a patch for iproute2/ss to test this change, making it print
this new attribute. I'm planning to polish and send it if this series
gets applied.


Ivan Delalande (2):
  inet_diag: allow protocols to provide additional data
  tcp_diag: report TCP MD5 signing keys and addresses

 include/linux/inet_diag.h  |   7 +++
 include/uapi/linux/inet_diag.h |   1 +
 include/uapi/linux/tcp.h   |   9 
 net/ipv4/inet_diag.c   |  22 +++--
 net/ipv4/tcp_diag.c| 110 ++---
 5 files changed, 139 insertions(+), 10 deletions(-)

-- 
2.14.1
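
To make the v4 note about saving 112 bytes per address concrete, here is a
user-space sketch of how a consumer might size and count the entries in an
INET_DIAG_MD5SIG payload. It is not the planned iproute2 patch. The struct
mirrors tcp_diag_md5sig from patch 2/2 with fixed-width types; the _model
names and both helpers are hypothetical, 80 is assumed to be the value of
TCP_MD5SIG_MAXKEYLEN, and the saving assumes a 128-byte struct
sockaddr_storage as on Linux.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

#define MD5SIG_MAXKEYLEN_MODEL 80	/* assumed value of TCP_MD5SIG_MAXKEYLEN */

/* User-space mirror of struct tcp_diag_md5sig (names are assumptions). */
struct tcp_diag_md5sig_model {
	uint8_t  tcpm_family;
	uint8_t  tcpm_prefixlen;
	uint16_t tcpm_keylen;
	uint32_t tcpm_addr[4];		/* __be32 in the kernel header */
	uint8_t  tcpm_key[MD5SIG_MAXKEYLEN_MODEL];
};

/* Number of whole entries in an attribute payload, or -1 if the
 * length is not a multiple of the entry size. */
static long md5sig_entry_count(size_t payload_len)
{
	const size_t entry = sizeof(struct tcp_diag_md5sig_model);

	if (payload_len % entry != 0)
		return -1;
	return (long)(payload_len / entry);
}

/* Bytes saved per entry by using a 16-byte address array instead of
 * a full struct sockaddr_storage. */
static size_t md5sig_addr_bytes_saved(void)
{
	struct tcp_diag_md5sig_model m;

	return sizeof(struct sockaddr_storage) - sizeof(m.tcpm_addr);
}
```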

RE: [patch net-next v2 0/3] net/sched: Improve getting objects by indexes

2017-08-30 Thread Chris Mi



> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, August 31, 2017 5:39 AM
> To: Chris Mi 
> Cc: netdev@vger.kernel.org; j...@mojatatu.com;
> xiyou.wangc...@gmail.com; j...@resnulli.us; mawil...@microsoft.com
> Subject: Re: [patch net-next v2 0/3] net/sched: Improve getting objects by
> indexes
> 
> From: Chris Mi 
> Date: Wed, 30 Aug 2017 02:31:56 -0400
> 
> > Using current TC code, it is very slow to insert a lot of rules.
> >
> > In order to improve the rules update rate in TC, we introduced the
> > following two changes:
> > 1) changed cls_flower to use IDR to manage the filters.
> > 2) changed all act_xxx modules to use IDR instead of
> >a small hash table
> >
> > But IDR has a limitation that it uses int. TC handle uses u32.
> > To make sure there is no regression, we add several new IDR APIs to
> > support unsigned long.
> >
> > v2
> > ==
> >
> > Addressed Hannes's comment:
> > express idr_alloc in terms of idr_alloc_ext and most of the other
> > functions
> 
> Series applied, thanks.

Thank you, David,

-Chris

RE: [patch net-next v2 3/3] net/sched: Change act_api and act_xxx modules to use IDR

2017-08-30 Thread Chris Mi



> -Original Message-
> From: Jamal Hadi Salim [mailto:j...@mojatatu.com]
> Sent: Wednesday, August 30, 2017 8:11 PM
> To: Chris Mi ; netdev@vger.kernel.org
> Cc: xiyou.wangc...@gmail.com; j...@resnulli.us; da...@davemloft.net;
> mawil...@microsoft.com
> Subject: Re: [patch net-next v2 3/3] net/sched: Change act_api and act_xxx
> modules to use IDR
> 
> On 17-08-30 02:31 AM, Chris Mi wrote:
> > Typically, each TC filter has its own action. All the actions of the
> > same type are saved in its hash table. But the hash buckets are too
> > small that it degrades to a list. And the performance is greatly
> > affected. For example, it takes about 0m11.914s to insert 64K rules.
> > If we convert the hash table to IDR, it only takes about 0m1.500s.
> > The improvement is huge.
> >
> > But please note that the test result is based on previous patch that
> > cls_flower uses IDR.
> >
> > Signed-off-by: Chris Mi 
> > Signed-off-by: Jiri Pirko 
> 
> Acked-by: Jamal Hadi Salim 
> 
> Also already acked this before but you left it out in this version. If you 
> make
> changes to the patch then you will need a new ACK.
Sorry about that, Jamal. I think I need to make a note of the review comments
in case I forget them.
> 
> Don't forget to update selftests please.
Sure, we will work on that.

Thanks,
Chris
> 
> cheers,
> jamal

[PATCH net] Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"

2017-08-30 Thread Florian Fainelli

This reverts commit 7ad813f208533cebfcc32d3d7474dc1677d1b09a ("net: phy:
Correctly process PHY_HALTED in phy_stop_machine()") because it is
creating the possibility for a NULL pointer dereference.

David Daney provided the following call trace and diagram of events:

When ndo_stop() is called we call:

 phy_disconnect()
+---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
+---> phy_stop_machine()
|  +---> phy_state_machine()
|  +> queue_delayed_work(): Work queued.
+--->phy_detach() implies: phydev->attached_dev = NULL;

Now at a later time the queued work does:

 phy_state_machine()
+>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:

 CPU 12 Unable to handle kernel paging request at virtual address
0048, epc == 80de37ec, ra == 80c7c
Oops[#1]:
CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
Workqueue: events_power_efficient phy_state_machine
task: 8004021ed100 task.stack: 800409d7
$ 0   :  84720060 0048 0004
$ 4   :  0001 0004 
$ 8   :   98f3 
$12   : 800409d73fe0 9c00 846547c8 af3b
$16   : 8004096bab68 8004096babd0  8004096ba800
$20   :   8109 0008
$24   : 0061 808637b0
$28   : 800409d7 800409d73cf0 8000271bd300 80c7804c
Hi: 002a
Lo: 003f
epc   : 80de37ec netif_carrier_off+0xc/0x58
ra: 80c7804c phy_state_machine+0x48c/0x4f8
Status: 14009ce3KX SX UX KERNEL EXL IE
Cause : 0088 (ExcCode 02)
BadVA : 0048
PrId  : 000d9501 (Cavium Octeon III)
Modules linked in:
Process kworker/12:1 (pid: 1502, threadinfo=800409d7,
task=8004021ed100, tls=)
Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00
 808a1708 800409a54000 8000271bd300
8000271bd320 800409a54030 80ff0f00 0001
8109 808a1ac0 800402182080 8465
800402182080 8465 80ff 800409a54000
808a1970  8004099e8000 800402099240
 808a8598  800408eeeb00
800409a54000 810a1d00  800409d73de8
800409d73de8 0088 0c009c00 800409d73e08
800409d73e08 800402182080 808a84d0 800402182080
...
Call Trace:
[] netif_carrier_off+0xc/0x58
[] phy_state_machine+0x48c/0x4f8
[] process_one_work+0x158/0x368
[] worker_thread+0x150/0x4c0
[] kthread+0xc8/0xe0
[] ret_from_kernel_thread+0x14/0x1c

The original motivation for this change came from Marc Gonzalez
indicating that his network driver did not have its adjust_link callback
executing with phydev->link = 0 while he was expecting it.

PHYLIB has never made any such guarantee, because phy_stop() merely
tells the workqueue to move into the PHY_HALTED state, which will happen
asynchronously.

Reported-by: Geert Uytterhoeven 
Reported-by: David Daney 
Fixes: 7ad813f20853 ("net: phy: Correctly process PHY_HALTED in 
phy_stop_machine()")
Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/phy.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 5068c582d502..d0626bf5c540 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -749,9 +749,6 @@ void phy_stop_machine(struct phy_device *phydev)
if (phydev->state > PHY_UP && phydev->state != PHY_HALTED)
phydev->state = PHY_UP;
    mutex_unlock(&phydev->lock);
-
-   /* Now we can run the state machine synchronously */
-   phy_state_machine(&phydev->state_queue.work);
 }
 
 /**
-- 
1.9.1
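
The sequence in the diagram above can be modelled in a few lines of
user-space C. This is a toy, not phylib: the state machine is reduced to
"touch attached_dev, then re-arm when polling", and every name ending in
_model, along with both functions, is hypothetical. It also assumes that
phy_stop_machine() cancels any already-pending work before the synchronous
call that the reverted commit added.

```c
#include <stdbool.h>
#include <stddef.h>

#define PHY_POLL_MODEL (-1)

struct netdev_model {
	bool carrier_on;
};

struct phy_model {
	int irq;
	struct netdev_model *attached_dev;
	bool work_queued;
};

/* One run of the (heavily simplified) state machine. Returns false at
 * the point where the real code would dereference a NULL attached_dev. */
static bool state_machine_run(struct phy_model *phy)
{
	phy->work_queued = false;
	if (!phy->attached_dev)
		return false;
	phy->attached_dev->carrier_on = false;	/* models netif_carrier_off() */
	if (phy->irq == PHY_POLL_MODEL)
		phy->work_queued = true;	/* models queue_delayed_work() */
	return true;
}

/* Models phy_disconnect(); the flag selects whether phy_stop_machine()
 * makes the synchronous state machine call. */
static bool disconnect_sequence(bool sync_run_in_stop_machine)
{
	struct netdev_model ndev = { .carrier_on = true };
	struct phy_model phy = {
		.irq = 5, .attached_dev = &ndev, .work_queued = true,
	};
	bool ok = true;

	phy.irq = PHY_POLL_MODEL;	/* phy_stop_interrupts() */
	phy.work_queued = false;	/* assumed: pending work is cancelled */
	if (sync_run_in_stop_machine)
		ok = state_machine_run(&phy);	/* re-arms because irq is PHY_POLL */
	phy.attached_dev = NULL;	/* phy_detach() */
	if (phy.work_queued)
		ok = state_machine_run(&phy);	/* delayed work fires later */
	return ok;
}
```

In this model disconnect_sequence(true) reports the NULL access and
disconnect_sequence(false), the behaviour after the revert, does not.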

Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-08-30 Thread Florian Fainelli

On 08/30/2017 05:13 PM, David Daney wrote:
> On 07/31/2017 05:28 PM, David Miller wrote:
>> From: Florian Fainelli 
>> Date: Fri, 28 Jul 2017 11:58:36 -0700
>>
>>> Marc reported that he was not getting the PHY library adjust_link()
>>> callback function to run when calling phy_stop() + phy_disconnect()
>>> which does not indeed happen because we set the state machine to
>>> PHY_HALTED but we don't get to run it to process this state past that
>>> point.
>>>
>>> Fix this with a synchronous call to phy_state_machine() in order to have
>>> the state machine actually act on PHY_HALTED, set the PHY device's link
>>> down, turn the network device's carrier off and finally call the
>>> adjust_link() function.
>>>
>>> Reported-by: Marc Gonzalez 
>>> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
>>> Signed-off-by: Florian Fainelli 
>>> ---
>>> Changes in v2:
>>>
>>> - reword subject and commit message based on changes
>>> - dropped flush_scheduled_work() since it is redundant
>>
>> Applied and queued up for -stable, thanks.
>>
> 
> 
> This is broken.  Please revert.
> 
> Upstream commit 7ad813f20853 and in the stable branches as well.
> 
> When ndo_stop() is called we call:
> 
> 
>  phy_disconnect()
> +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
> +---> phy_stop_machine()
> |  +---> phy_stop_machine()
> |  +> queue_delayed_work(): Work queued.
> +--->phy_detach() implies: phydev->attached_dev = NULL;
> 
> Now at a later time the queued work does:
> 
>  phy_state_machine()
> +>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:

How about the following instead of a revert (which I have queued locally
as well along with your correct call graph). This still would not fix
Geert's problem where with this change, we do actually call back into
adjust_link after a phy_stop() which may be problematic for him so I
think the revert is just easier and Marc, we'll figure out something for
nb8800?

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d0626bf5c540..78168e19bd5d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1234,7 +1234,7 @@ void phy_state_machine(struct work_struct *work)
 * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
 * between states from phy_mac_interrupt()
 */
-   if (phydev->irq == PHY_POLL)
+   if (phydev->irq == PHY_POLL && phydev->state != PHY_HALTED)
queue_delayed_work(system_power_efficient_wq,
                &phydev->state_queue,
   PHY_STATE_TIME * HZ);
 }
> 
> 
>  CPU 12 Unable to handle kernel paging request at virtual address
> 0048, epc == 80de37ec, ra == 80c7c
> Oops[#1]:
> CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
> Workqueue: events_power_efficient phy_state_machine
> task: 8004021ed100 task.stack: 800409d7
> $ 0   :  84720060 0048 0004
> $ 4   :  0001 0004 
> $ 8   :   98f3 
> $12   : 800409d73fe0 9c00 846547c8 af3b
> $16   : 8004096bab68 8004096babd0  8004096ba800
> $20   :   8109 0008
> $24   : 0061 808637b0
> $28   : 800409d7 800409d73cf0 8000271bd300 80c7804c
> Hi: 002a
> Lo: 003f
> epc   : 80de37ec netif_carrier_off+0xc/0x58
> ra: 80c7804c phy_state_machine+0x48c/0x4f8
> Status: 14009ce3KX SX UX KERNEL EXL IE
> Cause : 0088 (ExcCode 02)
> BadVA : 0048
> PrId  : 000d9501 (Cavium Octeon III)
> Modules linked in:
> Process kworker/12:1 (pid: 1502, threadinfo=800409d7,
> task=8004021ed100, tls=)
> Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00
>  808a1708 800409a54000 8000271bd300
> 8000271bd320 800409a54030 80ff0f00 0001
> 8109 808a1ac0 800402182080 8465
> 800402182080 8465 80ff 800409a54000
> 808a1970  8004099e8000 800402099240
>  808a8598  800408eeeb00
> 800409a54000 810a1d00  800409d73de8
> 800409d73de8 0088 0c009c00 800409d73e08
> 800409d73e08 800402182080 808a84d0 800402182080
> ...
> Call Trace:
> [] netif_carrier_off+0xc/0x58
> [] phy_state_machine+0x48c/0x4f8
> [] process_one_work+0x158/0x368
> [] worker_thread+0x150/0x4c0
> []

Re: DSA mv88e6xxx RX frame errors and TCP/IP RX failure

2017-08-30 Thread Ilia Mirkin

On Wed, Aug 30, 2017 at 8:22 PM, Tim Harvey  wrote:
> On Wed, Aug 30, 2017 at 3:06 PM, Andrew Lunn  wrote:
>> On Wed, Aug 30, 2017 at 12:53:56PM -0700, Tim Harvey wrote:
>>> Greetings,
>>>
>>> I'm seeing RX frame errors when using the mv88e6xxx DSA driver on
>>> 4.13-rc7. The board I'm using is a GW5904 [1] which has an IMX6 FEC
>>> MAC (eth0) connected via RGMII to a MV88E6176 with its downstream
>>> P0/P1/P2/P3 to front panel RJ45's (lan1-lan4).
>>
>> Hi Tim
>>
>> Can you confirm the counter is this one:
>>
>>/* Report late collisions as a frame error. */
>> if (status & (BD_ENET_RX_NO | BD_ENET_RX_CL))
>> ndev->stats.rx_frame_errors++;
>>
>> I don't see anywhere else frame errors are counted, but it would be
>> good to prove we are looking in the right place.
>>
>
> Andrew,
>
> (adding IMX FEC driver maintainer to CC)
>
> Yes, that's one of them being hit. It looks like ifconfig reports
> 'frame' as the accumulation of a few stats so here are some more
> specifics from /sys/class/net/eth0/statistics:
>
> root@xenial:/sys/devices/soc0/soc/210.aips-bus/2188000.ethernet/net/eth0/statistics#
> for i in `ls rx_*`; do echo $i:$(cat $i); done
> rx_bytes:103229
> rx_compressed:0
> rx_crc_errors:22
> rx_dropped:0
> rx_errors:22
> rx_fifo_errors:0
> rx_frame_errors:22
> rx_length_errors:22
> rx_missed_errors:0
> rx_nohandler:0
> rx_over_errors:0
> rx_packets:1174
> root@xenial:/sys/devices/soc0/soc/210.aips-bus/2188000.ethernet/net/eth0/statistics#
> ifconfig eth0
> eth0  Link encap:Ethernet  HWaddr 00:D0:12:41:F3:E7
>   inet6 addr: fe80::2d0:12ff:fe41:f3e7/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:1207 errors:22 dropped:0 overruns:0 frame:66
>   TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:106009 (103.5 KiB)  TX bytes:4604 (4.4 KiB)
>
> Instrumenting fec driver I see the following getting hit:
>
> status & BD_ENET_RX_LG /* rx_length_errors: Frame too long */
> status & BD_ENET_RX_CR  /* rx_crc_errors: CRC Error */
> status & BD_ENET_RX_CL /* rx_frame_errors: Collision? */
>
> Is this a frame size issue where the MV88E6176 is sending frames down
> that exceed the MTU because of headers added?

Not sure if this is relevant to you, but
https://github.com/laanwj/linux-freedreno-a2xx/commit/076b6542fa27499072ec6c3a7941c8b3c79ba1fd
was necessary to fix some MTU issues on a i.MX51. Not sure if it's
upstream yet or not.

Cheers,

  -ilia

Re: multi-queue over IFF_NO_QUEUE "virtual" devices

2017-08-30 Thread Florian Fainelli

On 08/30/2017 04:37 PM, Cong Wang wrote:
> On Tue, Aug 29, 2017 at 8:49 PM, Florian Fainelli  
> wrote:
>> Le 08/07/17 à 15:26, Florian Fainelli a écrit :
>>> Hi,
>>>
>>> Most DSA supported Broadcom switches have multiple queues per ports
>>> (usually 8) and each of these queues can be configured with different
>>> pause, drop, hysteresis thresholds and so on in order to make use of the
>>> switch's internal buffering scheme and have some queues achieve some
>>> kind of lossless behavior (e.g: LAN to LAN traffic for Q7 has a higher
>>> priority than LAN to WAN for Q0).
>>>
>>> This is obviously very workload specific, so I'd want maximum
>>> programmability as much as possible.
>>>
>>> This brings me to a few questions:
>>>
>>> 1) If we have the DSA slave network devices currently flagged with
>>> IFF_NO_QUEUE becoming multi-queue (on TX) aware such that an application
>>> can control exactly which switch egress queue is used on a per-flow
>>> basis, would that be a problem (this is the dynamic selection of the TX
>>> queue)?
>>
>> So I have this part figured out, with a bunch of changes network devices
>> created by DSA are now multiqueue aware and the Broadcom tag layer is
>> capable of extracting the queue index, passing it in the tag where
>> expected and having the switch forward to the appropriate switch port
>> and queue within that port. It also sets the queue mapping in the SKB
>> for later consumption by the master network device driver: bcmsysport.c
>> because of 2).
>>
>>>
>>> 2) The conduit interface (CPU) port network interface has a congestion
>>> control scheme which requires each of its TX queues (32 or 16) to be
>>> statically mapped to each of the underlying switch port queues because
>>> the congestion/ HW needs to inspect the queue depths of the switch to
>>> accept/reject a packet at the CPU's TX ring level. Do we have a good way
>>> with tc to map a virtual/stacked device's queue(s) on-top of its
>>> physical/underlying device's queues (this is the static queue mapping
>>> necessary for congestion to work)?
>>
>> That part I have not figured out yet, with some static mapping I can
>> obtain the results that I want and was even considering the possibility
>> of doing something like this:
>>
>> - register a network device notifier with bcmsysport.c (master network
>> device) for this setup
>> - expose a helper function allowing me to obtain a given DSA network
>> device port index
>> - whenever DSA creates network devices reconfigure the ring and queue
>> mapping of the TX queues managed by bcmsysport.c with the DSA network
>> device port index that has just been registered and just do a 1-1
>> mapping of the 8 queues
>>
>> You would end-up with something like:
>>
>> gphy (port 0) queues 0-7 mapped to systemport queues 0-7
>> rgmii_1 (port 1) queues 0-7 mapped to systemport queues 8-15
>> rgmii_2 (port 2) queues 0-7 mapped to systemport queues 16 through 23
>> moca (port 7) queues 0-7 mapped to systemport queues 24-31
>>
>> This should be working because bcmsysport's TX queues are not under
>> direct control by the user, they are used via DSA created network
>> devices which indicate the queue they want to use. When the DSA
>> interfaces are brought down, their respective systemport queues now
>> become unused. This also works because the number of physical ports of
>> the switch times the number of queues is matching the number of TX
>> queues from systemport (like if someone designed it with that exact
>> purpose in mind ;)).
>>
>> The only problem with that approach of course is that it embeds a policy
>> within the systemport driver.
>>
>> Ideally I would really like to configure this via tc by setting up a
>> mapping between queues of one network devices to queues of another
>> network device, is that a possible thing, Jamal, Cong, Jiri, do you know?
> 
> I am not sure if I understand the mapping you are talking about here.
> 
> TC layer rarely deals with hardware queues directly (except probably mq),
> so this question probably doesn't belong to TC.
> 
> OTOH, TC can modify skb->hash, so you can redirect packets to a specific
> queue, but this doesn't sound like what you are looking for.

I am actually building on TC being able to influence the value of
skb->queue_mapping, but that is just for the stacked devices, not the
underlying conduit device that does the actual transmission.

> 
> Maybe Jiri has more thoughts here since he works on TC offloading things.
> 

Patches with explanations and context (hopefully clearer) here:

http://patchwork.ozlabs.org/project/netdev/list/?series=728

Thanks!
-- 
Florian

[RFC net-next 6/8] net: dsa: Expose dsa_slave_dev_check and dsa_slave_dev_port_num

2017-08-30 Thread Florian Fainelli

Expose two helper functions:
* one to verify if a net_device is a DSA slave network device
* one to obtain the physical port number associated with a DSA slave
  network device

Signed-off-by: Florian Fainelli 
---
 include/net/dsa.h | 15 +++
 net/dsa/slave.c   | 12 +---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index b10e8da3f8d7..649bd06f9fe4 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -459,6 +459,21 @@ static inline bool netdev_uses_dsa(struct net_device *dev)
return false;
 }
 
+#if IS_ENABLED(CONFIG_NET_DSA)
+bool dsa_slave_dev_check(struct net_device *dev);
+unsigned int dsa_slave_dev_port_num(struct net_device *dev);
+#else
+static inline bool dsa_slave_dev_check(struct net_device *dev)
+{
+   return false;
+}
+
+static inline unsigned int dsa_slave_dev_port_num(struct net_device *dev)
+{
+   return DSA_MAX_PORTS;
+}
+#endif
+
 struct dsa_switch *dsa_switch_alloc(struct device *dev, size_t n);
 void dsa_unregister_switch(struct dsa_switch *ds);
 int dsa_register_switch(struct dsa_switch *ds);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index bfd7173a3c6a..302ae3326e3a 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -25,8 +25,6 @@
 
 #include "dsa_priv.h"
 
-static bool dsa_slave_dev_check(struct net_device *dev);
-
 /* slave mii_bus handling ***/
 static int dsa_slave_phy_read(struct mii_bus *bus, int addr, int reg)
 {
@@ -1346,10 +1344,18 @@ void dsa_slave_destroy(struct net_device *slave_dev)
free_netdev(slave_dev);
 }
 
-static bool dsa_slave_dev_check(struct net_device *dev)
+bool dsa_slave_dev_check(struct net_device *dev)
 {
return dev->netdev_ops == &dsa_slave_netdev_ops;
 }
+EXPORT_SYMBOL_GPL(dsa_slave_dev_check);
+
+unsigned int dsa_slave_dev_port_num(struct net_device *dev)
+{
+   struct dsa_slave_priv *p = netdev_priv(dev);
+   return p->dp->index;
+}
+EXPORT_SYMBOL_GPL(dsa_slave_dev_port_num);
 
 static int dsa_slave_changeupper(struct net_device *dev,
 struct netdev_notifier_changeupper_info *info)
-- 
1.9.1

[RFC net-next 7/8] net: dsa: tag_brcm: Indicate to master netdevice port + queue

2017-08-30 Thread Florian Fainelli

We need to tell the DSA master network device doing the actual
transmission what the desired switch port and queue number is for it to
resolve that to the internal transmit queue it is mapped to.

Signed-off-by: Florian Fainelli 
---
 net/dsa/tag_brcm.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index dbb016434ace..19a617b09b3c 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -86,6 +86,11 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, 
struct net_device *dev
brcm_tag[2] = BRCM_IG_DSTMAP2_MASK;
brcm_tag[3] = (1 << p->dp->index) & BRCM_IG_DSTMAP1_MASK;
 
+   /* Now tell the master network device about the desired output queue
+* as well
+*/
+   skb_set_queue_mapping(skb, p->dp->index << dev->num_tx_queues | queue);
+
return skb;
 }
 
-- 
1.9.1
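
A user-space sketch of the encoding this patch relies on, not the kernel
code itself: the tagger packs the switch port above the queue number in the
16-bit queue mapping, and the master driver later unpacks both. The helper
names and the fixed shift of 8 are assumptions for illustration; the patch
shifts by dev->num_tx_queues, which this series sets to 8 for bcm_sf2.

```c
#include <stdint.h>

/* Assumption: 8 queues per port, so the shift count is 8. */
#define SKETCH_QUEUES_PER_PORT 8u

/* Pack port and queue in the same shape as the patch: port << 8 | queue. */
static uint16_t sketch_encode_queue_mapping(unsigned int port,
					    unsigned int queue)
{
	return (uint16_t)((port << SKETCH_QUEUES_PER_PORT) | queue);
}

/* Low byte carries the switch egress queue. */
static unsigned int sketch_decode_queue(uint16_t mapping)
{
	return mapping & 0xffu;
}

/* Bits above the low byte carry the switch port. */
static unsigned int sketch_decode_port(uint16_t mapping)
{
	return mapping >> SKETCH_QUEUES_PER_PORT;
}
```

Note that the shift count here is the number of queues, not the number of
bits needed to hold a queue index. It lines up with the 0xff mask on the
decode side only because that number happens to be 8.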

[RFC net-next 8/8] net: systemport: Establish DSA network device queue mapping

2017-08-30 Thread Florian Fainelli

Establish a queue mapping between the DSA slave network device queues
created, and the transmit queue that SYSTEMPORT manages.

We need to configure the SYSTEMPORT transmit queue with the switch port
number and switch port queue number in order for the switch and
SYSTEMPORT hardware to utilize the out of band congestion notification.
This hardware mechanism works by looking at the switch port egress queue
and determining whether there are enough buffers for this queue, with that
class of service, for a successful transmission; if not, it backpressures
the SYSTEMPORT queue that is being used.

For this to work, we register a network device notifier that listens for
the registration of DSA network devices, and when that happens, extracts
the number of queues for these devices and their associated port number,
remembers that in the driver private structure and linearly maps those
queues to TX rings/queues that we manage.

This scheme works because DSA slave network devices always transmit
through SYSTEMPORT so when DSA slave network devices are
destroyed/brought down, the corresponding SYSTEMPORT queues are no
longer used. Also, by design of the DSA framework, the master network
device (SYSTEMPORT) is registered first.

For faster lookups we use an array of up to DSA_MAX_PORTS * number
of queues per port, and then map pointers to bcm_sysport_tx_ring such
that our ndo_select_queue() implementation can just index into that
array to locate the corresponding ring index.

Here is an example mapping with this code:

P0,Q0 -> Q0
..
P0,Q7 -> Q7
P1,Q0 -> Q8
..
P1,Q7 -> Q15
P2,Q0 -> Q16
..
P2,Q7 -> Q23
P7,Q0 -> Q24
..
P7,Q7 -> Q31
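
The mapping above can be sketched in user space as follows. This is an
illustration under assumptions, not the driver code: the names are
hypothetical, the table holds ring indices rather than pointers to
bcm_sysport_tx_ring, and 12 ports, 8 queues per port and 32 rings are
assumed sizes. Rings are handed out in registration order, which is why
port 7 ends up on rings 24-31 when ports 0, 1, 2 and 7 register in turn.

```c
#include <stddef.h>

#define SKETCH_MAX_PORTS 12u	/* assumed upper bound on switch ports */
#define SKETCH_QUEUES    8u	/* assumed queues per switch port */
#define SKETCH_NUM_RINGS 32u	/* assumed TX rings on the master device */

struct sketch_priv {
	int ring_map[SKETCH_MAX_PORTS * SKETCH_QUEUES];	/* -1 means unmapped */
	unsigned int next_ring;				/* next free TX ring */
};

static void sketch_init(struct sketch_priv *p)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_PORTS * SKETCH_QUEUES; i++)
		p->ring_map[i] = -1;
	p->next_ring = 0;
}

/* Called when a slave device for 'port' registers; returns queues mapped. */
static unsigned int sketch_map_port(struct sketch_priv *p, unsigned int port)
{
	unsigned int q, mapped = 0;

	if (port >= SKETCH_MAX_PORTS)
		return 0;

	for (q = 0; q < SKETCH_QUEUES && p->next_ring < SKETCH_NUM_RINGS; q++) {
		p->ring_map[q + port * SKETCH_QUEUES] = (int)p->next_ring++;
		mapped++;
	}
	return mapped;
}

/* Constant-time lookup, in the spirit of the ndo_select_queue() path. */
static int sketch_lookup(const struct sketch_priv *p, unsigned int port,
			 unsigned int q)
{
	if (port >= SKETCH_MAX_PORTS || q >= SKETCH_QUEUES)
		return -1;
	return p->ring_map[q + port * SKETCH_QUEUES];
}
```

Indexing the table by q + port * queues keeps the lookup a single array
access, at the cost of leaving holes for ports that never register.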

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 100 +++--
 drivers/net/ethernet/broadcom/bcmsysport.h |  11 +++-
 2 files changed, 106 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index 931751e4f369..eed4c3f672d7 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1387,7 +1387,14 @@ static int bcm_sysport_init_tx_ring(struct 
bcm_sysport_priv *priv,
tdma_writel(priv, 0, TDMA_DESC_RING_COUNT(index));
tdma_writel(priv, 1, TDMA_DESC_RING_INTR_CONTROL(index));
tdma_writel(priv, 0, TDMA_DESC_RING_PROD_CONS_INDEX(index));
-   tdma_writel(priv, RING_IGNORE_STATUS, TDMA_DESC_RING_MAPPING(index));
+
+   /* Configure QID and port mapping */
+   reg = tdma_readl(priv, TDMA_DESC_RING_MAPPING(index));
+   reg &= ~(RING_QID_MASK | RING_PORT_ID_MASK << RING_PORT_ID_SHIFT);
+   reg |= ring->switch_queue & RING_QID_MASK;
+   reg |= ring->switch_port << RING_PORT_ID_SHIFT;
+   reg |= RING_IGNORE_STATUS;
+   tdma_writel(priv, reg, TDMA_DESC_RING_MAPPING(index));
tdma_writel(priv, 0, TDMA_DESC_RING_PCP_DEI_VID(index));
 
/* Program the number of descriptors as MAX_THRESHOLD and half of
@@ -1405,8 +1412,9 @@ static int bcm_sysport_init_tx_ring(struct 
bcm_sysport_priv *priv,
napi_enable(&ring->napi);
 
netif_dbg(priv, hw, priv->netdev,
- "TDMA cfg, size=%d, desc_cpu=%p\n",
- ring->size, ring->desc_cpu);
+ "TDMA cfg, size=%d, desc_cpu=%p switch q=%d,port=%d\n",
+ ring->size, ring->desc_cpu, ring->switch_queue,
+ ring->switch_port);
 
return 0;
 }
@@ -1987,6 +1995,78 @@ static int bcm_sysport_stop(struct net_device *dev)
.set_link_ksettings = phy_ethtool_set_link_ksettings,
 };
 
+static u16 bcm_sysport_select_queue(struct net_device *dev, struct sk_buff 
*skb,
+   void *accel_priv,
+   select_queue_fallback_t fallback)
+{
+   struct bcm_sysport_priv *priv = netdev_priv(dev);
+   u16 queue = skb_get_queue_mapping(skb);
+   struct bcm_sysport_tx_ring *tx_ring;
+   unsigned int q, port;
+
+   if (!netdev_uses_dsa(dev))
+   return fallback(dev, skb);
+
+   /* DSA tagging layer will have configured the correct queue */
+   q = queue & 0xff;
+   port = queue >> priv->per_port_num_tx_queues;
+   tx_ring = priv->ring_map[q + port * priv->per_port_num_tx_queues];
+
+   return tx_ring->index;
+}
+
+static int bcm_sysport_map_queues(struct bcm_sysport_priv *priv,
+ struct net_device *slave_dev)
+{
+   struct net_device *dev = priv->netdev;
+   struct bcm_sysport_tx_ring *ring;
+   unsigned int num_tx_queues;
+   unsigned int q, start, port;
+
+   port = dsa_slave_dev_port_num(slave_dev);
+   num_tx_queues = slave_dev->num_tx_queues;
+
+   if (priv->per_port_num_tx_queues &&
+   priv->per_port_num_tx_queues != num_tx_queues)
+   netdev_warn(slave_dev, "asymmetric number of per-port queues\n");
+
+   priv->per_port_num_tx_queues =

[RFC net-next 5/8] net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278

2017-08-30 Thread Florian Fainelli

BCM7278 has only 128 entries while BCM7445 has the full 256 entries set,
fix that.

Fixes: 7318166cacad ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc")
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 4 
 drivers/net/dsa/bcm_sf2.h | 1 +
 drivers/net/dsa/bcm_sf2_cfp.c | 8 
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index fc9f9f171e55..af299fe343cc 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1043,6 +1043,7 @@ struct bcm_sf2_of_data {
u32 type;
const u16 *reg_offsets;
unsigned int core_reg_align;
+   unsigned int num_cfp_rules;
 };
 
 /* Register offsets for the SWITCH_REG_* block */
@@ -1066,6 +1067,7 @@ struct bcm_sf2_of_data {
.type   = BCM7445_DEVICE_ID,
.core_reg_align = 0,
.reg_offsets= bcm_sf2_7445_reg_offsets,
+   .num_cfp_rules  = 256,
 };
 
 static const u16 bcm_sf2_7278_reg_offsets[] = {
@@ -1088,6 +1090,7 @@ struct bcm_sf2_of_data {
.type   = BCM7278_DEVICE_ID,
.core_reg_align = 1,
.reg_offsets= bcm_sf2_7278_reg_offsets,
+   .num_cfp_rules  = 128,
 };
 
 static const struct of_device_id bcm_sf2_of_match[] = {
@@ -1144,6 +1147,7 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev)
priv->type = data->type;
priv->reg_offsets = data->reg_offsets;
priv->core_reg_align = data->core_reg_align;
+   priv->num_cfp_rules = data->num_cfp_rules;
 
/* Auto-detection using standard registers will not work, so
 * provide an indication of what kind of device we are for
diff --git a/drivers/net/dsa/bcm_sf2.h b/drivers/net/dsa/bcm_sf2.h
index d9c96b281fc0..02c499f9c56b 100644
--- a/drivers/net/dsa/bcm_sf2.h
+++ b/drivers/net/dsa/bcm_sf2.h
@@ -72,6 +72,7 @@ struct bcm_sf2_priv {
u32 type;
const u16   *reg_offsets;
unsigned intcore_reg_align;
+   unsigned intnum_cfp_rules;
 
/* spinlock protecting access to the indirect registers */
spinlock_t  indir_lock;
diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index 2fb32d67065f..8a1da7e67707 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -98,7 +98,7 @@ static inline void bcm_sf2_cfp_rule_addr_set(struct 
bcm_sf2_priv *priv,
 {
u32 reg;
 
-   WARN_ON(addr >= CFP_NUM_RULES);
+   WARN_ON(addr >= priv->num_cfp_rules);
 
reg = core_readl(priv, CORE_CFP_ACC);
reg &= ~(XCESS_ADDR_MASK << XCESS_ADDR_SHIFT);
@@ -109,7 +109,7 @@ static inline void bcm_sf2_cfp_rule_addr_set(struct 
bcm_sf2_priv *priv,
 static inline unsigned int bcm_sf2_cfp_rule_size(struct bcm_sf2_priv *priv)
 {
/* Entry #0 is reserved */
-   return CFP_NUM_RULES - 1;
+   return priv->num_cfp_rules - 1;
 }
 
 static int bcm_sf2_cfp_rule_set(struct dsa_switch *ds, int port,
@@ -523,7 +523,7 @@ static int bcm_sf2_cfp_rule_get_all(struct bcm_sf2_priv 
*priv,
if (!(reg & OP_STR_DONE))
break;
 
-   } while (index < CFP_NUM_RULES);
+   } while (index < priv->num_cfp_rules);
 
/* Put the TCAM size here */
nfc->data = bcm_sf2_cfp_rule_size(priv);
@@ -544,7 +544,7 @@ int bcm_sf2_get_rxnfc(struct dsa_switch *ds, int port,
case ETHTOOL_GRXCLSRLCNT:
/* Subtract the default, unusable rule */
nfc->rule_cnt = bitmap_weight(priv->cfp.used,
- CFP_NUM_RULES) - 1;
+ priv->num_cfp_rules) - 1;
/* We support specifying rule locations */
nfc->data |= RX_CLS_LOC_SPECIAL;
break;
-- 
1.9.1

[RFC net-next 4/8] net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping

2017-08-30 Thread Florian Fainelli

Even though TC2QOS mapping is for switch egress queues, we need to
configure it correctly in order for the Broadcom tag ingress (CPU ->
switch) queue selection to work correctly since there is a 1:1 mapping
between switch egress queues and ingress queues.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 3f1ad9d5d7c5..fc9f9f171e55 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -103,6 +103,7 @@ static void bcm_sf2_brcm_hdr_setup(struct bcm_sf2_priv 
*priv, int port)
 static void bcm_sf2_imp_setup(struct dsa_switch *ds, int port)
 {
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
+   unsigned int i;
u32 reg, offset;
 
if (priv->type == BCM7445_DEVICE_ID)
@@ -129,6 +130,14 @@ static void bcm_sf2_imp_setup(struct dsa_switch *ds, int 
port)
reg |= MII_DUMB_FWDG_EN;
core_writel(priv, reg, CORE_SWITCH_CTRL);
 
+   /* Configure Traffic Class to QoS mapping, allow each priority to map
+* to a different queue number
+*/
+   reg = core_readl(priv, CORE_PORT_TC2_QOS_MAP_PORT(port));
+   for (i = 0; i < 8; i++)
+   reg |= i << (PRT_TO_QID_SHIFT * i);
+   core_writel(priv, reg, CORE_PORT_TC2_QOS_MAP_PORT(port));
+
bcm_sf2_brcm_hdr_setup(priv, port);
 
/* Force link status for IMP port */
-- 
1.9.1
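
As a worked example of the loop in this patch, here is a user-space sketch
that builds the same register value. The field width of 3 bits per traffic
class is an assumption standing in for PRT_TO_QID_SHIFT, and the function
names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PRT_TO_QID_SHIFT 3u	/* assumed bits per traffic class field */
#define SKETCH_NUM_TC           8u	/* traffic classes 0..7 */

/* OR the identity mapping (TC i -> queue i) into reg, as the patch does. */
static uint32_t sketch_tc2qos_identity_map(uint32_t reg)
{
	unsigned int i;

	for (i = 0; i < SKETCH_NUM_TC; i++)
		reg |= (uint32_t)i << (SKETCH_PRT_TO_QID_SHIFT * i);
	return reg;
}

/* Read back the queue programmed for one traffic class. */
static unsigned int sketch_tc2qos_queue_for(uint32_t reg, unsigned int tc)
{
	return (reg >> (SKETCH_PRT_TO_QID_SHIFT * tc)) & 0x7u;
}
```

Starting from a zeroed register this yields 0xFAC688, which reads as
76543210 in octal, one digit per traffic class. Because the loop only ORs
bits in, the result is the identity mapping only if the fields were zero
when the register was read.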

[RFC net-next 3/8] net: dsa: bcm_sf2: Advertise number of egress queues

2017-08-30 Thread Florian Fainelli

The switch supports 8 egress queues per port, so indicate that such that
net/dsa/slave.c::dsa_slave_create can allocate the right number of TX
queues.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 8492c9d64004..3f1ad9d5d7c5 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1147,6 +1147,9 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev)
ds = dev->ds;
ds->ops = &bcm_sf2_ops;
 
+   /* Advertise the 8 egress queues */
+   ds->num_tx_queues = 8;
+
dev_set_drvdata(&pdev->dev, priv);
 
spin_lock_init(&priv->indir_lock);
-- 
1.9.1

[RFC net-next 2/8] net: dsa: tag_brcm: Set output queue from skb queue mapping

2017-08-30 Thread Florian Fainelli

We originally used skb->priority but that was not quite correct as this
bitfield needs to contain the egress switch queue we intend to send this
SKB to.

Signed-off-by: Florian Fainelli 
---
 net/dsa/tag_brcm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index de74c3f77818..dbb016434ace 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -62,6 +62,7 @@
 static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, struct net_device 
*dev)
 {
struct dsa_slave_priv *p = netdev_priv(dev);
+   u16 queue = skb_get_queue_mapping(skb);
u8 *brcm_tag;
 
if (skb_cow_head(skb, BRCM_TAG_LEN) < 0)
@@ -78,7 +79,7 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, 
struct net_device *dev
 * deprecated
 */
brcm_tag[0] = (1 << BRCM_OPCODE_SHIFT) |
-   ((skb->priority << BRCM_IG_TC_SHIFT) & BRCM_IG_TC_MASK);
+  ((queue & BRCM_IG_TC_MASK) << BRCM_IG_TC_SHIFT);
brcm_tag[1] = 0;
brcm_tag[2] = 0;
if (p->dp->index == 8)
-- 
1.9.1
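
Besides switching from skb->priority to the queue mapping, the diff above
also moves the mask inside the shift. A user-space sketch of the first tag
byte before and after shows the difference. The constant values (opcode
shift 5, TC shift 2, TC mask 0x7) are assumptions for illustration, and the
function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_OPCODE_SHIFT 5u	/* assumed value of BRCM_OPCODE_SHIFT */
#define SKETCH_IG_TC_SHIFT  2u	/* assumed value of BRCM_IG_TC_SHIFT */
#define SKETCH_IG_TC_MASK   0x7u	/* assumed value of BRCM_IG_TC_MASK */

/* Old form: shift first, then mask, so only bit 2 of the field survives. */
static uint8_t sketch_tag0_old(unsigned int priority)
{
	return (uint8_t)((1u << SKETCH_OPCODE_SHIFT) |
			 ((priority << SKETCH_IG_TC_SHIFT) & SKETCH_IG_TC_MASK));
}

/* New form: mask to 3 bits first, then shift into place. */
static uint8_t sketch_tag0_new(unsigned int queue)
{
	return (uint8_t)((1u << SKETCH_OPCODE_SHIFT) |
			 ((queue & SKETCH_IG_TC_MASK) << SKETCH_IG_TC_SHIFT));
}
```

With these assumed values, queue 5 gives 0x34 in the new form, whereas the
old expression gives 0x24 for priority 5 because the two upper bits of the
shifted value are masked away.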

[RFC net-next 1/8] net: dsa: Allow switch drivers to indicate number of RX/TX queues

2017-08-30 Thread Florian Fainelli

Let switch drivers indicate how many RX and TX queues they support. Some
switches, such as Broadcom Starfighter 2 are designed with 8 egress
queues. Future changes will allow us to leverage the queue mapping and
direct the transmission towards a particular queue.

Signed-off-by: Florian Fainelli 
---
 include/net/dsa.h |  4 
 net/dsa/slave.c   | 10 --
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 398ca8d70ccd..b10e8da3f8d7 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -243,6 +243,10 @@ struct dsa_switch {
/* devlink used to represent this switch device */
struct devlink  *devlink;
 
+   /* Number of switch port queues */
+   unsigned intnum_rx_queues;
+   unsigned intnum_tx_queues;
+
/* Dynamically allocated ports, keep last */
size_t num_ports;
struct dsa_port ports[];
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 78e78a6e6833..bfd7173a3c6a 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1259,8 +1259,14 @@ int dsa_slave_create(struct dsa_port *port, const char 
*name)
cpu_dp = ds->dst->cpu_dp;
master = cpu_dp->netdev;
 
-   slave_dev = alloc_netdev(sizeof(struct dsa_slave_priv), name,
-NET_NAME_UNKNOWN, ether_setup);
+   if (!ds->num_rx_queues)
+   ds->num_rx_queues = 1;
+   if (!ds->num_tx_queues)
+   ds->num_tx_queues = 1;
+
+   slave_dev = alloc_netdev_mqs(sizeof(struct dsa_slave_priv), name,
+NET_NAME_UNKNOWN, ether_setup,
+ds->num_tx_queues, ds->num_rx_queues);
if (slave_dev == NULL)
return -ENOMEM;
 
-- 
1.9.1

[RFC net-next 0/8] net: dsa: Multi-queue awareness

2017-08-30 Thread Florian Fainelli

This patch series is sent as reference, especially because the last patch
tries not to create too many layering violations, but clearly a few are
being created here anyway.

Essentially what I am trying to achieve is that you have a stacked device which
is multi-queue aware, that applications will be using, and for which they can
control the queue selection (using mq) the way they want. One of these stacked
network devices is created for each port of the switch (this is what DSA
does). When a skb is submitted from say net_device X, we can derive its port
number and look at the queue_mapping value to determine which port of the
switch and queue we should be sending this to. The information is embedded in a
tag (4 bytes) and is used by the switch to steer the transmission.

These stacked devices will actually transmit using a "master" or conduit
network device which has a number of queues as well. In one version of the
hardware that I work with, we have up to 4 ports, each with 8 queues, and the
master device has a total of 32 hardware queues, so a 1:1 mapping is easy. With
another version of the hardware, same number of ports and queues, but only 16
hardware queues, so only a 2:1 mapping is possible.

In order for congestion information to work properly, I need to establish a
mapping, preferably before transmission starts (but reconfiguration while
interfaces are running would be possible too) between these stacked devices'
queues and the conduit interface's queues.

Comments, flames, rotten tomatoes, anything!

Florian Fainelli (8):
  net: dsa: Allow switch drivers to indicate number of RX/TX queues
  net: dsa: tag_brcm: Set output queue from skb queue mapping
  net: dsa: bcm_sf2: Advertise number of egress queues
  net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
  net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278
  net: dsa: Expose dsa_slave_dev_check and dsa_slave_dev_port_num
  net: dsa: tag_brcm: Indicate to master netdevice port + queue
  net: systemport: Establish DSA network device queue mapping

 drivers/net/dsa/bcm_sf2.c  |  16 +
 drivers/net/dsa/bcm_sf2.h  |   1 +
 drivers/net/dsa/bcm_sf2_cfp.c  |   8 +--
 drivers/net/ethernet/broadcom/bcmsysport.c | 100 +++--
 drivers/net/ethernet/broadcom/bcmsysport.h |  11 +++-
 include/net/dsa.h  |  19 ++
 net/dsa/slave.c|  22 +--
 net/dsa/tag_brcm.c |   8 ++-
 8 files changed, 170 insertions(+), 15 deletions(-)

-- 
1.9.1

Re: DSA mv88e6xxx RX frame errors and TCP/IP RX failure

2017-08-30 Thread Tim Harvey

On Wed, Aug 30, 2017 at 3:06 PM, Andrew Lunn  wrote:
> On Wed, Aug 30, 2017 at 12:53:56PM -0700, Tim Harvey wrote:
>> Greetings,
>>
>> I'm seeing RX frame errors when using the mv88e6xxx DSA driver on
>> 4.13-rc7. The board I'm using is a GW5904 [1] which has an IMX6 FEC
>> MAC (eth0) connected via RGMII to a MV88E6176 with its downstream
>> P0/P1/P2/P3 to front panel RJ45's (lan1-lan4).
>
> Hi Tim
>
> Can you confirm the counter is this one:
>
>/* Report late collisions as a frame error. */
> if (status & (BD_ENET_RX_NO | BD_ENET_RX_CL))
> ndev->stats.rx_frame_errors++;
>
> I don't see anywhere else frame errors are counted, but it would be
> good to prove we are looking in the right place.
>

Andrew,

(adding IMX FEC driver maintainer to CC)

Yes, that's one of them being hit. It looks like ifconfig reports
'frame' as the accumulation of a few stats so here are some more
specifics from /sys/class/net/eth0/statistics:

root@xenial:/sys/devices/soc0/soc/210.aips-bus/2188000.ethernet/net/eth0/statistics#
for i in `ls rx_*`; do echo $i:$(cat $i); done
rx_bytes:103229
rx_compressed:0
rx_crc_errors:22
rx_dropped:0
rx_errors:22
rx_fifo_errors:0
rx_frame_errors:22
rx_length_errors:22
rx_missed_errors:0
rx_nohandler:0
rx_over_errors:0
rx_packets:1174
root@xenial:/sys/devices/soc0/soc/210.aips-bus/2188000.ethernet/net/eth0/statistics#
ifconfig eth0
eth0  Link encap:Ethernet  HWaddr 00:D0:12:41:F3:E7
  inet6 addr: fe80::2d0:12ff:fe41:f3e7/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:1207 errors:22 dropped:0 overruns:0 frame:66
  TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:106009 (103.5 KiB)  TX bytes:4604 (4.4 KiB)

Instrumenting fec driver I see the following getting hit:

status & BD_ENET_RX_LG /* rx_length_errors: Frame too long */
status & BD_ENET_RX_CR  /* rx_crc_errors: CRC Error */
status & BD_ENET_RX_CL /* rx_frame_errors: Collision? */

Is this a frame size issue where the MV88E6176 is sending frames down
that exceed the MTU because of headers added?

Tim
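
The frame:66 figure in the ifconfig output is consistent with the sysfs
counters if the "frame" column is a sum of several error counters. A small
sketch of that arithmetic, assuming the column is built from
rx_length_errors + rx_over_errors + rx_crc_errors + rx_frame_errors (my
understanding of how the kernel fills that column in /proc/net/dev; the
struct and function names here are hypothetical):

```c
/* Subset of the per-device RX error counters shown in sysfs above. */
struct sketch_rx_errors {
	unsigned long length_errors;
	unsigned long over_errors;
	unsigned long crc_errors;
	unsigned long frame_errors;
};

/* Assumed composition of the "frame" column reported by ifconfig. */
static unsigned long sketch_frame_column(const struct sketch_rx_errors *s)
{
	return s->length_errors + s->over_errors +
	       s->crc_errors + s->frame_errors;
}
```

Plugging in the values from the listing (22, 0, 22, 22) gives 66, matching
the ifconfig line.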

Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-08-30 Thread Florian Fainelli

On 08/30/2017 05:13 PM, David Daney wrote:
> On 07/31/2017 05:28 PM, David Miller wrote:
>> From: Florian Fainelli 
>> Date: Fri, 28 Jul 2017 11:58:36 -0700
>>
>>> Marc reported that he was not getting the PHY library adjust_link()
>>> callback function to run when calling phy_stop() + phy_disconnect()
>>> which does not indeed happen because we set the state machine to
>>> PHY_HALTED but we don't get to run it to process this state past that
>>> point.
>>>
>>> Fix this with a synchronous call to phy_state_machine() in order to have
>>> the state machine actually act on PHY_HALTED, set the PHY device's link
>>> down, turn the network device's carrier off and finally call the
>>> adjust_link() function.
>>>
>>> Reported-by: Marc Gonzalez 
>>> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
>>> Signed-off-by: Florian Fainelli 
>>> ---
>>> Changes in v2:
>>>
>>> - reword subject and commit message based on changes
>>> - dropped flush_scheduled_work() since it is redundant
>>
>> Applied and queued up for -stable, thanks.
>>
> 
> 
> This is broken.  Please revert.

This has been causing problems for Geert as well, 2 vs 1, Marc, you lose,
I will send a revert for this shortly, sorry about that.

> 
> Upstream commit 7ad813f20853 and in the stable branches as well.
> 
> When ndo_stop() is called we call:
> 
> 
>  phy_disconnect()
> +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
> +---> phy_stop_machine()
> |  +---> phy_stop_machine()
> |  +> queue_delayed_work(): Work queued.
> +--->phy_detach() implies: phydev->attached_dev = NULL;
> 
> Now at a later time the queued work does:
> 
>  phy_state_machine()
> +>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:
> 
> 
>  CPU 12 Unable to handle kernel paging request at virtual address
> 0048, epc == 80de37ec, ra == 80c7c
> Oops[#1]:
> CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
> Workqueue: events_power_efficient phy_state_machine
> task: 8004021ed100 task.stack: 800409d7
> $ 0   :  84720060 0048 0004
> $ 4   :  0001 0004 
> $ 8   :   98f3 
> $12   : 800409d73fe0 9c00 846547c8 af3b
> $16   : 8004096bab68 8004096babd0  8004096ba800
> $20   :   8109 0008
> $24   : 0061 808637b0
> $28   : 800409d7 800409d73cf0 8000271bd300 80c7804c
> Hi: 002a
> Lo: 003f
> epc   : 80de37ec netif_carrier_off+0xc/0x58
> ra: 80c7804c phy_state_machine+0x48c/0x4f8
> Status: 14009ce3KX SX UX KERNEL EXL IE
> Cause : 0088 (ExcCode 02)
> BadVA : 0048
> PrId  : 000d9501 (Cavium Octeon III)
> Modules linked in:
> Process kworker/12:1 (pid: 1502, threadinfo=800409d7,
> task=8004021ed100, tls=)
> Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00
>  808a1708 800409a54000 8000271bd300
> 8000271bd320 800409a54030 80ff0f00 0001
> 8109 808a1ac0 800402182080 8465
> 800402182080 8465 80ff 800409a54000
> 808a1970  8004099e8000 800402099240
>  808a8598  800408eeeb00
> 800409a54000 810a1d00  800409d73de8
> 800409d73de8 0088 0c009c00 800409d73e08
> 800409d73e08 800402182080 808a84d0 800402182080
> ...
> Call Trace:
> [] netif_carrier_off+0xc/0x58
> [] phy_state_machine+0x48c/0x4f8
> [] process_one_work+0x158/0x368
> [] worker_thread+0x150/0x4c0
> [] kthread+0xc8/0xe0
> [] ret_from_kernel_thread+0x14/0x1c


-- 
Florian

Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-08-30 Thread David Daney


And of course I mess up my pretty picture, see below.

On 08/30/2017 05:13 PM, David Daney wrote:

On 07/31/2017 05:28 PM, David Miller wrote:

From: Florian Fainelli 
Date: Fri, 28 Jul 2017 11:58:36 -0700


Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.

Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.

Reported-by: Marc Gonzalez 
Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli 
---
Changes in v2:

- reword subject and commit message based on changes
- dropped flush_scheduled_work() since it is redundant


Applied and queued up for -stable, thanks.




This is broken.  Please revert.

Upstream commit 7ad813f20853 and in the stable branches as well.

When ndo_stop() is called we call:


  phy_disconnect()
 +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
 +---> phy_stop_machine()
 |  +---> phy_stop_machine()


s/phy_stop_machine/phy_state_machine/

The call that the offending patch adds.



 |  +> queue_delayed_work(): Work queued.
 +--->phy_detach() implies: phydev->attached_dev = NULL;

Now at a later time the queued work does:

  phy_state_machine()
 +>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:


  CPU 12 Unable to handle kernel paging request at virtual address
0048, epc == 80de37ec, ra == 80c7c
Oops[#1]:
CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
Workqueue: events_power_efficient phy_state_machine
task: 8004021ed100 task.stack: 800409d7
$ 0   :  84720060 0048 0004
$ 4   :  0001 0004 
$ 8   :   98f3 
$12   : 800409d73fe0 9c00 846547c8 af3b
$16   : 8004096bab68 8004096babd0  8004096ba800
$20   :   8109 0008
$24   : 0061 808637b0
$28   : 800409d7 800409d73cf0 8000271bd300 80c7804c
Hi: 002a
Lo: 003f
epc   : 80de37ec netif_carrier_off+0xc/0x58
ra: 80c7804c phy_state_machine+0x48c/0x4f8
Status: 14009ce3KX SX UX KERNEL EXL IE
Cause : 0088 (ExcCode 02)
BadVA : 0048
PrId  : 000d9501 (Cavium Octeon III)
Modules linked in:
Process kworker/12:1 (pid: 1502, threadinfo=800409d7,
task=8004021ed100, tls=)
Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00
  808a1708 800409a54000 8000271bd300
 8000271bd320 800409a54030 80ff0f00 0001
 8109 808a1ac0 800402182080 8465
 800402182080 8465 80ff 800409a54000
 808a1970  8004099e8000 800402099240
  808a8598  800408eeeb00
 800409a54000 810a1d00  800409d73de8
 800409d73de8 0088 0c009c00 800409d73e08
 800409d73e08 800402182080 808a84d0 800402182080

 ...
Call Trace:
[] netif_carrier_off+0xc/0x58
[] phy_state_machine+0x48c/0x4f8
[] process_one_work+0x158/0x368
[] worker_thread+0x150/0x4c0
[] kthread+0xc8/0xe0
[] ret_from_kernel_thread+0x14/0x1c

Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-08-30 Thread David Daney


On 07/31/2017 05:28 PM, David Miller wrote:

From: Florian Fainelli 
Date: Fri, 28 Jul 2017 11:58:36 -0700


Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.

Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.

Reported-by: Marc Gonzalez 
Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli 
---
Changes in v2:

- reword subject and commit message based on changes
- dropped flush_scheduled_work() since it is redundant


Applied and queued up for -stable, thanks.




This is broken.  Please revert.

Upstream commit 7ad813f20853 and in the stable branches as well.

When ndo_stop() is called we call:


 phy_disconnect()
+---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
+---> phy_stop_machine()
|  +---> phy_stop_machine()
|  +> queue_delayed_work(): Work queued.
+--->phy_detach() implies: phydev->attached_dev = NULL;

Now at a later time the queued work does:

 phy_state_machine()
+>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:


 CPU 12 Unable to handle kernel paging request at virtual address
0048, epc == 80de37ec, ra == 80c7c
Oops[#1]:
CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
Workqueue: events_power_efficient phy_state_machine
task: 8004021ed100 task.stack: 800409d7
$ 0   :  84720060 0048 0004
$ 4   :  0001 0004 
$ 8   :   98f3 
$12   : 800409d73fe0 9c00 846547c8 af3b
$16   : 8004096bab68 8004096babd0  8004096ba800
$20   :   8109 0008
$24   : 0061 808637b0
$28   : 800409d7 800409d73cf0 8000271bd300 80c7804c
Hi: 002a
Lo: 003f
epc   : 80de37ec netif_carrier_off+0xc/0x58
ra: 80c7804c phy_state_machine+0x48c/0x4f8
Status: 14009ce3KX SX UX KERNEL EXL IE
Cause : 0088 (ExcCode 02)
BadVA : 0048
PrId  : 000d9501 (Cavium Octeon III)
Modules linked in:
Process kworker/12:1 (pid: 1502, threadinfo=800409d7,
task=8004021ed100, tls=)
Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00
 808a1708 800409a54000 8000271bd300
8000271bd320 800409a54030 80ff0f00 0001
8109 808a1ac0 800402182080 8465
800402182080 8465 80ff 800409a54000
808a1970  8004099e8000 800402099240
 808a8598  800408eeeb00
800409a54000 810a1d00  800409d73de8
800409d73de8 0088 0c009c00 800409d73e08
800409d73e08 800402182080 808a84d0 800402182080
...
Call Trace:
[] netif_carrier_off+0xc/0x58
[] phy_state_machine+0x48c/0x4f8
[] process_one_work+0x158/0x368
[] worker_thread+0x150/0x4c0
[] kthread+0xc8/0xe0
[] ret_from_kernel_thread+0x14/0x1c
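
The ordering problem reported above can be reproduced in a small user-space
model. This is only a sketch under stated assumptions: every model_* name is
hypothetical, nothing here is phylib code, and the "queued work" is reduced to
a flag. It shows that a synchronous state machine run inside the stop path,
which re-queues the work, followed by a detach that clears attached_dev,
leaves a pending work item that would touch a NULL device.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; these are not the kernel structures. */
struct model_netdev {
	bool carrier_ok;
};

struct model_phy {
	struct model_netdev *attached_dev; /* cleared on detach */
	bool work_pending;                 /* models the queued delayed work */
};

/* Models the stop path: pending work is dropped, then (optionally) the
 * state machine runs once synchronously, as in the patch under discussion.
 * That run turns the carrier off and re-queues the work. */
static void model_stop_machine(struct model_phy *phy, bool sync_run)
{
	phy->work_pending = false;
	if (sync_run) {
		phy->attached_dev->carrier_ok = false;
		phy->work_pending = true;
	}
}

/* Models the detach step, which clears the back pointer. */
static void model_detach(struct model_phy *phy)
{
	phy->attached_dev = NULL;
}

/* Models the worker picking up the queued item later.
 * Returns -1 where the real code would dereference a NULL attached_dev. */
static int model_run_pending_work(struct model_phy *phy)
{
	if (!phy->work_pending)
		return 0;
	phy->work_pending = false;
	if (!phy->attached_dev)
		return -1;
	phy->attached_dev->carrier_ok = false;
	return 0;
}

/* Stop, detach, then let the worker run: the sequence from the report. */
static int model_disconnect(struct model_phy *phy, bool sync_run)
{
	model_stop_machine(phy, sync_run);
	model_detach(phy);
	return model_run_pending_work(phy);
}
```

In this model, model_disconnect(&phy, true) reports the late NULL access,
while model_disconnect(&phy, false) does not, because nothing is re-queued
after the cancel. It says nothing about which fix is appropriate for the
real driver.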

[PATCH net-next] devlink: Maintain consistency in mac field name

2017-08-30 Thread David Ahern

IPv4 name uses "destination ip" as does the IPv6 patch set.
Make the mac field consistent.

Signed-off-by: David Ahern 
---
 net/core/devlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 47931a202a0c..7d430c1d9c3e 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -31,7 +31,7 @@
 
 static struct devlink_dpipe_field devlink_dpipe_fields_ethernet[] = {
{
-   .name = "destination_mac",
+   .name = "destination mac",
.id = DEVLINK_DPIPE_FIELD_ETHERNET_DST_MAC,
.bitwidth = 48,
},
-- 
2.1.4

Re: [pull request][net 00/11] Mellanox, mlx5 fixes 2017-08-30

2017-08-30 Thread David Miller

From: Saeed Mahameed 
Date: Thu, 31 Aug 2017 01:20:59 +0300

> This series contains some misc fixes to the mlx5 driver.
> 
> Please pull and let me know if there's any problem.

Series applied, thanks.

Re: multi-queue over IFF_NO_QUEUE "virtual" devices

2017-08-30 Thread Cong Wang

On Tue, Aug 29, 2017 at 8:49 PM, Florian Fainelli  wrote:
> On 08/07/17 at 15:26, Florian Fainelli wrote:
>> Hi,
>>
>> Most DSA supported Broadcom switches have multiple queues per ports
>> (usually 8) and each of these queues can be configured with different
>> pause, drop, hysteresis thresholds and so on in order to make use of the
>> switch's internal buffering scheme and have some queues achieve some
>> kind of lossless behavior (e.g: LAN to LAN traffic for Q7 has a higher
>> priority than LAN to WAN for Q0).
>>
>> This is obviously very workload specific, so I'd want maximum
>> programmability as much as possible.
>>
>> This brings me to a few questions:
>>
>> 1) If we have the DSA slave network devices currently flagged with
>> IFF_NO_QUEUE becoming multi-queue (on TX) aware such that an application
>> can control exactly which switch egress queue is used on a per-flow
>> basis, would that be a problem (this is the dynamic selection of the TX
>> queue)?
>
> So I have this part figured out, with a bunch of changes network devices
> created by DSA are now multiqueue aware and the Broadcom tag layer is
> capable of extracting the queue index, passing it in the tag where
> expected and having the switch forward to the appropriate switch port
> and queue within that port. It also sets the queue mapping in the SKB
> for later consumption by the master network device driver: bcmsysport.c
> because of 2).
>
>>
>> 2) The conduit interface (CPU) port network interface has a congestion
>> control scheme which requires each of its TX queues (32 or 16) to be
>> statically mapped to each of the underlying switch port queues because
>> the congestion/ HW needs to inspect the queue depths of the switch to
>> accept/reject a packet at the CPU's TX ring level. Do we have a good way
>> with tc to map a virtual/stacked device's queue(s) on-top of its
>> physical/underlying device's queues (this is the static queue mapping
>> necessary for congestion to work)?
>
> That part I have not figured out yet, with some static mapping I can
> obtain the results that I want and was even considering the possibility
> of doing something like this:
>
> - register a network device notifier with bcmsysport.c (master network
> device) for this setup
> - expose a helper function allowing me to obtain a given DSA network
> device port index
> - whenever DSA creates network devices reconfigure the ring and queue
> mapping of the TX queues managed by bcmsysport.c with the DSA network
> device port index that has just been registered and just do a 1-1
> mapping of the 8 queues
>
> You would end-up with something like:
>
> gphy (port 0) queues 0-7 mapped to systemport queues 0-7
> rgmii_1 (port 1) queues 0-7 mapped to systemport queues 8-15
> rgmii_2 (port 2) queues 0-7 mapped to systemport queues 16 through 23
> moca (port 7) queues 0-7 mapped to systemport queues 24-31
>
> This should be working because bcmsysport's TX queues are not under
> direct control by the user, they are used via DSA created network
> devices which indicate the queue they want to use. When the DSA
> interfaces are brought down, their respective systemport queues now
> become unused. This also works because the number of physical ports of
> the switch times the number of queues is matching the number of TX
> queues from systemport (like if someone designed it with that exact
> purpose in mind ;)).
>
> The only problem with that approach of course is that it embeds a policy
> within the systemport driver.
>
> Ideally I would really like to configure this via tc by setting up a
> mapping between queues of one network devices to queues of another
> network device, is that a possible thing, Jamal, Cong, Jiri, do you know?

I am not sure if I understand the mapping you are talking about here.

The TC layer rarely deals with hardware queues directly (except probably mq),
so this question probably doesn't belong to TC.

OTOH, TC can modify skb->hash, so you can redirect packets to a specific
queue, but this doesn't sound like what you are looking for.

Maybe Jiri has more thoughts here since he works on TC offloading things.
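
For reference, the static 1-1 mapping from the quoted proposal (gphy, rgmii_1,
rgmii_2 and moca each owning a block of 8 systemport queues) can be sketched as
below. This is an illustration only: the model_* names are hypothetical, the
port list is copied from the quoted example, and assigning blocks in list
order is an assumption, not something the driver is known to do.

```c
#include <stddef.h>

#define MODEL_QUEUES_PER_PORT 8

/* Switch ports in the order of the quoted example:
 * gphy (0), rgmii_1 (1), rgmii_2 (2), moca (7). */
static const int model_ports[] = { 0, 1, 2, 7 };

/* Map (switch port, egress queue) to a systemport TX queue index.
 * Returns -1 for a port that is not in the table or a queue out of range. */
static int model_map_queue(int port, int queue)
{
	size_t i;

	if (queue < 0 || queue >= MODEL_QUEUES_PER_PORT)
		return -1;

	for (i = 0; i < sizeof(model_ports) / sizeof(model_ports[0]); i++) {
		if (model_ports[i] == port)
			return (int)i * MODEL_QUEUES_PER_PORT + queue;
	}
	return -1;
}
```

With this table, port 7 queue 0 lands on systemport queue 24 and port 2
queue 7 on queue 23, matching the ranges listed in the quote. Note that the
block index comes from the position in the table, not from the port number,
since port 7 has to fit into the fourth block.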

[PATCH net-next] liquidio: fix crash in presence of zeroed-out base address regs

2017-08-30 Thread Felix Manlunas

From: Rick Farrington 

Fix crash in linux PF driver when BARs have been cleared/de-programmed;
fail early init (prior to mapping BARs) if the BAR0 or
BAR1 registers are zero.

This situation can arise when the PF is added to a VM (PCI pass-through),
then a PF FLR is issued (in the VM).  After this occurs, the BAR registers
will be zero. If we attempt to load the PF driver in the host
(after the VM has been shut down), the host can reset.

Signed-off-by: Rick Farrington 
Signed-off-by: Raghu Vatsavayi 
Signed-off-by: Felix Manlunas 
---
 .../net/ethernet/cavium/liquidio/cn23xx_pf_device.c  | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
index 4b0ca9f..8705e23 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -1269,6 +1269,26 @@ static int cn23xx_sriov_config(struct octeon_device *oct)
 
 int setup_cn23xx_octeon_pf_device(struct octeon_device *oct)
 {
+   u32 data32;
+   u64 BAR0, BAR1;
+
+   pci_read_config_dword(oct->pci_dev, PCI_BASE_ADDRESS_0, &data32);
+   BAR0 = (u64)(data32 & ~0xf);
+   pci_read_config_dword(oct->pci_dev, PCI_BASE_ADDRESS_1, &data32);
+   BAR0 |= ((u64)data32 << 32);
+   pci_read_config_dword(oct->pci_dev, PCI_BASE_ADDRESS_2, &data32);
+   BAR1 = (u64)(data32 & ~0xf);
+   pci_read_config_dword(oct->pci_dev, PCI_BASE_ADDRESS_3, &data32);
+   BAR1 |= ((u64)data32 << 32);
+
+   if (!BAR0 || !BAR1) {
+   if (!BAR0)
+   dev_err(&oct->pci_dev->dev, "device BAR0 unassigned\n");
+   if (!BAR1)
+   dev_err(&oct->pci_dev->dev, "device BAR1 unassigned\n");
+   return 1;
+   }
+
if (octeon_map_pci_barx(oct, 0, 0))
return 1;
 
-- 
1.8.3.1
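
The BAR check in the patch above boils down to assembling a 64-bit address
from two config-space dwords and testing it for zero. A minimal user-space
sketch of that arithmetic follows; the model_* helpers are hypothetical and
take the raw dword values as arguments instead of reading PCI config space.

```c
#include <stdint.h>

/* Combine the low and high dwords of a 64-bit memory BAR.
 * The low four bits of the low dword are flag bits (memory/IO indicator,
 * type, prefetchable), so they are masked off before use. */
static uint64_t model_bar64(uint32_t lo_dword, uint32_t hi_dword)
{
	uint64_t bar = (uint64_t)(lo_dword & ~0xfu);

	bar |= (uint64_t)hi_dword << 32;
	return bar;
}

/* Non-zero only if both BARs carry an address, mirroring the
 * "if (!BAR0 || !BAR1)" early-exit test in the patch. */
static int model_bars_assigned(uint64_t bar0, uint64_t bar1)
{
	return bar0 != 0 && bar1 != 0;
}
```

A low dword that holds only flag bits (for example 0xc) with a zero high
dword therefore counts as unassigned, which is the state the commit message
describes after the FLR.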

Re: [Patch net-next] net_sched: add reverse binding for tc class

2017-08-30 Thread Cong Wang

On Wed, Aug 30, 2017 at 3:45 PM, Daniel Borkmann  wrote:
> On 08/31/2017 12:22 AM, Daniel Borkmann wrote:
>>
>> The prog->res.classid is the default one, but can be overridden
>> later depending on the specified program. cls_bpf_classify() does
>> after prog return (filter_res holds return code):
>>
>>  [...]
>>  if (filter_res == 0)
>>  continue;
>>  if (filter_res != -1) {
>>  res->class   = 0;
>>  res->classid = filter_res;
>>  } else {
>>  *res = prog->res;
>>  }
>>  [...]
>>
>> Meaning in case of a match (-1), we use the default bound one,
>> but prog may as well return an alternative found classid if it
>> wants to. So both versions are possible.
>
>
> But even for that case your patch looks fine to me actually, since
> for dynamic classid we set class to 0. No objections from my side
> then.

Sounds good. Then I will leave it as it is.

Thanks for explanation.

Re: [PATCH net-next] hv_netvsc: Fix typos in the document of UDP hashing

2017-08-30 Thread David Miller

From: Haiyang Zhang 
Date: Wed, 30 Aug 2017 13:37:22 -0700

> From: Haiyang Zhang 
> 
> There are two typos in the document, netvsc.txt,
> regarding UDP hashing level. This patch fixes them.
> 
> Signed-off-by: Haiyang Zhang 

Applied, thanks.

[net-next 2/3] net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels

2017-08-30 Thread Saeed Mahameed

From: Gal Pressman 

Add TX offloads support for GRE tunneled packets by reporting the needed
netdev features.
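
As a reading aid, the decision made by the reworked features check in the diff
below can be modelled in user space roughly as follows. This is a sketch, not
the driver code: the feature bits and model_* names are hypothetical, and the
port lookup is replaced by a stub that assumes only the IANA VXLAN port (4789)
is offloaded.

```c
#include <stdint.h>
#include <stdbool.h>

#define MODEL_IPPROTO_UDP 17
#define MODEL_IPPROTO_GRE 47

/* Hypothetical feature bits, standing in for netdev_features_t flags. */
#define MODEL_F_CSUM  0x1u
#define MODEL_F_GSO   0x2u
#define MODEL_F_OTHER 0x4u

/* Stub for the VXLAN port lookup; assumes a single offloaded port. */
static bool model_port_offloaded(uint16_t port)
{
	return port == 4789;
}

/* GRE keeps all features; UDP keeps them only if the destination port is
 * offloaded; everything else loses checksum and GSO offloads. */
static uint32_t model_tunnel_features_check(uint8_t proto, uint16_t udp_dport,
					    uint32_t features)
{
	switch (proto) {
	case MODEL_IPPROTO_GRE:
		return features;
	case MODEL_IPPROTO_UDP:
		if (model_port_offloaded(udp_dport))
			return features;
		break;
	}

	return features & ~(MODEL_F_CSUM | MODEL_F_GSO);
}
```
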

Signed-off-by: Gal Pressman 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 51 +++
 include/linux/mlx5/mlx5_ifc.h |  2 +-
 2 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index fdc2b92f020b..9475fb89a744 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3499,13 +3499,13 @@ static void mlx5e_del_vxlan_port(struct net_device *netdev,
mlx5e_vxlan_queue_work(priv, ti->sa_family, be16_to_cpu(ti->port), 0);
 }
 
-static netdev_features_t mlx5e_vxlan_features_check(struct mlx5e_priv *priv,
-   struct sk_buff *skb,
-   netdev_features_t features)
+static netdev_features_t mlx5e_tunnel_features_check(struct mlx5e_priv *priv,
+struct sk_buff *skb,
+netdev_features_t features)
 {
struct udphdr *udph;
-   u16 proto;
-   u16 port = 0;
+   u8 proto;
+   u16 port;
 
switch (vlan_get_protocol(skb)) {
case htons(ETH_P_IP):
@@ -3518,14 +3518,17 @@ static netdev_features_t mlx5e_vxlan_features_check(struct mlx5e_priv *priv,
goto out;
}
 
-   if (proto == IPPROTO_UDP) {
+   switch (proto) {
+   case IPPROTO_GRE:
+   return features;
+   case IPPROTO_UDP:
udph = udp_hdr(skb);
port = be16_to_cpu(udph->dest);
-   }
 
-   /* Verify if UDP port is being offloaded by HW */
-   if (port && mlx5e_vxlan_lookup_port(priv, port))
-   return features;
+   /* Verify if UDP port is being offloaded by HW */
+   if (mlx5e_vxlan_lookup_port(priv, port))
+   return features;
+   }
 
 out:
/* Disable CSUM and GSO if the udp dport is not offloaded by HW */
@@ -3549,7 +3552,7 @@ static netdev_features_t mlx5e_features_check(struct sk_buff *skb,
/* Validate if the tunneled packet is being offloaded by HW */
if (skb->encapsulation &&
(features & NETIF_F_CSUM_MASK || features & NETIF_F_GSO_MASK))
-   return mlx5e_vxlan_features_check(priv, skb, features);
+   return mlx5e_tunnel_features_check(priv, skb, features);
 
return features;
 }
@@ -4014,20 +4017,32 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
netdev->hw_features  |= NETIF_F_HW_VLAN_CTAG_RX;
netdev->hw_features  |= NETIF_F_HW_VLAN_CTAG_FILTER;
 
-   if (mlx5e_vxlan_allowed(mdev)) {
-   netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL |
-  NETIF_F_GSO_UDP_TUNNEL_CSUM |
-  NETIF_F_GSO_PARTIAL;
+   if (mlx5e_vxlan_allowed(mdev) || MLX5_CAP_ETH(mdev, tunnel_stateless_gre)) {
+   netdev->hw_features |= NETIF_F_GSO_PARTIAL;
netdev->hw_enc_features |= NETIF_F_IP_CSUM;
netdev->hw_enc_features |= NETIF_F_IPV6_CSUM;
netdev->hw_enc_features |= NETIF_F_TSO;
netdev->hw_enc_features |= NETIF_F_TSO6;
-   netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL;
-   netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM |
-  NETIF_F_GSO_PARTIAL;
+   netdev->hw_enc_features |= NETIF_F_GSO_PARTIAL;
+   }
+
+   if (mlx5e_vxlan_allowed(mdev)) {
+   netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL |
+  NETIF_F_GSO_UDP_TUNNEL_CSUM;
+   netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL |
+  NETIF_F_GSO_UDP_TUNNEL_CSUM;
netdev->gso_partial_features = NETIF_F_GSO_UDP_TUNNEL_CSUM;
}
 
+   if (MLX5_CAP_ETH(mdev, tunnel_stateless_gre)) {
+   netdev->hw_features |= NETIF_F_GSO_GRE |
+  NETIF_F_GSO_GRE_CSUM;
+   netdev->hw_enc_features |= NETIF_F_GSO_GRE |
+  NETIF_F_GSO_GRE_CSUM;
+   netdev->gso_partial_features |= NETIF_F_GSO_GRE |
+   NETIF_F_GSO_GRE_CSUM;
+   }
+
mlx5_query_port_fcs(mdev, &fcs_supported, &fcs_enabled);
 
if (fcs_supported)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index ae7d09b9c52f..3d5d32e5446c 100644
---

[net-next 3/3] net/mlx5e: Support RSS for GRE tunneled packets

2017-08-30 Thread Saeed Mahameed

From: Gal Pressman 

Introduce a new flow table and indirect TIRs which are used to hash the
inner packet headers of GRE tunneled packets.

When a GRE tunneled packet is received, the TTC flow table will match
the new IPv4/6->GRE rules which will forward it to the inner TTC table.
The inner TTC is similar to its counterpart outer TTC table, but
matching the inner packet headers instead of the outer ones (and does
not include the new IPv4/6->GRE rules).
The new rules will not add steering hops since they are added to an
already existing flow group which will be matched regardless of this
patch. Non GRE traffic will not be affected.

The inner flow table will forward the packet to inner indirect TIRs
which hash the inner packet and thus result in RSS for the tunneled
packets.

Testing 8 TCP streams bandwidth over GRE:
System: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Before: 21.3 Gbps (Single RQ)
Now   : 90.5 Gbps (RSS spread on 8 RQs)

Signed-off-by: Gal Pressman 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  18 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 248 -
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  57 -
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |   4 +-
 5 files changed, 321 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 0039b4725405..a31912415264 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -620,6 +620,12 @@ enum mlx5e_traffic_types {
MLX5E_NUM_INDIR_TIRS = MLX5E_TT_ANY,
 };
 
+enum mlx5e_tunnel_types {
+   MLX5E_TT_IPV4_GRE,
+   MLX5E_TT_IPV6_GRE,
+   MLX5E_NUM_TUNNEL_TT,
+};
+
 enum {
MLX5E_STATE_ASYNC_EVENTS_ENABLED,
MLX5E_STATE_OPENED,
@@ -679,6 +685,7 @@ struct mlx5e_l2_table {
 struct mlx5e_ttc_table {
struct mlx5e_flow_table  ft;
struct mlx5_flow_handle  *rules[MLX5E_NUM_TT];
+   struct mlx5_flow_handle  *tunnel_rules[MLX5E_NUM_TUNNEL_TT];
 };
 
 #define ARFS_HASH_SHIFT BITS_PER_BYTE
@@ -711,6 +718,7 @@ enum {
MLX5E_VLAN_FT_LEVEL = 0,
MLX5E_L2_FT_LEVEL,
MLX5E_TTC_FT_LEVEL,
+   MLX5E_INNER_TTC_FT_LEVEL,
MLX5E_ARFS_FT_LEVEL
 };
 
@@ -736,6 +744,7 @@ struct mlx5e_flow_steering {
struct mlx5e_vlan_table vlan;
struct mlx5e_l2_table   l2;
struct mlx5e_ttc_table  ttc;
+   struct mlx5e_ttc_table  inner_ttc;
struct mlx5e_arfs_tablesarfs;
 };
 
@@ -769,6 +778,7 @@ struct mlx5e_priv {
u32tisn[MLX5E_MAX_NUM_TC];
struct mlx5e_rqt   indir_rqt;
struct mlx5e_tir   indir_tir[MLX5E_NUM_INDIR_TIRS];
+   struct mlx5e_tir   inner_indir_tir[MLX5E_NUM_INDIR_TIRS];
struct mlx5e_tir   direct_tir[MLX5E_MAX_NUM_CHANNELS];
u32tx_rates[MLX5E_MAX_NUM_SQS];
inthard_mtu;
@@ -903,7 +913,7 @@ int mlx5e_redirect_rqt(struct mlx5e_priv *priv, u32 rqtn, int sz,
   struct mlx5e_redirect_rqt_param rrp);
 void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_params *params,
enum mlx5e_traffic_types tt,
-   void *tirc);
+   void *tirc, bool inner);
 
 int mlx5e_open_locked(struct net_device *netdev);
 int mlx5e_close_locked(struct net_device *netdev);
@@ -932,6 +942,12 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
 void mlx5e_set_rq_type_params(struct mlx5_core_dev *mdev,
  struct mlx5e_params *params, u8 rq_type);
 
+static inline bool mlx5e_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev)
+{
+   return (MLX5_CAP_ETH(mdev, tunnel_stateless_gre) &&
+   MLX5_CAP_FLOWTABLE_NIC_RX(mdev, ft_field_support.inner_ip_version));
+}
+
 static inline
 struct mlx5e_tx_wqe *mlx5e_post_nop(struct mlx5_wq_cyc *wq, u32 sqn, u16 *pc)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 0dd7e9caf150..c6ec90e9c95b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1212,9 +1212,18 @@ static void mlx5e_modify_tirs_hash(struct mlx5e_priv *priv, void *in, int inlen)
 
for (tt = 0; tt < MLX5E_NUM_INDIR_TIRS; tt++) {
memset(tirc, 0, ctxlen);
-   mlx5e_build_indir_tir_ctx_hash(&priv->channels.params, tt, tirc);
+   mlx5e_build_indir_tir_ctx_hash(&priv->channels.params, tt, tirc,

[pull request][net-next 0/3] Mellanox, mlx5 GRE tunnel offloads

2017-08-30 Thread Saeed Mahameed

Hi Dave,

The following changes provide GRE tunnel offloads for the mlx5 ethernet
netdevice driver.

For more details please see tag log message below.
Please pull and let me know if there's any problem.

Note: this series doesn't conflict with the ongoing net mlx5 submission.

Thanks,
Saeed.

---

The following changes since commit 90774a93ef075b39e55d31fe56fc286d71a046ac:

  bpf: test_maps: fix typos, "conenct" and "listeen" (2017-08-30 15:32:16 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-GRE-Offload

for you to fetch changes up to 7b3722fa9ef647eb1ae6a60a5d46f7c67ab09a33:

  net/mlx5e: Support RSS for GRE tunneled packets (2017-08-31 01:54:15 +0300)


mlx5-updates-2017-08-31 (GRE Offloads support)

This series provides support for MPLS RSS, and for GRE TX offloads and
GRE RSS.

The first patch from Gal and Ariel provides the mlx5 driver support for
ConnectX capability to perform IP version identification and matching in
order to distinguish between IPv4 and IPv6 without the need to specify the
encapsulation type, thus perform RSS in MPLS automatically without
specifying MPLS ethertype. This patch will also serve for inner GRE IPv4/6
classification for inner GRE RSS.

2nd patch from Gal, Adds the TX offloads support for GRE tunneled packets,
by reporting the needed netdev features.

3rd patch from Gal, Adds GRE inner RSS support by creating the needed device
resources (Steering Tables/rules and traffic classifiers) to Match GRE traffic
and perform RSS hashing on the inner headers.

Improvement:
Testing 8 TCP streams bandwidth over GRE:
System: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Before: 21.3 Gbps (Single RQ)
Now   : 90.5 Gbps (RSS spread on 8 RQs)

Thanks,
Saeed.


Gal Pressman (3):
  net/mlx5e: Use IP version matching to classify IP traffic
  net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels
  net/mlx5e: Support RSS for GRE tunneled packets

 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  18 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 281 -
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 108 ++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |   4 +-
 include/linux/mlx5/mlx5_ifc.h  |   2 +-
 6 files changed, 384 insertions(+), 40 deletions(-)

[net-next 1/3] net/mlx5e: Use IP version matching to classify IP traffic

2017-08-30 Thread Saeed Mahameed

From: Gal Pressman 

This change adds the ability for flow steering to classify IPv4/6
packets with MPLS tag (Ethertype 0x8847 and 0x8848) as standard IP
packets and hit IPv4/6 classification steering rules.

Since IP packets with MPLS tag header have MPLS ethertype, they
missed the IPv4/6 ethertype rule and ended up hitting the default
filter forwarding all the packets to the same single RQ (No RSS).

Since our device is able to look past the MPLS tag and identify the
next protocol we introduce this solution which replaces ethertype
matching by the device's capability to perform IP version
identification and matching in order to distinguish between IPv4 and
IPv6.
Therefore, when driver is performing flow steering configuration on the
device it will use IP version matching in IP classified rules instead
of ethertype matching which will cause relevant MPLS tagged packets to
hit this rule as well.

If the device doesn't support IP version matching the driver will fall back
to use legacy ethertype matching in the steering as before.
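
The selection logic described above (prefer IP version matching when the
device supports it, otherwise fall back to ethertype) can be sketched in user
space as follows. The model_* types and names are hypothetical; only the
ethertype-to-version helper mirrors the one added in the diff below, and the
capability query is replaced by a plain boolean argument.

```c
#include <stdint.h>
#include <stdbool.h>

#define MODEL_ETH_P_IP   0x0800
#define MODEL_ETH_P_IPV6 0x86DD

enum model_match_kind {
	MODEL_MATCH_NONE,
	MODEL_MATCH_IP_VERSION,
	MODEL_MATCH_ETHERTYPE,
};

struct model_match {
	enum model_match_kind kind;
	uint16_t value; /* IP version (4 or 6) or the ethertype itself */
};

/* Same mapping as the helper in the patch: 4, 6, or 0 for non-IP. */
static uint8_t model_etype_to_ipv(uint16_t ethertype)
{
	if (ethertype == MODEL_ETH_P_IP)
		return 4;
	if (ethertype == MODEL_ETH_P_IPV6)
		return 6;
	return 0;
}

/* Pick the match criterion for one steering rule. */
static struct model_match model_pick_match(bool hw_ip_version, uint16_t etype)
{
	struct model_match m = { MODEL_MATCH_NONE, 0 };
	uint8_t ipv = model_etype_to_ipv(etype);

	if (hw_ip_version && ipv) {
		m.kind = MODEL_MATCH_IP_VERSION;
		m.value = ipv;
	} else if (etype) {
		m.kind = MODEL_MATCH_ETHERTYPE;
		m.value = etype;
	}
	return m;
}
```

A rule for ETH_P_IP on capable hardware thus matches on version 4 and no
longer depends on the outer ethertype, which is what lets MPLS-tagged IPv4
packets hit it.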

Signed-off-by: Gal Pressman 
Signed-off-by: Ariel Levkovich 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 33 ++---
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index eecbc6d4f51f..85e6226dacfb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -660,6 +660,17 @@ static struct {
},
 };
 
+static u8 mlx5e_etype_to_ipv(u16 ethertype)
+{
+   if (ethertype == ETH_P_IP)
+   return 4;
+
+   if (ethertype == ETH_P_IPV6)
+   return 6;
+
+   return 0;
+}
+
 static struct mlx5_flow_handle *
 mlx5e_generate_ttc_rule(struct mlx5e_priv *priv,
struct mlx5_flow_table *ft,
@@ -667,10 +678,12 @@ mlx5e_generate_ttc_rule(struct mlx5e_priv *priv,
u16 etype,
u8 proto)
 {
+   int match_ipv_outer = MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, ft_field_support.outer_ip_version);
MLX5_DECLARE_FLOW_ACT(flow_act);
struct mlx5_flow_handle *rule;
struct mlx5_flow_spec *spec;
int err = 0;
+   u8 ipv;
 
spec = kvzalloc(sizeof(*spec), GFP_KERNEL);
if (!spec)
@@ -681,7 +694,13 @@ mlx5e_generate_ttc_rule(struct mlx5e_priv *priv,
MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, outer_headers.ip_protocol);
MLX5_SET(fte_match_param, spec->match_value, outer_headers.ip_protocol, proto);
}
-   if (etype) {
+
+   ipv = mlx5e_etype_to_ipv(etype);
+   if (match_ipv_outer && ipv) {
+   spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
+   MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, outer_headers.ip_version);
+   MLX5_SET(fte_match_param, spec->match_value, outer_headers.ip_version, ipv);
+   } else if (etype) {
spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, outer_headers.ethertype);
MLX5_SET(fte_match_param, spec->match_value, outer_headers.ethertype, etype);
@@ -739,7 +758,9 @@ static int mlx5e_generate_ttc_table_rules(struct mlx5e_priv *priv)
 #define MLX5E_TTC_TABLE_SIZE   (MLX5E_TTC_GROUP1_SIZE +\
 MLX5E_TTC_GROUP2_SIZE +\
 MLX5E_TTC_GROUP3_SIZE)
-static int mlx5e_create_ttc_table_groups(struct mlx5e_ttc_table *ttc)
+
+static int mlx5e_create_ttc_table_groups(struct mlx5e_ttc_table *ttc,
+bool use_ipv)
 {
int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
struct mlx5e_flow_table *ft = &ttc->ft;
@@ -761,7 +782,10 @@ static int mlx5e_create_ttc_table_groups(struct mlx5e_ttc_table *ttc)
/* L4 Group */
mc = MLX5_ADDR_OF(create_flow_group_in, in, match_criteria);
MLX5_SET_TO_ONES(fte_match_param, mc, outer_headers.ip_protocol);
-   MLX5_SET_TO_ONES(fte_match_param, mc, outer_headers.ethertype);
+   if (use_ipv)
+   MLX5_SET_TO_ONES(fte_match_param, mc, outer_headers.ip_version);
+   else
+   MLX5_SET_TO_ONES(fte_match_param, mc, outer_headers.ethertype);
MLX5_SET_CFG(in, match_criteria_enable, MLX5_MATCH_OUTER_HEADERS);
MLX5_SET_CFG(in, start_flow_index, ix);
ix += MLX5E_TTC_GROUP1_SIZE;
@@ -812,6 +836,7 @@ void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv)
 
 int mlx5e_create_ttc_table(struct mlx5e_priv *priv)
 {
+   bool match_ipv_outer = MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, ft_field_support.outer_ip_version);
struct mlx5e_ttc_table *ttc =

Re: [PATCH net] net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278

2017-08-30 Thread David Miller

From: Florian Fainelli 
Date: Wed, 30 Aug 2017 12:39:33 -0700

> BCM7278 has only 128 entries while BCM7445 has the full 256 entries set,
> fix that.
> 
> Fixes: 7318166cacad ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc")
> Signed-off-by: Florian Fainelli 

Applied and queued up for -stable, thanks.

I hope we remember to increase CFP_NUM_RULES if we ever get a chip
that supports more than 256... :-/

Re: [PATCH net-next] xen-netfront: be more drop monitor friendly

2017-08-30 Thread David Miller

From: Eric Dumazet 
Date: Wed, 30 Aug 2017 10:32:58 -0700

> From: Eric Dumazet 
> 
> xennet_start_xmit() might copy skb with inappropriate layout
> into a fresh one.
> 
> Old skb is freed, and at this point it is not a drop, but
> a consume. New skb will then be either consumed or dropped. 
> 
> Signed-off-by: Eric Dumazet 

Applied, thanks Eric.

Re: [PATCH net] kcm: do not attach PF_KCM sockets to avoid deadlock

2017-08-30 Thread David Miller

From: Eric Dumazet 
Date: Wed, 30 Aug 2017 09:29:31 -0700

> From: Eric Dumazet 
> 
> syzkaller had no problem to trigger a deadlock, attaching a KCM socket
> to another one (or itself). (original syzkaller report was a very
> confusing lockdep splat during a sendmsg())
> 
> It seems KCM claims to only support TCP, but no enforcement is done,
> so we might need to add additional checks.
> 
> Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
> Signed-off-by: Eric Dumazet 
> Reported-by: Dmitry Vyukov 

Applied and queued up for -stable, thanks.

Re: [Patch net-next] net_sched: add reverse binding for tc class

2017-08-30 Thread Daniel Borkmann


On 08/31/2017 12:22 AM, Daniel Borkmann wrote:

On 08/31/2017 12:01 AM, Cong Wang wrote:

On Wed, Aug 30, 2017 at 2:48 PM, Daniel Borkmann  wrote:

On 08/30/2017 11:30 PM, Cong Wang wrote:
[...]


Note, we still can NOT totally get rid of those class lookup in
->enqueue() because cgroup and flow filters have no way to determine
the classid at setup time, they still have to go through dynamic lookup.


[...]


---
   include/net/sch_generic.h |  1 +
   net/sched/cls_basic.c |  9 +++
   net/sched/cls_bpf.c   |  9 +++


Same is for cls_bpf as well, so bind_class wouldn't work there
either as we could return dynamic classids. bind_class cannot
be added here, too.


I think you are probably right, but the following code is
misleading there:

 if (tb[TCA_BPF_CLASSID]) {
 prog->res.classid = nla_get_u32(tb[TCA_BPF_CLASSID]);
 tcf_bind_filter(tp, &prog->res, base);
 }

If the classid is dynamic, why this tb[TCA_BPF_CLASSID]?


The prog->res.classid is the default one, but can be overridden
later depending on the specified program. cls_bpf_classify() does
after prog return (filter_res holds return code):

 [...]
 if (filter_res == 0)
 continue;
 if (filter_res != -1) {
 res->class   = 0;
 res->classid = filter_res;
 } else {
 *res = prog->res;
 }
 [...]

Meaning in case of a match (-1), we use the default bound one,
but prog may as well return an alternative found classid if it
wants to. So both versions are possible.


But even for that case your patch looks fine to me actually, since
for dynamic classid we set class to 0. No objections from my side
then.
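
To make the two outcomes discussed above concrete, here is a small user-space
model of the result handling quoted from cls_bpf_classify(). It is a sketch
only: struct model_result and model_resolve() are hypothetical names, and the
cls field stands in for res->class.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-in for the classifier result. */
struct model_result {
	unsigned long cls;  /* bound class handle, 0 when not resolved */
	uint32_t classid;
};

/* Apply a program return code to the result:
 *   0   no match, caller moves on to the next program
 *  -1   match, use the default result bound at setup time
 *  else match, the return value is a dynamic classid and the class
 *       handle is reset to 0 so it is looked up later
 * Returns false for "no match", true otherwise. */
static bool model_resolve(int filter_res, const struct model_result *bound,
			  struct model_result *res)
{
	if (filter_res == 0)
		return false;

	if (filter_res != -1) {
		res->cls = 0;
		res->classid = (uint32_t)filter_res;
	} else {
		*res = *bound;
	}
	return true;
}
```

Only the -1 case hands back the pre-bound class handle; a dynamic classid
always comes with a zero handle, which is why the reverse binding does not
get in the way of programs that pick their own classid.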

Re: [PATCH][net-next][V3] bpf: test_maps: fix typos, "conenct" and "listeen"

2017-08-30 Thread David Miller

From: Colin King 
Date: Wed, 30 Aug 2017 18:15:25 +0100

> From: Colin Ian King 
> 
> Trivial fix to typos in printf error messages:
> "conenct" -> "connect"
> "listeen" -> "listen"
> 
> thanks to Daniel Borkmann for spotting one of these mistakes
> 
> Signed-off-by: Colin Ian King 

Applied.

Re: [PATCH][next] qed: fix spelling mistake: "calescing" -> "coalescing"

2017-08-30 Thread David Miller

From: Colin King 
Date: Wed, 30 Aug 2017 12:40:12 +0100

> From: Colin Ian King 
> 
> Trivial fix to spelling mistake in DP_NOTICE message
> 
> Signed-off-by: Colin Ian King 

Applied.

Re: [PATCH net-next] net: hns3: Fixes the wrong IS_ERR check on the returned phydev value

2017-08-30 Thread David Miller

From: Salil Mehta 
Date: Wed, 30 Aug 2017 12:06:03 +0100

> This patch removes the wrong check being done for the phy device being
> returned by the mdiobus_get_phy() function. This function never returns
> the error pointers.
> 
> Fixes: 256727da7395 ("net: hns3: Add MDIO support to HNS3 Ethernet Driver for hip08 SoC")
> Reported-by: Dan Carpenter 
> Signed-off-by: Salil Mehta 

Applied.

Re: [PATCH net 0/9] net/sched: init failure fixes

2017-08-30 Thread David Miller

From: Jamal Hadi Salim 
Date: Wed, 30 Aug 2017 08:15:37 -0400

> On 17-08-30 05:48 AM, Nikolay Aleksandrov wrote:
>> Hi all,
>> I went over all qdiscs' init, destroy and reset callbacks and found the
>> issues fixed in each patch. Mostly they are null pointer dereferences due
>> to uninitialized timer (qdisc watchdog) or double frees due to ->destroy
>> cleaning up a second time. There's more information in each patch.
>> I've tested these by either sending wrong attributes from user-spaces, no
>> attributes or by simulating memory alloc failure where applicable. Also
>> tried all of the qdiscs as a default qdisc.
>> Most of these bugs were present before commit 87b60cfacf9f, I've tried to
>> include proper fixes tags in each patch.
>> I haven't included individual patch acks in the set, I'd appreciate it if
>> you take another look and resend them.
>> 
> 
> 
> Hi Nik,
> 
> For all patches:
> 
> Acked-by: Jamal Hadi Salim 

Series applied, thanks Nikolay.

Re: [Patch net-next] net_sched: add reverse binding for tc class

2017-08-30 Thread Daniel Borkmann


On 08/31/2017 12:01 AM, Cong Wang wrote:

On Wed, Aug 30, 2017 at 2:48 PM, Daniel Borkmann  wrote:

On 08/30/2017 11:30 PM, Cong Wang wrote:
[...]


Note, we still can NOT totally get rid of those class lookup in
->enqueue() because cgroup and flow filters have no way to determine
the classid at setup time, they still have to go through dynamic lookup.


[...]


---
   include/net/sch_generic.h |  1 +
   net/sched/cls_basic.c |  9 +++
   net/sched/cls_bpf.c   |  9 +++


Same is for cls_bpf as well, so bind_class wouldn't work there
either as we could return dynamic classids. bind_class cannot
be added here, too.


I think you are probably right, but the following code is
misleading there:

 if (tb[TCA_BPF_CLASSID]) {
         prog->res.classid = nla_get_u32(tb[TCA_BPF_CLASSID]);
         tcf_bind_filter(tp, &prog->res, base);
 }

If the classid is dynamic, why this tb[TCA_BPF_CLASSID]?


The prog->res.classid is the default one, but can be overridden
later depending on the specified program. cls_bpf_classify() does
after prog return (filter_res holds return code):

 [...]
 if (filter_res == 0)
         continue;
 if (filter_res != -1) {
         res->class   = 0;
         res->classid = filter_res;
 } else {
         *res = prog->res;
 }
 [...]

Meaning in case of a match (-1), we use the default bound one,
but prog may as well return an alternative found classid if it
wants to. So both versions are possible.
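
The three outcomes described above (no match, match with the default
bound classid, match with a classid chosen by the program) can be tried
out in plain user-space C. This is only a sketch under stated assumptions:
demo_result and demo_pick_classid are hypothetical names standing in for
struct tcf_result and the relevant part of cls_bpf_classify(), not the
kernel code itself.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct tcf_result. */
struct demo_result {
    unsigned long class_ptr;   /* cached class; 0 means "look it up later" */
    uint32_t classid;
};

/*
 * Map a program return code to a result.
 * Returns 1 if the program matched and *res was filled in, 0 otherwise.
 */
static int demo_pick_classid(int filter_res,
                             const struct demo_result *bound,
                             struct demo_result *res)
{
    if (filter_res == 0)
        return 0;                  /* no match: caller tries the next program */

    if (filter_res != -1) {
        res->class_ptr = 0;        /* dynamic classid: no pre-bound class */
        res->classid = (uint32_t)filter_res;
    } else {
        *res = *bound;             /* -1: use the default bound at setup time */
    }
    return 1;
}
```

With a bound default of 1:1 (0x10001), a return code of -1 yields that
default unchanged, while any other non-zero return code replaces the
classid and clears the cached class.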

[net 03/11] net/mlx5: Fix arm SRQ command for ISSI version 0

2017-08-30 Thread Saeed Mahameed

From: Noa Osherovich 

Support for ISSI version 0 was recently broken as the arm_srq_cmd
command, which is used only for ISSI version 0, was given the opcode
for ISSI version 1 instead of ISSI version 0.

Change arm_srq_cmd to use the correct command opcode for ISSI version
0.

Fixes: af1ba291c5e4 ('{net, IB}/mlx5: Refactor internal SRQ API')
Signed-off-by: Noa Osherovich 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/srq.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/srq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/srq.c
index f774de6f5fcb..520f6382dfde 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/srq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/srq.c
@@ -201,13 +201,13 @@ static int destroy_srq_cmd(struct mlx5_core_dev *dev,
 static int arm_srq_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
   u16 lwm, int is_srq)
 {
-   /* arm_srq structs missing using identical xrc ones */
-   u32 srq_in[MLX5_ST_SZ_DW(arm_xrc_srq_in)] = {0};
-   u32 srq_out[MLX5_ST_SZ_DW(arm_xrc_srq_out)] = {0};
+   u32 srq_in[MLX5_ST_SZ_DW(arm_rq_in)] = {0};
+   u32 srq_out[MLX5_ST_SZ_DW(arm_rq_out)] = {0};
 
-   MLX5_SET(arm_xrc_srq_in, srq_in, opcode,   MLX5_CMD_OP_ARM_XRC_SRQ);
-   MLX5_SET(arm_xrc_srq_in, srq_in, xrc_srqn, srq->srqn);
-   MLX5_SET(arm_xrc_srq_in, srq_in, lwm,  lwm);
+   MLX5_SET(arm_rq_in, srq_in, opcode, MLX5_CMD_OP_ARM_RQ);
+   MLX5_SET(arm_rq_in, srq_in, op_mod, MLX5_ARM_RQ_IN_OP_MOD_SRQ);
+   MLX5_SET(arm_rq_in, srq_in, srq_number, srq->srqn);
+   MLX5_SET(arm_rq_in, srq_in, lwm,  lwm);
 
return  mlx5_cmd_exec(dev, srq_in, sizeof(srq_in),
  srq_out, sizeof(srq_out));
-- 
2.13.0

[net 02/11] net/mlx5e: Fix DCB_CAP_ATTR_DCBX capability for DCBNL getcap.

2017-08-30 Thread Saeed Mahameed

From: Huy Nguyen 

Current code doesn't report the DCB_CAP_DCBX_HOST capability when queried
through getcap. User space lldptool expects the capability to have HOST mode
set when it wants to configure DCBX CEE mode. In the absence of the HOST mode
capability, lldptool fails to switch to CEE mode.

This fix returns DCB_CAP_DCBX_HOST capability when port's DCBX
controlled mode is under software control.

Fixes: 3a6a931dfb8e ("net/mlx5e: Support DCBX CEE API")
Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 21 -
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 0039b4725405..2f26fb34d741 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -263,6 +263,7 @@ struct mlx5e_dcbx {
 
/* The only setting that cannot be read from FW */
u8 tc_tsa[IEEE_8021QAZ_MAX_TCS];
+   u8 cap;
 };
 #endif
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 810b51029c7f..c1d384fca4dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -288,13 +288,8 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
 static u8 mlx5e_dcbnl_getdcbx(struct net_device *dev)
 {
struct mlx5e_priv *priv = netdev_priv(dev);
-   struct mlx5e_dcbx *dcbx = &priv->dcbx;
-   u8 mode = DCB_CAP_DCBX_VER_IEEE | DCB_CAP_DCBX_VER_CEE;
-
-   if (dcbx->mode == MLX5E_DCBX_PARAM_VER_OPER_HOST)
-   mode |= DCB_CAP_DCBX_HOST;
 
-   return mode;
+   return priv->dcbx.cap;
 }
 
 static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 mode)
@@ -312,6 +307,7 @@ static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 
mode)
/* set dcbx to fw controlled */
if (!mlx5e_dcbnl_set_dcbx_mode(priv, 
MLX5E_DCBX_PARAM_VER_OPER_AUTO)) {
dcbx->mode = MLX5E_DCBX_PARAM_VER_OPER_AUTO;
+   dcbx->cap &= ~DCB_CAP_DCBX_HOST;
return 0;
}
 
@@ -324,6 +320,8 @@ static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 
mode)
if (mlx5e_dcbnl_switch_to_host_mode(netdev_priv(dev)))
return 1;
 
+   dcbx->cap = mode;
+
return 0;
 }
 
@@ -628,9 +626,9 @@ static u8 mlx5e_dcbnl_getcap(struct net_device *netdev,
*cap = false;
break;
case DCB_CAP_ATTR_DCBX:
-   *cap = (DCB_CAP_DCBX_LLD_MANAGED |
-   DCB_CAP_DCBX_VER_CEE |
-   DCB_CAP_DCBX_STATIC);
+   *cap = priv->dcbx.cap |
+  DCB_CAP_DCBX_VER_CEE |
+  DCB_CAP_DCBX_VER_IEEE;
break;
default:
*cap = 0;
@@ -760,5 +758,10 @@ void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv)
if (MLX5_CAP_GEN(priv->mdev, dcbx))
mlx5e_dcbnl_query_dcbx_mode(priv, &dcbx->mode);
 
+   priv->dcbx.cap = DCB_CAP_DCBX_VER_CEE |
+DCB_CAP_DCBX_VER_IEEE;
+   if (priv->dcbx.mode == MLX5E_DCBX_PARAM_VER_OPER_HOST)
+   priv->dcbx.cap |= DCB_CAP_DCBX_HOST;
+
mlx5e_ets_init(priv);
 }
-- 
2.13.0

[net 09/11] net/mlx5: E-Switch, Unload the representors in the correct order

2017-08-30 Thread Saeed Mahameed

From: Shahar Klein 

When changing from switchdev to legacy mode, all the representor port
devices (uplink nic and reps) are cleaned up. Part of this cleaning
process is removing the neigh entries and the hash table containing them.
However, a representor neigh entry might be linked to the uplink port
hash table and if the uplink nic is cleaned first the cleaning of the
representor will end up in null deref.
Fix that by unloading the representors in the opposite order of load.

Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Shahar Klein 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 95b64025ce36..5bc0593bd76e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -815,7 +815,7 @@ void esw_offloads_cleanup(struct mlx5_eswitch *esw, int 
nvports)
struct mlx5_eswitch_rep *rep;
int vport;
 
-   for (vport = 0; vport < nvports; vport++) {
+   for (vport = nvports - 1; vport >= 0; vport--) {
rep = &esw->offloads.vport_reps[vport];
if (!rep->valid)
continue;
-- 
2.13.0
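
To illustrate why the unload order matters in the patch above, here is a
small user-space sketch. It rests on assumptions that are not spelled out
in the patch: port 0 plays the role of the uplink, and the other ports keep
entries in the uplink's table. The names demo_rep and demo_unload_all are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NPORTS 3

struct demo_rep {
    bool loaded;
    bool uses_uplink_table;   /* entries live in port 0's table (assumption) */
};

/*
 * Unload every port, forward or in reverse.
 * Returns how many ports were cleaned up after the uplink (port 0)
 * had already gone away, i.e. how many would touch a freed table.
 */
static int demo_unload_all(struct demo_rep *reps, int nports, bool reverse)
{
    int bad = 0;
    int i;

    for (i = 0; i < nports; i++) {
        int vport = reverse ? nports - 1 - i : i;

        if (!reps[vport].loaded)
            continue;
        if (vport != 0 && reps[vport].uses_uplink_table && !reps[0].loaded)
            bad++;                /* uplink table is gone already */
        reps[vport].loaded = false;
    }
    return bad;
}
```

In this model a forward walk tears down port 0 first and leaves both
remaining ports pointing at a missing table, while the reverse walk
reports no such case.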

[pull request][net 00/11] Mellanox, mlx5 fixes 2017-08-30

2017-08-30 Thread Saeed Mahameed

Hi Dave,

This series contains some misc fixes to the mlx5 driver.

Please pull and let me know if there's any problem.

For -stable:   

Kernels >= 4.12
net/mlx5e: Fix CQ moderation mode not set properly
net/mlx5e: Don't override user RSS upon set channels

Kernels >= 4.11
net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source 
address 

Kernels >= 4.10
net/mlx5e: Fix DCB_CAP_ATTR_DCBX capability for DCBNL getcap
net/mlx5e: Check for qos capability in dcbnl_initialize

Kernels >= 4.9
net/mlx5e: Fix dangling page pointer on DMA mapping error

Kernels >= 4.8
net/mlx5e: Fix inline header size for small packets
net/mlx5: E-Switch, Unload the representors in the correct order
net/mlx5: Fix arm SRQ command for ISSI version 0


Thanks,
Saeed.

---

The following changes since commit 183db481279437590f75a8a0479d512e5dd597de:

  drivers: net: xgene: Correct probe sequence handling (2017-08-29 16:13:08 
-0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
tags/mlx5-fixes-2017-08-30

for you to fetch changes up to 1213ad28f9595a08e3877248bbba1a25c40225d6:

  net/mlx5e: Fix CQ moderation mode not set properly (2017-08-30 21:20:43 +0300)


mlx5-fixes-2017-08-30


Eran Ben Elisha (1):
  net/mlx5e: Fix dangling page pointer on DMA mapping error

Huy Nguyen (4):
  net/mlx5e: Check for qos capability in dcbnl_initialize
  net/mlx5e: Fix DCB_CAP_ATTR_DCBX capability for DCBNL getcap.
  net/mlx5: Skip mlx5_unload_one if mlx5_load_one fails
  net/mlx5: Remove the flag MLX5_INTERFACE_STATE_SHUTDOWN

Inbar Karmy (1):
  net/mlx5e: Don't override user RSS upon set channels

Moshe Shemesh (1):
  net/mlx5e: Fix inline header size for small packets

Noa Osherovich (1):
  net/mlx5: Fix arm SRQ command for ISSI version 0

Paul Blakey (1):
  net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source address

Shahar Klein (1):
  net/mlx5: E-Switch, Unload the representors in the correct order

Tal Gilboa (1):
  net/mlx5e: Fix CQ moderation mode not set properly

 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 24 ++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  6 --
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c|  8 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 17 ---
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  6 +-
 drivers/net/ethernet/mellanox/mlx5/core/srq.c  | 12 +--
 include/linux/mlx5/driver.h|  4 +---
 11 files changed, 44 insertions(+), 39 deletions(-)

[net 06/11] net/mlx5e: Fix dangling page pointer on DMA mapping error

2017-08-30 Thread Saeed Mahameed

From: Eran Ben Elisha 

Function mlx5e_dealloc_rx_wqe uses the page pointer value as an
indication of a valid DMA mapping. If the mapping failed, we
released the page but kept the dangling pointer. Store the page pointer
only after the DMA mapping has succeeded, to avoid an invalid page DMA unmap.

Fixes: bc77b240b3c5 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
Signed-off-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 325b2c8c1c6d..7344433259fc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -222,13 +222,13 @@ static inline int mlx5e_page_alloc_mapped(struct mlx5e_rq 
*rq,
if (unlikely(!page))
return -ENOMEM;
 
-   dma_info->page = page;
dma_info->addr = dma_map_page(rq->pdev, page, 0,
  RQ_PAGE_SIZE(rq), rq->buff.map_dir);
if (unlikely(dma_mapping_error(rq->pdev, dma_info->addr))) {
put_page(page);
return -ENOMEM;
}
+   dma_info->page = page;
 
return 0;
 }
-- 
2.13.0
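
The ordering rule in the patch above (record the page pointer only once
the mapping has succeeded) can be sketched in user-space C. Everything
here is hypothetical: demo_dma_info, demo_map and the fail_map flag stand
in for the driver's bookkeeping and for dma_map_page() and
dma_mapping_error(); it is not the mlx5 code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-page DMA bookkeeping. */
struct demo_dma_info {
    void *page;        /* non-NULL is read elsewhere as "mapping is valid" */
    uintptr_t addr;
};

/* Pretend mapping step; returns 0 to signal a mapping error. */
static uintptr_t demo_map(void *page, int fail)
{
    return fail ? 0 : (uintptr_t)page;
}

static int demo_page_alloc_mapped(struct demo_dma_info *info, int fail_map)
{
    void *page = malloc(64);

    if (!page)
        return -1;

    info->addr = demo_map(page, fail_map);
    if (info->addr == 0) {
        free(page);        /* released, and never stored in info->page */
        return -1;
    }
    info->page = page;     /* published only after the mapping succeeded */
    return 0;
}
```

After a failed call info->page is still NULL, so a later cleanup pass that
keys on the pointer has nothing to unmap.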

[net 08/11] net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source address

2017-08-30 Thread Saeed Mahameed

From: Paul Blakey 

Currently if vxlan tunnel ipv6 src isn't supplied the driver fails to
resolve it as part of the route lookup. The resulting encap header
is left with a zeroed out ipv6 src address so the packets are sent
with this src ip.

Use an appropriate route lookup API that also resolves the source
ipv6 address if it's not supplied.

Fixes: ce99f6b97fcd ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels')
Signed-off-by: Paul Blakey 
Reviewed-by: Or Gerlitz 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 3c536f560dd2..7f282e8f4e7f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1443,12 +1443,10 @@ static int mlx5e_route_lookup_ipv6(struct mlx5e_priv 
*priv,
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
int ret;
 
-   dst = ip6_route_output(dev_net(mirred_dev), NULL, fl6);
-   ret = dst->error;
-   if (ret) {
-   dst_release(dst);
+   ret = ipv6_stub->ipv6_dst_lookup(dev_net(mirred_dev), NULL, &dst,
+fl6);
+   if (ret < 0)
return ret;
-   }
 
*out_ttl = ip6_dst_hoplimit(dst);
 
-- 
2.13.0

[net 11/11] net/mlx5e: Fix CQ moderation mode not set properly

2017-08-30 Thread Saeed Mahameed

From: Tal Gilboa 

cq_period_mode assignment was mistakenly removed so it was always set to "0",
which is EQE based moderation, regardless of the device CAPs and
requested value in ethtool.

Fixes: 6a9764efb255 ("net/mlx5e: Isolate open_channels from priv->params")
Signed-off-by: Tal Gilboa 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 57f31fa478ce..6ad7f07e7861 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1969,6 +1969,7 @@ static void mlx5e_build_rx_cq_param(struct mlx5e_priv 
*priv,
}
 
mlx5e_build_common_cq_param(priv, param);
+   param->cq_period_mode = params->rx_cq_period_mode;
 }
 
 static void mlx5e_build_tx_cq_param(struct mlx5e_priv *priv,
-- 
2.13.0

[net 05/11] net/mlx5: Remove the flag MLX5_INTERFACE_STATE_SHUTDOWN

2017-08-30 Thread Saeed Mahameed

From: Huy Nguyen 

MLX5_INTERFACE_STATE_SHUTDOWN is not used in the code.

Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback")
Signed-off-by: Huy Nguyen 
Reviewed-by: Daniel Jurgens 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 --
 include/linux/mlx5/driver.h| 1 -
 2 files changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 4cdb414aa2d5..16885827367b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1563,8 +1563,6 @@ static void shutdown(struct pci_dev *pdev)
int err;
 
dev_info(&pdev->dev, "Shutdown was called\n");
-   /* Notify mlx5 clients that the kernel is being shut down */
-   set_bit(MLX5_INTERFACE_STATE_SHUTDOWN, &dev->intf_state);
err = mlx5_try_fast_unload(dev);
if (err)
mlx5_unload_one(dev, priv, false);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 918f5e644506..205d82d4c468 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -674,7 +674,6 @@ enum mlx5_device_state {
 
 enum mlx5_interface_state {
MLX5_INTERFACE_STATE_UP = BIT(0),
-   MLX5_INTERFACE_STATE_SHUTDOWN = BIT(1),
 };
 
 enum mlx5_pci_status {
-- 
2.13.0

[net 10/11] net/mlx5e: Fix inline header size for small packets

2017-08-30 Thread Saeed Mahameed

From: Moshe Shemesh 

Fix inline header size, make sure it is not greater than skb len.
This bug affects small packets, for example L2 packets with size < 18.

Fixes: ae76715d153e ("net/mlx5e: Check the minimum inline header mode before xmit")
Signed-off-by: Moshe Shemesh 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index aaa0f4ebba9a..31353e5c3c78 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -128,10 +128,10 @@ static inline int mlx5e_skb_l3_header_offset(struct 
sk_buff *skb)
return mlx5e_skb_l2_header_offset(skb);
 }
 
-static inline unsigned int mlx5e_calc_min_inline(enum mlx5_inline_modes mode,
-struct sk_buff *skb)
+static inline u16 mlx5e_calc_min_inline(enum mlx5_inline_modes mode,
+   struct sk_buff *skb)
 {
-   int hlen;
+   u16 hlen;
 
switch (mode) {
case MLX5_INLINE_MODE_NONE:
@@ -140,19 +140,22 @@ static inline unsigned int mlx5e_calc_min_inline(enum 
mlx5_inline_modes mode,
hlen = eth_get_headlen(skb->data, skb_headlen(skb));
if (hlen == ETH_HLEN && !skb_vlan_tag_present(skb))
hlen += VLAN_HLEN;
-   return hlen;
+   break;
case MLX5_INLINE_MODE_IP:
/* When transport header is set to zero, it means no transport
 * header. When transport header is set to 0xff's, it means
 * transport header wasn't set.
 */
-   if (skb_transport_offset(skb))
-   return mlx5e_skb_l3_header_offset(skb);
+   if (skb_transport_offset(skb)) {
+   hlen = mlx5e_skb_l3_header_offset(skb);
+   break;
+   }
/* fall through */
case MLX5_INLINE_MODE_L2:
default:
-   return mlx5e_skb_l2_header_offset(skb);
+   hlen = mlx5e_skb_l2_header_offset(skb);
}
+   return min_t(u16, hlen, skb->len);
 }
 
 static inline void mlx5e_tx_skb_pull_inline(unsigned char **skb_data,
-- 
2.13.0
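
The clamp at the end of the patched function can be shown on its own. This
is a sketch with a hypothetical helper name, demo_min_inline, and plain
integers in place of the skb; it mirrors the effect of
min_t(u16, hlen, skb->len), where both operands are compared as 16-bit
values.

```c
#include <stdint.h>

/* Clamp a computed inline header length to the packet length. */
static uint16_t demo_min_inline(uint16_t hlen, uint32_t skb_len)
{
    /* As with min_t(u16, ...), the length is narrowed to 16 bits first,
     * so a length above 65535 would wrap in this cast. */
    uint16_t len16 = (uint16_t)skb_len;

    return hlen < len16 ? hlen : len16;
}
```

For a 14-byte L2 packet and a computed header length of 18, the result is
14, so the inline header never extends past the data that exists.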

[net 07/11] net/mlx5e: Don't override user RSS upon set channels

2017-08-30 Thread Saeed Mahameed

From: Inbar Karmy 

Currently, increasing the number of combined channels changes
the RSS spread to use the newly created channels.
Prevent the RSS spread change when the user has explicitly configured it,
to avoid overriding the user configuration.

Tested:
when RSS default:

# ethtool -L ens8 combined 4
RSS spread will change and point to 4 channels.

# ethtool -X ens8 equal 4
# ethtool -L ens8 combined 6
RSS will not change after increasing the number of the channels.

Fixes: 8bf368620486 ('ethtool: ensure channel counts are within bounds during SCHANNELS')
Signed-off-by: Inbar Karmy 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 917fade5f5d5..f5594014715b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -641,8 +641,10 @@ int mlx5e_ethtool_set_channels(struct mlx5e_priv *priv,
 
new_channels.params = priv->channels.params;
new_channels.params.num_channels = count;
-   mlx5e_build_default_indir_rqt(priv->mdev, 
new_channels.params.indirection_rqt,
- MLX5E_INDIR_RQT_SIZE, count);
+   if (!netif_is_rxfh_configured(priv->netdev))
+   mlx5e_build_default_indir_rqt(priv->mdev,
+ 
new_channels.params.indirection_rqt,
+ MLX5E_INDIR_RQT_SIZE, count);
 
if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
priv->channels.params = new_channels.params;
-- 
2.13.0
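
The behaviour tested in the commit message above can be modelled with a
small indirection table. This is a sketch only: the round-robin default
spread and the names demo_build_default_indir and demo_set_channels are
assumptions for illustration, and the user_configured flag stands in for
what netif_is_rxfh_configured() reports.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_INDIR_SIZE 8

/* Default spread: entry i points at channel i modulo the channel count. */
static void demo_build_default_indir(uint32_t *rqt, size_t size,
                                     unsigned int nch)
{
    size_t i;

    if (nch == 0)
        return;
    for (i = 0; i < size; i++)
        rqt[i] = (uint32_t)(i % nch);
}

/* Rebuild the table on a channel count change unless the user set it. */
static void demo_set_channels(uint32_t *rqt, size_t size,
                              unsigned int nch, bool user_configured)
{
    if (!user_configured)
        demo_build_default_indir(rqt, size, nch);
}
```

Starting from a table spread over 4 channels, growing to 6 channels leaves
the table untouched when user_configured is true, and respreads it over
all 6 channels otherwise.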

[net 04/11] net/mlx5: Skip mlx5_unload_one if mlx5_load_one fails

2017-08-30 Thread Saeed Mahameed

From: Huy Nguyen 

There is an issue when the firmware fails during mlx5_load_one:
the health_care timer detects the failure and schedules a health_care call.
Then mlx5_load_one detects the failure, cleans up and quits. Then
health_care starts and calls mlx5_unload_one to clean up resources
that no longer exist, which causes a kernel panic.

The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is to remove the bit
MLX5_INTERFACE_STATE_DOWN and to quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.

Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback")
Signed-off-by: Huy Nguyen 
Reviewed-by: Daniel Jurgens 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 +---
 include/linux/mlx5/driver.h| 5 ++---
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index c065132b956d..4cdb414aa2d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1186,7 +1186,6 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, 
struct mlx5_priv *priv,
}
}
 
-   clear_bit(MLX5_INTERFACE_STATE_DOWN, &dev->intf_state);
set_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);
 out:
mutex_unlock(&dev->intf_state_mutex);
@@ -1261,7 +1260,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, 
struct mlx5_priv *priv,
mlx5_drain_health_recovery(dev);
 
mutex_lock(&dev->intf_state_mutex);
-   if (test_bit(MLX5_INTERFACE_STATE_DOWN, &dev->intf_state)) {
+   if (!test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state)) {
dev_warn(&dev->pdev->dev, "%s: interface is down, NOP\n",
 __func__);
if (cleanup)
@@ -1270,7 +1269,6 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, 
struct mlx5_priv *priv,
}
 
clear_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);
-   set_bit(MLX5_INTERFACE_STATE_DOWN, &dev->intf_state);
 
if (mlx5_device_registered(dev))
mlx5_detach_device(dev);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index df6ce59a1f95..918f5e644506 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -673,9 +673,8 @@ enum mlx5_device_state {
 };
 
 enum mlx5_interface_state {
-   MLX5_INTERFACE_STATE_DOWN = BIT(0),
-   MLX5_INTERFACE_STATE_UP = BIT(1),
-   MLX5_INTERFACE_STATE_SHUTDOWN = BIT(2),
+   MLX5_INTERFACE_STATE_UP = BIT(0),
+   MLX5_INTERFACE_STATE_SHUTDOWN = BIT(1),
 };
 
 enum mlx5_pci_status {
-- 
2.13.0
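
The single-flag scheme described in the patch above can be sketched as a
tiny state machine. The names demo_dev, demo_load_one and demo_unload_one
are hypothetical and the locking is left out; the point is only that a
failed load never sets the "up" flag, so a later unload becomes a no-op.

```c
#include <stdbool.h>

#define DEMO_STATE_UP (1u << 0)   /* hypothetical single "up" flag */

struct demo_dev {
    unsigned int intf_state;
    int resources;                /* counts what a load has set up */
};

static int demo_load_one(struct demo_dev *dev, bool fw_ok)
{
    dev->resources = 1;
    if (!fw_ok) {
        dev->resources = 0;       /* error path cleans up, UP stays clear */
        return -1;
    }
    dev->intf_state |= DEMO_STATE_UP;
    return 0;
}

/* Returns true only if something was actually torn down. */
static bool demo_unload_one(struct demo_dev *dev)
{
    if (!(dev->intf_state & DEMO_STATE_UP))
        return false;             /* interface is down: nothing to do */

    dev->intf_state &= ~DEMO_STATE_UP;
    dev->resources = 0;
    return true;
}
```

An unload that follows a failed load returns without touching anything,
and a second unload after a successful one is equally harmless.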

[net 01/11] net/mlx5e: Check for qos capability in dcbnl_initialize

2017-08-30 Thread Saeed Mahameed

From: Huy Nguyen 

qos capability is the master capability bit that determines
if the DCBX is supported for the PCI function. If this bit is off,
driver cannot run any dcbx code.

Fixes: e207b7e99176 ("net/mlx5e: ConnectX-4 firmware support for DCBX")
Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 2eb54d36e16e..810b51029c7f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -754,6 +754,9 @@ void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv)
 {
struct mlx5e_dcbx *dcbx = &priv->dcbx;
 
+   if (!MLX5_CAP_GEN(priv->mdev, qos))
+   return;
+
if (MLX5_CAP_GEN(priv->mdev, dcbx))
mlx5e_dcbnl_query_dcbx_mode(priv, &dcbx->mode);
 
-- 
2.13.0

Re: [PATCH] net: bcm63xx_enet: make bcm_enetsw_ethtool_ops const

2017-08-30 Thread David Miller

From: Bhumika Goyal 
Date: Wed, 30 Aug 2017 14:55:08 +0530

> Make this const as it is never modified.
> 
> Signed-off-by: Bhumika Goyal 

Applied to net-next.

Re: [PATCH v2] ipv6: sr: fix get_srh() to comply with IPv6 standard "RFC 8200"

2017-08-30 Thread David Miller

From: Ahmed Abdelsalam 
Date: Wed, 30 Aug 2017 10:50:37 +0200

> IPv6 packet may carry more than one extension header, and IPv6 nodes must
> accept and attempt to process extension headers in any order and occurring
> any number of times in the same packet. Hence, there should be no
> assumption that Segment Routing extension header is to appear immediately
> after the IPv6 header.
> 
> Moreover, section 4.1 of RFC 8200 gives a recommendation on the order of
> appearance of those extension headers within an IPv6 packet. According to
> this recommendation, Segment Routing extension header should appear after
> Hop-by-Hop and Destination Options headers (if they are present).
> 
> This patch fixes the get_srh(), so it gets the segment routing header
> regardless of its position in the chain of the extension headers in IPv6
> packet, and makes sure that the IPv6 routing extension header is of Type 4.
> 
> Signed-off-by: Ahmed Abdelsalam 

Applied.
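
The idea of locating the Segment Routing header wherever it sits in the
extension header chain can be sketched over a raw byte buffer. This is a
simplified user-space illustration, not the patched get_srh(): the name
demo_find_srh is hypothetical, and only Hop-by-Hop, Destination Options
and Routing headers are walked (Fragment and other headers end the walk).

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NEXTHDR_HOP      0    /* Hop-by-Hop Options */
#define DEMO_NEXTHDR_ROUTING  43   /* Routing header */
#define DEMO_NEXTHDR_DEST     60   /* Destination Options */
#define DEMO_RTYPE_SRH        4    /* Routing Type of the Segment Routing header */
#define DEMO_IPV6_HDR_LEN     40

/*
 * Returns the byte offset of the Segment Routing header inside pkt,
 * or -1 if none is found among the headers this sketch understands.
 */
static long demo_find_srh(const uint8_t *pkt, size_t len)
{
    size_t off = DEMO_IPV6_HDR_LEN;
    uint8_t nexthdr;

    if (len < DEMO_IPV6_HDR_LEN)
        return -1;
    nexthdr = pkt[6];                 /* Next Header field of the IPv6 header */

    while (nexthdr == DEMO_NEXTHDR_HOP ||
           nexthdr == DEMO_NEXTHDR_ROUTING ||
           nexthdr == DEMO_NEXTHDR_DEST) {
        size_t hdrlen;

        if (len - off < 8)
            return -1;                /* truncated extension header */
        /* Hdr Ext Len counts 8-octet units beyond the first 8 octets. */
        hdrlen = ((size_t)pkt[off + 1] + 1) * 8;
        if (hdrlen > len - off)
            return -1;

        if (nexthdr == DEMO_NEXTHDR_ROUTING &&
            pkt[off + 2] == DEMO_RTYPE_SRH)
            return (long)off;

        nexthdr = pkt[off];           /* Next Header field of this header */
        off += hdrlen;
    }
    return -1;
}
```

For a packet whose IPv6 header is followed by an 8-byte Hop-by-Hop header
and then a Type 4 routing header, the function returns 48 rather than
assuming the routing header starts at offset 40.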

Re: [PATCH net-next v4 00/13] net: mvpp2: comphy configuration

2017-08-30 Thread David Miller

From: Antoine Tenart 
Date: Wed, 30 Aug 2017 10:29:11 +0200

> This series, following up the one on the GoP/MAC configuration, aims at
> stopping to depend on the firmware/bootloader configuration when using
> the PPv2 engine. With this series the PPv2 driver does not need to rely
> on a previous configuration, and dynamic reconfiguration while the
> kernel is running can be done (i.e. switch one port from SGMII to 10G,
> or the opposite). A port can now be configured in a different mode than
> what's done in the firmware/bootloader as well.
> 
> The series first contain patches in the generic PHY framework to support
> what is called the comphy (common PHYs), which is an h/w block providing
> PHYs that can be configured in various modes ranging from SGMII, 10G
> to SATA and others. As of now only the SGMII and 10G modes are
> supported by the comphy driver.
> 
> Then patches are modifying the PPv2 driver to first add the comphy
> initialization sequence (i.e. calls to the generic PHY framework) and to
> then take advantage of this to allow dynamic reconfiguration (i.e.
> configuring the mode of a port given what's connected, between sgmii and
> 10G). Note the use of the comphy in the PPv2 driver is kept optional
> (i.e. if not described in dt the driver still works as before and relies on the
> firmware/bootloader configuration).
> 
> Finally there are dt/defconfig patches to describe and take advantage of
> this.
> 
> This was tested on a range of devices: 8040-db, 8040-mcbin and 7040-db.
> 
> @Dave: the dt patches should go through the mvebu tree (patches 9-13).

Ok, patches 1-8 applied to net-next, thanks!

Re: DSA mv88e6xxx RX frame errors and TCP/IP RX failure

2017-08-30 Thread Andrew Lunn

On Wed, Aug 30, 2017 at 12:53:56PM -0700, Tim Harvey wrote:
> Greetings,
> 
> I'm seeing RX frame errors when using the mv88e6xxx DSA driver on
> 4.13-rc7. The board I'm using is a GW5904 [1] which has an IMX6 FEC
> MAC (eth0) connected via RGMII to a MV88E6176 with its downstream
> P0/P1/P2/P3 to front panel RJ45's (lan1-lan4).

Hi Tim

Can you confirm the counter is this one:

   /* Report late collisions as a frame error. */
if (status & (BD_ENET_RX_NO | BD_ENET_RX_CL))
ndev->stats.rx_frame_errors++;

I don't see anywhere else frame errors are counted, but it would be
good to prove we are looking in the right place.

 Andrew

Re: [Patch net-next] net_sched: add reverse binding for tc class

2017-08-30 Thread Cong Wang

On Wed, Aug 30, 2017 at 2:48 PM, Daniel Borkmann  wrote:
> On 08/30/2017 11:30 PM, Cong Wang wrote:
> [...]
>>
>> Note, we still can NOT totally get rid of those class lookup in
>> ->enqueue() because cgroup and flow filters have no way to determine
>> the classid at setup time, they still have to go through dynamic lookup.
>
> [...]
>>
>> ---
>>   include/net/sch_generic.h |  1 +
>>   net/sched/cls_basic.c |  9 +++
>>   net/sched/cls_bpf.c   |  9 +++
>
>
> Same is for cls_bpf as well, so bind_class wouldn't work there
> either as we could return dynamic classids. bind_class cannot
> be added here, too.

I think you are probably right, but the following code is
misleading there:

if (tb[TCA_BPF_CLASSID]) {
        prog->res.classid = nla_get_u32(tb[TCA_BPF_CLASSID]);
        tcf_bind_filter(tp, &prog->res, base);
}

If the classid is dynamic, why this tb[TCA_BPF_CLASSID]?

Re: [PATCH] DSA support for Micrel KSZ8895

2017-08-30 Thread Andrew Lunn

> The KSZ8795 driver will be submitted after Labor Day (9/4) if
> testing reveals no problem.  The KSZ8895 driver will be submitted
> right after that.  You should have no problem using the driver right
> away.

Hi Tristram

Release early, release often. It stops people wasting time.

Also, we are likely to give you feedback, asking you to make
changes. Testing is important, but you are probably going to have to
do it a number of times. So if not everything passes now, don't
worry, you will have time to fix things up as you go through review
cycles.

Andrew

Re: [PATCH net-next] dp83640: don't hold spinlock while calling netif_rx_ni

2017-08-30 Thread David Miller

From: Stefan Sørensen 
Date: Wed, 30 Aug 2017 08:58:47 +0200

> We should not hold a spinlock while pushing the skb into the networking
> stack, so move the call to netif_rx_ni out of the critical region to where
> we have dropped the spinlock.
> 
> Signed-off-by: Stefan Sørensen 

Looks good, applied, thanks.
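
The pattern in the patch description (take the item off the queue under
the lock, hand it on only after unlocking) can be sketched without any
kernel APIs. All names here are hypothetical; the boolean "locked" field
is a stand-in for the spinlock so that the sketch can check the lock state
at delivery time, and demo_deliver stands in for pushing the skb into the
stack.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_queue {
    bool locked;                  /* stand-in for a spinlock */
    int items[4];
    size_t count;
    int delivered;
    int delivered_while_locked;   /* should stay 0 */
};

static void demo_lock(struct demo_queue *q)   { q->locked = true; }
static void demo_unlock(struct demo_queue *q) { q->locked = false; }

/* Stand-in for handing a packet to the networking stack. */
static void demo_deliver(struct demo_queue *q, int item)
{
    (void)item;
    if (q->locked)
        q->delivered_while_locked++;
    q->delivered++;
}

static void demo_flush_one(struct demo_queue *q)
{
    int item = 0;
    bool have = false;

    demo_lock(q);
    if (q->count > 0) {
        item = q->items[--q->count];   /* only the dequeue is protected */
        have = true;
    }
    demo_unlock(q);

    if (have)
        demo_deliver(q, item);         /* called after the lock is dropped */
}
```

Draining a two-element queue this way delivers both items and never
enters demo_deliver with the lock flag set.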

Re: [Patch net-next] net_sched: add reverse binding for tc class

2017-08-30 Thread Daniel Borkmann


On 08/30/2017 11:30 PM, Cong Wang wrote:
[...]

Note, we still can NOT totally get rid of those class lookup in
->enqueue() because cgroup and flow filters have no way to determine
the classid at setup time, they still have to go through dynamic lookup.

[...]

---
  include/net/sch_generic.h |  1 +
  net/sched/cls_basic.c |  9 +++
  net/sched/cls_bpf.c   |  9 +++


Same is for cls_bpf as well, so bind_class wouldn't work there
either as we could return dynamic classids. bind_class cannot
be added here, too.


  net/sched/cls_flower.c|  9 +++
  net/sched/cls_fw.c|  9 +++
  net/sched/cls_matchall.c  |  9 +++
  net/sched/cls_route.c |  9 +++
  net/sched/cls_rsvp.h  |  9 +++
  net/sched/cls_tcindex.c   |  9 +++
  net/sched/cls_u32.c   |  9 +++
  net/sched/sch_api.c   | 68 +--
  11 files changed, 148 insertions(+), 2 deletions(-)

Re: [PATCH net-next] net: cpsw: Don't handle SIOC[GS]HWTSTAMP when CPTS is disabled

2017-08-30 Thread David Miller

From: Stefan Sørensen 
Date: Wed, 30 Aug 2017 08:50:55 +0200

> There is no reason to handle SIOC[GS]HWTSTAMP and return -EOPNOTSUPP when
> CPTS is disabled, so just pass them on to the phy. This will allow PTP
> timestamping on a capable phy by disabling CPTS.
> 
> Signed-off-by: Stefan Sørensen 

It should not be required to disable a Kconfig option just to get PHY
timestamping to work properly.

Rather, if the CPTS code returns -EOPNOTSUPP we should try to
fall through to the PHY library based methods.

Thanks.
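
A minimal sketch of the fall-through behaviour suggested above, written as standalone C rather than driver code. The handler type, the stub handlers and the `last_handler` tracker are all hypothetical and exist only to make the control flow visible; the real cpsw and PHY library entry points are not shown in this thread.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type: returns 0 on success or a negative errno. */
typedef int (*hwtstamp_fn)(int cmd);

static int last_handler;  /* 1 = MAC (CPTS) stub ran last, 2 = PHY stub */

/* Stub handlers for illustration only. */
static int mac_unsupported(int cmd) { (void)cmd; last_handler = 1; return -EOPNOTSUPP; }
static int mac_ok(int cmd)          { (void)cmd; last_handler = 1; return 0; }
static int phy_ok(int cmd)          { (void)cmd; last_handler = 2; return 0; }

/* Try the MAC handler first. Only when it reports -EOPNOTSUPP do we
 * fall through to the PHY handler; any other result is returned as-is. */
static int hwtstamp_ioctl(hwtstamp_fn mac, hwtstamp_fn phy, int cmd)
{
	int err = mac ? mac(cmd) : -EOPNOTSUPP;

	if (err != -EOPNOTSUPP)
		return err;
	return phy ? phy(cmd) : -EOPNOTSUPP;
}
```

With this shape no build-time option has to be switched off: a MAC that cannot timestamp simply answers -EOPNOTSUPP and the PHY gets its chance.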

Re: [PATCH net-next 3/3 v11] drivers: net: ethernet: qualcomm: rmnet: Initial implementation

2017-08-30 Thread Subash Abhinov Kasiviswanathan


> Subash, keep in mind that since I applied your v11 patches already you
> will need to send me relative fixes and changes at this point, rather
> than resubmit the series.
>
> Thank you.


Thanks for the heads up David. Will do.
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project

Re: [patch net-next v2 0/3] net/sched: Improve getting objects by indexes

2017-08-30 Thread David Miller

From: Chris Mi 
Date: Wed, 30 Aug 2017 02:31:56 -0400

> Using current TC code, it is very slow to insert a lot of rules.
> 
> In order to improve the rules update rate in TC,
> we introduced the following two changes:
> 1) changed cls_flower to use IDR to manage the filters.
> 2) changed all act_xxx modules to use IDR instead of
>a small hash table
> 
> But IDR has a limitation that it uses int. TC handle uses u32.
> To make sure there is no regression, we add several new IDR APIs
> to support unsigned long.
> 
> v2
> ==
> 
> Addressed Hannes's comment:
> express idr_alloc in terms of idr_alloc_ext and most of the other functions

Series applied, thanks.
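
The int versus u32 limitation mentioned in the quoted cover letter can be illustrated with a small standalone check. This is a sketch only; the helper names are hypothetical and it does not use the kernel IDR API.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if a 32-bit TC handle can be stored in a non-negative int ID,
 * 0 if it would not fit. */
static int handle_fits_int(uint32_t handle)
{
	return handle <= (uint32_t)INT_MAX;
}

/* An unsigned long is at least 32 bits wide, so every u32 handle
 * survives a round trip through it unchanged. */
static uint32_t roundtrip_ulong(uint32_t handle)
{
	unsigned long id = handle;

	return (uint32_t)id;
}
```

Handles in the upper half of the u32 range are the ones that an int-based ID cannot represent, which is why an unsigned long variant avoids a regression for them.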

Re: [PATCH net 9/9] sch_tbf: fix two null pointer dereferences on init failure

2017-08-30 Thread Nikolay Aleksandrov

On 30/08/17 20:37, Cong Wang wrote:
> On Wed, Aug 30, 2017 at 2:49 AM, Nikolay Aleksandrov
>  wrote:
>> Reproduce:
>> $ sysctl net.core.default_qdisc=tbf
>> $ ip l set ethX up
> 
> I once upon a time had a patch to disallow those qdisc's
> to be default. Probably I should resend it.
> 

That sounds good. A lot of them can't be default anyway, they need
some options to be set, so it's definitely worth looking into.

Re: [PATCH net 6/9] sch_fq_codel: avoid double free on init failure

2017-08-30 Thread Nikolay Aleksandrov

On 30/08/17 20:36, Cong Wang wrote:
> On Wed, Aug 30, 2017 at 2:49 AM, Nikolay Aleksandrov
>  wrote:
>> It is very unlikely to happen but the backlogs memory allocation
>> could fail and will free q->flows, but then ->destroy() will free
>> q->flows too. For correctness remove the first free and let ->destroy
>> clean up.
>>
>> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
>> Signed-off-by: Nikolay Aleksandrov 
>> ---
>>  net/sched/sch_fq_codel.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
>> index 337f2d6d81e4..2c0c05f2cc34 100644
>> --- a/net/sched/sch_fq_codel.c
>> +++ b/net/sched/sch_fq_codel.c
>> @@ -491,10 +491,8 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
>> if (!q->flows)
>> return -ENOMEM;
>> q->backlogs = kvzalloc(q->flows_cnt * sizeof(u32), GFP_KERNEL);
>> -   if (!q->backlogs) {
>> -   kvfree(q->flows);
>> +   if (!q->backlogs)
>> return -ENOMEM;
>> -   }
> 
> This is fine. Or we can NULL it after kvfree().
> 
> I have no preference here. The only difference here is if we still
> expect ->init() to cleanup its own failure.
> 

We don't, that's the point of the changes that led to these fixes:
the way ->destroy() is used by both the default qdisc infra and the
normal qdisc add suggests that it should clean up after ->init failure,
thus the change.
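
The ownership rule discussed here (init does not free on failure, destroy always cleans up) can be sketched in userspace C. The struct, the function names and the `fail_second` switch for simulating an allocation failure are hypothetical; calloc/free stand in for kvzalloc/kvfree.

```c
#include <errno.h>
#include <stdlib.h>

struct sched {
	unsigned int *flows;
	unsigned int *backlogs;
};

/* On failure, init leaves whatever it already allocated in place:
 * the caller is expected to run sched_destroy() afterwards. */
static int sched_init(struct sched *q, size_t n, int fail_second)
{
	q->flows = calloc(n, sizeof(*q->flows));
	if (!q->flows)
		return -ENOMEM;

	/* fail_second simulates the second allocation failing */
	q->backlogs = fail_second ? NULL : calloc(n, sizeof(*q->backlogs));
	if (!q->backlogs)
		return -ENOMEM;  /* no free here, destroy owns cleanup */

	return 0;
}

/* Safe after both successful and failed init: free(NULL) is a no-op,
 * and the pointers are cleared so a repeated call cannot double free. */
static void sched_destroy(struct sched *q)
{
	free(q->backlogs);
	free(q->flows);
	q->backlogs = NULL;
	q->flows = NULL;
}
```

If init also freed `flows` on the failure path without clearing the pointer, the later destroy call would free it a second time, which is the double free the patch removes.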

Re: [PATCH net 0/9] net/sched: init failure fixes

2017-08-30 Thread Nikolay Aleksandrov

On 30/08/17 15:15, Jamal Hadi Salim wrote:
> On 17-08-30 05:48 AM, Nikolay Aleksandrov wrote:
>> Hi all,
>> I went over all qdiscs' init, destroy and reset callbacks and found the
>> issues fixed in each patch. Mostly they are null pointer dereferences due
>> to uninitialized timer (qdisc watchdog) or double frees due to ->destroy
>> cleaning up a second time. There's more information in each patch.
>> I've tested these by either sending wrong attributes from user-spaces, no
>> attributes or by simulating memory alloc failure where applicable. Also
>> tried all of the qdiscs as a default qdisc.
>>
>> Most of these bugs were present before commit 87b60cfacf9f, I've tried to
>> include proper fixes tags in each patch.
>>
>> I haven't included individual patch acks in the set, I'd appreciate it if
>> you take another look and resend them.
>>
> 
> 
> Hi Nik,
> 
> For all patches:
> 
> Acked-by: Jamal Hadi Salim 
> 
> Would you please consider adding all the tests
> you used to create the oopses in selftests? It will ensure these
> embarrassing bugs get caught should they ever happen again.
> If you need help ping Lucas on Cc.
> 
> cheers,
> jamal

Hi,
Sure, I'll make the tests and send patches for tc selftests, using the infra at
tools/testing/selftests/tc-testing.

Thanks!

RE: [PATCH] DSA support for Micrel KSZ8895

2017-08-30 Thread Tristram.Ha

> On Mon 2017-08-28 16:09:27, Andrew Lunn wrote:
> > > I may be confused here, but AFAICT:
> > >
> > > 1) Yes, it has standard layout when accessed over MDIO.
> >
> >
> > Section 4.8 of the datasheet says:
> >
> > All the registers defined in this section can be also accessed
> > via the SPI interface.
> >
> > Meaning all PHY registers can be accessed via the SPI interface. So you
> > should be able to make a standard Linux MDIO bus driver which performs
> > SPI reads.
> 
> As far as I can tell (and their driver confirms) -- yes, all those registers
> can be accessed over the SPI, they are just shuffled around... hence MDIO
> emulation code. I copied it from their code (see the copyrights) so no, I
> don't believe there's a nicer solution.
> 
> Best regards,
> 
>   Pavel

Can you hold off on your development work on the KSZ8895 driver?  I am afraid
your effort may be in vain.  We at Microchip are planning to release DSA drivers
for all KSZ switches, starting with KSZ8795, then KSZ8895, and KSZ8863.

The driver files all follow the structures of the current KSZ9477 DSA driver, 
and the file tag_ksz.c will be updated to handle the tail tag of different 
chips, which requires including the ksz_priv.h header.  That is required 
nevertheless to support using the offload_fwd_mark indication.

The KSZ8795 driver will be submitted after Labor Day (9/4) if testing reveals 
no problem.  The KSZ8895 driver will be submitted right after that.  You should 
have no problem using the driver right away.

Tristram Ha
Principal Software Engineer
Microchip Technology Inc.

[Patch net-next] net_sched: add reverse binding for tc class

2017-08-30 Thread Cong Wang

TC filters when used as classifiers are bound to TC classes.
However, there is a hidden difference when adding them in different
orders:

1. If we add tc classes before their filters, everything is fine.
   Logically, the classes exist before we specify their ID's in
   filters, it is easy to bind them together, just as in the current
   code base.

2. If we add tc filters before the tc classes they bind, we have to
   do dynamic lookup in fast path. What's worse, this happens all
   the time not just once, because on fast path tcf_result is passed
   on stack, there is no way to propagate back to the one in tc filters.

This hidden difference hurts performance silently if we have many tc
classes in hierarchy.

This patch intends to close this gap by doing the reverse binding when
we create a new class, in this case we can actually search all the
filters in its parent, match and fixup by classid. And because
tcf_result is specific to each type of tc filter, we have to introduce
a new ops for each filter to tell how to bind the class.

Note, we still can NOT totally get rid of those class lookup in
->enqueue() because cgroup and flow filters have no way to determine
the classid at setup time, they still have to go through dynamic lookup.

Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_basic.c |  9 +++
 net/sched/cls_bpf.c   |  9 +++
 net/sched/cls_flower.c|  9 +++
 net/sched/cls_fw.c|  9 +++
 net/sched/cls_matchall.c  |  9 +++
 net/sched/cls_route.c |  9 +++
 net/sched/cls_rsvp.h  |  9 +++
 net/sched/cls_tcindex.c   |  9 +++
 net/sched/cls_u32.c   |  9 +++
 net/sched/sch_api.c   | 68 +--
 11 files changed, 148 insertions(+), 2 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index c30b634c5f82..d6247a3c40df 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -217,6 +217,7 @@ struct tcf_proto_ops {
void **, bool);
int (*delete)(struct tcf_proto*, void *, bool*);
void(*walk)(struct tcf_proto*, struct tcf_walker *arg);
+   void(*bind_class)(void *, u32, unsigned long);
 
/* rtnetlink specific */
int (*dump)(struct net*, struct tcf_proto*, void *,
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 73cc7f167a38..d89ebafd2239 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -235,6 +235,14 @@ static void basic_walk(struct tcf_proto *tp, struct tcf_walker *arg)
}
 }
 
+static void basic_bind_class(void *fh, u32 classid, unsigned long cl)
+{
+   struct basic_filter *f = fh;
+
+   if (f && f->res.classid == classid)
+   f->res.class = cl;
+}
+
 static int basic_dump(struct net *net, struct tcf_proto *tp, void *fh,
  struct sk_buff *skb, struct tcmsg *t)
 {
@@ -280,6 +288,7 @@ static struct tcf_proto_ops cls_basic_ops __read_mostly = {
.delete =   basic_delete,
.walk   =   basic_walk,
.dump   =   basic_dump,
+   .bind_class =   basic_bind_class,
.owner  =   THIS_MODULE,
 };
 
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 6f2dffe30f25..520c5027646a 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -607,6 +607,14 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto *tp, void *fh,
return -1;
 }
 
+static void cls_bpf_bind_class(void *fh, u32 classid, unsigned long cl)
+{
+   struct cls_bpf_prog *prog = fh;
+
+   if (prog && prog->res.classid == classid)
+   prog->res.class = cl;
+}
+
 static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
struct cls_bpf_head *head = rtnl_dereference(tp->root);
@@ -635,6 +643,7 @@ static struct tcf_proto_ops cls_bpf_ops __read_mostly = {
.delete =   cls_bpf_delete,
.walk   =   cls_bpf_walk,
.dump   =   cls_bpf_dump,
+   .bind_class =   cls_bpf_bind_class,
 };
 
 static int __init cls_bpf_init_mod(void)
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index bd9dab41f8af..23832d8862c0 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1360,6 +1360,14 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
return -1;
 }
 
+static void fl_bind_class(void *fh, u32 classid, unsigned long cl)
+{
+   struct cls_fl_filter *f = fh;
+
+   if (f && f->res.classid == classid)
+   f->res.class = cl;
+}
+
 static struct tcf_proto_ops cls_fl_ops __read_mostly = {
.kind   = "flower",
.classify   = fl_classify,
@@ -1370,6 +1378,7 @@ static struct tcf_proto_ops

Re: [PATCH net-next 3/3 v11] drivers: net: ethernet: qualcomm: rmnet: Initial implementation

2017-08-30 Thread David Miller

From: Subash Abhinov Kasiviswanathan 
Date: Wed, 30 Aug 2017 15:19:19 -0600

> Sure, I'll implement this. Let me know if you have more comments.

Subash, keep in mind that since I applied your v11 patches already you
will need to send me relative fixes and changes at this point, rather
than resubmit the series.

Thank you.

Re: [PATCH net-next 3/3 v11] drivers: net: ethernet: qualcomm: rmnet: Initial implementation

2017-08-30 Thread Subash Abhinov Kasiviswanathan


> General comment; other drivers that do similar things (macvlan, ipvlan)
> use the term "port" to refer to what I think you're calling a
> "rmnet_real_dev_info".  Maybe that's a shorter or less confusing term.
> Could be renamed later too, if you wanted to do so.



Hi Dan

I'll rename it to rmnet_port.


> Maybe this got elided during the revisions, but now I can't find
> anywhere that sets RMNET_LOCAL_LOGICAL_ENDPOINT.  Looking at the
> callchain, there are two places that LOCAL_LOGICAL_ENDPOINT matters:
>
> rmnet_get_endpoint(): only ever called by __rmnet_set_endpoint_config()
>
> __rmnet_set_endpoint_config(): only called from
> rmnet_set_endpoint_config(); which itself is only called from
> rmnet_newlink().
>
> So the only place that 'config_id' is set, and thus that it could be
> LOCAL_LOGICAL_ENDPOINT, is rmnet_newlink() via 'mux_id'.  But
> IFLA_VLAN_ID is a u16, and so I don't see anywhere that
> config_id/mux_id will ever be < 0, and thus anywhere that it could be
> LOCAL_LOGICAL_ENDPOINT.
>
> I could well just not be seeing it though...
>
> This function (__rmnet_set_endpoint_config) seems to only be called
> from rmnet_set_endpoint_config().  Perhaps just combine them?
>
> But that brings up another point; can the rmnet "mode" or egress_dev
> change at runtime, after the rmnet child has been created?  I forget if
> that was possible with your original patchset that used ioctls.



The original series with IOCTL was able to change it.
With the current netlink based configuration, we are using a fixed config
of muxing and the egress dev is fixed for its lifetime. Practically, these
should never change for a set of rmnet devices attached to a real dev.
I will remove LOCAL_LOGICAL_ENDPOINT since it is unused.


> Why not set the mux_id in rmnet_vnd_newlink()?
>
> Also, bigger problem.  r->rmnet_devices[] is only 32 items in size.
> But mux_id (which is used as an index into rmnet_devices in a few
> places) can be up to 255 (RMNET_MAX_LOGICAL_EP).
>
> So if you try to create an rmnet for mux ID 32, you panic the kernel.
> See below my comments about rmnet_real_dev_info...



I'll fix this.
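
One way to close the out-of-bounds index described above is to reject any mux ID that does not fit the table. The sketch below is an assumption about a possible fix, not the change that was actually made: the thread does not say whether the array was enlarged, replaced by a list or hash, or guarded by a check. `MAX_DEVICES`, `struct port` and `port_set_device()` are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_DEVICES 32  /* table size quoted in the review comment */

/* Hypothetical stand-in for the per-real-device structure. */
struct port {
	void *devices[MAX_DEVICES];
};

/* Store dev at index mux_id, refusing IDs that fall outside the table
 * instead of writing past the end of the array. */
static int port_set_device(struct port *p, unsigned int mux_id, void *dev)
{
	if (mux_id >= MAX_DEVICES)
		return -EINVAL;

	p->devices[mux_id] = dev;
	return 0;
}
```

A check like this turns mux ID 32 and above into an error return; supporting the full range up to 255 would instead need a larger or dynamically sized structure, as the later comments about lists and hashes suggest.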


> I can't see anywhere that the egress/ingress data get set except for
> this function, so perhaps you could just skip these functions and
> (since you already have 'r' from above) set
> r->[egress|ingress]_data_format directly?




Yes, till this is made configurable, this need not be set separately.


> This means that the first time you add an rmnet dev to a netdev, it'll
> create a structure that's quite large (at least 255 * 6, but more due
> to padding), when in most cases few of these items will be used.  Most
> of the time you'd have only a couple PDNs active, but this will
> allocate memory for MAX_LOGICAL_EP of them, no?
>
> ipvlan uses a list to track the child devices attached to a physical
> device so that it doesn't have to allocate them all at once and waste
> memory; that technique could replace the 'rmnet_devices' member below.
>
> It also uses a hash to find the actual ipvlan upperdev from the
> rx_handler of the lowerdev, which is probably what would replace
> muxed_ep[] here.
>
> Is the relationship between rmnet "child"/upper devs and mux_ids 1:1?
> Or can you have multiple rmnet devs for the same mux_id?
>
> Dan


We can have multiple rmnet devices having the same mux_id. They will
need to be attached to different real_dev though. I'll look into the
creation of hash for the lookup. Once I have the hash up, I should
be able to get rid of some of the structures.

The other main functionality which I am unsure about is the
bridge handling - passing on MAP data from one real_dev to another.
Is there some way to achieve this using any existing netlink attributes?
Any suggestions would be appreciated.


> Please implement ndo_get_iflink as well, so that it's easy to find out
> what the "parent"/lowerdev for a given rmnet interface is.
>
> That might mean adding a "phy_dev" member to rmnet_priv, but that might
> help you clean up a lot of other stuff too



Sure, I'll implement this. Let me know if you have more comments.

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project

[PATCH net-next] hv_netvsc: Fix typos in the document of UDP hashing

2017-08-30 Thread Haiyang Zhang

From: Haiyang Zhang 

There are two typos in the document, netvsc.txt,
regarding UDP hashing level. This patch fixes them.

Signed-off-by: Haiyang Zhang 
---
 Documentation/networking/netvsc.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/netvsc.txt 
b/Documentation/networking/netvsc.txt
index fa8d86356791..93560fb1170a 100644
--- a/Documentation/networking/netvsc.txt
+++ b/Documentation/networking/netvsc.txt
@@ -32,9 +32,9 @@ Features
   hashing. Using L3 hashing is recommended in this case.
 
   For example, for UDP over IPv4 on eth0:
-  To include UDP port numbers in hasing:
+  To include UDP port numbers in hashing:
 ethtool -N eth0 rx-flow-hash udp4 sdfn
-  To exclude UDP port numbers in hasing:
+  To exclude UDP port numbers in hashing:
 ethtool -N eth0 rx-flow-hash udp4 sd
   To show UDP hash level:
 ethtool -n eth0 rx-flow-hash udp4
-- 
2.14.1

Re: [PATCH][net-next][V3] bpf: test_maps: fix typos, "conenct" and "listeen"

2017-08-30 Thread Daniel Borkmann


On 08/30/2017 10:13 PM, Shuah Khan wrote:
> On 08/30/2017 12:47 PM, Daniel Borkmann wrote:
>> On 08/30/2017 07:15 PM, Colin King wrote:
>>> From: Colin Ian King
>>>
>>> Trivial fix to typos in printf error messages:
>>> "conenct" -> "connect"
>>> "listeen" -> "listen"
>>>
>>> thanks to Daniel Borkmann for spotting one of these mistakes
>>>
>>> Signed-off-by: Colin Ian King
>>
>> Acked-by: Daniel Borkmann
>
> I can get this into 4.14-rc1 unless it should go through net-next
> for dependencies. In which case,

Yeah, it does depend on work sitting in net-next so easier
to go that route.

Thanks,
Daniel

Re: [PATCH][net-next][V3] bpf: test_maps: fix typos, "conenct" and "listeen"

2017-08-30 Thread Shuah Khan

On 08/30/2017 12:47 PM, Daniel Borkmann wrote:
> On 08/30/2017 07:15 PM, Colin King wrote:
>> From: Colin Ian King 
>>
>> Trivial fix to typos in printf error messages:
>> "conenct" -> "connect"
>> "listeen" -> "listen"
>>
>> thanks to Daniel Borkmann for spotting one of these mistakes
>>
>> Signed-off-by: Colin Ian King 
> 
> Acked-by: Daniel Borkmann 
> 
> 

I can get this into 4.14-rc1 unless it should go through net-next
for dependencies. In which case,

Acked-by: Shuah Khan 

thanks,
-- Shuah

DSA mv88e6xxx RX frame errors and TCP/IP RX failure

2017-08-30 Thread Tim Harvey

Greetings,

I'm seeing RX frame errors when using the mv88e6xxx DSA driver on
4.13-rc7. The board I'm using is a GW5904 [1] which has an IMX6 FEC
MAC (eth0) connected via RGMII to a MV88E6176 with its downstream
P0/P1/P2/P3 to front panel RJ45's (lan1-lan4).

What I see is the following:
- bring up eth0/lan1
- DHCP ipv4 on lan1
- iperf client to server on network connected to lan1 shows ~150mbps
TX without any errors/overruns/frame but 10 or so dropped
- iperf server with a 100mbps TCP client test shows
- iperf server will hang when connected to from iperf client on lan1
network and I see frame errors from ifconfig:

root@xenial:/# ifconfig lan1
lan1  Link encap:Ethernet  HWaddr 00:D0:12:41:F3:E7
  inet addr:172.24.22.125  Bcast:172.24.255.255  Mask:255.240.0.0
  inet6 addr: fe80::2d0:12ff:fe41:f3e7/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:148 errors:0 dropped:30 overruns:0 frame:0
  TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:8780 (8.5 KiB)  TX bytes:1762 (1.7 KiB)

root@xenial:/# ifconfig eth0
eth0  Link encap:Ethernet  HWaddr 00:D0:12:41:F3:E7
  inet6 addr: fe80::2d0:12ff:fe41:f3e7/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:386 errors:19 dropped:39 overruns:0 frame:57
  TX packets:24 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:39484 (38.5 KiB)  TX bytes:2880 (2.8 KiB)

It doesn't appear that this is a new issue, as it also exists on
older kernels.

Note that the IMX6 has an erratum (ERR004512) [2] that limits the
theoretical max performance of the FEC to 470mbps (total TX+RX), and if
the TX and RX peak datarate is higher than ~400mbps there is a risk of
ENET RX FIFO overrun, but I don't think this is the issue here. It
would be the cause of the relatively low throughput of ~150mbps TX though,
I would assume.

Best Regards,

Tim

[1] - 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/imx6qdl-gw5904.dtsi
[2] - http://cache.nxp.com/docs/en/errata/IMX6DQCE.pdf - ERR004512
