date:20161209

Re: netlink: GPF in sock_sndtimeo

2016-12-09 Thread Cong Wang

On Fri, Dec 9, 2016 at 8:13 PM, Cong Wang  wrote:
> On Fri, Dec 9, 2016 at 3:01 AM, Richard Guy Briggs  wrote:
>> On 2016-12-08 22:57, Cong Wang wrote:
>>> On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs  wrote:
>>> > I also tried to extend Cong Wang's idea to attempt to proactively respond to a
>>> > NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
>>> > stack dump using mutex_lock(&audit_cmd_mutex) in the notifier callback.
>>> > Eliminating the lock since the sock is dead anyway eliminates the error.
>>> >
>>> > Is it safe?  I'll resubmit if this looks remotely sane.  Meanwhile I'll try to
>>> > get the test case to compile.
>>>
>>> It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and 'audit_pid'
>>> are updated as a whole and race between audit_receive_msg() and
>>> NETLINK_URELEASE.
>>
>> This is what I expected and why I originally added the mutex lock in the
>> callback...  The dumps I got were bare with no wrapper identifying the
>> process context or specific error, so I'm at a bit of a loss how to
>> solve this (without thinking more about it) other than instinctively
>> removing the mutex.
>
> Netlink notifier can safely be converted to blocking one, I will send
> a patch.
>
> But I seriously doubt you really need NETLINK_URELEASE here;
> it adds nothing but overhead, because the netlink notifier is called for
> every netlink socket in the system, whereas the net exit path is
> a relatively slow path.
>
> Also, kauditd_send_skb() needs audit_cmd_mutex too.

Please let me know what you think of the attached patch.

Thanks!
commit a12b43ee814625933ff155c20dc863c59cfcf240
Author: Cong Wang 
Date:   Fri Dec 9 17:56:42 2016 -0800

audit: close a race condition on audit_sock

Signed-off-by: Cong Wang 

diff --git a/kernel/audit.c b/kernel/audit.c
index f1ca116..ab947d8 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -423,6 +423,8 @@ static void kauditd_send_skb(struct sk_buff *skb)
snprintf(s, sizeof(s), "audit_pid=%d reset", audit_pid);
audit_log_lost(s);
audit_pid = 0;
+   audit_nlk_portid = 0;
+   sock_put(audit_sock);
audit_sock = NULL;
} else {
pr_warn("re-scheduling(#%d) write to audit_pid=%d\n",
@@ -899,6 +901,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
audit_pid = new_pid;
audit_nlk_portid = NETLINK_CB(skb).portid;
+   sock_hold(skb->sk);
+   if (audit_sock)
+   sock_put(audit_sock);
audit_sock = skb->sk;
}
if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
@@ -1167,10 +1172,6 @@ static void __net_exit audit_net_exit(struct net *net)
 {
struct audit_net *aunet = net_generic(net, audit_net_id);
struct sock *sock = aunet->nlsk;
-   if (sock == audit_sock) {
-   audit_pid = 0;
-   audit_sock = NULL;
-   }
 
RCU_INIT_POINTER(aunet->nlsk, NULL);
synchronize_net();
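
The audit_receive_msg() hunk takes a reference on the new socket (sock_hold(skb->sk)) before dropping the reference on the old one (sock_put(audit_sock)). A small userspace model of that ordering follows. It is only a sketch: `struct obj`, `obj_hold()`, `obj_put()`, `replace_current()` and `reset_current()` are hypothetical stand-ins for struct sock, sock_hold(), sock_put() and the audit_sock updates, and the locking that audit_cmd_mutex is meant to provide is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct sock. */
struct obj {
	int refcnt;
	bool freed;
};

/* Models sock_hold(): take an extra reference. */
static void obj_hold(struct obj *o)
{
	o->refcnt++;
}

/* Models sock_put(): drop a reference and "free" on the last one. */
static void obj_put(struct obj *o)
{
	if (--o->refcnt == 0)
		o->freed = true;
}

/* Global pointer standing in for audit_sock. */
static struct obj *current_obj;

/*
 * Install 'new' as the current object. Holding the new reference first
 * keeps the object alive even when new == current_obj; dropping the old
 * reference first could free it before it is re-installed.
 */
static void replace_current(struct obj *new)
{
	obj_hold(new);
	if (current_obj)
		obj_put(current_obj);
	current_obj = new;
}

/* Models the kauditd_send_skb() reset path: drop our reference. */
static void reset_current(void)
{
	if (current_obj) {
		obj_put(current_obj);
		current_obj = NULL;
	}
}
```

Re-registering the same socket leaves its count unchanged, and the reset path returns the count to whatever the owner holds.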

[PATCH] net: socket: removed an unnecessary newline

2016-12-09 Thread kushwaha . a

From: Amit Kushwaha 

This patch removes an unnecessary blank line that was added at the
top of net/socket.c in net-next.

Signed-off-by: Amit Kushwaha 
---
 net/socket.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index 5835383..dc01d7b 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1,4 +1,3 @@
-
 /*
  * NET An implementation of the SOCKET network access protocol.
  *
-- 
1.7.9.5

Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)

2016-12-09 Thread Selvin Xavier

On Fri, Dec 9, 2016 at 12:17 PM, Selvin Xavier
 wrote:
> I am preparing a git repository with these changes as per Jason's
> comment and will share the details later today.

Please use bnxt_re branch in this git repository.

https://github.com/Broadcom/linux-rdma-nxt.git

Thanks,
Selvin Xavier

[Patch net-next] netlink: use blocking notifier

2016-12-09 Thread Cong Wang

netlink_chain is called in ->release(), which is apparently
a process context, so we don't have to use an atomic notifier
here.

Signed-off-by: Cong Wang 
---
 net/netlink/af_netlink.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 246f29d..801d474 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -113,7 +113,7 @@ static atomic_t nl_table_users = ATOMIC_INIT(0);
 
 #define nl_deref_protected(X) rcu_dereference_protected(X, lockdep_is_held(&nl_table_lock));
 
-static ATOMIC_NOTIFIER_HEAD(netlink_chain);
+static BLOCKING_NOTIFIER_HEAD(netlink_chain);
 
 static DEFINE_SPINLOCK(netlink_tap_lock);
 static struct list_head netlink_tap_all __read_mostly;
@@ -711,7 +711,7 @@ static int netlink_release(struct socket *sock)
.protocol = sk->sk_protocol,
.portid = nlk->portid,
  };
-   atomic_notifier_call_chain(&netlink_chain,
+   blocking_notifier_call_chain(&netlink_chain,
NETLINK_URELEASE, &n);
}
 
@@ -2504,13 +2504,13 @@ static const struct file_operations netlink_seq_fops = {
 
 int netlink_register_notifier(struct notifier_block *nb)
 {
-   return atomic_notifier_chain_register(&netlink_chain, nb);
+   return blocking_notifier_chain_register(&netlink_chain, nb);
 }
 EXPORT_SYMBOL(netlink_register_notifier);
 
 int netlink_unregister_notifier(struct notifier_block *nb)
 {
-   return atomic_notifier_chain_unregister(&netlink_chain, nb);
+   return blocking_notifier_chain_unregister(&netlink_chain, nb);
 }
 EXPORT_SYMBOL(netlink_unregister_notifier);
 
-- 
2.5.5

[Patch net-next] ipvs: remove an annoying printk in netns init

2016-12-09 Thread Cong Wang

At most it is used for debugging purposes, but I don't think it is
even useful for debugging; just remove it.

Cc: Simon Horman 
Signed-off-by: Cong Wang 
---
 net/netfilter/ipvs/ip_vs_core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 2c1b498..febc7f3 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -2231,8 +2231,6 @@ static int __net_init __ip_vs_init(struct net *net)
if (ip_vs_sync_net_init(ipvs) < 0)
goto sync_fail;
 
-   printk(KERN_INFO "IPVS: Creating netns size=%zu id=%d\n",
-sizeof(struct netns_ipvs), ipvs->gen);
return 0;
 /*
  * Error handling
-- 
2.5.5

[GIT] Networking

2016-12-09 Thread David Miller


1) Limit the number of can filters to avoid > MAX_ORDER allocations.
   Fix from Marc Kleine-Budde.

2) Limit GSO max size in netvsc driver to avoid problems with
   NVGRE configurations.  From Stephen Hemminger.

3) Return proper error when memory allocation fails in
   ser_gigaset_init(), from Dan Carpenter.

4) Missing linkage undo in error paths of ipvlan_link_new(), from Gao
   Feng.

5) Missing necessary SET_NETDEV_DEV in lantiq and cpmac drivers,
   from Florian Fainelli.

6) Handle probe deferral properly in smsc911x driver.

Please pull, thanks a lot!

The following changes since commit bc3913a5378cd0ddefd1dfec6917cc12eb23a946:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc (2016-12-06 09:24:11 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to d33695fbfab73a4a6550fa5c2d0bacc68d7c5901:

  net: mlx5: Fix Kconfig help text (2016-12-09 23:08:32 -0500)


Alex (1):
  drivers: net: cpsw-phy-sel: Clear RGMII_IDMODE on "rgmii" links

Arjun V (1):
  cxgb4/cxgb4vf: Assign netdev->dev_port with port ID

Christopher Covington (1):
  net: mlx5: Fix Kconfig help text

Dan Carpenter (1):
  ser_gigaset: return -ENOMEM on error instead of success

Daniele Palmas (1):
  NET: usb: cdc_mbim: add quirk for supporting Telit LE922A

David S. Miller (3):
  Merge tag 'linux-can-fixes-for-4.9-20161207' of git://git.kernel.org/.../mkl/linux-can
  Merge tag 'linux-can-fixes-for-4.9-20161208' of git://git.kernel.org/.../mkl/linux-can
  Merge branch 'ethernet-missing-netdev-parent'

Florian Fainelli (3):
  phy: Don't increment MDIO bus refcount unless it's a different owner
  net: ethernet: lantiq_etop: Call SET_NETDEV_DEV()
  net: ethernet: cpmac: Call SET_NETDEV_DEV()

Gao Feng (1):
  driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed

Linus Walleij (1):
  net: smsc911x: back out silently on probe deferrals

Marc Kleine-Budde (1):
  can: raw: raw_setsockopt: limit number of can_filter that can be set

Peng Tao (1):
  vhost-vsock: fix orphan connection reset

Thomas Falcon (1):
  ibmveth: set correct gso_size and gso_type

stephen hemminger (1):
  netvsc: reduce maximum GSO size

추지호 (1):
  can: peak: fix bad memory access and free sequence

 drivers/isdn/gigaset/ser-gigaset.c  |  4 +++-
 drivers/net/can/usb/peak_usb/pcan_usb_core.c|  6 --
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  1 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c  |  1 -
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c |  1 +
 drivers/net/ethernet/ibm/ibmveth.c  | 65 +++--
 drivers/net/ethernet/ibm/ibmveth.h  |  1 +
 drivers/net/ethernet/lantiq_etop.c  |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig |  2 --
 drivers/net/ethernet/smsc/smsc911x.c|  9 -
 drivers/net/ethernet/ti/cpmac.c |  1 +
 drivers/net/ethernet/ti/cpsw-phy-sel.c  |  1 +
 drivers/net/hyperv/netvsc_drv.c |  5 +
 drivers/net/ipvlan/ipvlan_main.c|  4 +++-
 drivers/net/phy/phy_device.c| 16 +---
 drivers/net/usb/cdc_mbim.c  | 21 +
 drivers/net/usb/cdc_ncm.c   | 14 +-
 drivers/vhost/vsock.c   |  2 +-
 include/linux/usb/cdc_ncm.h |  3 ++-
 include/uapi/linux/can.h|  1 +
 net/can/raw.c   |  3 +++
 21 files changed, 142 insertions(+), 20 deletions(-)

Re: Soft lockup in inet_put_port on 4.6

2016-12-09 Thread Eric Dumazet

On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote:

> 
> Hmm... Does your ephemeral port range include the port your load
> balancing app is using?

I suspect that you might have processes doing bind(port = 0) that are
trapped in the bind_conflict() scan?

With 100,000+ timewaits there, this possibly hurts.

Can you try the following loop breaker?

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d5d3ead0a6c31e42e8843d30f8c643324a91b8e9..74f0f5ee6a02c624edb0263b9ddd27813f68d0a5 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -51,7 +51,7 @@ int inet_csk_bind_conflict(const struct sock *sk,
int reuse = sk->sk_reuse;
int reuseport = sk->sk_reuseport;
kuid_t uid = sock_i_uid((struct sock *)sk);
-
+   unsigned int max_count;
/*
 * Unlike other sk lookup places we do not check
 * for sk_net here, since _all_ the socks listed
@@ -59,6 +59,7 @@ int inet_csk_bind_conflict(const struct sock *sk,
 * one this bucket belongs to.
 */
 
+   max_count = relax ? ~0U : 100;
sk_for_each_bound(sk2, &tb->owners) {
if (sk != sk2 &&
!inet_v6_ipv6only(sk2) &&
@@ -84,6 +85,8 @@ int inet_csk_bind_conflict(const struct sock *sk,
break;
}
}
+   if (--max_count == 0)
+   return 1;
}
return sk2 != NULL;
 }
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 1c86c478f578b49373e61a4c397f23f3dc7f3fc6..4f63d06e0d601da94eb3f2b35a988abd060e156c 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -35,12 +35,14 @@ int inet6_csk_bind_conflict(const struct sock *sk,
int reuse = sk->sk_reuse;
int reuseport = sk->sk_reuseport;
kuid_t uid = sock_i_uid((struct sock *)sk);
+   unsigned int max_count;
 
/* We must walk the whole port owner list in this case. -DaveM */
/*
 * See comment in inet_csk_bind_conflict about sock lookup
 * vs net namespaces issues.
 */
+   max_count = relax ? ~0U : 100;
sk_for_each_bound(sk2, &tb->owners) {
if (sk != sk2 &&
(!sk->sk_bound_dev_if ||
@@ -61,6 +63,8 @@ int inet6_csk_bind_conflict(const struct sock *sk,
ipv6_rcv_saddr_equal(sk, sk2, true))
break;
}
+   if (--max_count == 0)
+   return 1;
}
 
return sk2 != NULL;
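
The loop breaker caps how many owners of a bind bucket are examined when relax is false, and reports a conflict once the cap (100 in the patch) is reached, so a bind(port = 0) caller moves on to another port instead of walking a very long owners list. A userspace sketch, assuming a plain singly linked list of hypothetical `struct owner` entries and a caller-supplied predicate in place of the real reuse/reuseport/address checks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SCAN_LIMIT 100  /* same cap as the patch uses when !relax */

/* Hypothetical stand-in for a socket on the bucket's owners list. */
struct owner {
	int id;
	struct owner *next;
};

typedef bool (*conflict_fn)(const struct owner *me, const struct owner *other);

/*
 * Returns 1 on conflict, 0 otherwise. When 'relax' is false, give up
 * after SCAN_LIMIT entries and report a conflict, mirroring max_count.
 */
static int bind_conflict(const struct owner *me, const struct owner *head,
			 bool relax, conflict_fn conflicts)
{
	unsigned int max_count = relax ? ~0U : SCAN_LIMIT;
	const struct owner *o;

	for (o = head; o; o = o->next) {
		if (o != me && conflicts(me, o))
			return 1;
		if (--max_count == 0)
			return 1;  /* scanned too many owners: treat as busy */
	}
	return 0;
}

/* Example predicate: no two owners ever conflict. */
static bool never_conflicts(const struct owner *me, const struct owner *other)
{
	(void)me;
	(void)other;
	return false;
}
```

The trade-off is that a long but conflict-free list is reported as busy in the non-relaxed case, which only costs the caller another port choice.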

Re: netlink: GPF in sock_sndtimeo

2016-12-09 Thread Cong Wang

On Fri, Dec 9, 2016 at 3:01 AM, Richard Guy Briggs  wrote:
> On 2016-12-08 22:57, Cong Wang wrote:
>> On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs  wrote:
>> > I also tried to extend Cong Wang's idea to attempt to proactively respond to a
>> > NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
>> > stack dump using mutex_lock(&audit_cmd_mutex) in the notifier callback.
>> > Eliminating the lock since the sock is dead anyway eliminates the error.
>> >
>> > Is it safe?  I'll resubmit if this looks remotely sane.  Meanwhile I'll try to
>> > get the test case to compile.
>>
>> It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and 'audit_pid'
>> are updated as a whole and race between audit_receive_msg() and
>> NETLINK_URELEASE.
>
> This is what I expected and why I originally added the mutex lock in the
> callback...  The dumps I got were bare with no wrapper identifying the
> process context or specific error, so I'm at a bit of a loss how to
> solve this (without thinking more about it) other than instinctively
> removing the mutex.

Netlink notifier can safely be converted to blocking one, I will send
a patch.

But I seriously doubt you really need NETLINK_URELEASE here;
it adds nothing but overhead, because the netlink notifier is called for
every netlink socket in the system, whereas the net exit path is
a relatively slow path.

Also, kauditd_send_skb() needs audit_cmd_mutex too.

I will send a formal patch.

Thanks.

Re: [PATCH] net: mlx5: Fix Kconfig help text

2016-12-09 Thread David Miller

From: Christopher Covington 
Date: Fri,  9 Dec 2016 16:53:05 -0500

> Since the following commit, Infiniband and Ethernet have not been
> mutually exclusive.
> 
> Fixes: 4aa17b28 mlx5: Enable mutual support for IB and Ethernet
> 
> Signed-off-by: Christopher Covington 

Applied.

Re: [PATCH net-next] net: skb_condense() can also deal with empty skbs

2016-12-09 Thread David Miller

From: Eric Dumazet 
Date: Fri, 09 Dec 2016 08:02:05 -0800

> From: Eric Dumazet 
> 
> It seems attackers can also send UDP packets with no payload at all.
> 
> skb_condense() can still be a win in this case.
> 
> It will be possible to replace the custom code in tcp_add_backlog()
> to get the full benefit of skb_condense().
> 
> Signed-off-by: Eric Dumazet 

Applied.

Re: [PATCH] i40e: don't truncate match_method assignment

2016-12-09 Thread David Miller

From: Jacob Keller 
Date: Fri,  9 Dec 2016 13:39:21 -0800

> The .match_method field is a u8, so we shouldn't be casting to a u16,
> and because it is only one byte, we do not need to byte swap anything.
> Just assign the value directly. This avoids issues on Big Endian
> architectures which would have byte swapped and then incorrectly
> truncated the value.
> 
> Signed-off-by: Jacob Keller 
> Cc: Stephen Rothwell 
> Cc: Bimmy Pujari 
> ---
> Not sure if this was already in Jeff's queue, but since it's an obvious
> fix for the issue found by Stephen, I thought I'd send it out now just
> to make sure. Thanks for catching this, and sorry we didn't find the fix
> earlier.

Jeff, what do you want me to do with this?
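
To see why the cast and byte swap hurt on big-endian machines, the sketch below models cpu_to_le16() on a big-endian CPU as an unconditional 16-bit byte swap and then stores the result in a one-byte field. The `swap16()` helper and `struct filter` are hypothetical; only the arithmetic is the point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct with a one-byte field like .match_method. */
struct filter {
	uint8_t match_method;
};

/* What cpu_to_le16() amounts to on a big-endian CPU: swap the bytes. */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Buggy pattern: swap as 16 bits, then truncate to the low 8 bits. */
static void assign_swapped(struct filter *f, uint8_t method)
{
	f->match_method = (uint8_t)swap16(method);
}

/* Fixed pattern: a single byte has no byte order, assign directly. */
static void assign_direct(struct filter *f, uint8_t method)
{
	f->match_method = method;
}
```

For a value like 3, the swap moves it into the high byte (0x0300) and the truncation keeps only the now-zero low byte, which is exactly the loss the patch avoids.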

Re: [PATCH] net: smsc911x: back out silently on probe deferrals

2016-12-09 Thread David Miller

From: Linus Walleij 
Date: Fri,  9 Dec 2016 14:18:00 +0100

> When trying to get a regulator we may get deferred and we see
> this noise:
> 
> smsc911x 1b80.ethernet-ebi2 (unnamed net_device) (uninitialized):
>couldn't get regulators -517
> 
> Then the driver continues anyway, which means that the regulator
> may not be properly retrieved and reference counted, and may be
> switched off in case no one else is using it.
> 
> Fix this by returning silently on deferred probe and let the
> system work it out.
> 
> Cc: Jeremy Linton 
> Signed-off-by: Linus Walleij 

Looks good, applied, thanks.
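
The fix follows the usual probe-deferral pattern: when acquiring a resource returns -EPROBE_DEFER, propagate it without logging so the driver core can retry the probe later. A userspace sketch; `probe()`, `regulators_not_ready()` and `regulators_ok()` are hypothetical, and EPROBE_DEFER is defined locally with the kernel's value, 517 (the number in the log above). For other errors this sketch simply logs and continues, as the message says the original code did; what the real driver does there is not shown in the mail.

```c
#include <assert.h>
#include <stdio.h>

#define EPROBE_DEFER 517  /* kernel's value, defined locally for the model */

/*
 * Hypothetical probe: 'get_regulators' returns 0 or a negative errno.
 * A deferral is returned silently; the driver core will probe again.
 */
static int probe(int (*get_regulators)(void))
{
	int ret = get_regulators();

	if (ret == -EPROBE_DEFER)
		return ret;
	if (ret)
		fprintf(stderr, "couldn't get regulators %d\n", ret);
	/* ... the rest of the probe would continue here ... */
	return 0;
}

/* Hypothetical resource getters for the two interesting cases. */
static int regulators_not_ready(void)
{
	return -EPROBE_DEFER;
}

static int regulators_ok(void)
{
	return 0;
}
```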

Re: pull-request: mac80211-next 2016-12-09

2016-12-09 Thread David Miller

From: Johannes Berg 
Date: Fri,  9 Dec 2016 13:00:13 +0100

> Closing net-next caught me by surprise, so I had to rebase a bit,
> but these three patches really should go in soon. I'm not sending
> them for 4.9 this late though.
> 
> Please pull and let me know if there's any problem.

Pulled, thanks Johannes.

Re: [PATCH net-next] net: macb: Added PCI wrapper for Platform Driver.

2016-12-09 Thread David Miller

From: Bartosz Folta 
Date: Fri, 9 Dec 2016 10:05:46 +

> There are hardware PCI implementations of Cadence GEM network controller. 
> This patch will allow to use such hardware with reuse of existing Platform 
> Driver.

Please properly format your commit message text to 80 columns.

> 
> Signed-off-by: Bartosz Folta 
> ---
>  drivers/net/ethernet/cadence/Kconfig|   9 ++
>  drivers/net/ethernet/cadence/Makefile   |   1 +
>  drivers/net/ethernet/cadence/macb.c |  31 +--
>  drivers/net/ethernet/cadence/macb_pci.c | 152 
>  include/linux/platform_data/macb.h  |   6 ++
>  5 files changed, 194 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/net/ethernet/cadence/macb_pci.c

This patch doesn't apply to net-next, please respin.

Re: [PATCH] ibmveth: set correct gso_size and gso_type

2016-12-09 Thread David Miller

From: Thomas Falcon 
Date: Thu,  8 Dec 2016 16:40:03 -0600

> This patch is based on an earlier one submitted
> by Jon Maxwell with the following commit message:
> 
> "We recently encountered a bug where a few customers using ibmveth on the
> same LPAR hit an issue where a TCP session hung when large receive was
> enabled. Closer analysis revealed that the session was stuck because the
> one side was advertising a zero window repeatedly.
> 
> We narrowed this down to the fact the ibmveth driver did not set gso_size
> which is translated by TCP into the MSS later up the stack. The MSS is
> used to calculate the TCP window size and as that was abnormally large,
> it was calculating a zero window, even though the socket's receive buffer
> was completely empty."
> 
> We rely on the Virtual I/O Server partition in a pseries
> environment to provide the MSS through the TCP header checksum
> field. The stipulation is that users should not disable checksum
> offloading if rx packet aggregation is enabled through VIOS.
> 
> Some firmware offerings provide the MSS in the RX buffer.
> This is signalled by a bit in the RX queue descriptor.
> 
> Reviewed-by: Brian King 
> Reviewed-by: Pradeep Satyanarayana 
> Reviewed-by: Marcelo Ricardo Leitner 
> Reviewed-by: Jonathan Maxwell 
> Reviewed-by: David Dai 
> Signed-off-by: Thomas Falcon 

Applied, although mis-using the TCP checksum field for this is kind of
bogus.  I'm surprised there wasn't some other place you could stick
this value, which wouldn't modify the packet contents.
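
The reasoning in the quoted commit message (an abnormally large MSS leading to a zero window) can be illustrated with a deliberately simplified model in which the receiver advertises the largest multiple of the MSS that fits in its free space. This is not the kernel's __tcp_select_window(), which has additional clamping and special cases; `model_select_window()` and the numbers used with it are purely illustrative.

```c
#include <assert.h>

/*
 * Simplified model: advertise the largest multiple of 'mss' that fits in
 * 'free_space'. If the receiver's idea of the MSS exceeds the free space,
 * the result is 0 even though the buffer may be nearly empty.
 */
static unsigned int model_select_window(unsigned int free_space, unsigned int mss)
{
	if (mss == 0 || free_space < mss)  /* also avoids dividing by zero */
		return 0;
	return (free_space / mss) * mss;
}
```

With a normal MSS of 1448 and 29200 bytes free, the model advertises 28960 bytes; if the MSS is mistaken for the size of a 64K aggregated frame, the same free space yields a window of 0.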

Re: Soft lockup in inet_put_port on 4.6

2016-12-09 Thread Eric Dumazet

On Fri, 2016-12-09 at 20:59 -0500, Josef Bacik wrote:
> On Thu, Dec 8, 2016 at 8:01 PM, Josef Bacik  wrote:
> > 
> >>  On Dec 8, 2016, at 7:32 PM, Eric Dumazet  wrote:
> >> 
> >>>  On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote:
> >>> 
> >>>  We can reproduce the problem at will, still trying to run down the
> >>>  problem.  I'll try and find one of the boxes that dumped a core and get
> >>>  a bt of everybody.  Thanks,
> >> 
> >>  OK, sounds good.
> >> 
> >>  I had a look and :
> >>  - could not spot a fix that came after 4.6.
> >>  - could not spot an obvious bug.
> >> 
> >>  Anything special in the program triggering the issue ?
> >>  SO_REUSEPORT and/or special socket options ?
> >> 
> > 
> > So they recently started using SO_REUSEPORT, that's what triggered 
> > it, if they don't use it then everything is fine.
> > 
> > I added some instrumentation for get_port to see if it was looping in 
> > there and none of my printk's triggered.  The softlockup messages are 
> > always on the inet_bind_bucket lock, sometimes in the process context 
> > in get_port or in the softirq context either through inet_put_port or 
> > inet_kill_twsk.  On the box that I have a coredump for there's only 
> > one processor in the inet code so I'm not sure what to make of that.  
> > That was a box from last week so I'll look at a more recent core and 
> > see if it's different.  Thanks,
> 
> Ok more investigation today, a few bullet points
> 
> - With all the debugging turned on the boxes seem to recover after 
> about a minute.  I'd get the spam of the soft lockup messages all on 
> the inet_bind_bucket, and then the box would be fine.
> - I looked at a core I had from before I started investigating things 
> and there's only one process trying to get the inet_bind_bucket of all 
> the 48 cpus.
> - I noticed that there was over 100k twsk's in that original core.
> - I put a global counter of the twsk's (since most of the softlockup 
> messages have the twsk timers in the stack) and noticed with the 
> debugging kernel it started around 16k twsk's and once it recovered it 
> was down to less than a thousand.  There's a jump where it goes from 8k 
> to 2k and then there's only one more softlockup message and the box is 
> fine.
> - This happens when we restart the service with the config option to 
> start using SO_REUSEPORT.
> 
> The application is our load balancing app, so obviously has lots of 
> connections opened at any given time.  What I'm wondering and will test 
> on Monday is if the SO_REUSEPORT change even matters, or if simply 
> restarting the service is what triggers the problem.  One thing I 
> forgot to mention is that it's also using TCP_FASTOPEN in both the 
> non-reuseport and reuseport variants.
> 
> What I suspect is happening is the service stops, all of the sockets it 
> had open go into TIMEWAIT with relatively the same timer period, and 
> then suddenly all wake up at the same time which coupled with the 
> massive amount of traffic that we see per box anyway results in so much 
> contention and ksoftirqd usage that the box livelocks for a while.  
> With the lock debugging and stuff turned on we aren't able to service 
> as much traffic so it recovers relatively quickly, whereas a normal 
> production kernel never recovers.
> 
> Please keep in mind that I'm a file system developer so my conclusions 
> may be completely insane, any guidance would be welcome.  I'll continue 
> hammering on this on Monday.  Thanks,

Hmm... Does your ephemeral port range include the port your load
balancing app is using?

Re: [PATCH net v2] ibmveth: set correct gso_size and gso_type

2016-12-09 Thread Eric Dumazet

On Fri, 2016-12-09 at 19:31 -0600, Thomas Falcon wrote:
> This patch is based on an earlier one submitted
> by Jon Maxwell with the following commit message:
> 

> + DIV_ROUND_UP(skb->len - hdr_len, mss);
> + } else if (offset) {
> + skb_shinfo(skb)->gso_size = ntohs(tcph->check);
> + skb_shinfo(skb)->gso_segs =
> + DIV_ROUND_UP(skb->len - hdr_len,
> +  skb_shinfo(skb)->gso_size);
> + tcph->check = 0;
> + }

Are you sure that tcph->check can never be 0 in some cases?

That would crash with a divide by 0.
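
One way to address this concern is to validate the MSS before dividing. A minimal userspace sketch, assuming hypothetical inputs: a large-packet flag (loosely modeled on ibmveth_rxq_large_packet()), an MSS carried in the RX buffer, and the raw network-byte-order value of the TCP checksum field. It mirrors the DIV_ROUND_UP(skb->len - hdr_len, mss) computation from the patch but returns an error instead of dividing by zero; `struct gso_info` and `fill_gso()` are illustrative names, not the driver's code.

```c
#include <arpa/inet.h>  /* ntohs(), htons() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical mirror of the two skb_shinfo() fields being set. */
struct gso_info {
	uint16_t gso_size;
	uint16_t gso_segs;
};

/*
 * Take the MSS from the RX buffer when the large-packet flag is set,
 * otherwise from the TCP checksum field, then derive the segment count.
 * Returns -1 (leaving 'gi' untouched) if the MSS is 0 or there is no
 * payload, instead of dividing by zero.
 */
static int fill_gso(struct gso_info *gi, bool lrg_pkt, uint16_t buf_mss,
		    uint16_t raw_check, unsigned int skb_len, unsigned int hdr_len)
{
	uint16_t mss = lrg_pkt ? buf_mss : ntohs(raw_check);

	if (mss == 0 || skb_len <= hdr_len)
		return -1;
	gi->gso_size = mss;
	gi->gso_segs = (uint16_t)DIV_ROUND_UP(skb_len - hdr_len, mss);
	return 0;
}
```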

Re: [PATCH v3 net-next 0/4] udp: receive path optimizations

2016-12-09 Thread David Miller

From: Eric Dumazet 
Date: Thu,  8 Dec 2016 11:41:53 -0800

> This patch series provides about a 100% performance increase under flood.
> 
> v2: added Paolo feedback on udp_rmem_release() for tiny sk_rcvbuf
> added the last patch touching sk_rmem_alloc later

Series applied, thanks.

Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-09 Thread Lino Sanfilippo

Hi,

On 09.12.2016 12:21, Pavel Machek wrote:
> On Fri 2016-12-09 00:19:43, Francois Romieu wrote:
>> Lino Sanfilippo  :
>> [...]
>> > OTOH Pavel said that he actually could produce a deadlock. Now I wonder if
>> > this is caused by that locking scheme (in a way I have not figured out yet)
>> > or if it is a different issue.
>> 
>> stmmac_tx_err races with stmmac_xmit.
> 
> Umm, yes, that looks real.
> 
> And that means that removing tx_lock will not be completely trivial
> :-(. Lino, any ideas there?
> 

Ok, the race is there, but it looks like a problem that is not related to
the use or removal of the private lock.
From a glance at other drivers (e.g. sky2 or e1000), a possible way to handle a
tx error is to start a separate task and restart the tx path in that task
instead of in the irq handler (or the timer, in the case of the watchdog).

In that task we could do:
1. deactivate napi
2. deactivate irqs
3. wait for running napi/irqs to complete (_sync)
4. call stmmac_tx_err()
5. re-enable napi
6. re-enable irqs

We have to ensure that no xmit() is executing while stmmac_tx_err() does the
cleanup, so stmmac_tx_err() should IMO call netif_tx_disable() rather than
netif_stop_queue() (the former grabs the xmit lock before it sets
__QUEUE_STATE_DRV_XOFF to disable the queue).

Regards,
Lino
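
The difference between the two calls can be modeled in userspace: setting the stop bit while holding the xmit lock (what netif_tx_disable() does) guarantees that, once the call returns, no xmit is in flight and none can start, so the cleanup cannot race with it. This sketch covers only that point, not the NAPI/IRQ steps in the list above. `struct txq`, `model_xmit()`, `model_tx_disable()`, `model_tx_err()` and `sender()` are hypothetical, and a pthread mutex stands in for the queue's xmit lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one TX queue; not kernel code. */
struct txq {
	pthread_mutex_t xmit_lock;  /* stands in for the queue's xmit lock */
	bool stopped;               /* stands in for __QUEUE_STATE_DRV_XOFF */
	unsigned long ring_used;    /* descriptors handed to the "hardware" */
};

/* Models the stack calling ndo_start_xmit() under the xmit lock. */
static bool model_xmit(struct txq *q)
{
	bool sent = false;

	pthread_mutex_lock(&q->xmit_lock);
	if (!q->stopped) {
		q->ring_used++;
		sent = true;
	}
	pthread_mutex_unlock(&q->xmit_lock);
	return sent;
}

/*
 * Models netif_tx_disable(): set the stop bit while holding the xmit
 * lock, so any xmit already running has finished when this returns.
 */
static void model_tx_disable(struct txq *q)
{
	pthread_mutex_lock(&q->xmit_lock);
	q->stopped = true;
	pthread_mutex_unlock(&q->xmit_lock);
}

/* Models the stmmac_tx_err() cleanup; safe only after model_tx_disable(). */
static void model_tx_err(struct txq *q)
{
	q->ring_used = 0;
}

/* Sender thread: keep transmitting until the queue is stopped. */
static void *sender(void *arg)
{
	while (model_xmit(arg))
		;
	return NULL;
}
```

Because the stop bit is written under the same lock the senders take, every sender that runs after model_tx_disable() sees the queue stopped, so the ring stays empty after the cleanup.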

Re: [PATCH net-next 1/2] net: phy: add extension of phy-mode for XLGMII

2016-12-09 Thread Jie Deng



On 2016/12/10 0:39, Andrew Lunn wrote:
> On Fri, Dec 09, 2016 at 01:19:07PM +0800, Jie Deng wrote:
>>
>> On 2016/12/9 6:15, Florian Fainelli wrote:
>>> On 12/06/2016 07:57 PM, Jie Deng wrote:
>>>> This patch adds phy-mode support for Synopsys XLGMAC
>>> The functional changes look good, but I would like to see some
>>> description of what the XL part stands for here.
>>>
>>> While you are modifying this, do you also mind submitting a Device Tree
>>> specification change:
>>>
>>> https://www.devicetree.org/specifications/
>>>
>>> Thanks!
>> Thank you for the information.
>>
>> Currently, the XLGMAC is a new IP from Synopsys.
> I think Florian wants to know about the IEEE standard or whatever
> defines what the phy-mode XLGMAC is, in the same way there are
> standards for RGMII, SGMII, etc.
>
> Andrew
Understood! Thank you !

Re: Synopsys Ethernet QoS

2016-12-09 Thread Jie Deng



On 2016/12/10 8:16, Andy Shevchenko wrote:
> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli  wrote:
>
>> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe)
>> did actually pioneer the upstreaming effort, but it is good to see people
>> from Synopsys willing to fix that in the future.
> Wait, you would like to tell that we have more than 2 drivers for the
> same (okay, same vendor) IP?!
> It's better to unify them earlier, than have n+ copies.
>
> P.S. Though, I don't see how sxgbe got in the list. First glance on
> the code doesn't show similarities.
From a glance at sxgbe_reg.h, the registers seem to come from the Synopsys
XGMAC IP... Probably amd-xgbe and sxgbe target the same IP.

Re: [PATCH 0/2 v3] net: qcom/emac: simplify support for different SOCs

2016-12-09 Thread David Miller

From: Timur Tabi 
Date: Thu,  8 Dec 2016 13:24:19 -0600

> On SOCs that have the Qualcomm EMAC network controller, the internal
> PHY block is always different.  Sometimes the differences are small, 
> sometimes it might be a completely different IP.  Either way, using version
> numbers to differentiate them and putting all of the init code in one
> file does not scale.
> 
> This patchset does two things:  The first breaks up the current code into
> different files, and the second patch adds support for a third SOC, the
> Qualcomm Technologies QDF2400 ARM Server SOC.

Series applied.

Re: Soft lockup in inet_put_port on 4.6

2016-12-09 Thread Josef Bacik


On Thu, Dec 8, 2016 at 8:01 PM, Josef Bacik  wrote:
>
>>  On Dec 8, 2016, at 7:32 PM, Eric Dumazet  wrote:
>>
>>>  On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote:
>>>
>>>  We can reproduce the problem at will, still trying to run down the
>>>  problem.  I'll try and find one of the boxes that dumped a core and get
>>>  a bt of everybody.  Thanks,
>>
>>  OK, sounds good.
>>
>>  I had a look and :
>>  - could not spot a fix that came after 4.6.
>>  - could not spot an obvious bug.
>>
>>  Anything special in the program triggering the issue ?
>>  SO_REUSEPORT and/or special socket options ?
>>
>
> So they recently started using SO_REUSEPORT, that's what triggered
> it, if they don't use it then everything is fine.
>
> I added some instrumentation for get_port to see if it was looping in
> there and none of my printk's triggered.  The softlockup messages are
> always on the inet_bind_bucket lock, sometimes in the process context
> in get_port or in the softirq context either through inet_put_port or
> inet_kill_twsk.  On the box that I have a coredump for there's only
> one processor in the inet code so I'm not sure what to make of that.
> That was a box from last week so I'll look at a more recent core and
> see if it's different.  Thanks,


Ok, more investigation today, a few bullet points:

- With all the debugging turned on, the boxes seem to recover after
about a minute.  I'd get the spam of the soft lockup messages all on
the inet_bind_bucket, and then the box would be fine.
- I looked at a core I had from before I started investigating things,
and there's only one process trying to get the inet_bind_bucket out of
all the 48 cpus.
- I noticed that there were over 100k twsk's in that original core.
- I put in a global counter of the twsk's (since most of the softlockup
messages have the twsk timers in the stack) and noticed that with the
debugging kernel it started around 16k twsk's and once it recovered it
was down to less than a thousand.  There's a jump where it goes from 8k
to 2k and then there's only one more softlockup message and the box is
fine.
- This happens when we restart the service with the config option to
start using SO_REUSEPORT.

The application is our load balancing app, so it obviously has lots of
connections open at any given time.  What I'm wondering, and will test
on Monday, is whether the SO_REUSEPORT change even matters, or if simply
restarting the service is what triggers the problem.  One thing I
forgot to mention is that it's also using TCP_FASTOPEN in both the
non-reuseport and reuseport variants.

What I suspect is happening is that the service stops, all of the
sockets it had open go into TIMEWAIT with roughly the same timer period,
and then they all wake up at the same time, which, coupled with the
massive amount of traffic that we see per box anyway, results in so much
contention and ksoftirqd usage that the box livelocks for a while.
With the lock debugging and stuff turned on we aren't able to service
as much traffic, so it recovers relatively quickly, whereas a normal
production kernel never recovers.

Please keep in mind that I'm a file system developer, so my conclusions
may be completely insane; any guidance would be welcome.  I'll continue
hammering on this on Monday.  Thanks,

Josef

Re: Synopsys Ethernet QoS

2016-12-09 Thread Florian Fainelli

On 12/09/16 at 16:16, Andy Shevchenko wrote:
> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli  wrote:
>
>> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe)
>> did actually pioneer the upstreaming effort, but it is good to see people
>> from Synopsys willing to fix that in the future.
> 
> Wait, you would like to tell that we have more than 2 drivers for the
> same (okay, same vendor) IP?!
> It's better to unify them earlier, than have n+ copies.

Unfortunately that is the case, see this email:

https://www.mail-archive.com/netdev@vger.kernel.org/msg142796.html

dwc_eth_qos and stmmac have some overlap. There seems to be work
underway to unify these two to begin with.

> 
> P.S. Though, I don't see how sxgbe got in the list. First glance on
> the code doesn't show similarities.

Well samsung/sxgbe looks potentially similar to amd/xgbe, but that's
just my cursory look at the code, it may very well be something entirely
different. The descriptor formats just look suspiciously similar.
-- 
Florian

[PATCH net v2] ibmveth: set correct gso_size and gso_type

2016-12-09 Thread Thomas Falcon

This patch is based on an earlier one submitted
by Jon Maxwell with the following commit message:

"We recently encountered a bug where a few customers using ibmveth on the
same LPAR hit an issue where a TCP session hung when large receive was
enabled. Closer analysis revealed that the session was stuck because the
one side was advertising a zero window repeatedly.

We narrowed this down to the fact that the ibmveth driver did not set gso_size,
which is translated by TCP into the MSS later up the stack. The MSS is
used to calculate the TCP window size, and as that was abnormally large,
it was calculating a zero window, even though the socket's receive buffer
was completely empty."

We rely on the Virtual I/O Server partition in a pseries
environment to provide the MSS through the TCP header checksum
field. The stipulation is that users should not disable checksum
offloading if rx packet aggregation is enabled through VIOS.

Some firmware offerings provide the MSS in the RX buffer.
This is signalled by a bit in the RX queue descriptor.

Reviewed-by: Brian King 
Reviewed-by: Pradeep Satyanarayana 
Reviewed-by: Marcelo Ricardo Leitner 
Reviewed-by: Jonathan Maxwell 
Reviewed-by: David Dai 
Signed-off-by: Thomas Falcon 
---
v2: calculate gso_segs after Eric Dumazet's comments on the earlier patch
and make sure everyone is included on CC
---
 drivers/net/ethernet/ibm/ibmveth.c | 72 --
 drivers/net/ethernet/ibm/ibmveth.h |  1 +
 2 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index ebe6071..f0c3ae7 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -58,7 +58,7 @@
 
 static const char ibmveth_driver_name[] = "ibmveth";
 static const char ibmveth_driver_string[] = "IBM Power Virtual Ethernet Driver";
-#define ibmveth_driver_version "1.05"
+#define ibmveth_driver_version "1.06"
 
 MODULE_AUTHOR("Santiago Leon ");
 MODULE_DESCRIPTION("IBM Power Virtual Ethernet Driver");
@@ -137,6 +137,11 @@ static inline int ibmveth_rxq_frame_offset(struct ibmveth_adapter *adapter)
 	return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_OFF_MASK;
 }
 
+static inline int ibmveth_rxq_large_packet(struct ibmveth_adapter *adapter)
+{
+   return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_LRG_PKT;
+}
+
 static inline int ibmveth_rxq_frame_length(struct ibmveth_adapter *adapter)
 {
 	return be32_to_cpu(adapter->rx_queue.queue_addr[adapter->rx_queue.index].length);
@@ -1174,6 +1179,52 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	goto retry_bounce;
 }
 
+static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt)
+{
+	struct tcphdr *tcph;
+	int offset = 0;
+	int hdr_len;
+
+	/* only TCP packets will be aggregated */
+	if (skb->protocol == htons(ETH_P_IP)) {
+		struct iphdr *iph = (struct iphdr *)skb->data;
+
+		if (iph->protocol == IPPROTO_TCP) {
+			offset = iph->ihl * 4;
+			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
+		} else {
+			return;
+		}
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		struct ipv6hdr *iph6 = (struct ipv6hdr *)skb->data;
+
+		if (iph6->nexthdr == IPPROTO_TCP) {
+			offset = sizeof(struct ipv6hdr);
+			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
+		} else {
+			return;
+		}
+	} else {
+		return;
+	}
+	/* if mss is not set through Large Packet bit/mss in rx buffer,
+	 * expect that the mss will be written to the tcp header checksum.
+	 */
+	tcph = (struct tcphdr *)(skb->data + offset);
+	hdr_len = offset + tcph->doff * 4;
+	if (lrg_pkt) {
+		skb_shinfo(skb)->gso_size = mss;
+		skb_shinfo(skb)->gso_segs =
+				DIV_ROUND_UP(skb->len - hdr_len, mss);
+	} else if (offset) {
+		skb_shinfo(skb)->gso_size = ntohs(tcph->check);
+		skb_shinfo(skb)->gso_segs =
+				DIV_ROUND_UP(skb->len - hdr_len,
+					     skb_shinfo(skb)->gso_size);
+		tcph->check = 0;
+	}
+}
+
 static int ibmveth_poll(struct napi_struct *napi, int budget)
 {
struct ibmveth_adapter *adapter =
@@ -1182,6 +1233,7 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
int frames_processed = 0;
unsigned long lpar_rc;
struct iphdr *iph;
+   u16 mss = 0;
 
 restart_poll:
while (frames_processed < budget) {
@@ -1199,9 +1251,21 @@
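
The MSS selection in ibmveth_rx_mss_helper() above can be modeled in user space. This is a minimal sketch, not driver code. It assumes an IPv4/TCP frame whose buffer starts at the IP header, and the names `model_rx_mss()` and `struct gso_info` are hypothetical. When the large-packet flag is set, the MSS comes from the RX buffer. Otherwise it is read from the TCP checksum field, which is then cleared. gso_segs is the payload length divided by the MSS, rounded up.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this model (not a kernel structure). */
struct gso_info {
	uint16_t gso_size;	/* MSS handed to the stack */
	uint16_t gso_segs;	/* MSS-sized segments in the payload */
};

/* pkt points at the IPv4 header; len is the total frame length. */
static struct gso_info model_rx_mss(uint8_t *pkt, size_t len, int lrg_pkt,
				    uint16_t rxbuf_mss)
{
	struct gso_info info = { 0, 0 };
	size_t ip_hlen, hdr_len;
	uint8_t *tcp;
	uint16_t check;

	if (pkt[9] != 6)	/* IPv4 protocol field: only TCP (6) is aggregated */
		return info;

	ip_hlen = (size_t)(pkt[0] & 0x0f) * 4;			/* iph->ihl * 4 */
	tcp = pkt + ip_hlen;
	hdr_len = ip_hlen + (size_t)(tcp[12] >> 4) * 4;	/* + tcph->doff * 4 */

	if (lrg_pkt) {
		info.gso_size = rxbuf_mss;		/* MSS from the RX buffer */
	} else {
		memcpy(&check, tcp + 16, sizeof(check));	/* tcph->check */
		info.gso_size = ntohs(check);
		memset(tcp + 16, 0, sizeof(check));		/* tcph->check = 0 */
	}
	/* DIV_ROUND_UP(payload, mss); the zero/length guard is added by this model */
	if (info.gso_size && len > hdr_len)
		info.gso_segs = (uint16_t)((len - hdr_len + info.gso_size - 1) /
					   info.gso_size);
	return info;
}
```

Take a 2936-byte frame with 40 bytes of headers and an MSS of 1448 stored in the checksum field. It yields gso_size 1448 and gso_segs 2.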

Re: Synopsys Ethernet QoS

2016-12-09 Thread Andy Shevchenko

On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli  wrote:

> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe) did
> actually pioneer the upstreaming effort, but it is good to see people
> from Synopsys willing to fix that in the future.

Wait, you would like to tell that we have more than 2 drivers for the
same (okay, same vendor) IP?!
It's better to unify them earlier, than have n+ copies.

P.S. Though, I don't see how sxgbe got in the list. First glance on
the code doesn't show similarities.

-- 
With Best Regards,
Andy Shevchenko

fib_frontend: Add network specific broadcasts, when it takes a sense

2016-12-09 Thread Brandon Philips

Hello-

A number of us are working on an OSS overlay network system called flannel.
It is used in a variety of Linux container systems and one of the backends
is VXLAN.

The issue we have: when creating the VXLAN interface and assigning it an
address, we see a broadcast route being added by the kernel. For example, if
we have 10.4.0.0/16, a broadcast route to 10.4.0.0 is created. This route is
unwanted because we assign 10.4.0.0 to one of our VXLAN interfaces.

However, the kernel interface bring-up comment reads: "Add network specific
broadcasts, when it takes a sense". The code is here:
https://github.com/torvalds/linux/blob/master/net/ipv4/fib_frontend.c#L859-L872

Can someone explain why creation of the broadcast route is non-optional?
Would a patch to make it optional be acceptable? Is it safe for us to
simply delete the route? We have a patch that simply deletes the broadcast
route after interface creation, but we don't know why the kernel code "makes
sense".

You can read more information about the issue here:
https://github.com/coreos/flannel/pull/569

Thank You,

Brandon
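
For reference, the two addresses involved can be computed as shown below. This is a minimal sketch. It assumes, without checking against the linked source, that for prefixes shorter than /31 fib_add_ifaddr() adds broadcast routes for both the network address (prefix & mask) and the directed broadcast (prefix | ~mask). The helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Netmask for a prefix length (0..32), host byte order. */
static uint32_t prefix_mask(unsigned int plen)
{
	return plen == 0 ? 0 : 0xffffffffu << (32 - plen);	/* avoid shift by 32 */
}

/*
 * Compute the network address and directed broadcast for addr/plen
 * (host byte order). Returns 1 if, under the assumption above, both
 * would get broadcast routes (plen < 31), 0 otherwise.
 */
static int net_broadcasts(uint32_t addr, unsigned int plen,
			  uint32_t *net, uint32_t *bcast)
{
	uint32_t mask = prefix_mask(plen);

	*net = addr & mask;
	*bcast = addr | ~mask;
	return plen < 31;
}
```

For 10.4.0.0/16 this gives 10.4.0.0 and 10.4.255.255. Assigning 10.4.0.0 itself to an interface therefore collides with the first of these routes.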

Re: [PATCH V2 03/22] bnxt_re: register with the NIC driver

2016-12-09 Thread Jonathan Toppins

On 12/09/2016 01:47 AM, Selvin Xavier wrote:
> This patch handles the registration with bnxt_en driver. The driver registers
> with netdev notifier chain. Upon receiving NETDEV_REGISTER event, the driver
> in turn registers with bnxt_en driver.
>   1. bnxt_en's ulp_probe function returns a structure that contains information
>      about the device and additional entry points.
>   2. bnxt_en driver returns 'struct bnxt_eth_dev' that contains set of operation
>      vectors that RoCE driver invokes later.
>   3. bnxt_request_msix() allows the RoCE driver to specify the number of MSI-X
>      vectors that are needed.
>   4. bnxt_send_fw_msg() can be used to send messages to the FW
>   5. bnxt_register_async_events() can be used to register for async event
>      callbacks.
> 
> v2: Remove some sparse warning. Also, remove some unused code from unreg path.
> 
> Signed-off-by: Eddie Wai 
> Signed-off-by: Devesh Sharma 
> Signed-off-by: Somnath Kotur 
> Signed-off-by: Sriharsha Basavapatna 
> Signed-off-by: Selvin Xavier 
> ---
>  drivers/infiniband/hw/bnxtre/bnxt_re.h  |  48 +++
>  drivers/infiniband/hw/bnxtre/bnxt_re_main.c | 436
>  2 files changed, 484 insertions(+)
> 

[...]

>  #endif
> diff --git a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
> index ebe1c69..029824a 100644
> --- a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
> +++ b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
> +
> +static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
> +{
> + int i, j, rc;
> +
> + /* Registered a new RoCE device instance to netdev */
> + rc = bnxt_re_register_netdev(rdev);
> + if (rc) {
> + pr_err("Failed to register with netedev: %#x\n", rc);
> + return -EINVAL;
> + }
> + set_bit(BNXT_RE_FLAG_NETDEV_REGISTERED, &rdev->flags);
> +
> + rc = bnxt_re_request_msix(rdev);
> + if (rc) {
> + pr_err("Failed to get MSI-X vectors: %#x\n", rc);
> + rc = -EINVAL;
> + goto fail;
> + }
> + set_bit(BNXT_RE_FLAG_GOT_MSIX, &rdev->flags);

Though this exit path looks correct (need to verify) once all patches
are applied, this looks incorrect if only considering this specific
patch. I think you need the following:

+ return 0;

> +
> +fail:
> + bnxt_re_ib_unreg(rdev, true);
> + return rc;
> +}
> +
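
The review point is the usual kernel error-unwind shape: the success path must return before it falls through into the cleanup label. Below is a minimal user-space sketch of that shape. `step_register()`, `step_request_msix()`, `unwind()` and `struct fake_dev` are hypothetical stand-ins for bnxt_re_register_netdev(), bnxt_re_request_msix(), bnxt_re_ib_unreg() and the device structure. This is not the driver's code.

```c
#include <errno.h>

/* Hypothetical device state; flag bits record which steps succeeded. */
struct fake_dev {
	unsigned long flags;
	int fail_msix;	/* test knob: make the second step fail */
	int unwound;	/* set when the cleanup path runs */
};

#define FLAG_REGISTERED	(1UL << 0)
#define FLAG_GOT_MSIX	(1UL << 1)

static int step_register(struct fake_dev *d) { (void)d; return 0; }
static int step_request_msix(struct fake_dev *d) { return d->fail_msix ? -ENOSPC : 0; }

/* Cleanup path: a real driver would undo each step whose flag bit is set;
 * this model just clears the bits and records that it ran. */
static void unwind(struct fake_dev *d)
{
	d->flags = 0;
	d->unwound = 1;
}

static int dev_init(struct fake_dev *d)
{
	int rc;

	rc = step_register(d);
	if (rc)
		return -EINVAL;
	d->flags |= FLAG_REGISTERED;

	rc = step_request_msix(d);
	if (rc) {
		rc = -EINVAL;
		goto fail;
	}
	d->flags |= FLAG_GOT_MSIX;

	return 0;	/* without this, success falls through into fail: */

fail:
	unwind(d);
	return rc;
}
```

If the `return 0;` is dropped, a successful init runs unwind() and returns 0 with everything torn down. That is the failure mode the review comment points at.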

Re: [PATCH 2/6] net: ethernet: ti: cpts: add support for ext rftclk selection

2016-12-09 Thread Grygorii Strashko



On 12/08/2016 06:47 PM, Stephen Boyd wrote:
> On 12/06, Grygorii Strashko wrote:
>> Subject: [PATCH] cpts refclk sel
>>
>> Signed-off-by: Grygorii Strashko 
>> ---
>>  arch/arm/boot/dts/keystone-k2e-netcp.dtsi | 10 +-
>>  drivers/net/ethernet/ti/cpts.c            | 52 ++-
>>  2 files changed, 60 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/boot/dts/keystone-k2e-netcp.dtsi b/arch/arm/boot/dts/keystone-k2e-netcp.dtsi
>> index 919e655..b27aa22 100644
>> --- a/arch/arm/boot/dts/keystone-k2e-netcp.dtsi
>> +++ b/arch/arm/boot/dts/keystone-k2e-netcp.dtsi
>> @@ -138,7 +138,7 @@ netcp: netcp@2400 {
>>  /* NetCP address range */
>>  ranges = <0 0x2400 0x100>;
>>
>> -clocks = <>, <>, <>;
>> +clocks = <>, <>, <&cpts_mux>;

^^ mux clock used here

>>  clock-names = "pa_clk", "ethss_clk", "cpts";
>>  dma-coherent;
>>
>> @@ -162,6 +162,14 @@ netcp: netcp@2400 {
>>  cpts-ext-ts-inputs = <6>;
>>  cpts-ts-comp-length;
>>
>> +cpts_mux: cpts_refclk_mux {
>> +#clock-cells = <0>;
>> +clocks = <>, <>;
>> +cpts-mux-tbl = <0>, <1>;
>> +assigned-clocks = <&cpts_mux>;
>> +assigned-clock-parents = <>;
> 
> Is there a binding update?
 
This was a pure RFC-DEV patch, just to check the possibility of modeling the
CPTS_RFTCLK_SEL register as a mux clock.
Original patch:
https://lkml.org/lkml/2016/11/28/780

I plan to resend it using the clk framework.

> Why the subnode?

Sorry, I did not get this question. Is there another way to pass a phandle to
a clock in the clocks list property? Am I missing something?

Sorry, this is my first clock :)

> Why not have it as part of the netcp node?

cpts is part of gbe ethss, which is part of netcp.

Only netcp is modeled as DD; cpts and gbe ethss are implemented without using
the DD model, so generic resources are acquired by netcp and then passed to
cpts and gbe ethss.

CPTS has a register that controls an external multiplexer, which selects
one of up to 32 clocks as the time sync reference (RFTCLK).

> Does the cpts-mux-tbl property change?

On Keystone 2 66AK2e (as an example), the following clocks can be selected
as reference clocks (the list is different for other SoCs):
0000 = SYSCLK2
0001 = SYSCLK3
0010 = TIMI0
0011 = TIMI1
0100 = TSIPCLKA
1000 = TSREFCLK
1100 = TSIPCLKB
Others = Reserved

Only 0 and 1 are internal; the others are external and board-specific
(their parameters are unknown, and the corresponding inputs can be used for
other purposes), so I can't define all parent clocks, only the internal ones:

clocks = <>, <>;
cpts-mux-tbl = <0>, <1>;

To use another (external) clock, it should be explicitly defined in the board file:

timi1clk: timi1clk {
#clock-cells = <0>;
compatible = "fixed-clock";
...

&cpts_mux {
clocks = <>, <>, ;
^^^ I can't predict the value here
cpts-mux-tbl = <0>, <1>, <3>;
^^ I can't predict the value here
assigned-clocks = <&cpts_mux>;
assigned-clock-parents = <>;
};

Or did I misunderstand your question?

> 
>> +};
>> +
>>  interfaces {
>>  gbe0: interface-0 {
>>  slave-port = <0>;
>> diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
>> index 938de22..ef94316 100644
>> --- a/drivers/net/ethernet/ti/cpts.c
>> +++ b/drivers/net/ethernet/ti/cpts.c
>> @@ -17,6 +17,7 @@
>>   * along with this program; if not, write to the Free Software
>>   * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301  USA
>>   */
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -672,6 +673,7 @@ int cpts_register(struct cpts *cpts)
>>  cpts->phc_index = ptp_clock_index(cpts->clock);
>>
>>  schedule_delayed_work(&cpts->overflow_work, cpts->ov_check_period);
>> +
> 
> Maybe in another patch.
> 

sure

>>  return 0;
>>
>>  err_ptp:
>> @@ -741,6 +743,54 @@ static void cpts_calc_mult_shift(struct cpts *cpts)
>>   freq, cpts->cc_mult, cpts->cc.shift, (ns - NSEC_PER_SEC));
>>  }
>>

...

>> +
>> +reg = &cpts->reg->rftclk_sel;
>> +
>> +clk = clk_register_mux_table(cpts->dev, refclk_np->name,
>> + parent_names, num_parents,
>> + 0, reg, 0, 0x1F, 0, mux_table, NULL);
>> +if (IS_ERR(clk))
>> +return PTR_ERR(clk);
>> +
>> +return of_clk_add_provider(refclk_np, of_clk_src_simple_get, clk);
> 
> Can you please use the clk_hw APIs instead?
> 

ok

-- 
regards,
-grygorii

Re: Synopsys Ethernet QoS

2016-12-09 Thread Florian Fainelli

On 12/09/2016 02:25 PM, Andy Shevchenko wrote:
> On Fri, Dec 9, 2016 at 5:41 PM, David Miller  wrote:
> 
>> But one thing I am against is changing the driver name for existing
>> users.  If an existing chip is supported by the stmmac driver for
>> existing users, they should still continue to use the "stmmac" driver.
>>
>> Therefore, if consolidation changes the driver module name for
>> existing users, then that is not a good plan at all.
> 
> You have at least one supporter here. Though I jumped in to the
> discussion very late, not sure if everyone have time to answer to
> that.

I don't have many stakes in the stmmac driver (or other Synopsys drivers
for that matter), but renaming seems like a terrible idea that is going
to make backporting of fixes difficult for distributions.

While moving the driver into a separate directory could be done, and git
knows how to track files, renaming the driver entirely would break many
platforms (including but not limited to, Device Tree) that you may not
have visibility over (compatible strings, properties, and platform
device driver name for instance).

It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe) did
actually pioneer the upstreaming effort, but it is good to see people
from Synopsys willing to fix that in the future.
-- 
Florian

Re: [PATCH] net:ethernet:samsung:initialize cur_rx_qnum

2016-12-09 Thread Francois Romieu

Rayagond Kokatanur  :
> This patch initializes cur_rx_qnum upon occurrence of an rx interrupt;
> without this initialization the driver will not work with multiple rx
> queue configurations.
> 
> NOTE: This patch is not tested on actual hw.

(your patch should include a Signed-off-by)

Imho the driver needs more changes to support multiple rx queues.

- rx interrupt for queue A -> priv->cur_rx_qnum = A
- rx interrupt for queue B -> priv->cur_rx_qnum = B
- rx napi processing   -> Err...

Please start turning priv->cur_rx_qnum into a SXGBE_RX_QUEUES sized bitmap.

-- 
Ueimor
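
Below is a minimal user-space sketch of the suggested bitmap. Each RX interrupt sets the bit for its queue, and the poll routine drains every pending queue rather than only the last one recorded. SXGBE_RX_QUEUES is given a hypothetical value of 8 here, and `struct rx_state`, `rx_irq()` and `rx_poll()` are illustrative names. A real driver would use atomic bit operations such as set_bit()/test_and_clear_bit() instead of plain C.

```c
#include <stdint.h>

#define SXGBE_RX_QUEUES 8	/* hypothetical queue count for this sketch */

/* Hypothetical private state: one pending bit per RX queue. */
struct rx_state {
	uint32_t rx_pending;
	unsigned int processed[SXGBE_RX_QUEUES];	/* per-queue poll counter */
};

/* Interrupt side: record that queue q has work (not atomic in this model). */
static void rx_irq(struct rx_state *s, unsigned int q)
{
	s->rx_pending |= 1u << q;
}

/* Poll side: service every queue whose bit is set, clearing the bit first.
 * Returns the number of queues serviced. */
static unsigned int rx_poll(struct rx_state *s)
{
	unsigned int q, n = 0;

	for (q = 0; q < SXGBE_RX_QUEUES; q++) {
		if (!(s->rx_pending & (1u << q)))
			continue;
		s->rx_pending &= ~(1u << q);
		s->processed[q]++;
		n++;
	}
	return n;
}
```

In the "queue A, then queue B, then poll" sequence, a scalar cur_rx_qnum only remembers B. The bitmap keeps both A and B pending until the poll services them.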

Re: [PATCH v2 1/4] net: hix5hd2_gmac: add generic compatible string

2016-12-09 Thread Rob Herring

On Mon, Dec 05, 2016 at 09:27:58PM +0800, Dongpo Li wrote:
> The "hix5hd2" is SoC name, add the generic ethernet driver name.
> The "hisi-gemac-v1" is the basic version and "hisi-gemac-v2" adds
> the SG/TXCSUM/TSO/UFO features.
> 
> Signed-off-by: Dongpo Li 
> ---
>  .../devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt|  9 +++--
>  drivers/net/ethernet/hisilicon/hix5hd2_gmac.c | 15 +++
>  2 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt b/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
> index 75d398b..75920f0 100644
> --- a/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
> +++ b/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
> @@ -1,7 +1,12 @@
>  Hisilicon hix5hd2 gmac controller
>  
>  Required properties:
> -- compatible: should be "hisilicon,hix5hd2-gmac".
> +- compatible: should contain one of the following SoC strings:
> + * "hisilicon,hix5hd2-gemac"
> + * "hisilicon,hi3798cv200-gemac"
> + and one of the following version string:
> + * "hisilicon,hisi-gemac-v1"
> + * "hisilicon,hisi-gemac-v2"

What combinations are valid? I assume both chips don't have both v1 and 
v2. 2 SoCs and 2 versions so far, I don't think there is much point to 
have the v1 and v2 compatible strings.

>  - reg: specifies base physical address(s) and size of the device registers.
>The first region is the MAC register base and size.
>The second region is external interface control register.
> @@ -20,7 +25,7 @@ Required properties:
>  
>  Example:
>   gmac0: ethernet@f984 {
> - compatible = "hisilicon,hix5hd2-gmac";
> + compatible = "hisilicon,hix5hd2-gemac", "hisilicon,hisi-gemac-v1";

You can't just change compatible strings.

>   reg = <0xf984 0x1000>,<0xf984300c 0x4>;
>   interrupts = <0 71 4>;
>   #address-cells = <1>;

Re: Synopsys Ethernet QoS

2016-12-09 Thread Andy Shevchenko

On Fri, Dec 9, 2016 at 5:41 PM, David Miller  wrote:

> But one thing I am against is changing the driver name for existing
> users.  If an existing chip is supported by the stmmac driver for
> existing users, they should still continue to use the "stmmac" driver.
>
> Therefore, if consolidation changes the driver module name for
> existing users, then that is not a good plan at all.

You have at least one supporter here. Though I jumped in to the
discussion very late, not sure if everyone have time to answer to
that.

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH iproute2] Makefile: really suppress printing of directories

2016-12-09 Thread David Ahern

On 12/9/16 12:50 PM, Stephen Hemminger wrote:
> On Wed,  7 Dec 2016 12:55:09 -0800
> David Ahern  wrote:
> 
>> Makefile adds --no-print-directory to MAKEFLAGS if VERBOSE is not
>> defined however Config always defines VERBOSE. Update the check to
>> whether VERBOSE is 0.
>>
>> Fixes: 57bdf8b76451 ("Make builds default to quiet mode")
>> Signed-off-by: David Ahern 
> 
> Applied to net-next.
> 
> Patch only works with net-next, please label it next time.
> 

That does not sound right. The patch this one fixes was applied back in May, 
and Makefile has only had one other commit against it since.

Regardless, I will add the label to git to default to net-next.

[PATCH] net: mlx5: Fix Kconfig help text

2016-12-09 Thread Christopher Covington

Since the following commit, Infiniband and Ethernet have not been
mutually exclusive.

Fixes: 4aa17b28 mlx5: Enable mutual support for IB and Ethernet

Signed-off-by: Christopher Covington 
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index aae4688..521cfdb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -18,8 +18,6 @@ config MLX5_CORE_EN
default n
---help---
  Ethernet support in Mellanox Technologies ConnectX-4 NIC.
- Ethernet and Infiniband support in ConnectX-4 are currently mutually
- exclusive.
 
 config MLX5_CORE_EN_DCB
bool "Data Center Bridging (DCB) Support"
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.

[PATCH] i40e: don't truncate match_method assignment

2016-12-09 Thread Jacob Keller

The .match_method field is a u8, so we shouldn't be casting to a u16,
and because it is only one byte, we do not need to byte swap anything.
Just assign the value directly. This avoids issues on Big Endian
architectures which would have byte swapped and then incorrectly
truncated the value.

Signed-off-by: Jacob Keller 
Cc: Stephen Rothwell 
Cc: Bimmy Pujari 
---
Not sure if this was already in Jeff's queue, but since it's an obvious
fix for the issue found by Stephen, I thought I'd send it out now just
to make sure. Thanks for catching this, and sorry we didn't find the fix
earlier.

 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 69a51a4119d6..6ccf18464339 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2257,8 +2257,7 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi)
}
add_list[num_add].queue_number = 0;
/* set invalid match method for later detection */
-   add_list[num_add].match_method =
-   cpu_to_le16((u16)I40E_AQC_MM_ERR_NO_RES);
+   add_list[num_add].match_method = I40E_AQC_MM_ERR_NO_RES;
cmd_flags |= I40E_AQC_MACVLAN_ADD_PERFECT_MATCH;
add_list[num_add].flags = cpu_to_le16(cmd_flags);
num_add++;
-- 
2.11.0.rc2.152.g4d04e67
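
The user-space sketch below shows why the old cast loses the value on big-endian machines. It mimics what cpu_to_le16() amounts to on a big-endian CPU (a byte swap) and then applies the implicit truncation to a one-byte field. The value 0xAB is hypothetical, because the patch does not show the actual value of I40E_AQC_MM_ERR_NO_RES. The helper names are illustrative.

```c
#include <stdint.h>

/* What cpu_to_le16() amounts to on a big-endian CPU: swap the two bytes.
 * On little-endian it is a no-op, which is why the bug went unnoticed there. */
static uint16_t be_cpu_to_le16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Old code path on big-endian: swap, then store into a u8 field. */
static uint8_t store_match_method_old_be(uint8_t mm)
{
	return (uint8_t)be_cpu_to_le16((uint16_t)mm);	/* keeps only the low byte */
}

/* New code path: a one-byte field needs no byte swapping at all. */
static uint8_t store_match_method_new(uint8_t mm)
{
	return mm;
}
```

After the swap, 0x00AB becomes 0xAB00. Storing it into a u8 keeps only the low byte, 0x00, so the "invalid match method" marker is lost.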

Re: [PATCH net-next v3 1/2] net: dt-bindings: add RGMII TX delay configuration to meson8b-dwmac

2016-12-09 Thread Rob Herring

On Sat, Dec 03, 2016 at 12:32:37AM +0100, Martin Blumenstingl wrote:
> This allows configuring the RGMII TX clock delay. The RGMII clock is
> generated by underlying hardware of the Meson 8b / GXBB DWMAC glue.
> The configuration depends on the actual hardware (no delay may be
> needed due to the design of the actual circuit, the PHY might add this
> delay, etc.).
> 
> Signed-off-by: Martin Blumenstingl 
> Tested-by: Neil Armstrong 
> ---
>  Documentation/devicetree/bindings/net/meson-dwmac.txt | 14 ++
>  1 file changed, 14 insertions(+)

Acked-by: Rob Herring

Re: [PATCH net-next 1/3] net/mlx5e: use %pad format string for dma_addr_t

2016-12-09 Thread Saeed Mahameed



On 12/08/2016 11:57 PM, Arnd Bergmann wrote:
> On 32-bit ARM with 64-bit dma_addr_t I get this warning about an
> incorrect format string:
> 
> In file included from /git/arm-soc/drivers/net/ethernet/mellanox/mlx5/core/alloc.c:42:0:
> drivers/net/ethernet/mellanox/mlx5/core/alloc.c: In function ‘mlx5_frag_buf_alloc_node’:
> drivers/net/ethernet/mellanox/mlx5/core/alloc.c:134:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
> 
> We have the special %pad format for printing dma_addr_t, so use that
> to print the correct address and avoid the warning.
> 
> Fixes: 1c1b522808a1 ("net/mlx5e: Implement Fragmented Work Queue (WQ)")
> Signed-off-by: Arnd Bergmann 

Thank you Arnd !!

Acked-by: Saeed Mahameed

Re: [PATCH iproute2] Makefile: really suppress printing of directories

2016-12-09 Thread Stephen Hemminger

On Wed,  7 Dec 2016 12:55:09 -0800
David Ahern  wrote:

> Makefile adds --no-print-directory to MAKEFLAGS if VERBOSE is not
> defined however Config always defines VERBOSE. Update the check to
> whether VERBOSE is 0.
> 
> Fixes: 57bdf8b76451 ("Make builds default to quiet mode")
> Signed-off-by: David Ahern 

Applied to net-next.

Patch only works with net-next, please label it next time.

Re: [PATCH v2 iproute2/net-next 3/3] tc: flower: support matching on ICMP type and code

2016-12-09 Thread Stephen Hemminger

On Wed,  7 Dec 2016 14:54:03 +0100
Simon Horman  wrote:

> Support matching on ICMP type and code.
> 
> Example usage:
> 
> tc qdisc add dev eth0 ingress
> 
> tc filter add dev eth0 protocol ip parent ffff: flower \
>   indev eth0 ip_proto icmp type 8 code 0 action drop
> 
> tc filter add dev eth0 protocol ipv6 parent ffff: flower \
>   indev eth0 ip_proto icmpv6 type 128 code 0 action drop
> 
> Signed-off-by: Simon Horman 

Applied to net-next

Re: [PATCH v2 iproute2/net-next 2/3] tc: flower: introduce enum flower_endpoint

2016-12-09 Thread Stephen Hemminger

On Wed,  7 Dec 2016 14:54:02 +0100
Simon Horman  wrote:

> Introduce enum flower_endpoint and use it instead of a bool
> as the type for parameterising source and destination.
> 
> This is intended to improve readability and provide some type
> checking of endpoint parameters.
> 
> Signed-off-by: Simon Horman 

Applied to net-next
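
Below is a minimal sketch of the pattern described in the patch: a named enum replaces a bool that means "source or destination". Only the name `enum flower_endpoint` comes from the patch. The enumerator names and the `flower_port_attr()` helper are hypothetical and are not iproute2's code.

```c
/* Named endpoint instead of a bare bool: call sites read as
 * flower_port_attr(FLOWER_ENDPOINT_SRC) rather than flower_port_attr(true). */
enum flower_endpoint {
	FLOWER_ENDPOINT_SRC,
	FLOWER_ENDPOINT_DST,
};

/* Hypothetical helper: pick an attribute name for the given endpoint. */
static const char *flower_port_attr(enum flower_endpoint endpoint)
{
	switch (endpoint) {
	case FLOWER_ENDPOINT_SRC:
		return "src_port";
	case FLOWER_ENDPOINT_DST:
		return "dst_port";
	}
	return "unknown";	/* out-of-range value */
}
```

In C the enum mostly helps readability, because any integer still converts to it silently. It does let the compiler warn (for example with -Wswitch, enabled by -Wall) when a switch misses an enumerator.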

Re: [PATCH v2 iproute2/net-next 1/3] tc: flower: update headers for TCA_FLOWER_KEY_ICMP*

2016-12-09 Thread Stephen Hemminger

On Wed,  7 Dec 2016 14:54:01 +0100
Simon Horman  wrote:

> These are proposed changes for net-next.
> 
> Signed-off-by: Simon Horman 

Picked this up with upstream headers update

Re: [PATCH] linux/types.h: enable endian checks for all sparse builds

2016-12-09 Thread Michael S. Tsirkin

On Fri, Dec 09, 2016 at 03:18:02PM +, Bart Van Assche wrote:
> On 12/08/16 22:40, Madhani, Himanshu wrote:
> > We’ll take a look and send patches to resolve these warnings.
> 
> Thanks!
> 
> Bart.
> 

Sounds good. I posted what I have so far so that you can
start from that.

-- 
MST

Re: [PATCH iproute2 net-next] bpf: Fix number of retries when growing log buffer

2016-12-09 Thread Stephen Hemminger

On Wed,  7 Dec 2016 10:47:59 +0100
Thomas Graf  wrote:

> The log buffer is automatically grown when the verifier output does not
> fit into the default buffer size. The number of growing attempts was
> not sufficient to reach the maximum buffer size so far.
> 
> Perform 9 iterations to reach max and let the 10th one fail.
> 
> j:0 i:65536     max:16777215
> j:1 i:131072    max:16777215
> j:2 i:262144    max:16777215
> j:3 i:524288    max:16777215
> j:4 i:1048576   max:16777215
> j:5 i:2097152   max:16777215
> j:6 i:4194304   max:16777215
> j:7 i:8388608   max:16777215
> j:8 i:16777216  max:16777215
> 
> Signed-off-by: Thomas Graf 
> Acked-by: Daniel Borkmann 

Applied to net-next
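
The table can be reproduced with a small model of the retry loop. This sketch assumes that each retry doubles the buffer, starting at 65536 bytes, and that growth stops once the size reaches the maximum. The function name is hypothetical and this is not iproute2's code.

```c
/*
 * Model of a grow-and-retry loop: attempt j uses a buffer of start << j
 * bytes, and growth stops once the size reaches max. Returns the number
 * of attempts needed for the size to reach max. start must be nonzero.
 */
static unsigned int attempts_to_reach_max(unsigned long start,
					  unsigned long max)
{
	unsigned int attempts = 1;	/* attempt j = 0 uses `start` */
	unsigned long size = start;

	while (size < max) {
		size *= 2;
		attempts++;
	}
	return attempts;
}
```

With start = 65536 and max = 16777215 it returns 9 (j = 0..8). This matches "9 iterations to reach max", with a 10th attempt left to fail.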

Re: [PATCH] uio-hv-generic: store physical addresses instead of virtual

2016-12-09 Thread Arnd Bergmann

On Friday, December 9, 2016 9:28:44 AM CET Stephen Hemminger wrote:
> On Fri,  9 Dec 2016 12:44:40 +0100
> Arnd Bergmann  wrote:

> > Fixes: 95096f2fbd10 ("uio-hv-generic: new userspace i/o driver for VMBus")
> > Signed-off-by: Arnd Bergmann 
> 
> Thanks, the code was inherited from outside, and only tested on x86_64.
> Not sure which platform and GCC version generates the warning, was this just W=1?
> 
> Acked-by: Stephen Hemminger 

This was a regular warning with a randconfig build on arm32, but
it happens on any 32-bit architecture when CONFIG_PHYS_ADDR_T_64BIT
is enabled.

Arnd

Re: [PATCH net-next v2] dsa:mv88e6xxx: dispose irq mapping for chip->irq

2016-12-09 Thread Andrew Lunn

On Wed, Dec 07, 2016 at 05:40:12PM +0100, Volodymyr Bendiuga wrote:
> Yes, most of the users of of_irq_get() do not use irq_dispose_mapping().
> But some of them do (some irq chips), and I believe the correct way
> of doing this is to dispose the irq mapping, as the description for this
> function says that it unmaps the irq, which is mapped by
> of_irq_parse_and_map(). Not disposing the irq might not have any effect
> on most drivers, but some that get an EPROBE_DEFER error do need to
> dispose it.
> 
> This is what I get when I run the code.
> 
> of_irq_put() could be implemented, and as I see it, it would be a
> wrapper for irq_dispose_mapping(). Should I do it this way?

Hi Volodymyr

Yes, I think having of_irq_put() would be good. It gives some symmetry
to the API.

   Andrew

Re: [PATCHv3 perf/core 5/7] samples/bpf: Switch over to libbpf

2016-12-09 Thread Joe Stringer

On 8 December 2016 at 21:18, Wangnan (F)  wrote:
>
>
> On 2016/12/9 13:04, Wangnan (F) wrote:
>>
>>
>>
>> On 2016/12/9 10:46, Joe Stringer wrote:
>>
>> [SNIP]
>>
>>>   diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
>>> index 62d89d50fcbd..616bd55f3be8 100644
>>> --- a/tools/lib/bpf/Makefile
>>> +++ b/tools/lib/bpf/Makefile
>>> @@ -149,6 +149,8 @@ CMD_TARGETS = $(LIB_FILE)
>>> TARGETS = $(CMD_TARGETS)
>>>   +libbpf: all
>>> +
>>
>>
>> Why we need this? I tested this patch without it and it seems to work, and
>> this line causes an extra error:
>>  $ pwd
>>  /home/wn/kernel/tools/lib/bpf
>>  $ make libbpf
>>  ...
>>  gcc -g -Wall -DHAVE_LIBELF_MMAP_SUPPORT -DHAVE_ELF_GETPHDRNUM_SUPPORT
>> -Wbad-function-cast -Wdeclaration-after-statement -Wformat-security
>> -Wformat-y2k -Winit-self -Wmissing-declarations -Wmissing-prototypes
>> -Wnested-externs -Wno-system-headers -Wold-style-definition -Wpacked
>> -Wredundant-decls -Wshadow -Wstrict-aliasing=3 -Wstrict-prototypes
>> -Wswitch-default -Wswitch-enum -Wundef -Wwrite-strings -Wformat -Werror
>> -Wall -fPIC -I. -I/home/wn/kernel-hydrogen/tools/include
>> -I/home/wn/kernel-hydrogen/tools/arch/x86/include/uapi
>> -I/home/wn/kernel-hydrogen/tools/include/uapi libbpf.c all   -o libbpf
>>  gcc: error: all: No such file or directory
>>  make: *** [libbpf] Error 1
>>
>> Thank you.
>
>
> It is not 'caused' by your patch. 'make libbpf' fails without
> your change because it tries to build an executable from
> libbpf.c, but main() is missing.
>
> I think libbpf should never be used as a make target. Your
> new dependency looks strange.

Thanks for the feedback, I sent a patch to address this on top of perf/core:

https://lkml.org/lkml/2016/12/9/518

Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)

2016-12-09 Thread Selvin Xavier

On Fri, Dec 9, 2016 at 8:56 PM, David Miller  wrote:
> From: Selvin Xavier 
> Date: Thu,  8 Dec 2016 22:47:54 -0800
>
>> This series introduces the RoCE driver for the Broadcom
>> NetXtreme-E 10/25/40/50 gigabit RoCE HCAs.
>> This driver is dependent on the bnxt_en NIC driver and is
>> based on the bnxt_re branch in Doug's repository. bnxt_en changes
>> required for this patch series are already available in this branch.
>>
>> I am preparing a git repository with these changes as per Jason's
>> comment and will share the details later today.
>
> If this is targeted at the net-next tree, it is too late as I've
> closed the net-next tree two nights ago.
>

This patch series targets the linux-rdma tree. netdev is copied since
this series depends on bnxt_en.

Thanks
Selvin

[PATCH perf/core] samples/bpf: Drop unnecessary build targets.

2016-12-09 Thread Joe Stringer

Commit f72179ef11db ("samples/bpf: Switch over to libbpf") added these
two makefile changes that were unnecessary for switching samples to use
libbpf. The extra make is already handled by the build dependency, and
libbpf target doesn't build because it lacks main(). Remove these.

Reported-by: Wang Nan 
Signed-off-by: Joe Stringer 
---
 samples/bpf/Makefile   | 1 -
 tools/lib/bpf/Makefile | 2 --
 2 files changed, 3 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 9ffa6a2c061d..60ffc8115b67 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -127,7 +127,6 @@ CLANG ?= clang
 
 # Trick to allow make to be run from this directory
 all:
-   $(MAKE) -C ../../ tools/lib/bpf/
$(MAKE) -C ../../ $$PWD/
 
 clean:
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 616bd55f3be8..62d89d50fcbd 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -149,8 +149,6 @@ CMD_TARGETS = $(LIB_FILE)
 
 TARGETS = $(CMD_TARGETS)
 
-libbpf: all
-
 all: fixdep $(VERSION_FILES) all_cmd
 
 all_cmd: $(CMD_TARGETS)
-- 
2.10.2

Re: [PATCH] uio-hv-generic: store physical addresses instead of virtual

2016-12-09 Thread Stephen Hemminger

On Fri,  9 Dec 2016 12:44:40 +0100
Arnd Bergmann  wrote:

> gcc warns about the newly added driver when phys_addr_t is wider than
> a pointer:
> 
> drivers/uio/uio_hv_generic.c: In function 'hv_uio_mmap':
> drivers/uio/uio_hv_generic.c:71:17: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
>     virt_to_phys((void *)info->mem[mi].addr) >> PAGE_SHIFT,
> drivers/uio/uio_hv_generic.c: In function 'hv_uio_probe':
> drivers/uio/uio_hv_generic.c:140:5: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
>    = (phys_addr_t)dev->channel->ringbuffer_pages;
> drivers/uio/uio_hv_generic.c:147:3: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
>    (phys_addr_t)vmbus_connection.int_page;
> drivers/uio/uio_hv_generic.c:153:3: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
>    (phys_addr_t)vmbus_connection.monitor_pages[1];
> 
> I can't see why we store a virtual address in a phys_addr_t here,
> as the only user of that variable converts it into a physical
> address anyway, so this moves the conversion to where it logically
> fits according to the types.
> 
> Fixes: 95096f2fbd10 ("uio-hv-generic: new userspace i/o driver for VMBus")
> Signed-off-by: Arnd Bergmann 

Thanks, the code was inherited from outside, and only tested on x86_64.
Not sure which platform and GCC version generate the warning; was this just
W=1?

Acked-by: Stephen Hemminger
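For readers unfamiliar with this warning class: the errors above come from casting between a pointer and an integer type of a different width. Below is a minimal userspace sketch. It assumes `uint64_t` stands in for `phys_addr_t` on a 32-bit build with 64-bit physical addressing (for example ARM LPAE). `fake_phys_addr_t`, `ptr_to_wide()` and `wide_to_ptr()` are hypothetical names, not driver code.

Going through `uintptr_t` keeps the pointer/integer step the same width and silences the warning. As the patch argues, though, the cleaner fix is to store a value of the type it actually holds: a physical address obtained from `virt_to_phys()`.

```c
#include <stdint.h>

/* Hypothetical stand-in for phys_addr_t on a 32-bit kernel with
 * 64-bit physical addressing (e.g. ARM LPAE). */
typedef uint64_t fake_phys_addr_t;

/* A direct (fake_phys_addr_t)ptr cast warns with -Wpointer-to-int-cast
 * when pointers are 32 bits wide. Converting to uintptr_t first keeps the
 * pointer-to-integer step same-width; widening to 64 bits is then an
 * ordinary integer conversion. */
static fake_phys_addr_t ptr_to_wide(const void *p)
{
	return (fake_phys_addr_t)(uintptr_t)p;
}

/* Reverse direction: narrow to pointer width explicitly first, which
 * avoids -Wint-to-pointer-cast. */
static void *wide_to_ptr(fake_phys_addr_t v)
{
	return (void *)(uintptr_t)v;
}
```

This only makes the width change explicit. It does not turn a virtual address into a physical one, which is why the patch moves the `virt_to_phys()` conversion to the point where the value is stored.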

Re: [PATCH v2 net-next 0/4] udp: receive path optimizations

2016-12-09 Thread Tom Herbert

On Fri, Dec 9, 2016 at 8:53 AM, Eric Dumazet  wrote:
> On Fri, 2016-12-09 at 08:43 -0800, Tom Herbert wrote:
>> Are you thinking of allowing unconnected socket to have multiple input
>> queues? Sort of an automatic and transparent SO_REUSEPORT...
>
> It all depends if the user application is using a single thread or
> multiple threads to drain the queue.
>
If they're using multiple threads hopefully there's no reason they
can't use SO_REUSEPORT. Since we should always assume DDoS is a
possibility, it seems like that should be a general recommendation: if
you have multiple threads listening on a port use SO_REUSEPORT.

> Since we used to grab socket lock in udp_recvmsg(), I guess nobody uses
> multiple threads to read packets from a single socket.
>
That's the hope! So the problem at hand is multiple producer CPUs and
one consumer CPU.

> So heavy users must use SO_REUSEPORT already, not sure what we would
> gain trying to go to a single socket, with the complexity of mem
> charging.
>
I think you're making a good point about the possibility that any
unconnected UDP socket could be subject to an attack, so any use of
unconnected UDP has the potential to become a "heavy user" (in fact
we've seen this bring down whole networks in production). Therefore
the single thread reader case is relevant to consider.

Tom


Re: [PATCH v2 net-next 0/4] udp: receive path optimizations

2016-12-09 Thread Eric Dumazet

On Fri, 2016-12-09 at 08:43 -0800, Tom Herbert wrote:
> Are you thinking of allowing unconnected socket to have multiple input
> queues? Sort of an automatic and transparent SO_REUSEPORT... 

It all depends if the user application is using a single thread or
multiple threads to drain the queue.

Since we used to grab socket lock in udp_recvmsg(), I guess nobody uses
multiple threads to read packets from a single socket.

So heavy users must use SO_REUSEPORT already, not sure what we would
gain trying to go to a single socket, with the complexity of mem
charging.


Re: [PATCH net-next 1/2] net: phy: add extension of phy-mode for XLGMII

2016-12-09 Thread Andrew Lunn

On Fri, Dec 09, 2016 at 01:19:07PM +0800, Jie Deng wrote:
> On 2016/12/9 6:15, Florian Fainelli wrote:
> > On 12/06/2016 07:57 PM, Jie Deng wrote:
> >> This patch adds phy-mode support for Synopsys XLGMAC
> > The functional changes look good, but I would like to see some
> > description of what the XL part stands for here.
> >
> > While you are modifying this, do you also mind submitting a Device Tree
> > specification change:
> >
> > https://www.devicetree.org/specifications/
> >
> > Thanks!
> Thank you for the information.
> 
> Currently, the XLGMAC is a new IP from Synopsys.

I think Florian wants to know about the IEEE standard or whatever
which defines what the phy-mode XLGMAC is, in the same way there are
standards for RGMII, SGMII, etc.

  Andrew

Re: [PATCH v2 net-next 0/4] udp: receive path optimizations

2016-12-09 Thread Eric Dumazet

On Fri, 2016-12-09 at 17:05 +0100, Jesper Dangaard Brouer wrote:
> On Thu, 08 Dec 2016 13:13:15 -0800
> Eric Dumazet  wrote:
> 
> > On Thu, 2016-12-08 at 21:48 +0100, Jesper Dangaard Brouer wrote:
> > > On Thu,  8 Dec 2016 09:38:55 -0800
> > > Eric Dumazet  wrote:
> > >   
> > > > This patch series provides about 100 % performance increase under flood.
> > > 
> > > Could you please explain a bit more about what kind of testing you are
> > > doing that can show 100% performance improvement?
> > > 
> > > I've tested this patchset and my tests show *huge* speedups, but
> > > reaping the performance benefit depends heavily on setup and enabling
> > > the right UDP socket settings, and most importantly where the
> > > performance bottleneck is: ksoftirqd(producer) or udp_sink(consumer).  
> > 
> > Right.
> > 
> > So here at Google we do not try (yet) to downgrade our expensive
> > Multiqueue Nics into dumb NICS from last decade by using a single queue
> > on them. Maybe it will happen when we can process 10Mpps per core,
> > but we are not there yet  ;)
> > 
> > So my test is using a NIC, programmed with 8 queues, on a dual-socket
> > machine. (2 physical packages)
> > 
> > 4 queues are handled by 4 cpus on socket0 (NUMA node 0)
> > 4 queues are handled by 4 cpus on socket1 (NUMA node 1)
> 
> Interesting setup, it will be good to catch cache-line bouncing and
> false-sharing, which the streak of recent patches show ;-) (Hopefully
> such setups are avoided for production).

Well, if you have 100Gbit NIC, and 2 NUMA nodes, what do you suggest
exactly, when jobs run on both nodes ?

If you suggest to remove one package, or force jobs to run on Socket0,
just because the NIC is attached to it, that won't be an option.

Most of the traffic is TCP, so RSS comes nicely here to affine traffic
on one RX queue of the NIC.

Now, if for some reason an innocent UDP socket is the target of a flood,
we must not leave all cpus blocked on a spinlock just to eventually queue a
packet.

Be assured that high performance UDP servers use kernel bypass, or
SO_REUSEPORT already. My effort is not targeting these special users,
since they already have good performance.

My effort is to provide some isolation, a bit like the effort I did for
SYN flood attacks (Cpus were all spinning on listener spinlock)

> 
> 
> > So I explicitly put my poor single thread UDP application in the worst
> > condition, having skbs produced on two NUMA nodes. 
> 
> On which CPU do you place the single thread UDP application?

It does not matter in this case. You can either force it to run on a group of
cpus, or let the scheduler choose.

If you let the scheduler choose, then it might help in the single-tuple
flood attack case, since the user thread will be moved to a different cpu
than the one running ksoftirqd.

> 
> E.g. do you allow it to run on a CPU that also processes ksoftirq?
> My experience is that performance is approx half, if ksoftirq and
> UDP-thread share a CPU (after you fixed the softirq issue).

Well, this is exactly what I said earlier. Your choices about cpu
pinning might help or might hurt in different scenarios.
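The pinning choice Jesper asks about can be expressed with `sched_setaffinity()`; `taskset -c` is the command-line equivalent. This is a minimal sketch, assuming Linux/glibc; `pin_self_to_cpu()` is a hypothetical helper. Which CPUs service the RX queues (and so run the ksoftirqd work) is a per-machine assumption that the caller must supply.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to a single CPU. Returns 0 on success,
 * -1 with errno set otherwise (e.g. CPU not in the allowed cpuset). */
static int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* pid 0 means "the calling thread" for sched_setaffinity(). */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

The two options in the thread map onto this directly. Jesper's setup pins the consumer to a CPU that is not handling RX interrupts. Eric's alternative simply leaves the thread unpinned so the scheduler can move it away from ksoftirqd.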

> 
> 
> > Then my load generator uses trafgen, with spoofed UDP source addresses,
> > like a UDP flood would use. Or typical DNS traffic, malicious or not.
> 
> I also like trafgen
>  https://github.com/netoptimizer/network-testing/tree/master/trafgen
> 
> > So I have 8 cpus all trying to queue packets in a single UDP socket.
> > 
> > Of course, a real high performance server would use 8 UDP sockets, and
> > SO_REUSEPORT with nice eBPF filter to spread the packets based on the
> > queue/cpu they arrived.
> 
> Once the ksoftirq and UDP-threads are silo'ed like that, it should
> basically correspond to the benchmarks of my single queue test,
> multiplied by the number of CPUs/UDP-threads.

Well, if one cpu is shared by the producer and consumer then packets are
hot in caches, so trying to avoid cache line misses as I did is not
really helping.

I optimized the case where we do not assume both parties run on the same
cpu. If you let the process scheduler do its job, then your throughput can
be doubled ;)

Now if for some reason you are stuck with a single CPU, this is a very
different problem, and af_packet might be better.

> 
> I think it might be a good idea (for me) to implement such a
> UDP-multi-threaded sink example program (with SO_REUSEPORT and eBPF
> filter) to demonstrate and make sure the stack scales (and every
> time we/I improve single queue performance, the numbers should multiply
> with the scaling). Maybe you already have such an example program?

Well, I do have something using SO_REUSEPORT, but not yet BPF, so not in
a state I can share at this moment.
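Eric describes 8 UDP sockets with SO_REUSEPORT and a filter that spreads packets by the queue/cpu they arrived on. That setup can be approximated with a classic BPF reuseport program, swapped in here for the eBPF filter Eric mentions. The same pattern appears in the kernel selftest `reuseport_bpf_cpu.c`.

This is a minimal sketch with hypothetical helper names. It assumes:

* Linux >= 4.5, which provides `SO_ATTACH_REUSEPORT_CBPF`.
* One socket per CPU, bound in CPU order, so that group index equals CPU id.

If the returned index is out of range, the kernel falls back to its default hash selection.

```c
#include <linux/filter.h>
#include <sys/socket.h>

#ifndef SO_ATTACH_REUSEPORT_CBPF
#define SO_ATTACH_REUSEPORT_CBPF 51 /* asm-generic value, Linux >= 4.5 */
#endif

/* Classic BPF program: load the id of the CPU processing the packet
 * into A and return it. The kernel uses the return value as an index
 * into the reuseport group (sockets in bind order). */
static int build_cpu_steering_prog(struct sock_filter *out, int max)
{
	const struct sock_filter prog[] = {
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_CPU),
		BPF_STMT(BPF_RET | BPF_A, 0),
	};
	int n = (int)(sizeof(prog) / sizeof(prog[0]));

	if (n > max)
		return -1;
	for (int i = 0; i < n; i++)
		out[i] = prog[i];
	return n;
}

/* Attach the program to one socket of the group; it then applies to the
 * whole group. Returns setsockopt()'s result. */
static int attach_cpu_steering(int fd)
{
	struct sock_filter insns[2];
	struct sock_fprog fprog;
	int n = build_cpu_steering_prog(insns, 2);

	if (n < 0)
		return -1;
	fprog.len = (unsigned short)n;
	fprog.filter = insns;
	return setsockopt(fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF,
			  &fprog, sizeof(fprog));
}
```

With RSS, "the CPU processing the packet" is the CPU servicing that RX queue. Each receive queue therefore lands on the socket whose consumer thread is meant to drain it.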

Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)

2016-12-09 Thread Leon Romanovsky

On Thu, Dec 08, 2016 at 10:47:54PM -0800, Selvin Xavier wrote:
> 

...

>  create mode 100644 include/uapi/rdma/bnxt_re_uverbs_abi.h

Please use the already established naming format for this file.
It will simplify our future integration with rdma-core library.

Thanks

➜  linux-rdma git:(master) ls -l include/uapi/rdma/*-abi.h 
-rw-r--r-- 1 leonro leonro 2291 Dec  7 13:07 include/uapi/rdma/cxgb3-abi.h
-rw-r--r-- 1 leonro leonro 2488 Dec  7 13:07 include/uapi/rdma/cxgb4-abi.h
-rw-r--r-- 1 leonro leonro 2864 Dec  7 13:07 include/uapi/rdma/mlx4-abi.h
-rw-r--r-- 1 leonro leonro 6103 Dec  8 12:52 include/uapi/rdma/mlx5-abi.h
-rw-r--r-- 1 leonro leonro 2932 Dec  7 13:07 include/uapi/rdma/mthca-abi.h
-rw-r--r-- 1 leonro leonro 3380 Dec  7 13:07 include/uapi/rdma/nes-abi.h
-rw-r--r-- 1 leonro leonro 3918 Dec  7 13:07 include/uapi/rdma/ocrdma-abi.h
-rw-r--r-- 1 leonro leonro 2559 Dec  7 13:07 include/uapi/rdma/qedr-abi.h

> 
> -- 
> 2.5.5
> 



Re: stmmac DT property snps,axi_all

2016-12-09 Thread Alexandre Torgue


Hi Niklas

On 12/09/2016 10:53 AM, Niklas Cassel wrote:
> On 12/09/2016 10:20 AM, Niklas Cassel wrote:
>> On 12/08/2016 02:36 PM, Alexandre Torgue wrote:
>>> Hi Niklas,
>>>
>>> On 12/05/2016 05:18 PM, Niklas Cassel wrote:
>>>> Hello Giuseppe
>>>>
>>>> I'm trying to figure out what snps,axi_all is supposed to represent.
>>>>
>>>> It appears that the value is saved, but never used in the code.
>>>>
>>>> Looking at the register specification, I'm guessing that it represents
>>>> Address-Aligned Beats, but there is already the property snps,aal
>>>> for that.
>>>
>>> IMO, it is not useful. Indeed, AXI_AAL is a read-only bit (in the AXI bus
>>> mode register) and reflects the aal bit in the DMA bus register.
>>> As you know, we use "snps,aal" to set the aal bit in the DMA bus register.
>>> So the "snps,axi_all" entry seems useless. Let's see with Peppe.
>>
>> Ok, I see. GMAC and GMAC4 are different here.
>>
>> For GMAC4, AAL only exists in DMA_SYS_BUS_MODE.
>> It's not reflected anywhere else.
>>
>> The code is correct in the driver.
>>
>> If snps,axi_all was just created for a read-only register,
>> and it is currently never used in the code,
>> while we have snps,aal, which is correct and works,
>> I guess it should be ok to remove snps,axi_all.
>>
>> I can cook up a patch.
>
> Here we go :)
>
> I will send it as a real patch once net-next reopens.

Thanks ;). Just check with Peppe next week (as he added this property
in the past).

Regards
Alex

From defc01cb7c22611b89d9cf1fcae72544092bd62c Mon Sep 17 00:00:00 2001
From: Niklas Cassel 
Date: Fri, 9 Dec 2016 10:27:00 +0100
Subject: [PATCH net-next] net: stmmac: remove unused duplicate property
 snps,axi_all

For core revision 3.x Address-Aligned Beats is available in two registers.
The DT property snps,aal was created for AAL in the DMA bus register,
which is a read/write bit.
The DT property snps,axi_all was created for AXI_AAL in the AXI bus mode
register, which is a read only bit that reflects the value of AAL in the
DMA bus register.

Since the value of snps,axi_all is never used in the driver,
and since the property was created for a bit that is read only,
it should be safe to remove the property.

Signed-off-by: Niklas Cassel 
---
 Documentation/devicetree/bindings/net/stmmac.txt      | 1 -
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 -
 include/linux/stmmac.h                                | 1 -
 3 files changed, 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt
index 128da752fec9..c3d2fd480a1b 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -65,7 +65,6 @@ Optional properties:
 - snps,wr_osr_lmt: max write outstanding req. limit
 - snps,rd_osr_lmt: max read outstanding req. limit
 - snps,kbbe: do not cross 1KiB boundary.
-- snps,axi_all: align address
 - snps,blen: this is a vector of supported burst length.
 - snps,fb: fixed-burst
 - snps,mb: mixed-burst
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 082cd48db6a7..60ba8993c650 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -121,7 +121,6 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev)
 axi->axi_lpi_en = of_property_read_bool(np, "snps,lpi_en");
 axi->axi_xit_frm = of_property_read_bool(np, "snps,xit_frm");
 axi->axi_kbbe = of_property_read_bool(np, "snps,axi_kbbe");
-axi->axi_axi_all = of_property_read_bool(np, "snps,axi_all");
 axi->axi_fb = of_property_read_bool(np, "snps,axi_fb");
 axi->axi_mb = of_property_read_bool(np, "snps,axi_mb");
 axi->axi_rb =  of_property_read_bool(np, "snps,axi_rb");
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 266dab9ad782..889e0e9a3f1c 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -103,7 +103,6 @@ struct stmmac_axi {
 u32 axi_wr_osr_lmt;
 u32 axi_rd_osr_lmt;
 bool axi_kbbe;
-bool axi_axi_all;
 u32 axi_blen[AXI_BLEN];
 bool axi_fb;
 bool axi_mb;

Re: [PATCH v2 net-next 0/4] udp: receive path optimizations

2016-12-09 Thread Jesper Dangaard Brouer

On Thu, 08 Dec 2016 13:13:15 -0800
Eric Dumazet  wrote:

> On Thu, 2016-12-08 at 21:48 +0100, Jesper Dangaard Brouer wrote:
> > On Thu,  8 Dec 2016 09:38:55 -0800
> > Eric Dumazet  wrote:
> >   
> > > This patch series provides about 100 % performance increase under flood.
> > 
> > Could you please explain a bit more about what kind of testing you are
> > doing that can show 100% performance improvement?
> > 
> > I've tested this patchset and my tests show *huge* speedups, but
> > reaping the performance benefit depends heavily on setup and enabling
> > the right UDP socket settings, and most importantly where the
> > performance bottleneck is: ksoftirqd(producer) or udp_sink(consumer).  
> 
> Right.
> 
> So here at Google we do not try (yet) to downgrade our expensive
> Multiqueue Nics into dumb NICS from last decade by using a single queue
> on them. Maybe it will happen when we can process 10Mpps per core,
> but we are not there yet  ;)
> 
> So my test is using a NIC, programmed with 8 queues, on a dual-socket
> machine. (2 physical packages)
> 
> 4 queues are handled by 4 cpus on socket0 (NUMA node 0)
> 4 queues are handled by 4 cpus on socket1 (NUMA node 1)

Interesting setup, it will be good to catch cache-line bouncing and
false-sharing, which the streak of recent patches show ;-) (Hopefully
such setups are avoided for production).

> So I explicitly put my poor single thread UDP application in the worst
> condition, having skbs produced on two NUMA nodes. 

On which CPU do you place the single thread UDP application?

E.g. do you allow it to run on a CPU that also processes ksoftirq?
My experience is that performance is approx half, if ksoftirq and
UDP-thread share a CPU (after you fixed the softirq issue).

> Then my load generator uses trafgen, with spoofed UDP source addresses,
> like a UDP flood would use. Or typical DNS traffic, malicious or not.

I also like trafgen
 https://github.com/netoptimizer/network-testing/tree/master/trafgen

> So I have 8 cpus all trying to queue packets in a single UDP socket.
> 
> Of course, a real high performance server would use 8 UDP sockets, and
> SO_REUSEPORT with nice eBPF filter to spread the packets based on the
> queue/cpu they arrived.

Once the ksoftirq and UDP-threads are silo'ed like that, it should
basically correspond to the benchmarks of my single queue test,
multiplied by the number of CPUs/UDP-threads.

I think it might be a good idea (for me) to implement such a
UDP-multi-threaded sink example program (with SO_REUSEPORT and eBPF
filter) to demonstrate and make sure the stack scales (and every
time we/I improve single queue performance, the numbers should multiply
with the scaling). Maybe you already have such an example program?

> In the case you have one cpu that you need to share between ksoftirq and
> all user threads, then your test results depend on process scheduler
> decisions more than anything we can code in network land.

Yes, that is my experience too; the scheduler has a large influence.

> It is actually easy for user space to get more than 50% of the cycles,
> and 'starve' ksoftirqd.

FYI, Paolo recently added an option for parsing of pktgen payload in
the udp_sink.c program, this way we can simulate the app doing something.

I've started testing with 4 CPUs doing ksoftirq, multiple flows
(pktgen_sample04_many_flows.sh) and then incrementally adding udp_sink
--reuse-port programs, on other 4 CPUs, and it looks like it scales
nicely :-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[PATCH net-next] net: skb_condense() can also deal with empty skbs

2016-12-09 Thread Eric Dumazet

From: Eric Dumazet 

It seems attackers can also send UDP packets with no payload at all.

skb_condense() can still be a win in this case.

It will be possible to replace the custom code in tcp_add_backlog()
to get the full benefit of skb_condense().

Signed-off-by: Eric Dumazet 
---
 net/core/skbuff.c |   22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 84151cf40aebb973bad5bee3ee4be0758084d83c..b1451e66d570269252ce628b2dc1714b860e1ca4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4946,16 +4946,20 @@ EXPORT_SYMBOL(pskb_extract);
  */
 void skb_condense(struct sk_buff *skb)
 {
-   if (!skb->data_len ||
-   skb->data_len > skb->end - skb->tail ||
-   skb_cloned(skb))
-   return;
-
-   /* Nice, we can free page frag(s) right now */
-   __pskb_pull_tail(skb, skb->data_len);
+   if (skb->data_len) {
+   if (skb->data_len > skb->end - skb->tail ||
+   skb_cloned(skb))
+   return;
 
-   /* Now adjust skb->truesize, since __pskb_pull_tail() does
-* not do this.
+   /* Nice, we can free page frag(s) right now */
+   __pskb_pull_tail(skb, skb->data_len);
+   }
+   /* At this point, skb->truesize might be over estimated,
+* because skb had a fragment, and fragments do not tell
+* their truesize.
+* When we pulled its content into skb->head, fragment
+* was freed, but __pskb_pull_tail() could not possibly
+* adjust skb->truesize, not knowing the frag truesize.
 */
skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
 }
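The behavioral change in this patch is easiest to see in a toy model. This is a minimal sketch with a made-up `struct toy_skb` holding just the fields `skb_condense()` reads. `TOY_OVERHEAD` is an arbitrary stand-in for the metadata `SKB_TRUESIZE()` adds. None of this is kernel API.

Before the patch, an skb with `data_len == 0` returned early, so its truesize was never trimmed. After it, only the fragment pull is conditional.

```c
#include <stdbool.h>

/* Hypothetical toy model, not struct sk_buff. TOY_OVERHEAD is an
 * arbitrary stand-in for what SKB_TRUESIZE() adds to the head size. */
#define TOY_OVERHEAD 256u
#define TOY_TRUESIZE(head_size) ((head_size) + TOY_OVERHEAD)

struct toy_skb {
	unsigned int data_len;   /* bytes held in page fragments */
	unsigned int tailroom;   /* free space in the linear head (end - tail) */
	unsigned int end_offset; /* size of the linear head buffer */
	unsigned int truesize;   /* memory charged to the socket */
	bool cloned;
};

/* Mirrors the control flow of the patched skb_condense(). */
static void toy_condense(struct toy_skb *skb)
{
	if (skb->data_len) {
		/* Fragments are only pulled if they fit and the skb is private. */
		if (skb->data_len > skb->tailroom || skb->cloned)
			return;
		/* Stand-in for __pskb_pull_tail(): move frag bytes into the head. */
		skb->tailroom -= skb->data_len;
		skb->data_len = 0;
	}
	/* Now also reached for empty skbs: drop any frag-related over-estimate. */
	skb->truesize = TOY_TRUESIZE(skb->end_offset);
}
```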

Re: Synopsys Ethernet QoS

2016-12-09 Thread Joao Pinto

At 3:41 PM on 12/9/2016, David Miller wrote:
> From: Joao Pinto 
> Date: Fri, 9 Dec 2016 15:36:38 +
> 
>> Of course, I started a general discussion about the subject and
>> those were the conclusions, but I would like to know if you as the
>> subsystem maintainer also support the approach or have any
>> suggestion.
> 
> Generally, I support whatever the interested parties agree to.
> 
> But one thing I am against is changing the driver name for existing
> users.  If an existing chip is supported by the stmmac driver for
> existing users, they should still continue to use the "stmmac" driver.
> 
> Therefore, if consolidation changes the driver module name for
> existing users, then that is not a good plan at all.
> 

Of course, I am 100% with you! Retro-compatibility for existing drivers is a must
have. The consolidation is going to be done with extreme care.

Joao

Re: Synopsys Ethernet QoS

2016-12-09 Thread David Miller

From: Joao Pinto 
Date: Fri, 9 Dec 2016 15:36:38 +

> Of course, I started a general discussion about the subject and
> those were the conclusions, but I would like to know if you as the
> subsystem maintainer also support the approach or have any
> suggestion.

Generally, I support whatever the interested parties agree to.

But one thing I am against is changing the driver name for existing
users.  If an existing chip is supported by the stmmac driver for
existing users, they should still continue to use the "stmmac" driver.

Therefore, if consolidation changes the driver module name for
existing users, then that is not a good plan at all.

Re: Synopsys Ethernet QoS

2016-12-09 Thread Joao Pinto

Hi David,

Of course, I started a general discussion about the subject and those were the
conclusions, but I would like to know if you as the subsystem maintainer also
support the approach or have any suggestion.

Thanks,
Joao

At 3:33 PM on 12/9/2016, David Miller wrote:
> From: Joao Pinto 
> Date: Fri, 9 Dec 2016 11:29:02 +
> 
>> Dear David Miller,
>  ...
>> I would like to know if you support this plan.
> 
> This is not how this works.
> 
> You need to discuss and work out a plan with the other people
> with a direct interest in the existing drivers and maintenance.
> 
> Not me.
>

Re: [PATCH] linux/types.h: enable endian checks for all sparse builds

2016-12-09 Thread Bart Van Assche

On 12/08/16 22:40, Madhani, Himanshu wrote:
> We’ll take a look and send patches to resolve these warnings.

Thanks!

Bart.

Re: Synopsys Ethernet QoS

2016-12-09 Thread David Miller

From: Joao Pinto 
Date: Fri, 9 Dec 2016 11:29:02 +

> Dear David Miller,
 ...
> I would like to know if you support this plan.

This is not how this works.

You need to discuss and work out a plan with the other people
with a direct interest in the existing drivers and maintenance.

Not me.

Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)

2016-12-09 Thread David Miller

From: Selvin Xavier 
Date: Thu,  8 Dec 2016 22:47:54 -0800

> This series introduces the RoCE driver for the Broadcom
> NetXtreme-E 10/25/40/50 gigabit RoCE HCAs. 
> This driver is dependent on the bnxt_en NIC driver and is 
> based on the bnxt_re branch in Doug's repository. bnxt_en changes
> required for this patch series are already available in this branch.
> 
> I am preparing a git repository with these changes as per Jason's
> comment and will share the details later today.

If this is targeted at the net-next tree, it is too late as I've
closed the net-next tree two nights ago.

Please resubmit this after the upcoming merge window closes.

Thanks.

Re: [PATCHv3 perf/core 0/7] Reuse libbpf from samples/bpf

2016-12-09 Thread Daniel Borkmann


Hi Arnaldo,

On 12/09/2016 04:09 PM, Arnaldo Carvalho de Melo wrote:
> On Thu, Dec 08, 2016 at 06:46:13PM -0800, Joe Stringer wrote:
>> (Was "libbpf: Synchronize implementations")
>>
>> Update tools/lib/bpf to provide the remaining bpf wrapper pieces needed by the
>> samples/bpf/ code, then get rid of all of the duplicate BPF libraries in
>> samples/bpf/libbpf.[ch].
>>
>> ---
>> v3: Add ack for first patch.
>>     Split out second patch from v2 into separate changes for remaining diff.
>>     Add patches to switch samples/bpf over to using tools/lib/.
>> v2: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
>>     Don't shift non-bpf code into libbpf.
>>     Drop the patch to synchronize ELF definitions with tc.
>> v1: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
>>     First post.
>
> Thanks, applied after addressing the -I$(objtree) issue raised by Wang,

[ Sorry for the late reply. ]

First of all, glad to see us getting rid of the duplicate lib eventually! :)

Please note that this might result in hopefully just a minor merge issue
with net-next. Looks like patch 4/7 touches test_maps.c and test_verifier.c,
which moved to a new bpf selftest suite [1] this net-next cycle. Seems it's
just log buffer and some renames there, which can be discarded for both
files sitting in selftests.

Thanks,
Daniel

  [1] https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/tree/tools/testing/selftests/bpf

Re: [PATCH 37/50] netfilter: nf_tables: atomic dump and reset for stateful objects

2016-12-09 Thread Eric Dumazet

On Fri, 2016-12-09 at 06:24 -0800, Eric Dumazet wrote:

> It looks like you want a seqcount, even on 64bit arches,
> so that CPU 2 can restart its loop, and more importantly you need
> to not accumulate the values you read, because they might be old/invalid.

Untested patch to give general idea. I can polish it a bit later today.

 net/netfilter/nft_counter.c |   59 +-
 1 file changed, 23 insertions(+), 36 deletions(-)

diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c
index f6a02c5071c2aeafca7635da3282a809aa04d6ab..57ed95b024473a2aa76298fe5bb5013bf709801b 100644
--- a/net/netfilter/nft_counter.c
+++ b/net/netfilter/nft_counter.c
@@ -31,18 +31,25 @@ struct nft_counter_percpu_priv {
struct nft_counter_percpu __percpu *counter;
 };
 
+static DEFINE_PER_CPU(seqcount_t, nft_counter_seq);
+
 static inline void nft_counter_do_eval(struct nft_counter_percpu_priv *priv,
   struct nft_regs *regs,
   const struct nft_pktinfo *pkt)
 {
struct nft_counter_percpu *this_cpu;
+   seqcount_t *myseq;
 
local_bh_disable();
this_cpu = this_cpu_ptr(priv->counter);
-   u64_stats_update_begin(&this_cpu->syncp);
+   myseq = this_cpu_ptr(&nft_counter_seq);
+
+   write_seqcount_begin(myseq);
+
this_cpu->counter.bytes += pkt->skb->len;
this_cpu->counter.packets++;
-   u64_stats_update_end(&this_cpu->syncp);
+
+   write_seqcount_end(myseq);
local_bh_enable();
 }
 
@@ -110,52 +117,30 @@ static void nft_counter_fetch(struct nft_counter_percpu __percpu *counter,
 
memset(total, 0, sizeof(*total));
for_each_possible_cpu(cpu) {
+   seqcount_t *seqp = per_cpu_ptr(&nft_counter_seq, cpu);
+
cpu_stats = per_cpu_ptr(counter, cpu);
do {
-   seq = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
+   seq = read_seqcount_begin(seqp);
bytes   = cpu_stats->counter.bytes;
packets = cpu_stats->counter.packets;
-   } while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, seq));
+   } while (read_seqcount_retry(seqp, seq));
 
total->packets += packets;
total->bytes += bytes;
}
 }
 
-static u64 __nft_counter_reset(u64 *counter)
-{
-   u64 ret, old;
-
-   do {
-   old = *counter;
-   ret = cmpxchg64(counter, old, 0);
-   } while (ret != old);
-
-   return ret;
-}
-
 static void nft_counter_reset(struct nft_counter_percpu __percpu *counter,
  struct nft_counter *total)
 {
struct nft_counter_percpu *cpu_stats;
-   u64 bytes, packets;
-   unsigned int seq;
-   int cpu;
 
-   memset(total, 0, sizeof(*total));
-   for_each_possible_cpu(cpu) {
-   bytes = packets = 0;
-
-   cpu_stats = per_cpu_ptr(counter, cpu);
-   do {
-   seq = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
-   packets += __nft_counter_reset(&cpu_stats->counter.packets);
-   bytes   += __nft_counter_reset(&cpu_stats->counter.bytes);
-   } while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, seq));
-
-   total->packets += packets;
-   total->bytes += bytes;
-   }
+   local_bh_disable();
+   cpu_stats = this_cpu_ptr(counter);
+   cpu_stats->counter.packets -= total->packets;
+   cpu_stats->counter.bytes -= total->bytes;
+   local_bh_enable();
 }
 
 static int nft_counter_do_dump(struct sk_buff *skb,
@@ -164,10 +149,9 @@ static int nft_counter_do_dump(struct sk_buff *skb,
 {
struct nft_counter total;
 
+   nft_counter_fetch(priv->counter, &total);
if (reset)
nft_counter_reset(priv->counter, &total);
-   else
-   nft_counter_fetch(priv->counter, &total);
 
if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(total.bytes),
 NFTA_COUNTER_PAD) ||
@@ -285,7 +269,10 @@ static struct nft_expr_type nft_counter_type __read_mostly = {
 
 static int __init nft_counter_module_init(void)
 {
-   int err;
+   int err, cpu;
+
+   for_each_possible_cpu(cpu)
+   seqcount_init(per_cpu_ptr(&nft_counter_seq, cpu));
 
err = nft_register_obj(&nft_counter_obj);
if (err < 0)
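The seqcount read/retry loop and the new fetch-then-subtract reset can be modeled in userspace C11. This is a minimal sketch, and all names in it are hypothetical. It assumes:

* A fixed `NCPU` array stands in for per-CPU data.
* Only the owning CPU ever writes its slot; the patch gets this from `local_bh_disable()`.
* Boehm-style fences provide the seqlock ordering.

One deliberate difference from the posted patch: the sketch routes the reset's subtraction through the same seqcount-protected writer, whereas the patch subtracts directly under `local_bh_disable()`.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NCPU 4 /* hypothetical fixed CPU count standing in for per-CPU data */

struct pcpu_counter {
	atomic_uint seq;          /* odd while an update is in progress */
	_Atomic uint64_t packets; /* atomics so racing reads are not data races */
	_Atomic uint64_t bytes;
};

struct totals { uint64_t packets, bytes; };

static struct pcpu_counter counters[NCPU];

/* Writer: only the owning "CPU" updates its slot. */
static void counter_add(int cpu, uint64_t packets, uint64_t bytes)
{
	struct pcpu_counter *c = &counters[cpu];
	unsigned int s = atomic_load_explicit(&c->seq, memory_order_relaxed);

	atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release); /* like write_seqcount_begin() */
	atomic_store_explicit(&c->packets,
		atomic_load_explicit(&c->packets, memory_order_relaxed) + packets,
		memory_order_relaxed);
	atomic_store_explicit(&c->bytes,
		atomic_load_explicit(&c->bytes, memory_order_relaxed) + bytes,
		memory_order_relaxed);
	atomic_store_explicit(&c->seq, s + 2, memory_order_release); /* like write_seqcount_end() */
}

/* Reader: retry each slot until a stable snapshot is seen, then sum. */
static struct totals counter_fetch(void)
{
	struct totals t = { 0, 0 };

	for (int cpu = 0; cpu < NCPU; cpu++) {
		struct pcpu_counter *c = &counters[cpu];
		unsigned int s1, s2;
		uint64_t p, b;

		do {
			s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
			p = atomic_load_explicit(&c->packets, memory_order_relaxed);
			b = atomic_load_explicit(&c->bytes, memory_order_relaxed);
			atomic_thread_fence(memory_order_acquire);
			s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
		} while ((s1 & 1) || s1 != s2);
		/* Only values from a successful pass are accumulated. */
		t.packets += p;
		t.bytes += b;
	}
	return t;
}

/* Reset as in the patch: subtract the fetched total from one CPU's slot.
 * The caller must be the owner of that slot. */
static void counter_reset(int cpu, struct totals fetched)
{
	counter_add(cpu, (uint64_t)0 - fetched.packets,
		    (uint64_t)0 - fetched.bytes);
}
```

Because the counters are unsigned, the local slot may wrap below zero after a reset while the sum across CPUs still comes out right. Anything added between the fetch and the subtraction is preserved rather than lost.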

Re: [PATCH net-next 0/2] Initial driver for Synopsys DWC XLGMAC

2016-12-09 Thread Carlos Palminha

Hi Jie,

I don't think we need to create the "dwc" subdirectory under "synopsys".
It's preferable to have them directly under drivers/net/ethernet/synopsys.

Regards,
C.Palminha

On 07-12-2016 03:57, Jie Deng wrote:
> This series provides the support for 25/40/50/100 GbE
> devices using Synopsys DWC Enterprise Ethernet (XLGMAC).
> 
> The first patch adds support for Synopsys XLGMII.
> The second patch provides the initial driver for Synopsys XLGMAC
> 
> The driver is split into three layers, refactored from AMD XGBE.
> 
> dwc-eth-xxx.x
>   The DWC ethernet core layer (DWC ECL). This layer contains code that
> can be shared by different DWC series ethernet cores.
> 
> dwc-xxx.x (e.g. dwc-xlgmac.c)
>   The DWC MAC HW adapter layer (DWC MHAL). This layer contains
> special support for a specific MAC. e.g. currently, XLGMAC.
> 
> xxx-xxx-pci.c xxx-xxx-plat.c (e.g. dwc-xlgmac-pci.c)
>   The glue adapter layer (GAL). Vendors who adopt Synopsys Ethernet
> cores can develop a glue driver for their platform.
> 
> Jie Deng (2):
>   net: phy: add extension of phy-mode for XLGMII
>   net: ethernet: Initial driver for Synopsys DWC XLGMAC
> 
>  Documentation/devicetree/bindings/net/ethernet.txt |1 +
>  MAINTAINERS|6 +
>  drivers/net/ethernet/synopsys/Kconfig  |2 +
>  drivers/net/ethernet/synopsys/Makefile |1 +
>  drivers/net/ethernet/synopsys/dwc/Kconfig  |   37 +
>  drivers/net/ethernet/synopsys/dwc/Makefile |9 +
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-dcb.c|  228 ++
>  .../net/ethernet/synopsys/dwc/dwc-eth-debugfs.c|  328 +++
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-desc.c   |  715 +
>  .../net/ethernet/synopsys/dwc/dwc-eth-ethtool.c|  567 
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c | 3098
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-mdio.c   |  252 ++
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-net.c| 2319 +++
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-ptp.c|  216 ++
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-regacc.h | 1115 +++
>  drivers/net/ethernet/synopsys/dwc/dwc-eth.h|  738 +
>  drivers/net/ethernet/synopsys/dwc/dwc-xlgmac-pci.c |  538 
>  drivers/net/ethernet/synopsys/dwc/dwc-xlgmac.c |  135 +
>  drivers/net/ethernet/synopsys/dwc/dwc-xlgmac.h |   85 +
>  include/linux/phy.h|3 +
>  20 files changed, 10393 insertions(+)
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/Kconfig
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/Makefile
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-dcb.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-debugfs.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-desc.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-ethtool.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-mdio.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-net.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-ptp.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-regacc.h
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth.h
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-xlgmac-pci.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-xlgmac.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-xlgmac.h
>
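The cover letter describes three layers: the core (ECL), the MAC adapter (MHAL) and the glue (GAL). That kind of split is commonly expressed in C with an ops table. The sketch below uses entirely hypothetical names and is not taken from the posted driver. The MAC layer fills in `struct mac_hw_ops`, the core only calls through it, and a glue probe binds the two.

```c
#include <stddef.h>

/* MAC hardware adapter layer (MHAL): MAC-specific hooks. */
struct mac_hw_ops {
	const char *name;
	int (*init)(void *regs);
	unsigned int (*max_speed_mbps)(void);
};

/* Core layer (ECL): shared code that only talks to a MAC through ops. */
struct eth_core {
	const struct mac_hw_ops *ops;
	void *regs;
};

static int eth_core_start(struct eth_core *core)
{
	if (!core->ops || !core->ops->init)
		return -1;
	return core->ops->init(core->regs);
}

/* One concrete MAC: stand-in for an XLGMAC adapter. */
static int xlgmac_init(void *regs)
{
	(void)regs; /* real code would program MAC registers here */
	return 0;
}

static unsigned int xlgmac_max_speed(void)
{
	return 100000; /* 100 GbE, the top speed listed in the cover letter */
}

static const struct mac_hw_ops xlgmac_ops = {
	.name = "xlgmac",
	.init = xlgmac_init,
	.max_speed_mbps = xlgmac_max_speed,
};

/* Glue adapter layer (GAL): a PCI or platform probe would map registers
 * and bind the right ops to a core instance. */
static int glue_probe(struct eth_core *core, void *mapped_regs)
{
	core->ops = &xlgmac_ops;
	core->regs = mapped_regs;
	return eth_core_start(core);
}
```

With this shape, adding another DWC MAC means writing a new ops table, while a new SoC integration only needs a new glue probe.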

Re: [PATCHv3 perf/core 0/7] Reuse libbpf from samples/bpf

2016-12-09 Thread Arnaldo Carvalho de Melo

On Thu, Dec 08, 2016 at 06:46:13PM -0800, Joe Stringer wrote:
> (Was "libbpf: Synchronize implementations")
> 
> Update tools/lib/bpf to provide the remaining bpf wrapper pieces needed by the
> samples/bpf/ code, then get rid of all of the duplicate BPF libraries in
> samples/bpf/libbpf.[ch].
> 
> ---
> v3: Add ack for first patch.
> Split out second patch from v2 into separate changes for remaining diff.
> Add patches to switch samples/bpf over to using tools/lib/.
> v2: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
> Don't shift non-bpf code into libbpf.
> Drop the patch to synchronize ELF definitions with tc.
> v1: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
> First post.

Thanks, applied after addressing the -I$(objtree) issue raised by Wang,

- Arnaldo

Re: [PATCHv3 perf/core 6/7] samples/bpf: Remove perf_event_open() declaration

2016-12-09 Thread Arnaldo Carvalho de Melo

Em Thu, Dec 08, 2016 at 06:46:19PM -0800, Joe Stringer escreveu:
> This declaration was made in samples/bpf/libbpf.c for convenience, but
> there's already one in tools/perf/perf-sys.h. Reuse that one.
> 
> Signed-off-by: Joe Stringer 
> ---
> v3: First post.
> ---
>  samples/bpf/Makefile| 3 ++-
>  samples/bpf/bpf_load.c  | 3 ++-
>  samples/bpf/libbpf.c| 7 ---
>  samples/bpf/libbpf.h| 3 ---
>  samples/bpf/sampleip_user.c | 3 ++-
>  samples/bpf/trace_event_user.c  | 9 +
>  samples/bpf/trace_output_user.c | 3 ++-
>  samples/bpf/tracex6_user.c  | 3 ++-
>  8 files changed, 15 insertions(+), 19 deletions(-)
> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index c8f7ed37b2de..0adc47e67e65 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -92,7 +92,8 @@ always += test_current_task_under_cgroup_kern.o
>  always += trace_event_kern.o
>  always += sampleip_kern.o
>  
> -HOSTCFLAGS += -I$(objtree)/usr/include -I$(objtree)/tools/lib/
> +HOSTCFLAGS += -I$(objtree)/usr/include -I$(objtree)/tools/lib/ \
> +   -I$(objtree)/tools/include -I$(objtree)/tools/perf

Switching these to $(srctree) as well, to support building it like:

  make -j4 O=../build/v4.9.0-rc8+ samples/bpf/

>  
>  HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
>  HOSTLOADLIBES_fds_example += -lelf
> diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> index f8e3c58a0897..d683bd278171 100644
> --- a/samples/bpf/bpf_load.c
> +++ b/samples/bpf/bpf_load.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include "libbpf.h"
>  #include "bpf_load.h"
> +#include "perf-sys.h"
>  
>  #define DEBUGFS "/sys/kernel/debug/tracing/"
>  
> @@ -168,7 +169,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
>   id = atoi(buf);
>   attr.config = id;
>  
> - efd = perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
> + efd = sys_perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
>   if (efd < 0) {
>   printf("event %d fd %d err %s\n", id, efd, strerror(errno));
>   return -1;
> diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
> index d9af876b4a2c..bee473a494f1 100644
> --- a/samples/bpf/libbpf.c
> +++ b/samples/bpf/libbpf.c
> @@ -34,10 +34,3 @@ int open_raw_sock(const char *name)
>  
>   return sock;
>  }
> -
> -int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
> - int group_fd, unsigned long flags)
> -{
> - return syscall(__NR_perf_event_open, attr, pid, cpu,
> -group_fd, flags);
> -}
> diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
> index cc815624aacf..09aedc320009 100644
> --- a/samples/bpf/libbpf.h
> +++ b/samples/bpf/libbpf.h
> @@ -188,7 +188,4 @@ struct bpf_insn;
>  /* create RAW socket and bind to interface 'name' */
>  int open_raw_sock(const char *name);
>  
> -struct perf_event_attr;
> -int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
> - int group_fd, unsigned long flags);
>  #endif
> diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
> index 09ab620b324c..476a11947180 100644
> --- a/samples/bpf/sampleip_user.c
> +++ b/samples/bpf/sampleip_user.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include "libbpf.h"
>  #include "bpf_load.h"
> +#include "perf-sys.h"
>  
>  #define DEFAULT_FREQ 99
>  #define DEFAULT_SECS 5
> @@ -50,7 +51,7 @@ static int sampling_start(int *pmu_fd, int freq)
>   };
>  
>   for (i = 0; i < nr_cpus; i++) {
> - pmu_fd[i] = perf_event_open(_sample_attr, -1 /* pid */, i,
> + pmu_fd[i] = sys_perf_event_open(_sample_attr, -1 /* pid */, i,
>   -1 /* group_fd */, 0 /* flags */);
>   if (pmu_fd[i] < 0) {
>   fprintf(stderr, "ERROR: Initializing perf sampling\n");
> diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
> index de8fd0266d78..ccb0cba8324a 100644
> --- a/samples/bpf/trace_event_user.c
> +++ b/samples/bpf/trace_event_user.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include "libbpf.h"
>  #include "bpf_load.h"
> +#include "perf-sys.h"
>  
>  #define SAMPLE_FREQ 50
>  
> @@ -126,9 +127,9 @@ static void test_perf_event_all_cpu(struct perf_event_attr *attr)
>  
>   /* open perf_event on all cpus */
>   for (i = 0; i < nr_cpus; i++) {
> - pmu_fd[i] = perf_event_open(attr, -1, i, -1, 0);
> + pmu_fd[i] = sys_perf_event_open(attr, -1, i, -1, 0);
>   if (pmu_fd[i] < 0) {
> - printf("perf_event_open failed\n");
> + printf("sys_perf_event_open failed\n");
>   goto all_cpu_err;
>   }
>   assert(ioctl(pmu_fd[i], PERF_EVENT_IOC_SET_BPF, prog_fd[0]) == 0);
> @@ -147,9 +148,9 @@ static void test_perf_event_task(struct perf_event_attr 
>
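
For readers unfamiliar with the helper this series switches to: glibc has no perf_event_open() wrapper, so both the removed samples/bpf function and sys_perf_event_open() in tools/perf/perf-sys.h boil down to a raw syscall(). The sketch below is a standalone approximation of that pattern, not the perf-sys.h code itself; the open_cpu_clock_counter() helper and the choice of a software cpu-clock event are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper around the raw syscall; glibc provides no perf_event_open(). */
static int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
			       int cpu, int group_fd, unsigned long flags)
{
	return (int)syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: open a disabled software cpu-clock counter for the
 * calling thread on any CPU. Returns an fd, or -1 with errno set. */
static int open_cpu_clock_counter(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_CPU_CLOCK;
	attr.disabled = 1;
	attr.exclude_kernel = 1; /* user-only counting is allowed more widely */

	return sys_perf_event_open(&attr, 0 /* this thread */, -1 /* any cpu */,
				   -1 /* no group */, 0 /* flags */);
}
```

Opening may legitimately fail (for example with EACCES or EPERM) depending on /proc/sys/kernel/perf_event_paranoid or sandboxing, which is why the sample programs check for a negative descriptor.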

Re: 4.9.0-rc8: tg3 dead after resume

2016-12-09 Thread Billy Shuman

On Thu, Dec 8, 2016 at 4:03 AM, Siva Reddy Kallam wrote:
> On Thu, Dec 8, 2016 at 12:14 AM, Billy Shuman  wrote:
>> On Wed, Dec 7, 2016 at 12:37 PM, Michael Chan wrote:
>>> On Wed, Dec 7, 2016 at 7:20 AM, Billy Shuman  wrote:
 After resume on 4.9.0-rc8 tg3 is dead.

 In logs I see:
 kernel: tg3 :44:00.0: phy probe failed, err -19
 kernel: tg3 :44:00.0: Problem fetching invariants of chip, aborting
>>>
>>> -19 is -ENODEV which means tg3 cannot read the PHY ID.
>>>
>>> If it's a true suspend/resume operation, the driver does not have to
>>> go through probe during resume.  Please explain how you do
>>> suspend/resume.
>>>
>>
>> Sorry, my previous message was accidentally sent too early.
>>
>> I used systemd (systemctl suspend) to suspend.
>>
> We need more information to proceed further.
> Without suspend, are you able to use the tg3 port?

Yes the port works fine without suspend.

> Which Broadcom card do you have in your laptop?

The nic is a NetXtreme BCM57762 Gigabit Ethernet PCIe in a thunderbolt3 dock.

> Please provide complete tg3 specific logs in dmesg.
>

[   32.084010] tg3.c:v3.137 (May 11, 2014)
[   32.124695] tg3 :44:00.0 eth0: Tigon3 [partno(BCM957762) rev 57766001] (PCI Express) MAC address 98:e7:f4:8b:13:19
[   32.124698] tg3 :44:00.0 eth0: attached PHY is 57765 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[   32.124699] tg3 :44:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[   32.124700] tg3 :44:00.0 eth0: dma_rwctrl[0001] dma_mask[64-bit]
[   32.219764] tg3 :44:00.0 enp68s0: renamed from eth0
[   36.219245] tg3 :44:00.0 enp68s0: Link is up at 1000 Mbps, full duplex
[   36.219250] tg3 :44:00.0 enp68s0: Flow control is on for TX and on for RX
[   36.219251] tg3 :44:00.0 enp68s0: EEE is disabled

after resume
[   92.292838] tg3 :44:00.0 enp68s0: No firmware running
[   93.521744] tg3 :44:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=
[  106.704655] tg3 :44:00.0 enp68s0: Link is down
[  108.370356] tg3 :44:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=

after rmmod, modprobe
[  570.933636] tg3 :44:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=
[  604.847215] tg3.c:v3.137 (May 11, 2014)
[  605.010075] tg3 :44:00.0: phy probe failed, err -19
[  605.010077] tg3 :44:00.0: Problem fetching invariants of chip, aborting




>>> Did this work before?  There have been very few changes to tg3 recently.
>>>
>>
>> This is a new laptop for me, but the same behavior is seen on 4.4.36 and 4.8.12.
>>

 rmmod and modprobe do not fix the problem; only a reboot resolves the issue.

 Billy

Re: [PATCHv3 perf/core 3/7] tools lib bpf: Add flags to bpf_create_map()

2016-12-09 Thread Arnaldo Carvalho de Melo

Em Fri, Dec 09, 2016 at 11:36:18AM +0800, Wangnan (F) escreveu:
> 
> 
> On 2016/12/9 10:46, Joe Stringer wrote:
> > The map_flags argument to bpf_create_map() was previously not exposed.
> > By exposing it, users can access flags such as whether or not to
> > preallocate the map.
> > 
> > Signed-off-by: Joe Stringer 
> 
> Please mention commit 6c90598174322b029e40dd84a4eb01f56afe in
> commit message:
> 
> Commit 6c905981743 ("bpf: pre-allocate hash map elements") introduces
> map_flags to bpf_attr for BPF_MAP_CREATE command. Expose this new
> parameter in libbpf.

will do it, thanks.

- Arnaldo
 
> Acked-by: Wang Nan 
> 
> > ---
> > v3: Split from "tools lib bpf: Sync with samples/bpf/libbpf".
> > ---
> >   tools/lib/bpf/bpf.c| 3 ++-
> >   tools/lib/bpf/bpf.h| 2 +-
> >   tools/lib/bpf/libbpf.c | 3 ++-
> >   3 files changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 89e8e8e5b60e..d0afb26c2e0f 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -54,7 +54,7 @@ static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
> >   }
> >   int bpf_create_map(enum bpf_map_type map_type, int key_size,
> > -  int value_size, int max_entries)
> > +  int value_size, int max_entries, __u32 map_flags)
> >   {
> > union bpf_attr attr;
> > @@ -64,6 +64,7 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size,
> > attr.key_size = key_size;
> > attr.value_size = value_size;
> > attr.max_entries = max_entries;
> > +   attr.map_flags = map_flags;
> > return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
> >   }
> > diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> > index 61130170a6ad..7fcdce16fd62 100644
> > --- a/tools/lib/bpf/bpf.h
> > +++ b/tools/lib/bpf/bpf.h
> > @@ -24,7 +24,7 @@
> >   #include 
> >   int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
> > -  int max_entries);
> > +  int max_entries, __u32 map_flags);
> >   /* Recommend log buffer size */
> >   #define BPF_LOG_BUF_SIZE 65536
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 2e974593f3e8..84e6b35da4bd 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -854,7 +854,8 @@ bpf_object__create_maps(struct bpf_object *obj)
> > *pfd = bpf_create_map(def->type,
> >   def->key_size,
> >   def->value_size,
> > - def->max_entries);
> > + def->max_entries,
> > + 0);
> > if (*pfd < 0) {
> > size_t j;
> > int err = *pfd;
>
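
To show what the new map_flags argument ends up controlling, here is a minimal standalone sketch that fills union bpf_attr for BPF_MAP_CREATE by hand. It is not the libbpf code: fill_map_create_attr() and create_map_with_flags() are hypothetical names, and BPF_F_NO_PREALLOC from <linux/bpf.h> is used as the example flag because, as far as I know, it is the preallocation switch that commit 6c905981743 introduced.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill the BPF_MAP_CREATE part of union bpf_attr, including map_flags. */
static void fill_map_create_attr(union bpf_attr *attr, enum bpf_map_type type,
				 int key_size, int value_size, int max_entries,
				 __u32 map_flags)
{
	memset(attr, 0, sizeof(*attr)); /* the kernel rejects non-zero unused fields */
	attr->map_type = type;
	attr->key_size = key_size;
	attr->value_size = value_size;
	attr->max_entries = max_entries;
	attr->map_flags = map_flags;
}

/* Hypothetical helper with the same shape as the patched bpf_create_map(). */
static int create_map_with_flags(enum bpf_map_type type, int key_size,
				 int value_size, int max_entries, __u32 map_flags)
{
	union bpf_attr attr;

	fill_map_create_attr(&attr, type, key_size, value_size, max_entries,
			     map_flags);
	return (int)syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}
```

The syscall itself may fail (for example with EPERM) depending on privileges and memlock limits; the attribute layout is the part this patch changes.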

Re: [PATCH 37/50] netfilter: nf_tables: atomic dump and reset for stateful objects

2016-12-09 Thread Eric Dumazet

On Fri, 2016-12-09 at 11:24 +0100, Pablo Neira Ayuso wrote:
> Hi Paul,

Hi Pablo

Given that the bytes/packets counters are modified without cmpxchg64():

static inline void nft_counter_do_eval(struct nft_counter_percpu_priv *priv,
				       struct nft_regs *regs,
				       const struct nft_pktinfo *pkt)
{
	struct nft_counter_percpu *this_cpu;

	local_bh_disable();
	this_cpu = this_cpu_ptr(priv->counter);
	u64_stats_update_begin(&this_cpu->syncp);
	this_cpu->counter.bytes += pkt->skb->len;
	this_cpu->counter.packets++;
	u64_stats_update_end(&this_cpu->syncp);
	local_bh_enable();
}

It means that the cmpxchg64() used to clear the stats is not good enough.

It does not help to make sure stats are properly cleared.

On 64 bit, the ->syncp is not there, so the nft_counter_reset() might
not see that a bytes or packets counter was modified by another cpu.


CPU 1                              CPU 2

LOAD PTR->BYTES into REG_A         old = *counter;
REG_A += skb->len;
                                   cmpxchg64(counter, old, 0);
PTR->BYTES = REG_A

It looks like you want a seqcount, even on 64-bit arches,
so that CPU 2 can restart its loop; more importantly, you must
not accumulate the values you read, because they might be old/invalid.

Another way would be to not use cmpxchg64() at all.
It is way too expensive in the fast path!

The percpu value would never be modified by any CPU other than its owner.

You need a per-CPU seqcount; there is no need to add a syncp per nft percpu counter.


static DEFINE_PER_CPU(seqcount_t, nft_pcpu_seq);

static inline void nft_counter_do_eval(struct nft_counter_percpu_priv *priv,
				       struct nft_regs *regs,
				       const struct nft_pktinfo *pkt)
{
	struct nft_counter_percpu *this_cpu;
	seqcount_t *myseq;

	local_bh_disable();
	this_cpu = this_cpu_ptr(priv->counter);
	myseq = this_cpu_ptr(&nft_pcpu_seq);

	write_seqcount_begin(myseq);

	this_cpu->counter.bytes += pkt->skb->len;
	this_cpu->counter.packets++;

	write_seqcount_end(myseq);

	local_bh_enable();
}

Thanks !
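
Eric's proposal relies on the seqcount protocol: the single writer makes the sequence odd while it updates, and a reader retries until it sees the same even value before and after copying. The sketch below is a userspace analogue using C11 atomics in place of the kernel's seqcount_t, write_seqcount_begin() and read_seqcount_retry(); struct pcpu_counter and its helpers are hypothetical, and the memory ordering follows the commonly cited C11 seqlock recipe rather than kernel code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One "per-CPU" counter with its own sequence; only the owner writes. */
struct pcpu_counter {
	atomic_uint seq;            /* odd while an update is in progress */
	_Atomic uint64_t bytes;
	_Atomic uint64_t packets;
};

static void counter_init(struct pcpu_counter *c)
{
	atomic_init(&c->seq, 0);
	atomic_init(&c->bytes, 0);
	atomic_init(&c->packets, 0);
}

/* Writer: the equivalent of the write_seqcount_begin()/end() bracket. */
static void counter_add(struct pcpu_counter *c, uint64_t len)
{
	unsigned int s = atomic_load_explicit(&c->seq, memory_order_relaxed);
	uint64_t b = atomic_load_explicit(&c->bytes, memory_order_relaxed);
	uint64_t p = atomic_load_explicit(&c->packets, memory_order_relaxed);

	atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);   /* seq odd before data */
	atomic_store_explicit(&c->bytes, b + len, memory_order_relaxed);
	atomic_store_explicit(&c->packets, p + 1, memory_order_relaxed);
	atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader: retry until the snapshot was not overlapped by an update. */
static void counter_read(struct pcpu_counter *c, uint64_t *bytes,
			 uint64_t *packets)
{
	unsigned int s1, s2;

	do {
		s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
		*bytes = atomic_load_explicit(&c->bytes, memory_order_relaxed);
		*packets = atomic_load_explicit(&c->packets, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire); /* data before re-check */
		s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```

The reader only trusts a snapshot once the retry loop exits; values copied during a failed iteration are discarded, in line with Eric's point about not accumulating possibly stale reads.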

Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

2016-12-09 Thread Tejun Heo

Hello, John.

On Thu, Dec 08, 2016 at 09:39:38PM -0800, John Stultz wrote:
> So just to clarify the discussion for my purposes and make sure I
> understood, per-cgroup CAP rules was not desired, and instead we
> should either utilize an existing cap (are there still objections to
> CAP_SYS_RESOURCE? - this isn't clear to me) or create a new one (ie,
> bring back the older CAP_CGROUP_MIGRATE patch).

Let's create a new one.  It looks to be a bit too different to share
with an existing one.

> Tejun: Do you have a more finished version of your patch that I should
> add my changes on top of?

Oh, just submit the patch on top of the current for-next.  I can queue
mine on top of yours.  They are mostly orthogonal.

Thanks.

-- 
tejun

[PATCH] net: smsc911x: back out silently on probe deferrals

2016-12-09 Thread Linus Walleij

When trying to get a regulator, the probe may be deferred, and we see
this noise:

smsc911x 1b80.ethernet-ebi2 (unnamed net_device) (uninitialized):
   couldn't get regulators -517

Then the driver continues anyway, which means that the regulator
may not be properly retrieved and reference counted, and may be
switched off if no one else is using it.

Fix this by returning silently on a deferred probe and letting the
system work it out.

Cc: Jeremy Linton 
Signed-off-by: Linus Walleij 
---
 drivers/net/ethernet/smsc/smsc911x.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 86b7c04e3738..c492e4ffd9e7 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -442,9 +442,16 @@ static int smsc911x_request_resources(struct platform_device *pdev)
ret = regulator_bulk_get(&pdev->dev,
ARRAY_SIZE(pdata->supplies),
pdata->supplies);
-   if (ret)
+   if (ret) {
+   /*
+* Retry on deferrals, else just report the error
+* and try to continue.
+*/
+   if (ret == -EPROBE_DEFER)
+   return ret;
netdev_err(ndev, "couldn't get regulators %d\n",
ret);
+   }
 
/* Request optional RESET GPIO */
pdata->reset_gpiod = devm_gpiod_get_optional(&pdev->dev,
-- 
2.7.4
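
The control flow the patch introduces (propagate -EPROBE_DEFER silently, log and continue on any other error) can be modeled outside the kernel as below. This is only a sketch: request_resources() and the supplies_* stand-ins are hypothetical, and EPROBE_DEFER is defined locally because it is a kernel-internal errno (517, matching the -517 in the log above) that userspace headers do not provide.

```c
#include <errno.h>
#include <stdio.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517 /* kernel-internal value, not in userspace errno.h */
#endif

/* Hypothetical stand-in for regulator_bulk_get(): returns 0 or -errno. */
typedef int (*get_supplies_fn)(void);

static int supplies_ok(void) { return 0; }
static int supplies_deferred(void) { return -EPROBE_DEFER; }
static int supplies_missing(void) { return -ENODEV; }

/* Mirrors the error handling of the patched smsc911x_request_resources(). */
static int request_resources(get_supplies_fn get_supplies)
{
	int ret = get_supplies();

	if (ret) {
		/* Deferred: bail out quietly so the probe can be retried later. */
		if (ret == -EPROBE_DEFER)
			return ret;
		/* Anything else: report it and try to continue, as before. */
		fprintf(stderr, "couldn't get regulators %d\n", ret);
	}

	/* ... the optional RESET GPIO and other resources would follow ... */
	return 0;
}
```

Returning -EPROBE_DEFER from probe puts the device on the driver core's deferred-probe list, so the probe is retried later, for example once the regulator driver has bound.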

Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-12-09 Thread Mark Lord

On 16-12-08 10:23 PM, Hayes Wang wrote:
> Mark Lord 
> 
> I found an issue with autosuspend, and it may cause the same
> problem you are seeing. I'm not sure if this is helpful to you, because
> it only occurs when autosuspend is enabled.

Thanks.  I am using ASIX adapters now.

I did try the latest 4.9-rc8 and 4.8.12 kernels with the r8152 dongle yesterday,
in the hope that the many EHCI fixes in those kernels might help out.

The dongle was unusable with those newer kernels.
Most of the time it failed with "Get ether addr fail\n" at startup.

On the occasions where it got past that point, it often failed
the DHCP negotiation, but this looks more like a bug elsewhere in
the kernel, possibly racing against initialization of the random
number generators.  Adding a 2-second sleep to the r8152 probe
function made this error mostly go away.

Cheers
-- 
Mark Lord

[PATCH] net:ethernet:samsung:initialize cur_rx_qnum

2016-12-09 Thread Rayagond Kokatanur

This patch initializes cur_rx_qnum when an RX interrupt occurs;
without this initialization, the driver will not work with multiple
RX queue configurations.

NOTE: This patch has not been tested on actual hardware.
---
 drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
index ea44a24..580a1a4 100644
--- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
+++ b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
@@ -1681,6 +1681,7 @@ static irqreturn_t sxgbe_rx_interrupt(int irq, void *dev_id)
struct sxgbe_rx_queue *rxq = (struct sxgbe_rx_queue *)dev_id;
struct sxgbe_priv_data *priv = rxq->priv_ptr;
 
+   priv->cur_rx_qnum = rxq->queue_no;
/* get the channel status */
status = priv->hw->dma->rx_dma_int_status(priv->ioaddr, rxq->queue_no,
  &priv->xstats);
-- 
1.9.1

Re: netlink: GPF in sock_sndtimeo

2016-12-09 Thread Richard Guy Briggs

On 2016-12-09 12:53, Dmitry Vyukov wrote:
> On Fri, Dec 9, 2016 at 12:48 PM, Richard Guy Briggs  wrote:
> > On 2016-12-09 11:49, Dmitry Vyukov wrote:
> >> On Fri, Dec 9, 2016 at 7:02 AM, Richard Guy Briggs  wrote:
> >> > On 2016-11-29 23:52, Richard Guy Briggs wrote:
> >> > I tried a quick compile attempt on the test case (I assume it is a
> >> > socket fuzzer) and get the following compile error:
> >> > cc -g -O0 -Wall -D_GNU_SOURCE -o socket_fuzz socket_fuzz.c
> >> > socket_fuzz.c:16:1: warning: "_GNU_SOURCE" redefined
> >> > : warning: this is the location of the previous definition
> >> > socket_fuzz.c: In function ‘segv_handler’:
> >> > socket_fuzz.c:89: warning: implicit declaration of function ‘__atomic_load_n’
> >> > socket_fuzz.c:89: error: ‘__ATOMIC_RELAXED’ undeclared (first use in this function)
> >> > socket_fuzz.c:89: error: (Each undeclared identifier is reported only once
> >> > socket_fuzz.c:89: error: for each function it appears in.)
> >> > socket_fuzz.c: In function ‘loop’:
> >> > socket_fuzz.c:280: warning: unused variable ‘errno0’
> >> > socket_fuzz.c: In function ‘test’:
> >> > socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_add’
> >> > socket_fuzz.c:303: error: ‘__ATOMIC_SEQ_CST’ undeclared (first use in this function)
> >> > socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_sub’
> >>
> >> -std=gnu99 should help
> >> ignore warnings
> >
> > I got a little further, left with "__ATOMIC_RELAXED undeclared", "__ATOMIC_SEQ_CST undeclared" under gcc 4.4.7-16.
> >
> > gcc 4.8.2-15 leaves me with "undefined reference to `clock_gettime'"
> 
> add -lrt

Ok, that helped.  Thanks!

> > What compiler version do you recommend?
> 
> 6.x sounds reasonable
> 4.4 branch is 7.5 years old, surprised that it does not disintegrate
> into dust yet :)

These are under RHEL6... so there are updates to them, but yeah, they are old.

> >> >> - RGB
> >> >
> >> > - RGB
> >
> > - RGB
> >
> > --
> > Richard Guy Briggs 
> > Kernel Security Engineering, Base Operating Systems, Red Hat
> > Remote, Ottawa, Canada
> > Voice: +1.647.777.2635, Internal: (81) 32635

- RGB

--
Richard Guy Briggs 
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635
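
The errors in this thread come from two toolchain limits: the __atomic_* builtins and __ATOMIC_* constants first appeared in GCC 4.7, so gcc 4.4 cannot build the reproducer at all, and glibc releases before 2.17 (RHEL6 ships 2.12) keep clock_gettime() in librt, hence the need for -lrt. The small probe below is a hypothetical helper, not taken from the reproducer; it exercises both features, so building it with cc -std=gnu99 probe.c -lrt is a quick way to check a toolchain before trying the full test case.

```c
#define _GNU_SOURCE
#include <time.h>

static int probe_counter;

/* Returns the counter value after one increment, or -1 if clock_gettime()
 * fails. Does not compile on GCC < 4.7 (no __atomic builtins) and does not
 * link on glibc < 2.17 unless -lrt is given. */
static int toolchain_probe(void)
{
	struct timespec ts;

	__atomic_fetch_add(&probe_counter, 1, __ATOMIC_SEQ_CST);
	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return -1;
	return __atomic_load_n(&probe_counter, __ATOMIC_RELAXED);
}
```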

Re: [PATCH V2 21/22] bnxt_re: Add QP event handling

2016-12-09 Thread Sergei Shtylyov


Hello!

On 12/9/2016 9:48 AM, Selvin Xavier wrote:


Implements callback handler for processing affiliated Async events of a QP.
This patch also implements the control path command completion handling.

Signed-off-by: Eddie Wai 
Signed-off-by: Devesh Sharma 
Signed-off-by: Somnath Kotur 
Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c | 49 ++
 1 file changed, 49 insertions(+)

diff --git a/drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c b/drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c
index 5b71acd..3e6bb3f 100644
--- a/drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c
@@ -246,6 +246,46 @@ static int bnxt_qplib_process_func_event(struct bnxt_qplib_rcfw *rcfw,
return 0;
 }

+static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
+  struct creq_qp_event *qp_event)
+{
+   struct bnxt_qplib_crsq *crsq = &rcfw->crsq;
+   struct bnxt_qplib_hwq *cmdq = &rcfw->cmdq;
+   struct bnxt_qplib_crsqe *crsqe;
+   u16 cbit, cookie, blocked = 0;
+   unsigned long flags;
+   u32 sw_cons;
+
+   switch (qp_event->event) {
+   case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION:
+   break;
+   default:
+   {
+   /* Command Response */
+   spin_lock_irqsave(>lock, flags);
+   sw_cons = HWQ_CMP(crsq->cons, crsq);
+   crsqe = &crsq->crsq[sw_cons];
+   crsq->cons++;
+   memcpy(&crsqe->qp_event, qp_event, sizeof(crsqe->qp_event));
+
+   cookie = le16_to_cpu(crsqe->qp_event.cookie);
+   blocked = cookie & RCFW_CMD_IS_BLOCKING;
+   cookie &= RCFW_MAX_COOKIE_VALUE;
+   cbit = cookie % RCFW_MAX_OUTSTANDING_CMD;
+   if (!test_and_clear_bit(cbit, rcfw->cmdq_bitmap))
+   dev_warn(&rcfw->pdev->dev,
+"QPLIB: CMD bit %d was not requested", cbit);
+
+   cmdq->cons += crsqe->req_size;
+   spin_unlock_irqrestore(>lock, flags);
+   if (!blocked)
+   wake_up(&rcfw->waitq);
+   break;
+   }
+   }


   Hum, strange indentation... Not seeing why you need {} in the *default* at all...



+   return 0;
+}
+
 /* SP - CREQ Completion handlers */
 static void bnxt_qplib_service_creq(unsigned long data)
 {
@@ -269,6 +309,15 @@ static void bnxt_qplib_service_creq(unsigned long data)
type = creqe->type & CREQ_BASE_TYPE_MASK;
switch (type) {
case CREQ_BASE_TYPE_QP_EVENT:
+   if (!bnxt_qplib_process_qp_event
+   (rcfw, (struct creq_qp_event *)creqe))
+   rcfw->creq_qp_event_processed++;
+   else {


   CodingStyle: there should be {} used in all branches if it's used in at least one branch of *if*.



+   dev_warn(&rcfw->pdev->dev, "QPLIB: crsqe with");
+   dev_warn(&rcfw->pdev->dev,
+"QPLIB: type = 0x%x not handled",
+type);
+   }
break;
case CREQ_BASE_TYPE_FUNC_EVENT:
if (!bnxt_qplib_process_func_event


MBR, Sergei
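
The completion path in the quoted handler decodes a 16-bit cookie: one bit says whether the command was submitted as blocking, the rest identifies the command, and cookie % RCFW_MAX_OUTSTANDING_CMD selects the bit to clear in cmdq_bitmap. The userspace model below restates that logic for review purposes only; the three constant values are assumptions (the real definitions are not quoted in this thread), and the non-atomic test_and_clear() stands in for the kernel's test_and_clear_bit().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed values, for illustration only. */
#define RCFW_CMD_IS_BLOCKING     0x8000u
#define RCFW_MAX_COOKIE_VALUE    0x7FFFu
#define RCFW_MAX_OUTSTANDING_CMD 256u

/* One bit per outstanding command slot. */
static uint64_t cmdq_bitmap[RCFW_MAX_OUTSTANDING_CMD / 64];

/* Non-atomic stand-in for test_and_clear_bit(). */
static bool test_and_clear(unsigned int bit)
{
	uint64_t mask = UINT64_C(1) << (bit % 64);
	bool was_set = (cmdq_bitmap[bit / 64] & mask) != 0;

	cmdq_bitmap[bit / 64] &= ~mask;
	return was_set;
}

/* Decode one completion cookie. Returns true when the quoted code would
 * call wake_up(), i.e. when the blocking bit is clear. */
static bool complete_cookie(uint16_t raw, unsigned int *slot)
{
	bool blocked = (raw & RCFW_CMD_IS_BLOCKING) != 0;
	unsigned int cookie = raw & RCFW_MAX_COOKIE_VALUE;

	*slot = cookie % RCFW_MAX_OUTSTANDING_CMD;
	if (!test_and_clear(*slot))
		fprintf(stderr, "CMD bit %u was not requested\n", *slot);

	return !blocked;
}
```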

Re: [PATCH] ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output

2016-12-09 Thread Sergei Shtylyov


Hello!

On 12/9/2016 6:08 AM, Zheng Li wrote:


From: zheng li 

There is an inconsitent conditional judgement in __ip_append_data and


   Inconsistent.


ip_finish_output functions, the variable length in __ip_append_data just
include the length of applicatoin's payload and udp header, don't include


   Application.


the length of ip header, but in ip_finish_output use
(skb->len > ip_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ip header.

That cuase some particular applicatoin's udp payload whose length is


   Causes, application.


between (MTU - IP Header) and MTU were framented by ip_fragment even


   Fragmented.


though the rst->dev support UFO feature.

Add the length of ip header to length in __ip_append_data to keep
consistent conditional judgement as ip_finish_output for ip fragment.

Signed-off-by: Zheng Li 

[...]

MBR, Sergei
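
The mismatch described in the commit message can be reproduced with a few lines of arithmetic. The model below is deliberately simplified and is not the kernel code: it assumes a 20-byte IPv4 header without options and an 8-byte UDP header, ignores GSO and the other UFO preconditions, and treats __ip_append_data() as comparing only UDP header plus payload against the MTU, as the patch description says.

```c
#include <stdbool.h>

#define IP_HDR_LEN  20u /* assumed: IPv4 header without options */
#define UDP_HDR_LEN 8u

/* __ip_append_data() as described: length is UDP header + payload. */
static bool append_takes_ufo_path(unsigned int payload, unsigned int mtu,
				  bool count_ip_header)
{
	unsigned int length = UDP_HDR_LEN + payload;

	if (count_ip_header) /* behaviour proposed by the patch */
		length += IP_HDR_LEN;
	return length > mtu;
}

/* ip_finish_output(): skb->len includes the IP header. */
static bool output_fragments(unsigned int payload, unsigned int mtu)
{
	return IP_HDR_LEN + UDP_HDR_LEN + payload > mtu;
}

/* Payload sizes that skip the UFO path yet are still fragmented later. */
static unsigned int mismatch_count(unsigned int mtu, bool count_ip_header)
{
	unsigned int n = 0;

	for (unsigned int p = 0; p <= mtu; p++)
		if (!append_takes_ufo_path(p, mtu, count_ip_header) &&
		    output_fragments(p, mtu))
			n++;
	return n;
}
```

With a 1500-byte MTU, the unpatched comparison leaves UDP payloads of 1473 to 1492 bytes (20 sizes) on the non-UFO path only to be fragmented by ip_fragment(); counting the IP header, as the patch proposes, closes that window in this model.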

pull-request: mac80211-next 2016-12-09

2016-12-09 Thread Johannes Berg

Hi Dave,

Closing net-next caught me by surprise, so I had to rebase a bit,
but these three patches really should go in soon. I'm not sending
them for 4.9 this late though.

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit f81a8a02bb3b3e882ba6aa580230c13b5be64849:

  Merge branch 'mV88e6xxx-interrupt-fixes' (2016-11-20 21:16:14 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git tags/mac80211-next-for-davem-2016-12-09

for you to fetch changes up to e6f462df9acd2a3295e5d34eb29e2823220cf129:

  cfg80211/mac80211: fix BSS leaks when abandoning assoc attempts (2016-12-09 12:57:49 +0100)


Three fixes:
 * fix a logic bug introduced by a previous cleanup
 * fix nl80211 attribute confusion (trying to use
   a single attribute for two purposes)
 * fix a long-standing BSS leak that happens when an
   association attempt is abandoned


Johannes Berg (2):
  nl80211: fix logic inversion in start_nan()
  cfg80211/mac80211: fix BSS leaks when abandoning assoc attempts

Vamsi Krishna (1):
  nl80211: Use different attrs for BSSID and random MAC addr in scan req

 include/net/cfg80211.h   | 11 +++
 include/uapi/linux/nl80211.h |  7 ++-
 net/mac80211/mlme.c  | 21 -
 net/wireless/core.h  |  1 +
 net/wireless/mlme.c  | 12 
 net/wireless/nl80211.c   | 18 --
 net/wireless/sme.c   | 14 ++
 7 files changed, 72 insertions(+), 12 deletions(-)

Re: netlink: GPF in sock_sndtimeo

2016-12-09 Thread Dmitry Vyukov

On Fri, Dec 9, 2016 at 12:48 PM, Richard Guy Briggs  wrote:
> On 2016-12-09 11:49, Dmitry Vyukov wrote:
>> On Fri, Dec 9, 2016 at 7:02 AM, Richard Guy Briggs  wrote:
>> > On 2016-11-29 23:52, Richard Guy Briggs wrote:
>> > I tried a quick compile attempt on the test case (I assume it is a
>> > socket fuzzer) and get the following compile error:
>> > cc -g -O0 -Wall -D_GNU_SOURCE -o socket_fuzz socket_fuzz.c
>> > socket_fuzz.c:16:1: warning: "_GNU_SOURCE" redefined
>> > : warning: this is the location of the previous definition
>> > socket_fuzz.c: In function ‘segv_handler’:
>> > socket_fuzz.c:89: warning: implicit declaration of function ‘__atomic_load_n’
>> > socket_fuzz.c:89: error: ‘__ATOMIC_RELAXED’ undeclared (first use in this function)
>> > socket_fuzz.c:89: error: (Each undeclared identifier is reported only once
>> > socket_fuzz.c:89: error: for each function it appears in.)
>> > socket_fuzz.c: In function ‘loop’:
>> > socket_fuzz.c:280: warning: unused variable ‘errno0’
>> > socket_fuzz.c: In function ‘test’:
>> > socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_add’
>> > socket_fuzz.c:303: error: ‘__ATOMIC_SEQ_CST’ undeclared (first use in this function)
>> > socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_sub’
>>
>> -std=gnu99 should help
>> ignore warnings
>
> I got a little further, left with "__ATOMIC_RELAXED undeclared", "__ATOMIC_SEQ_CST undeclared" under gcc 4.4.7-16.
>
> gcc 4.8.2-15 leaves me with "undefined reference to `clock_gettime'"

add -lrt


> What compiler version do you recommend?

6.x sounds reasonable
4.4 branch is 7.5 years old, surprised that it does not disintegrate
into dust yet :)


>> >> - RGB
>> >
>> > - RGB
>
> - RGB
>
> --
> Richard Guy Briggs 
> Kernel Security Engineering, Base Operating Systems, Red Hat
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635, Internal: (81) 32635

Re: netlink: GPF in sock_sndtimeo

2016-12-09 Thread Richard Guy Briggs

On 2016-12-09 11:49, Dmitry Vyukov wrote:
> On Fri, Dec 9, 2016 at 7:02 AM, Richard Guy Briggs  wrote:
> > On 2016-11-29 23:52, Richard Guy Briggs wrote:
> > I tried a quick compile attempt on the test case (I assume it is a
> > socket fuzzer) and get the following compile error:
> > cc -g -O0 -Wall -D_GNU_SOURCE -o socket_fuzz socket_fuzz.c
> > socket_fuzz.c:16:1: warning: "_GNU_SOURCE" redefined
> > : warning: this is the location of the previous definition
> > socket_fuzz.c: In function ‘segv_handler’:
> > socket_fuzz.c:89: warning: implicit declaration of function ‘__atomic_load_n’
> > socket_fuzz.c:89: error: ‘__ATOMIC_RELAXED’ undeclared (first use in this function)
> > socket_fuzz.c:89: error: (Each undeclared identifier is reported only once
> > socket_fuzz.c:89: error: for each function it appears in.)
> > socket_fuzz.c: In function ‘loop’:
> > socket_fuzz.c:280: warning: unused variable ‘errno0’
> > socket_fuzz.c: In function ‘test’:
> > socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_add’
> > socket_fuzz.c:303: error: ‘__ATOMIC_SEQ_CST’ undeclared (first use in this function)
> > socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_sub’
> 
> -std=gnu99 should help
> ignore warnings

I got a little further, left with "__ATOMIC_RELAXED undeclared", "__ATOMIC_SEQ_CST undeclared" under gcc 4.4.7-16.

gcc 4.8.2-15 leaves me with "undefined reference to `clock_gettime'"

What compiler version do you recommend?

> >> - RGB
> >
> > - RGB

- RGB

--
Richard Guy Briggs 
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

[PATCH] uio-hv-generic: store physical addresses instead of virtual

2016-12-09 Thread Arnd Bergmann

gcc warns about the newly added driver when phys_addr_t is wider than
a pointer:

drivers/uio/uio_hv_generic.c: In function 'hv_uio_mmap':
drivers/uio/uio_hv_generic.c:71:17: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
    virt_to_phys((void *)info->mem[mi].addr) >> PAGE_SHIFT,
drivers/uio/uio_hv_generic.c: In function 'hv_uio_probe':
drivers/uio/uio_hv_generic.c:140:5: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
   = (phys_addr_t)dev->channel->ringbuffer_pages;
drivers/uio/uio_hv_generic.c:147:3: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
   (phys_addr_t)vmbus_connection.int_page;
drivers/uio/uio_hv_generic.c:153:3: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
   (phys_addr_t)vmbus_connection.monitor_pages[1];

I can't see why we store a virtual address in a phys_addr_t here,
as the only user of that variable converts it into a physical
address anyway, so this moves the conversion to where it logically
fits according to the types.

Fixes: 95096f2fbd10 ("uio-hv-generic: new userspace i/o driver for VMBus")
Signed-off-by: Arnd Bergmann 
---
 drivers/uio/uio_hv_generic.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index ad3ab5805ad8..50958f167305 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -68,7 +68,7 @@ hv_uio_mmap(struct uio_info *info, struct vm_area_struct *vma)
mi = (int)vma->vm_pgoff;
 
return remap_pfn_range(vma, vma->vm_start,
-   virt_to_phys((void *)info->mem[mi].addr) >> PAGE_SHIFT,
+   info->mem[mi].addr >> PAGE_SHIFT,
vma->vm_end - vma->vm_start, vma->vm_page_prot);
 }
 
@@ -137,20 +137,20 @@ hv_uio_probe(struct hv_device *dev,
/* mem resources */
pdata->info.mem[TXRX_RING_MAP].name = "txrx_rings";
pdata->info.mem[TXRX_RING_MAP].addr
-   = (phys_addr_t)dev->channel->ringbuffer_pages;
+   = virt_to_phys(dev->channel->ringbuffer_pages);
pdata->info.mem[TXRX_RING_MAP].size
= dev->channel->ringbuffer_pagecount * PAGE_SIZE;
pdata->info.mem[TXRX_RING_MAP].memtype = UIO_MEM_LOGICAL;
 
pdata->info.mem[INT_PAGE_MAP].name = "int_page";
pdata->info.mem[INT_PAGE_MAP].addr =
-   (phys_addr_t)vmbus_connection.int_page;
+   virt_to_phys(vmbus_connection.int_page);
pdata->info.mem[INT_PAGE_MAP].size = PAGE_SIZE;
pdata->info.mem[INT_PAGE_MAP].memtype = UIO_MEM_LOGICAL;
 
pdata->info.mem[MON_PAGE_MAP].name = "monitor_pages";
pdata->info.mem[MON_PAGE_MAP].addr =
-   (phys_addr_t)vmbus_connection.monitor_pages[1];
+   virt_to_phys(vmbus_connection.monitor_pages[1]);
pdata->info.mem[MON_PAGE_MAP].size = PAGE_SIZE;
pdata->info.mem[MON_PAGE_MAP].memtype = UIO_MEM_LOGICAL;
 
-- 
2.9.0
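
The point of the patch is that the address is converted once, at probe time, and stored in the integer type meant for it, so the mmap handler only has to shift it to get a page frame number. The sketch below models that in userspace; the 64-bit phys_addr_t, the 4 KiB page size and the fixed-offset toy_virt_to_phys() are illustrative assumptions (a real kernel's virt_to_phys() is architecture specific).

```c
#include <stdint.h>

/* Assumptions: 64-bit phys_addr_t (as on 32-bit kernels with PAE/LPAE),
 * 4 KiB pages and a toy linear map starting at TOY_PAGE_OFFSET. */
typedef uint64_t phys_addr_t;
#define PAGE_SHIFT 12
#define TOY_PAGE_OFFSET UINT64_C(0xC0000000)

/* Hypothetical stand-in for virt_to_phys() on a linearly mapped address. */
static phys_addr_t toy_virt_to_phys(const void *vaddr)
{
	return (phys_addr_t)(uintptr_t)vaddr - TOY_PAGE_OFFSET;
}

/* What hv_uio_mmap() needs after the patch: the PFN of the stored value. */
static unsigned long stored_addr_to_pfn(phys_addr_t addr)
{
	return (unsigned long)(addr >> PAGE_SHIFT);
}
```

The (phys_addr_t)(uintptr_t) double cast exists only in this userspace toy; the patch avoids pointer/integer casts altogether by passing the pointer straight to virt_to_phys().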

Synopsys Ethernet QoS

2016-12-09 Thread Joao Pinto

Dear David Miller,

These past 2 weeks we have been discussing the right way to go in terms of
Synopsys QoS support in the kernel.

The approach that gained the most support was:

a) Test /stmicro/stmmac driver in a reference hardware prototyping platform (QoS
IPK) [Status: In Progress | 90% finished]

b) Merge the necessary features from AXIS’ Synopsys-based QoS driver into
/stmicro/stmmac
[Status: In Queue]

c) Rename /stmicro/stmmac driver to synopsys/ and re-factor the driver if necessary
[Status: In Queue]

d) Add QoS features incrementally to the new synopsys/ driver
[Status: In Queue]

This approach has the green light from AXIS and STMicro maintainers (Lars and
Peppe).

I would like to know if you support this plan.

Best Regards,
Joao

Re: Misalignment, MIPS, and ip_hdr(skb)->version

2016-12-09 Thread Jiri Benc

On Wed, 07 Dec 2016 23:34:21 -0500, Daniel Kahn Gillmor wrote:
> fwiw, i'm not convinced that "most protocols of the IETF follow this
> mantra".  we've had multiple discussions in different protocol groups
> about shaving or bloating by a few bytes here or there in different
> protocols, and i don't think anyone has brought up memory alignment as
> an argument in any of the discussions i've followed.

Which is sad. One would expect this to have been well understood for
decades already.

 Jiri
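Some background on the alignment argument in this thread: an Ethernet header is 14 bytes long. An IP header that immediately follows it therefore starts at offset 14, which is not a multiple of 4. On strict-alignment CPUs such as MIPS, a plain 32-bit load from that header can fault or need a slow fixup. The kernel usually avoids this by reserving `NET_IP_ALIGN` bytes of headroom (2 by default). The sketch below shows the offset arithmetic and an alignment-safe load. It assumes the receive buffer itself starts on at least a 4-byte boundary, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Destination MAC (6) + source MAC (6) + EtherType (2). */
#define ETH_HDR_LEN 14

/* Offset of the IP header in a receive buffer (assumed to start at least
 * 4-byte aligned) after 'pad' bytes of headroom. */
static size_t ip_hdr_offset(size_t pad)
{
	return pad + ETH_HDR_LEN;
}

/* Nonzero if a 32-bit field at this offset would be naturally aligned. */
static int is_aligned4(size_t off)
{
	return off % 4 == 0;
}

/* Alignment-safe 32-bit read: memcpy lets the compiler choose an access
 * sequence that is legal on strict-alignment CPUs. */
static uint32_t load_u32_unaligned(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

With no padding the IP header lands at offset 14 and is misaligned. Two bytes of headroom move it to offset 16.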

Re: [PATCH V2 22/22] bnxt_re: Add bnxt_re driver build support

2016-12-09 Thread kbuild test robot

Hi Selvin,

[auto build test ERROR on rdma/master]
[also build test ERROR on v4.9-rc8 next-20161208]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Selvin-Xavier/Broadcom-RoCE-Driver-bnxt_re/20161209-154823
base:   https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git master
config: parisc-allyesconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=parisc 

All errors (new ones prefixed by >>):

   drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c: In function 'bnxt_qplib_creq_irq':
>> drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c:359:2: error: implicit declaration of function 'prefetch' [-Werror=implicit-function-declaration]
 prefetch(_ptr[CREQ_PG(sw_cons)][CREQ_IDX(sw_cons)]);
 ^~~~
   cc1: some warnings being treated as errors
--
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_service_nq':
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:145:29: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   bnxt_qplib_arm_cq_enable((struct bnxt_qplib_cq *)
^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:147:29: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   if (!nq->cqn_handler(nq, (struct bnxt_qplib_cq *)
^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_nq_irq':
>> drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:182:2: error: implicit declaration of function 'prefetch' [-Werror=implicit-function-declaration]
 prefetch(_ptr[NQE_PG(sw_cons)][NQE_IDX(sw_cons)]);
 ^~~~
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_create_qp':
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:484:16: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  psn_search = (unsigned long long int)
   ^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_destroy_qp':
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1071:22: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 __clean_cq(qp->scq, (u64)qp);
 ^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1073:23: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  __clean_cq(qp->rcq, (u64)qp);
  ^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function '__flush_sq':
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1630:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  cqe->qp_handle = (u64)qp;
   ^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function '__flush_rq':
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1664:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  cqe->qp_handle = (u64)qp;
   ^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_cq_process_req':
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1688:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 qp = (struct bnxt_qplib_qp *)le64_to_cpu(hwcqe->qp_handle);
  ^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1720:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  cqe->qp_handle = (u64)qp;
   ^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_cq_process_res_rc':
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1782:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 qp = (struct bnxt_qplib_qp *)le64_to_cpu(hwcqe->qp_handle);
  ^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1794:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 cqe->qp_handle = (u64)qp;
  ^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_cq_process_res_ud':
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1836:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 qp = (struct bnxt_qplib_qp *)le64_to_cpu(hwcqe->qp_handle);
  ^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1847:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 cqe->qp_handle = (u64)qp;
  ^
   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c: In function 'bnxt_qplib_cq_process_res_raweth_qp1':
   dri
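On a 32-bit target such as parisc, pointers are 32 bits wide while `u64` is 64 bits. Casting `qp` directly to `u64`, or a 64-bit handle directly back to a pointer, therefore triggers the size-mismatch warnings above. A common idiom is to pass through an integer type that has the same width as a pointer: `unsigned long` in kernel code, `uintptr_t` in userspace C. The `prefetch` errors are a separate problem. `prefetch()` is declared in `<linux/prefetch.h>`, so the driver presumably needs to include that header explicitly. The sketch below covers only the cast idiom, and `struct fake_qp` is a hypothetical stand-in for `struct bnxt_qplib_qp`.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct bnxt_qplib_qp. */
struct fake_qp {
	int id;
};

/* Store a pointer in a 64-bit handle without -Wpointer-to-int-cast:
 * convert to a pointer-sized integer first, then widen. */
static uint64_t qp_to_handle(struct fake_qp *qp)
{
	return (uint64_t)(uintptr_t)qp;
}

/* Recover the pointer without -Wint-to-pointer-cast: narrow to a
 * pointer-sized integer first, then convert to the pointer type. */
static struct fake_qp *handle_to_qp(uint64_t handle)
{
	return (struct fake_qp *)(uintptr_t)handle;
}
```

In kernel code the equivalent would be `(u64)(unsigned long)qp` and `(struct bnxt_qplib_qp *)(unsigned long)handle`.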

Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock

2016-12-09 Thread Pavel Machek

On Fri 2016-12-09 00:19:43, Francois Romieu wrote:
> Lino Sanfilippo  :
> [...]
> > OTOH Pavel said that he actually could produce a deadlock. Now I wonder if
> > this is caused by that locking scheme (in a way I have not figured out yet)
> > or if it is a different issue.
> 
> stmmac_tx_err races with stmmac_xmit.

Umm, yes, that looks real.

And that means that removing tx_lock will not be completely trivial
:-(. Lino, any ideas there?

netif_tx_lock_irqsave() would help, but afaict that one does not
exist.

Plus, does someone know how to trigger the status == tx_hard_error? I
tried powering down the switch, but that did not do it.

Thanks, Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

signature.asc
Description: Digital signature

Re: [PATCH] net: socket: preferred __aligned(size) for control buffer

2016-12-09 Thread Sergei Shtylyov


Hello!

On 12/8/2016 3:51 PM, kushwah...@samsung.com wrote:


From: Amit Kushwaha 

This patch cleans up the checkpatch.pl warning:
WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))

Signed-off-by: Amit Kushwaha 
---
 net/socket.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index e631894..5835383 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1,3 +1,4 @@
+


   Why?


 /*
  * NET An implementation of the SOCKET network access protocol.
  *

[...]

MBR, Sergei
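For reference, the checkpatch.pl warning in this patch concerns the kernel's `__aligned(x)` shorthand, which expands to GCC's `__attribute__((aligned(x)))`. The sketch below defines the macro locally, because in userspace nothing provides it. Inside the kernel it comes from the compiler headers. The struct and buffer size are hypothetical and only show the attribute raising a member's alignment.

```c
#include <stddef.h>

/* Local copy of the kernel shorthand. In kernel code, just use __aligned(). */
#ifndef __aligned
#define __aligned(x) __attribute__((aligned(x)))
#endif

/* Hypothetical holder: without the attribute, 'ctl' would sit at offset 1. */
struct ctl_holder {
	char pad;
	unsigned char ctl[36] __aligned(sizeof(size_t));
};

/* Offset of the control buffer; always a multiple of sizeof(size_t). */
static size_t ctl_offset(void)
{
	return offsetof(struct ctl_holder, ctl);
}
```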

Re: netlink: GPF in sock_sndtimeo

2016-12-09 Thread Richard Guy Briggs

On 2016-12-08 22:57, Cong Wang wrote:
> On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs  wrote:
> > I also tried to extend Cong Wang's idea to attempt to proactively respond to a
> > NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
> > stack dump using mutex_lock(&audit_cmd_mutex) in the notifier callback.
> > Eliminating the lock since the sock is dead anyway eliminates the error.
> >
> > Is it safe?  I'll resubmit if this looks remotely sane.  Meanwhile I'll try to
> > get the test case to compile.
> 
> It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and 'audit_pid'
> are updated as a whole and race between audit_receive_msg() and
> NETLINK_URELEASE.

This is what I expected and why I originally added the mutex lock in the
callback...  The dumps I got were bare with no wrapper identifying the
process context or specific error, so I'm at a bit of a loss how to
solve this (without thinking more about it) other than instinctively
removing the mutex.

Another approach might be to consolidate the three into one identifier,
derive the other two from one, or serialize access to them.
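Here is one way to read the "consolidate into one identifier" option: keep the socket, port ID and PID in one object and publish it through a single pointer, so that a reader never sees a mix of old and new values. The userspace sketch below does this with C11 atomics. All names are hypothetical stand-ins for `audit_sock`, `audit_nlk_portid` and `audit_pid`. In the kernel, publishing and retiring the object would more likely use RCU than a bare atomic exchange.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical bundle of the three values that must change together. */
struct audit_conn_sketch {
	const void *sock;      /* stands in for audit_sock */
	unsigned int portid;   /* stands in for audit_nlk_portid */
	int pid;               /* stands in for audit_pid */
};

static _Atomic(struct audit_conn_sketch *) cur_conn;

/* Publish a new connection (or NULL to reset) and return the old one.
 * The caller must not free the old object while readers may still use it
 * (in the kernel, that would mean waiting for an RCU grace period). */
static struct audit_conn_sketch *conn_replace(struct audit_conn_sketch *next)
{
	return atomic_exchange(&cur_conn, next);
}

/* Reset only if the released socket/portid is the current one, mirroring
 * the check done in the NETLINK_URELEASE notifier discussed above. */
static struct audit_conn_sketch *conn_reset_if(const void *sock,
					       unsigned int portid)
{
	struct audit_conn_sketch *old = atomic_load(&cur_conn);

	while (old && old->sock == sock && old->portid == portid) {
		/* On failure 'old' is refreshed and the match is rechecked. */
		if (atomic_compare_exchange_weak(&cur_conn, &old, NULL))
			return old;
	}
	return NULL;
}
```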

> > @@ -1167,10 +1190,14 @@ static void __net_exit audit_net_exit(struct net *net)
> >  {
> > struct audit_net *aunet = net_generic(net, audit_net_id);
> > struct sock *sock = aunet->nlsk;
> > +
> > +   mutex_lock(&audit_cmd_mutex);
> > if (sock == audit_sock) {
> > audit_pid = 0;
> > +   audit_nlk_portid = 0;
> > audit_sock = NULL;
> > }
> > +   mutex_unlock(&audit_cmd_mutex);
> 
> If you decide to use NETLINK_URELEASE notifier, the above piece is no
> longer needed, the net_exit path simply releases a refcnt.

Good point.  It would have already killed it off.  So this piece is
arguably too late anyways.

- RGB

--
Richard Guy Briggs 
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

[PATCH] bnxt_re: fix itnull.cocci warnings

2016-12-09 Thread Julia Lawall

list_for_each_entry iterator variable cannot be NULL.

Generated by: scripts/coccinelle/iterators/itnull.cocci

CC: Selvin Xavier <selvin.xav...@broadcom.com>
Signed-off-by: Julia Lawall <julia.law...@lip6.fr>
Signed-off-by: Fengguang Wu <fengguang...@intel.com>
---

url:    https://github.com/0day-ci/linux/commits/Selvin-Xavier/Broadcom-RoCE-Driver-bnxt_re/20161209-154823
base:   https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git master

I received some other warnings as well.  Not sure if they have been passed
along already:

>> drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c:2455:4-14: code aligned with following code on line 2456
--
>> drivers/infiniband/hw/bnxtre/bnxt_re_main.c:1047:2-20: code aligned with following code on line 1048
   drivers/infiniband/hw/bnxtre/bnxt_re_main.c:1188:3-43: code aligned with following code on line 1190
--
>> drivers/infiniband/hw/bnxtre/bnxt_re_main.c:834:6-8: ERROR: iterator variable bound on line 832 cannot be NULL
--
>> drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c:2512:5-13: WARNING: Unsigned expression compared with zero: pkt_type < 0


 bnxt_re_main.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
+++ b/drivers/infiniband/hw/bnxtre/bnxt_re_main.c
@@ -831,7 +831,7 @@ static void bnxt_re_dev_stop(struct bnxt
mutex_lock(&rdev->qp_lock);
list_for_each_entry(qp, &rdev->qp_list, list) {
/* Modify the state of all QPs except QP1/Shadow QP */
-   if (qp && !bnxt_re_is_qp1_or_shadow_qp(rdev, qp)) {
+   if (!bnxt_re_is_qp1_or_shadow_qp(rdev, qp)) {
if (qp->qplib_qp.state !=
CMDQ_MODIFY_QP_NEW_STATE_RESET ||
qp->qplib_qp.state !=
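Coccinelle flags the `qp &&` test for a simple reason. `list_for_each_entry()` computes the cursor with `container_of()` from a list node and stops when it returns to the list head, so inside the loop body the cursor always points at a real entry and is never NULL. The sketch below uses simplified userspace re-implementations of the list helpers (not the kernel's `<linux/list.h>`) and a hypothetical QP type.

```c
#include <stddef.h>

/* Simplified userspace versions of the kernel list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The loop ends when the cursor's node IS the head, never via NULL. */
#define list_for_each_entry(pos, head, member)                          \
	for (pos = container_of((head)->next, __typeof__(*pos), member);  \
	     &pos->member != (head);                                      \
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical QP entry. */
struct qp_sketch {
	int is_qp1;
	struct list_head list;
};

/* Count non-QP1 entries; no "qp &&" guard is needed inside the loop. */
static int count_non_qp1(struct list_head *head)
{
	struct qp_sketch *qp;
	int n = 0;

	list_for_each_entry(qp, head, list)
		if (!qp->is_qp1)
			n++;
	return n;
}

/* Build a three-entry list on the stack (one QP1) and count it. */
static int demo_count(void)
{
	struct list_head head;
	struct qp_sketch a = { 1 }, b = { 0 }, c = { 0 };

	list_init(&head);
	list_add_tail(&a.list, &head);
	list_add_tail(&b.list, &head);
	list_add_tail(&c.list, &head);
	return count_non_qp1(&head);
}
```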

Re: stmmac driver...

2016-12-09 Thread Niklas Cassel

Hello Jie Deng


In your cover letter you wrote

dwc-eth-xxx.x
  The DWC ethernet core layer (DWC ECL). This layer contains code that
can be shared by different DWC series ethernet cores

Does this mean that code in dwc-eth-xxx.x is common to all
the different Synopsys IPs, GMAC, XGMAC and XLGMAC ?


Regards,
Niklas

On Fri, Dec 9, 2016 at 11:05 AM, Jie Deng  wrote:
>
>
> On 2016/12/8 23:25, David Miller wrote:
>> From: Alexandre Torgue 
>> Date: Thu, 8 Dec 2016 14:55:04 +0100
>>
>>> Maybe I forget some series. Do you have others in mind ?
>> Please see the thread titled:
>>
>> "net: ethernet: Initial driver for Synopsys DWC XLGMAC"
>>
>> which seems to be discussing consolidation of various drivers
>> for the same IP core, of which stmmac is one.
>>
>> I personally am against any change of the driver name and
>> things like this, and wish the people doing that work would
>> simply contribute to making whatever changes they need directly
>> to the stmmac driver.
>>
>> You really need to voice your opinion when major changes are being
>> proposed for the driver you maintain.
>>
> Hi David and Alex,
>
> XLGMAC is not a version of GMAC. Synopsys has several IPs and each IP has
> several versions.
>
> GMAC(QoS): 3.5, 3.7, 4.0, 4.10, 4.20...
> XGMAC: 1.00, 1.10, 1.20, 2.00, 2.10, 2.11...
> XLGMAC (Synopsys DesignWare Core Enterprise Ethernet): this is a new IP.
>
> Regards,
> Jie
>

Re: netlink: GPF in sock_sndtimeo

2016-12-09 Thread Dmitry Vyukov

On Fri, Dec 9, 2016 at 7:02 AM, Richard Guy Briggs  wrote:
> On 2016-11-29 23:52, Richard Guy Briggs wrote:
>> On 2016-11-29 15:13, Cong Wang wrote:
>> > On Tue, Nov 29, 2016 at 8:48 AM, Richard Guy Briggs  wrote:
>> > > On 2016-11-26 17:11, Cong Wang wrote:
>> > >> It is racy on audit_sock, especially on the netns exit path.
>> > >
>> > > I think that is the only place it is racy.  The other places audit_sock
>> > > is set is when the socket failure has just triggered a reset.
>> > >
>> > > Is there a notifier callback for failed or reaped sockets?
>> >
>> > Is NETLINK_URELEASE event what you are looking for?
>>
>> Possibly, yes.  Thanks, I'll have a look.
>
> I tried a quick compile attempt on the test case (I assume it is a
> socket fuzzer) and get the following compile error:
> cc -g -O0 -Wall -D_GNU_SOURCE -o socket_fuzz socket_fuzz.c
> socket_fuzz.c:16:1: warning: "_GNU_SOURCE" redefined
> <command-line>: warning: this is the location of the previous definition
> socket_fuzz.c: In function ‘segv_handler’:
> socket_fuzz.c:89: warning: implicit declaration of function ‘__atomic_load_n’
> socket_fuzz.c:89: error: ‘__ATOMIC_RELAXED’ undeclared (first use in this function)
> socket_fuzz.c:89: error: (Each undeclared identifier is reported only once
> socket_fuzz.c:89: error: for each function it appears in.)
> socket_fuzz.c: In function ‘loop’:
> socket_fuzz.c:280: warning: unused variable ‘errno0’
> socket_fuzz.c: In function ‘test’:
> socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_add’
> socket_fuzz.c:303: error: ‘__ATOMIC_SEQ_CST’ undeclared (first use in this function)
> socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_sub’
-std=gnu99 should help; ignore the warnings.



> I also tried to extend Cong Wang's idea to attempt to proactively respond to a
> NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
> stack dump using mutex_lock(&audit_cmd_mutex) in the notifier callback.
> Eliminating the lock since the sock is dead anyway eliminates the error.
>
> Is it safe?  I'll resubmit if this looks remotely sane.  Meanwhile I'll try to
> get the test case to compile.
>
> This is being tracked as https://github.com/linux-audit/audit-kernel/issues/30
>
> Subject: [PATCH] audit: proactively reset audit_sock on matching NETLINK_URELEASE
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index f1ca116..91d222d 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -423,6 +423,7 @@ static void kauditd_send_skb(struct sk_buff *skb)
> snprintf(s, sizeof(s), "audit_pid=%d reset", audit_pid);
> audit_log_lost(s);
> audit_pid = 0;
> +   audit_nlk_portid = 0;
> audit_sock = NULL;
> } else {
> pr_warn("re-scheduling(#%d) write to audit_pid=%d\n",
> @@ -1143,6 +1144,28 @@ static int audit_bind(struct net *net, int group)
> return 0;
>  }
>
> +static int audit_sock_netlink_notify(struct notifier_block *nb,
> +unsigned long event,
> +void *_notify)
> +{
> +   struct netlink_notify *notify = _notify;
> +   struct audit_net *aunet = net_generic(notify->net, audit_net_id);
> +
> +   if (event == NETLINK_URELEASE && notify->protocol == NETLINK_AUDIT) {
> +   if (audit_nlk_portid == notify->portid &&
> +   audit_sock == aunet->nlsk) {
> +   audit_pid = 0;
> +   audit_nlk_portid = 0;
> +   audit_sock = NULL;
> +   }
> +   }
> +   return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block audit_netlink_notifier = {
> +   .notifier_call = audit_sock_netlink_notify,
> +};
> +
>  static int __net_init audit_net_init(struct net *net)
>  {
> struct netlink_kernel_cfg cfg = {
> @@ -1167,10 +1190,14 @@ static void __net_exit audit_net_exit(struct net *net)
>  {
> struct audit_net *aunet = net_generic(net, audit_net_id);
> struct sock *sock = aunet->nlsk;
> +
> +   mutex_lock(&audit_cmd_mutex);
> if (sock == audit_sock) {
> audit_pid = 0;
> +   audit_nlk_portid = 0;
> audit_sock = NULL;
> }
> +   mutex_unlock(&audit_cmd_mutex);
>
> RCU_INIT_POINTER(aunet->nlsk, NULL);
> synchronize_net();
> @@ -1202,6 +1229,7 @@ static int __init audit_init(void)
> audit_enabled = audit_default;
> audit_ever_enabled |= !!audit_default;
>
> +   netlink_register_notifier(&audit_netlink_notifier);
> audit_log(NULL, GFP_KERNEL, AUDIT_KERNEL, "initialized");
>
> for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
> --
> 1.7.1
>
>
>> - RGB
>
> - RGB
>
> --
> Richard Guy Briggs 
> Kernel Security

Re: [PATCH 37/50] netfilter: nf_tables: atomic dump and reset for stateful objects

2016-12-09 Thread Pablo Neira Ayuso

Hi Paul,

On Thu, Dec 08, 2016 at 07:40:14PM -0500, Paul Gortmaker wrote:
> On Wed, Dec 7, 2016 at 4:52 PM, Pablo Neira Ayuso  wrote:
> > This patch adds a new NFT_MSG_GETOBJ_RESET command to perform an atomic
> > dump-and-reset of the stateful object. This also adds support for
> > atomic dump and reset for counter and quota objects.
> 
> This triggered a new build failure in linux-next on parisc-32, which a
> hands-off bisect run lists as resulting from this:
> 
> ERROR: "__cmpxchg_u64" [net/netfilter/nft_counter.ko] undefined!
> make[2]: *** [__modpost] Error 1
> make[1]: *** [modules] Error 2
> make: *** [sub-make] Error 2
> 43da04a593d8b2626f1cf4b56efe9402f6b53652 is the first bad commit
> commit 43da04a593d8b2626f1cf4b56efe9402f6b53652
> Author: Pablo Neira Ayuso 
> Date:   Mon Nov 28 00:05:44 2016 +0100
> 
> netfilter: nf_tables: atomic dump and reset for stateful objects
> 
> This patch adds a new NFT_MSG_GETOBJ_RESET command to perform an atomic
> dump-and-reset of the stateful object. This also adds support for
> atomic dump and reset for counter and quota objects.
> 
> Signed-off-by: Pablo Neira Ayuso 
> 
> :04 04 6cd4554f69247e5c837db52342f26888beda1623 5908aca93c89e7922336546c3753bfcf2aceefba M  include
> :04 04 f25d5831eb30972436bd198c5bb237a0cb0b4856 4ee5751c8de02bb5a8dcaadb2a2df7986d90f8e9 M  net
> bisect run success
> 
> Guessing this is more an issue with parisc than it is with netfilter, but I
> figured I'd mention it anyway.

I'm planning to submit this patch to parisc; I'm attaching it to this
email.
>From c9d320ac0be2a32a7b2bfad398be549865088ecf Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso 
Date: Thu, 8 Dec 2016 22:55:33 +0100
Subject: [PATCH] parisc: export symbol __cmpxchg_u64()

kbuild test robot reports:

>> ERROR: "__cmpxchg_u64" [net/netfilter/nft_counter.ko] undefined!

Commit 43da04a593d8 ("netfilter: nf_tables: atomic dump and reset for
stateful objects") introduces the first client of cmpxchg64() from
modules.

Patch 54b668009076 ("parisc: Add native high-resolution sched_clock()
implementation") removed the dependency of __cmpxchg_u64() on CONFIG_64BIT.
So, let's fix this problem by exporting this symbol unconditionally.

Reported-by: kbuild test robot 
Signed-off-by: Pablo Neira Ayuso 
---
 arch/parisc/kernel/parisc_ksyms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/parisc/kernel/parisc_ksyms.c b/arch/parisc/kernel/parisc_ksyms.c
index 3cad8aadc69e..cfa704548cf3 100644
--- a/arch/parisc/kernel/parisc_ksyms.c
+++ b/arch/parisc/kernel/parisc_ksyms.c
@@ -40,8 +40,8 @@ EXPORT_SYMBOL(__atomic_hash);
 #endif
 #ifdef CONFIG_64BIT
 EXPORT_SYMBOL(__xchg64);
-EXPORT_SYMBOL(__cmpxchg_u64);
 #endif
+EXPORT_SYMBOL(__cmpxchg_u64);
 
 #include 
 EXPORT_SYMBOL(lclear_user);
-- 
2.1.4
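The undefined `__cmpxchg_u64` is pulled in by the new dump-and-reset path, which needs a 64-bit compare-and-exchange. The userspace sketch below illustrates the idea with C11 atomics. It reads a counter and swaps in zero as one atomic step, retrying if another CPU updated it in between, so no increment is lost between the dump and the reset. This is not the nft_counter code, and the type and function names are hypothetical. For a single word, a plain `atomic_exchange()` would do the same job; the loop form is shown only because it mirrors a cmpxchg64()-style implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter object with two independent 64-bit words. */
struct counter_sketch {
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

/* Atomically read a 64-bit value and replace it with zero, retrying if
 * another CPU changed it in between (a cmpxchg64()-style loop). */
static uint64_t fetch_and_reset(_Atomic uint64_t *v)
{
	uint64_t old = atomic_load(v);

	while (!atomic_compare_exchange_weak(v, &old, 0))
		; /* 'old' now holds the fresh value; try again */
	return old;
}

static void counter_add(struct counter_sketch *c, uint64_t len)
{
	atomic_fetch_add(&c->packets, 1);
	atomic_fetch_add(&c->bytes, len);
}

/* Count two packets (100 + 60 bytes), then dump-and-reset the byte count.
 * Note: each word is reset on its own; the pair is not one snapshot. */
static uint64_t demo_dump_bytes(void)
{
	struct counter_sketch c;
	uint64_t dumped;

	atomic_init(&c.packets, 0);
	atomic_init(&c.bytes, 0);
	counter_add(&c, 100);
	counter_add(&c, 60);
	dumped = fetch_and_reset(&c.bytes);
	/* After the reset, the live counter starts again from zero. */
	return atomic_load(&c.bytes) == 0 ? dumped : UINT64_MAX;
}
```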

Re: [PATCH v3 0/4] vsock: cancel connect packets when failing to connect

2016-12-09 Thread Stefan Hajnoczi

On Fri, Dec 09, 2016 at 01:12:32AM +0800, Peng Tao wrote:
> Currently, if a connect call fails on a signal or timeout (e.g., the guest is
> still in the process of starting up), we just return to the caller and leave
> the connect packets queued, and they are still sent even though the connection
> is considered a failure, which can confuse applications with unwanted spurious
> connect attempts.
> 
> The patchset enables vsock (both host and guest) to cancel queued packets when
> a connect attempt is considered to fail.
> 
> v3 changelog:
>   - define cancel_pkt callback in struct vsock_transport rather than struct virtio_transport
>   - rename virtio_vsock_pkt->vsk to virtio_vsock_pkt->cancel_token
> v2 changelog:
>   - fix queued_replies counting and resume tx/rx when necessary
> 
> 
> Peng Tao (4):
>   vsock: track pkt owner vsock
>   vhost-vsock: add pkt cancel capability
>   vsock: add pkt cancel capability
>   vsock: cancel packets when failing to connect
> 
>  drivers/vhost/vsock.c   | 41 
>  include/linux/virtio_vsock.h|  2 ++
>  include/net/af_vsock.h  |  3 +++
>  net/vmw_vsock/af_vsock.c| 14 +++
>  net/vmw_vsock/virtio_transport.c| 42 +
>  net/vmw_vsock/virtio_transport_common.c |  7 ++
>  6 files changed, 109 insertions(+)

I'm happy with the series, although I pointed out two unnecessary (void*) casts.

Please wait for Jorgen to approve the af_vsock.c changes before
applying.
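The cancel callbacks in this series walk the send list and move every packet whose `cancel_token` matches the failing socket onto a private `freeme` list, which is then freed. The single-threaded userspace sketch below follows that pattern with a singly linked list instead of the kernel's `list_head`. The types and functions are hypothetical. The comments mark where the real code holds `send_pkt_list_lock`; collecting into a private list typically lets the freeing happen after the lock is dropped.

```c
#include <stdlib.h>

/* Hypothetical queued packet; cancel_token identifies the owning socket. */
struct pkt_sketch {
	const void *cancel_token;
	struct pkt_sketch *next;
};

/* Unlink every packet owned by 'owner' onto a private list, then free
 * them. Returns how many packets were cancelled. */
static int cancel_pkts(struct pkt_sketch **queue, const void *owner)
{
	struct pkt_sketch *freeme = NULL, **pp = queue, *p;
	int cnt = 0;

	/* the send-list lock would be taken here */
	while ((p = *pp) != NULL) {
		if (p->cancel_token == owner) {
			*pp = p->next;      /* unlink from the send queue */
			p->next = freeme;   /* push onto the private list */
			freeme = p;
			cnt++;
		} else {
			pp = &p->next;
		}
	}
	/* ...and released here, before freeing */

	while (freeme) {
		p = freeme->next;
		free(freeme);
		freeme = p;
	}
	return cnt;
}

static struct pkt_sketch *push(struct pkt_sketch *head, const void *tok)
{
	struct pkt_sketch *p = malloc(sizeof(*p));

	if (!p)
		abort();
	p->cancel_token = tok;
	p->next = head;
	return p;
}

/* Queue a, b, a; cancelling 'a' should drop two and leave only 'b'. */
static int demo_cancel(void)
{
	char a, b; /* their addresses act as two socket identities */
	struct pkt_sketch *q = push(push(push(NULL, &a), &b), &a);
	int cancelled = cancel_pkts(&q, &a);
	int left_ok = q && q->cancel_token == &b && q->next == NULL;

	cancel_pkts(&q, &b); /* drain the rest */
	return left_ok ? cancelled : -1;
}
```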



Re: [PATCH v3 4/4] vsock: cancel packets when failing to connect

2016-12-09 Thread Stefan Hajnoczi

On Fri, Dec 09, 2016 at 01:12:36AM +0800, Peng Tao wrote:
> Otherwise we'll leave the packets queued until the vsock device is released.
> E.g., if the guest is slow to start up, resulting in ETIMEDOUT on connect, the
> guest will get the connect requests from failed host sockets.
> 
> Reviewed-by: Stefan Hajnoczi 

Please do not include Reviewed-by: if the patch has undergone
substantial changes.

I am happy with this latest version:
Reviewed-by: Stefan Hajnoczi 



Re: [PATCH v3 2/4] vhost-vsock: add pkt cancel capability

2016-12-09 Thread Stefan Hajnoczi

On Fri, Dec 09, 2016 at 01:12:34AM +0800, Peng Tao wrote:
> To allow canceling all packets of a connection.
> 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Peng Tao 
> ---
>  drivers/vhost/vsock.c  | 41 +
>  include/net/af_vsock.h |  3 +++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index a504e2e0..db64d51 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -218,6 +218,46 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
>   return len;
>  }
>  
> +static int
> +vhost_transport_cancel_pkt(struct vsock_sock *vsk)
> +{
> + struct vhost_vsock *vsock;
> + struct virtio_vsock_pkt *pkt, *n;
> + int cnt = 0;
> + LIST_HEAD(freeme);
> +
> + /* Find the vhost_vsock according to guest context id  */
> + vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
> + if (!vsock)
> + return -ENODEV;
> +
> + spin_lock_bh(&vsock->send_pkt_list_lock);
> + list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
> + if (pkt->cancel_token != (void *)vsk)

It's not necessary to cast to void* in C.  All object pointers convert to
void* implicitly, without compiler warnings.  Explicit casts involving void*
are a C++ habit.
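To illustrate the review comment: in C, any object pointer converts to `void *` implicitly, and a `void *` can be compared for equality with any object pointer. The token check therefore needs no cast. The types below are hypothetical stand-ins for `struct vsock_sock` and `struct virtio_vsock_pkt`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the types in the patch. */
struct vsock_sketch {
	int id;
};

struct pkt_tok {
	void *cancel_token;
};

static void pkt_set_owner(struct pkt_tok *pkt, struct vsock_sketch *vsk)
{
	pkt->cancel_token = vsk; /* implicit conversion to void *, no cast */
}

static bool pkt_owned_by(const struct pkt_tok *pkt, struct vsock_sketch *vsk)
{
	/* No (void *) cast: vsk is converted for the comparison. */
	return pkt->cancel_token == vsk;
}

/* Tag a packet with one socket and check it against two. */
static bool demo_owned(void)
{
	struct vsock_sketch s = { 1 }, t = { 2 };
	struct pkt_tok p;

	pkt_set_owner(&p, &s);
	return pkt_owned_by(&p, &s) && !pkt_owned_by(&p, &t);
}
```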



Re: [PATCH v3 3/4] vsock: add pkt cancel capability

2016-12-09 Thread Stefan Hajnoczi

On Fri, Dec 09, 2016 at 01:12:35AM +0800, Peng Tao wrote:
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Peng Tao 
> ---
>  net/vmw_vsock/virtio_transport.c | 42 
> 
>  1 file changed, 42 insertions(+)
> 
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 936d7ee..95c1162 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -170,6 +170,47 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
>   return len;
>  }
>  
> +static int
> +virtio_transport_cancel_pkt(struct vsock_sock *vsk)
> +{
> + struct virtio_vsock *vsock;
> + struct virtio_vsock_pkt *pkt, *n;
> + int cnt = 0;
> + LIST_HEAD(freeme);
> +
> + vsock = virtio_vsock_get();
> + if (!vsock) {
> + return -ENODEV;
> + }
> +
> + spin_lock_bh(&vsock->send_pkt_list_lock);
> + list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
> + if (pkt->cancel_token != (void *)vsk)

The cast is unnecessary here.



[PATCH net-next] net: macb: Added PCI wrapper for Platform Driver.

2016-12-09 Thread Bartosz Folta

There are hardware PCI implementations of the Cadence GEM network controller. This
patch allows such hardware to be used by reusing the existing platform driver.

Signed-off-by: Bartosz Folta 
---
 drivers/net/ethernet/cadence/Kconfig|   9 ++
 drivers/net/ethernet/cadence/Makefile   |   1 +
 drivers/net/ethernet/cadence/macb.c |  31 +--
 drivers/net/ethernet/cadence/macb_pci.c | 152 
 include/linux/platform_data/macb.h  |   6 ++
 5 files changed, 194 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/cadence/macb_pci.c

diff --git a/drivers/net/ethernet/cadence/Kconfig b/drivers/net/ethernet/cadence/Kconfig
index f0bcb15..00d833e 100644
--- a/drivers/net/ethernet/cadence/Kconfig
+++ b/drivers/net/ethernet/cadence/Kconfig
@@ -31,4 +31,13 @@ config MACB
  To compile this driver as a module, choose M here: the module
  will be called macb.
 
+config MACB_PCI
+   tristate "Cadence PCI MACB/GEM support"
+   depends on MACB
+   ---help---
+ This is PCI wrapper for MACB driver.
+
+ To compile this driver as a module, choose M here: the module
+ will be called macb_pci.
+
 endif # NET_CADENCE
diff --git a/drivers/net/ethernet/cadence/Makefile b/drivers/net/ethernet/cadence/Makefile
index 91f79b1..4ba7559 100644
--- a/drivers/net/ethernet/cadence/Makefile
+++ b/drivers/net/ethernet/cadence/Makefile
@@ -3,3 +3,4 @@
 #
 
 obj-$(CONFIG_MACB) += macb.o
+obj-$(CONFIG_MACB_PCI) += macb_pci.o
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 538544a..c0fb80a 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -404,6 +404,8 @@ static int macb_mii_probe(struct net_device *dev)
phy_irq = gpio_to_irq(pdata->phy_irq_pin);
phydev->irq = (phy_irq < 0) ? PHY_POLL : phy_irq;
}
+   } else {
+   phydev->irq = PHY_POLL;
}
 
/* attach the mac to the phy */
@@ -482,6 +484,9 @@ static int macb_mii_init(struct macb *bp)
goto err_out_unregister_bus;
}
} else {
+   for (i = 0; i < PHY_MAX_ADDR; i++)
+   bp->mii_bus->irq[i] = PHY_POLL;
+
if (pdata)
bp->mii_bus->phy_mask = pdata->phy_mask;
 
@@ -2523,16 +2528,24 @@ static int macb_clk_init(struct platform_device *pdev, struct clk **pclk,
 struct clk **hclk, struct clk **tx_clk,
 struct clk **rx_clk)
 {
+   struct macb_platform_data *pdata;
int err;
 
-   *pclk = devm_clk_get(&pdev->dev, "pclk");
+   pdata = dev_get_platdata(&pdev->dev);
+   if (pdata) {
+   *pclk = pdata->pclk;
+   *hclk = pdata->hclk;
+   } else {
+   *pclk = devm_clk_get(&pdev->dev, "pclk");
+   *hclk = devm_clk_get(&pdev->dev, "hclk");
+   }
+
if (IS_ERR(*pclk)) {
err = PTR_ERR(*pclk);
dev_err(&pdev->dev, "failed to get macb_clk (%u)\n", err);
return err;
}
 
-   *hclk = devm_clk_get(&pdev->dev, "hclk");
if (IS_ERR(*hclk)) {
err = PTR_ERR(*hclk);
dev_err(&pdev->dev, "failed to get hclk (%u)\n", err);
@@ -3107,15 +3120,23 @@ static int at91ether_init(struct platform_device *pdev)
 MODULE_DEVICE_TABLE(of, macb_dt_ids);
 #endif /* CONFIG_OF */
 
+static const struct macb_config default_gem_config = {
+   .caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_JUMBO,
+   .dma_burst_length = 16,
+   .clk_init = macb_clk_init,
+   .init = macb_init,
+   .jumbo_max_len = 10240,
+};
+
static int macb_probe(struct platform_device *pdev)
{
+   const struct macb_config *macb_config = &default_gem_config;
int (*clk_init)(struct platform_device *, struct clk **,
struct clk **, struct clk **,
struct clk **)
- = macb_clk_init;
-   int (*init)(struct platform_device *) = macb_init;
+ = macb_config->clk_init;
+   int (*init)(struct platform_device *) = macb_config->init;
struct device_node *np = pdev->dev.of_node;
struct device_node *phy_node;
-   const struct macb_config *macb_config = NULL;
struct clk *pclk, *hclk = NULL, *tx_clk = NULL, *rx_clk = NULL;
unsigned int queue_mask, num_queues;
struct macb_platform_data *pdata;
diff --git a/drivers/net/ethernet/cadence/macb_pci.c b/drivers/net/ethernet/cadence/macb_pci.c
new file mode 100644
index 000..b440960
--- /dev/null
+++ b/drivers/net/ethernet/cadence/macb_pci.c
@@ -0,0 +1,152 @@
+/**
+ * macb_pci.c - Cadence GEM PCI wrapper.
+ *
+ * Copyright (C) 2016 Cadence Design Systems - http://www.cadence.com
+ *
+ * Authors: Rafal Ozieblo 
+ *

Re: stmmac driver...

2016-12-09 Thread Jie Deng



On 2016/12/8 23:25, David Miller wrote:
> From: Alexandre Torgue 
> Date: Thu, 8 Dec 2016 14:55:04 +0100
>
>> Maybe I forget some series. Do you have others in mind ?
> Please see the thread titled:
>
> "net: ethernet: Initial driver for Synopsys DWC XLGMAC"
>
> which seems to be discussing consolidation of various drivers
> for the same IP core, of which stmmac is one.
>
> I personally am against any change of the driver name and
> things like this, and wish the people doing that work would
> simply contribute to making whatever changes they need directly
> to the stmmac driver.
>
> You really need to voice your opinion when major changes are being
> proposed for the driver you maintain.
>
Hi David and Alex,

XLGMAC is not a version of GMAC. Synopsys has several IPs and each IP has
several versions.

GMAC(QoS): 3.5, 3.7, 4.0, 4.10, 4.20...
XGMAC: 1.00, 1.10, 1.20, 2.00, 2.10, 2.11...
XLGMAC (Synopsys DesignWare Core Enterprise Ethernet): this is a new IP.

Regards,
Jie
