date:20160929

Re: [PATCH v8 0/8] thunderbolt: Introducing Thunderbolt(TM) Networking

2016-09-29 Thread David Miller

From: Greg KH 
Date: Fri, 30 Sep 2016 08:30:05 +0200

> On Fri, Sep 30, 2016 at 01:55:55AM -0400, David Miller wrote:
>> From: Amir Levy 
>> Date: Wed, 28 Sep 2016 17:44:22 +0300
>> 
>> > This driver enables Thunderbolt Networking on non-Apple platforms
>> > running Linux.
>> 
>> Greg, any idea where this should get merged once fully vetted?  I can
>> take it through the net-next tree, but I'm fine with another more
>> appropriate tree taking it as well.
> 
> I am supposed to be taking thunderbolt patches, but if this really is a
> network driver, it should go under drivers/net/ somewhere.  It needs
> more review though, it's not ready to go through anyone's tree just yet :)
> 
> I'll let the thunderbolt maintainer go through it first before asking
> for a netdev review.

Ok, thanks Greg.

Re: [net-next PATCH] i40e: Clean up handling of msg_enable flags and debug parameter

2016-09-29 Thread Stefan Assmann

On 29.09.2016 14:05, Alexander Duyck wrote:
> So the i40e driver had a really convoluted configuration for how to handle
> the debug flags contained in msg_enable.  Part of the issue is that the
> driver has its own 32 bit mask that it was using to track a separate set of
> debug features.  From what I can tell it was trying to use the upper 4 bits
> to determine if the value was meant to represent a bit-mask or the numeric
> value provided by debug level.
> 
> What this patch does is clean this up by compressing those 4 bits into bit
> 31, as a result we just have to perform a check against the value being
> negative to determine if we are looking at a debug level (positive), or a
> debug mask (negative).  The debug level will populate the msg_level, and
> the debug mask will populate the debug_mask in the hardware struct.
> 
> I added similar logic for ethtool.  If the value being provided has bit 31
> set we assume the value being provided is a debug mask, otherwise we assume
> it is a msg_enable mask.  For displaying we only provide the msg_enable,
> and if debug_mask is in use we will print it to the dmesg log.
> 
> Lastly I removed the debugfs interface.  It is redundant with what we
> already have in ethtool and really doesn't belong anyway.
> 
> Signed-off-by: Alexander Duyck 
> ---
> 
> So I am running this through a slightly different process than standard
> because there are some items here that might be objectionable so I want to
> have that ironed out before I deal with the out-of-tree Intel driver.
> 
> I just want to verify if this fix for this will work for net-next or if I
> need to look at taking a different approach.  Once I get the go-no-go, I
> will back-port this into the out-of-tree i40e driver.
> 
>  drivers/net/ethernet/intel/i40e/i40e_debugfs.c |   18 ---
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |7 +-
>  drivers/net/ethernet/intel/i40e/i40e_main.c|   28 
> 
>  3 files changed, 20 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c 
> b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> index 0c1875b..acb0f13 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> @@ -1210,24 +1210,6 @@ static ssize_t i40e_dbg_command_write(struct file 
> *filp,
>   dev_info(&pf->pdev->dev,
>"dump debug fwdata   
> \n");
>   }
> -
> - } else if (strncmp(cmd_buf, "msg_enable", 10) == 0) {
> - u32 level;
> - cnt = sscanf(&cmd_buf[10], "%i", &level);
> - if (cnt) {
> - if (I40E_DEBUG_USER & level) {
> - pf->hw.debug_mask = level;
> - dev_info(&pf->pdev->dev,
> -  "set hw.debug_mask = 0x%08x\n",
> -  pf->hw.debug_mask);
> - }
> - pf->msg_enable = level;
> - dev_info(&pf->pdev->dev, "set msg_enable = 0x%08x\n",
> -  pf->msg_enable);
> - } else {
> - dev_info(&pf->pdev->dev, "msg_enable = 0x%08x\n",
> -  pf->msg_enable);
> - }
>   } else if (strncmp(cmd_buf, "pfr", 3) == 0) {
>   dev_info(&pf->pdev->dev, "debugfs: forcing PFR\n");
>   i40e_do_reset_safe(pf, BIT(__I40E_PF_RESET_REQUESTED));
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
> b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index e79a920..b133a77 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -978,6 +978,10 @@ static u32 i40e_get_msglevel(struct net_device *netdev)
>  {
>   struct i40e_netdev_priv *np = netdev_priv(netdev);
>   struct i40e_pf *pf = np->vsi->back;
> + u32 debug_mask = pf->hw.debug_mask;
> +
> + if (debug_mask)
> + netdev_info(netdev, "i40e debug_mask: 0x%08X\n", debug_mask);
>  
>   return pf->msg_enable;
>  }
> @@ -989,7 +993,8 @@ static void i40e_set_msglevel(struct net_device *netdev, 
> u32 data)
>  
>   if (I40E_DEBUG_USER & data)
>   pf->hw.debug_mask = data;
> - pf->msg_enable = data;
> + else
> + pf->msg_enable = data;
>  }
>  
>  static int i40e_get_regs_len(struct net_device *netdev)
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index b93fa2a..b24c6eb 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -93,8 +93,8 @@ MODULE_DEVICE_TABLE(pci, i40e_pci_tbl);
>  
>  #define I40E_MAX_VF_COUNT 128
>  static int debug = -1;
> -module_param(debug, int, 0);
> -MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
> +module_param(debug, uint, 0);

Re: [PATCH v8 0/8] thunderbolt: Introducing Thunderbolt(TM) Networking

2016-09-29 Thread Greg KH

On Fri, Sep 30, 2016 at 01:55:55AM -0400, David Miller wrote:
> From: Amir Levy 
> Date: Wed, 28 Sep 2016 17:44:22 +0300
> 
> > This driver enables Thunderbolt Networking on non-Apple platforms
> > running Linux.
> 
> Greg, any idea where this should get merged once fully vetted?  I can
> take it through the net-next tree, but I'm fine with another more
> appropriate tree taking it as well.

I am supposed to be taking thunderbolt patches, but if this really is a
network driver, it should go under drivers/net/ somewhere.  It needs
more review though, it's not ready to go through anyone's tree just yet :)

I'll let the thunderbolt maintainer go through it first before asking
for a netdev review.

thanks,

greg k-h

Loan

2016-09-29 Thread Sarah Collins

Do You Need A Loan Of Any Kind ? If Yes Email Now For More Info on 
internationalloanp...@gmail.com

Re: [PATCH v6 0/7] Reduce cache miss for snmp_fold_field

2016-09-29 Thread David Miller

From: Jia He 
Date: Fri, 30 Sep 2016 11:28:57 +0800

> In a PowerPc server with large cpu number(160), besides commit
> a3a773726c9f ("net: Optimize snmp stat aggregation by walking all
> the percpu data at once"), I watched several other snmp_fold_field
> callsites which would cause high cache miss rate.
 ...

Series applied, thank you.

Re: [PATCH v2 net-next] mlx5: Add ndo_poll_controller() implementation

2016-09-29 Thread David Miller

From: Calvin Owens 
Date: Wed, 28 Sep 2016 21:46:39 -0700

> This implements ndo_poll_controller in net_device_ops callbacks for mlx5,
> which is necessary to use netconsole with this driver.
> 
> Acked-By: Saeed Mahameed 
> Signed-off-by: Calvin Owens 
> ---
> Changes in v2:
>   * Only iterate channels to avoid redundant napi_schedule() calls

Applied, thanks.

Re: [PATCH net-next] nfp: bpf: zero extend 4 byte context loads

2016-09-29 Thread David Miller

From: Jakub Kicinski 
Date: Wed, 28 Sep 2016 23:47:37 +0100

> Set upper 32 bits of destination register to zeros after
> load from the context structure.
> 
> Signed-off-by: Jakub Kicinski 

Applied, thanks.

Re: [PATCH net] sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock

2016-09-29 Thread David Miller

From: Xin Long 
Date: Thu, 29 Sep 2016 02:55:44 +0800

> When sctp dumps all the ep->assocs, it needs to lock_sock first,
> but now it locks sock in rcu_read_lock, and lock_sock may sleep,
> which would break rcu_read_lock.
> 
> This patch is to get and hold one sock when traversing the list.
> After that and get out of rcu_read_lock, lock and dump it. Then
> it will traverse the list again to get the next one until all
> sctp socks are dumped.
> 
> For sctp_diag_dump_one, it fixes this issue by holding asoc and
> moving cb() out of rcu_read_lock in sctp_transport_lookup_process.
> 
> Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
> Signed-off-by: Xin Long 

Applied and queued up for -stable, thanks.

Re: [PATCHv3 net 0/3] sctp: a bunch of fixes for prsctp polices

2016-09-29 Thread David Miller

From: Xin Long 
Date: Thu, 29 Sep 2016 02:37:25 +0800

> This patchset is to fix 2 issues for prsctp polices:
> 
>   1. patch 1 and 2 fix "netperf-Throughput_Mbps -37.2% regression" issue
>  when overloading the CPU.
> 
>   2. patch 3 fix "prsctp polices should check both sides' prsctp_capable,
>  instead of only local side".

Series applied, thanks.

Re: [PATCH v1] mlx4: remove unused fields

2016-09-29 Thread David Miller

From: David Decotigny 
Date: Wed, 28 Sep 2016 11:00:04 -0700

> From: David Decotigny 
> 
> This also can address following UBSAN warnings:
> [   36.640343] 
> 
> [   36.648772] UBSAN: Undefined behaviour in 
> drivers/net/ethernet/mellanox/mlx4/fw.c:857:26
> [   36.656853] shift exponent 64 is too large for 32-bit type 'int'
> [   36.663348] 
> 
> [   36.671783] 
> 
> [   36.680213] UBSAN: Undefined behaviour in 
> drivers/net/ethernet/mellanox/mlx4/fw.c:861:27
> [   36.688297] shift exponent 35 is too large for 32-bit type 'int'
> [   36.694702] 
> 
> 
> Tested:
>   reboot with UBSAN, no warning.
> 
> Signed-off-by: David Decotigny 

Applied to net-next, thanks.

Re: [PATCH v8 0/8] thunderbolt: Introducing Thunderbolt(TM) Networking

2016-09-29 Thread David Miller

From: Amir Levy 
Date: Wed, 28 Sep 2016 17:44:22 +0300

> This driver enables Thunderbolt Networking on non-Apple platforms
> running Linux.

Greg, any idea where this should get merged once fully vetted?  I can
take it through the net-next tree, but I'm fine with another more
appropriate tree taking it as well.

Thanks!

Re: [PATCH] ipv6 addrconf: implement RFC7559 router solicitation backoff

2016-09-29 Thread David Miller

From: Maciej Żenczykowski 
Date: Tue, 27 Sep 2016 23:57:58 -0700

> From: Maciej Żenczykowski 
> 
> This implements:
>   https://tools.ietf.org/html/rfc7559
> 
> Backoff is performed according to RFC3315 section 14:
>   https://tools.ietf.org/html/rfc3315#section-14
> 
> We allow setting /proc/sys/net/ipv6/conf/*/router_solicitations
> to a negative value meaning an unlimited number of retransmits,
> and we make this the new default (inline with the RFC).
> 
> We also add a new setting:
>   /proc/sys/net/ipv6/conf/*/router_solicitation_max_interval
> defaulting to 1 hour (per RFC recommendation).
> 
> Signed-off-by: Maciej Żenczykowski 

Applied to net-next, thanks.

Re: [PATCH net-next 0/6] rxrpc: Fixes and adjustments

2016-09-29 Thread David Miller

From: David Howells 
Date: Thu, 29 Sep 2016 23:15:27 +0100

> This set of patches contains some fixes and adjustments:
 ...
> The patches can be found here also:
> 
>   
> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite
> 
> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-rewrite-20160929

Pulled, thanks David.

Re: [PATCH V2 for-next 0/8] Bug Fixes and Code Improvement in HNS driver

2016-09-29 Thread David Miller

From: Salil Mehta 
Date: Thu, 29 Sep 2016 18:09:08 +0100

> This patch-set introduces fix to some Bugs, potential problems
> and code improvements identified during internal review and
> testing of Hisilicon Network Subsystem driver.
> 
> Submit Change
> V1->V2: This addresses the feedbacks provided by David Miller
> and Doug Ledford

So Doug my understanding is if this makes it through review
this is going to be merged into your tree, you prepare a
branch for me, and then I pull from that?

Thanks in advance.

Re: pull-request: wireless-drivers-next 2016-09-29

2016-09-29 Thread David Miller

From: Kalle Valo 
Date: Thu, 29 Sep 2016 19:57:28 +0300

> this should be the last wireless-drivers-next pull request for 4.9, from
> now on only important bugfixes. Nothing really special stands out,
> iwlwifi being most active but other drivers also getting attention. More
> details in the signed tag. Please let me know if there are any problems.

Pulled, thanks Kalle.

> Or actually I had one problem. While doing a test merge I noticed that
> net-next fails to compile for me, but I don't think this is anything
> wireless related:
> 
>   CC  net/netfilter/core.o
> net/netfilter/core.c: In function 'nf_set_hooks_head':
> net/netfilter/core.c:96:149: error: 'struct net_device' has no member named 
> 'nf_hooks_ingress'

Yes, I am aware of this build issue and will tackle it myself if someone
doesn't beat me to it.

Thanks again.

Re: [PATCH net v2] tg3: Avoid NULL pointer dereference in tg3_io_error_detected()

2016-09-29 Thread David Miller

From: "Guilherme G. Piccoli" 
Date: Thu, 29 Sep 2016 13:24:08 -0300

> From: Milton Miller 
> 
> While the driver is probing the adapter, an error may occur before the
> netdev structure is allocated and attached to pci_dev. In this case,
> not only netdev isn't available, but the tg3 private structure is also
> not available as it is just math from the NULL pointer, so dereferences
> must be skipped.
> 
> The following trace is seen when the error is triggered:
 ...
> This patch avoids the NULL pointer dereference by moving the access after
> the netdev NULL pointer check on tg3_io_error_detected(). Also, we add a
> check for netdev being NULL on tg3_io_resume() [suggested by Michael Chan].
> 
> Fixes: 0486a063b1ff ("tg3: prevent ifup/ifdown during PCI error recovery")
> Fixes: dfc8f370316b ("net/tg3: Release IRQs on permanent error")
> Tested-by: Guilherme G. Piccoli 
> Signed-off-by: Milton Miller 
> Signed-off-by: Guilherme G. Piccoli 
> ---
> 
>   * v2 changelog: added netdev NULL check on tg3_io_resume() as per
> Michael Chan's suggestion.

Applied and queued up for -stable, thanks!

Re: [PATCH net-next 00/10] net: dsa: mv88e6xxx: Global (1) cosmetics

2016-09-29 Thread David Miller

From: Vivien Didelot 
Date: Thu, 29 Sep 2016 12:21:52 -0400

> The Global (1) internal SMI device of Marvell switches is a set of
> registers providing support to different units for MAC addresses (ATU),
> VLANs (VTU), PHY polling (PPU), etc.
 ...

Looks like a very nice set of cleanups to me.

Series applied, thanks!

Re: [PATCH RFC net-next] bnx2x: avoid printing unnecessary messages during register dump

2016-09-29 Thread David Miller

From: "Guilherme G. Piccoli" 
Date: Thu, 29 Sep 2016 13:19:39 -0300

> David, thanks for your comment. I confess I didn't understand your
> statement quite well. You say we shouldn't dump registers that will
> cause timeouts, that's it?

Yes, basically.

If this happened infrequently for one or two registers maybe
that would be OK, but this seems to timeout on many registers
especially if the device is up and operational during the
"ethtool -d", and that's a bit much.

Re: [PATCH v2 3/3] net: make net namespace sysctls belong to container's owner

2016-09-29 Thread David Miller

From: Dmitry Torokhov 
Date: Thu, 29 Sep 2016 08:46:05 -0700

> Hi David,
> 
> On Wed, Aug 10, 2016 at 2:36 PM, Dmitry Torokhov
>  wrote:
>> If net namespace is attached to a user namespace let's make container's
>> root owner of sysctls affecting said network namespace instead of global
>> root.
>>
>> This also allows us to clean up net_ctl_permissions() because we do not
>> need to fudge permissions anymore for the container's owner since it now
>> owns the objects in question.
>>
>> Acked-by: "Eric W. Biederman" 
>> Signed-off-by: Dmitry Torokhov 
> 
> I was looking at linux-next today, and I noticed that, when you merged
> my patch, you basically reverted the following commit:
> 
> commit d6e0d306449bcb5fa3c80e7a3edf11d45abf9ae9
> Author: Tyler Hicks 
> Date:   Thu Jun 2 23:43:22 2016 -0500
> 
> net: Use ns_capable_noaudit() when determining net sysctl permissions

Please send me a fixup patch for this, sorry.

[PATCH v6 7/7] net: Suppress the "Comparison to NULL could be written" warnings

2016-09-29 Thread Jia He

This is to suppress the checkpatch.pl warning "Comparison to NULL
could be written". No functional changes here.

Signed-off-by: Jia He 
---
 net/ipv4/proc.c | 32 
 net/sctp/proc.c |  2 +-
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 9d7a39a..b39faf6 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -357,22 +357,22 @@ static void icmp_put(struct seq_file *seq)
atomic_long_t *ptr = net->mib.icmpmsg_statistics->mibs;
 
seq_puts(seq, "\nIcmp: InMsgs InErrors InCsumErrors");
-   for (i = 0; icmpmibmap[i].name != NULL; i++)
+   for (i = 0; icmpmibmap[i].name; i++)
seq_printf(seq, " In%s", icmpmibmap[i].name);
seq_puts(seq, " OutMsgs OutErrors");
-   for (i = 0; icmpmibmap[i].name != NULL; i++)
+   for (i = 0; icmpmibmap[i].name; i++)
seq_printf(seq, " Out%s", icmpmibmap[i].name);
seq_printf(seq, "\nIcmp: %lu %lu %lu",
snmp_fold_field(net->mib.icmp_statistics, ICMP_MIB_INMSGS),
snmp_fold_field(net->mib.icmp_statistics, ICMP_MIB_INERRORS),
snmp_fold_field(net->mib.icmp_statistics, ICMP_MIB_CSUMERRORS));
-   for (i = 0; icmpmibmap[i].name != NULL; i++)
+   for (i = 0; icmpmibmap[i].name; i++)
seq_printf(seq, " %lu",
   atomic_long_read(ptr + icmpmibmap[i].index));
seq_printf(seq, " %lu %lu",
snmp_fold_field(net->mib.icmp_statistics, ICMP_MIB_OUTMSGS),
snmp_fold_field(net->mib.icmp_statistics, ICMP_MIB_OUTERRORS));
-   for (i = 0; icmpmibmap[i].name != NULL; i++)
+   for (i = 0; icmpmibmap[i].name; i++)
seq_printf(seq, " %lu",
   atomic_long_read(ptr + (icmpmibmap[i].index | 
0x100)));
 }
@@ -389,7 +389,7 @@ static int snmp_seq_show_ipstats(struct seq_file *seq, void 
*v)
memset(buff64, 0, IPSTATS_MIB_MAX * sizeof(u64));
 
seq_puts(seq, "Ip: Forwarding DefaultTTL");
-   for (i = 0; snmp4_ipstats_list[i].name != NULL; i++)
+   for (i = 0; snmp4_ipstats_list[i].name; i++)
seq_printf(seq, " %s", snmp4_ipstats_list[i].name);
 
seq_printf(seq, "\nIp: %d %d",
@@ -400,7 +400,7 @@ static int snmp_seq_show_ipstats(struct seq_file *seq, void 
*v)
snmp_get_cpu_field64_batch(buff64, snmp4_ipstats_list,
   net->mib.ip_statistics,
   offsetof(struct ipstats_mib, syncp));
-   for (i = 0; snmp4_ipstats_list[i].name != NULL; i++)
+   for (i = 0; snmp4_ipstats_list[i].name; i++)
seq_printf(seq, " %llu", buff64[i]);
 
return 0;
@@ -415,13 +415,13 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, 
void *v)
memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
 
seq_puts(seq, "\nTcp:");
-   for (i = 0; snmp4_tcp_list[i].name != NULL; i++)
+   for (i = 0; snmp4_tcp_list[i].name; i++)
seq_printf(seq, " %s", snmp4_tcp_list[i].name);
 
seq_puts(seq, "\nTcp:");
snmp_get_cpu_field_batch(buff, snmp4_tcp_list,
 net->mib.tcp_statistics);
-   for (i = 0; snmp4_tcp_list[i].name != NULL; i++) {
+   for (i = 0; snmp4_tcp_list[i].name; i++) {
/* MaxConn field is signed, RFC 2012 */
if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN)
seq_printf(seq, " %ld", buff[i]);
@@ -434,10 +434,10 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, 
void *v)
snmp_get_cpu_field_batch(buff, snmp4_udp_list,
 net->mib.udp_statistics);
seq_puts(seq, "\nUdp:");
-   for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+   for (i = 0; snmp4_udp_list[i].name; i++)
seq_printf(seq, " %s", snmp4_udp_list[i].name);
seq_puts(seq, "\nUdp:");
-   for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+   for (i = 0; snmp4_udp_list[i].name; i++)
seq_printf(seq, " %lu", buff[i]);
 
memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
@@ -446,10 +446,10 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, 
void *v)
seq_puts(seq, "\nUdpLite:");
snmp_get_cpu_field_batch(buff, snmp4_udp_list,
 net->mib.udplite_statistics);
-   for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+   for (i = 0; snmp4_udp_list[i].name; i++)
seq_printf(seq, " %s", snmp4_udp_list[i].name);
seq_puts(seq, "\nUdpLite:");
-   for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+   for (i = 0; snmp4_udp_list[i].name; i++)
seq_printf(seq, " %lu", buff[i]);
 
seq_putc(seq, '\n');
@@ -492,21 +492,21 @@ static int netstat_seq_show(struct seq_file *seq, void *v)
struct net *net = seq->private;
 
seq_puts(seq, "TcpExt:");

[PATCH v6 4/7] proc: Reduce cache miss in sctp_snmp_seq_show

2016-09-29 Thread Jia He

This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
aggregate the data by going through all the items of each cpu sequentially.

Signed-off-by: Jia He 
---
 net/sctp/proc.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index ef8ba77..09e16c2 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -73,13 +73,17 @@ static const struct snmp_mib sctp_snmp_list[] = {
 /* Display sctp snmp mib statistics(/proc/net/sctp/snmp). */
 static int sctp_snmp_seq_show(struct seq_file *seq, void *v)
 {
+   unsigned long buff[SCTP_MIB_MAX];
struct net *net = seq->private;
int i;
 
+   memset(buff, 0, sizeof(unsigned long) * SCTP_MIB_MAX);
+
+   snmp_get_cpu_field_batch(buff, sctp_snmp_list,
+net->sctp.sctp_statistics);
for (i = 0; sctp_snmp_list[i].name != NULL; i++)
seq_printf(seq, "%-32s\t%ld\n", sctp_snmp_list[i].name,
-  snmp_fold_field(net->sctp.sctp_statistics,
- sctp_snmp_list[i].entry));
+   buff[i]);
 
return 0;
 }
-- 
2.5.5

[PATCH v6 1/7] net:snmp: Introduce generic interfaces for snmp_get_cpu_field{,64}

2016-09-29 Thread Jia He

This is to introduce the generic interfaces for snmp_get_cpu_field{,64}.
It exchanges the two for-loops for collecting the percpu statistics data.
This can aggregate the data by going through all the items of each cpu
sequentially.

Signed-off-by: Jia He 
Suggested-by: Marcelo Ricardo Leitner 
---
 include/net/ip.h | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/include/net/ip.h b/include/net/ip.h
index 9742b92..bc43c0f 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -219,6 +219,29 @@ static inline u64 snmp_fold_field64(void __percpu *mib, 
int offt, size_t syncp_o
 }
 #endif
 
+#define snmp_get_cpu_field64_batch(buff64, stats_list, mib_statistic, offset) \
+{ \
+   int i, c; \
+   for_each_possible_cpu(c) { \
+   for (i = 0; stats_list[i].name; i++) \
+   buff64[i] += snmp_get_cpu_field64( \
+   mib_statistic, \
+   c, stats_list[i].entry, \
+   offset); \
+   } \
+}
+
+#define snmp_get_cpu_field_batch(buff, stats_list, mib_statistic) \
+{ \
+   int i, c; \
+   for_each_possible_cpu(c) { \
+   for (i = 0; stats_list[i].name; i++) \
+   buff[i] += snmp_get_cpu_field( \
+   mib_statistic, \
+   c, stats_list[i].entry); \
+   } \
+}
+
 void inet_get_local_port_range(struct net *net, int *low, int *high);
 
 #ifdef CONFIG_SYSCTL
-- 
2.5.5

[PATCH v6 5/7] proc: Reduce cache miss in xfrm_statistics_seq_show

2016-09-29 Thread Jia He

This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
aggregate the data by going through all the items of each cpu sequentially.

Signed-off-by: Jia He 
---
 net/xfrm/xfrm_proc.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_proc.c b/net/xfrm/xfrm_proc.c
index 9c4fbd8..ba2b539 100644
--- a/net/xfrm/xfrm_proc.c
+++ b/net/xfrm/xfrm_proc.c
@@ -50,12 +50,18 @@ static const struct snmp_mib xfrm_mib_list[] = {
 
 static int xfrm_statistics_seq_show(struct seq_file *seq, void *v)
 {
+   unsigned long buff[LINUX_MIB_XFRMMAX];
struct net *net = seq->private;
int i;
+
+   memset(buff, 0, sizeof(unsigned long) * LINUX_MIB_XFRMMAX);
+
+   snmp_get_cpu_field_batch(buff, xfrm_mib_list,
+net->mib.xfrm_statistics);
for (i = 0; xfrm_mib_list[i].name; i++)
seq_printf(seq, "%-24s\t%lu\n", xfrm_mib_list[i].name,
-  snmp_fold_field(net->mib.xfrm_statistics,
-  xfrm_mib_list[i].entry));
+   buff[i]);
+
return 0;
 }
 
-- 
2.5.5

[PATCH v6 2/7] proc: Reduce cache miss in snmp_seq_show

2016-09-29 Thread Jia He

This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
aggregate the data by going through all the items of each cpu sequentially.
Then snmp_seq_show is split into 2 parts to avoid build warning "the frame
size" larger than 1024.

Signed-off-by: Jia He 
---
 net/ipv4/proc.c | 70 ++---
 1 file changed, 47 insertions(+), 23 deletions(-)

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 9f665b6..9d7a39a 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -46,6 +46,8 @@
 #include 
 #include 
 
+#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX)
+
 /*
  * Report socket allocation statistics [m...@utu.fi]
  */
@@ -378,13 +380,15 @@ static void icmp_put(struct seq_file *seq)
 /*
  * Called from the PROCfs module. This outputs /proc/net/snmp.
  */
-static int snmp_seq_show(struct seq_file *seq, void *v)
+static int snmp_seq_show_ipstats(struct seq_file *seq, void *v)
 {
-   int i;
struct net *net = seq->private;
+   u64 buff64[IPSTATS_MIB_MAX];
+   int i;
 
-   seq_puts(seq, "Ip: Forwarding DefaultTTL");
+   memset(buff64, 0, IPSTATS_MIB_MAX * sizeof(u64));
 
+   seq_puts(seq, "Ip: Forwarding DefaultTTL");
for (i = 0; snmp4_ipstats_list[i].name != NULL; i++)
seq_printf(seq, " %s", snmp4_ipstats_list[i].name);
 
@@ -393,57 +397,77 @@ static int snmp_seq_show(struct seq_file *seq, void *v)
   net->ipv4.sysctl_ip_default_ttl);
 
BUILD_BUG_ON(offsetof(struct ipstats_mib, mibs) != 0);
+   snmp_get_cpu_field64_batch(buff64, snmp4_ipstats_list,
+  net->mib.ip_statistics,
+  offsetof(struct ipstats_mib, syncp));
for (i = 0; snmp4_ipstats_list[i].name != NULL; i++)
-   seq_printf(seq, " %llu",
-  snmp_fold_field64(net->mib.ip_statistics,
-snmp4_ipstats_list[i].entry,
-offsetof(struct ipstats_mib, 
syncp)));
+   seq_printf(seq, " %llu", buff64[i]);
 
-   icmp_put(seq);  /* RFC 2011 compatibility */
-   icmpmsg_put(seq);
+   return 0;
+}
+
+static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
+{
+   unsigned long buff[TCPUDP_MIB_MAX];
+   struct net *net = seq->private;
+   int i;
+
+   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
 
seq_puts(seq, "\nTcp:");
for (i = 0; snmp4_tcp_list[i].name != NULL; i++)
seq_printf(seq, " %s", snmp4_tcp_list[i].name);
 
seq_puts(seq, "\nTcp:");
+   snmp_get_cpu_field_batch(buff, snmp4_tcp_list,
+net->mib.tcp_statistics);
for (i = 0; snmp4_tcp_list[i].name != NULL; i++) {
/* MaxConn field is signed, RFC 2012 */
if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN)
-   seq_printf(seq, " %ld",
-  snmp_fold_field(net->mib.tcp_statistics,
-  snmp4_tcp_list[i].entry));
+   seq_printf(seq, " %ld", buff[i]);
else
-   seq_printf(seq, " %lu",
-  snmp_fold_field(net->mib.tcp_statistics,
-  snmp4_tcp_list[i].entry));
+   seq_printf(seq, " %lu", buff[i]);
}
 
+   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+
+   snmp_get_cpu_field_batch(buff, snmp4_udp_list,
+net->mib.udp_statistics);
seq_puts(seq, "\nUdp:");
for (i = 0; snmp4_udp_list[i].name != NULL; i++)
seq_printf(seq, " %s", snmp4_udp_list[i].name);
-
seq_puts(seq, "\nUdp:");
for (i = 0; snmp4_udp_list[i].name != NULL; i++)
-   seq_printf(seq, " %lu",
-  snmp_fold_field(net->mib.udp_statistics,
-  snmp4_udp_list[i].entry));
+   seq_printf(seq, " %lu", buff[i]);
+
+   memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
 
/* the UDP and UDP-Lite MIBs are the same */
seq_puts(seq, "\nUdpLite:");
+   snmp_get_cpu_field_batch(buff, snmp4_udp_list,
+net->mib.udplite_statistics);
for (i = 0; snmp4_udp_list[i].name != NULL; i++)
seq_printf(seq, " %s", snmp4_udp_list[i].name);
-
seq_puts(seq, "\nUdpLite:");
for (i = 0; snmp4_udp_list[i].name != NULL; i++)
-   seq_printf(seq, " %lu",
-  snmp_fold_field(net->mib.udplite_statistics,
-  snmp4_udp_list[i].entry));
+   seq_printf(seq, " %lu", buff[i]);
 
seq_putc(seq, '\n');
return 0;
 }
 
+static int snmp_seq_show(struct seq_file *se

[PATCH v6 3/7] proc: Reduce cache miss in snmp6_seq_show

2016-09-29 Thread Jia He

This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
aggregate the data by going through all the items of each cpu sequentially.

Signed-off-by: Jia He 
---
 net/ipv6/proc.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index 679253d0..cc8e3ae 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -30,6 +30,11 @@
 #include 
 #include 
 
+#define MAX4(a, b, c, d) \
+   max_t(u32, max_t(u32, a, b), max_t(u32, c, d))
+#define SNMP_MIB_MAX MAX4(UDP_MIB_MAX, TCP_MIB_MAX, \
+   IPSTATS_MIB_MAX, ICMP_MIB_MAX)
+
 static int sockstat6_seq_show(struct seq_file *seq, void *v)
 {
struct net *net = seq->private;
@@ -191,25 +196,34 @@ static void snmp6_seq_show_item(struct seq_file *seq, 
void __percpu *pcpumib,
atomic_long_t *smib,
const struct snmp_mib *itemlist)
 {
+   unsigned long buff[SNMP_MIB_MAX];
int i;
-   unsigned long val;
 
-   for (i = 0; itemlist[i].name; i++) {
-   val = pcpumib ?
-   snmp_fold_field(pcpumib, itemlist[i].entry) :
-   atomic_long_read(smib + itemlist[i].entry);
-   seq_printf(seq, "%-32s\t%lu\n", itemlist[i].name, val);
+   if (pcpumib) {
+   memset(buff, 0, sizeof(unsigned long) * SNMP_MIB_MAX);
+
+   snmp_get_cpu_field_batch(buff, itemlist, pcpumib);
+   for (i = 0; itemlist[i].name; i++)
+   seq_printf(seq, "%-32s\t%lu\n",
+  itemlist[i].name, buff[i]);
+   } else {
+   for (i = 0; itemlist[i].name; i++)
+   seq_printf(seq, "%-32s\t%lu\n", itemlist[i].name,
+  atomic_long_read(smib + itemlist[i].entry));
}
 }
 
 static void snmp6_seq_show_item64(struct seq_file *seq, void __percpu *mib,
  const struct snmp_mib *itemlist, size_t 
syncpoff)
 {
+   u64 buff64[SNMP_MIB_MAX];
int i;
 
+   memset(buff64, 0, sizeof(unsigned long) * SNMP_MIB_MAX);
+
+   snmp_get_cpu_field64_batch(buff64, itemlist, mib, syncpoff);
for (i = 0; itemlist[i].name; i++)
-   seq_printf(seq, "%-32s\t%llu\n", itemlist[i].name,
-  snmp_fold_field64(mib, itemlist[i].entry, syncpoff));
+   seq_printf(seq, "%-32s\t%llu\n", itemlist[i].name, buff64[i]);
 }
 
 static int snmp6_seq_show(struct seq_file *seq, void *v)
-- 
2.5.5

[PATCH v6 6/7] ipv6: Remove useless parameter in __snmp6_fill_statsdev

2016-09-29 Thread Jia He

The parameter items(is always ICMP6_MIB_MAX) is useless for 
__snmp6_fill_statsdev

Signed-off-by: Jia He 
---
 net/ipv6/addrconf.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2f1f5d4..35d4baa 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4961,18 +4961,18 @@ static inline size_t inet6_if_nlmsg_size(void)
 }
 
 static inline void __snmp6_fill_statsdev(u64 *stats, atomic_long_t *mib,
- int items, int bytes)
+   int bytes)
 {
int i;
-   int pad = bytes - sizeof(u64) * items;
+   int pad = bytes - sizeof(u64) * ICMP6_MIB_MAX;
BUG_ON(pad < 0);
 
/* Use put_unaligned() because stats may not be aligned for u64. */
-   put_unaligned(items, &stats[0]);
-   for (i = 1; i < items; i++)
+   put_unaligned(ICMP6_MIB_MAX, &stats[0]);
+   for (i = 1; i < ICMP6_MIB_MAX; i++)
put_unaligned(atomic_long_read(&mib[i]), &stats[i]);
 
-   memset(&stats[items], 0, pad);
+   memset(&stats[ICMP6_MIB_MAX], 0, pad);
 }
 
 static inline void __snmp6_fill_stats64(u64 *stats, void __percpu *mib,
@@ -5005,7 +5005,7 @@ static void snmp6_fill_stats(u64 *stats, struct inet6_dev 
*idev, int attrtype,
 offsetof(struct ipstats_mib, syncp));
break;
case IFLA_INET6_ICMP6STATS:
-   __snmp6_fill_statsdev(stats, idev->stats.icmpv6dev->mibs, 
ICMP6_MIB_MAX, bytes);
+   __snmp6_fill_statsdev(stats, idev->stats.icmpv6dev->mibs, 
bytes);
break;
}
 }
-- 
2.5.5

[PATCH v6 0/7] Reduce cache miss for snmp_fold_field

2016-09-29 Thread Jia He

In a PowerPc server with large cpu number(160), besides commit
a3a773726c9f ("net: Optimize snmp stat aggregation by walking all
the percpu data at once"), I watched several other snmp_fold_field
callsites which would cause high cache miss rate.

test source code:

My simple test case, which read from the procfs items endlessly:
/***/
#include 
#include 
#include 
#include 
#include 
#define LINELEN  2560
int main(int argc, char **argv)
{
int i;
int fd = -1 ;
int rdsize = 0;
char buf[LINELEN+1];

buf[LINELEN] = 0;
memset(buf,0,LINELEN);

if(1 >= argc) {
printf("file name empty\n");
return -1;
}

fd = open(argv[1], O_RDWR, 0644);
if(0 > fd){
printf("open error\n");
return -2;
}

for(i=0;i<0x;i++) {
while(0 < (rdsize = read(fd,buf,LINELEN))){
//nothing here
}

lseek(fd, 0, SEEK_SET);
}

close(fd);
return 0;
}
/**/

compile and run:

gcc test.c -o test

perf stat -d -e cache-misses ./test /proc/net/snmp
perf stat -d -e cache-misses ./test /proc/net/snmp6
perf stat -d -e cache-misses ./test /proc/net/sctp/snmp
perf stat -d -e cache-misses ./test /proc/net/xfrm_stat

before the patch set:

 Performance counter stats for 'system wide':

 355911097  cache-misses
 [40.08%]
2356829300  L1-dcache-loads 
 [60.04%]
 355642645  L1-dcache-load-misses #   15.09% of all L1-dcache 
hits   [60.02%]
 346544541  LLC-loads   
 [59.97%]
389763  LLC-load-misses   #0.11% of all LL-cache 
hits[40.02%]

   6.245162638 seconds time elapsed

After the patch set:
===
 Performance counter stats for 'system wide':

 194992476  cache-misses
 [40.03%]
6718051877  L1-dcache-loads 
 [60.07%]
 194871921  L1-dcache-load-misses #2.90% of all L1-dcache 
hits   [60.11%]
 187632232  LLC-loads   
 [60.04%]
464466  LLC-load-misses   #0.25% of all LL-cache 
hits[39.89%]

   6.868422769 seconds time elapsed
The cache-miss rate can be reduced from 15% to 2.9%

changelog
=
v6:
- correct v5 
v5:
- order local variables from longest to shortest line
v4:
- move memset into one block of if statement in snmp6_seq_show_item
- remove the changes in netstat_seq_show considerred the stack usage is too 
large
v3:
- introduce generic interface (suggested by Marcelo Ricardo Leitner)
- use max_t instead of self defined macro (suggested by David Miller)
v2:
- fix bug in udplite statistics.
- snmp_seq_show is split into 2 parts

Jia He (7):
  net:snmp: Introduce generic interfaces for snmp_get_cpu_field{,64}
  proc: Reduce cache miss in snmp_seq_show
  proc: Reduce cache miss in snmp6_seq_show
  proc: Reduce cache miss in sctp_snmp_seq_show
  proc: Reduce cache miss in xfrm_statistics_seq_show
  ipv6: Remove useless parameter in __snmp6_fill_statsdev
  net: Suppress the "Comparison to NULL could be written" warnings

 include/net/ip.h |  23 
 net/ipv4/proc.c  | 102 +++
 net/ipv6/addrconf.c  |  12 +++---
 net/ipv6/proc.c  |  30 +++
 net/sctp/proc.c  |  10 +++--
 net/xfrm/xfrm_proc.c |  10 -
 6 files changed, 129 insertions(+), 58 deletions(-)

-- 
2.5.5

Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2

2016-09-29 Thread R. Parameswaran

Hi James,

On Thu, 29 Sep 2016, James Chapman wrote:

> On 22/09/16 21:52, R. Parameswaran wrote:
> > From ed585bdd6d3d2b3dec58d414f514cd764d89159d Mon Sep 17 00:00:00 2001
> > From: "R. Parameswaran" 
> > Date: Thu, 22 Sep 2016 13:19:25 -0700
> > Subject: [PATCH] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
> >
> > Take into account all of the tunnel encapsulation headers when setting
> > up the MTU on the L2TP logical interface device. Otherwise, packets
> > created by the applications on top of the L2TP layer are larger
> > than they ought to be, relative to the underlay MTU, leading to
> > needless fragmentation once the outer IP encap is added.
> >
> > Specifically, take into account the (outer, underlay) IP header
> > imposed on the encapsulated L2TP packet, and the Layer 2 header
> > imposed on the inner IP packet prior to L2TP encapsulation.
> >
> > Do not assume an Ethernet (non-jumbo) underlay. Use the PMTU mechanism
> > and the dst entry in the L2TP tunnel socket to directly pull up
> > the underlay MTU (as the baseline number on top of which the
> > encapsulation headers are factored in).  Fall back to Ethernet MTU
> > if this fails.
> >
> > Signed-off-by: R. Parameswaran 
> >
> > Reviewed-by: "N. Prachanda" ,
> > Reviewed-by: "R. Shearman" ,
> > Reviewed-by: "D. Fawcus" 
> > ---
> >  net/l2tp/l2tp_eth.c | 48 
> >  1 file changed, 44 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
> > index 57fc5a4..dbcd6bd 100644
> > --- a/net/l2tp/l2tp_eth.c
> > +++ b/net/l2tp/l2tp_eth.c
> > @@ -30,6 +30,9 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> > +#include 
> >  
> >  #include "l2tp_core.h"
> >  
> > @@ -206,6 +209,46 @@ static void l2tp_eth_show(struct seq_file *m, void 
> > *arg)
> >  }
> >  #endif
> >  
> > +static void l2tp_eth_adjust_mtu(struct l2tp_tunnel *tunnel,
> > +   struct l2tp_session *session,
> > +   struct net_device *dev)
> > +{
> > +   unsigned int overhead = 0;
> > +   struct dst_entry *dst;
> > +
> > +   if (session->mtu != 0) {
> > +   dev->mtu = session->mtu;
> > +   dev->needed_headroom += session->hdr_len;
> > +   if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
> > +   dev->needed_headroom += sizeof(struct udphdr);
> > +   return;
> > +   }
> > +   overhead = session->hdr_len;
> > +   /* Adjust MTU, factor overhead - underlay L3 hdr, overlay L2 hdr*/
> > +   if (tunnel->sock->sk_family == AF_INET)
> > +   overhead += (ETH_HLEN + sizeof(struct iphdr));
> > +   else if (tunnel->sock->sk_family == AF_INET6)
> > +   overhead += (ETH_HLEN + sizeof(struct ipv6hdr));
> What about options in the IP header? If certain options are set on the
> socket, the IP header may be larger.
> 

Thanks for the reply - It looks like IP options can only be 
enabled through setsockopt on an application's socket (if there's any 
other way to turn on IP options, please let me know - didn't see any 
sysctl setting for transmit). This scenario would come 
into picture when an application opens a raw IP or UDP socket such that it 
routes into the L2TP logical interface.

If you take the case of a plain IP (ethernet) interface, even if an
application opened a socket turning on IP options, it would not change
the MTU of the underlying interface, and it would not affect other 
applications transacting packets on the same interface. I know its not an 
exact parallel to this case, but since the IP option control is per 
application, we probably should not factor it into the L2TP logical interface?
We cannot affect other applications/processes running on the same L2TP 
tunnel. Also, since the application  using IP options knows that it has turned 
on IP options, maybe we can count on it to factor the size of the options 
into the size of the payload it sends into the socket, or set the mtu on the 
L2TP interface through config? 

Other than this, I don't see keepalives or anything else in which the 
kernel will source its own packet into the L2TP interface, outside of 
an application injected packet - if there is something like that, please
let me know. The user space L2TP daemon would probably fall in the 
category of applications.

thanks,

Ramkumar 

> > +   /* Additionally, if the encap is UDP, account for UDP header size */
> > +   if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
> > +   overhead += sizeof(struct udphdr);
> > +   /* If PMTU discovery was enabled, use discovered MTU on L2TP device */
> > +   dst = sk_dst_get(tunnel->sock);
> > +   if (dst) {
> > +   u32 pmtu = dst_mtu(dst);
> > +
> > +   if (pmtu != 0)
> > +   dev->mtu = pmtu;
> > +   dst_release(dst);
> > +   }
> > +   /* else (no PMTUD) L2TP dev MTU defaulted to Ethernet MTU in caller */
> > +   session->mtu = dev->mtu - overhead;
> > +   dev->mtu = session->mtu;
> >

Re: [PATCH net-next v3] netfilter: xt_hashlimit: Fix link error in 32bit arch because of 64bit division

2016-09-29 Thread Maciej Żenczykowski

> +   return div64_u64((user * HZ * CREDITS_PER_JIFFY_v1),
> +XT_HASHLIMIT_SCALE);

spurious ()

> +   return div64_u64((user * HZ * CREDITS_PER_JIFFY),
> +XT_HASHLIMIT_SCALE_v2);

spurious ()

but:

Acked-by: Maciej Żenczykowski

Re: How to submit potential patch in linux kernel

2016-09-29 Thread Shyam Saini

On Thu, 2016-09-29 at 13:15 +0300, Daniel Baluta wrote:
> Hi Shyam,
> 
> On Thu, Sep 29, 2016 at 1:02 PM, Andrew Lunn  wrote:
> > 
> > On Thu, Sep 29, 2016 at 04:32:29AM +0530, Shyam Saini wrote:
> > > 
> > > Hi everyone,
> > > 
> > > I'm Shyam, final year undergraduate student. I wanted to know how
> > > one can
> > > submit potential linux kernel patch in networking subsystem.
> > Documentation/SubmittingPatches is a good starting point.
> > 
> > Then post a patch and we will help you learn the rest of the
> > process.
> 
> You can have a look at https://kernelnewbies.org/FirstKernelPatch.
> There is also this presentation from Greg:
> https://www.youtube.com/watch?v=LLBrBBImJt4
> 
> thanks,
> Daniel.


Thanks for your reply. But I already know how to submit patches. I've
submitted some trivial kernel patches. But Now, I want to submit some
potential or good patches. 
So, I wanted to know how can I move to that direction? 

Thanks, 
Shyam

Re: UDP wierdness around skb_copy_and_csum_datagram_msg()

2016-09-29 Thread Eric Dumazet

On Fri, 2016-09-30 at 01:28 +0200, Christian Lamparter wrote:
> On Wednesday, September 28, 2016 7:20:39 PM CEST Jay Smith wrote:
> > Actually, on a little more searching of this list's archives, I think
> > that this discussion:  https://patchwork.kernel.org/patch/9260733/ is
> > about exactly the same issue I've found, except from the TCP side. I'm
> > cc'ing a few of the participants from that discussion.
> > 
> > So is the patch proposed there (copying and restoring the entire
> > iov_iter in skb_copy_and_csum_datagram_msg()) being considered as a
> > fix?
> 
> From Alan's post:
> 
> "My ugly patch fixes this in the most obvious way: make a local copy of
> msg->msg_iter before the call to skb_copy_and_csum_datagram(), and copy
> it back if the checksum is bad, just before goto csum_error;"
> 
> IMHO this meant that the patch is a proof of concept for his problem.
> 
> > If not, would an alternate one that concealed the save-and-restore logic
> > inside iov_iter.c be more acceptable? I'd be happy to produce whatever's
> > needed, or yield to someone with stronger feelings about where it should
> > go...
> Al Viro identified more inconsistencies within the error-paths that deal
> with EFAULT in the whole area (in and around skb_copy_and_csum_datagram()).
> 
> As far as I can tell the original discussion about the data corruption
> issue went off on a tangent and it is stuck in figuring out "How to handle
> the errors in tcp_copy_to_iovec()".
> 
> As for fixing the issue: I'm happy to test and review patches. 
> The trouble is that nobody seem to be able to produce them...
> 

This is doable with a bit of fault injection I believe.

And "ethtool -K eth0 rx off gro off lro off"  to let the TCP receiver
compute the checksum itself.

Re: UDP wierdness around skb_copy_and_csum_datagram_msg()

2016-09-29 Thread Christian Lamparter

On Wednesday, September 28, 2016 7:20:39 PM CEST Jay Smith wrote:
> Actually, on a little more searching of this list's archives, I think
> that this discussion:  https://patchwork.kernel.org/patch/9260733/ is
> about exactly the same issue I've found, except from the TCP side. I'm
> cc'ing a few of the participants from that discussion.
> 
> So is the patch proposed there (copying and restoring the entire
> iov_iter in skb_copy_and_csum_datagram_msg()) being considered as a
> fix?

>From Alan's post:

"My ugly patch fixes this in the most obvious way: make a local copy of
msg->msg_iter before the call to skb_copy_and_csum_datagram(), and copy
it back if the checksum is bad, just before goto csum_error;"

IMHO this meant that the patch is a proof of concept for his problem.

> If not, would an alternate one that concealed the save-and-restore logic
> inside iov_iter.c be more acceptable? I'd be happy to produce whatever's
> needed, or yield to someone with stronger feelings about where it should
> go...
Al Viro identified more inconsistencies within the error-paths that deal
with EFAULT in the whole area (in and around skb_copy_and_csum_datagram()).

As far as I can tell the original discussion about the data corruption
issue went off on a tangent and it is stuck in figuring out "How to handle
the errors in tcp_copy_to_iovec()".

As for fixing the issue: I'm happy to test and review patches. 
The trouble is that nobody seem to be able to produce them...

Regards,
Christian

Re: [PATCH net-next 2/2] openvswitch: remove skb_mpls_header

2016-09-29 Thread pravin shelar

On Thu, Sep 29, 2016 at 12:19 PM, Jiri Benc  wrote:
> skb_mpls_header is equivalent to skb_network_header now. There's no reason
> to keep it.
>
> Signed-off-by: Jiri Benc 
> ---
>  include/net/mpls.h| 11 ---
>  net/openvswitch/actions.c | 10 +-
>  2 files changed, 5 insertions(+), 16 deletions(-)
>
> diff --git a/include/net/mpls.h b/include/net/mpls.h
> index 5b3b5addfb08..fde22d0b0ec1 100644
> --- a/include/net/mpls.h
> +++ b/include/net/mpls.h
> @@ -25,15 +25,4 @@ static inline bool eth_p_mpls(__be16 eth_type)
> eth_type == htons(ETH_P_MPLS_MC);
>  }
>
> -/*
> - * For non-MPLS skbs this will correspond to the network header.
> - * For MPLS skbs it will be before the network_header as the MPLS
> - * label stack lies between the end of the mac header and the network
> - * header. That is, for MPLS skbs the end of the mac header
> - * is the top of the MPLS label stack.
> - */
> -static inline unsigned char *skb_mpls_header(struct sk_buff *skb)
> -{
> -   return skb_mac_header(skb) + skb->mac_len;
> -}

I think we should keep this API, so that it is clear that MPLS header
mapped skb network header.

Re: [PATCH net-next 1/2] openvswitch: mpls: set network header correctly on key extract

2016-09-29 Thread pravin shelar

On Thu, Sep 29, 2016 at 12:19 PM, Jiri Benc  wrote:
> After the 48d2ab609b6b ("net: mpls: Fixups for GSO"), MPLS handling in
> openvswitch was changed to have network header pointing to the start of the
> MPLS headers and inner_network_header pointing after the MPLS headers.
>
> However, key_extract was missed by the mentioned commit, causing incorrect
> headers to be set when a MPLS packet just enters the bridge or after it is
> recirculated.
>
> Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
> Signed-off-by: Jiri Benc 

Acked-by: Pravin B Shelar

Re: next-20160929 build: 2 failures 4 warnings (next-20160929)

2016-09-29 Thread Arnd Bergmann

On Thursday 29 September 2016, Vishwanath Pai wrote:
> I have sent a patch for this a couple of days ago to netdev, it hasn't
> made it to net-next yet. Here's the latest one:
> 
> [PATCH net-next v3] netfilter: xt_hashlimit: Fix link error in 32bit
> arch because of 64bit division
> 
> This should fix the link error.

I also did a patch (not submitted yet), but my solution used 32-bit
math for the version 1 case. I think that would be better so we
don't slow down 32-bit architectures too much (div_u64_u64
is very slow).

Arnd

[PATCH net-next 1/6] rxrpc: Fix exclusive client connections

2016-09-29 Thread David Howells

Exclusive connections are currently reusable (which they shouldn't be)
because rxrpc_alloc_client_connection() checks the exclusive flag in the
rxrpc_connection struct before it's initialised from the function
parameters.  This means that the DONT_REUSE flag doesn't get set.

Fix this by checking the function parameters for the exclusive flag.

Signed-off-by: David Howells 
---

 net/rxrpc/conn_client.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index c76a125df891..f5ee8bfa5bef 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -200,7 +200,7 @@ rxrpc_alloc_client_connection(struct rxrpc_conn_parameters 
*cp, gfp_t gfp)
}
 
atomic_set(&conn->usage, 1);
-   if (conn->params.exclusive)
+   if (cp->exclusive)
__set_bit(RXRPC_CONN_DONT_REUSE, &conn->flags);
 
conn->params= *cp;

[PATCH net-next 2/6] rxrpc: Make Tx loss-injection go through normal return and adjust tracing

2016-09-29 Thread David Howells

In rxrpc_send_data_packet() make the loss-injection path return through the
same code as the transmission path so that the RTT determination is
initiated and any future timer shuffling will be done, despite the packet
having been binned.

Whilst we're at it:

 (1) Add to the tx_data tracepoint an indication of whether or not we're
 retransmitting a data packet.

 (2) When we're deciding whether or not to request an ACK, rather than
 checking if we're in fast-retransmit mode check instead if we're
 retransmitting.

 (3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're
 not altering the sk_buff refcount nor are we just seeing it after
 getting it off the Tx list.

 (4) The rxrpc_skb_tx_lost note is then no longer used so remove it.

 (5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost.

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |6 --
 net/rxrpc/ar-internal.h  |3 +--
 net/rxrpc/call_event.c   |2 +-
 net/rxrpc/misc.c |1 -
 net/rxrpc/output.c   |   17 +
 net/rxrpc/sendmsg.c  |2 +-
 net/rxrpc/skbuff.c   |   11 +++
 7 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index ada12d00118c..8ba8d76e856a 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -258,15 +258,16 @@ TRACE_EVENT(rxrpc_rx_ack,
 
 TRACE_EVENT(rxrpc_tx_data,
TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq,
-rxrpc_serial_t serial, u8 flags, bool lose),
+rxrpc_serial_t serial, u8 flags, bool retrans, bool lose),
 
-   TP_ARGS(call, seq, serial, flags, lose),
+   TP_ARGS(call, seq, serial, flags, retrans, lose),
 
TP_STRUCT__entry(
__field(struct rxrpc_call *,call)
__field(rxrpc_seq_t,seq )
__field(rxrpc_serial_t, serial  )
__field(u8, flags   )
+   __field(bool,   retrans )
__field(bool,   lose)
 ),
 
@@ -275,6 +276,7 @@ TRACE_EVENT(rxrpc_tx_data,
__entry->seq = seq;
__entry->serial = serial;
__entry->flags = flags;
+   __entry->retrans = retrans;
__entry->lose = lose;
   ),
 
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index ca96e547cb9a..6aadaa7d8b43 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -603,7 +603,6 @@ enum rxrpc_skb_trace {
rxrpc_skb_tx_cleaned,
rxrpc_skb_tx_freed,
rxrpc_skb_tx_got,
-   rxrpc_skb_tx_lost,
rxrpc_skb_tx_new,
rxrpc_skb_tx_rotated,
rxrpc_skb_tx_seen,
@@ -1073,7 +1072,7 @@ extern const s8 rxrpc_ack_priority[];
  * output.c
  */
 int rxrpc_send_call_packet(struct rxrpc_call *, u8);
-int rxrpc_send_data_packet(struct rxrpc_call *, struct sk_buff *);
+int rxrpc_send_data_packet(struct rxrpc_call *, struct sk_buff *, bool);
 void rxrpc_reject_packets(struct rxrpc_local *);
 
 /*
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 0e8478012212..1f6c7633b964 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -256,7 +256,7 @@ static void rxrpc_resend(struct rxrpc_call *call)
rxrpc_get_skb(skb, rxrpc_skb_tx_got);
spin_unlock_bh(&call->lock);
 
-   if (rxrpc_send_data_packet(call, skb) < 0) {
+   if (rxrpc_send_data_packet(call, skb, true) < 0) {
rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
return;
}
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index aedb8978226d..47dddacdbb91 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -108,7 +108,6 @@ const char rxrpc_skb_traces[rxrpc_skb__nr_trace][7] = {
[rxrpc_skb_tx_cleaned]  = "Tx CLN",
[rxrpc_skb_tx_freed]= "Tx FRE",
[rxrpc_skb_tx_got]  = "Tx GOT",
-   [rxrpc_skb_tx_lost] = "Tx *L*",
[rxrpc_skb_tx_new]  = "Tx NEW",
[rxrpc_skb_tx_rotated]  = "Tx ROT",
[rxrpc_skb_tx_seen] = "Tx SEE",
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index cf43a715685e..ac9a58b619a6 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -238,7 +238,8 @@ out:
 /*
  * send a packet through the transport endpoint
  */
-int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb)
+int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb,
+  bool retrans)
 {
struct rxrpc_connection *conn = call->conn;

[PATCH net-next 3/6] rxrpc: When activating client conn channels, do state check inside lock

2016-09-29 Thread David Howells

In rxrpc_activate_channels(), the connection cache state is checked outside
of the lock, which means it can change whilst we're waking calls up,
thereby changing whether or not we're allowed to wake calls up.

Fix this by moving the check inside the locked region.  The check to see if
all the channels are currently busy can stay outside of the locked region.

Whilst we're at it:

 (1) Split the locked section out into its own function so that we can call
 it from other places in a later patch.

 (2) Determine the mask of channels dependent on the state as we're going
 to add another state in a later patch that will restrict the number of
 simultaneous calls to 1 on a connection.

Signed-off-by: David Howells 
---

 net/rxrpc/conn_client.c |   36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index f5ee8bfa5bef..60ef9605167e 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -576,28 +576,42 @@ static void rxrpc_activate_one_channel(struct 
rxrpc_connection *conn,
 }
 
 /*
+ * Assign channels and callNumbers to waiting calls with channel_lock
+ * held by caller.
+ */
+static void rxrpc_activate_channels_locked(struct rxrpc_connection *conn)
+{
+   u8 avail, mask;
+
+   switch (conn->cache_state) {
+   case RXRPC_CONN_CLIENT_ACTIVE:
+   mask = RXRPC_ACTIVE_CHANS_MASK;
+   break;
+   default:
+   return;
+   }
+
+   while (!list_empty(&conn->waiting_calls) &&
+  (avail = ~conn->active_chans,
+   avail &= mask,
+   avail != 0))
+   rxrpc_activate_one_channel(conn, __ffs(avail));
+}
+
+/*
  * Assign channels and callNumbers to waiting calls.
  */
 static void rxrpc_activate_channels(struct rxrpc_connection *conn)
 {
-   unsigned char mask;
-
_enter("%d", conn->debug_id);
 
trace_rxrpc_client(conn, -1, rxrpc_client_activate_chans);
 
-   if (conn->cache_state != RXRPC_CONN_CLIENT_ACTIVE ||
-   conn->active_chans == RXRPC_ACTIVE_CHANS_MASK)
+   if (conn->active_chans == RXRPC_ACTIVE_CHANS_MASK)
return;
 
spin_lock(&conn->channel_lock);
-
-   while (!list_empty(&conn->waiting_calls) &&
-  (mask = ~conn->active_chans,
-   mask &= RXRPC_ACTIVE_CHANS_MASK,
-   mask != 0))
-   rxrpc_activate_one_channel(conn, __ffs(mask));
-
+   rxrpc_activate_channels_locked(conn);
spin_unlock(&conn->channel_lock);
_leave("");
 }

[PATCH net-next 6/6] rxrpc: Note serial number being ACK'd in the congestion management trace

2016-09-29 Thread David Howells

Note the serial number of the packet being ACK'd in the congestion
management trace rather than the serial number of the ACK packet.  Whilst
the serial number of the ACK packet is useful for matching ACK packet in
the output of wireshark, the serial number that the ACK is in response to
is of more use in working out how different trace lines relate.

Signed-off-by: David Howells 
---

 net/rxrpc/input.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 094720dd1eaf..1461d30583c9 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -41,10 +41,10 @@ static void rxrpc_proto_abort(const char *why,
  */
 static void rxrpc_congestion_management(struct rxrpc_call *call,
struct sk_buff *skb,
-   struct rxrpc_ack_summary *summary)
+   struct rxrpc_ack_summary *summary,
+   rxrpc_serial_t acked_serial)
 {
enum rxrpc_congest_change change = rxrpc_cong_no_change;
-   struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
unsigned int cumulative_acks = call->cong_cumul_acks;
unsigned int cwnd = call->cong_cwnd;
bool resend = false;
@@ -172,7 +172,7 @@ out_no_clear_ca:
cwnd = RXRPC_RXTX_BUFF_SIZE - 1;
call->cong_cwnd = cwnd;
call->cong_cumul_acks = cumulative_acks;
-   trace_rxrpc_congest(call, summary, sp->hdr.serial, change);
+   trace_rxrpc_congest(call, summary, acked_serial, change);
if (resend && !test_and_set_bit(RXRPC_CALL_EV_RESEND, &call->events))
rxrpc_queue_call(call);
return;
@@ -848,7 +848,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct 
sk_buff *skb,
  false, true,
  rxrpc_propose_ack_ping_for_lost_reply);
 
-   return rxrpc_congestion_management(call, skb, &summary);
+   return rxrpc_congestion_management(call, skb, &summary, acked_serial);
 }
 
 /*

[PATCH net-next 5/6] rxrpc: Request more ACKs in slow-start mode

2016-09-29 Thread David Howells

Set the request-ACK on more DATA packets whilst we're in slow start mode so
that we get sufficient ACKs back to supply information to configure the
window.

Signed-off-by: David Howells 
---

 net/rxrpc/output.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index ac9a58b619a6..0d47db886f6e 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -284,6 +284,7 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct 
sk_buff *skb,
 * ACKs if a DATA packet appears to have been lost.
 */
if (retrans ||
+   call->cong_mode == RXRPC_CALL_SLOW_START ||
(call->peer->rtt_usage < 3 && sp->hdr.seq & 1) ||
ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000),
 ktime_get_real()))

[PATCH net-next 4/6] rxrpc: Reduce the rxrpc_local::services list to a pointer

2016-09-29 Thread David Howells

Reduce the rxrpc_local::services list to just a pointer as we don't permit
multiple service endpoints to bind to a single transport endpoints (this is
excluded by rxrpc_lookup_local()).

The reason we don't allow this is that if you send a request to an AFS
filesystem service, it will try to talk back to your cache manager on the
port you sent from (this is how file change notifications are handled).  To
prevent someone from stealing your CM callbacks, we don't let AF_RXRPC
sockets share a UDP socket if at least one of them has a service bound.

Signed-off-by: David Howells 
---

 net/rxrpc/af_rxrpc.c |   21 -
 net/rxrpc/ar-internal.h  |3 +--
 net/rxrpc/call_accept.c  |8 
 net/rxrpc/local_object.c |3 +--
 net/rxrpc/security.c |8 
 5 files changed, 18 insertions(+), 25 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 8dbf7bed2cc4..44c9c2b0b190 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -136,7 +136,8 @@ static int rxrpc_bind(struct socket *sock, struct sockaddr 
*saddr, int len)
struct sockaddr_rxrpc *srx = (struct sockaddr_rxrpc *)saddr;
struct sock *sk = sock->sk;
struct rxrpc_local *local;
-   struct rxrpc_sock *rx = rxrpc_sk(sk), *prx;
+   struct rxrpc_sock *rx = rxrpc_sk(sk);
+   u16 service_id = srx->srx_service;
int ret;
 
_enter("%p,%p,%d", rx, saddr, len);
@@ -160,15 +161,12 @@ static int rxrpc_bind(struct socket *sock, struct 
sockaddr *saddr, int len)
goto error_unlock;
}
 
-   if (rx->srx.srx_service) {
+   if (service_id) {
write_lock(&local->services_lock);
-   hlist_for_each_entry(prx, &local->services, listen_link) {
-   if (prx->srx.srx_service == rx->srx.srx_service)
-   goto service_in_use;
-   }
-
+   if (rcu_access_pointer(local->service))
+   goto service_in_use;
rx->local = local;
-   hlist_add_head_rcu(&rx->listen_link, &local->services);
+   rcu_assign_pointer(local->service, rx);
write_unlock(&local->services_lock);
 
rx->sk.sk_state = RXRPC_SERVER_BOUND;
@@ -599,7 +597,6 @@ static int rxrpc_create(struct net *net, struct socket 
*sock, int protocol,
rx->family = protocol;
rx->calls = RB_ROOT;
 
-   INIT_HLIST_NODE(&rx->listen_link);
spin_lock_init(&rx->incoming_lock);
INIT_LIST_HEAD(&rx->sock_calls);
INIT_LIST_HEAD(&rx->to_be_accepted);
@@ -681,11 +678,9 @@ static int rxrpc_release_sock(struct sock *sk)
sk->sk_state = RXRPC_CLOSE;
spin_unlock_bh(&sk->sk_receive_queue.lock);
 
-   ASSERTCMP(rx->listen_link.next, !=, LIST_POISON1);
-
-   if (!hlist_unhashed(&rx->listen_link)) {
+   if (rx->local && rx->local->service == rx) {
write_lock(&rx->local->services_lock);
-   hlist_del_rcu(&rx->listen_link);
+   rx->local->service = NULL;
write_unlock(&rx->local->services_lock);
}
 
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 6aadaa7d8b43..539db54697f9 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -93,7 +93,6 @@ struct rxrpc_sock {
rxrpc_notify_new_call_t notify_new_call; /* Func to notify of new call 
*/
rxrpc_discard_new_call_t discard_new_call; /* Func to discard a new 
call */
struct rxrpc_local  *local; /* local endpoint */
-   struct hlist_node   listen_link;/* link in the local endpoint's 
listen list */
struct rxrpc_backlog*backlog;   /* Preallocation for services */
spinlock_t  incoming_lock;  /* Incoming call vs service 
shutdown lock */
struct list_headsock_calls; /* List of calls owned by this 
socket */
@@ -216,7 +215,7 @@ struct rxrpc_local {
struct list_headlink;
struct socket   *socket;/* my UDP socket */
struct work_struct  processor;
-   struct hlist_head   services;   /* services listening on this 
endpoint */
+   struct rxrpc_sock __rcu *service;   /* Service(s) listening on this 
endpoint */
struct rw_semaphore defrag_sem; /* control re-enablement of IP 
DF bit */
struct sk_buff_head reject_queue;   /* packets awaiting rejection */
struct sk_buff_head event_queue;/* endpoint event packets 
awaiting processing */
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index a8d39d7cf42c..3cac231d8405 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -331,14 +331,14 @@ struct rxrpc_call *rxrpc_new_incoming_call(struct 
rxrpc_local *local,
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
struct rxrpc_sock *rx;
struct rxrpc_call *call;
+   u16 service

[PATCH net-next 0/6] rxrpc: Fixes and adjustments

2016-09-29 Thread David Howells


This set of patches contains some fixes and adjustments:

 (1) Connections for exclusive calls are being reused because the check to
 work out whether to set RXRPC_CONN_DONT_REUSE is checking where the
 parameters will be copied to (but haven't yet).

 (2) Make Tx loss-injection go through the normal return, so the state gets
 set correctly for what the code thinks it has done.

 Note lost Tx packets in the tx_data trace rather than the skb
 tracepoint.

 (3) Activate channels according to the current state from within the
 channel_lock to avoid someone changing it on us.

 (4) Reduce the local endpoint's services list to a single pointer as we
 don't allow service AF_RXRPC sockets to share UDP ports with other
 AF_RXRPC sockets, so there can't be more than one element in the list.

 (5) Request more ACKs in slow-start mode to help monitor the state driving
 the window configuration.

 (6) Display the serial number of the packet being ACK'd rather than the
 ACK packet's own serial number in the congestion trace as this can be
 related to other entries in the trace.

The patches can be found here also:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-rewrite-20160929

David
---
David Howells (6):
  rxrpc: Fix exclusive client connections
  rxrpc: Make Tx loss-injection go through normal return and adjust tracing
  rxrpc: When activating client conn channels, do state check inside lock
  rxrpc: Reduce the rxrpc_local::services list to a pointer
  rxrpc: Request more ACKs in slow-start mode
  rxrpc: Note serial number being ACK'd in the congestion management trace


 include/trace/events/rxrpc.h |6 --
 net/rxrpc/af_rxrpc.c |   21 -
 net/rxrpc/ar-internal.h  |6 ++
 net/rxrpc/call_accept.c  |8 
 net/rxrpc/call_event.c   |2 +-
 net/rxrpc/conn_client.c  |   38 ++
 net/rxrpc/input.c|8 
 net/rxrpc/local_object.c |3 +--
 net/rxrpc/misc.c |1 -
 net/rxrpc/output.c   |   18 ++
 net/rxrpc/security.c |8 
 net/rxrpc/sendmsg.c  |2 +-
 net/rxrpc/skbuff.c   |   11 +++
 13 files changed, 68 insertions(+), 64 deletions(-)

Re: [PATCH] brcmfmac: implement more accurate skb tracking

2016-09-29 Thread Rafał Miłecki

On 27 September 2016 at 11:24, Arend Van Spriel
 wrote:
> On 26-9-2016 14:38, Rafał Miłecki wrote:
>> On 26 September 2016 at 14:13, Rafał Miłecki  wrote:
>>> On 26 September 2016 at 13:46, Arend Van Spriel
>>>  wrote:
 On 26-9-2016 12:23, Rafał Miłecki wrote:
> From: Rafał Miłecki 
>
> We need to track 802.1x packets to know if there are any pending ones
> for transmission. This is required for performing key update in the
> firmware.

 The problem we are trying to solve is a pretty old one. The problem is
 that wpa_supplicant uses two separate code paths: EAPOL messaging
 through data path and key configuration though nl80211.
>>>
>>> Can I find it described/reported somewhere?
>>>
>>>
> Unfortunately our old tracking code wasn't very accurate. It was
> treating skb as pending as soon as it was passed by the netif. Actual
> handling packet to the firmware was happening later as brcmfmac
> internally queues them and uses its own worker(s).

 That does not seem right. As soon as we get a 1x packet we need to wait
 with key configuration regardless whether it is still in the driver or
 handed over to firmware already.
>>>
>>> OK, thanks.
>>
>> Actually, it's not OK. I was trying to report/describe/discuss this
>> problem for over a week. I couldn't get much of answer from you.
>>
>> I had to come with a patch I worked on for quite some time. Only then
>> you decided to react and reply with a reason for a nack. I see this
>> patch may be wrong (but it's still hard to know what's going wrong
>> without a proper hostapd bug report). I'd expect you to somehow work &
>> communicate with open source community.
>
> We do or at least make an honest attempt, but there is more on our plate
> so responses may be delayed. It also does not help when you get anal and
> preachy when we do respond. Also not OK. In this case the delay is
> caused because I had to pick up the thread(s) as Hante is on vacation
> (he needed a break :-p ). However, you started sending patches so I
> decided to look at and respond to those. Sorry if you felt like we left
> you hanging to dry.

I believe I get easily irritated due to my communication experience I
got so far :(

Over a year ago I reported brcmfmac can't recover from failed
register_netdev(ice). This bug remains unfixed.

In 2014 I reported problem with 80 MHz support. I didn't have hardware
to fix & test it on my own (you weren't able/allowed to send me one of
your PCIe cards). In remained broken until I fixed it year later.

You missed my crash bug report about caused by missing eth_type_trans
and came with patch on your own a month later.

Earlier this year I reported you problem with BCM4366 and multiple
interfaces. I didn't get much help. 3 months later I came with patch
to workaround the problem but you said there's a better way to do
this. It took me 2 weeks to figure out a new wlioctl API for that
while all I needed was a simple hint on "interface_remove".

Right now I'm waiting to get any answer from you about 4366c0
firmware. It's still less than 2 weeks since I asked for it, but a
simple ETA would be nice. I'm actually not sure if I should report
more problems to you to don't distract you from pending things.

Problems with brcmf_netdev_wait_pend8021x were reported multiples
times for last few months. When I finally got time for that it took me
a week to debug them.

As you can see, it takes me months to get help on some things. And in
few cases I never got much help at all. Yes, I was hoping to have you
more involved into brcmfmac development and problems solving. I guess
things didn't meet my expectations and I got grumpy & preachy.

-- 
Rafał

Re: [PATCH v2 net-next 4/5] xps_flows: XPS for packets that don't have a socket

2016-09-29 Thread Tom Herbert

On Thu, Sep 29, 2016 at 11:15 AM, Eric Dumazet  wrote:
> On Thu, 2016-09-29 at 07:51 -0700, Eric Dumazet wrote:
>> On Thu, 2016-09-29 at 10:08 -0400, Tom Herbert wrote:
>>
>> > It addresses  the issue that Rick Jones pointed out was happening with
>> > XPS. When packets are sent for a flow that has no socket and XPS is
>> > enabled then each packet uses the XPS queue based on the running CPU.
>> > Since the thread sending on a flow can be rescheduled on different
>> > CPUs this is creating ooo packets. In this case the ooo is being
>> > caused by interaction with XPS.
>> >
>>
>> Nope, your patch does not address the problem properly.
>>
>> I am not sure I want to spend more time explaining the issue.
>>
>> Lets talk about this in Tokyo next week.
>>
>
> Just as a reminder, sorry to bother you, stating some obvious facts for
> both of us. We have public exchanges, so we also need to re-explain how
> things work.
>
> Queue selection on xmit happens before we hit the qdisc and its delays.
>
> So when you access txq->dql.num_completed_ops and
> txq->dql.num_enqueue_ops you can observe values that do not change for a
> while.
>
> Say a thread runs on a VM, and sends 2 packets P1, P2 on the same flow
> (skb_get_hash() returns the same value for these 2 packets)
>
> P1 is sent on behalf of CPU 1, we pickup queue txq1, and queue the
> packet on its qdisc . Transmit does not happen because of some
> constraints like rate limiting or scheduling constraints.
>
> P2 is sent on behalf of CPU 2, we pickup queue txq2, notice that prior
> packet chose txq1. We check txq1->dql and decide it is fine to use txq2,
> since the dql params of txq1 were not changed yet.
>
> ( txq->dql.num_completed_ops == ent.queue_ptr )
>
> Note that in RFS case, we have the guarantee that we observe 'live
> queues' since they are the per cpu backlog.
>
> So input_queue_head_incr() and input_queue_tail_incr_save() are
> correctly doing the OOO prevention, because a queued packet immediately
> changes the state.
>
> So really your patch works if you have no qdisc, or a non congested
> qdisc. (Think if P1 is dropped by a full pfifo or pfifo_fast : We really
> want to avoid steering P2, P3, ..., PN on this full pfifo while maybe
> other txq are idle). Strange attractors are back (check commit
> 9b462d02d6dd6 )
>
Understood.

> You could avoid (ab)using BQL with a different method, grabbing
> skb->destructor for the packets that are socketless
>
> The hash table would simply track the sum of skb->truesize to allow flow
> migration. This would be self contained and not intrusive.
>
Okay, will look that.

>
>
>

[PATCH net-next 2/2] openvswitch: remove skb_mpls_header

2016-09-29 Thread Jiri Benc

skb_mpls_header is equivalent to skb_network_header now. There's no reason
to keep it.

Signed-off-by: Jiri Benc 
---
 include/net/mpls.h| 11 ---
 net/openvswitch/actions.c | 10 +-
 2 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/include/net/mpls.h b/include/net/mpls.h
index 5b3b5addfb08..fde22d0b0ec1 100644
--- a/include/net/mpls.h
+++ b/include/net/mpls.h
@@ -25,15 +25,4 @@ static inline bool eth_p_mpls(__be16 eth_type)
eth_type == htons(ETH_P_MPLS_MC);
 }
 
-/*
- * For non-MPLS skbs this will correspond to the network header.
- * For MPLS skbs it will be before the network_header as the MPLS
- * label stack lies between the end of the mac header and the network
- * header. That is, for MPLS skbs the end of the mac header
- * is the top of the MPLS label stack.
- */
-static inline unsigned char *skb_mpls_header(struct sk_buff *skb)
-{
-   return skb_mac_header(skb) + skb->mac_len;
-}
 #endif
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 863e992dfbc0..bf4211f5c974 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -180,7 +180,7 @@ static int push_mpls(struct sk_buff *skb, struct 
sw_flow_key *key,
skb_reset_mac_header(skb);
skb_set_network_header(skb, skb->mac_len);
 
-   new_mpls_lse = (__be32 *)skb_mpls_header(skb);
+   new_mpls_lse = (__be32 *)skb_network_header(skb);
*new_mpls_lse = mpls->mpls_lse;
 
skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
@@ -202,7 +202,7 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key 
*key,
if (unlikely(err))
return err;
 
-   skb_postpull_rcsum(skb, skb_mpls_header(skb), MPLS_HLEN);
+   skb_postpull_rcsum(skb, skb_network_header(skb), MPLS_HLEN);
 
memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
skb->mac_len);
@@ -211,10 +211,10 @@ static int pop_mpls(struct sk_buff *skb, struct 
sw_flow_key *key,
skb_reset_mac_header(skb);
skb_set_network_header(skb, skb->mac_len);
 
-   /* skb_mpls_header() is used to locate the ethertype
+   /* skb_network_header() is used to locate the ethertype
 * field correctly in the presence of VLAN tags.
 */
-   hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
+   hdr = (struct ethhdr *)(skb_network_header(skb) - ETH_HLEN);
update_ethertype(skb, hdr, ethertype);
if (eth_p_mpls(skb->protocol))
skb->protocol = ethertype;
@@ -234,7 +234,7 @@ static int set_mpls(struct sk_buff *skb, struct sw_flow_key 
*flow_key,
if (unlikely(err))
return err;
 
-   stack = (__be32 *)skb_mpls_header(skb);
+   stack = (__be32 *)skb_network_header(skb);
lse = OVS_MASKED(*stack, *mpls_lse, *mask);
if (skb->ip_summed == CHECKSUM_COMPLETE) {
__be32 diff[] = { ~(*stack), lse };
-- 
1.8.3.1

[PATCH net-next 1/2] openvswitch: mpls: set network header correctly on key extract

2016-09-29 Thread Jiri Benc

After the 48d2ab609b6b ("net: mpls: Fixups for GSO"), MPLS handling in
openvswitch was changed to have network header pointing to the start of the
MPLS headers and inner_network_header pointing after the MPLS headers.

However, key_extract was missed by the mentioned commit, causing incorrect
headers to be set when a MPLS packet just enters the bridge or after it is
recirculated.

Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
Signed-off-by: Jiri Benc 
---
 net/openvswitch/flow.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 634cc10d6dee..c8c82e109c68 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -633,12 +633,7 @@ static int key_extract(struct sk_buff *skb, struct 
sw_flow_key *key)
} else if (eth_p_mpls(key->eth.type)) {
size_t stack_len = MPLS_HLEN;
 
-   /* In the presence of an MPLS label stack the end of the L2
-* header and the beginning of the L3 header differ.
-*
-* Advance network_header to the beginning of the L3
-* header. mac_len corresponds to the end of the L2 header.
-*/
+   skb_set_inner_network_header(skb, skb->mac_len);
while (1) {
__be32 lse;
 
@@ -646,12 +641,12 @@ static int key_extract(struct sk_buff *skb, struct 
sw_flow_key *key)
if (unlikely(error))
return 0;
 
-   memcpy(&lse, skb_network_header(skb), MPLS_HLEN);
+   memcpy(&lse, skb_inner_network_header(skb), MPLS_HLEN);
 
if (stack_len == MPLS_HLEN)
memcpy(&key->mpls.top_lse, &lse, MPLS_HLEN);
 
-   skb_set_network_header(skb, skb->mac_len + stack_len);
+   skb_set_inner_network_header(skb, skb->mac_len + 
stack_len);
if (lse & htonl(MPLS_LS_S_MASK))
break;
 
-- 
1.8.3.1

[PATCH net-next 0/2] openvswitch: mpls fix and clean up

2016-09-29 Thread Jiri Benc

Convert to the new mpls skb layout the last remaining place in openvswitch,
forgotten on the mpls GSO rework. The GSO rework also allows for some nice
cleanup in the second patch.

Jiri Benc (2):
  openvswitch: mpls: set network header correctly on key extract
  openvswitch: remove skb_mpls_header

 include/net/mpls.h| 11 ---
 net/openvswitch/actions.c | 10 +-
 net/openvswitch/flow.c| 11 +++
 3 files changed, 8 insertions(+), 24 deletions(-)

-- 
1.8.3.1

Re: next-20160929 build: 2 failures 4 warnings (next-20160929)

2016-09-29 Thread Vishwanath Pai

On 09/29/2016 02:47 PM, Mark Brown wrote:
> On Thu, Sep 29, 2016 at 12:40:35PM +0100, Build bot for Mark Brown wrote:
> 
> For the past couple of days -next has been failing to build an ARM
> allmodconfig due to:
> 
>>  arm-allmodconfig
>> ERROR: "__aeabi_uldivmod" [net/netfilter/xt_hashlimit.ko] undefined!
> 
> which appears to be triggered by 11d5f15723c9 (netfilter: xt_hashlimit:
> Create revision 2 to support higher pps rates) introducing a division of
> a 64 bit number which should be done using do_div().
> 

I have sent a patch for this a couple of days ago to netdev, it hasn't
made it to net-next yet. Here's the latest one:

[PATCH net-next v3] netfilter: xt_hashlimit: Fix link error in 32bit
arch because of 64bit division

This should fix the link error.

-Vishwanath

Re: next-20160929 build: 2 failures 4 warnings (next-20160929)

2016-09-29 Thread Mark Brown

On Thu, Sep 29, 2016 at 12:40:35PM +0100, Build bot for Mark Brown wrote:

For the past couple of days -next has been failing to build an ARM
allmodconfig due to:

>   arm-allmodconfig
> ERROR: "__aeabi_uldivmod" [net/netfilter/xt_hashlimit.ko] undefined!

which appears to be triggered by 11d5f15723c9 (netfilter: xt_hashlimit:
Create revision 2 to support higher pps rates) introducing a division of
a 64 bit number which should be done using do_div().

signature.asc
Description: PGP signature

Re: pull-request: wireless-drivers-next 2016-09-29

2016-09-29 Thread Pablo Neira Ayuso

On Thu, Sep 29, 2016 at 07:57:28PM +0300, Kalle Valo wrote:
> Hi Dave,
> 
> this should be the last wireless-drivers-next pull request for 4.9, from
> now on only important bugfixes. Nothing really special stands out,
> iwlwifi being most active but other drivers also getting attention. More
> details in the signed tag. Please let me know if there are any problems.
> 
> Or actually I had one problem. While doing a test merge I noticed that
> net-next fails to compile for me, but I don't think this is anything
> wireless related:
> 
>   CC  net/netfilter/core.o
> net/netfilter/core.c: In function 'nf_set_hooks_head':
> net/netfilter/core.c:96:149: error: 'struct net_device' has no member named 
> 'nf_hooks_ingress'

That's my problem, will be sending a pull request to fix this asap,
thanks.

[net-next PATCH] i40e: Clean up handling of msg_enable flags and debug parameter

2016-09-29 Thread Alexander Duyck

So the i40e driver had a really convoluted configuration for how to handle
the debug flags contained in msg_enable.  Part of the issue is that the
driver has its own 32 bit mask that it was using to track a separate set of
debug features.  From what I can tell it was trying to use the upper 4 bits
to determine if the value was meant to represent a bit-mask or the numeric
value provided by debug level.

What this patch does is clean this up by compressing those 4 bits into bit
31, as a result we just have to perform a check against the value being
negative to determine if we are looking at a debug level (positive), or a
debug mask (negative).  The debug level will populate the msg_level, and
the debug mask will populate the debug_mask in the hardware struct.

I added similar logic for ethtool.  If the value being provided has bit 31
set we assume the value being provided is a debug mask, otherwise we assume
it is a msg_enable mask.  For displaying we only provide the msg_enable,
and if debug_mask is in use we will print it to the dmesg log.

Lastly I removed the debugfs interface.  It is redundant with what we
already have in ethtool and really doesn't belong anyway.

Signed-off-by: Alexander Duyck 
---

So I am running this through a slightly different process than standard
because there are some items here that might be objectionable so I want to
have that ironed out before I deal with the out-of-tree Intel driver.

I just want to verify if this fix for this will work for net-next or if I
need to look at taking a different approach.  Once I get the go-no-go, I
will back-port this into the out-of-tree i40e driver.

 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |   18 ---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |7 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c|   28 
 3 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c 
b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 0c1875b..acb0f13 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -1210,24 +1210,6 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
dev_info(&pf->pdev->dev,
 "dump debug fwdata   
\n");
}
-
-   } else if (strncmp(cmd_buf, "msg_enable", 10) == 0) {
-   u32 level;
-   cnt = sscanf(&cmd_buf[10], "%i", &level);
-   if (cnt) {
-   if (I40E_DEBUG_USER & level) {
-   pf->hw.debug_mask = level;
-   dev_info(&pf->pdev->dev,
-"set hw.debug_mask = 0x%08x\n",
-pf->hw.debug_mask);
-   }
-   pf->msg_enable = level;
-   dev_info(&pf->pdev->dev, "set msg_enable = 0x%08x\n",
-pf->msg_enable);
-   } else {
-   dev_info(&pf->pdev->dev, "msg_enable = 0x%08x\n",
-pf->msg_enable);
-   }
} else if (strncmp(cmd_buf, "pfr", 3) == 0) {
dev_info(&pf->pdev->dev, "debugfs: forcing PFR\n");
i40e_do_reset_safe(pf, BIT(__I40E_PF_RESET_REQUESTED));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index e79a920..b133a77 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -978,6 +978,10 @@ static u32 i40e_get_msglevel(struct net_device *netdev)
 {
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_pf *pf = np->vsi->back;
+   u32 debug_mask = pf->hw.debug_mask;
+
+   if (debug_mask)
+   netdev_info(netdev, "i40e debug_mask: 0x%08X\n", debug_mask);
 
return pf->msg_enable;
 }
@@ -989,7 +993,8 @@ static void i40e_set_msglevel(struct net_device *netdev, 
u32 data)
 
if (I40E_DEBUG_USER & data)
pf->hw.debug_mask = data;
-   pf->msg_enable = data;
+   else
+   pf->msg_enable = data;
 }
 
 static int i40e_get_regs_len(struct net_device *netdev)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b93fa2a..b24c6eb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -93,8 +93,8 @@ MODULE_DEVICE_TABLE(pci, i40e_pci_tbl);
 
 #define I40E_MAX_VF_COUNT 128
 static int debug = -1;
-module_param(debug, int, 0);
-MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
+module_param(debug, uint, 0);
+MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all), Debug mask 
(0x8XXX)");
 
 MODULE_AUTHOR("Intel Corporation, ");
 MODULE_DESCRIPTION("Intel(R) Ethernet Connection XL710 Network Dri

Re: [PATCH net v2] tg3: Avoid NULL pointer dereference in tg3_io_error_detected()

2016-09-29 Thread Michael Chan

On Thu, Sep 29, 2016 at 9:24 AM, Guilherme G. Piccoli
 wrote:
> From: Milton Miller 
>
> While the driver is probing the adapter, an error may occur before the
> netdev structure is allocated and attached to pci_dev. In this case,
> not only netdev isn't available, but the tg3 private structure is also
> not available as it is just math from the NULL pointer, so dereferences
> must be skipped.
>
> The following trace is seen when the error is triggered:
>
>   [1.402247] Unable to handle kernel paging request for data at address 
> 0x1a99
>   [1.402410] Faulting instruction address: 0xc07e33f8
>   [1.402450] Oops: Kernel access of bad area, sig: 11 [#1]
>   [1.402481] SMP NR_CPUS=2048 NUMA PowerNV
>   [1.402513] Modules linked in:
>   [1.402545] CPU: 0 PID: 651 Comm: eehd Not tainted 4.4.0-36-generic 
> #55-Ubuntu
>   [1.402591] task: c01fe4e42a20 ti: c01fe4e88000 task.ti: 
> c01fe4e88000
>   [1.402742] NIP: c07e33f8 LR: c07e3164 CTR: c0595ea0
>   [1.402787] REGS: c01fe4e8b790 TRAP: 0300   Not tainted  
> (4.4.0-36-generic)
>   [1.402832] MSR: 90019033   CR: 28000422  
> XER: 2000
>   [1.403058] CFAR: c0008468 DAR: 1a99 DSISR: 4200 
> SOFTE: 1
>   GPR00: c07e3164 c01fe4e8ba10 c15c5e00 
>   GPR04: 0001  0039 0299
>   GPR08:  0001 c01fe4e88000 0006
>   GPR12:  cfb4 c00e6558 c03ca1bffd00
>   GPR16:    
>   GPR20:    c0d52768
>   GPR24: c0d52740 0100 c03ca1b52000 0002
>   GPR28: 0900  c152a0c0 c03ca1b52000
>   [1.404226] NIP [c07e33f8] tg3_io_error_detected+0x308/0x340
>   [1.404265] LR [c07e3164] tg3_io_error_detected+0x74/0x340
>
> This patch avoids the NULL pointer dereference by moving the access after
> the netdev NULL pointer check on tg3_io_error_detected(). Also, we add a
> check for netdev being NULL on tg3_io_resume() [suggested by Michael Chan].
>
> Fixes: 0486a063b1ff ("tg3: prevent ifup/ifdown during PCI error recovery")
> Fixes: dfc8f370316b ("net/tg3: Release IRQs on permanent error")
> Tested-by: Guilherme G. Piccoli 
> Signed-off-by: Milton Miller 
> Signed-off-by: Guilherme G. Piccoli 

Acked-by: Michael Chan

[PATCH net-next v3] netfilter: xt_hashlimit: Fix link error in 32bit arch because of 64bit division

2016-09-29 Thread Vishwanath Pai

v2:
Remove unnecessary div64_u64 around constants

v3:
remove backslashes

--

Fix link error in 32bit arch because of 64bit division

Division of 64bit integers will cause linker error undefined reference
to `__udivdi3'. Fix this by replacing divisions with div64_64

Signed-off-by: Vishwanath Pai 
Fixes: 11d5f15723c9 ("netfilter: xt_hashlimit: Create revision 2 to ...")

---
 net/netfilter/xt_hashlimit.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 44a095e..faf65f1 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -467,17 +467,18 @@ static u64 user2credits(u64 user, int revision)
/* If multiplying would overflow... */
if (user > 0x / (HZ*CREDITS_PER_JIFFY_v1))
/* Divide first. */
-   return (user / XT_HASHLIMIT_SCALE) *\
-   HZ * CREDITS_PER_JIFFY_v1;
+   return div64_u64(user, XT_HASHLIMIT_SCALE)
+   * HZ * CREDITS_PER_JIFFY_v1;
 
-   return (user * HZ * CREDITS_PER_JIFFY_v1) \
-   / XT_HASHLIMIT_SCALE;
+   return div64_u64((user * HZ * CREDITS_PER_JIFFY_v1),
+XT_HASHLIMIT_SCALE);
} else {
if (user > 0x / (HZ*CREDITS_PER_JIFFY))
-   return (user / XT_HASHLIMIT_SCALE_v2) *\
-   HZ * CREDITS_PER_JIFFY;
+   return div64_u64(user, XT_HASHLIMIT_SCALE_v2)
+   * HZ * CREDITS_PER_JIFFY;
 
-   return (user * HZ * CREDITS_PER_JIFFY) / XT_HASHLIMIT_SCALE_v2;
+   return div64_u64((user * HZ * CREDITS_PER_JIFFY),
+XT_HASHLIMIT_SCALE_v2);
}
 }
 
-- 
1.9.1

Re: [PATCH -next] net: macb: fix missing unlock on error in macb_start_xmit()

2016-09-29 Thread Nicolas Ferre

Le 12/09/2016 à 19:40, Helmut Buchsbaum a écrit :
> On 09/10/2016 01:17 PM, Wei Yongjun wrote:
>> From: Wei Yongjun 
>>
>> Fix missing unlock before return from function macb_start_xmit()
>> in the error handling case.
>>
>> Fixes: 007e4ba3ee13 ("net: macb: initialize checksum when using
>> checksum offloading")
>> Signed-off-by: Wei Yongjun 
>> ---
>>  drivers/net/ethernet/cadence/macb.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb.c 
>> b/drivers/net/ethernet/cadence/macb.c
>> index 0294b6a..63144bb 100644
>> --- a/drivers/net/ethernet/cadence/macb.c
>> +++ b/drivers/net/ethernet/cadence/macb.c
>> @@ -1398,7 +1398,7 @@ static int macb_start_xmit(struct sk_buff *skb, struct 
>> net_device *dev)
>>
>>  if (macb_clear_csum(skb)) {
>>  dev_kfree_skb_any(skb);
>> -return NETDEV_TX_OK;
>> +goto unlock;
>>  }
>>
>>  /* Map socket buffer for DMA transfer */
>>
> You are definitely right. Sorry I missed that obvious point and for 
> causing any inconveniences.
> 
> BTW, I see there are obviously quite a few users of MACB 
> implementations. I'm just curious if anybody else ever encountered the 
> checksum problem or if this a matter of Zynq implementation only.

I've just verified that we are affected by this issue as well on sama5d2
(Microchip / Atmel cortex-A5 MPUs).

Thanks for the fix,
-- 
Nicolas Ferre

[PATCH V2 for-next 7/8] net: hns: bug fix about broadcast/multicast packets

2016-09-29 Thread Salil Mehta

From: Daode Huang 

When the dsaf mode receives a broadcast packet, it will filter
the packet by comparing the received queue number and destination
queue number(get from forwarding table), if they are the same,
the packet will be filtered. Otherwise, the packet will be loopback.
So this patch select queue 0 to send broadcast and multicast packets.

Signed-off-by: Daode Huang 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
PATCH v2: Addressed comments by David Miller
  Link: https://lkml.org/lkml/2016/9/28/390
PATCH V1: Initial Submit
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c |   16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 09ed237..5494e0e 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1597,6 +1597,21 @@ struct rtnl_link_stats64 *hns_nic_get_stats64(struct 
net_device *ndev,
return stats;
 }
 
+static u16
+hns_nic_select_queue(struct net_device *ndev, struct sk_buff *skb,
+void *accel_priv, select_queue_fallback_t fallback)
+{
+   struct ethhdr *eth_hdr = (struct ethhdr *)skb->data;
+   struct hns_nic_priv *priv = netdev_priv(ndev);
+
+   /* fix hardware broadcast/multicast packets queue loopback */
+   if (!AE_IS_VER1(priv->enet_ver) &&
+   is_multicast_ether_addr(eth_hdr->h_dest))
+   return 0;
+   else
+   return fallback(ndev, skb);
+}
+
 static const struct net_device_ops hns_nic_netdev_ops = {
.ndo_open = hns_nic_net_open,
.ndo_stop = hns_nic_net_stop,
@@ -1612,6 +1627,7 @@ static const struct net_device_ops hns_nic_netdev_ops = {
.ndo_poll_controller = hns_nic_poll_controller,
 #endif
.ndo_set_rx_mode = hns_nic_set_rx_mode,
+   .ndo_select_queue = hns_nic_select_queue,
 };
 
 static void hns_nic_update_link_status(struct net_device *netdev)
-- 
1.7.9.5

[PATCH V2 for-next 5/8] net: hns: fix port not available after testing loopback

2016-09-29 Thread Salil Mehta

From: Kejian Yan 

After running command "ethtool -t eth0", eth0 can not be connected to
network. It is caused by the changing the inner loopback register and
this register cannot be changed when hns connected to network. The
routine of setting this register needs to be removed and using promisc
mode to let the packet looped back pass by dsaf mode.

Reported-by: Jun He 
Signed-off-by: Kejian Yan 
Reviewed-by: Yisen Zhaung 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c  |3 ---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c |   10 --
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |1 -
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   |7 +++
 4 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c 
b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index e28d960..e0f9cdc 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -678,9 +678,6 @@ static int hns_ae_config_loopback(struct hnae_handle 
*handle,
ret = -EINVAL;
}
 
-   if (!ret)
-   hns_dsaf_set_inner_lb(mac_cb->dsaf_dev, mac_cb->mac_id, en);
-
return ret;
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
index 9283bc6..827d8fb 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
@@ -768,16 +768,6 @@ void hns_dsaf_set_promisc_mode(struct dsaf_device 
*dsaf_dev, u32 en)
 DSAF_CFG_MIX_MODE_S, !!en);
 }
 
-void hns_dsaf_set_inner_lb(struct dsaf_device *dsaf_dev, u32 mac_id, u32 en)
-{
-   if (AE_IS_VER1(dsaf_dev->dsaf_ver) ||
-   dsaf_dev->mac_cb[mac_id]->mac_type == HNAE_PORT_DEBUG)
-   return;
-
-   dsaf_set_dev_bit(dsaf_dev, DSAFV2_SERDES_LBK_0_REG + 4 * mac_id,
-DSAFV2_SERDES_LBK_EN_B, !!en);
-}
-
 /**
  * hns_dsaf_tbl_stat_en - tbl
  * @dsaf_id: dsa fabric id
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
index 35df187..c494fc5 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
@@ -466,6 +466,5 @@ void hns_dsaf_get_rx_mac_pause_en(struct dsaf_device 
*dsaf_dev, int mac_id,
  u32 *en);
 int hns_dsaf_set_rx_mac_pause_en(struct dsaf_device *dsaf_dev, int mac_id,
 u32 en);
-void hns_dsaf_set_inner_lb(struct dsaf_device *dsaf_dev, u32 mac_id, u32 en);
 
 #endif /* __HNS_DSAF_MAIN_H__ */
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c 
b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index ab33487..fa91ce3 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -342,6 +342,13 @@ static int __lb_setup(struct net_device *ndev,
break;
}
 
+   if (!ret) {
+   if (loop == MAC_LOOP_NONE)
+   h->dev->ops->set_promisc_mode(
+   h, ndev->flags & IFF_PROMISC);
+   else
+   h->dev->ops->set_promisc_mode(h, 1);
+   }
return ret;
 }
 
-- 
1.7.9.5

[PATCH V2 for-next 3/8] net: hns: add fini_process for v2 napi process

2016-09-29 Thread Salil Mehta

From: Daode Huang 

This patch adds fini_process for v2, it handles the packets recevied
by the hardware in the napi porcess. With this patch, the hardware irq
numbers will drop 50% per sec.

Signed-off-by: Daode Huang 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c |   45 +
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 32ff270..e6bfc51 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -823,6 +823,8 @@ static void hns_nic_rx_fini_pro(struct hns_nic_ring_data 
*ring_data)
struct hnae_ring *ring = ring_data->ring;
int num = 0;
 
+   ring_data->ring->q->handle->dev->ops->toggle_ring_irq(ring, 0);
+
/* for hardware bug fixed */
num = readl_relaxed(ring->io_base + RCB_REG_FBDNUM);
 
@@ -834,6 +836,20 @@ static void hns_nic_rx_fini_pro(struct hns_nic_ring_data 
*ring_data)
}
 }
 
+static void hns_nic_rx_fini_pro_v2(struct hns_nic_ring_data *ring_data)
+{
+   struct hnae_ring *ring = ring_data->ring;
+   int num = 0;
+
+   num = readl_relaxed(ring->io_base + RCB_REG_FBDNUM);
+
+   if (num == 0)
+   ring_data->ring->q->handle->dev->ops->toggle_ring_irq(
+   ring, 0);
+   else
+   napi_schedule(&ring_data->napi);
+}
+
 static inline void hns_nic_reclaim_one_desc(struct hnae_ring *ring,
int *bytes, int *pkts)
 {
@@ -935,7 +951,11 @@ static int hns_nic_tx_poll_one(struct hns_nic_ring_data 
*ring_data,
 static void hns_nic_tx_fini_pro(struct hns_nic_ring_data *ring_data)
 {
struct hnae_ring *ring = ring_data->ring;
-   int head = readl_relaxed(ring->io_base + RCB_REG_HEAD);
+   int head;
+
+   ring_data->ring->q->handle->dev->ops->toggle_ring_irq(ring, 0);
+
+   head = readl_relaxed(ring->io_base + RCB_REG_HEAD);
 
if (head != ring->next_to_clean) {
ring_data->ring->q->handle->dev->ops->toggle_ring_irq(
@@ -945,6 +965,18 @@ static void hns_nic_tx_fini_pro(struct hns_nic_ring_data 
*ring_data)
}
 }
 
+static void hns_nic_tx_fini_pro_v2(struct hns_nic_ring_data *ring_data)
+{
+   struct hnae_ring *ring = ring_data->ring;
+   int head = readl_relaxed(ring->io_base + RCB_REG_HEAD);
+
+   if (head == ring->next_to_clean)
+   ring_data->ring->q->handle->dev->ops->toggle_ring_irq(
+   ring, 0);
+   else
+   napi_schedule(&ring_data->napi);
+}
+
 static void hns_nic_tx_clr_all_bufs(struct hns_nic_ring_data *ring_data)
 {
struct hnae_ring *ring = ring_data->ring;
@@ -976,10 +1008,7 @@ static int hns_nic_common_poll(struct napi_struct *napi, 
int budget)
 
if (clean_complete >= 0 && clean_complete < budget) {
napi_complete(napi);
-   ring_data->ring->q->handle->dev->ops->toggle_ring_irq(
-   ring_data->ring, 0);
-   if (ring_data->fini_process)
-   ring_data->fini_process(ring_data);
+   ring_data->fini_process(ring_data);
return 0;
}
 
@@ -1755,7 +1784,8 @@ static int hns_nic_init_ring_data(struct hns_nic_priv 
*priv)
rd->queue_index = i;
rd->ring = &h->qs[i]->tx_ring;
rd->poll_one = hns_nic_tx_poll_one;
-   rd->fini_process = is_ver1 ? hns_nic_tx_fini_pro : NULL;
+   rd->fini_process = is_ver1 ? hns_nic_tx_fini_pro :
+   hns_nic_tx_fini_pro_v2;
 
netif_napi_add(priv->netdev, &rd->napi,
   hns_nic_common_poll, NIC_TX_CLEAN_MAX_NUM);
@@ -1767,7 +1797,8 @@ static int hns_nic_init_ring_data(struct hns_nic_priv 
*priv)
rd->ring = &h->qs[i - h->q_num]->rx_ring;
rd->poll_one = hns_nic_rx_poll_one;
rd->ex_process = hns_nic_rx_up_pro;
-   rd->fini_process = is_ver1 ? hns_nic_rx_fini_pro : NULL;
+   rd->fini_process = is_ver1 ? hns_nic_rx_fini_pro :
+   hns_nic_rx_fini_pro_v2;
 
netif_napi_add(priv->netdev, &rd->napi,
   hns_nic_common_poll, NIC_RX_CLEAN_MAX_NUM);
-- 
1.7.9.5

Re: [PATCH net] net: pktgen: fix pkt_size

2016-09-29 Thread Sergei Shtylyov


Hello.

On 09/29/2016 05:04 PM, Paolo Abeni wrote:


The commit 879c7220e828 ("net: pktgen: Observe needed_headroom
of the device") increased the 'pkt_overhead' field value by
LL_RESERVED_SPACE.
As a side effect the generated packet size, computed as:

/* Eth + IPh + UDPh + mpls */
datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8 -
  pkt_dev->pkt_overhead;

is decreased by the same value.
The above changed slightly the behavior of existing pktgen users,
and made the interface somewhat inconsistent.
Fix it by restoring the previous pkt_overhead value and using
LL_RESERVED_SPACE as extralen in skb allocation.
Also, change pktgen_alloc_skb() to only partially reserve
the headroom to allow the caller to prefetch from ll header
start.

Fixes: 879c7220e828 ("net: pktgen: Observe needed_headroom of the device")
Suggested-by: Ben Greear 
Signed-off-by: Paolo Abeni 
---
 net/core/pktgen.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index bbd118b..5c9b397 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c

[...]

@@ -2796,8 +2796,9 @@ static struct sk_buff *pktgen_alloc_skb(struct net_device 
*dev,
 skb = __netdev_alloc_skb(dev, size, GFP_NOWAIT);
}

+   /* the caller prefetch from skb->data and reserve for the mac hdr */


   Pre-fetches and reserves?

[...]

MBR, Sergei

[PATCH V2 for-next 6/8] net: hns: fix the bug of forwarding table

2016-09-29 Thread Salil Mehta

From: Daode Huang 

As the sub queue id in the broadcast forwarding table is always
set to absolute queue 0 rather than the interface's relative queue 0,
this will cause the received broadcast packets loopback to rcb.
This patch sets the sub queue id to relative queue 0 of each port.

Signed-off-by: Daode Huang 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
PATCH V2: Addressed comments by David Miller
  Link: https://lkml.org/lkml/2016/9/28/390
PATCH V1: Initial Submit
---
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c |8 ++--
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c |   13 ++---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h |2 ++
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c 
b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index e0f9cdc..2d0cb60 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -207,6 +207,7 @@ static int hns_ae_set_multicast_one(struct hnae_handle 
*handle, void *addr)
int ret;
char *mac_addr = (char *)addr;
struct hns_mac_cb *mac_cb = hns_get_mac_cb(handle);
+   u8 port_num;
 
assert(mac_cb);
 
@@ -221,8 +222,11 @@ static int hns_ae_set_multicast_one(struct hnae_handle 
*handle, void *addr)
return ret;
}
 
-   ret = hns_mac_set_multi(mac_cb, DSAF_BASE_INNER_PORT_NUM,
-   mac_addr, true);
+   ret = hns_mac_get_inner_port_num(mac_cb, handle->vf_id, &port_num);
+   if (ret)
+   return ret;
+
+   ret = hns_mac_set_multi(mac_cb, port_num, mac_addr, true);
if (ret)
dev_err(handle->owner_dev,
"mac add mul_mac:%pM port%d  fail, ret = %#x!\n",
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index a68eef0..151fd6e 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@ -141,9 +141,10 @@ void hns_mac_adjust_link(struct hns_mac_cb *mac_cb, int 
speed, int duplex)
  *@port_num:port number
  *
  */
-static int hns_mac_get_inner_port_num(struct hns_mac_cb *mac_cb,
- u8 vmid, u8 *port_num)
+int hns_mac_get_inner_port_num(struct hns_mac_cb *mac_cb, u8 vmid, u8 
*port_num)
 {
+   int q_num_per_vf, vf_num_per_port;
+   int vm_queue_id;
u8 tmp_port;
 
if (mac_cb->dsaf_dev->dsaf_mode <= DSAF_MODE_ENABLE) {
@@ -174,6 +175,12 @@ static int hns_mac_get_inner_port_num(struct hns_mac_cb 
*mac_cb,
return -EINVAL;
}
 
+   q_num_per_vf = mac_cb->dsaf_dev->rcb_common[0]->max_q_per_vf;
+   vf_num_per_port = mac_cb->dsaf_dev->rcb_common[0]->max_vfn;
+
+   vm_queue_id = vmid * q_num_per_vf +
+   vf_num_per_port * q_num_per_vf * mac_cb->mac_id;
+
switch (mac_cb->dsaf_dev->dsaf_mode) {
case DSAF_MODE_ENABLE_FIX:
tmp_port = 0;
@@ -193,7 +200,7 @@ static int hns_mac_get_inner_port_num(struct hns_mac_cb 
*mac_cb,
case DSAF_MODE_DISABLE_6PORT_2VM:
case DSAF_MODE_DISABLE_6PORT_4VM:
case DSAF_MODE_DISABLE_6PORT_16VM:
-   tmp_port = vmid;
+   tmp_port = vm_queue_id;
break;
default:
dev_err(mac_cb->dev, "dsaf mode invalid,%s mac%d!\n",
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
index 4cbdf14..d3a1f72 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
@@ -461,5 +461,7 @@ void hns_set_led_opt(struct hns_mac_cb *mac_cb);
 int hns_cpld_led_set_id(struct hns_mac_cb *mac_cb,
enum hnae_led_state status);
 void hns_mac_set_promisc(struct hns_mac_cb *mac_cb, u8 en);
+int hns_mac_get_inner_port_num(struct hns_mac_cb *mac_cb,
+  u8 vmid, u8 *port_num);
 
 #endif /* _HNS_DSAF_MAC_H */
-- 
1.7.9.5

[PATCH V2 for-next 8/8] net: hns: delete redundant broadcast packet filter process

2016-09-29 Thread Salil Mehta

From: Daode Huang 

The broadcast packets is filtered in the hardware now, so this process
is no need in the driver, just delete it.

Signed-off-by: Daode Huang 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c |   10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 5494e0e..a7abe11 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -574,7 +574,6 @@ static int hns_nic_poll_rx_skb(struct hns_nic_ring_data 
*ring_data,
struct sk_buff *skb;
struct hnae_desc *desc;
struct hnae_desc_cb *desc_cb;
-   struct ethhdr *eh;
unsigned char *va;
int bnum, length, i;
int pull_len;
@@ -600,7 +599,6 @@ static int hns_nic_poll_rx_skb(struct hns_nic_ring_data 
*ring_data,
ring->stats.sw_err_cnt++;
return -ENOMEM;
}
-   skb_reset_mac_header(skb);
 
prefetchw(skb->data);
length = le16_to_cpu(desc->rx.pkt_len);
@@ -682,14 +680,6 @@ out_bnum_err:
return -EFAULT;
}
 
-   /* filter out multicast pkt with the same src mac as this port */
-   eh = eth_hdr(skb);
-   if (unlikely(is_multicast_ether_addr(eh->h_dest) &&
-ether_addr_equal(ndev->dev_addr, eh->h_source))) {
-   dev_kfree_skb_any(skb);
-   return -EFAULT;
-   }
-
ring->stats.rx_pkts++;
ring->stats.rx_bytes += skb->len;
 
-- 
1.7.9.5

[PATCH V2 for-next 4/8] net: hns: delete repeat read fbd num after while

2016-09-29 Thread Salil Mehta

From: Daode Huang 

Because we handle the received packets after napi, so delete the checking
before submitting. It delete the code of read the fbd number register,
which reduces the cpu usages while receiving packets

Signed-off-by: Daode Huang 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c |   15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index e6bfc51..09ed237 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -760,7 +760,7 @@ static int hns_nic_rx_poll_one(struct hns_nic_ring_data 
*ring_data,
 {
struct hnae_ring *ring = ring_data->ring;
struct sk_buff *skb;
-   int num, bnum, ex_num;
+   int num, bnum;
 #define RCB_NOF_ALLOC_RX_BUFF_ONCE 16
int recv_pkts, recv_bds, clean_count, err;
int unused_count = hns_desc_unused(ring);
@@ -770,7 +770,7 @@ static int hns_nic_rx_poll_one(struct hns_nic_ring_data 
*ring_data,
 
recv_pkts = 0, recv_bds = 0, clean_count = 0;
num -= unused_count;
-recv:
+
while (recv_pkts < budget && recv_bds < num) {
/* reuse or realloc buffers */
if (clean_count + unused_count >= RCB_NOF_ALLOC_RX_BUFF_ONCE) {
@@ -798,17 +798,6 @@ recv:
recv_pkts++;
}
 
-   /* make all data has been write before submit */
-   if (recv_pkts < budget) {
-   ex_num = readl_relaxed(ring->io_base + RCB_REG_FBDNUM);
-   ex_num -= unused_count;
-   if (ex_num > clean_count) {
-   num += ex_num - clean_count;
-   rmb(); /*complete read rx ring bd number*/
-   goto recv;
-   }
-   }
-
 out:
/* make all data has been write before submit */
if (clean_count + unused_count > 0)
-- 
1.7.9.5

[PATCH V2 for-next 1/8] net: hns: fix port unavailable after hnae_reserve_buffer_map fail

2016-09-29 Thread Salil Mehta

From: lipeng 

When hnae_reserve_buffer_map fail, it will break cycle and some
buffer description has no available memory, therefore the port will
be unavailable.

Signed-off-by: Peng Li 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c |   23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index d7e1f8c..32ff270 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -747,6 +747,14 @@ static void hns_nic_rx_up_pro(struct hns_nic_ring_data 
*ring_data,
ndev->last_rx = jiffies;
 }
 
+static int hns_desc_unused(struct hnae_ring *ring)
+{
+   int ntc = ring->next_to_clean;
+   int ntu = ring->next_to_use;
+
+   return ((ntc >= ntu) ? 0 : ring->desc_num) + ntc - ntu;
+}
+
 static int hns_nic_rx_poll_one(struct hns_nic_ring_data *ring_data,
   int budget, void *v)
 {
@@ -755,17 +763,21 @@ static int hns_nic_rx_poll_one(struct hns_nic_ring_data 
*ring_data,
int num, bnum, ex_num;
 #define RCB_NOF_ALLOC_RX_BUFF_ONCE 16
int recv_pkts, recv_bds, clean_count, err;
+   int unused_count = hns_desc_unused(ring);
 
num = readl_relaxed(ring->io_base + RCB_REG_FBDNUM);
rmb(); /* make sure num taken effect before the other data is touched */
 
recv_pkts = 0, recv_bds = 0, clean_count = 0;
+   num -= unused_count;
 recv:
while (recv_pkts < budget && recv_bds < num) {
/* reuse or realloc buffers */
-   if (clean_count >= RCB_NOF_ALLOC_RX_BUFF_ONCE) {
-   hns_nic_alloc_rx_buffers(ring_data, clean_count);
+   if (clean_count + unused_count >= RCB_NOF_ALLOC_RX_BUFF_ONCE) {
+   hns_nic_alloc_rx_buffers(ring_data,
+clean_count + unused_count);
clean_count = 0;
+   unused_count = hns_desc_unused(ring);
}
 
/* poll one pkt */
@@ -789,7 +801,7 @@ recv:
/* make all data has been write before submit */
if (recv_pkts < budget) {
ex_num = readl_relaxed(ring->io_base + RCB_REG_FBDNUM);
-
+   ex_num -= unused_count;
if (ex_num > clean_count) {
num += ex_num - clean_count;
rmb(); /*complete read rx ring bd number*/
@@ -799,8 +811,9 @@ recv:
 
 out:
/* make all data has been write before submit */
-   if (clean_count > 0)
-   hns_nic_alloc_rx_buffers(ring_data, clean_count);
+   if (clean_count + unused_count > 0)
+   hns_nic_alloc_rx_buffers(ring_data,
+clean_count + unused_count);
 
return recv_pkts;
 }
-- 
1.7.9.5

[PATCH V2 for-next 0/8] Bug Fixes and Code Improvement in HNS driver

2016-09-29 Thread Salil Mehta

This patch-set introduces fix to some Bugs, potential problems
and code improvements identified during internal review and
testing of Hisilicon Network Subsystem driver.

Submit Change
V1->V2: This addresses the feedbacks provided by David Miller
and Doug Ledford

Daode Huang (6):
  net: hns: bug fix about setting coalsecs-usecs to 0
  net: hns: add fini_process for v2 napi process
  net: hns: delete repeat read fbd num after while
  net: hns: fix the bug of forwarding table
  net: hns: bug fix about broadcast/multicast packets
  net: hns: delete redundant broadcast packet filter process

Kejian Yan (1):
  net: hns: fix port not available after testing loopback

lipeng (1):
  net: hns: fix port unavailable after hnae_reserve_buffer_map fail

 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c  |   11 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c  |   13 ++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h  |2 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c |   10 --
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |1 -
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c  |   16 +++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h  |4 +
 drivers/net/ethernet/hisilicon/hns/hns_enet.c  |  107 +---
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   |7 ++
 9 files changed, 118 insertions(+), 53 deletions(-)

-- 
1.7.9.5

[PATCH V2 for-next 2/8] net: hns: bug fix about setting coalsecs-usecs to 0

2016-09-29 Thread Salil Mehta

From: Daode Huang 

When set rx/tx coalesce usecs to 0, the interrupt coalesce will be
disabled, but there is a interrupt rate limit which set to 1us, it
will cause no interrupt occurs. This patch disable interrupt limit
when sets coalsecs usecs to 0, and restores it to 1 in other case.

Signed-off-by: Daode Huang 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c |   16 
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h |4 
 2 files changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index ef11077..f0ed80d6 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -543,6 +543,22 @@ int hns_rcb_set_coalesce_usecs(
"error: coalesce_usecs setting supports 0~1023us\n");
return -EINVAL;
}
+
+   if (!AE_IS_VER1(rcb_common->dsaf_dev->dsaf_ver)) {
+   if (timeout == 0)
+   /* set timeout to 0, Disable gap time */
+   dsaf_set_reg_field(rcb_common->io_base,
+  RCB_INT_GAP_TIME_REG + port_idx * 4,
+  PPE_INT_GAPTIME_M, PPE_INT_GAPTIME_B,
+  0);
+   else
+   /* set timeout non 0, restore gap time to 1 */
+   dsaf_set_reg_field(rcb_common->io_base,
+  RCB_INT_GAP_TIME_REG + port_idx * 4,
+  PPE_INT_GAPTIME_M, PPE_INT_GAPTIME_B,
+  1);
+   }
+
hns_rcb_set_port_timeout(rcb_common, port_idx, timeout);
return 0;
 }
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
index 4b8b803..878950a 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
@@ -417,6 +417,7 @@
 #define RCB_CFG_OVERTIME_REG   0x9300
 #define RCB_CFG_PKTLINE_INT_NUM_REG0x9304
 #define RCB_CFG_OVERTIME_INT_NUM_REG   0x9308
+#define RCB_INT_GAP_TIME_REG   0x9400
 #define RCB_PORT_CFG_OVERTIME_REG  0x9430
 
 #define RCB_RING_RX_RING_BASEADDR_L_REG0x0
@@ -898,6 +899,9 @@
 #define PPE_CNT_CLR_CE_B   0
 #define PPE_CNT_CLR_SNAP_EN_B  1
 
+#define PPE_INT_GAPTIME_B  0
+#define PPE_INT_GAPTIME_M  0x3ff
+
 #define PPE_COMMON_CNT_CLR_CE_B0
 #define PPE_COMMON_CNT_CLR_SNAP_EN_B   1
 #define RCB_COM_TSO_MODE_B 0
-- 
1.7.9.5

pull-request: wireless-drivers-next 2016-09-29

2016-09-29 Thread Kalle Valo

Hi Dave,

this should be the last wireless-drivers-next pull request for 4.9, from
now on only important bugfixes. Nothing really special stands out,
iwlwifi being most active but other drivers also getting attention. More
details in the signed tag. Please let me know if there are any problems.

Or actually I had one problem. While doing a test merge I noticed that
net-next fails to compile for me, but I don't think this is anything
wireless related:

  CC  net/netfilter/core.o
net/netfilter/core.c: In function 'nf_set_hooks_head':
net/netfilter/core.c:96:149: error: 'struct net_device' has no member named 
'nf_hooks_ingress'

Kalle

The following changes since commit fd9527f404d51e50f40dac0d9a69f2eff3dac33e:

  Merge branch 'ip_tunnel-collect_md' (2016-09-17 10:13:16 -0400)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git 
tags/wireless-drivers-next-for-davem-2016-09-29

for you to fetch changes up to 15b95a15950238eff4d7f24be1716086eea67835:

  Merge ath-next from 
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git (2016-09-28 
16:37:33 +0300)



wireless-drivers-next patches for 4.9

Major changes:

iwlwifi

* work for new hardware support continues
* dynamic queue allocation stabilization
* improvements in the MSIx code
* multiqueue support work continues
* new firmware version support (API 26)
* add 8275 series support
* add 9560 series support
* add support for MU-MIMO sniffer
* add support for RRM by scan
* add support for "reverse" rx packet injection faking hw descriptors
* migrate to devm memory allocation handling
* Remove support for older firmwares (API older than -17 and -22)

wl12xx

* support booting the same rootfs with both wl12xx and wl18xx

hostap

* mark the driver as obsolete

ath9k

* disable RNG by default


Amitkumar Karwar (1):
  mwifiex: code rearrangement in mwifiex_usb_host_to_card()

Arend Van Spriel (4):
  brcmfmac: rework pointer trickery in brcmf_proto_bcdc_query_dcmd()
  brcmfmac: fix memory leak in brcmf_flowring_add_tdls_peer()
  brcmfmac: initialize variable in brcmf_sdiod_regrl()
  brcmfmac: remove worker from .ndo_set_mac_address() callback

Arik Nemtsov (1):
  iwlwifi: move BIOS MCC retrieval to common code

Aviya Erenfeld (1):
  iwlwifi: mvm: add support for MU-MIMO air sniffer

Avrahams Stern (1):
  iwlwifi: mvm: Add support for RRM by scan

Ben Greear (3):
  ath10k: fix typo in logging message
  ath10k: document cycle count related counters
  ath10k: support up to 64 vdevs

Bob Copeland (1):
  mwifiex: fix error handling in mwifiex_create_custom_regdomain

Cathy Luo (2):
  mwifiex: fix kernel crash for USB chipsets
  mwifiex: fix race condition causing tx timeout

Chaehyun Lim (1):
  ath6kl: fix return value in ath6kl_wmi_set_pvb_cmd

Colin Ian King (1):
  mwifiex: fix null pointer deference when adapter is null

Emmanuel Grumbach (4):
  iwlwifi: mvm: bump max API to 26
  iwlwifi: don't export trace points that are used in iwlwifi only
  iwlwifi: mvm: fix typo in TC_CMD_SEC_KEY_FROM_TABLE
  iwlwifi: mvm: initialise ADD_STA before sending it to the firmware

Ganapathi Bhat (1):
  mwifiex: cfg80211 set_default_mgmt_key handler

Haim Dreyfuss (4):
  iwlwifi: pcie: Configure shared interrupt vector in MSIX mode
  iwlwifi: pcie: Set affinity mask for rx interrupt vectors per cpu
  iwlwifi: pcie: replace possible_cpus() with online_cpus() in MSIX mode
  iwlwifi: check for valid ethernet address provided by OEM

Hante Meuleman (5):
  brcmfmac: ignore 11d configuration errors
  brcmfmac: remove unnecessary null pointer check
  brcmfmac: fix clearing entry IPv6 address
  brcmfmac: fix out of bound access on clearing wowl wake indicator
  brcmfmac: simplify mapping of auth type

Ido Yariv (1):
  iwlwifi: mvm: Add mem debugfs entry

Jes Sorensen (4):
  rtl8xxxu: Implement 8192e specific power down sequence
  rtl8xxxu: Fix off by one error calculating pubq
  rtl8xxxu: Clean up llt_init() API
  rtl8xxxu: Use a struct rtl8xxxu_fileops * in rtl8xxxu_init_device()

Joe Perches (2):
  ath10k: spelling and miscellaneous neatening
  rtlwifi: Add switch variable to 'switch case not processed' messages

Johannes Berg (11):
  iwlwifi: mvm: make RSS RX more robust
  iwlwifi: mvm: remove pointless _bh from spinlock in timer
  iwlwifi: mvm: tighten BAID range check
  iwlwifi: mvm: compare full command ID
  iwlwifi: mvm: make iwl_mvm_update_sta() an inline
  iwlwifi: mvm: document passing unexpected Block Ack Request frames
  iwlwifi: mvm: move AP-specific code to right function
  iwlwifi: mvm: use LIST_HEAD() macro
  iwlwifi: pcie: use LIST_HEAD() macro
  iwlwifi: pcie: avoid variable shadowin

Re: [PATCH v2 net-next 4/5] xps_flows: XPS for packets that don't have a socket

2016-09-29 Thread Rick Jones


On 09/29/2016 06:18 AM, Eric Dumazet wrote:

Well, then what this patch series is solving ?

You have a producer of packets running on 8 vcpus in a VM.

Packets are exiting the VM and need to be queued on a mq NIC in the
hypervisor.

Flow X can be scheduled on any of these 8 vcpus, so XPS is currently
selecting different TXQ.


Just for completeness, in my testing, the VMs were single-vCPU.

rick jones

[PATCH net-next 09/10] net: dsa: mv88e6xxx: add set_switch_mac to ops

2016-09-29 Thread Vivien Didelot

Add a set_switch_mac chip-wide function to mv88e6xxx_ops and remove
MV88E6XXX_FLAG_G2_SWITCH_MAC flags.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 28 +---
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  8 ++--
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 83a3769..e40b71b 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2909,14 +2909,11 @@ static int mv88e6xxx_set_addr(struct dsa_switch *ds, u8 
*addr)
struct mv88e6xxx_chip *chip = ds->priv;
int err;
 
+   if (!chip->info->ops->set_switch_mac)
+   return -EOPNOTSUPP;
+
mutex_lock(&chip->reg_lock);
-
-   /* Has an indirect Switch MAC/WoL/WoF register in Global 2? */
-   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_G2_SWITCH_MAC))
-   err = mv88e6xxx_g2_set_switch_mac(chip, addr);
-   else
-   err = mv88e6xxx_g1_set_switch_mac(chip, addr);
-
+   err = chip->info->ops->set_switch_mac(chip, addr);
mutex_unlock(&chip->reg_lock);
 
return err;
@@ -3210,86 +3207,103 @@ static int mv88e6xxx_set_eeprom(struct dsa_switch *ds,
 }
 
 static const struct mv88e6xxx_ops mv88e6085_ops = {
+   .set_switch_mac = mv88e6xxx_g1_set_switch_mac,
.phy_read = mv88e6xxx_phy_ppu_read,
.phy_write = mv88e6xxx_phy_ppu_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6095_ops = {
+   .set_switch_mac = mv88e6xxx_g1_set_switch_mac,
.phy_read = mv88e6xxx_phy_ppu_read,
.phy_write = mv88e6xxx_phy_ppu_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6123_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_read,
.phy_write = mv88e6xxx_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6131_ops = {
+   .set_switch_mac = mv88e6xxx_g1_set_switch_mac,
.phy_read = mv88e6xxx_phy_ppu_read,
.phy_write = mv88e6xxx_phy_ppu_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6161_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_read,
.phy_write = mv88e6xxx_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6165_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_read,
.phy_write = mv88e6xxx_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6171_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6172_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6175_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6176_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6185_ops = {
+   .set_switch_mac = mv88e6xxx_g1_set_switch_mac,
.phy_read = mv88e6xxx_phy_ppu_read,
.phy_write = mv88e6xxx_phy_ppu_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6240_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6320_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6321_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6350_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6351_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6352_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
 };
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 8e12902..d04184c 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -423,7 +423,6 @@ enum mv88e6xxx_cap {

[PATCH net-next 00/10] net: dsa: mv88e6xxx: Global (1) cosmetics

2016-09-29 Thread Vivien Didelot

The Global (1) internal SMI device of Marvell switches is a set of
registers providing support to different units for MAC addresses (ATU),
VLANs (VTU), PHY polling (PPU), etc.

Chips (like 88E6060) may use a different address for it, or have
subtleties in the units (e.g. different number of databases, changing
how registers must be accessed), making it hard to maintain properly.

This patchset is a first step to polish the Global (1) support, with no
functional changes though. Here's basically what it does:

  - add helpers to access Global1 registers (same for Global2)
  - remove a few family checks (VTU/STU FID registers)
  - s/mv88e6xxx_vtu_stu_entry/mv88e6xxx_vtu_entry/
  - add a per-chip mv88e6xxx_ops structure of function pointers:

  struct mv88e6xxx_ops {
int (*get_eeprom)(struct mv88e6xxx_chip *chip,
  struct ethtool_eeprom *eeprom, u8 *data);
int (*set_eeprom)(struct mv88e6xxx_chip *chip,
  struct ethtool_eeprom *eeprom, u8 *data);

int (*set_switch_mac)(struct mv88e6xxx_chip *chip, u8 *addr);

int (*phy_read)(struct mv88e6xxx_chip *chip, int addr, int reg,
u16 *val);
int (*phy_write)(struct mv88e6xxx_chip *chip, int addr, int reg,
 u16 val);
  };

Future patchsets will add ATU/VTU ops, software reset, etc.

Vivien Didelot (10):
  net: dsa: mv88e6xxx: add global1 helpers
  net: dsa: mv88e6xxx: abstract REG_GLOBAL2
  net: dsa: mv88e6xxx: add flags for FID registers
  net: dsa: mv88e6xxx: expose mv88e6xxx_num_databases
  net: dsa: mv88e6xxx: add mv88e6xxx_num_ports helper
  net: dsa: mv88e6xxx: rename mv88e6xxx_vtu_stu_entry
  net: dsa: mv88e6xxx: rename mv88e6xxx_ops
  net: dsa: mv88e6xxx: add chip-wide ops
  net: dsa: mv88e6xxx: add set_switch_mac to ops
  net: dsa: mv88e6xxx: add eeprom ops

 drivers/net/dsa/mv88e6xxx/Makefile|   1 +
 drivers/net/dsa/mv88e6xxx/chip.c  | 804 +++---
 drivers/net/dsa/mv88e6xxx/global1.c   |  34 ++
 drivers/net/dsa/mv88e6xxx/global1.h   |  23 +
 drivers/net/dsa/mv88e6xxx/global2.c   |  86 ++--
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 102 +++--
 6 files changed, 607 insertions(+), 443 deletions(-)
 create mode 100644 drivers/net/dsa/mv88e6xxx/global1.c
 create mode 100644 drivers/net/dsa/mv88e6xxx/global1.h

-- 
2.10.0

[PATCH net-next 03/10] net: dsa: mv88e6xxx: add flags for FID registers

2016-09-29 Thread Vivien Didelot

Add flags to describe the presence of Global 1 ATU FID register (0x01)
and VTU FID register (0x02), instead of checking families.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 16 +++-
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 24 
 2 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 98dee2c..b7eecc9 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -565,16 +565,6 @@ static unsigned int mv88e6xxx_num_databases(struct 
mv88e6xxx_chip *chip)
return chip->info->num_databases;
 }
 
-static bool mv88e6xxx_has_fid_reg(struct mv88e6xxx_chip *chip)
-{
-   /* Does the device have dedicated FID registers for ATU and VTU ops? */
-   if (mv88e6xxx_6097_family(chip) || mv88e6xxx_6165_family(chip) ||
-   mv88e6xxx_6351_family(chip) || mv88e6xxx_6352_family(chip))
-   return true;
-
-   return false;
-}
-
 /* We expect the switch to perform auto negotiation if there is a real
  * phy. However, in the case of a fixed link phy, we force the port
  * settings from the fixed link settings.
@@ -978,7 +968,7 @@ static int _mv88e6xxx_atu_cmd(struct mv88e6xxx_chip *chip, 
u16 fid, u16 cmd)
u16 val;
int err;
 
-   if (mv88e6xxx_has_fid_reg(chip)) {
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_G1_ATU_FID)) {
err = mv88e6xxx_g1_write(chip, GLOBAL_ATU_FID, fid);
if (err)
return err;
@@ -1386,7 +1376,7 @@ static int _mv88e6xxx_vtu_getnext(struct mv88e6xxx_chip 
*chip,
if (err)
return err;
 
-   if (mv88e6xxx_has_fid_reg(chip)) {
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_G1_VTU_FID)) {
err = mv88e6xxx_g1_read(chip, GLOBAL_VTU_FID, &val);
if (err)
return err;
@@ -1498,7 +1488,7 @@ static int _mv88e6xxx_vtu_loadpurge(struct mv88e6xxx_chip 
*chip,
return err;
}
 
-   if (mv88e6xxx_has_fid_reg(chip)) {
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_G1_VTU_FID)) {
reg = entry->fid & GLOBAL_VTU_FID_MASK;
err = mv88e6xxx_g1_write(chip, GLOBAL_VTU_FID, reg);
if (err)
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 2f10108..6c8584f 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -170,8 +170,8 @@
 #define GLOBAL_MAC_01  0x01
 #define GLOBAL_MAC_23  0x02
 #define GLOBAL_MAC_45  0x03
-#define GLOBAL_ATU_FID 0x01/* 6097 6165 6351 6352 */
-#define GLOBAL_VTU_FID 0x02/* 6097 6165 6351 6352 */
+#define GLOBAL_ATU_FID 0x01
+#define GLOBAL_VTU_FID 0x02
 #define GLOBAL_VTU_FID_MASK0xfff
 #define GLOBAL_VTU_SID 0x03/* 6097 6165 6351 6352 */
 #define GLOBAL_VTU_SID_MASK0x3f
@@ -408,6 +408,11 @@ enum mv88e6xxx_cap {
 */
MV88E6XXX_CAP_SERDES,
 
+   /* Switch Global (1) Registers.
+*/
+   MV88E6XXX_CAP_G1_ATU_FID,   /* (0x01) ATU FID Register */
+   MV88E6XXX_CAP_G1_VTU_FID,   /* (0x02) VTU FID Register */
+
/* Switch Global 2 Registers.
 * The device contains a second set of global 16-bit registers.
 */
@@ -460,6 +465,9 @@ enum mv88e6xxx_cap {
 
 #define MV88E6XXX_FLAG_SERDES  BIT_ULL(MV88E6XXX_CAP_SERDES)
 
+#define MV88E6XXX_FLAG_G1_ATU_FID  BIT_ULL(MV88E6XXX_CAP_G1_ATU_FID)
+#define MV88E6XXX_FLAG_G1_VTU_FID  BIT_ULL(MV88E6XXX_CAP_G1_VTU_FID)
+
 #define MV88E6XXX_FLAG_GLOBAL2 BIT_ULL(MV88E6XXX_CAP_GLOBAL2)
 #define MV88E6XXX_FLAG_G2_MGMT_EN_2X   BIT_ULL(MV88E6XXX_CAP_G2_MGMT_EN_2X)
 #define MV88E6XXX_FLAG_G2_MGMT_EN_0X   BIT_ULL(MV88E6XXX_CAP_G2_MGMT_EN_0X)
@@ -519,7 +527,9 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAGS_MULTI_CHIP)
 
 #define MV88E6XXX_FLAGS_FAMILY_6097\
-   (MV88E6XXX_FLAG_GLOBAL2 |   \
+   (MV88E6XXX_FLAG_G1_ATU_FID |\
+MV88E6XXX_FLAG_G1_VTU_FID |\
+MV88E6XXX_FLAG_GLOBAL2 |   \
 MV88E6XXX_FLAG_G2_MGMT_EN_2X | \
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
 MV88E6XXX_FLAG_G2_POT |\
@@ -531,7 +541,9 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAGS_PVT)
 
 #define MV88E6XXX_FLAGS_FAMILY_6165\
-   (MV88E6XXX_FLAG_GLOBAL2 |   \
+   (MV88E6XXX_FLAG_G1_ATU_FID |\
+MV88E6XXX_FLAG_G1_VTU_FID |\
+MV88E6XXX_FLAG_GLOBAL2 |   \
 MV88E6XXX_FLAG_G2_MGMT_EN_2X | \
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
 MV88E6XXX_FLAG_G2_SWITCH_MAC | \
@@ -570,6 +582,8 @@ enum mv88e6xxx_cap {
 
 #define MV88E6XXX_FLAGS_FAMILY_6351\
(MV88E6XXX_FLAG_EDSA |  \
+MV88E6XXX_FLAG_G1_ATU_FID |\
+MV88E6XXX_FLAG_G1

[PATCH net-next 02/10] net: dsa: mv88e6xxx: abstract REG_GLOBAL2

2016-09-29 Thread Vivien Didelot

Similarly to the ports, phys, and Global SMI devices, abstract the SMI
device address of the Global 2 registers in a few g2 static helpers.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/global2.c   | 78 ++-
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  1 -
 2 files changed, 49 insertions(+), 30 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/global2.c 
b/drivers/net/dsa/mv88e6xxx/global2.c
index 99ed028..f31d553 100644
--- a/drivers/net/dsa/mv88e6xxx/global2.c
+++ b/drivers/net/dsa/mv88e6xxx/global2.c
@@ -14,6 +14,28 @@
 #include "mv88e6xxx.h"
 #include "global2.h"
 
+#define ADDR_GLOBAL2   0x1c
+
+static int mv88e6xxx_g2_read(struct mv88e6xxx_chip *chip, int reg, u16 *val)
+{
+   return mv88e6xxx_read(chip, ADDR_GLOBAL2, reg, val);
+}
+
+static int mv88e6xxx_g2_write(struct mv88e6xxx_chip *chip, int reg, u16 val)
+{
+   return mv88e6xxx_write(chip, ADDR_GLOBAL2, reg, val);
+}
+
+static int mv88e6xxx_g2_update(struct mv88e6xxx_chip *chip, int reg, u16 
update)
+{
+   return mv88e6xxx_update(chip, ADDR_GLOBAL2, reg, update);
+}
+
+static int mv88e6xxx_g2_wait(struct mv88e6xxx_chip *chip, int reg, u16 mask)
+{
+   return mv88e6xxx_wait(chip, ADDR_GLOBAL2, reg, mask);
+}
+
 /* Offset 0x06: Device Mapping Table register */
 
 static int mv88e6xxx_g2_device_mapping_write(struct mv88e6xxx_chip *chip,
@@ -21,7 +43,7 @@ static int mv88e6xxx_g2_device_mapping_write(struct 
mv88e6xxx_chip *chip,
 {
u16 val = (target << 8) | (port & 0xf);
 
-   return mv88e6xxx_update(chip, REG_GLOBAL2, GLOBAL2_DEVICE_MAPPING, val);
+   return mv88e6xxx_g2_update(chip, GLOBAL2_DEVICE_MAPPING, val);
 }
 
 static int mv88e6xxx_g2_set_device_mapping(struct mv88e6xxx_chip *chip)
@@ -58,7 +80,7 @@ static int mv88e6xxx_g2_trunk_mask_write(struct 
mv88e6xxx_chip *chip, int num,
if (hask)
val |= GLOBAL2_TRUNK_MASK_HASK;
 
-   return mv88e6xxx_update(chip, REG_GLOBAL2, GLOBAL2_TRUNK_MASK, val);
+   return mv88e6xxx_g2_update(chip, GLOBAL2_TRUNK_MASK, val);
 }
 
 /* Offset 0x08: Trunk Mapping Table register */
@@ -69,7 +91,7 @@ static int mv88e6xxx_g2_trunk_mapping_write(struct 
mv88e6xxx_chip *chip, int id,
const u16 port_mask = BIT(chip->info->num_ports) - 1;
u16 val = (id << 11) | (map & port_mask);
 
-   return mv88e6xxx_update(chip, REG_GLOBAL2, GLOBAL2_TRUNK_MAPPING, val);
+   return mv88e6xxx_g2_update(chip, GLOBAL2_TRUNK_MAPPING, val);
 }
 
 static int mv88e6xxx_g2_clear_trunk(struct mv88e6xxx_chip *chip)
@@ -105,15 +127,15 @@ static int mv88e6xxx_g2_clear_irl(struct mv88e6xxx_chip 
*chip)
/* Init all Ingress Rate Limit resources of all ports */
for (port = 0; port < chip->info->num_ports; ++port) {
/* XXX newer chips (like 88E6390) have different 2-bit ops */
-   err = mv88e6xxx_write(chip, REG_GLOBAL2, GLOBAL2_IRL_CMD,
- GLOBAL2_IRL_CMD_OP_INIT_ALL |
- (port << 8));
+   err = mv88e6xxx_g2_write(chip, GLOBAL2_IRL_CMD,
+GLOBAL2_IRL_CMD_OP_INIT_ALL |
+(port << 8));
if (err)
break;
 
/* Wait for the operation to complete */
-   err = mv88e6xxx_wait(chip, REG_GLOBAL2, GLOBAL2_IRL_CMD,
-GLOBAL2_IRL_CMD_BUSY);
+   err = mv88e6xxx_g2_wait(chip, GLOBAL2_IRL_CMD,
+   GLOBAL2_IRL_CMD_BUSY);
if (err)
break;
}
@@ -128,7 +150,7 @@ static int mv88e6xxx_g2_switch_mac_write(struct 
mv88e6xxx_chip *chip,
 {
u16 val = (pointer << 8) | data;
 
-   return mv88e6xxx_update(chip, REG_GLOBAL2, GLOBAL2_SWITCH_MAC, val);
+   return mv88e6xxx_g2_update(chip, GLOBAL2_SWITCH_MAC, val);
 }
 
 int mv88e6xxx_g2_set_switch_mac(struct mv88e6xxx_chip *chip, u8 *addr)
@@ -151,7 +173,7 @@ static int mv88e6xxx_g2_pot_write(struct mv88e6xxx_chip 
*chip, int pointer,
 {
u16 val = (pointer << 8) | (data & 0x7);
 
-   return mv88e6xxx_update(chip, REG_GLOBAL2, GLOBAL2_PRIO_OVERRIDE, val);
+   return mv88e6xxx_g2_update(chip, GLOBAL2_PRIO_OVERRIDE, val);
 }
 
 static int mv88e6xxx_g2_clear_pot(struct mv88e6xxx_chip *chip)
@@ -174,16 +196,16 @@ static int mv88e6xxx_g2_clear_pot(struct mv88e6xxx_chip 
*chip)
 
 static int mv88e6xxx_g2_eeprom_wait(struct mv88e6xxx_chip *chip)
 {
-   return mv88e6xxx_wait(chip, REG_GLOBAL2, GLOBAL2_EEPROM_CMD,
- GLOBAL2_EEPROM_CMD_BUSY |
- GLOBAL2_EEPROM_CMD_RUNNING);
+   return mv88e6xxx_g2_wait(chip, GLOBAL2_EEPROM_CMD,
+GLOBAL2_EEPROM_CMD_BUSY |
+GLOBAL2_EEPROM_CMD_RUNNING);
 }
 
 static int mv88e6xxx_g2_eeprom_cmd(struct mv88e6xxx_chip *chip, u16 cmd

[PATCH net v2] tg3: Avoid NULL pointer dereference in tg3_io_error_detected()

2016-09-29 Thread Guilherme G. Piccoli

From: Milton Miller 

While the driver is probing the adapter, an error may occur before the
netdev structure is allocated and attached to pci_dev. In this case,
not only netdev isn't available, but the tg3 private structure is also
not available as it is just math from the NULL pointer, so dereferences
must be skipped.

The following trace is seen when the error is triggered:

  [1.402247] Unable to handle kernel paging request for data at address 
0x1a99
  [1.402410] Faulting instruction address: 0xc07e33f8
  [1.402450] Oops: Kernel access of bad area, sig: 11 [#1]
  [1.402481] SMP NR_CPUS=2048 NUMA PowerNV
  [1.402513] Modules linked in:
  [1.402545] CPU: 0 PID: 651 Comm: eehd Not tainted 4.4.0-36-generic #55-Ubuntu
  [1.402591] task: c01fe4e42a20 ti: c01fe4e88000 task.ti: 
c01fe4e88000
  [1.402742] NIP: c07e33f8 LR: c07e3164 CTR: c0595ea0
  [1.402787] REGS: c01fe4e8b790 TRAP: 0300   Not tainted  (4.4.0-36-generic)
  [1.402832] MSR: 90019033   CR: 28000422  
XER: 2000
  [1.403058] CFAR: c0008468 DAR: 1a99 DSISR: 4200 
SOFTE: 1
  GPR00: c07e3164 c01fe4e8ba10 c15c5e00 
  GPR04: 0001  0039 0299
  GPR08:  0001 c01fe4e88000 0006
  GPR12:  cfb4 c00e6558 c03ca1bffd00
  GPR16:    
  GPR20:    c0d52768
  GPR24: c0d52740 0100 c03ca1b52000 0002
  GPR28: 0900  c152a0c0 c03ca1b52000
  [1.404226] NIP [c07e33f8] tg3_io_error_detected+0x308/0x340
  [1.404265] LR [c07e3164] tg3_io_error_detected+0x74/0x340

This patch avoids the NULL pointer dereference by moving the access after
the netdev NULL pointer check on tg3_io_error_detected(). Also, we add a
check for netdev being NULL on tg3_io_resume() [suggested by Michael Chan].

Fixes: 0486a063b1ff ("tg3: prevent ifup/ifdown during PCI error recovery")
Fixes: dfc8f370316b ("net/tg3: Release IRQs on permanent error")
Tested-by: Guilherme G. Piccoli 
Signed-off-by: Milton Miller 
Signed-off-by: Guilherme G. Piccoli 
---

  * v2 changelog: added netdev NULL check on tg3_io_resume() as per
Michael Chan's suggestion.

 drivers/net/ethernet/broadcom/tg3.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c 
b/drivers/net/ethernet/broadcom/tg3.c
index a2551bc..ea967df 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -18122,14 +18122,14 @@ static pci_ers_result_t tg3_io_error_detected(struct 
pci_dev *pdev,
 
rtnl_lock();
 
-   /* We needn't recover from permanent error */
-   if (state == pci_channel_io_frozen)
-   tp->pcierr_recovery = true;
-
/* We probably don't have netdev yet */
if (!netdev || !netif_running(netdev))
goto done;
 
+   /* We needn't recover from permanent error */
+   if (state == pci_channel_io_frozen)
+   tp->pcierr_recovery = true;
+
tg3_phy_stop(tp);
 
tg3_netif_stop(tp);
@@ -18226,7 +18226,7 @@ static void tg3_io_resume(struct pci_dev *pdev)
 
rtnl_lock();
 
-   if (!netif_running(netdev))
+   if (!netdev || !netif_running(netdev))
goto done;
 
tg3_full_lock(tp, 0);
-- 
2.1.0

[PATCH net-next 01/10] net: dsa: mv88e6xxx: add global1 helpers

2016-09-29 Thread Vivien Didelot

The Global (1) internal SMI device is an extended set of registers
containing ATU, PPU, VTU, STU, etc.

It is present on every switches, usually at SMI address 0x1B. But old
models such as 88E6060 access it at address 0xF, thus using REG_GLOBAL
is erroneous.

Add a global1_addr info member used by mv88e6xxx_g1_{read,write} and
mv88e6xxx_g1_wait helpers in a new global1.c file.

This patch finally removes _mv88e6xxx_reg_{read,write}, in favor on the
appropriate helpers. No functional changes here.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/Makefile|   1 +
 drivers/net/dsa/mv88e6xxx/chip.c  | 505 +-
 drivers/net/dsa/mv88e6xxx/global1.c   |  34 +++
 drivers/net/dsa/mv88e6xxx/global1.h   |  23 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |   2 +-
 5 files changed, 306 insertions(+), 259 deletions(-)
 create mode 100644 drivers/net/dsa/mv88e6xxx/global1.c
 create mode 100644 drivers/net/dsa/mv88e6xxx/global1.h

diff --git a/drivers/net/dsa/mv88e6xxx/Makefile 
b/drivers/net/dsa/mv88e6xxx/Makefile
index 6971039..10ce820 100644
--- a/drivers/net/dsa/mv88e6xxx/Makefile
+++ b/drivers/net/dsa/mv88e6xxx/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_NET_DSA_MV88E6XXX) += mv88e6xxx.o
 mv88e6xxx-objs := chip.o
+mv88e6xxx-objs += global1.o
 mv88e6xxx-$(CONFIG_NET_DSA_MV88E6XXX_GLOBAL2) += global2.o
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index b2c25da..98dee2c 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -31,6 +31,7 @@
 #include 
 
 #include "mv88e6xxx.h"
+#include "global1.h"
 #include "global2.h"
 
 static void assert_reg_lock(struct mv88e6xxx_chip *chip)
@@ -361,46 +362,27 @@ int mv88e6xxx_update(struct mv88e6xxx_chip *chip, int 
addr, int reg, u16 update)
return mv88e6xxx_write(chip, addr, reg, val);
 }
 
-static int _mv88e6xxx_reg_read(struct mv88e6xxx_chip *chip, int addr, int reg)
+static int mv88e6xxx_ppu_disable(struct mv88e6xxx_chip *chip)
 {
u16 val;
-   int err;
+   int i, err;
 
-   err = mv88e6xxx_read(chip, addr, reg, &val);
+   err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL, &val);
if (err)
return err;
 
-   return val;
-}
-
-static int _mv88e6xxx_reg_write(struct mv88e6xxx_chip *chip, int addr,
-   int reg, u16 val)
-{
-   return mv88e6xxx_write(chip, addr, reg, val);
-}
-
-static int mv88e6xxx_ppu_disable(struct mv88e6xxx_chip *chip)
-{
-   int ret;
-   int i;
-
-   ret = _mv88e6xxx_reg_read(chip, REG_GLOBAL, GLOBAL_CONTROL);
-   if (ret < 0)
-   return ret;
-
-   ret = _mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_CONTROL,
-  ret & ~GLOBAL_CONTROL_PPU_ENABLE);
-   if (ret)
-   return ret;
+   err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL,
+val & ~GLOBAL_CONTROL_PPU_ENABLE);
+   if (err)
+   return err;
 
for (i = 0; i < 16; i++) {
-   ret = _mv88e6xxx_reg_read(chip, REG_GLOBAL, GLOBAL_STATUS);
-   if (ret < 0)
-   return ret;
+   err = mv88e6xxx_g1_read(chip, GLOBAL_STATUS, &val);
+   if (err)
+   return err;
 
usleep_range(1000, 2000);
-   if ((ret & GLOBAL_STATUS_PPU_MASK) !=
-   GLOBAL_STATUS_PPU_POLLING)
+   if ((val & GLOBAL_STATUS_PPU_MASK) != GLOBAL_STATUS_PPU_POLLING)
return 0;
}
 
@@ -409,25 +391,25 @@ static int mv88e6xxx_ppu_disable(struct mv88e6xxx_chip 
*chip)
 
 static int mv88e6xxx_ppu_enable(struct mv88e6xxx_chip *chip)
 {
-   int ret, err, i;
+   u16 val;
+   int i, err;
 
-   ret = _mv88e6xxx_reg_read(chip, REG_GLOBAL, GLOBAL_CONTROL);
-   if (ret < 0)
-   return ret;
+   err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL, &val);
+   if (err)
+   return err;
 
-   err = _mv88e6xxx_reg_write(chip, REG_GLOBAL, GLOBAL_CONTROL,
-  ret | GLOBAL_CONTROL_PPU_ENABLE);
+   err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL,
+val | GLOBAL_CONTROL_PPU_ENABLE);
if (err)
return err;
 
for (i = 0; i < 16; i++) {
-   ret = _mv88e6xxx_reg_read(chip, REG_GLOBAL, GLOBAL_STATUS);
-   if (ret < 0)
-   return ret;
+   err = mv88e6xxx_g1_read(chip, GLOBAL_STATUS, &val);
+   if (err)
+   return err;
 
usleep_range(1000, 2000);
-   if ((ret & GLOBAL_STATUS_PPU_MASK) ==
-   GLOBAL_STATUS_PPU_POLLING)
+   if ((val & GLOBAL_STATUS_PPU_MASK) == GLOBAL_STATUS_PPU_POLLING)
return 0;
}
 
@@ -663,12 +645,12 @@ out:
 
 static int _mv88e6xxx_stats_wait(struct mv88e6xxx_chip

[PATCH net-next 08/10] net: dsa: mv88e6xxx: add chip-wide ops

2016-09-29 Thread Vivien Didelot

Introduce a mv88e6xxx_ops structure to describe supported chip-wide
functions and assign the correct variant to the chip models.

For the moment, add only PHY access routines. This allows to get rid of
the PHY ops structures and the usage of PHY flags.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 136 +++---
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  28 ---
 2 files changed, 121 insertions(+), 43 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index ad31d3e..83a3769 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -238,10 +238,10 @@ static int mv88e6xxx_phy_read(struct mv88e6xxx_chip 
*chip, int phy,
 {
int addr = phy; /* PHY devices addresses start at 0x0 */
 
-   if (!chip->phy_ops)
+   if (!chip->info->ops->phy_read)
return -EOPNOTSUPP;
 
-   return chip->phy_ops->read(chip, addr, reg, val);
+   return chip->info->ops->phy_read(chip, addr, reg, val);
 }
 
 static int mv88e6xxx_phy_write(struct mv88e6xxx_chip *chip, int phy,
@@ -249,10 +249,10 @@ static int mv88e6xxx_phy_write(struct mv88e6xxx_chip 
*chip, int phy,
 {
int addr = phy; /* PHY devices addresses start at 0x0 */
 
-   if (!chip->phy_ops)
+   if (!chip->info->ops->phy_write)
return -EOPNOTSUPP;
 
-   return chip->phy_ops->write(chip, addr, reg, val);
+   return chip->info->ops->phy_write(chip, addr, reg, val);
 }
 
 static int mv88e6xxx_phy_page_get(struct mv88e6xxx_chip *chip, int phy, u8 
page)
@@ -515,11 +515,6 @@ static int mv88e6xxx_phy_ppu_write(struct mv88e6xxx_chip 
*chip, int addr,
return err;
 }
 
-static const struct mv88e6xxx_bus_ops mv88e6xxx_phy_ppu_ops = {
-   .read = mv88e6xxx_phy_ppu_read,
-   .write = mv88e6xxx_phy_ppu_write,
-};
-
 static bool mv88e6xxx_6065_family(struct mv88e6xxx_chip *chip)
 {
return chip->info->family == MV88E6XXX_FAMILY_6065;
@@ -3214,6 +3209,91 @@ static int mv88e6xxx_set_eeprom(struct dsa_switch *ds,
return err;
 }
 
+static const struct mv88e6xxx_ops mv88e6085_ops = {
+   .phy_read = mv88e6xxx_phy_ppu_read,
+   .phy_write = mv88e6xxx_phy_ppu_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6095_ops = {
+   .phy_read = mv88e6xxx_phy_ppu_read,
+   .phy_write = mv88e6xxx_phy_ppu_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6123_ops = {
+   .phy_read = mv88e6xxx_read,
+   .phy_write = mv88e6xxx_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6131_ops = {
+   .phy_read = mv88e6xxx_phy_ppu_read,
+   .phy_write = mv88e6xxx_phy_ppu_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6161_ops = {
+   .phy_read = mv88e6xxx_read,
+   .phy_write = mv88e6xxx_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6165_ops = {
+   .phy_read = mv88e6xxx_read,
+   .phy_write = mv88e6xxx_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6171_ops = {
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6172_ops = {
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6175_ops = {
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6176_ops = {
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6185_ops = {
+   .phy_read = mv88e6xxx_phy_ppu_read,
+   .phy_write = mv88e6xxx_phy_ppu_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6240_ops = {
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6320_ops = {
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6321_ops = {
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6350_ops = {
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6351_ops = {
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
+static const struct mv88e6xxx_ops mv88e6352_ops = {
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
 static const struct mv88e6xxx_info mv88e6xxx_table[] = {
[MV88E6085] = {
.prod_num = PORT_SWITCH_ID_PROD_NUM_6085,
@@ -3225,6 +3305,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.global1_addr = 0x1b,
.age_time_coeff = 15000,
.flags = MV88E6XXX_FLAGS_F

[PATCH net-next 06/10] net: dsa: mv88e6xxx: rename mv88e6xxx_vtu_stu_entry

2016-09-29 Thread Vivien Didelot

The STU (if the switch has one) is abstracted and accessed through the
VTU operations and data registers.

Thus rename the mv88e6xxx_vtu_stu_entry struct to mv88e6xxx_vtu_entry.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 46 +--
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  5 +---
 2 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 9056d9e..d805661 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1265,7 +1265,7 @@ static int _mv88e6xxx_vtu_stu_flush(struct mv88e6xxx_chip 
*chip)
 }
 
 static int _mv88e6xxx_vtu_stu_data_read(struct mv88e6xxx_chip *chip,
-   struct mv88e6xxx_vtu_stu_entry *entry,
+   struct mv88e6xxx_vtu_entry *entry,
unsigned int nibble_offset)
 {
u16 regs[3];
@@ -1290,19 +1290,19 @@ static int _mv88e6xxx_vtu_stu_data_read(struct 
mv88e6xxx_chip *chip,
 }
 
 static int mv88e6xxx_vtu_data_read(struct mv88e6xxx_chip *chip,
-  struct mv88e6xxx_vtu_stu_entry *entry)
+  struct mv88e6xxx_vtu_entry *entry)
 {
return _mv88e6xxx_vtu_stu_data_read(chip, entry, 0);
 }
 
 static int mv88e6xxx_stu_data_read(struct mv88e6xxx_chip *chip,
-  struct mv88e6xxx_vtu_stu_entry *entry)
+  struct mv88e6xxx_vtu_entry *entry)
 {
return _mv88e6xxx_vtu_stu_data_read(chip, entry, 2);
 }
 
 static int _mv88e6xxx_vtu_stu_data_write(struct mv88e6xxx_chip *chip,
-struct mv88e6xxx_vtu_stu_entry *entry,
+struct mv88e6xxx_vtu_entry *entry,
 unsigned int nibble_offset)
 {
u16 regs[3] = { 0 };
@@ -1327,13 +1327,13 @@ static int _mv88e6xxx_vtu_stu_data_write(struct 
mv88e6xxx_chip *chip,
 }
 
 static int mv88e6xxx_vtu_data_write(struct mv88e6xxx_chip *chip,
-   struct mv88e6xxx_vtu_stu_entry *entry)
+   struct mv88e6xxx_vtu_entry *entry)
 {
return _mv88e6xxx_vtu_stu_data_write(chip, entry, 0);
 }
 
 static int mv88e6xxx_stu_data_write(struct mv88e6xxx_chip *chip,
-   struct mv88e6xxx_vtu_stu_entry *entry)
+   struct mv88e6xxx_vtu_entry *entry)
 {
return _mv88e6xxx_vtu_stu_data_write(chip, entry, 2);
 }
@@ -1345,9 +1345,9 @@ static int _mv88e6xxx_vtu_vid_write(struct mv88e6xxx_chip 
*chip, u16 vid)
 }
 
 static int _mv88e6xxx_vtu_getnext(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_stu_entry *entry)
+ struct mv88e6xxx_vtu_entry *entry)
 {
-   struct mv88e6xxx_vtu_stu_entry next = { 0 };
+   struct mv88e6xxx_vtu_entry next = { 0 };
u16 val;
int err;
 
@@ -1407,7 +1407,7 @@ static int mv88e6xxx_port_vlan_dump(struct dsa_switch 
*ds, int port,
int (*cb)(struct switchdev_obj *obj))
 {
struct mv88e6xxx_chip *chip = ds->priv;
-   struct mv88e6xxx_vtu_stu_entry next;
+   struct mv88e6xxx_vtu_entry next;
u16 pvid;
int err;
 
@@ -1458,7 +1458,7 @@ unlock:
 }
 
 static int _mv88e6xxx_vtu_loadpurge(struct mv88e6xxx_chip *chip,
-   struct mv88e6xxx_vtu_stu_entry *entry)
+   struct mv88e6xxx_vtu_entry *entry)
 {
u16 op = GLOBAL_VTU_OP_VTU_LOAD_PURGE;
u16 reg = 0;
@@ -1507,9 +1507,9 @@ loadpurge:
 }
 
 static int _mv88e6xxx_stu_getnext(struct mv88e6xxx_chip *chip, u8 sid,
- struct mv88e6xxx_vtu_stu_entry *entry)
+ struct mv88e6xxx_vtu_entry *entry)
 {
-   struct mv88e6xxx_vtu_stu_entry next = { 0 };
+   struct mv88e6xxx_vtu_entry next = { 0 };
u16 val;
int err;
 
@@ -1549,7 +1549,7 @@ static int _mv88e6xxx_stu_getnext(struct mv88e6xxx_chip 
*chip, u8 sid,
 }
 
 static int _mv88e6xxx_stu_loadpurge(struct mv88e6xxx_chip *chip,
-   struct mv88e6xxx_vtu_stu_entry *entry)
+   struct mv88e6xxx_vtu_entry *entry)
 {
u16 reg = 0;
int err;
@@ -1652,7 +1652,7 @@ static int _mv88e6xxx_port_fid_set(struct mv88e6xxx_chip 
*chip,
 static int _mv88e6xxx_fid_new(struct mv88e6xxx_chip *chip, u16 *fid)
 {
DECLARE_BITMAP(fid_bitmap, MV88E6XXX_N_FID);
-   struct mv88e6xxx_vtu_stu_entry vlan;
+   struct mv88e6xxx_vtu_entry vlan;
int i, err;
 
bitmap_zero(fid_bitmap, MV88E6XXX_N_FID);
@@ -1694,10 +1694,10 @@ static int _mv88e6xxx_fid_new(struct mv88e6xxx_chip 
*chip, u16 *fid)
 }
 
 static int _mv88e6xxx_vtu_new(struct

[PATCH net-next 10/10] net: dsa: mv88e6xxx: add eeprom ops

2016-09-29 Thread Vivien Didelot

Remove EEPROM flags in favor of new {get,set}_eeprom chip-wide
functions in the mv88e6xxx_ops structure.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 34 +-
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 16 +---
 2 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index e40b71b..883fd98 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3168,13 +3168,11 @@ static int mv88e6xxx_get_eeprom(struct dsa_switch *ds,
struct mv88e6xxx_chip *chip = ds->priv;
int err;
 
+   if (!chip->info->ops->get_eeprom)
+   return -EOPNOTSUPP;
+
mutex_lock(&chip->reg_lock);
-
-   if (mv88e6xxx_has(chip, MV88E6XXX_FLAGS_EEPROM16))
-   err = mv88e6xxx_g2_get_eeprom16(chip, eeprom, data);
-   else
-   err = -EOPNOTSUPP;
-
+   err = chip->info->ops->get_eeprom(chip, eeprom, data);
mutex_unlock(&chip->reg_lock);
 
if (err)
@@ -3191,16 +3189,14 @@ static int mv88e6xxx_set_eeprom(struct dsa_switch *ds,
struct mv88e6xxx_chip *chip = ds->priv;
int err;
 
+   if (!chip->info->ops->set_eeprom)
+   return -EOPNOTSUPP;
+
if (eeprom->magic != 0xc3ec4951)
return -EINVAL;
 
mutex_lock(&chip->reg_lock);
-
-   if (mv88e6xxx_has(chip, MV88E6XXX_FLAGS_EEPROM16))
-   err = mv88e6xxx_g2_set_eeprom16(chip, eeprom, data);
-   else
-   err = -EOPNOTSUPP;
-
+   err = chip->info->ops->set_eeprom(chip, eeprom, data);
mutex_unlock(&chip->reg_lock);
 
return err;
@@ -3249,6 +3245,8 @@ static const struct mv88e6xxx_ops mv88e6171_ops = {
 };
 
 static const struct mv88e6xxx_ops mv88e6172_ops = {
+   .get_eeprom = mv88e6xxx_g2_get_eeprom16,
+   .set_eeprom = mv88e6xxx_g2_set_eeprom16,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
@@ -3261,6 +3259,8 @@ static const struct mv88e6xxx_ops mv88e6175_ops = {
 };
 
 static const struct mv88e6xxx_ops mv88e6176_ops = {
+   .get_eeprom = mv88e6xxx_g2_get_eeprom16,
+   .set_eeprom = mv88e6xxx_g2_set_eeprom16,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
@@ -3273,18 +3273,24 @@ static const struct mv88e6xxx_ops mv88e6185_ops = {
 };
 
 static const struct mv88e6xxx_ops mv88e6240_ops = {
+   .get_eeprom = mv88e6xxx_g2_get_eeprom16,
+   .set_eeprom = mv88e6xxx_g2_set_eeprom16,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6320_ops = {
+   .get_eeprom = mv88e6xxx_g2_get_eeprom16,
+   .set_eeprom = mv88e6xxx_g2_set_eeprom16,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
 };
 
 static const struct mv88e6xxx_ops mv88e6321_ops = {
+   .get_eeprom = mv88e6xxx_g2_get_eeprom16,
+   .set_eeprom = mv88e6xxx_g2_set_eeprom16,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
@@ -3303,6 +3309,8 @@ static const struct mv88e6xxx_ops mv88e6351_ops = {
 };
 
 static const struct mv88e6xxx_ops mv88e6352_ops = {
+   .get_eeprom = mv88e6xxx_g2_get_eeprom16,
+   .set_eeprom = mv88e6xxx_g2_set_eeprom16,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
.phy_write = mv88e6xxx_g2_smi_phy_write,
@@ -3825,7 +3833,7 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
if (IS_ERR(chip->reset))
return PTR_ERR(chip->reset);
 
-   if (mv88e6xxx_has(chip, MV88E6XXX_FLAGS_EEPROM16) &&
+   if (chip->info->ops->get_eeprom &&
!of_property_read_u32(np, "eeprom-length", &eeprom_len))
chip->eeprom_len = eeprom_len;
 
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index d04184c..e572121 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -424,8 +424,6 @@ enum mv88e6xxx_cap {
MV88E6XXX_CAP_G2_PVT_ADDR,  /* (0x0b) Cross Chip Port VLAN Addr */
MV88E6XXX_CAP_G2_PVT_DATA,  /* (0x0c) Cross Chip Port VLAN Data */
MV88E6XXX_CAP_G2_POT,   /* (0x0f) Priority Override Table */
-   MV88E6XXX_CAP_G2_EEPROM_CMD,/* (0x14) EEPROM Command */
-   MV88E6XXX_CAP_G2_EEPROM_DATA,   /* (0x15) EEPROM Data */
 
/* PHY Polling Unit.
 * See GLOBAL_CONTROL_PPU_ENABLE and GLOBAL_STATUS_PPU_POLLING.
@@ -473,8 +471,6 @@ en

[PATCH net-next 04/10] net: dsa: mv88e6xxx: expose mv88e6xxx_num_databases

2016-09-29 Thread Vivien Didelot

The mv88e6xxx_num_databases will be used by shared code, so move it
inline to the header file.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 5 -
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index b7eecc9..6a55bba 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -560,11 +560,6 @@ static bool mv88e6xxx_6352_family(struct mv88e6xxx_chip 
*chip)
return chip->info->family == MV88E6XXX_FAMILY_6352;
 }
 
-static unsigned int mv88e6xxx_num_databases(struct mv88e6xxx_chip *chip)
-{
-   return chip->info->num_databases;
-}
-
 /* We expect the switch to perform auto negotiation if there is a real
  * phy. However, in the case of a fixed link phy, we force the port
  * settings from the fixed link settings.
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 6c8584f..20fe6c6 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -732,6 +732,11 @@ static inline bool mv88e6xxx_has(struct mv88e6xxx_chip 
*chip,
return (chip->info->flags & flags) == flags;
 }
 
+static inline unsigned int mv88e6xxx_num_databases(struct mv88e6xxx_chip *chip)
+{
+   return chip->info->num_databases;
+}
+
 int mv88e6xxx_read(struct mv88e6xxx_chip *chip, int addr, int reg, u16 *val);
 int mv88e6xxx_write(struct mv88e6xxx_chip *chip, int addr, int reg, u16 val);
 int mv88e6xxx_update(struct mv88e6xxx_chip *chip, int addr, int reg,
-- 
2.10.0

[PATCH net-next 07/10] net: dsa: mv88e6xxx: rename mv88e6xxx_ops

2016-09-29 Thread Vivien Didelot

The mv88e6xxx_ops is used to describe how to access the chip registers.
It can be through SMI (via an MDIO bus), or via another interface such
as crafted remote management frames.

The correct BUS operations structure is chosen at runtime, depending on
the chip address and connectivity.

We will need the mv88e6xxx_ops name for future chip-wide operation
structure, thus rename mv88e6xxx_ops to more explicit mv88e6xxx_bus_ops.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 10 +-
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  8 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index d805661..ad31d3e 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -98,7 +98,7 @@ static int mv88e6xxx_smi_single_chip_write(struct 
mv88e6xxx_chip *chip,
return 0;
 }
 
-static const struct mv88e6xxx_ops mv88e6xxx_smi_single_chip_ops = {
+static const struct mv88e6xxx_bus_ops mv88e6xxx_smi_single_chip_ops = {
.read = mv88e6xxx_smi_single_chip_read,
.write = mv88e6xxx_smi_single_chip_write,
 };
@@ -180,7 +180,7 @@ static int mv88e6xxx_smi_multi_chip_write(struct 
mv88e6xxx_chip *chip,
return 0;
 }
 
-static const struct mv88e6xxx_ops mv88e6xxx_smi_multi_chip_ops = {
+static const struct mv88e6xxx_bus_ops mv88e6xxx_smi_multi_chip_ops = {
.read = mv88e6xxx_smi_multi_chip_read,
.write = mv88e6xxx_smi_multi_chip_write,
 };
@@ -515,7 +515,7 @@ static int mv88e6xxx_phy_ppu_write(struct mv88e6xxx_chip 
*chip, int addr,
return err;
 }
 
-static const struct mv88e6xxx_ops mv88e6xxx_phy_ppu_ops = {
+static const struct mv88e6xxx_bus_ops mv88e6xxx_phy_ppu_ops = {
.read = mv88e6xxx_phy_ppu_read,
.write = mv88e6xxx_phy_ppu_write,
 };
@@ -3479,12 +3479,12 @@ static struct mv88e6xxx_chip 
*mv88e6xxx_alloc_chip(struct device *dev)
return chip;
 }
 
-static const struct mv88e6xxx_ops mv88e6xxx_g2_smi_phy_ops = {
+static const struct mv88e6xxx_bus_ops mv88e6xxx_g2_smi_phy_ops = {
.read = mv88e6xxx_g2_smi_phy_read,
.write = mv88e6xxx_g2_smi_phy_write,
 };
 
-static const struct mv88e6xxx_ops mv88e6xxx_phy_ops = {
+static const struct mv88e6xxx_bus_ops mv88e6xxx_phy_ops = {
.read = mv88e6xxx_read,
.write = mv88e6xxx_write,
 };
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index ee7873a..b9ef769 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -648,7 +648,7 @@ struct mv88e6xxx_vtu_entry {
u8  data[DSA_MAX_PORTS];
 };
 
-struct mv88e6xxx_ops;
+struct mv88e6xxx_bus_ops;
 
 struct mv88e6xxx_priv_port {
struct net_device *bridge_dev;
@@ -669,14 +669,14 @@ struct mv88e6xxx_chip {
/* The MII bus and the address on the bus that is used to
 * communication with the switch
 */
-   const struct mv88e6xxx_ops *smi_ops;
+   const struct mv88e6xxx_bus_ops *smi_ops;
struct mii_bus *bus;
int sw_addr;
 
/* Handles automatic disabling and re-enabling of the PHY
 * polling unit.
 */
-   const struct mv88e6xxx_ops *phy_ops;
+   const struct mv88e6xxx_bus_ops *phy_ops;
struct mutexppu_mutex;
int ppu_disabled;
struct work_struct  ppu_work;
@@ -705,7 +705,7 @@ struct mv88e6xxx_chip {
struct mii_bus *mdio_bus;
 };
 
-struct mv88e6xxx_ops {
+struct mv88e6xxx_bus_ops {
int (*read)(struct mv88e6xxx_chip *chip, int addr, int reg, u16 *val);
int (*write)(struct mv88e6xxx_chip *chip, int addr, int reg, u16 val);
 };
-- 
2.10.0

[PATCH net-next 05/10] net: dsa: mv88e6xxx: add mv88e6xxx_num_ports helper

2016-09-29 Thread Vivien Didelot

Add an mv88e6xxx_num_ports helper instead of digging in the chip info
structure.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 30 +++---
 drivers/net/dsa/mv88e6xxx/global2.c   |  8 
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  5 +
 3 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 6a55bba..9056d9e 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -613,7 +613,7 @@ static void mv88e6xxx_adjust_link(struct dsa_switch *ds, 
int port,
reg |= PORT_PCS_CTRL_DUPLEX_FULL;
 
if ((mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip)) &&
-   (port >= chip->info->num_ports - 2)) {
+   (port >= mv88e6xxx_num_ports(chip) - 2)) {
if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID)
reg |= PORT_PCS_CTRL_RGMII_DELAY_RXCLK;
if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)
@@ -1112,7 +1112,7 @@ static int _mv88e6xxx_port_state(struct mv88e6xxx_chip 
*chip, int port,
 static int _mv88e6xxx_port_based_vlan_map(struct mv88e6xxx_chip *chip, int 
port)
 {
struct net_device *bridge = chip->ports[port].bridge_dev;
-   const u16 mask = (1 << chip->info->num_ports) - 1;
+   const u16 mask = (1 << mv88e6xxx_num_ports(chip)) - 1;
struct dsa_switch *ds = chip->ds;
u16 output_ports = 0;
u16 reg;
@@ -1123,7 +1123,7 @@ static int _mv88e6xxx_port_based_vlan_map(struct 
mv88e6xxx_chip *chip, int port)
if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)) {
output_ports = mask;
} else {
-   for (i = 0; i < chip->info->num_ports; ++i) {
+   for (i = 0; i < mv88e6xxx_num_ports(chip); ++i) {
/* allow sending frames to every group member */
if (bridge && chip->ports[i].bridge_dev == bridge)
output_ports |= BIT(i);
@@ -1279,7 +1279,7 @@ static int _mv88e6xxx_vtu_stu_data_read(struct 
mv88e6xxx_chip *chip,
return err;
}
 
-   for (i = 0; i < chip->info->num_ports; ++i) {
+   for (i = 0; i < mv88e6xxx_num_ports(chip); ++i) {
unsigned int shift = (i % 4) * 4 + nibble_offset;
u16 reg = regs[i / 4];
 
@@ -1308,7 +1308,7 @@ static int _mv88e6xxx_vtu_stu_data_write(struct 
mv88e6xxx_chip *chip,
u16 regs[3] = { 0 };
int i, err;
 
-   for (i = 0; i < chip->info->num_ports; ++i) {
+   for (i = 0; i < mv88e6xxx_num_ports(chip); ++i) {
unsigned int shift = (i % 4) * 4 + nibble_offset;
u8 data = entry->data[i];
 
@@ -1658,7 +1658,7 @@ static int _mv88e6xxx_fid_new(struct mv88e6xxx_chip 
*chip, u16 *fid)
bitmap_zero(fid_bitmap, MV88E6XXX_N_FID);
 
/* Set every FID bit used by the (un)bridged ports */
-   for (i = 0; i < chip->info->num_ports; ++i) {
+   for (i = 0; i < mv88e6xxx_num_ports(chip); ++i) {
err = _mv88e6xxx_port_fid_get(chip, i, fid);
if (err)
return err;
@@ -1708,7 +1708,7 @@ static int _mv88e6xxx_vtu_new(struct mv88e6xxx_chip 
*chip, u16 vid,
return err;
 
/* exclude all ports except the CPU and DSA ports */
-   for (i = 0; i < chip->info->num_ports; ++i)
+   for (i = 0; i < mv88e6xxx_num_ports(chip); ++i)
vlan.data[i] = dsa_is_cpu_port(ds, i) || dsa_is_dsa_port(ds, i)
? GLOBAL_VTU_DATA_MEMBER_TAG_UNMODIFIED
: GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER;
@@ -1797,7 +1797,7 @@ static int mv88e6xxx_port_check_hw_vlan(struct dsa_switch 
*ds, int port,
if (vlan.vid > vid_end)
break;
 
-   for (i = 0; i < chip->info->num_ports; ++i) {
+   for (i = 0; i < mv88e6xxx_num_ports(chip); ++i) {
if (dsa_is_dsa_port(ds, i) || dsa_is_cpu_port(ds, i))
continue;
 
@@ -1959,7 +1959,7 @@ static int _mv88e6xxx_port_vlan_del(struct mv88e6xxx_chip 
*chip,
 
/* keep the VLAN unless all ports are excluded */
vlan.valid = false;
-   for (i = 0; i < chip->info->num_ports; ++i) {
+   for (i = 0; i < mv88e6xxx_num_ports(chip); ++i) {
if (dsa_is_cpu_port(ds, i) || dsa_is_dsa_port(ds, i))
continue;
 
@@ -2340,7 +2340,7 @@ static int mv88e6xxx_port_bridge_join(struct dsa_switch 
*ds, int port,
/* Assign the bridge and remap each port's VLANTable */
chip->ports[port].bridge_dev = bridge;
 
-   for (i = 0; i < chip->info->num_ports; ++i) {
+   for (i = 0; i < mv88e6xxx_num_ports(chip); ++i) {
if (chip->ports[i].bridge_dev == bridge) {
err = _mv88e6xxx_port_based_vlan_ma

Re: [PATCH RFC net-next] bnx2x: avoid printing unnecessary messages during register dump

2016-09-29 Thread Guilherme G. Piccoli

On 09/27/2016 11:43 PM, David Miller wrote:
> From: "Guilherme G. Piccoli" 
> Date: Tue, 27 Sep 2016 15:33:54 -0300
> 
>> The bnx2x driver prints multiple error messages during register dump,
>> with "ethtool -d" for example. The driver even warn that many messages
>> might be seen during the register dump, but they are harmless. A typical
>> kernel log after register dump looks like this:
>>
>>   [9.375] bnx2x: [bnx2x_get_regs:987(net0)]Generating register dump. Might 
>> trigger harmless GRC timeouts
>>   [9.439] bnx2x: [bnx2x_attn_int_deasserted3:4342(net0)]LATCHED attention 
>> 0x0400 (masked)
>>   [9.439] bnx2x: [bnx2x_attn_int_deasserted3:4346(net0)]GRC time-out 
>> 0x010580cd
>>   [...]
>>
>> The notation [...] means that some messages were supressed - in our
>> tests we saw 78 more "LATCHED attention" and "GRC time-out" messages,
>> supressed here.
>>
>> This patch avoid these messages to be printed on register dump instead
>> of just warn they are harmless.
>>
>> Signed-off-by: Guilherme G. Piccoli 
> 
> Although "ethtool -d" is really a debugging facility, I still think that
> serious care should be placed into arranging what gets dumped in such
> a way that such access timeouts and errors are minimized.
> 

David, thanks for your comment. I confess I didn't understand your
statement quite well. You say we shouldn't dump registers that will
cause timeouts, that's it?

If yes, I guess this is a valid point. We will however loose some debug
information (as you mentioned, 'ethtool -d' is a debug facility). Now,
since I'm no expert in QLogic adapter hw/fw, I want to ask Yuval/Ariel
why those timeouts are hit anyway. Are they completely harmless?

In my understanding/opinion, hiding the messages entirely (as this patch
does) OR avoid the timeouts by disabling some registers' dump are both
better alternatives than the current behavior of the driver.

Thanks,

Guilherme

Re: [PATCH v2 3/3] net: make net namespace sysctls belong to container's owner

2016-09-29 Thread Dmitry Torokhov

Hi David,

On Wed, Aug 10, 2016 at 2:36 PM, Dmitry Torokhov
 wrote:
> If net namespace is attached to a user namespace let's make container's
> root owner of sysctls affecting said network namespace instead of global
> root.
>
> This also allows us to clean up net_ctl_permissions() because we do not
> need to fudge permissions anymore for the container's owner since it now
> owns the objects in question.
>
> Acked-by: "Eric W. Biederman" 
> Signed-off-by: Dmitry Torokhov 

I was looking at linux-next today, and I noticed that, when you merged
my patch, you basically reverted the following commit:

commit d6e0d306449bcb5fa3c80e7a3edf11d45abf9ae9
Author: Tyler Hicks 
Date:   Thu Jun 2 23:43:22 2016 -0500

net: Use ns_capable_noaudit() when determining net sysctl permissions

The capability check should not be audited since it is only being used
to determine the inode permissions. A failed check does not indicate a
violation of security policy but, when an LSM is enabled, a denial audit
message was being generated.

The denial audit message caused confusion for some application authors
because root-running Go applications always triggered the denial. To
prevent this confusion, the capability check in net_ctl_permissions() is
switched to the noaudit variant.

BugLink: https://launchpad.net/bugs/1465724

Signed-off-by: Tyler Hicks 
Acked-by: Serge E. Hallyn 
Signed-off-by: James Morris 

Thanks!

> ---
>  fs/proc/proc_sysctl.c  |  5 +
>  include/linux/sysctl.h |  4 
>  net/sysctl_net.c   | 29 -
>  3 files changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index 5e57c3e..28f9085 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -430,6 +430,7 @@ static int sysctl_perm(struct ctl_table_header *head, 
> struct ctl_table *table, i
>  static struct inode *proc_sys_make_inode(struct super_block *sb,
> struct ctl_table_header *head, struct ctl_table *table)
>  {
> +   struct ctl_table_root *root = head->root;
> struct inode *inode;
> struct proc_inode *ei;
>
> @@ -457,6 +458,10 @@ static struct inode *proc_sys_make_inode(struct 
> super_block *sb,
> if (is_empty_dir(head))
> make_empty_dir_inode(inode);
> }
> +
> +   if (root->set_ownership)
> +   root->set_ownership(head, table, &inode->i_uid, 
> &inode->i_gid);
> +
>  out:
> return inode;
>  }
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index fa7bc29..55bec2f 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  /* For the /proc/sys support */
> @@ -156,6 +157,9 @@ struct ctl_table_root {
> struct ctl_table_set default_set;
> struct ctl_table_set *(*lookup)(struct ctl_table_root *root,
>struct nsproxy *namespaces);
> +   void (*set_ownership)(struct ctl_table_header *head,
> + struct ctl_table *table,
> + kuid_t *uid, kgid_t *gid);
> int (*permissions)(struct ctl_table_header *head, struct ctl_table 
> *table);
>  };
>
> diff --git a/net/sysctl_net.c b/net/sysctl_net.c
> index ed98c1f..5bc1a3d 100644
> --- a/net/sysctl_net.c
> +++ b/net/sysctl_net.c
> @@ -42,26 +42,37 @@ static int net_ctl_permissions(struct ctl_table_header 
> *head,
>struct ctl_table *table)
>  {
> struct net *net = container_of(head->set, struct net, sysctls);
> -   kuid_t root_uid = make_kuid(net->user_ns, 0);
> -   kgid_t root_gid = make_kgid(net->user_ns, 0);
>
> /* Allow network administrator to have same access as root. */
> -   if (ns_capable(net->user_ns, CAP_NET_ADMIN) ||
> -   uid_eq(root_uid, current_euid())) {
> +   if (ns_capable(net->user_ns, CAP_NET_ADMIN)) {
> int mode = (table->mode >> 6) & 7;
> return (mode << 6) | (mode << 3) | mode;
> }
> -   /* Allow netns root group to have the same access as the root group */
> -   if (in_egroup_p(root_gid)) {
> -   int mode = (table->mode >> 3) & 7;
> -   return (mode << 3) | mode;
> -   }
> +
> return table->mode;
>  }
>
> +static void net_ctl_set_ownership(struct ctl_table_header *head,
> + struct ctl_table *table,
> + kuid_t *uid, kgid_t *gid)
> +{
> +   struct net *net = container_of(head->set, struct net, sysctls);
> +   kuid_t ns_root_uid;
> +   kgid_t ns_root_gid;
> +
> +   ns_root_uid = make_kuid(net->user_ns, 0);
> +   if (uid_valid(ns_root_uid))
> +   *uid = ns_root_uid;
> +
> +   ns_root_gid = make_kgid(net->user_ns, 0);
> +   if (gid_valid(ns_root_gid))

Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2

2016-09-29 Thread James Chapman

On 29/09/16 03:36, R. Parameswaran wrote:
> I agree that something like 2. below would be needed in the long run (it 
> will need some effort and redesign -e.g. how do I lookup the parent tunnel 
> from the socket when receiving a PMTU update, existing pointer chain runs 
> from tunnel to socket).
>> 2) Add code to handle PMTU events that land on the UDP tunnel
>>socket.

Another function pointer could be added to struct udp_sock, similar to
encap_rcv,  such that the pmtu event could be handled by the UDP encap
protocol implementation.

James

Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2

2016-09-29 Thread James Chapman

On 22/09/16 21:52, R. Parameswaran wrote:
> From ed585bdd6d3d2b3dec58d414f514cd764d89159d Mon Sep 17 00:00:00 2001
> From: "R. Parameswaran" 
> Date: Thu, 22 Sep 2016 13:19:25 -0700
> Subject: [PATCH] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
>
> Take into account all of the tunnel encapsulation headers when setting
> up the MTU on the L2TP logical interface device. Otherwise, packets
> created by the applications on top of the L2TP layer are larger
> than they ought to be, relative to the underlay MTU, leading to
> needless fragmentation once the outer IP encap is added.
>
> Specifically, take into account the (outer, underlay) IP header
> imposed on the encapsulated L2TP packet, and the Layer 2 header
> imposed on the inner IP packet prior to L2TP encapsulation.
>
> Do not assume an Ethernet (non-jumbo) underlay. Use the PMTU mechanism
> and the dst entry in the L2TP tunnel socket to directly pull up
> the underlay MTU (as the baseline number on top of which the
> encapsulation headers are factored in).  Fall back to Ethernet MTU
> if this fails.
>
> Signed-off-by: R. Parameswaran 
>
> Reviewed-by: "N. Prachanda" ,
> Reviewed-by: "R. Shearman" ,
> Reviewed-by: "D. Fawcus" 
> ---
>  net/l2tp/l2tp_eth.c | 48 
>  1 file changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
> index 57fc5a4..dbcd6bd 100644
> --- a/net/l2tp/l2tp_eth.c
> +++ b/net/l2tp/l2tp_eth.c
> @@ -30,6 +30,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  
>  #include "l2tp_core.h"
>  
> @@ -206,6 +209,46 @@ static void l2tp_eth_show(struct seq_file *m, void *arg)
>  }
>  #endif
>  
> +static void l2tp_eth_adjust_mtu(struct l2tp_tunnel *tunnel,
> + struct l2tp_session *session,
> + struct net_device *dev)
> +{
> + unsigned int overhead = 0;
> + struct dst_entry *dst;
> +
> + if (session->mtu != 0) {
> + dev->mtu = session->mtu;
> + dev->needed_headroom += session->hdr_len;
> + if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
> + dev->needed_headroom += sizeof(struct udphdr);
> + return;
> + }
> + overhead = session->hdr_len;
> + /* Adjust MTU, factor overhead - underlay L3 hdr, overlay L2 hdr*/
> + if (tunnel->sock->sk_family == AF_INET)
> + overhead += (ETH_HLEN + sizeof(struct iphdr));
> + else if (tunnel->sock->sk_family == AF_INET6)
> + overhead += (ETH_HLEN + sizeof(struct ipv6hdr));
What about options in the IP header? If certain options are set on the
socket, the IP header may be larger.

> + /* Additionally, if the encap is UDP, account for UDP header size */
> + if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
> + overhead += sizeof(struct udphdr);
> + /* If PMTU discovery was enabled, use discovered MTU on L2TP device */
> + dst = sk_dst_get(tunnel->sock);
> + if (dst) {
> + u32 pmtu = dst_mtu(dst);
> +
> + if (pmtu != 0)
> + dev->mtu = pmtu;
> + dst_release(dst);
> + }
> + /* else (no PMTUD) L2TP dev MTU defaulted to Ethernet MTU in caller */
> + session->mtu = dev->mtu - overhead;
> + dev->mtu = session->mtu;
> + dev->needed_headroom += session->hdr_len;
> + if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
> + dev->needed_headroom += sizeof(struct udphdr);
> +}
> +
>  static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, 
> u32 peer_session_id, struct l2tp_session_cfg *cfg)
>  {
>   struct net_device *dev;
> @@ -255,11 +298,8 @@ static int l2tp_eth_create(struct net *net, u32 
> tunnel_id, u32 session_id, u32 p
>   }
>  
>   dev_net_set(dev, net);
> - if (session->mtu == 0)
> - session->mtu = dev->mtu - session->hdr_len;
> - dev->mtu = session->mtu;
> - dev->needed_headroom += session->hdr_len;
>  
> + l2tp_eth_adjust_mtu(tunnel, session, dev);
>   priv = netdev_priv(dev);
>   priv->dev = dev;
>   priv->session = session;

Re: [PATCH v2 net-next 4/5] xps_flows: XPS for packets that don't have a socket

2016-09-29 Thread Eric Dumazet

On Thu, 2016-09-29 at 07:51 -0700, Eric Dumazet wrote:
> On Thu, 2016-09-29 at 10:08 -0400, Tom Herbert wrote:
> 
> > It addresses  the issue that Rick Jones pointed out was happening with
> > XPS. When packets are sent for a flow that has no socket and XPS is
> > enabled then each packet uses the XPS queue based on the running CPU.
> > Since the thread sending on a flow can be rescheduled on different
> > CPUs this is creating ooo packets. In this case the ooo is being
> > caused by interaction with XPS.
> > 
> 
> Nope, your patch does not address the problem properly.
> 
> I am not sure I want to spend more time explaining the issue.
> 
> Lets talk about this in Tokyo next week.
> 

Just as a reminder, sorry to bother you, stating some obvious facts for
both of us. We have public exchanges, so we also need to re-explain how
things work.

Queue selection on xmit happens before we hit the qdisc and its delays.

So when you access txq->dql.num_completed_ops and
txq->dql.num_enqueue_ops you can observe values that do not change for a
while.

Say a thread runs on a VM, and sends 2 packets P1, P2 on the same flow
(skb_get_hash() returns the same value for these 2 packets)

P1 is sent on behalf of CPU 1, we pickup queue txq1, and queue the
packet on its qdisc . Transmit does not happen because of some
constraints like rate limiting or scheduling constraints.

P2 is sent on behalf of CPU 2, we pickup queue txq2, notice that prior
packet chose txq1. We check txq1->dql and decide it is fine to use txq2,
since the dql params of txq1 were not changed yet.

( txq->dql.num_completed_ops == ent.queue_ptr )

Note that in RFS case, we have the guarantee that we observe 'live
queues' since they are the per cpu backlog.

So input_queue_head_incr() and input_queue_tail_incr_save() are
correctly doing the OOO prevention, because a queued packet immediately
changes the state.

So really your patch works if you have no qdisc, or a non congested
qdisc. (Think if P1 is dropped by a full pfifo or pfifo_fast : We really
want to avoid steering P2, P3, ..., PN on this full pfifo while maybe
other txq are idle). Strange attractors are back (check commit
9b462d02d6dd6 )

You could avoid (ab)using BQL with a different method, grabbing
skb->destructor for the packets that are socketless

The hash table would simply track the sum of skb->truesize to allow flow
migration. This would be self contained and not intrusive.

Re: [PATCH net-next v3 2/3] udp: implement memory accounting helpers

2016-09-29 Thread Paolo Abeni

On Thu, 2016-09-29 at 07:49 -0700, Eric Dumazet wrote:
> On Thu, Sep 29, 2016 at 7:34 AM, Paolo Abeni  wrote:
> > On Thu, 2016-09-29 at 07:13 -0700, Eric Dumazet wrote:
> >> On Thu, 2016-09-29 at 16:01 +0200, Paolo Abeni wrote:
> >>
> >> > When we reach __sk_mem_reduce_allocated() we are sure we can free the
> >> > specified amount of memory, so we only need to ensure consistent
> >> > sk_prot->memory_allocated updates. The current atomic operation suffices
> >> > to this.
> >>
> >> Then why are you updating sk->sk_forward_alloc using racy operations ?
> >>
> >> If this is not needed or racy, do not do it.
> >
> > Thank you for all the feedback.
> >
> > The actual forward allocated memory value is:
> >
> > atomic_read(&up->mem_allocated) - atomic_read(&sk->sk_rmem_alloc).
> >
> > sk_forward_alloc is updated only to hint to the user space the forward
> > allocated memory value via the diag interface.
> >
> > If such information is not needed we can drop the update, and
> > sk_forward_alloc will always be seen as 0 even when the socket has some
> > forward allocation.
> 
> The information is needed and we want an accurate one, really.
> 
> Think about debugging on a live server, some stuck or mad sockets ;)
> 
> Please consider adding a proper accessor, able to deal with the UDP
> peculiarities.

Nice suggestion, thanks! I'll try that in v4. Perhaps is worth adding a
struct proto helper for this ?

I'm sorry, but I'll not be in Tokyo, so I'll probably produce some
traffic on netdev in the meanwhile.

Cheers,

Paolo

Re: [PATCH v2 net-next 4/5] xps_flows: XPS for packets that don't have a socket

2016-09-29 Thread Eric Dumazet

On Thu, 2016-09-29 at 10:08 -0400, Tom Herbert wrote:

> It addresses  the issue that Rick Jones pointed out was happening with
> XPS. When packets are sent for a flow that has no socket and XPS is
> enabled then each packet uses the XPS queue based on the running CPU.
> Since the thread sending on a flow can be rescheduled on different
> CPUs this is creating ooo packets. In this case the ooo is being
> caused by interaction with XPS.
> 

Nope, your patch does not address the problem properly.

I am not sure I want to spend more time explaining the issue.

Lets talk about this in Tokyo next week.

Re: [PATCH net-next v3 2/3] udp: implement memory accounting helpers

2016-09-29 Thread Eric Dumazet

On Thu, Sep 29, 2016 at 7:34 AM, Paolo Abeni  wrote:
> On Thu, 2016-09-29 at 07:13 -0700, Eric Dumazet wrote:
>> On Thu, 2016-09-29 at 16:01 +0200, Paolo Abeni wrote:
>>
>> > When we reach __sk_mem_reduce_allocated() we are sure we can free the
>> > specified amount of memory, so we only need to ensure consistent
>> > sk_prot->memory_allocated updates. The current atomic operation suffices
>> > to this.
>>
>> Then why are you updating sk->sk_forward_alloc using racy operations ?
>>
>> If this is not needed or racy, do not do it.
>
> Thank you for all the feedback.
>
> The actual forward allocated memory value is:
>
> atomic_read(&up->mem_allocated) - atomic_read(&sk->sk_rmem_alloc).
>
> sk_forward_alloc is updated only to hint to the user space the forward
> allocated memory value via the diag interface.
>
> If such information is not needed we can drop the update, and
> sk_forward_alloc will always be seen as 0 even when the socket has some
> forward allocation.

The information is needed and we want an accurate one, really.

Think about debugging on a live server, some stuck or mad sockets ;)

Please consider adding a proper accessor, able to deal with the UDP
peculiarities.


int sk_forward_alloc_get(const struct sock *sk)
{
if (sk is not UDP)
   return sk->sk_forward_alloc;

   return the precise amount using the private fields that UDP maintains,
}


Then use this accessor in these places :

net/ipv4/af_inet.c:155: WARN_ON(sk->sk_forward_alloc);
net/core/sock_diag.c:66:mem[SK_MEMINFO_FWD_ALLOC] =
sk->sk_forward_alloc;
net/ipv4/inet_diag.c:191:   .idiag_fmem =
sk->sk_forward_alloc,
net/sched/em_meta.c:462:dst->value = sk->sk_forward_alloc;

Thanks.

Re: [PATCH net-next v3 2/3] udp: implement memory accounting helpers

2016-09-29 Thread Paolo Abeni

On Thu, 2016-09-29 at 07:13 -0700, Eric Dumazet wrote:
> On Thu, 2016-09-29 at 16:01 +0200, Paolo Abeni wrote:
> 
> > When we reach __sk_mem_reduce_allocated() we are sure we can free the
> > specified amount of memory, so we only need to ensure consistent
> > sk_prot->memory_allocated updates. The current atomic operation suffices
> > to this.
> 
> Then why are you updating sk->sk_forward_alloc using racy operations ?
> 
> If this is not needed or racy, do not do it.

Thank you for all the feedback.

The actual forward allocated memory value is:

atomic_read(&up->mem_allocated) - atomic_read(&sk->sk_rmem_alloc).

sk_forward_alloc is updated only to hint to the user space the forward
allocated memory value via the diag interface.

If such information is not needed we can drop the update, and
sk_forward_alloc will always be seen as 0 even when the socket has some
forward allocation.

Cheers,

Paolo

Re: [PATCH net-next v3 2/3] udp: implement memory accounting helpers

2016-09-29 Thread Eric Dumazet

On Thu, 2016-09-29 at 16:01 +0200, Paolo Abeni wrote:

> When we reach __sk_mem_reduce_allocated() we are sure we can free the
> specified amount of memory, so we only need to ensure consistent
> sk_prot->memory_allocated updates. The current atomic operation suffices
> to this.

Then why are you updating sk->sk_forward_alloc using racy operations ?

If this is not needed or racy, do not do it.

So that we can remove this annoying thing that a dynamic checker could
very well detect.

Re: [PATCH v2 net-next 4/5] xps_flows: XPS for packets that don't have a socket

2016-09-29 Thread Tom Herbert

On Thu, Sep 29, 2016 at 9:18 AM, Eric Dumazet  wrote:
> On Thu, 2016-09-29 at 08:53 -0400, Tom Herbert wrote:
>> On Thu, Sep 29, 2016 at 12:54 AM, Eric Dumazet  
>> wrote:
>> > On Wed, 2016-09-28 at 20:54 -0700, Tom Herbert wrote:
>> >> xps_flows maintains a per device flow table that is indexed by the
>> >> skbuff hash. The table is only consulted when there is no queue saved in
>> >> a transmit socket for an skbuff.
>> >>
>> >> Each entry in the flow table contains a queue index and a queue
>> >> pointer. The queue pointer is set when a queue is chosen using a
>> >> flow table entry. This pointer is set to the head pointer in the
>> >> transmit queue (which is maintained by BQL).
>> >>
>> >> The new function get_xps_flows_index that looks up flows in the
>> >> xps_flows table. The entry returned gives the last queue a matching flow
>> >> used. The returned queue is compared against the normal XPS queue. If
>> >> they are different, then we only switch if the tail pointer in the TX
>> >> queue has advanced past the pointer saved in the entry. In this
>> >> way OOO should be avoided when XPS wants to use a different queue.
>> >
>> > There is something I do not understand with this.
>> >
>> > If this OOO avoidance is tied to BQL, it means that packets sitting in a
>> > qdisc wont be part of the detection.
>> >
>> > So packets of flow X could possibly be queued on multiple qdiscs.
>> >
>> Hi Eric,
>>
>> I'm not sure I understand your concern. If packets for flow X can be
>> queued on multiple qdiscs wouldn't that mean that flow will be subject
>> to ooo transmission regardless of this patch? In other words here
>> we're trying to keep packets for the flow in order as they are emitted
>> from the qdiscs which we assume maintains ordering of packets in a
>> flow.
>
> Well, then what this patch series is solving ?
>
It addresses  the issue that Rick Jones pointed out was happening with
XPS. When packets are sent for a flow that has no socket and XPS is
enabled then each packet uses the XPS queue based on the running CPU.
Since the thread sending on a flow can be rescheduled on different
CPUs this is creating ooo packets. In this case the ooo is being
caused by interaction with XPS.

> You have a producer of packets running on 8 vcpus in a VM.
>
> Packets are exiting the VM and need to be queued on a mq NIC in the
> hypervisor.
>
> Flow X can be scheduled on any of these 8 vcpus, so XPS is currently
> selecting different TXQ.
>
> Your patch aims to detect this and steer packets to one TXQ, but not
> using a static hash() % num_real_queues (as RSS would have done on a NIC
> at RX), but trying to please XPS (as : selecting a queue close to the
> CPU producing the packet. VM with one vcpu, and cpu bounded, would
> really be happy to not spread packets all over the queues)
>
You're not describing this patch, you're describing how XPS works in
the first place. This patch is done to make socketless packets work
well with XPS. If all you want is a static hash() % num_real_queues
then just disable XPS to get that.

> Your mechanism relies on counters updated at the time packets are given
> to the NIC, not at the time packets are given to the txq (and eventually
> sit for a while in the qdisc right before BQL layer)
>
> dev_queue_xmit() is not talking to the device directly...
>
>
> All I am saying is that the producer/consumer counters you added are not
> at the right place.
>
> You tried hard to not change the drivers, by adding something to
> existing BQL. But packets can sit in a qdisc for a while...
>
But again, this patch is to prevent ooo being caused by an interaction
with XPS. It does not address ooo being caused by qdiscs or VMs, but
then I don't see that as being the issue reported by Rick.

> So you added 2 potential cache lines misses per outgoing packet, but
> with no guarantee that OOO are really avoided.
>
>
>

[PATCH net] net: pktgen: fix pkt_size

2016-09-29 Thread Paolo Abeni

The commit 879c7220e828 ("net: pktgen: Observe needed_headroom
of the device") increased the 'pkt_overhead' field value by
LL_RESERVED_SPACE.
As a side effect the generated packet size, computed as:

/* Eth + IPh + UDPh + mpls */
datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8 -
  pkt_dev->pkt_overhead;

is decreased by the same value.
The above changed slightly the behavior of existing pktgen users,
and made the interface somewhat inconsistent.
Fix it by restoring the previous pkt_overhead value and using
LL_RESERVED_SPACE as extralen in skb allocation.
Also, change pktgen_alloc_skb() to only partially reserve
the headroom to allow the caller to prefetch from ll header
start.

Fixes: 879c7220e828 ("net: pktgen: Observe needed_headroom of the device")
Suggested-by: Ben Greear 
Signed-off-by: Paolo Abeni 
---
 net/core/pktgen.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index bbd118b..5c9b397 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2286,7 +2286,7 @@ out:
 
 static inline void set_pkt_overhead(struct pktgen_dev *pkt_dev)
 {
-   pkt_dev->pkt_overhead = LL_RESERVED_SPACE(pkt_dev->odev);
+   pkt_dev->pkt_overhead = 0;
pkt_dev->pkt_overhead += pkt_dev->nr_labels*sizeof(u32);
pkt_dev->pkt_overhead += VLAN_TAG_SIZE(pkt_dev);
pkt_dev->pkt_overhead += SVLAN_TAG_SIZE(pkt_dev);
@@ -2777,13 +2777,13 @@ static void pktgen_finalize_skb(struct pktgen_dev 
*pkt_dev, struct sk_buff *skb,
 }
 
 static struct sk_buff *pktgen_alloc_skb(struct net_device *dev,
-   struct pktgen_dev *pkt_dev,
-   unsigned int extralen)
+   struct pktgen_dev *pkt_dev)
 {
+   unsigned int extralen = LL_RESERVED_SPACE(dev);
struct sk_buff *skb = NULL;
-   unsigned int size = pkt_dev->cur_pkt_size + 64 + extralen +
-   pkt_dev->pkt_overhead;
+   unsigned int size;
 
+   size = pkt_dev->cur_pkt_size + 64 + extralen + pkt_dev->pkt_overhead;
if (pkt_dev->flags & F_NODE) {
int node = pkt_dev->node >= 0 ? pkt_dev->node : numa_node_id();
 
@@ -2796,8 +2796,9 @@ static struct sk_buff *pktgen_alloc_skb(struct net_device 
*dev,
 skb = __netdev_alloc_skb(dev, size, GFP_NOWAIT);
}
 
+   /* the caller prefetch from skb->data and reserve for the mac hdr */
if (likely(skb))
-   skb_reserve(skb, LL_RESERVED_SPACE(dev));
+   skb_reserve(skb, extralen - 16);
 
return skb;
 }
@@ -2830,16 +2831,14 @@ static struct sk_buff *fill_packet_ipv4(struct 
net_device *odev,
mod_cur_headers(pkt_dev);
queue_map = pkt_dev->cur_queue_map;
 
-   datalen = (odev->hard_header_len + 16) & ~0xf;
-
-   skb = pktgen_alloc_skb(odev, pkt_dev, datalen);
+   skb = pktgen_alloc_skb(odev, pkt_dev);
if (!skb) {
sprintf(pkt_dev->result, "No memory");
return NULL;
}
 
prefetchw(skb->data);
-   skb_reserve(skb, datalen);
+   skb_reserve(skb, 16);
 
/*  Reserve for ethernet and IP header  */
eth = (__u8 *) skb_push(skb, 14);
@@ -2959,7 +2958,7 @@ static struct sk_buff *fill_packet_ipv6(struct net_device 
*odev,
mod_cur_headers(pkt_dev);
queue_map = pkt_dev->cur_queue_map;
 
-   skb = pktgen_alloc_skb(odev, pkt_dev, 16);
+   skb = pktgen_alloc_skb(odev, pkt_dev);
if (!skb) {
sprintf(pkt_dev->result, "No memory");
return NULL;
-- 
1.8.3.1

Re: [PATCH net-next v3 2/3] udp: implement memory accounting helpers

2016-09-29 Thread Paolo Abeni

On Thu, 2016-09-29 at 06:24 -0700, Eric Dumazet wrote:
> On Thu, 2016-09-29 at 11:31 +0200, Paolo Abeni wrote:
> > On Wed, 2016-09-28 at 18:42 -0700, Eric Dumazet wrote:
> > > On Wed, 2016-09-28 at 12:52 +0200, Paolo Abeni wrote:
> > > 
> > > > +static void udp_rmem_release(struct sock *sk, int partial)
> > > > +{
> > > > +   struct udp_sock *up = udp_sk(sk);
> > > > +   int fwd, amt;
> > > > +
> > > > +   if (partial && !udp_under_memory_pressure(sk))
> > > > +   return;
> > > > +
> > > > +   /* we can have concurrent release; if we catch any conflict
> > > > +* we let only one of them do the work
> > > > +*/
> > > > +   if (atomic_dec_if_positive(&up->can_reclaim) < 0)
> > > > +   return;
> > > > +
> > > > +   fwd = __udp_forward(up, atomic_read(&sk->sk_rmem_alloc));
> > > > +   if (fwd < SK_MEM_QUANTUM + partial) {
> > > > +   atomic_inc(&up->can_reclaim);
> > > > +   return;
> > > > +   }
> > > > +
> > > > +   amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
> > > > +   atomic_sub(amt, &up->mem_allocated);
> > > > +   atomic_inc(&up->can_reclaim);
> > > > +
> > > > +   __sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
> > > > +   sk->sk_forward_alloc = fwd - amt;
> > > > +}
> > > 
> > > 
> > > This is racy... all these atomics make me nervous...
> > 
> > Ah, perhaps I got it: if we have a concurrent memory scheduling, we
> > could end up with a value of mem_allocated below the real need. 
> > 
> > That mismatch will not drift: at worst we can end up with mem_allocated
> > being single SK_MEM_QUANTUM below what is strictly needed.
> > 
> > A possible alternative could be:
> > 
> > static void udp_rmem_release(struct sock *sk, int partial)
> > {
> > struct udp_sock *up = udp_sk(sk);
> > int fwd, amt, alloc_old, alloc;
> > 
> > if (partial && !udp_under_memory_pressure(sk))
> > return;
> > 
> > alloc = atomic_read(&up->mem_allocated);
> > fwd = alloc - atomic_read(&sk->sk_rmem_alloc);
> > if (fwd < SK_MEM_QUANTUM + partial)
> > return;
> > 
> > amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
> > alloc_old = atomic_cmpxchg(&up->mem_allocated, alloc, alloc - amt);
> > /* if a concurrent update is detected, just do nothing; if said update
> >  * is due to another memory release, that release take care of
> >  * reclaiming the memory for us, too.
> >  * Otherwise we will be able to release on later dequeue, since
> >  * we will eventually stop colliding with the writer when it will
> >  * consume all the fwd allocated memory
> >  */
> > if (alloc_old != alloc)
> > return;
> > 
> > __sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
> > sk->sk_forward_alloc = fwd - amt;
> 
> Can still be done from multiple cpus.
> 
> Add some ndelay() or udelay() before to simulate fact that current cpu
> could be interrupted by an NMI handler (perf for example)... or hard IRQ
> handler...
> 
> Then make sure your tests involve 16 concurrent cpus dealing with one
> udp socket...

Thank you again reviewing this.

I'm working to this sort of tests right now.

> 
> > }
> > 
> > which is even more lazy in reclaiming but should never underestimate the
> > needed forward allocation, and under pressure should eventually free the
> > needed memory.
> 
> 
> If this code is rarely used, why don't you simply use a real spinlock,
> so that we do not have to worry about all this ?
> 
> A spinlock  acquisition/release is a _single_ locked operation.
> Faster than the 3 atomic you got in last version.
> spinlock code (ticket or MCS) avoids starvation.

I'd like to avoid adding a lock, if possible, to avoid any possible
source of contention.

> Then, you can safely update multiple fields in the socket.
> 
> And you get nice lockdep support as a bonus.
> 
> cmpxchg() is fine when a single field need an exclusion. But there you
> have multiple fields to update at once :
> 
> sk_memory_allocated_add() and sk_memory_allocated_sub() can work using 
> atomic_long_add_return() and atomic_long_sub() because their caller owns
> the socket lock and can safely update sk->sk_forward_alloc without
> additional locking, but UDP wont have this luxury after your patches.

When we reach __sk_mem_reduce_allocated() we are sure we can free the
specified amount of memory, so we only need to ensure consistent
sk_prot->memory_allocated updates. The current atomic operation suffices
to this.

Paolo

[NetDev] [ANNOUNCE] Netdev 1.2 program announced

2016-09-29 Thread Hajime Tazaki


Hello,

The program of Linux netdev 1.2 is now online !

http://netdevconf.org/1.2/schedule.html

As we stated, all of registered people is now in the mailing
list, people at netdevconf.org.  If you wish to communicate
with the attendees of netdev 1.2, feel free to exchange
messages.


-- Hajime

Re: [PATCH net-next v3 2/3] udp: implement memory accounting helpers

2016-09-29 Thread Eric Dumazet

On Thu, 2016-09-29 at 11:31 +0200, Paolo Abeni wrote:
> On Wed, 2016-09-28 at 18:42 -0700, Eric Dumazet wrote:
> > On Wed, 2016-09-28 at 12:52 +0200, Paolo Abeni wrote:
> > 
> > > +static void udp_rmem_release(struct sock *sk, int partial)
> > > +{
> > > + struct udp_sock *up = udp_sk(sk);
> > > + int fwd, amt;
> > > +
> > > + if (partial && !udp_under_memory_pressure(sk))
> > > + return;
> > > +
> > > + /* we can have concurrent release; if we catch any conflict
> > > +  * we let only one of them do the work
> > > +  */
> > > + if (atomic_dec_if_positive(&up->can_reclaim) < 0)
> > > + return;
> > > +
> > > + fwd = __udp_forward(up, atomic_read(&sk->sk_rmem_alloc));
> > > + if (fwd < SK_MEM_QUANTUM + partial) {
> > > + atomic_inc(&up->can_reclaim);
> > > + return;
> > > + }
> > > +
> > > + amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
> > > + atomic_sub(amt, &up->mem_allocated);
> > > + atomic_inc(&up->can_reclaim);
> > > +
> > > + __sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
> > > + sk->sk_forward_alloc = fwd - amt;
> > > +}
> > 
> > 
> > This is racy... all these atomics make me nervous...
> 
> Ah, perhaps I got it: if we have a concurrent memory scheduling, we
> could end up with a value of mem_allocated below the real need. 
> 
> That mismatch will not drift: at worst we can end up with mem_allocated
> being single SK_MEM_QUANTUM below what is strictly needed.
> 
> A possible alternative could be:
> 
> static void udp_rmem_release(struct sock *sk, int partial)
> {
>   struct udp_sock *up = udp_sk(sk);
>   int fwd, amt, alloc_old, alloc;
> 
>   if (partial && !udp_under_memory_pressure(sk))
>   return;
> 
>   alloc = atomic_read(&up->mem_allocated);
>   fwd = alloc - atomic_read(&sk->sk_rmem_alloc);
>   if (fwd < SK_MEM_QUANTUM + partial)
>   return;
> 
>   amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
>   alloc_old = atomic_cmpxchg(&up->mem_allocated, alloc, alloc - amt);
>   /* if a concurrent update is detected, just do nothing; if said update
>* is due to another memory release, that release take care of
>* reclaiming the memory for us, too.
>* Otherwise we will be able to release on later dequeue, since
>* we will eventually stop colliding with the writer when it will
>* consume all the fwd allocated memory
>*/
>   if (alloc_old != alloc)
>   return;
> 
>   __sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
>   sk->sk_forward_alloc = fwd - amt;

Can still be done from multiple cpus.

Add some ndelay() or udelay() before to simulate fact that current cpu
could be interrupted by an NMI handler (perf for example)... or hard IRQ
handler...

Then make sure your tests involve 16 concurrent cpus dealing with one
udp socket...

> }
> 
> which is even more lazy in reclaiming but should never underestimate the
> needed forward allocation, and under pressure should eventually free the
> needed memory.


If this code is rarely used, why don't you simply use a real spinlock,
so that we do not have to worry about all this ?

A spinlock  acquisition/release is a _single_ locked operation.
Faster than the 3 atomic you got in last version.
spinlock code (ticket or MCS) avoids starvation.

Then, you can safely update multiple fields in the socket.

And you get nice lockdep support as a bonus.

cmpxchg() is fine when a single field need an exclusion. But there you
have multiple fields to update at once :

sk_memory_allocated_add() and sk_memory_allocated_sub() can work using 
atomic_long_add_return() and atomic_long_sub() because their caller owns
the socket lock and can safely update sk->sk_forward_alloc without
additional locking, but UDP wont have this luxury after your patches.

Re: [PATCH v2 net-next 4/5] xps_flows: XPS for packets that don't have a socket

2016-09-29 Thread Eric Dumazet

On Thu, 2016-09-29 at 08:53 -0400, Tom Herbert wrote:
> On Thu, Sep 29, 2016 at 12:54 AM, Eric Dumazet  wrote:
> > On Wed, 2016-09-28 at 20:54 -0700, Tom Herbert wrote:
> >> xps_flows maintains a per device flow table that is indexed by the
> >> skbuff hash. The table is only consulted when there is no queue saved in
> >> a transmit socket for an skbuff.
> >>
> >> Each entry in the flow table contains a queue index and a queue
> >> pointer. The queue pointer is set when a queue is chosen using a
> >> flow table entry. This pointer is set to the head pointer in the
> >> transmit queue (which is maintained by BQL).
> >>
> >> The new function get_xps_flows_index that looks up flows in the
> >> xps_flows table. The entry returned gives the last queue a matching flow
> >> used. The returned queue is compared against the normal XPS queue. If
> >> they are different, then we only switch if the tail pointer in the TX
> >> queue has advanced past the pointer saved in the entry. In this
> >> way OOO should be avoided when XPS wants to use a different queue.
> >
> > There is something I do not understand with this.
> >
> > If this OOO avoidance is tied to BQL, it means that packets sitting in a
> > qdisc wont be part of the detection.
> >
> > So packets of flow X could possibly be queued on multiple qdiscs.
> >
> Hi Eric,
> 
> I'm not sure I understand your concern. If packets for flow X can be
> queued on multiple qdiscs wouldn't that mean that flow will be subject
> to ooo transmission regardless of this patch? In other words here
> we're trying to keep packets for the flow in order as they are emitted
> from the qdiscs which we assume maintains ordering of packets in a
> flow.

Well, then what this patch series is solving ?

You have a producer of packets running on 8 vcpus in a VM.

Packets are exiting the VM and need to be queued on a mq NIC in the
hypervisor.

Flow X can be scheduled on any of these 8 vcpus, so XPS is currently
selecting different TXQ.

Your patch aims to detect this and steer packets to one TXQ, but not
using a static hash() % num_real_queues (as RSS would have done on a NIC
at RX), but trying to please XPS (as : selecting a queue close to the
CPU producing the packet. VM with one vcpu, and cpu bounded, would
really be happy to not spread packets all over the queues)

Your mechanism relies on counters updated at the time packets are given
to the NIC, not at the time packets are given to the txq (and eventually
sit for a while in the qdisc right before BQL layer)

dev_queue_xmit() is not talking to the device directly...

All I am saying is that the producer/consumer counters you added are not
at the right place.

You tried hard to not change the drivers, by adding something to
existing BQL. But packets can sit in a qdisc for a while...

So you added 2 potential cache lines misses per outgoing packet, but
with no guarantee that OOO are really avoided.

Re: [PATCH v2 net-next 4/5] xps_flows: XPS for packets that don't have a socket

2016-09-29 Thread Tom Herbert

On Thu, Sep 29, 2016 at 12:54 AM, Eric Dumazet  wrote:
> On Wed, 2016-09-28 at 20:54 -0700, Tom Herbert wrote:
>> xps_flows maintains a per device flow table that is indexed by the
>> skbuff hash. The table is only consulted when there is no queue saved in
>> a transmit socket for an skbuff.
>>
>> Each entry in the flow table contains a queue index and a queue
>> pointer. The queue pointer is set when a queue is chosen using a
>> flow table entry. This pointer is set to the head pointer in the
>> transmit queue (which is maintained by BQL).
>>
>> The new function get_xps_flows_index that looks up flows in the
>> xps_flows table. The entry returned gives the last queue a matching flow
>> used. The returned queue is compared against the normal XPS queue. If
>> they are different, then we only switch if the tail pointer in the TX
>> queue has advanced past the pointer saved in the entry. In this
>> way OOO should be avoided when XPS wants to use a different queue.
>
> There is something I do not understand with this.
>
> If this OOO avoidance is tied to BQL, it means that packets sitting in a
> qdisc wont be part of the detection.
>
> So packets of flow X could possibly be queued on multiple qdiscs.
>
Hi Eric,

I'm not sure I understand your concern. If packets for flow X can be
queued on multiple qdiscs wouldn't that mean that flow will be subject
to ooo transmission regardless of this patch? In other words here
we're trying to keep packets for the flow in order as they are emitted
from the qdiscs which we assume maintains ordering of packets in a
flow.

Tom

>
>

Re: [PATCH RFC iproute2] iplink: Support envhdrlen

2016-09-29 Thread Jiri Benc

On Tue, 27 Sep 2016 17:55:40 +0900, Toshiaki Makita wrote:
> This adds support for envhdrlen.
> 
> Example:
>  # ip link set eno1 envhdrlen 8

I don't see why this should be user visible, let alone requiring user
to set it. This should be transparent, kernel should compute the value
as needed based on the configuration and set it up. Requiring the
administrator to pick up a calculator and sum up all the vlan, mpls and
whatever header lengths is silly.

I realize that we currently have no easy way to do that. Especially
with lwtunnels and stuff line MPLS where we don't easily know the
number of tags. But every uAPI we introduce will have to be supported
forever and going a particular way just because it is easy to implement
is not sustainable.

At the very least, it should be configurable from the other direction.
I.e. telling which interfaces can be used by vlans or MPLS (if it
cannot be inferred automatically) and configuring maximum number of
tags on the given vlan/mpls/whatever interface/route/whatever.

 Jiri

Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2

2016-09-29 Thread Jiri Benc

On Wed, 28 Sep 2016 19:36:45 -0700 (PDT), R. Parameswaran wrote:
> I'd like to point out one difference with VXLAN - in VXLAN, the 
> local physical interface is directly specified at the time of 
> creation of the tunnel, and the data structure seems to have the ifindex 
> of the local interface with which it is able to directly pull up the 
> underlay interface device. Whereas in L2TP, we only have the IP
> address of the remote tunnel end-point and thus only the socket and the 
> dst from which we need to derive this. 

Strictly speaking, VXLAN *may* know the underlying interface. It can
also be set up with just local and remote IP address, or even worse, in
metadata mode where we don't know the address nor the interface until
we get a packet (and each packet may have those different).

MTU wise, those cases are not accommodated for in the kernel. The vxlan
interface gets MTU of 1500 and it's up to the administrator to set it
correctly.

Btw, PMTU events won't help with the metadata mode. And even in
"normal" mode, it's not clear what should be done - the tunnel
interface may be in a bridge, thus there may be other interfaces that
depend on the same MTU, up to inside VMs.

 Jiri

[PATCH] af_unix: fix garbage collect vs. MSG_PEEK

2016-09-29 Thread Miklos Szeredi

Gc assumes that in-flight sockets that don't have an external ref can't
gain one while unix_gc_lock is held.  That is true because
unix_notinflight() will be called before detaching fds, which takes
unix_gc_lock.

Only MSG_PEEK was somehow overlooked.  That one also clones the fds, also
keeping them in the skb.  But through MSG_PEEK an external reference can
definitely be gained without ever touching unix_gc_lock.

This patch adds unix_gc_barrier() that waits for a garbage collect run to
finish (if there is one), before actually installing the peeked in-flight
files to file descriptors.  This prevents problems from a pure in-flight
socket having its buffers modified while the garbage collect is taking
place.

Signed-off-by: Miklos Szeredi 
Cc: 
---
 include/net/af_unix.h |  1 +
 net/unix/af_unix.c| 15 +--
 net/unix/garbage.c|  5 +
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index fd60eccb59a6..f73ce09da2b4 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -10,6 +10,7 @@ void unix_inflight(struct user_struct *user, struct file *fp);
 void unix_notinflight(struct user_struct *user, struct file *fp);
 void unix_gc(void);
 void wait_for_unix_gc(void);
+void unix_gc_barrier(void);
 struct sock *unix_get_socket(struct file *filp);
 struct sock *unix_peer_get(struct sock *);
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 8309687a56b0..cece344b41b8 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1550,6 +1550,17 @@ static int unix_attach_fds(struct scm_cookie *scm, 
struct sk_buff *skb)
return max_level;
 }
 
+static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
+{
+   scm->fp = scm_fp_dup(UNIXCB(skb).fp);
+   /*
+* During garbage collection it is assumed that in-flight sockets don't
+* get a new external reference.  So we need to wait until current run
+* finishes.
+*/
+   unix_gc_barrier();
+}
+
 static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool 
send_fds)
 {
int err = 0;
@@ -2182,7 +2193,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct 
msghdr *msg,
sk_peek_offset_fwd(sk, size);
 
if (UNIXCB(skb).fp)
-   scm.fp = scm_fp_dup(UNIXCB(skb).fp);
+   unix_peek_fds(&scm, skb);
}
err = (flags & MSG_TRUNC) ? skb->len - skip : size;
 
@@ -2422,7 +2433,7 @@ unlock:
/* It is questionable, see note in unix_dgram_recvmsg.
 */
if (UNIXCB(skb).fp)
-   scm.fp = scm_fp_dup(UNIXCB(skb).fp);
+   unix_peek_fds(&scm, skb);
 
sk_peek_offset_fwd(sk, chunk);
 
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 6a0d48525fcf..d4fb6ff8024d 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -266,6 +266,11 @@ void wait_for_unix_gc(void)
wait_event(unix_gc_wait, gc_in_progress == false);
 }
 
+void unix_gc_barrier(void)
+{
+   spin_unlock_wait(&unix_gc_lock);
+}
+
 /* The external entry point: unix_gc() */
 void unix_gc(void)
 {
-- 
2.5.5

Request for backport to 4.4 stable

2016-09-29 Thread Jorgen S. Hansen

Hello,

It would be great if the following commit could be backported to 4.4 stable. 
The commit fixes a bug that was introduced in 4.3 and fixed in 4.6. The bug may 
cause wrongful termination of vSocket connections, if there is more than one 
active vSocket connection in a given VM. This affects VMware services running 
inside VMs:

commit  8ab18d71de8b07d2c4d6f984b718418c09ea45c5 

VSOCK: Detach QP check should filter out non matching QPs.

The check in vmci_transport_peer_detach_cb should only allow a
detach when the qp handle of the transport matches the one in
the detach message.

Thanks,
Jørgen

Re: [PATCH net-next 1/8] net/ncsi: Avoid unused-value build warning from ia64-linux-gcc

2016-09-29 Thread Gavin Shan

On Thu, Sep 29, 2016 at 01:54:04AM -0400, David Miller wrote:
>From: Gavin Shan 
>Date: Thu, 29 Sep 2016 15:03:08 +1000
>
>> This replaces the atomic access to NCSI channel's state with READ_ONCE()
>> and WRITE_ONCE() to avoid the above build warning. We needn't hold the
>> channel's lock when updating its state as well. No logical changes
>> introduced.
>
>I don't understand this.
>
>If it's important to take the lock for the list add/del, then it must
>be important to make the state change appear atomic wrt. that lock as
>well.
>
>Can parallel threads of control enter these functions which change the
>state?  If so, then you need to make the state changes under the lock.
>In fact, you probably have to make the state tests under the locks as
>well.
>
>If not, please explain what prevents it from happening.
>

Dave, thanks for your comments. I think it's occasionally working on
AST2400 and AST2500 platforms. It's reasonable to grab the lock before
fetching or updating the NCSI channel's state. Adding and removing the
channel from the list also need taking the lock as well. I will modify
the code accordingly in next revision.

AST2400/AST2500 has single CPU. The channel's state (and the linked
list) are changed in softirq context (packet Rx handler or timer),
meaning they are not accessed in parallel mode. However, NCSI stack
cannot make assumption to be run on single CPU platforms only. So
yes, we need the lock to protect them.

Thanks,
Gavin

1 2 >

1 - 100 of 121 matches

Mail list logo