Re: [PATCH] mellanox: mlx5: Use logging functions to reduce text ~10k/5%

2016-06-23 Thread Leon Romanovsky
On Wed, Jun 22, 2016 at 11:23:59AM -0700, Joe Perches wrote:
> The logging macros create a bit of duplicated code/text.
> 
> Use specialized functions to reduce the duplication.
> 
> (defconfig/x86-64)
> $ size drivers/net/ethernet/mellanox/mlx5/core/built-in.o*
>    text      data bss dec hex filename
>  178634      2059  16  180709   2c1e5 
> drivers/net/ethernet/mellanox/mlx5/core/built-in.o.new
>  188679      2059  16  190754   2e922 
> drivers/net/ethernet/mellanox/mlx5/core/built-in.o.old
> 
> The output changes now do not include line #,
> but do include the function offset.
> 
> Signed-off-by: Joe Perches 

As far as I see all these functions are used in error paths, so no
implication on performance is expected.

And I'm fine with function offsets.

Saeed,
What do you think?

Reviewed-by: Leon Romanovsky 


signature.asc
Description: Digital signature


Re: [PATCH V4 1/1] net: ethernet: Add TSE PCS support to dwmac-socfpga

2016-06-23 Thread Tien Hock Loh
Hi Arnd,

On Tue, 2016-06-21 at 11:34 +0200, Arnd Bergmann wrote:
> On Tuesday, June 21, 2016 1:46:11 AM CEST th...@altera.com wrote:
> > diff --git a/Documentation/devicetree/bindings/net/socfpga-dwmac.txt 
> > b/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
> > index 72d82d6..dd10f2f 100644
> > --- a/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
> > +++ b/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
> > @@ -17,9 +17,26 @@ Required properties:
> >  Optional properties:
> >  altr,emac-splitter: Should be the phandle to the emac splitter soft IP 
> > node if
> > DWMAC controller is connected emac splitter.
> > +phy-mode: The phy mode the ethernet operates in
> > +altr,sgmii_to_sgmii_converter: phandle to the TSE SGMII converter
> > +
> 
> Please use '-' instead of '_' in the property names.

Overlooked this when I were fixing v4, I'll get this fixed.

> 
> Can you explain in the patch description why you can't reference
> the converter using the normal "phy-handle" property and implement
> the converter as a phy driver?
> 

The converter isn't a PHY, but an adapter that handles data stream, and
the phandle is only used to reset the adapter in software's context,
thus it doesn't seem to be correct to implement it as a phy driver. 
Does that answer your question?
If so, do you think we need to document this in the patch description in
this case?

>   Arnd
> 

Thanks
Tien Hock


Re: [PATCH] mellanox: mlx5: Use logging functions to reduce text ~10k/5%

2016-06-23 Thread Leon Romanovsky
On Thu, Jun 23, 2016 at 08:27:01AM +0300, Leon Romanovsky wrote:
> On Wed, Jun 22, 2016 at 11:23:59AM -0700, Joe Perches wrote:
> > The logging macros create a bit of duplicated code/text.
> > 
> > Use specialized functions to reduce the duplication.
> > 
> > (defconfig/x86-64)
> > $ size drivers/net/ethernet/mellanox/mlx5/core/built-in.o*
> >    text    data bss dec hex filename
> >  178634    2059  16  180709   2c1e5 
> > drivers/net/ethernet/mellanox/mlx5/core/built-in.o.new
> >  188679    2059  16  190754   2e922 
> > drivers/net/ethernet/mellanox/mlx5/core/built-in.o.old
> > 
> > The output changes now do not include line #,
> > but do include the function offset.
> > 
> > Signed-off-by: Joe Perches 
> 
> As far as I see all these functions are used in error paths, so no
> implication on performance is expected.
> 
> And I'm fine with function offsets.
> 
> Saeed,
> What do you think?
> 
> Reviewed-by: Leon Romanovsky 

I continued to play with this patch and it doesn't pass checkpatch.
It looks like corrupted file.

➜  linux-rdma git:(master) ./scripts/checkpatch.pl
~/Downloads/mellanox-mlx5-Use-logging-functions-to-reduce-text-10k-5.patch
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per 
line)
#21: 
 178634    2059  16  180709   2c1e5
drivers/net/ethernet/mellanox/mlx5/core/built-in.o.new

ERROR: patch seems to be corrupt (line wrapped?)
#46: FILE: drivers/net/ethernet/mellanox/mlx5/core/main.c:1556:

CHECK: Alignment should match open parenthesis
#78: FILE: drivers/net/ethernet/mellanox/mlx5/core/main.c:1586:
+   dev_warn(&dev->pdev->dev, "%s:%pS:(pid %d): %pV",
+    dev->priv.name, __builtin_return_address(0), current->pid,

ERROR: space required before that '&' (ctx:VxV)
#79: FILE: drivers/net/ethernet/mellanox/mlx5/core/main.c:1587:
+    &vaf);
  ^
total: 2 errors, 1 warnings, 1 checks, 58 lines  checked


signature.asc
Description: Digital signature


Re: [PATCH net-next 4/4] net_sched: generalize bulk dequeue

2016-06-23 Thread Paolo Abeni
On Tue, 2016-06-21 at 23:16 -0700, Eric Dumazet wrote:
> When qdisc bulk dequeue was added in linux-3.18 (commit
> 5772e9a3463b "qdisc: bulk dequeue support for qdiscs
> with TCQ_F_ONETXQUEUE"), it was constrained to some
> specific qdiscs.
> 
> With some extra care, we can extend this to all qdiscs,
> so that typical traffic shaping solutions can benefit from
> small batches (8 packets in this patch).
> 
> For example, HTB is often used on some multi queue device.
> And bonding/team are multi queue devices...
> 
> Idea is to bulk-dequeue packets mapping to the same transmit queue.
> 
> This brings between 35 and 80 % performance increase in HTB setup
> under pressure on a bonding setup :
> 
> 1) NUMA node contention :   610,000 pps -> 1,110,000 pps
> 2) No node contention   : 1,380,000 pps -> 1,930,000 pps
> 
> Now we should work to add batches on the enqueue() side ;)
> 
> Signed-off-by: Eric Dumazet 
> Cc: John Fastabend 
> Cc: Jesper Dangaard Brouer 
> Cc: Hannes Frederic Sowa 
> Cc: Florian Westphal 
> Cc: Daniel Borkmann 
> ---
>  include/net/sch_generic.h |  7 ++---
>  net/sched/sch_generic.c   | 68 
> ---
>  2 files changed, 62 insertions(+), 13 deletions(-)
> 
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 04e84c07c94f..909aff2db2b3 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -75,13 +75,14 @@ struct Qdisc {
>   /*
>* For performance sake on SMP, we put highly modified fields at the end
>*/
> - struct Qdisc*next_sched cacheline_aligned_in_smp;
> - struct sk_buff  *gso_skb;
> - unsigned long   state;
> + struct sk_buff  *gso_skb cacheline_aligned_in_smp;
>   struct sk_buff_head q;
>   struct gnet_stats_basic_packed bstats;
>   seqcount_t  running;
>   struct gnet_stats_queue qstats;
> + unsigned long   state;
> + struct Qdisc*next_sched;
> + struct sk_buff  *skb_bad_txq;
>   struct rcu_head rcu_head;
>   int padded;
>   atomic_trefcnt;
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index ff86606954f2..e95b67cd5718 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -77,6 +77,34 @@ static void try_bulk_dequeue_skb(struct Qdisc *q,
>   skb->next = NULL;
>  }
>  
> +/* This variant of try_bulk_dequeue_skb() makes sure
> + * all skbs in the chain are for the same txq
> + */
> +static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
> +   struct sk_buff *skb,
> +   int *packets)
> +{
> + int mapping = skb_get_queue_mapping(skb);
> + struct sk_buff *nskb;
> + int cnt = 0;
> +
> + do {
> + nskb = q->dequeue(q);
> + if (!nskb)
> + break;
> + if (unlikely(skb_get_queue_mapping(nskb) != mapping)) {
> + q->skb_bad_txq = nskb;
> + qdisc_qstats_backlog_inc(q, nskb);
> + q->q.qlen++;
> + break;
> + }
> + skb->next = nskb;
> + skb = nskb;
> + } while (++cnt < 8);
> + (*packets) += cnt;
> + skb->next = NULL;
> +}
> +
>  /* Note that dequeue_skb can possibly return a SKB list (via skb->next).
>   * A requeued skb (via q->gso_skb) can also be a SKB list.
>   */
> @@ -87,8 +115,9 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool 
> *validate,
>   const struct netdev_queue *txq = q->dev_queue;
>  
>   *packets = 1;
> - *validate = true;
>   if (unlikely(skb)) {
> + /* skb in gso_skb were already validated */
> + *validate = false;
>   /* check the reason of requeuing without tx lock first */
>   txq = skb_get_tx_queue(txq->dev, skb);
>   if (!netif_xmit_frozen_or_stopped(txq)) {
> @@ -97,15 +126,30 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool 
> *validate,
>   q->q.qlen--;
>   } else
>   skb = NULL;
> - /* skb in gso_skb were already validated */
> - *validate = false;
> - } else {
> - if (!(q->flags & TCQ_F_ONETXQUEUE) ||
> - !netif_xmit_frozen_or_stopped(txq)) {
> - skb = q->dequeue(q);
> - if (skb && qdisc_may_bulk(q))
> - try_bulk_dequeue_skb(q, skb, txq, packets);
> + return skb;
> + }
> + *validate = true;
> + skb = q->skb_bad_txq;
> + if (unlikely(skb)) {
> + /* check the reason of requeuing without tx lock first */
> + txq = skb_get_tx_queue(txq->dev, skb);
> + if (!netif_xmit_frozen_or_stopped(txq)) {
> + q->skb_bad_txq = NULL;
> +

Re: [PATCH v2] net/mlx5: use mlx5_buf_alloc_node instead of mlx5_buf_alloc in mlx5_wq_ll_create

2016-06-23 Thread Saeed Mahameed
On Thu, Jun 23, 2016 at 4:04 AM, Wang Sheng-Hui  wrote:
> Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on
> reader NUMA node")
>

Hi Wang,

I am sorry for the nitpicking, but the commit message needs to be improved.

I prefer putting the "Fixes" line only after the bug description
(commit message) just before the Signed-of-by line. you can find some
examples in the commit log history.

>
> Change since V1:
> * Add Fixes line in commit log
>

This should come only after the "---" below so it won't appear in the
commit log.

http://kernelnewbies.org/FirstKernelPatch#head-5c81b3c517a1d0bbc24f92594cb734e155fcbbcb

> Signed-off-by: Wang Sheng-Hui 
> ---

["Changes since" goes here]

>  drivers/net/ethernet/mellanox/mlx5/core/wq.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
>

Thanks,
Saeed


Re: [PATCH net-next 0/8] tou: Transports over UDP - part I

2016-06-23 Thread David Miller
From: Richard Weinberger 
Date: Thu, 23 Jun 2016 00:15:04 +0200

> On Thu, Jun 16, 2016 at 7:51 PM, Tom Herbert  wrote:
>> Transports over UDP is intended to encapsulate TCP and other transport
>> protocols directly and securely in UDP.
>>
>> The goal of this work is twofold:
>>
>> 1) Allow applications to run their own transport layer stack (i.e.from
>>userspace). This eliminates dependencies on the OS (e.g. solves a
>>major dependency issue for Facebook on clients).
> 
> Facebook on clients would be a Facebook app on mobile devices?
> Does that mean that the Facebook app is so advanced and complicated
> that it needs a special TCP stack?!

No, the TCP stack in the android/iOS/Windows kernel is so out of date
that in order to get even moderately recent TCP features it is
necessary to do this.

That's the point.



Re: [PATCH net-next 0/8] tou: Transports over UDP - part I

2016-06-23 Thread Richard Weinberger
Am 23.06.2016 um 09:40 schrieb David Miller:
> From: Richard Weinberger 
> Date: Thu, 23 Jun 2016 00:15:04 +0200
> 
>> On Thu, Jun 16, 2016 at 7:51 PM, Tom Herbert  wrote:
>>> Transports over UDP is intended to encapsulate TCP and other transport
>>> protocols directly and securely in UDP.
>>>
>>> The goal of this work is twofold:
>>>
>>> 1) Allow applications to run their own transport layer stack (i.e.from
>>>userspace). This eliminates dependencies on the OS (e.g. solves a
>>>major dependency issue for Facebook on clients).
>>
>> Facebook on clients would be a Facebook app on mobile devices?
>> Does that mean that the Facebook app is so advanced and complicated
>> that it needs a special TCP stack?!
> 
> No, the TCP stack in the android/iOS/Windows kernel is so out of date
> that in order to get even moderately recent TCP features it is
> necessary to do this.

I see.
So the plan is bringing TOU into almost every kernel out there
and then ship Apps with their own TCP stacks since vendors are unable
to deliver decent updates.

I didn't realize that the situation is *that* worse. :(

Thanks,
//richard


Re: [PATCH V4 1/1] net: ethernet: Add TSE PCS support to dwmac-socfpga

2016-06-23 Thread Giuseppe CAVALLARO

On 6/23/2016 3:38 AM, Tien Hock Loh wrote:

Hi Peppe,

On Wed, 2016-06-22 at 11:00 +0200, Giuseppe CAVALLARO wrote:

Hello Tien Hock

On 6/21/2016 10:46 AM, th...@altera.com wrote:

From: Tien Hock Loh 

This adds support for TSE PCS that uses SGMII adapter when the phy-mode of
the dwmac is set to sgmii

Signed-off-by: Tien Hock Loh 


IIUC, you are keeping the two timers w/o looking.

Is there any motivation behind? I had understood you wanted
to review it.


I've merged them into one timer, aneg_link_timer and one timer callback
(that invokes individually the auto_nego_timer_callback and
pcs_link_timer_callback) in the patch. Is that not what you were
expecting?


sorry, it is ok, you added the aneg_link_timer_callback
thx for the changes.

Acked-by: Giuseppe Cavallaro 



Let me know

Regards
Peppe



---
v2:
- Refactored the TSE PCS out from the dwmac-socfpga.c file
- Added binding documentation for TSE PCS sgmii adapter
v3:
- Added missing license header for new source files
- Updated tse_pcs.h include headers
- Standardize if statements
v4:
- Reset SGMII adapter on speed change
- Do not enable SGMII adapter if speed is not supported
- On init, if PCS reset fails, do not enable adapter
123
---
 .../devicetree/bindings/net/socfpga-dwmac.txt  |  19 ++
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c | 276 +
 drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.h |  36 +++
 .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c| 149 +--
 5 files changed, 460 insertions(+), 22 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.h

diff --git a/Documentation/devicetree/bindings/net/socfpga-dwmac.txt 
b/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
index 72d82d6..dd10f2f 100644
--- a/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
+++ b/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
@@ -17,9 +17,26 @@ Required properties:
 Optional properties:
 altr,emac-splitter: Should be the phandle to the emac splitter soft IP node if
DWMAC controller is connected emac splitter.
+phy-mode: The phy mode the ethernet operates in
+altr,sgmii_to_sgmii_converter: phandle to the TSE SGMII converter
+
+This device node has additional phandle dependency, the sgmii converter:
+
+Required properties:
+ - compatible  : Should be altr,gmii-to-sgmii-2.0
+ - reg-names   : Should be "eth_tse_control_port"

 Example:

+gmii_to_sgmii_converter: phy@0x10240 {
+   compatible = "altr,gmii-to-sgmii-2.0";
+   reg = <0x0001 0x0240 0x0008>,
+   <0x0001 0x0200 0x0040>;
+   reg-names = "eth_tse_control_port";
+   clocks = <&sgmii_1_clk_0 &emac1 1 &sgmii_clk_125 &sgmii_clk_125>;
+   clock-names = "tse_pcs_ref_clk_clock_connection", "tse_rx_cdr_refclk";
+};
+
 gmac0: ethernet@ff70 {
compatible = "altr,socfpga-stmmac", "snps,dwmac-3.70a", "snps,dwmac";
altr,sysmgr-syscon = <&sysmgr 0x60 0>;
@@ -30,4 +47,6 @@ gmac0: ethernet@ff70 {
mac-address = [00 00 00 00 00 00];/* Filled in by U-Boot */
clocks = <&emac_0_clk>;
clock-names = "stmmaceth";
+   phy-mode = "sgmii";
+   altr,gmii-to-sgmii-converter = <&gmii_to_sgmii_converter>;
 };
diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index 0fb362d..0ff76e8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -11,7 +11,7 @@ obj-$(CONFIG_DWMAC_IPQ806X)   += dwmac-ipq806x.o
 obj-$(CONFIG_DWMAC_LPC18XX)+= dwmac-lpc18xx.o
 obj-$(CONFIG_DWMAC_MESON)  += dwmac-meson.o
 obj-$(CONFIG_DWMAC_ROCKCHIP)   += dwmac-rk.o
-obj-$(CONFIG_DWMAC_SOCFPGA)+= dwmac-socfpga.o
+obj-$(CONFIG_DWMAC_SOCFPGA)+= dwmac-socfpga.o altr_tse_pcs.o
 obj-$(CONFIG_DWMAC_STI)+= dwmac-sti.o
 obj-$(CONFIG_DWMAC_SUNXI)  += dwmac-sunxi.o
 obj-$(CONFIG_DWMAC_GENERIC)+= dwmac-generic.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c 
b/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c
new file mode 100644
index 000..40bfaac
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c
@@ -0,0 +1,276 @@
+/* Copyright Altera Corporation (C) 2016. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see 

Re: [PATCH] mellanox: mlx5: Use logging functions to reduce text ~10k/5%

2016-06-23 Thread Saeed Mahameed
On Wed, Jun 22, 2016 at 9:23 PM, Joe Perches  wrote:
[...]
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -1557,3 +1557,37 @@ static void __exit cleanup(void)
>
>  module_init(init);
>  module_exit(cleanup);
> +
> +void mlx5_core_err(struct mlx5_core_dev *dev, const char *fmt, ...)
> +{
> +   struct va_format vaf;
> +   va_list args;
> +
> +   va_start(args, fmt);
> +
> +   vaf.fmt = fmt;
> +   vaf.va = &args;
> +
> +   dev_err(&dev->pdev->dev, "%s:%pS:(pid %d): %pV",
> +   dev->priv.name, __builtin_return_address(0), current->pid,
> +   &vaf);
> +
> +   va_end(args);
> +}
> +
> +void mlx5_core_warn(struct mlx5_core_dev *dev, const char *fmt, ...)
> +{
> +   struct va_format vaf;
> +   va_list args;
> +
> +   va_start(args, fmt);
> +
> +   vaf.fmt = fmt;
> +   vaf.va = &args;
> +
> +   dev_warn(&dev->pdev->dev, "%s:%pS:(pid %d): %pV",
> +dev->priv.name, __builtin_return_address(0), current->pid,
> +&vaf);
> +
> +   va_end(args);
> +}

Hi Joe,

I like to keep the file organized in a bottom-up fashion.  Those
functions need to appear as early as possible in the file, just move
them up to appear after the MACROs defines and static fields
declarations.


Re: [PATCH] mellanox: mlx5: Use logging functions to reduce text ~10k/5%

2016-06-23 Thread Saeed Mahameed
On Thu, Jun 23, 2016 at 8:27 AM, Leon Romanovsky  wrote:
> On Wed, Jun 22, 2016 at 11:23:59AM -0700, Joe Perches wrote:
>> The logging macros create a bit of duplicated code/text.
>>
>> Use specialized functions to reduce the duplication.
>>
>> (defconfig/x86-64)
>> $ size drivers/net/ethernet/mellanox/mlx5/core/built-in.o*
>>text  data bss dec hex filename
>>  178634  2059  16  180709   2c1e5 
>> drivers/net/ethernet/mellanox/mlx5/core/built-in.o.new
>>  188679  2059  16  190754   2e922 
>> drivers/net/ethernet/mellanox/mlx5/core/built-in.o.old
>>
>> The output changes now do not include line #,
>> but do include the function offset.
>>
>> Signed-off-by: Joe Perches 
>
> As far as I see all these functions are used in error paths, so no
> implication on performance is expected.
>
> And I'm fine with function offsets.
>
> Saeed,
> What do you think?

Fine with me, need to fix my comment on functions placement, an your
comment on checkpatch.


Re: r8169 regression: UDP packets dropped intermittantly

2016-06-23 Thread Jonathan Woithe
On Thu, Jun 23, 2016 at 01:22:50AM +0200, Francois Romieu wrote:
> Jonathan Woithe  :
> [...]
> > to mainline (in which case I'll keep watching out for it)?  Or is the
> > out-of-tree workaround mentioned above considered to be the long term
> > fix for those who encounter the problem?
> 
> It's a workaround. Nothing less, nothing more.

I see.  Should I assume therefore that a permanent fix *might* get into the
kernel someday, but that there is no timeframe and no guarantee?

I should mention that while I currently have access to hardware to test for
the problem, this may not be the case in a few months time since the
hardware (the computer and network devices) is slated for deployment.

Thank you for helping me to understand what the situation is regarding
intentions for mainline.

> IIRC the ga311 irq signaling was a bit special. I almost surely broke
> it at some point.

While we have seen the current issue on GA311 cards (and this is what is
currently in our test setup), it's also been observed with other r8169-based
cards (a D-Link from memory).

Regards
  jonathan


RE: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support

2016-06-23 Thread Yuval Mintz
> > My systems are presently in the midst of an install but I should be
> > able to demonstrate it in the morning (US Pacific time, modulo the
> > shuttle service of a car repair place)
> 
> stack@np-cp1-comp0002-mgmt:~$ ./netperf -H np-cp1-comp0001-guest -- -G
> 1400 -P 12867 -O throughput,transport_mss MIGRATED TCP STREAM TEST from
> 0.0.0.0 (0.0.0.0) port 12867 AF_INET to np-cp1-comp0001-guest () port 12867
> AF_INET : demo Throughput Transport
> MSS
> bytes
> 
> 3372.821388
...
> root@np-cp1-comp0001-mgmt:/home/stack# tcpdump -n -r foo.pcap | fgrep -v
> "length 0" | awk '{sum += $NF}END{print "Average:",sum/NR}'
> reading from file foo.pcap, link-type EN10MB (Ethernet)
> Average: 2741.93

Yes, it's a known FW limitation - if MSS is smaller than configured MTU,
aggregation will be closed on 2nd packet; It might get changed in some future
FW version. But I agree it reinforces the need of having some kind of
user-knob for controlling this offloaded feature.



[PATCH 3/3] can: kvaser_usb: Add support for more Kvaser Leaf v2 devices

2016-06-23 Thread Marc Kleine-Budde
From: Jimmy Assarsson 

This patch adds support for Kvaser Leaf Light HS v2 OEM, Mini PCI
Express 2xHS and USBcan Light 2xHS.

Signed-off-by: Jimmy Assarsson 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/usb/Kconfig  | 2 ++
 drivers/net/can/usb/kvaser_usb.c | 8 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/can/usb/Kconfig b/drivers/net/can/usb/Kconfig
index 2ff0df32b3d1..8483a40e7e9e 100644
--- a/drivers/net/can/usb/Kconfig
+++ b/drivers/net/can/usb/Kconfig
@@ -47,6 +47,8 @@ config CAN_KVASER_USB
- Kvaser USBcan R
- Kvaser Leaf Light v2
- Kvaser Mini PCI Express HS
+   - Kvaser Mini PCI Express 2xHS
+   - Kvaser USBcan Light 2xHS
- Kvaser USBcan II HS/HS
- Kvaser USBcan II HS/LS
- Kvaser USBcan Rugged ("USBcan Rev B")
diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c
index 022bfa13ebfa..6f1f3b675ff5 100644
--- a/drivers/net/can/usb/kvaser_usb.c
+++ b/drivers/net/can/usb/kvaser_usb.c
@@ -59,11 +59,14 @@
 #define USB_CAN_R_PRODUCT_ID   39
 #define USB_LEAF_LITE_V2_PRODUCT_ID288
 #define USB_MINI_PCIE_HS_PRODUCT_ID289
+#define USB_LEAF_LIGHT_HS_V2_OEM_PRODUCT_ID 290
+#define USB_USBCAN_LIGHT_2HS_PRODUCT_ID291
+#define USB_MINI_PCIE_2HS_PRODUCT_ID   292
 
 static inline bool kvaser_is_leaf(const struct usb_device_id *id)
 {
return id->idProduct >= USB_LEAF_DEVEL_PRODUCT_ID &&
-  id->idProduct <= USB_MINI_PCIE_HS_PRODUCT_ID;
+  id->idProduct <= USB_MINI_PCIE_2HS_PRODUCT_ID;
 }
 
 /* Kvaser USBCan-II devices */
@@ -537,6 +540,9 @@ static const struct usb_device_id kvaser_usb_table[] = {
.driver_info = KVASER_HAS_TXRX_ERRORS },
{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_LITE_V2_PRODUCT_ID) },
{ USB_DEVICE(KVASER_VENDOR_ID, USB_MINI_PCIE_HS_PRODUCT_ID) },
+   { USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_LIGHT_HS_V2_OEM_PRODUCT_ID) },
+   { USB_DEVICE(KVASER_VENDOR_ID, USB_USBCAN_LIGHT_2HS_PRODUCT_ID) },
+   { USB_DEVICE(KVASER_VENDOR_ID, USB_MINI_PCIE_2HS_PRODUCT_ID) },
 
/* USBCANII family IDs */
{ USB_DEVICE(KVASER_VENDOR_ID, USB_USBCAN2_PRODUCT_ID),
-- 
2.8.1



pull-request: can 2016-06-23

2016-06-23 Thread Marc Kleine-Budde
Hello David,

this is a pull request of 3 patches for the upcoming linux-4.7 release.

The first two patches are by Oliver Hartkopp fixing oopes in the generic CAN
device netlink handling. Jimmy Assarsson's patch for the kvaser_usb driver adds
support for more devices by adding their USB product ids.

regards,
Marc

---

The following changes since commit acd43fe85b2d1dbad55ce211b8817e6d6687246f:

  Merge branch 'mlx4-fixes' (2016-06-22 16:38:17 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git 
tags/linux-can-fixes-for-4.7-20160623

for you to fetch changes up to 71873a9b38d1cc6c93e2962149a7bb7272a7cb66:

  can: kvaser_usb: Add support for more Kvaser Leaf v2 devices (2016-06-23 
11:16:41 +0200)


linux-can-fixes-for-4.7-20160623


Jimmy Assarsson (1):
  can: kvaser_usb: Add support for more Kvaser Leaf v2 devices

Oliver Hartkopp (2):
  can: fix handling of unmodifiable configuration options fix
  can: fix oops caused by wrong rtnl dellink usage

 drivers/net/can/dev.c| 9 +
 drivers/net/can/usb/Kconfig  | 2 ++
 drivers/net/can/usb/kvaser_usb.c | 8 +++-
 3 files changed, 18 insertions(+), 1 deletion(-)



[PATCH 1/3] can: fix handling of unmodifiable configuration options fix

2016-06-23 Thread Marc Kleine-Budde
From: Oliver Hartkopp 

With upstream commit bb208f144cf3f59 (can: fix handling of unmodifiable
configuration options) a new can_validate() function was introduced.

When invoking 'ip link set can0 type can' without any configuration data
can_validate() tries to validate the content without taking into account that
there's totally no content. This patch adds a check for missing content.

Reported-by: ajneu 
Signed-off-by: Oliver Hartkopp 
Cc: 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 910c12e2638e..348dd5001fa4 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -798,6 +798,9 @@ static int can_validate(struct nlattr *tb[], struct nlattr 
*data[])
 * - control mode with CAN_CTRLMODE_FD set
 */
 
+   if (!data)
+   return 0;
+
if (data[IFLA_CAN_CTRLMODE]) {
struct can_ctrlmode *cm = nla_data(data[IFLA_CAN_CTRLMODE]);
 
-- 
2.8.1



[PATCH 2/3] can: fix oops caused by wrong rtnl dellink usage

2016-06-23 Thread Marc Kleine-Budde
From: Oliver Hartkopp 

For 'real' hardware CAN devices the netlink interface is used to set CAN
specific communication parameters. Real CAN hardware can not be created nor
removed with the ip tool ...

This patch adds a private dellink function for the CAN device driver interface
that does just nothing.

It's a follow up to commit 993e6f2fd ("can: fix oops caused by wrong rtnl
newlink usage") but for dellink.

Reported-by: ajneu 
Signed-off-by: Oliver Hartkopp 
Cc: 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/dev.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 348dd5001fa4..ad535a854e5c 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -1011,6 +1011,11 @@ static int can_newlink(struct net *src_net, struct 
net_device *dev,
return -EOPNOTSUPP;
 }
 
+static void can_dellink(struct net_device *dev, struct list_head *head)
+{
+   return;
+}
+
 static struct rtnl_link_ops can_link_ops __read_mostly = {
.kind   = "can",
.maxtype= IFLA_CAN_MAX,
@@ -1019,6 +1024,7 @@ static struct rtnl_link_ops can_link_ops __read_mostly = {
.validate   = can_validate,
.newlink= can_newlink,
.changelink = can_changelink,
+   .dellink= can_dellink,
.get_size   = can_get_size,
.fill_info  = can_fill_info,
.get_xstats_size = can_get_xstats_size,
-- 
2.8.1



Re: [PATCHv2,1/7] ppc bpf/jit: Disable classic BPF JIT on ppc64le

2016-06-23 Thread Michael Ellerman
On Wed, 2016-22-06 at 16:25:01 UTC, "Naveen N. Rao" wrote:
> Classic BPF JIT was never ported completely to work on little endian
> powerpc. However, it can be enabled and will crash the system when used.
> As such, disable use of BPF JIT on ppc64le.
> 
> Reported-by: Thadeu Lima de Souza Cascardo 
> Signed-off-by: Naveen N. Rao 
> Acked-by: Thadeu Lima de Souza Cascardo 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/844e3be47693f92a108cb1fb3b

cheers


4.4 stable request for net: macb: fix default configuration for GMAC on AT91

2016-06-23 Thread Cyrille Pitchen
Hi David,

For 4.4 -stable, can you please consider commit:

commit 6bdaa5e9ed39b3b3328f35d218e8ad5a99cfc4d2
Author: Nicolas Ferre 
Date:   Thu Mar 10 16:44:32 2016 +0100

net: macb: fix default configuration for GMAC on AT91

On AT91 SoCs, the User Register (USRIO) exposes a switch to configure the
"Reduced" or "Traditional" version of the Media Independent Interface
(RMII vs. MII or RGMII vs. GMII).
As on the older EMAC version, on GMAC, this switch is set by default to the
non-reduced type of interface, so use the existing capability and extend it 
to
GMII as well. We then keep the current logic in the macb_init() function.

The capabilities of sama5d2, sama5d4 and sama5d3 GEM interface are updated 
in
the macb_config structure to be able to properly enable them with a 
traditional
interface (GMII or MII).

Reported-by: Romain HENRIET 
Signed-off-by: Nicolas Ferre 
Signed-off-by: David S. Miller 




Without this patch, the macb ethernet controller is always configured to be
connected to a RMII phy on sama5dx SoCs. Hence the network doesn't work on
boards embedding such SoCs and MII phy.


A backport to 4.4.13 is below.

Best regards,

Cyrille

From: Nicolas Ferre 
Date: Thu, 10 Mar 2016 16:44:32 +0100
Subject: [PATCH] net: macb: fix default configuration for GMAC on AT91

commit: 6bdaa5e9ed39b3b3328f35d218e8ad5a99cfc4d2

On AT91 SoCs, the User Register (USRIO) exposes a switch to configure the
"Reduced" or "Traditional" version of the Media Independent Interface
(RMII vs. MII or RGMII vs. GMII).
As on the older EMAC version, on GMAC, this switch is set by default to the
non-reduced type of interface, so use the existing capability and extend it to
GMII as well. We then keep the current logic in the macb_init() function.

The capabilities of sama5d2, sama5d4 and sama5d3 GEM interface are updated in
the macb_config structure to be able to properly enable them with a traditional
interface (GMII or MII).

Reported-by: Romain HENRIET 
Signed-off-by: Nicolas Ferre 
Signed-off-by: David S. Miller 
[cyrille.pitc...@atmel.com: backported to 4.4.y]
Signed-off-by: Cyrille Pitchen 
---
 drivers/net/ethernet/cadence/macb.c | 13 +++--
 drivers/net/ethernet/cadence/macb.h |  2 +-
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c 
b/drivers/net/ethernet/cadence/macb.c
index 169059c92f80..8d54e7b41bbf 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -2405,9 +2405,9 @@ static int macb_init(struct platform_device *pdev)
if (bp->phy_interface == PHY_INTERFACE_MODE_RGMII)
val = GEM_BIT(RGMII);
else if (bp->phy_interface == PHY_INTERFACE_MODE_RMII &&
-(bp->caps & MACB_CAPS_USRIO_DEFAULT_IS_MII))
+(bp->caps & MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII))
val = MACB_BIT(RMII);
-   else if (!(bp->caps & MACB_CAPS_USRIO_DEFAULT_IS_MII))
+   else if (!(bp->caps & MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII))
val = MACB_BIT(MII);
 
if (bp->caps & MACB_CAPS_USRIO_HAS_CLKEN)
@@ -2738,7 +2738,7 @@ static int at91ether_init(struct platform_device *pdev)
 }
 
 static const struct macb_config at91sam9260_config = {
-   .caps = MACB_CAPS_USRIO_HAS_CLKEN | MACB_CAPS_USRIO_DEFAULT_IS_MII,
+   .caps = MACB_CAPS_USRIO_HAS_CLKEN | MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII,
.clk_init = macb_clk_init,
.init = macb_init,
 };
@@ -2751,21 +2751,22 @@ static const struct macb_config pc302gem_config = {
 };
 
 static const struct macb_config sama5d2_config = {
-   .caps = 0,
+   .caps = MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII,
.dma_burst_length = 16,
.clk_init = macb_clk_init,
.init = macb_init,
 };
 
 static const struct macb_config sama5d3_config = {
-   .caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE,
+   .caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE
+ | MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII,
.dma_burst_length = 16,
.clk_init = macb_clk_init,
.init = macb_init,
 };
 
 static const struct macb_config sama5d4_config = {
-   .caps = 0,
+   .caps = MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII,
.dma_burst_length = 4,
.clk_init = macb_clk_init,
.init = macb_init,
diff --git a/drivers/net/ethernet/cadence/macb.h 
b/drivers/net/ethernet/cadence/macb.h
index d83b0db77821..3f385ab94988 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -398,7 +398,7 @@
 /* Capability mask bits */
 #define MACB_CAPS_ISR_CLEAR_ON_WRITE   0x0001
 #define MACB_CAPS_USRIO_HAS_CLKEN  0x0002
-#define MACB_CAPS_USRIO_DEFAULT_IS_MII 0x0004
+#define MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII0x0004
 #define MACB_CAPS_NO_GIGABIT_HALF  0x0008
 #define MACB_CAPS_FIFO_MODE  

pull-request: can-next 2016-06-17

2016-06-23 Thread Marc Kleine-Budde
Hello David,

this is a pull request of 4 patches for net-next/master.

Arnd Bergmann's patch fixes a regresseion in af_can introduced in
linux-can-next-for-4.8-20160617. There are two patches by Ramesh
Shanmugasundaram, which add CAN-2.0 support to the rcar_canfd driver.
And a patch by Ed Spiridonov that adds better error diagnoses messages
to the Ed Spiridonov driver.

regards,
Marc

---

The following changes since commit b95e5928fcc76d156352570858abdea7b2628efd:

  openvswitch: Add packet len info to upcall. (2016-06-22 16:34:39 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git 
tags/linux-can-next-for-4.8-20160623

for you to fetch changes up to b63f69d0fc1fa1e25842a2266633862d523c380f:

  can: mcp251x: add message about sucessful/unsuccessful probe (2016-06-23 
11:23:49 +0200)


linux-can-next-for-4.8-20160623


Arnd Bergmann (1):
  can: only call can_stat_update with procfs

Ed Spiridonov (1):
  can: mcp251x: add message about sucessful/unsuccessful probe

Ramesh Shanmugasundaram (2):
  can: rcar_canfd: Add Classical CAN only mode support
  can: rcar_canfd: Add back-to-error-active support

 .../devicetree/bindings/net/can/rcar_canfd.txt |  21 +-
 drivers/net/can/rcar/rcar_canfd.c  | 429 ++---
 drivers/net/can/spi/mcp251x.c  |   7 +-
 net/can/af_can.c   |  22 +-
 net/can/af_can.h   |  11 -
 5 files changed, 328 insertions(+), 162 deletions(-)

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



signature.asc
Description: OpenPGP digital signature


Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY

2016-06-23 Thread Daniel Borkmann

Hi Martin,

[ sorry to jump late in here, on pto currently ]

On 06/22/2016 11:17 PM, Martin KaFai Lau wrote:

Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations.
To update an element, the caller is expected to obtain a cgroup2 backed
fd by open(cgroup2_dir) and then update the array with that fd.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Alexei Starovoitov 


Could you describe a bit more with regards to pinning maps and how this
should interact with cgroups? The two specialized array maps we have (tail
calls, perf events) have fairly complicated semantics for when to clean up
map slots (see commits c9da161c6517ba1, 3b1efb196eee45b2f0c4).

How is this managed with cgroups? Once a cgroup fd is placed into a map and
the user removes the cgroup, will this be prevented due to 'being busy', or
will the cgroup live further as long as a program is running with a cgroup
map entry (but the cgroup itself is not visible from user space in any way
anymore)?

I presume it's a valid use case to pin a cgroup map, put fds into it and
remove the pinned file expecting to continue to match on it, right? So
lifetime is really until last prog using a cgroup map somewhere gets removed
(even if not accessible from user space anymore, meaning no prog has fd and
pinned file was removed).

I assume that using struct file here doesn't make sense (commit e03e7ee34fdd1c3)
either, right?

[...]

+#ifdef CONFIG_CGROUPS
+static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
+struct file *map_file /* not used */,
+int fd)
+{
+   return cgroup_get_from_fd(fd);
+}
+
+static void cgroup_fd_array_put_ptr(void *ptr)
+{
+   /* cgroup_put free cgrp after a rcu grace period */
+   cgroup_put(ptr);


Yeah, as long as this respects freeing after RCU grace period, it's fine
like this ...


+}
+
+static void cgroup_fd_array_free(struct bpf_map *map)
+{
+   bpf_fd_array_map_clear(map);
+   fd_array_map_free(map);
+}
+
+static const struct bpf_map_ops cgroup_array_ops = {
+   .map_alloc = fd_array_map_alloc,
+   .map_free = cgroup_fd_array_free,
+   .map_get_next_key = array_map_get_next_key,
+   .map_lookup_elem = fd_array_map_lookup_elem,
+   .map_delete_elem = fd_array_map_delete_elem,
+   .map_fd_get_ptr = cgroup_fd_array_get_ptr,
+   .map_fd_put_ptr = cgroup_fd_array_put_ptr,
+};
+
+static struct bpf_map_type_list cgroup_array_type __read_mostly = {
+   .ops = &cgroup_array_ops,
+   .type = BPF_MAP_TYPE_CGROUP_ARRAY,
+};
+
+static int __init register_cgroup_array_map(void)
+{
+   bpf_register_map_type(&cgroup_array_type);
+   return 0;
+}
+late_initcall(register_cgroup_array_map);
+#endif
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c23a4e93..cac13f1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -393,7 +393,8 @@ static int map_update_elem(union bpf_attr *attr)
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
err = bpf_percpu_array_update(map, key, value, attr->flags);
} else if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY ||
-  map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
+  map->map_type == BPF_MAP_TYPE_PROG_ARRAY ||
+  map->map_type == BPF_MAP_TYPE_CGROUP_ARRAY) {
rcu_read_lock();
err = bpf_fd_array_map_update_elem(map, f.file, key, value,
   attr->flags);





[PATCH] Bridge: Fix ipv6 mc snooping if it has no ipv6 address.

2016-06-23 Thread daniel
The bridge is falsly dropping ipv6 mulitcast packets
if there is no ipv6 address assigned on the brigde and no
external mld querier is present.

When the bridge fails to build mld queries, because it has no
ipv6 address, it silently returns, but keeps the local querier enabled.
(br_multicast.c:832)

Ipv6 multicast snooping can only work if:
 a) an external querier is present
 b) the bridge has an ipv6 address an is capable of sending own queries

Otherwise it has to forward/flood the ipv6 multicast traffic,
because snooping cannot work.

This patch fixes the issue by adding a flag to the bridge struct that
indicates that there is currently no ipv6 address assinged to the bridge
and returns a false state for the local querier in
__br_multicast_querier_exists().

Special thanks to Linus Lüssing.

Signed-off-by: Daniel Danzberger 
---
 net/bridge/br_multicast.c |  4 
 net/bridge/br_private.h   | 23 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 6852f3c..4384414 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -464,8 +464,11 @@ static struct sk_buff
*br_ip6_multicast_alloc_query(struct net_bridge *br,
if (ipv6_dev_get_saddr(dev_net(br->dev), br->dev, &ip6h->daddr, 0,
   &ip6h->saddr)) {
kfree_skb(skb);
+   br->has_ipv6_addr = 0;
return NULL;
}
+
+   br->has_ipv6_addr = 1;
ipv6_eth_mc_map(&ip6h->daddr, eth->h_dest);

hopopt = (u8 *)(ip6h + 1);
@@ -1745,6 +1748,7 @@ void br_multicast_init(struct net_bridge *br)
br->ip6_other_query.delay_time = 0;
br->ip6_querier.port = NULL;
 #endif
+   br->has_ipv6_addr = 1;

spin_lock_init(&br->multicast_lock);
setup_timer(&br->multicast_router_timer,
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index c7fb5d7..52edecf 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -314,6 +314,7 @@ struct net_bridge
u8  multicast_disabled:1;
u8  multicast_querier:1;
u8  multicast_query_use_ifaddr:1;
+   u8  has_ipv6_addr:1;

u32 hash_elasticity;
u32 hash_max;
@@ -588,10 +589,22 @@ static inline bool br_multicast_is_router(struct
net_bridge *br)

 static inline bool
 __br_multicast_querier_exists(struct net_bridge *br,
- struct bridge_mcast_other_query *querier)
+   struct bridge_mcast_other_query *querier,
+   const bool is_ipv6)
 {
+   bool own_querier_enabled;
+
+   if (br->multicast_querier) {
+   if (is_ipv6 && !br->has_ipv6_addr)
+   own_querier_enabled = false;
+   else
+   own_querier_enabled = true;
+   } else {
+   own_querier_enabled = false;
+   }
+
return time_is_before_jiffies(querier->delay_time) &&
-  (br->multicast_querier || timer_pending(&querier->timer));
+  (own_querier_enabled || timer_pending(&querier->timer));
 }

 static inline bool br_multicast_querier_exists(struct net_bridge *br,
@@ -599,10 +612,12 @@ static inline bool
br_multicast_querier_exists(struct net_bridge *br,
 {
switch (eth->h_proto) {
case (htons(ETH_P_IP)):
-   return __br_multicast_querier_exists(br, &br->ip4_other_query);
+   return __br_multicast_querier_exists(br,
+   &br->ip4_other_query, false);
 #if IS_ENABLED(CONFIG_IPV6)
case (htons(ETH_P_IPV6)):
-   return __br_multicast_querier_exists(br, &br->ip6_other_query);
+   return __br_multicast_querier_exists(br,
+   &br->ip6_other_query, true);
 #endif
default:
return false;
-- 
2.1.4



Re: [PATCH net-next v2 3/4] cgroup: bpf: Add bpf_skb_in_cgroup_proto

2016-06-23 Thread Daniel Borkmann

On 06/22/2016 11:17 PM, Martin KaFai Lau wrote:

Adds a bpf helper, bpf_skb_in_cgroup, to decide if a skb->sk
belongs to a descendant of a cgroup2.  It is similar to the
feature added in netfilter:
commit c38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match")

The user is expected to populate a BPF_MAP_TYPE_CGROUP_ARRAY
which will be used by the bpf_skb_in_cgroup.

Modifications to the bpf verifier is to ensure BPF_MAP_TYPE_CGROUP_ARRAY
and bpf_skb_in_cgroup() are always used together.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Alexei Starovoitov 
---
  include/uapi/linux/bpf.h | 12 
  kernel/bpf/verifier.c|  8 
  net/core/filter.c| 40 
  3 files changed, 60 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ef4e386..bad309f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -314,6 +314,18 @@ enum bpf_func_id {
 */
BPF_FUNC_skb_get_tunnel_opt,
BPF_FUNC_skb_set_tunnel_opt,
+
+   /**
+* bpf_skb_in_cgroup(skb, map, index) - Check cgroup2 membership of skb
+* @skb: pointer to skb
+* @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type
+* @index: index of the cgroup in the bpf_map
+* Return:
+*   == 0 skb failed the cgroup2 descendant test
+*   == 1 skb succeeded the cgroup2 descendant test
+*< 0 error
+*/
+   BPF_FUNC_skb_in_cgroup,
__BPF_FUNC_MAX_ID,
  };

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 668e079..68753e0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1062,6 +1062,10 @@ static int check_map_func_compatibility(struct bpf_map 
*map, int func_id)
if (func_id != BPF_FUNC_get_stackid)
goto error;
break;
+   case BPF_MAP_TYPE_CGROUP_ARRAY:
+   if (func_id != BPF_FUNC_skb_in_cgroup)
+   goto error;
+   break;


I think the BPF_MAP_TYPE_CGROUP_ARRAY case should have been fist here in
patch 2/4, but with unconditional goto error. And this one only adds the
'func_id != BPF_FUNC_skb_in_cgroup' test.


default:
break;
}
@@ -1081,6 +1085,10 @@ static int check_map_func_compatibility(struct bpf_map 
*map, int func_id)
if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
goto error;
break;
+   case BPF_FUNC_skb_in_cgroup:
+   if (map->map_type != BPF_MAP_TYPE_CGROUP_ARRAY)
+   goto error;
+   break;
default:
break;
}
diff --git a/net/core/filter.c b/net/core/filter.c
index df6860c..a16f7d2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2024,6 +2024,42 @@ bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
}
  }

+#ifdef CONFIG_CGROUPS
+static u64 bpf_skb_in_cgroup(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+   struct sk_buff *skb = (struct sk_buff *)(long)r1;
+   struct bpf_map *map = (struct bpf_map *)(long)r2;
+   struct bpf_array *array = container_of(map, struct bpf_array, map);
+   struct cgroup *cgrp;
+   struct sock *sk;
+   u32 i = (u32)r3;
+
+   WARN_ON_ONCE(!rcu_read_lock_held());


I think the WARN_ON_ONCE() test can be removed all-together. There are many
other functions without it. We really rely on RCU read-lock being held for
BPF programs (otherwise it would be horribly broken). F.e. it's kinda silly
that for some map update/lookups we even have this WARN_ON_ONCE() test twice
we go through in the fast-path (once from the generic eBPF helper function
and then once again from the actual implementation since it could also be
called from syscall). The actual invocation points are not that many and we
can make sure that related call sites hold RCU read lock.

Rest looks good to me, thanks.


+   sk = skb->sk;
+   if (!sk || !sk_fullsock(sk))
+   return -ENOENT;
+
+   if (unlikely(i >= array->map.max_entries))
+   return -E2BIG;
+
+   cgrp = READ_ONCE(array->ptrs[i]);
+   if (unlikely(!cgrp))
+   return -ENOENT;
+
+   return cgroup_is_descendant(sock_cgroup_ptr(&sk->sk_cgrp_data), cgrp);
+}
+
+static const struct bpf_func_proto bpf_skb_in_cgroup_proto = {
+   .func   = bpf_skb_in_cgroup,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_CONST_MAP_PTR,
+   .arg3_type  = ARG_ANYTHING,
+};
+#endif
+
  static const struct bpf_func_proto *
  sk_filter_func_proto(enum bpf_func_id func_id)
  {
@@ -2086,6 +2122,10 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
return &bpf_get_route_realm_proto;
case BPF_FUNC_perf_event_output:
retu

Re: [PATCH net-next v2 4/4] cgroup: bpf: Add an example to do cgroup checking in BPF

2016-06-23 Thread Daniel Borkmann

On 06/22/2016 11:17 PM, Martin KaFai Lau wrote:

test_cgrp2_array_pin.c:
A userland program that creates a bpf_map (BPF_MAP_TYPE_GROUP_ARRAY),
pouplates/updates it with a cgroup2's backed fd and pins it to a
bpf-fs's file.  The pinned file can be loaded by tc and then used
by the bpf prog later.  This program can also update an existing pinned
array and it could be useful for debugging/testing purpose.

test_cgrp2_tc_kern.c:
A bpf prog which should be loaded by tc.  It is to demonstrate
the usage of bpf_skb_in_cgroup.

test_cgrp2_tc.sh:
A script that glues the test_cgrp2_array_pin.c and
test_cgrp2_tc_kern.c together.  The idea is like:
1. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
with a cgroup fd
2. Load the test_cgrp2_tc_kern.o by tc
3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
dropped because of a match on the cgroup

Most of the lines in test_cgrp2_tc.sh is the boilerplate
to setup the cgroup/bpf-fs/net-devices/netns...etc.  It is
not bulletproof on errors but should work well enough and
give enough debug info if things did not go well.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Alexei Starovoitov 


Btw, when no bpf fs is mounted, tc will already auto-mount it. I noticed in
your script, you do mount the fs manually. I guess it's okay to leave it like
this, but I hope users won't wrongly copy it assuming they /have/ to mount it
themselves.


[REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")

2016-06-23 Thread Arjun V.
Hi all, 

The following patch introduced a regression in Chelsio cxgb4 driver, causing 
port failure when running heavy TSO traffic:

commit 10d3be569243def8d92ac3722395ef5a59c504e6
Author: Eric Dumazet 
Date:   Thu Apr 21 10:55:23 2016 -0700

tcp-tso: do not split TSO packets at retransmit time

Linux TCP stack painfully segments all TSO/GSO packets before retransmits.

This was fine back in the days when TSO/GSO were emerging, with their
bugs, but we believe the dark age is over.

Keeping big packets in write queues, but also in stack traversal
has a lot of benefits.
 - Less memory overhead, because write queues have less skbs
 - Less cpu overhead at ACK processing.
 - Better SACK processing, as lot of studies mentioned how
   awful linux was at this ;)
 - Less cpu overhead to send the rtx packets
   (IP stack traversal, netfilter traversal, drivers...)
 - Better latencies in presence of losses.
 - Smaller spikes in fq like packet schedulers, as retransmits
   are not constrained by TCP Small Queues.

1 % packet losses are common today, and at 100Gbit speeds, this
translates to ~80,000 losses per second.
Losses are often correlated, and we see many retransmit events
leading to 1-MSS train of packets, at the time hosts are already
under stress.

Signed-off-by: Eric Dumazet 
Acked-by: Yuchung Cheng 
Signed-off-by: David S. Miller da...@davemloft.net

When the number of TCP retransmissions are quite high, the packet length coming 
from stack does not seems to be proper, due to which our TSO module gets stuck. 
If I change segs back to 1 in __tcp_retransmit_skb(),  traffic is running fine. 
Please let us know if we are missing something.

Thanks,
Arjun.



Re: [PATCH v2] netfilter: fix possible ZERO_SIZE_PTR pointer dereferencing error.

2016-06-23 Thread Pablo Neira Ayuso
On Thu, Jun 02, 2016 at 10:59:56AM +0800, Xiubo Li wrote:
> Since we cannot make sure that the 'hook_mask' will always be none
> zero here. If it equals to zero, the num_hooks will be zero too,
> and then kmalloc() will return ZERO_SIZE_PTR, which is (void *)16.
> 
> Then the following error check will fails:
>   ops = kmalloc(sizeof(*ops) * num_hooks, GFP_KERNEL);
>   if (ops == NULL)
>   return ERR_PTR(-ENOMEM);
> 
> So this patch will fix this with just doing the zero check before
> kmalloc() is called.
> 
> Maybe the case above will never happen here, but in theory.

Applied, thanks.


Re: [PATCH iptables 3/3] libxt_hashlimit: iptables-restore does not work as expected with xt_hashlimit

2016-06-23 Thread Pablo Neira Ayuso
On Wed, Jun 01, 2016 at 08:17:59PM -0400, Vishwanath Pai wrote:
> libxt_hashlimit: iptables-restore does not work as expected with xt_hashlimit
> 
> Add the following iptables rule.
> 
> $ iptables -A INPUT -m hashlimit --hashlimit-above 200/sec \
>   --hashlimit-burst 5 --hashlimit-mode srcip --hashlimit-name hashlimit1 \
>   --hashlimit-htable-expire 3 -j DROP
> 
> $ iptables-save > save.txt
> 
> Edit save.txt and change the value of --hashlimit-above to 300:
> 
> -A INPUT -m hashlimit --hashlimit-above 300/sec --hashlimit-burst 5 \
> --hashlimit-mode srcip --hashlimit-name hashlimit2 \
> --hashlimit-htable-expire 3 -j DROP
> 
> Now restore save.txt
> 
> $ iptables-restore < save.txt

In this case, we don't end up with two rules, we actually get one
single hashlimit rule, given the sequence you provide.

$ iptables-save > save.txt
... edit save.txt
$ iptables-restore < save.txt

> Now userspace thinks that the value of --hashlimit-above is 300 but it is
> actually 200 in the kernel. This happens because when we add multiple
> hash-limit rules with the same name they will share the same hashtable
> internally. The kernel module tries to re-use the old hashtable without
> updating the values.
> 
> There are multiple problems here:
> 1) We can add two iptables rules with the same name, but kernel does not
>handle this well, one procfs file cannot work with two rules
> 2) If the second rule has no effect because the hashtable has values from
>rule 1
> 3) hashtable-restore does not work (as described above)
> 
> To fix this I have made the following design change:
> 1) If a second rule is added with the same name as an existing rule,
>append a number when we create the procfs, for example hashlimit_1,
>hashlimit_2 etc
> 2) Two rules will not share the same hashtable unless they are similar in
>every possible way
> 3) This behavior has to be forced with a new userspace flag:
>--hashlimit-ehanced-procfs, if this flag is not passed we default to
>the old behavior. This is to make sure we do not break existing scripts
>that rely on the existing behavior.

We discussed this in netdev0.1, and I think we agreed on adding a new
option, something like --hashlimit-update that would force an update
to the existing hashlimit internal state (that is identified by the
hashlimit name).

I think the problem here is that you may want to update the internal
state of an existing hashlimit object, and currently this is not
actually happening.

With the explicit --hashlimit-update flag, from the kernel we really
know that the user wants an update.

Let me know, thanks.


Re: [alsa-devel] [very-RFC 0/8] TSN driver for the kernel

2016-06-23 Thread Henrik Austad
On Tue, Jun 21, 2016 at 10:45:18AM -0700, Pierre-Louis Bossart wrote:
> On 6/20/16 5:18 AM, Richard Cochran wrote:
> >On Mon, Jun 20, 2016 at 01:08:27PM +0200, Pierre-Louis Bossart wrote:
> >>The ALSA API provides support for 'audio' timestamps (playback/capture rate
> >>defined by audio subsystem) and 'system' timestamps (typically linked to
> >>TSC/ART) with one option to take synchronized timestamps should the hardware
> >>support them.
> >
> >Thanks for the info.  I just skimmed 
> >Documentation/sound/alsa/timestamping.txt.
> >
> >That is fairly new, only since v4.1.  Are then any apps in the wild
> >that I can look at?  AFAICT, OpenAVB, gstreamer, etc, don't use the
> >new API.
> 
> The ALSA API supports a generic .get_time_info callback, its implementation
> is for now limited to a regular 'DMA' or 'link' timestamp for HDaudio - the
> difference being which counters are used and how close they are to the link
> serializer. The synchronized part is still WIP but should come 'soon'

Interesting, would you mind CCing me in on those patches?

> >>The intent was that the 'audio' timestamps are translated to a shared time
> >>reference managed in userspace by gPTP, which in turn would define if
> >>(adaptive) audio sample rate conversion is needed. There is no support at
> >>the moment for a 'play_at' function in ALSA, only means to control a
> >>feedback loop.
> >
> >Documentation/sound/alsa/timestamping.txt says:
> >
> >  If supported in hardware, the absolute link time could also be used
> >  to define a precise start time (patches WIP)
> >
> >Two questions:
> >
> >1. Where are the patches?  (If some are coming, I would appreciate
> >   being on CC!)
> >
> >2. Can you mention specific HW that would support this?
> 
> You can experiment with the 'dma' and 'link' timestamps today on any
> HDaudio-based device. Like I said the synchronized part has not been
> upstreamed yet (delays + dependency on ART-to-TSC conversions that made it
> in the kernel recently)

Ok, I think I see a way to hook this into timestamps from the skbuf on 
incoming frames and a somewhat messy way on outgoing. Having time coupled 
with 'avail' and 'delay' is useful, and from the looks of it, 'link'-time 
is the appropriate level to add this.

I'm working on storing the time in the tsn_link struct I use, and then read 
that from the avb_alsa-shim. Details are still a bit fuzzy though, but I 
plan to do that and then see what audio-time gives me once it is up and 
running.

Richard: is it fair to assume that if ptp4l is running and is part of a PTP 
domain, ktime_get() will return PTP-adjusted time for the system? -Or do I 
also need to run phc2sys in order to sync the system-time to PTP-time? Note 
that this is for outgoing traffic, Rx should perhaps use the timestamp 
in skb.

Hooking into ktime_get() instead of directly to the PTP-subsystem (if that 
is even possible) makes it a lot easier to debug when running this in a VM 
as it doesn't *have* to use PTP-time when I'm crashing a new kernel :)

Thanks!

-- 
Henrik Austad


signature.asc
Description: Digital signature


Re: [PATCH iptables 1/3] libxt_hashlimit: Prepare libxt_hashlimit.c for revision 2

2016-06-23 Thread Pablo Neira Ayuso
Not specifically related to this patch.

It would be great if you can send us a patch to add new tests to
iptables/extensions/libxt_hashlimit.t for this new higher resolution
pps ratelimit.

Thanks!


Re: esp: Fix ESN generation under UDP encapsulation

2016-06-23 Thread Steffen Klassert
On Thu, Jun 23, 2016 at 04:25:21AM +, Blair Steven wrote:
> This change tests okay in my setup.
> 
> Thanks very much
> -Blair

David, can you please take this patch directly in the net tree?
This is a candidate for stable.

Acked-by: Steffen Klassert 


Re: [PATCH iproute2 net-next v4 0/5] bridge: json support for fdb and vlan show

2016-06-23 Thread Roopa Prabhu
On Wed, Jun 22, 2016 at 1:00 PM, Jiri Pirko  wrote:
> Wed, Jun 22, 2016 at 08:10:47PM CEST, step...@networkplumber.org wrote:
>>On Wed, 22 Jun 2016 16:53:44 +0200
>>Jiri Pirko  wrote:
>>
>>> Wed, Jun 22, 2016 at 03:45:50PM CEST, ro...@cumulusnetworks.com wrote:
>>> >From: Roopa Prabhu 
>>> >
>>> >This patch series adds json support for a few bridge show commands.
>>> >We plan to follow up with json support for additional commands soon.
>>>
>>> I'm just curious, what is you use case for this? Apps can use rtnetlink
>>> socket directly.
>>
>>Try using netlink in perl or python, it is quite difficult.
>
> pyroute2? Quite easy...

none of the implementations out there are complete nor can compete
with iproute2.
iproute2 is maintained by netdev community and always is up-todate
with the latest
networking api.

Nothing against pyroute2 but we wrote our own for other reasons and we
carry additional burden of maintaining it and keeping it up-todate for
every networking api..
that gets added to iproute2 (and the implementation of netlink
is often very easy in C).

Also, for external automation and orchestration tools (to whom this
patch-set is addressed),
there is no reason for them to write and maintain their own tools
using netlink when they
can use iproute2 directly to create a link or query its properties.


Re: [Patch net 1/2] act_ife: only acquire tcf_lock for existing actions

2016-06-23 Thread Jamal Hadi Salim

On 16-06-20 04:37 PM, Cong Wang wrote:

Alexey reported that we have GFP_KERNEL allocation when
holding the spinlock tcf_lock. Actually we don't have
to take that spinlock for all the cases, especially
for the new one we just create. To modify the existing
actions, we still need this spinlock to make sure
the whole update is atomic.

For net-next, we can get rid of this spinlock because
we already hold the RTNL lock on slow path, and on fast
path we can use RCU to protect the metalist.

Joint work with Jamal.

Reported-by: Alexey Khoroshilov 
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 


Acked-by: Jamal Hadi Salim 


cheers,
jamal


Re: [Patch net 2/2] act_ife: acquire ife_mod_lock before reading ifeoplist

2016-06-23 Thread Jamal Hadi Salim

On 16-06-20 04:37 PM, Cong Wang wrote:

Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 


Acked-by: Jamal Hadi Salim 


cheers,
jamal


[PATCH] wlcore: time sync : add support for 64 bit clock

2016-06-23 Thread Yaniv Machani
Changed the configuration to support 64bit instead of 32bit
this in order to offload the driver from handling a wraparound.

Signed-off-by: Yaniv Machani 
---
 drivers/net/wireless/ti/wl18xx/event.c | 26 +-
 drivers/net/wireless/ti/wl18xx/event.h | 19 +--
 2 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/drivers/net/wireless/ti/wl18xx/event.c 
b/drivers/net/wireless/ti/wl18xx/event.c
index ef81184..2c5df43 100644
--- a/drivers/net/wireless/ti/wl18xx/event.c
+++ b/drivers/net/wireless/ti/wl18xx/event.c
@@ -112,12 +112,18 @@ static int wlcore_smart_config_decode_event(struct wl1271 
*wl,
return 0;
 }
 
-static void wlcore_event_time_sync(struct wl1271 *wl, u16 tsf_msb, u16 tsf_lsb)
+static void wlcore_event_time_sync(struct wl1271 *wl,
+  u16 tsf_high_msb, u16 tsf_high_lsb,
+  u16 tsf_low_msb, u16 tsf_low_lsb)
 {
-   u32 clock;
-   /* convert the MSB+LSB to a u32 TSF value */
-   clock = (tsf_msb << 16) | tsf_lsb;
-   wl1271_info("TIME_SYNC_EVENT_ID: clock %u", clock);
+   u32 clock_low;
+   u32 clock_high;
+
+   clock_high = (tsf_high_msb << 16) | tsf_high_lsb;
+   clock_low = (tsf_low_msb << 16) | tsf_low_lsb;
+
+   wl1271_info("TIME_SYNC_EVENT_ID: clock_high %u, clock low %u",
+   clock_high, clock_low);
 }
 
 int wl18xx_process_mailbox_events(struct wl1271 *wl)
@@ -138,8 +144,10 @@ int wl18xx_process_mailbox_events(struct wl1271 *wl)
 
if (vector & TIME_SYNC_EVENT_ID)
wlcore_event_time_sync(wl,
-   mbox->time_sync_tsf_msb,
-   mbox->time_sync_tsf_lsb);
+   mbox->time_sync_tsf_high_msb,
+   mbox->time_sync_tsf_high_lsb,
+   mbox->time_sync_tsf_low_msb,
+   mbox->time_sync_tsf_low_lsb);
 
if (vector & RADAR_DETECTED_EVENT_ID) {
wl1271_info("radar event: channel %d type %s",
@@ -187,11 +195,11 @@ int wl18xx_process_mailbox_events(struct wl1271 *wl)
 */
if (vector & MAX_TX_FAILURE_EVENT_ID)
wlcore_event_max_tx_failure(wl,
-   le32_to_cpu(mbox->tx_retry_exceeded_bitmap));
+   le16_to_cpu(mbox->tx_retry_exceeded_bitmap));
 
if (vector & INACTIVE_STA_EVENT_ID)
wlcore_event_inactive_sta(wl,
-   le32_to_cpu(mbox->inactive_sta_bitmap));
+   le16_to_cpu(mbox->inactive_sta_bitmap));
 
if (vector & REMAIN_ON_CHANNEL_COMPLETE_EVENT_ID)
wlcore_event_roc_complete(wl);
diff --git a/drivers/net/wireless/ti/wl18xx/event.h 
b/drivers/net/wireless/ti/wl18xx/event.h
index 070de12..b436bf9 100644
--- a/drivers/net/wireless/ti/wl18xx/event.h
+++ b/drivers/net/wireless/ti/wl18xx/event.h
@@ -74,10 +74,16 @@ struct wl18xx_event_mailbox {
__le16 bss_loss_bitmap;
 
/* bitmap of stations (by HLID) which exceeded max tx retries */
-   __le32 tx_retry_exceeded_bitmap;
+   __le16 tx_retry_exceeded_bitmap;
+
+   /* time sync high msb*/
+   u16 time_sync_tsf_high_msb;
 
/* bitmap of inactive stations (by HLID) */
-   __le32 inactive_sta_bitmap;
+   __le16 inactive_sta_bitmap;
+
+   /* time sync high lsb*/
+   u16 time_sync_tsf_high_lsb;
 
/* rx BA win size indicated by RX_BA_WIN_SIZE_CHANGE_EVENT_ID */
u8 rx_ba_role_id;
@@ -98,14 +104,15 @@ struct wl18xx_event_mailbox {
u8 sc_sync_channel;
u8 sc_sync_band;
 
-   /* time sync msb*/
-   u16 time_sync_tsf_msb;
+   /* time sync low msb*/
+   u16 time_sync_tsf_low_msb;
+
/* radar detect */
u8 radar_channel;
u8 radar_type;
 
-   /* time sync lsb*/
-   u16 time_sync_tsf_lsb;
+   /* time sync low lsb*/
+   u16 time_sync_tsf_low_lsb;
 
 } __packed;
 
-- 
2.9.0



Re: [PATCH 2/3] netfilter: Create revision 2 of xt_hashlimit to support higher pps rates

2016-06-23 Thread Pablo Neira Ayuso
On Wed, Jun 01, 2016 at 08:11:38PM -0400, Vishwanath Pai wrote:
> +static void
> +cfg_copy(struct hashlimit_cfg2 *to, void *from, int revision)
> +{
> + if (revision == 1) {
> + struct hashlimit_cfg1 *cfg = (struct hashlimit_cfg1 *)from;
> +
> + to->mode = cfg->mode;
> + to->avg = cfg->avg;
> + to->burst = cfg->burst;
> + to->size = cfg->size;
> + to->max = cfg->max;
> + to->gc_interval = cfg->gc_interval;
> + to->expire = cfg->expire;
> + to->srcmask = cfg->srcmask;
> + to->dstmask = cfg->dstmask;
> + } else if (revision == 2) {
> + memcpy(to, from, sizeof(struct hashlimit_cfg2));
> + } else {
> + BUG();

BUG here is probably too much, this halts the system. I can see we
only use this somewhere else in this code. Instead, I'd suggest you
propagate an error back to userspace if this ever happen.

I would like to see if this spots any problem with our test
infrastructure under iptables/.

Thanks.


Re: [PATCH] wlcore: time sync : add support for 64 bit clock

2016-06-23 Thread Johannes Berg
On Thu, 2016-06-23 at 14:12 +0300, Yaniv Machani wrote:
> Changed the configuration to support 64bit instead of 32bit
> this in order to offload the driver from handling a wraparound.

[...]

Since you Cc'ed me, and presumably want me to review it, I'll say that
this looks like a terrible idea:

> @@ -74,10 +74,16 @@ struct wl18xx_event_mailbox {

This struct is evidently used for firmware/host communication.

>   __le16 bss_loss_bitmap;
>  
>   /* bitmap of stations (by HLID) which exceeded max tx
> retries */
> - __le32 tx_retry_exceeded_bitmap;
> + __le16 tx_retry_exceeded_bitmap;
> +
> + /* time sync high msb*/
> + u16 time_sync_tsf_high_msb;

So first of all, just using u16 instead of __le16 seems wrong.

Additionally, this looks like it changes the firmware API, so that
older firmware images will no longer work?

johannes


Re: [patch net-next v5 0/4] return offloaded stats as default and expose original sw stats

2016-06-23 Thread Roopa Prabhu
On 6/22/16, 10:40 PM, Jiri Pirko wrote:
> Wed, Jun 22, 2016 at 09:32:25PM CEST, ro...@cumulusnetworks.com wrote:
>> On Tue, Jun 21, 2016 at 8:15 AM, Jiri Pirko  wrote:
>>> From: Jiri Pirko 
>>>
>>> The problem we try to handle is about offloaded forwarded packets
>>> which are not seen by kernel. Let me try to draw it:
>>>
>>> port1   port2 (HW stats are counted here)
>>>   \  /
>>>\/
>>> \  /
>>>  --(A) ASIC --(B)--
>>> |
>>>(C)
>>> |
>>>CPU (SW stats are counted here)
>>>
>>>
>>> Now we have couple of flows for TX and RX (direction does not matter here):
>>>
>>> 1) port1->A->ASIC->C->CPU
>>>
>>>For this flow, HW and SW stats are equal.
>>>
>>> 2) port1->A->ASIC->C->CPU->C->ASIC->B->port2
>>>
>>>For this flow, HW and SW stats are equal.
>>>
>>> 3) port1->A->ASIC->B->port2
>>>
>>>For this flow, SW stats are 0.
>>>
>>> The purpose of this patchset is to provide facility for user to
>>> find out the difference between flows 1+2 and 3. In other words, user
>>> will be able to see the statistics for the slow-path (through kernel).
>>>
>>> Also note that HW stats are what someone calls "accumulated" stats.
>>> Every packet counted by SW is also counted by HW. Not the other way around.
>>>
>>> As a default the accumulated stats (HW) will be exposed to user
>>> so the userspace apps can react properly.
>>>
>>>
>> curious, how do you plan to handle virtual device counters like vlan
>> and vxlan stats ?.
> Yes, that is another problem (1). We have to push stats up to this devices
> most probably. But that problem is orthogonal to this. To the user, you
> will still need 2 sets of stats and HW stats being default. So this
> patchset infra is going to be used as well.
hmm...But, i don't think we should start adding different tlv's hw and sw for
every stats variant we add.
>
>
>> we can't separate CPU and HW stats there. In some cases (or ASICs) HW
>> counters do
>> not include CPU generated packetsyou will have to add CPU
>> generated pkt counters to the
>> hw counters for such virtual device stats.
> Can you please provide and example how that could happen?

example is the bridge vlan stats I mention below. These are usually counted
by attaching hw virtual counter resources. And CPU generated packets
in some cases maybe setup to bypass the ASIC pipeline because the CPU
has already made the required decisions. So, they may not be counted by
by such hw virtual counters.

>
>
>> example: In the switchdev model, for bridge vlan stats, when user
>> queries bridge vlan stats,
>> you will have to add the hw stats to the bridge driver vlan stats and
>> return it to the user .
> Yep, that is (1).

unless i misunderstood, this does not look like (1). In (1) you say hw stats
 already reflect sw stats. But in this case, hw counter does not include sw 
stats
for CPU generated packets.
>
>
>> Having a consistent model for all kinds of stats will help.



RE: [PATCH] wlcore: time sync : add support for 64 bit clock

2016-06-23 Thread Machani, Yaniv
On Thu, Jun 23, 2016 at 14:18:00, Johannes Berg wrote:
> linux-wirel...@vger.kernel.org; netdev@vger.kernel.org
> Subject: Re: [PATCH] wlcore: time sync : add support for 64 bit clock
> 
> On Thu, 2016-06-23 at 14:12 +0300, Yaniv Machani wrote:
> > Changed the configuration to support 64bit instead of 32bit this in 
> > order to offload the driver from handling a wraparound.
> 
> [...]
> 
> Since you Cc'ed me, and presumably want me to review it, I'll say that 
> this looks like a terrible idea:
> 
> > @@ -74,10 +74,16 @@ struct wl18xx_event_mailbox {
> 
> This struct is evidently used for firmware/host communication.
> 
> >     __le16 bss_loss_bitmap;
> >
> >     /* bitmap of stations (by HLID) which exceeded max tx retries */
> > -   __le32 tx_retry_exceeded_bitmap;
> > +   __le16 tx_retry_exceeded_bitmap;
> > +
> > +   /* time sync high msb*/
> > +   u16 time_sync_tsf_high_msb;
> 
> So first of all, just using u16 instead of __le16 seems wrong.

Agree, should be changed.

> 
> Additionally, this looks like it changes the firmware API, so that 
> older firmware images will no longer work?

It is backwards compatible, 
although it changes a API structure, older firmware are using only u16 for the 
field so there is no impact on that.
Of course that for actually using the 64bit information, you will have to 
upgrade the firmware.

Yaniv



Re: [patch net-next v5 0/4] return offloaded stats as default and expose original sw stats

2016-06-23 Thread Jiri Pirko
Thu, Jun 23, 2016 at 01:27:35PM CEST, ro...@cumulusnetworks.com wrote:
>On 6/22/16, 10:40 PM, Jiri Pirko wrote:
>> Wed, Jun 22, 2016 at 09:32:25PM CEST, ro...@cumulusnetworks.com wrote:
>>> On Tue, Jun 21, 2016 at 8:15 AM, Jiri Pirko  wrote:
 From: Jiri Pirko 

 The problem we try to handle is about offloaded forwarded packets
 which are not seen by kernel. Let me try to draw it:

 port1   port2 (HW stats are counted here)
   \  /
\/
 \  /
  --(A) ASIC --(B)--
 |
(C)
 |
CPU (SW stats are counted here)


 Now we have couple of flows for TX and RX (direction does not matter here):

 1) port1->A->ASIC->C->CPU

For this flow, HW and SW stats are equal.

 2) port1->A->ASIC->C->CPU->C->ASIC->B->port2

For this flow, HW and SW stats are equal.

 3) port1->A->ASIC->B->port2

For this flow, SW stats are 0.

 The purpose of this patchset is to provide facility for user to
 find out the difference between flows 1+2 and 3. In other words, user
 will be able to see the statistics for the slow-path (through kernel).

 Also note that HW stats are what someone calls "accumulated" stats.
 Every packet counted by SW is also counted by HW. Not the other way around.

 As a default the accumulated stats (HW) will be exposed to user
 so the userspace apps can react properly.


>>> curious, how do you plan to handle virtual device counters like vlan
>>> and vxlan stats ?.
>> Yes, that is another problem (1). We have to push stats up to this devices
>> most probably. But that problem is orthogonal to this. To the user, you
>> will still need 2 sets of stats and HW stats being default. So this
>> patchset infra is going to be used as well.
>hmm...But, i don't think we should start adding different tlv's hw and sw for
>every stats variant we add.
>>
>>
>>> we can't separate CPU and HW stats there. In some cases (or ASICs) HW
>>> counters do
>>> not include CPU generated packetsyou will have to add CPU
>>> generated pkt counters to the
>>> hw counters for such virtual device stats.
>> Can you please provide and example how that could happen?
>
>example is the bridge vlan stats I mention below. These are usually counted
>by attaching hw virtual counter resources. And CPU generated packets
>in some cases maybe setup to bypass the ASIC pipeline because the CPU
>has already made the required decisions. So, they may not be counted by
>by such hw virtual counters.

Bypass ASIC? How do the packets get on the wire?


>
>>
>>
>>> example: In the switchdev model, for bridge vlan stats, when user
>>> queries bridge vlan stats,
>>> you will have to add the hw stats to the bridge driver vlan stats and
>>> return it to the user .
>> Yep, that is (1).
>
>unless i misunderstood, this does not look like (1). In (1) you say hw stats
> already reflect sw stats. But in this case, hw counter does not include sw 
> stats
>for CPU generated packets.
>>
>>
>>> Having a consistent model for all kinds of stats will help.
>


Re: [PATCH] wlcore: time sync : add support for 64 bit clock

2016-06-23 Thread Johannes Berg

> > Additionally, this looks like it changes the firmware API, so that 
> > older firmware images will no longer work?
> 
> It is backwards compatible, 
> although it changes a API structure, older firmware are using only
> u16 for the field so there is no impact on that.
> 

Oh, ok. I had also thought that the size changed, but missed that you
replaced a u32 with two u16. Thanks for checking :)

johannes


[PATCH] net: Fix resetting network_header in neigh_resolve_output and neigh_connected_output

2016-06-23 Thread Abdelrhman Ahmed
neigh_resolve_output and neigh_connected_output resets the skb to 
network_header because of the retry loop and this reset will pull down the data 
pointer to the network header in the first iteration then hardware header will 
be added, but it will overwrite any data which is inserted between network 
header and hardware header (for example by netfilter hooks) only for the first 
packet(s) before using cached hardware header as neigh_hh_output (which is 
called for using cached hardware header) does not reset to the network header.

The fix is to reset with reference to skb's data before loop instead of network 
header.

Fixes: e1f165032c8b ("net: Fix skb_under_panic oops in neigh_resolve_output")
Signed-off-by: Abdelrhman Ahmed 
---
 net/core/neighbour.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 29dd8cc..7aac242 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1293,15 +1293,19 @@ int neigh_resolve_output(struct neighbour *neigh, 
struct sk_buff *skb)
int rc = 0;
 
if (!neigh_event_send(neigh, skb)) {
-   int err;
+   int err, offset;
struct net_device *dev = neigh->dev;
+   unsigned char *data;
unsigned int seq;
 
if (dev->header_ops->cache && !neigh->hh.hh_len)
neigh_hh_init(neigh);
 
+   data = skb->data;
+
do {
-   __skb_pull(skb, skb_network_offset(skb));
+   offset = data - skb->data;
+   __skb_pull(skb, offset);
seq = read_seqbegin(&neigh->ha_lock);
err = dev_hard_header(skb, dev, ntohs(skb->protocol),
  neigh->ha, NULL, skb->len);
@@ -1326,11 +1330,15 @@ EXPORT_SYMBOL(neigh_resolve_output);
 int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb)
 {
struct net_device *dev = neigh->dev;
+   unsigned char *data;
unsigned int seq;
-   int err;
+   int err, offset;
+
+   data = skb->data;
 
do {
-   __skb_pull(skb, skb_network_offset(skb));
+   offset = data - skb->data;
+   __skb_pull(skb, offset);
seq = read_seqbegin(&neigh->ha_lock);
err = dev_hard_header(skb, dev, ntohs(skb->protocol),
  neigh->ha, NULL, skb->len);
-- 
1.9.1



Re: [REGRESSION, bisect]cxgb4 port failure with TSO traffic after commit 10d3be569243def8("tcp-tso: do not split TSO packets at retransmit time")

2016-06-23 Thread Eric Dumazet
On Thu, Jun 23, 2016 at 3:08 AM, Arjun V.  wrote:
> Hi all,
>
> The following patch introduced a regression in Chelsio cxgb4 driver, causing 
> port failure when running heavy TSO traffic:
>
> commit 10d3be569243def8d92ac3722395ef5a59c504e6
> Author: Eric Dumazet 
> Date:   Thu Apr 21 10:55:23 2016 -0700
>
> tcp-tso: do not split TSO packets at retransmit time
>
> Linux TCP stack painfully segments all TSO/GSO packets before retransmits.
>
> This was fine back in the days when TSO/GSO were emerging, with their
> bugs, but we believe the dark age is over.
>
> Keeping big packets in write queues, but also in stack traversal
> has a lot of benefits.
>  - Less memory overhead, because write queues have less skbs
>  - Less cpu overhead at ACK processing.
>  - Better SACK processing, as lot of studies mentioned how
>awful linux was at this ;)
>  - Less cpu overhead to send the rtx packets
>(IP stack traversal, netfilter traversal, drivers...)
>  - Better latencies in presence of losses.
>  - Smaller spikes in fq like packet schedulers, as retransmits
>are not constrained by TCP Small Queues.
>
> 1 % packet losses are common today, and at 100Gbit speeds, this
> translates to ~80,000 losses per second.
> Losses are often correlated, and we see many retransmit events
> leading to 1-MSS train of packets, at the time hosts are already
> under stress.
>
> Signed-off-by: Eric Dumazet 
> Acked-by: Yuchung Cheng 
> Signed-off-by: David S. Miller da...@davemloft.net
>
> When the number of TCP retransmissions are quite high, the packet length 
> coming from stack does not seems to be proper, due to which our TSO module 
> gets stuck.
> If I change segs back to 1 in __tcp_retransmit_skb(),  traffic is running 
> fine. Please let us know if we are missing something.
>
> Thanks,
> Arjun.
>

Hmm... I see nothing wrong in TCP stack.

Can you give me more details on the wrong packet length you see ?


Re: [RFC v2 3/3] vsockmon: Add vsock hooks

2016-06-23 Thread Sergei Shtylyov

Hello.

On 6/22/2016 7:11 PM, ggar...@abra.uab.cat wrote:


From: Gerard Garcia 

Signed-off-by: Gerard Garcia 
---
 drivers/vhost/vsock.c | 73 +++
 1 file changed, 73 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 17bfe4e..e8621cc 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c

[...]

@@ -45,6 +47,69 @@ struct vhost_vsock {
u32 guest_cid;
 };

+static struct sk_buff *
+virtio_vsock_pkt_to_skb(struct virtio_vsock_pkt *pkt)
+{
+   struct sk_buff *skb;
+   struct af_vsockmon_hdr *hdr;
+   void *payload;
+
+   u32 skb_len = sizeof(struct af_vsockmon_hdr) + pkt->len;
+
+   skb = alloc_skb(skb_len, GFP_ATOMIC);
+   if (!skb)
+   return NULL;
+
+   skb_reserve(skb, sizeof(struct af_vsockmon_hdr));
+
+   if (pkt->len) {
+   payload = skb_put(skb, pkt->len);
+   memcpy(payload, pkt->buf, pkt->len);
+   }
+
+   hdr = (struct af_vsockmon_hdr *) skb_push(skb, sizeof(*hdr));
+
+   hdr->src_cid = pkt->hdr.src_cid;
+   hdr->src_port = pkt->hdr.src_port;
+   hdr->dst_cid = pkt->hdr.dst_cid;
+   hdr->dst_port = pkt->hdr.dst_port;
+   hdr->t = AF_VSOCK_T_VIRTIO;
+
+   switch(pkt->hdr.op) {
+   case VIRTIO_VSOCK_OP_REQUEST:
+   case VIRTIO_VSOCK_OP_RESPONSE:
+   hdr->op = AF_VSOCK_OP_CONNECT;
+   break;
+   case VIRTIO_VSOCK_OP_RST:
+   case VIRTIO_VSOCK_OP_SHUTDOWN:
+   hdr->op = AF_VSOCK_OP_DISCONNECT;
+   break;
+   case VIRTIO_VSOCK_OP_RW:
+   hdr->op = AF_VSOCK_OP_PAYLOAD;
+   break;
+   case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
+   case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
+   hdr->op = AF_VSOCK_OP_CONTROL;
+   break;
+   default:
+   hdr->op = AF_VSOCK_OP_UNKNOWN;
+   break;
+   }


   CodingStyle: *switch* and *case* should be at the same indentation level.

[...]

MBR, Sergei



[PATCH] net: ethernet: ti: cpdma: switch to use genalloc

2016-06-23 Thread Grygorii Strashko
TI CPDMA currently uses a bitmap for tracking descriptors alloactions
allocations, but The genalloc already handles the same and can be used
as with special memory (SRAM) as with DMA cherent memory chank
(dma_alloc_coherent()). Hence, switch to using genalloc and add
desc_num property for each channel for limitation of max number of
allowed descriptors for each CPDMA channel. This patch do not affect
on net throuput.

Cc: Ivan Khoronzhuk 
Signed-off-by: Grygorii Strashko 
---
Testing
TCP window: 256K, bandwidth in Mbits/sec:
 host: iperf -s
 device: iperf -c  172.22.39.17 -t600 -i5 -d -w128K

AM437x-idk, 1Gbps link
 before: : 341.60, after: 232+123=355
am57xx-beagle-x15, 1Gbps link
 before: : 1112.80, after: 814+321=1135
am335x-boneblack, 100Mbps link
 before: : 162.40, after: 72+93=165

 drivers/net/ethernet/ti/davinci_cpdma.c | 136 +++-
 1 file changed, 62 insertions(+), 74 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
b/drivers/net/ethernet/ti/davinci_cpdma.c
index 18bf3a8..03b9882 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -21,7 +21,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include "davinci_cpdma.h"
 
 /* DMA Registers */
@@ -87,9 +87,8 @@ struct cpdma_desc_pool {
void*cpumap;/* dma_alloc map */
int desc_size, mem_size;
int num_desc, used_desc;
-   unsigned long   *bitmap;
struct device   *dev;
-   spinlock_t  lock;
+   struct gen_pool *gen_pool;
 };
 
 enum cpdma_state {
@@ -117,6 +116,7 @@ struct cpdma_chan {
int chan_num;
spinlock_t  lock;
int count;
+   u32 desc_num;
u32 mask;
cpdma_handler_fnhandler;
enum dma_data_direction dir;
@@ -145,6 +145,20 @@ struct cpdma_chan {
 (directed << CPDMA_TO_PORT_SHIFT));\
} while (0)
 
+static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
+{
+   if (!pool)
+   return;
+
+   WARN_ON(pool->used_desc);
+   if (pool->cpumap) {
+   dma_free_coherent(pool->dev, pool->mem_size, pool->cpumap,
+ pool->phys);
+   } else {
+   iounmap(pool->iomap);
+   }
+}
+
 /*
  * Utility constructs for a cpdma descriptor pool.  Some devices (e.g. davinci
  * emac) have dedicated on-chip memory for these descriptors.  Some other
@@ -155,24 +169,25 @@ static struct cpdma_desc_pool *
 cpdma_desc_pool_create(struct device *dev, u32 phys, dma_addr_t hw_addr,
int size, int align)
 {
-   int bitmap_size;
struct cpdma_desc_pool *pool;
+   int ret;
 
pool = devm_kzalloc(dev, sizeof(*pool), GFP_KERNEL);
if (!pool)
-   goto fail;
-
-   spin_lock_init(&pool->lock);
+   goto gen_pool_create_fail;
 
pool->dev   = dev;
pool->mem_size  = size;
pool->desc_size = ALIGN(sizeof(struct cpdma_desc), align);
pool->num_desc  = size / pool->desc_size;
 
-   bitmap_size  = (pool->num_desc / BITS_PER_LONG) * sizeof(long);
-   pool->bitmap = devm_kzalloc(dev, bitmap_size, GFP_KERNEL);
-   if (!pool->bitmap)
-   goto fail;
+   pool->gen_pool = devm_gen_pool_create(dev, ilog2(pool->desc_size), -1,
+ "cpdma");
+   if (IS_ERR(pool->gen_pool)) {
+   dev_err(dev, "pool create failed %ld\n",
+   PTR_ERR(pool->gen_pool));
+   goto gen_pool_create_fail;
+   }
 
if (phys) {
pool->phys  = phys;
@@ -185,24 +200,22 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, 
dma_addr_t hw_addr,
pool->phys = pool->hw_addr; /* assumes no IOMMU, don't use this 
value */
}
 
-   if (pool->iomap)
-   return pool;
-fail:
-   return NULL;
-}
-
-static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
-{
-   if (!pool)
-   return;
+   if (!pool->iomap)
+   goto gen_pool_create_fail;
 
-   WARN_ON(pool->used_desc);
-   if (pool->cpumap) {
-   dma_free_coherent(pool->dev, pool->mem_size, pool->cpumap,
- pool->phys);
-   } else {
-   iounmap(pool->iomap);
+   ret = gen_pool_add_virt(pool->gen_pool, (unsigned long)pool->iomap,
+   pool->phys, pool->mem_size, -1);
+   if (ret < 0) {
+   dev_err(dev, "pool add failed %d\n", ret);
+   goto gen_pool_add_virt_fail;
}
+
+   return pool;
+
+gen_pool_add_virt_fail:
+   cpdma_desc_pool_destroy(poo

Re: [PATCH 2/3] can: fix oops caused by wrong rtnl dellink usage

2016-06-23 Thread Sergei Shtylyov

Hello.

On 6/23/2016 12:22 PM, Marc Kleine-Budde wrote:


From: Oliver Hartkopp 

For 'real' hardware CAN devices the netlink interface is used to set CAN
specific communication parameters. Real CAN hardware can not be created nor
removed with the ip tool ...

This patch adds a private dellink function for the CAN device driver interface
that does just nothing.

It's a follow up to commit 993e6f2fd ("can: fix oops caused by wrong rtnl
newlink usage") but for dellink.

Reported-by: ajneu 
Signed-off-by: Oliver Hartkopp 
Cc: 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/dev.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 348dd5001fa4..ad535a854e5c 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -1011,6 +1011,11 @@ static int can_newlink(struct net *src_net, struct 
net_device *dev,
return -EOPNOTSUPP;
 }

+static void can_dellink(struct net_device *dev, struct list_head *head)
+{
+   return;


   Why?


+}
+
 static struct rtnl_link_ops can_link_ops __read_mostly = {
.kind   = "can",
.maxtype= IFLA_CAN_MAX,

[...]

MBR, Sergei



Re: [PATCH] net: ethernet: ti: cpdma: switch to use genalloc

2016-06-23 Thread Ivan Khoronzhuk



On 23.06.16 15:36, Grygorii Strashko wrote:

TI CPDMA currently uses a bitmap for tracking descriptors alloactions
allocations, but The genalloc already handles the same and can be used
as with special memory (SRAM) as with DMA cherent memory chank
(dma_alloc_coherent()). Hence, switch to using genalloc and add
desc_num property for each channel for limitation of max number of
allowed descriptors for each CPDMA channel. This patch do not affect
on net throuput.

Cc: Ivan Khoronzhuk 
Signed-off-by: Grygorii Strashko 
---
Testing
TCP window: 256K, bandwidth in Mbits/sec:
  host: iperf -s
  device: iperf -c  172.22.39.17 -t600 -i5 -d -w128K

AM437x-idk, 1Gbps link
  before: : 341.60, after: 232+123=355
am57xx-beagle-x15, 1Gbps link
  before: : 1112.80, after: 814+321=1135
am335x-boneblack, 100Mbps link
  before: : 162.40, after: 72+93=165

  drivers/net/ethernet/ti/davinci_cpdma.c | 136 +++-
  1 file changed, 62 insertions(+), 74 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
b/drivers/net/ethernet/ti/davinci_cpdma.c
index 18bf3a8..03b9882 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -21,7 +21,7 @@
  #include 
  #include 
  #include 
-
+#include 
  #include "davinci_cpdma.h"

  /* DMA Registers */
@@ -87,9 +87,8 @@ struct cpdma_desc_pool {
void*cpumap;/* dma_alloc map */
int desc_size, mem_size;
int num_desc, used_desc;
-   unsigned long   *bitmap;
struct device   *dev;
-   spinlock_t  lock;
+   struct gen_pool *gen_pool;
  };

  enum cpdma_state {
@@ -117,6 +116,7 @@ struct cpdma_chan {
int chan_num;
spinlock_t  lock;
int count;
+   u32 desc_num;
u32 mask;
cpdma_handler_fnhandler;
enum dma_data_direction dir;
@@ -145,6 +145,20 @@ struct cpdma_chan {
 (directed << CPDMA_TO_PORT_SHIFT));  \
} while (0)

+static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
+{
+   if (!pool)
+   return;
+
+   WARN_ON(pool->used_desc);
+   if (pool->cpumap) {
+   dma_free_coherent(pool->dev, pool->mem_size, pool->cpumap,
+ pool->phys);
+   } else {
+   iounmap(pool->iomap);
+   }
+}
+

single if, brackets?


  /*
   * Utility constructs for a cpdma descriptor pool.  Some devices (e.g. davinci
   * emac) have dedicated on-chip memory for these descriptors.  Some other
@@ -155,24 +169,25 @@ static struct cpdma_desc_pool *
  cpdma_desc_pool_create(struct device *dev, u32 phys, dma_addr_t hw_addr,
int size, int align)
  {
-   int bitmap_size;
struct cpdma_desc_pool *pool;
+   int ret;

pool = devm_kzalloc(dev, sizeof(*pool), GFP_KERNEL);
if (!pool)
-   goto fail;
-
-   spin_lock_init(&pool->lock);
+   goto gen_pool_create_fail;

pool->dev= dev;
pool->mem_size   = size;
pool->desc_size  = ALIGN(sizeof(struct cpdma_desc), align);
pool->num_desc   = size / pool->desc_size;

-   bitmap_size  = (pool->num_desc / BITS_PER_LONG) * sizeof(long);
-   pool->bitmap = devm_kzalloc(dev, bitmap_size, GFP_KERNEL);
-   if (!pool->bitmap)
-   goto fail;
+   pool->gen_pool = devm_gen_pool_create(dev, ilog2(pool->desc_size), -1,
+ "cpdma");
+   if (IS_ERR(pool->gen_pool)) {
+   dev_err(dev, "pool create failed %ld\n",
+   PTR_ERR(pool->gen_pool));
+   goto gen_pool_create_fail;
+   }

if (phys) {
pool->phys  = phys;
@@ -185,24 +200,22 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, 
dma_addr_t hw_addr,
pool->phys = pool->hw_addr; /* assumes no IOMMU, don't use this 
value */
}

-   if (pool->iomap)
-   return pool;
-fail:
-   return NULL;
-}
-
-static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
-{
-   if (!pool)
-   return;
+   if (!pool->iomap)
+   goto gen_pool_create_fail;

-   WARN_ON(pool->used_desc);
-   if (pool->cpumap) {
-   dma_free_coherent(pool->dev, pool->mem_size, pool->cpumap,
- pool->phys);
-   } else {
-   iounmap(pool->iomap);
+   ret = gen_pool_add_virt(pool->gen_pool, (unsigned long)pool->iomap,
+   pool->phys, pool->mem_size, -1);
+   if (ret < 0) {
+   dev_err(dev, "pool add failed %d\n", ret);
+   goto gen_pool_add_virt_fail;

Re: [PATCH 2/3] can: fix oops caused by wrong rtnl dellink usage

2016-06-23 Thread Oliver Hartkopp



On 06/23/2016 02:55 PM, Sergei Shtylyov wrote:

Hello.

On 6/23/2016 12:22 PM, Marc Kleine-Budde wrote:


From: Oliver Hartkopp 

For 'real' hardware CAN devices the netlink interface is used to set CAN
specific communication parameters. Real CAN hardware can not be
created nor
removed with the ip tool ...

This patch adds a private dellink function for the CAN device driver
interface
that does just nothing.

It's a follow up to commit 993e6f2fd ("can: fix oops caused by wrong rtnl
newlink usage") but for dellink.

Reported-by: ajneu 
Signed-off-by: Oliver Hartkopp 
Cc: 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/dev.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 348dd5001fa4..ad535a854e5c 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -1011,6 +1011,11 @@ static int can_newlink(struct net *src_net,
struct net_device *dev,
 return -EOPNOTSUPP;
 }

+static void can_dellink(struct net_device *dev, struct list_head *head)
+{
+return;


   Why?



http://marc.info/?l=linux-can&m=146651600421205&w=2

The same reason as for commit 993e6f2fd.

Regards,
Oliver


+}
+
 static struct rtnl_link_ops can_link_ops __read_mostly = {
 .kind= "can",
 .maxtype= IFLA_CAN_MAX,

[...]

MBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe linux-can" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] can: fix oops caused by wrong rtnl dellink usage

2016-06-23 Thread Sergei Shtylyov

On 6/23/2016 4:01 PM, Oliver Hartkopp wrote:


From: Oliver Hartkopp 

For 'real' hardware CAN devices the netlink interface is used to set CAN
specific communication parameters. Real CAN hardware can not be
created nor
removed with the ip tool ...

This patch adds a private dellink function for the CAN device driver
interface
that does just nothing.

It's a follow up to commit 993e6f2fd ("can: fix oops caused by wrong rtnl
newlink usage") but for dellink.

Reported-by: ajneu 
Signed-off-by: Oliver Hartkopp 
Cc: 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/dev.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 348dd5001fa4..ad535a854e5c 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -1011,6 +1011,11 @@ static int can_newlink(struct net *src_net,
struct net_device *dev,
 return -EOPNOTSUPP;
 }

+static void can_dellink(struct net_device *dev, struct list_head *head)
+{
+return;


   Why?



http://marc.info/?l=linux-can&m=146651600421205&w=2

The same reason as for commit 993e6f2fd.


   I was asking just about the useless *return* statement...


Regards,
Oliver


MBR, Sergei



[PATCH net] ipv6: allows gracefull fallback from table lookup

2016-06-23 Thread Paolo Abeni
with the commit 8c14586fc320 ("net: ipv6: Use passed in table for
nexthop lookups"), net hop lookup is first performed on route creation
in the passed-in table.
However device match is not enforced in table lookup, so the found
route can be later discarded due to egress device mismatch and no
global lookup will be performed.
This cause the following to fail:

ip link add dummy1 type dummy
ip link add dummy2 type dummy
ip link set dummy1 up
ip link set dummy2 up
ip route add 2001:db8:8086::/48 dev dummy1 metric 20
ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20
ip route add 2001:db8:8086::/48 dev dummy2 metric 21
ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21
RTNETLINK answers: No route to host

This change fixes the issue enforcing device lookup in
ip6_nh_lookup_table()

Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups")
Reported-and-tested-by: Beniamino Galvani 
Signed-off-by: Paolo Abeni 
---
 net/ipv6/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 969913d..520b788 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1782,7 +1782,7 @@ static struct rt6_info *ip6_nh_lookup_table(struct net 
*net,
};
struct fib6_table *table;
struct rt6_info *rt;
-   int flags = 0;
+   int flags = RT6_LOOKUP_F_IFACE;
 
table = fib6_get_table(net, cfg->fc_table);
if (!table)
-- 
1.8.3.1



Re: [PATCH net] ipv6: allows gracefull fallback from table lookup

2016-06-23 Thread Paolo Abeni
On Thu, 2016-06-23 at 15:11 +0200, Paolo Abeni wrote:
> with the commit 8c14586fc320 ("net: ipv6: Use passed in table for
> nexthop lookups"), net hop lookup is first performed on route creation
> in the passed-in table.
> However device match is not enforced in table lookup, so the found
> route can be later discarded due to egress device mismatch and no
> global lookup will be performed.
> This cause the following to fail:
> 
> ip link add dummy1 type dummy
> ip link add dummy2 type dummy
> ip link set dummy1 up
> ip link set dummy2 up
> ip route add 2001:db8:8086::/48 dev dummy1 metric 20
> ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20
> ip route add 2001:db8:8086::/48 dev dummy2 metric 21
> ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21
> RTNETLINK answers: No route to host
> 
> This change fixes the issue enforcing device lookup in
> ip6_nh_lookup_table()
> 
> Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups")
> Reported-and-tested-by: Beniamino Galvani 
> Signed-off-by: Paolo Abeni 

Oops, bad commit message title (not updated from a previous
implementation), I'll resubmit with a more relevant one. Sorry for the
noise.

Paolo



Re: [PATCH] net: ethernet: ti: cpdma: switch to use genalloc

2016-06-23 Thread ivan.khoronzhuk



On 23.06.16 15:36, Grygorii Strashko wrote:

TI CPDMA currently uses a bitmap for tracking descriptors alloactions
allocations, but The genalloc already handles the same and can be used
as with special memory (SRAM) as with DMA cherent memory chank
(dma_alloc_coherent()). Hence, switch to using genalloc and add
desc_num property for each channel for limitation of max number of
allowed descriptors for each CPDMA channel. This patch do not affect
on net throuput.

Cc: Ivan Khoronzhuk 
Signed-off-by: Grygorii Strashko 


Tested-by: Ivan Khoronzhuk 


---
Testing
TCP window: 256K, bandwidth in Mbits/sec:
  host: iperf -s
  device: iperf -c  172.22.39.17 -t600 -i5 -d -w128K

AM437x-idk, 1Gbps link
  before: : 341.60, after: 232+123=355
am57xx-beagle-x15, 1Gbps link
  before: : 1112.80, after: 814+321=1135
am335x-boneblack, 100Mbps link
  before: : 162.40, after: 72+93=165

  drivers/net/ethernet/ti/davinci_cpdma.c | 136 +++-
  1 file changed, 62 insertions(+), 74 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
b/drivers/net/ethernet/ti/davinci_cpdma.c
index 18bf3a8..03b9882 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -21,7 +21,7 @@
  #include 
  #include 
  #include 
-
+#include 
  #include "davinci_cpdma.h"

  /* DMA Registers */
@@ -87,9 +87,8 @@ struct cpdma_desc_pool {
void*cpumap;/* dma_alloc map */
int desc_size, mem_size;
int num_desc, used_desc;
-   unsigned long   *bitmap;
struct device   *dev;
-   spinlock_t  lock;
+   struct gen_pool *gen_pool;
  };

  enum cpdma_state {
@@ -117,6 +116,7 @@ struct cpdma_chan {
int chan_num;
spinlock_t  lock;
int count;
+   u32 desc_num;
u32 mask;
cpdma_handler_fnhandler;
enum dma_data_direction dir;
@@ -145,6 +145,20 @@ struct cpdma_chan {
 (directed << CPDMA_TO_PORT_SHIFT));  \
} while (0)

+static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
+{
+   if (!pool)
+   return;
+
+   WARN_ON(pool->used_desc);
+   if (pool->cpumap) {
+   dma_free_coherent(pool->dev, pool->mem_size, pool->cpumap,
+ pool->phys);
+   } else {
+   iounmap(pool->iomap);
+   }
+}
+
  /*
   * Utility constructs for a cpdma descriptor pool.  Some devices (e.g. davinci
   * emac) have dedicated on-chip memory for these descriptors.  Some other
@@ -155,24 +169,25 @@ static struct cpdma_desc_pool *
  cpdma_desc_pool_create(struct device *dev, u32 phys, dma_addr_t hw_addr,
int size, int align)
  {
-   int bitmap_size;
struct cpdma_desc_pool *pool;
+   int ret;

pool = devm_kzalloc(dev, sizeof(*pool), GFP_KERNEL);
if (!pool)
-   goto fail;
-
-   spin_lock_init(&pool->lock);
+   goto gen_pool_create_fail;

pool->dev= dev;
pool->mem_size   = size;
pool->desc_size  = ALIGN(sizeof(struct cpdma_desc), align);
pool->num_desc   = size / pool->desc_size;

-   bitmap_size  = (pool->num_desc / BITS_PER_LONG) * sizeof(long);
-   pool->bitmap = devm_kzalloc(dev, bitmap_size, GFP_KERNEL);
-   if (!pool->bitmap)
-   goto fail;
+   pool->gen_pool = devm_gen_pool_create(dev, ilog2(pool->desc_size), -1,
+ "cpdma");
+   if (IS_ERR(pool->gen_pool)) {
+   dev_err(dev, "pool create failed %ld\n",
+   PTR_ERR(pool->gen_pool));
+   goto gen_pool_create_fail;
+   }

if (phys) {
pool->phys  = phys;
@@ -185,24 +200,22 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, 
dma_addr_t hw_addr,
pool->phys = pool->hw_addr; /* assumes no IOMMU, don't use this 
value */
}

-   if (pool->iomap)
-   return pool;
-fail:
-   return NULL;
-}
-
-static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
-{
-   if (!pool)
-   return;
+   if (!pool->iomap)
+   goto gen_pool_create_fail;

-   WARN_ON(pool->used_desc);
-   if (pool->cpumap) {
-   dma_free_coherent(pool->dev, pool->mem_size, pool->cpumap,
- pool->phys);
-   } else {
-   iounmap(pool->iomap);
+   ret = gen_pool_add_virt(pool->gen_pool, (unsigned long)pool->iomap,
+   pool->phys, pool->mem_size, -1);
+   if (ret < 0) {
+   dev_err(dev, "pool add failed %d\n", ret);
+   goto gen_pool_add_vi

[PATCH net v2] ipv6: enforce egress device match in per table nexthop lookups

2016-06-23 Thread Paolo Abeni
with the commit 8c14586fc320 ("net: ipv6: Use passed in table for
nexthop lookups"), net hop lookup is first performed on route creation
in the passed-in table.
However device match is not enforced in table lookup, so the found
route can be later discarded due to egress device mismatch and no
global lookup will be performed.
This cause the following to fail:

ip link add dummy1 type dummy
ip link add dummy2 type dummy
ip link set dummy1 up
ip link set dummy2 up
ip route add 2001:db8:8086::/48 dev dummy1 metric 20
ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20
ip route add 2001:db8:8086::/48 dev dummy2 metric 21
ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21
RTNETLINK answers: No route to host

This change fixes the issue enforcing device lookup in
ip6_nh_lookup_table()

v1->v2: updated commit message title

Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups")
Reported-and-tested-by: Beniamino Galvani 
Signed-off-by: Paolo Abeni 
---
 net/ipv6/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 969913d..520b788 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1782,7 +1782,7 @@ static struct rt6_info *ip6_nh_lookup_table(struct net 
*net,
};
struct fib6_table *table;
struct rt6_info *rt;
-   int flags = 0;
+   int flags = RT6_LOOKUP_F_IFACE;
 
table = fib6_get_table(net, cfg->fc_table);
if (!table)
-- 
1.8.3.1



Re: [alsa-devel] [very-RFC 0/8] TSN driver for the kernel

2016-06-23 Thread Richard Cochran
On Thu, Jun 23, 2016 at 12:38:48PM +0200, Henrik Austad wrote:
> Richard: is it fair to assume that if ptp4l is running and is part of a PTP 
> domain, ktime_get() will return PTP-adjusted time for the system?

No.

> Or do I also need to run phc2sys in order to sync the system-time
> to PTP-time?

Yes, unless you are using SW time stamping, in which case ptp4l will
steer the system clock directly.

HTH,
Richard





[PATCH net-next 3/5] phy: separate swphy state validation from register generation

2016-06-23 Thread Russell King
Separate out the generation of MII registers from the state validation.
This allows us to simplify the error handing in fixed_phy() by allowing
earlier error detection.

Reviewed-by: Florian Fainelli 
Signed-off-by: Russell King 
---
 drivers/net/phy/fixed_phy.c | 15 +++
 drivers/net/phy/swphy.c | 33 ++---
 drivers/net/phy/swphy.h |  3 ++-
 3 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index d98a0d90b5a5..d84e30c46824 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -48,12 +48,12 @@ static struct fixed_mdio_bus platform_fmb = {
.phys = LIST_HEAD_INIT(platform_fmb.phys),
 };
 
-static int fixed_phy_update_regs(struct fixed_phy *fp)
+static void fixed_phy_update_regs(struct fixed_phy *fp)
 {
if (gpio_is_valid(fp->link_gpio))
fp->status.link = !!gpio_get_value_cansleep(fp->link_gpio);
 
-   return swphy_update_regs(fp->regs, &fp->status);
+   swphy_update_regs(fp->regs, &fp->status);
 }
 
 static int fixed_mdio_read(struct mii_bus *bus, int phy_addr, int reg_num)
@@ -160,6 +160,10 @@ int fixed_phy_add(unsigned int irq, int phy_addr,
struct fixed_mdio_bus *fmb = &platform_fmb;
struct fixed_phy *fp;
 
+   ret = swphy_validate_state(status);
+   if (ret < 0)
+   return ret;
+
fp = kzalloc(sizeof(*fp), GFP_KERNEL);
if (!fp)
return -ENOMEM;
@@ -180,17 +184,12 @@ int fixed_phy_add(unsigned int irq, int phy_addr,
goto err_regs;
}
 
-   ret = fixed_phy_update_regs(fp);
-   if (ret)
-   goto err_gpio;
+   fixed_phy_update_regs(fp);
 
list_add_tail(&fp->node, &fmb->phys);
 
return 0;
 
-err_gpio:
-   if (gpio_is_valid(fp->link_gpio))
-   gpio_free(fp->link_gpio);
 err_regs:
kfree(fp);
return ret;
diff --git a/drivers/net/phy/swphy.c b/drivers/net/phy/swphy.c
index c88a194b4cb6..21a9bd8a7830 100644
--- a/drivers/net/phy/swphy.c
+++ b/drivers/net/phy/swphy.c
@@ -87,6 +87,29 @@ static int swphy_decode_speed(int speed)
 }
 
 /**
+ * swphy_validate_state - validate the software phy status
+ * @state: software phy status
+ *
+ * This checks that we can represent the state stored in @state can be
+ * represented in the emulated MII registers.  Returns 0 if it can,
+ * otherwise returns -EINVAL.
+ */
+int swphy_validate_state(const struct fixed_phy_status *state)
+{
+   int err;
+
+   if (state->link) {
+   err = swphy_decode_speed(state->speed);
+   if (err < 0) {
+   pr_warn("swphy: unknown speed\n");
+   return -EINVAL;
+   }
+   }
+   return 0;
+}
+EXPORT_SYMBOL_GPL(swphy_validate_state);
+
+/**
  * swphy_update_regs - update MII register array with fixed phy state
  * @regs: array of 32 registers to update
  * @state: fixed phy status
@@ -94,7 +117,7 @@ static int swphy_decode_speed(int speed)
  * Update the array of MII registers with the fixed phy link, speed,
  * duplex and pause mode settings.
  */
-int swphy_update_regs(u16 *regs, const struct fixed_phy_status *state)
+void swphy_update_regs(u16 *regs, const struct fixed_phy_status *state)
 {
int speed_index, duplex_index;
u16 bmsr = BMSR_ANEGCAPABLE;
@@ -103,10 +126,8 @@ int swphy_update_regs(u16 *regs, const struct 
fixed_phy_status *state)
u16 lpa = 0;
 
speed_index = swphy_decode_speed(state->speed);
-   if (speed_index < 0) {
-   pr_warn("swphy: unknown speed\n");
-   return -EINVAL;
-   }
+   if (WARN_ON(speed_index < 0))
+   return;
 
duplex_index = state->duplex ? SWMII_DUPLEX_FULL : SWMII_DUPLEX_HALF;
 
@@ -133,7 +154,5 @@ int swphy_update_regs(u16 *regs, const struct 
fixed_phy_status *state)
regs[MII_BMCR] = bmcr;
regs[MII_LPA] = lpa;
regs[MII_STAT1000] = lpagb;
-
-   return 0;
 }
 EXPORT_SYMBOL_GPL(swphy_update_regs);
diff --git a/drivers/net/phy/swphy.h b/drivers/net/phy/swphy.h
index feaa38ff86a2..33d2e061896e 100644
--- a/drivers/net/phy/swphy.h
+++ b/drivers/net/phy/swphy.h
@@ -3,6 +3,7 @@
 
 struct fixed_phy_status;
 
-int swphy_update_regs(u16 *regs, const struct fixed_phy_status *state);
+int swphy_validate_state(const struct fixed_phy_status *state);
+void swphy_update_regs(u16 *regs, const struct fixed_phy_status *state);
 
 #endif
-- 
2.1.0



[PATCH net-next 1/5] phy: move fixed_phy MII register generation to a library

2016-06-23 Thread Russell King
Move the fixed_phy MII register generation to a library to allow other
software phy implementations to use this code.

Reviewed-by: Florian Fainelli 
Signed-off-by: Russell King 
---
 drivers/net/phy/Kconfig |   4 ++
 drivers/net/phy/Makefile|   3 +-
 drivers/net/phy/fixed_phy.c |  95 ++---
 drivers/net/phy/swphy.c | 126 
 drivers/net/phy/swphy.h |   8 +++
 5 files changed, 143 insertions(+), 93 deletions(-)
 create mode 100644 drivers/net/phy/swphy.c
 create mode 100644 drivers/net/phy/swphy.h

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 8dac88abbc39..f96829415ce6 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -12,6 +12,9 @@ menuconfig PHYLIB
 
 if PHYLIB
 
+config SWPHY
+   bool
+
 comment "MII PHY device drivers"
 
 config AQUANTIA_PHY
@@ -159,6 +162,7 @@ config MICROCHIP_PHY
 config FIXED_PHY
tristate "Driver for MDIO Bus/PHY emulation with fixed speed/link PHYs"
depends on PHYLIB
+   select SWPHY
---help---
  Adds the platform "fixed" MDIO Bus to cover the boards that use
  PHYs that are not connected to the real MDIO bus.
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 4170642a2035..7158274327d0 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -1,6 +1,7 @@
 # Makefile for Linux PHY drivers
 
-libphy-objs:= phy.o phy_device.o mdio_bus.o mdio_device.o
+libphy-y   := phy.o phy_device.o mdio_bus.o mdio_device.o
+libphy-$(CONFIG_SWPHY) += swphy.o
 
 obj-$(CONFIG_PHYLIB)   += libphy.o
 obj-$(CONFIG_AQUANTIA_PHY) += aquantia.o
diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index 2d2e4339f0df..d98a0d90b5a5 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -24,6 +24,8 @@
 #include 
 #include 
 
+#include "swphy.h"
+
 #define MII_REGS_NUM 29
 
 struct fixed_mdio_bus {
@@ -48,101 +50,10 @@ static struct fixed_mdio_bus platform_fmb = {
 
 static int fixed_phy_update_regs(struct fixed_phy *fp)
 {
-   u16 bmsr = BMSR_ANEGCAPABLE;
-   u16 bmcr = 0;
-   u16 lpagb = 0;
-   u16 lpa = 0;
-
if (gpio_is_valid(fp->link_gpio))
fp->status.link = !!gpio_get_value_cansleep(fp->link_gpio);
 
-   if (fp->status.duplex) {
-   switch (fp->status.speed) {
-   case 1000:
-   bmsr |= BMSR_ESTATEN;
-   break;
-   case 100:
-   bmsr |= BMSR_100FULL;
-   break;
-   case 10:
-   bmsr |= BMSR_10FULL;
-   break;
-   default:
-   break;
-   }
-   } else {
-   switch (fp->status.speed) {
-   case 1000:
-   bmsr |= BMSR_ESTATEN;
-   break;
-   case 100:
-   bmsr |= BMSR_100HALF;
-   break;
-   case 10:
-   bmsr |= BMSR_10HALF;
-   break;
-   default:
-   break;
-   }
-   }
-
-   if (fp->status.link) {
-   bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
-
-   if (fp->status.duplex) {
-   bmcr |= BMCR_FULLDPLX;
-
-   switch (fp->status.speed) {
-   case 1000:
-   bmcr |= BMCR_SPEED1000;
-   lpagb |= LPA_1000FULL;
-   break;
-   case 100:
-   bmcr |= BMCR_SPEED100;
-   lpa |= LPA_100FULL;
-   break;
-   case 10:
-   lpa |= LPA_10FULL;
-   break;
-   default:
-   pr_warn("fixed phy: unknown speed\n");
-   return -EINVAL;
-   }
-   } else {
-   switch (fp->status.speed) {
-   case 1000:
-   bmcr |= BMCR_SPEED1000;
-   lpagb |= LPA_1000HALF;
-   break;
-   case 100:
-   bmcr |= BMCR_SPEED100;
-   lpa |= LPA_100HALF;
-   break;
-   case 10:
-   lpa |= LPA_10HALF;
-   break;
-   default:
-   pr_warn("fixed phy: unknown speed\n");
-   return -EINVAL;
-   }
-   }
-
-   if (fp->status.pause)
-

[PATCH net-next 0/5] Initial SFP support patches

2016-06-23 Thread Russell King - ARM Linux
Hi David,

Please review and merge this initial patch set, which is part of a
larger set previously posted adding SFP support to phy and mvneta.

This initial set are focused on cleaning up and reorganising the
fixed-phy code to allow the core software-phy code to be re-used.

These are based on net-next.

Thanks.

 drivers/net/phy/Kconfig |   4 +
 drivers/net/phy/Makefile|   3 +-
 drivers/net/phy/fixed_phy.c | 153 +++--
 drivers/net/phy/swphy.c | 179 
 drivers/net/phy/swphy.h |   9 +++
 5 files changed, 222 insertions(+), 126 deletions(-)

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


[PATCH net-next 2/5] phy: convert swphy register generation to tabular form

2016-06-23 Thread Russell King
Convert the swphy register generation to tabular form which allows us
to eliminate multiple switch() statements.  This results in a smaller
object code size, more efficient, and easier to add support for faster
speeds.

Before:

Idx Name  Size  VMA   LMA   File off  Algn
  0 .text 0164      0034  2**2

   textdata bss dec hex filename
388   0   0 388 184 swphy.o

After:

Idx Name  Size  VMA   LMA   File off  Algn
  0 .text 00fc      0034  2**2
  5 .rodata   0028      0138  2**2

   textdata bss dec hex filename
324   0   0 324 144 swphy.o

Reviewed-by: Florian Fainelli 
Signed-off-by: Russell King 
---
 drivers/net/phy/swphy.c | 143 ++--
 1 file changed, 78 insertions(+), 65 deletions(-)

diff --git a/drivers/net/phy/swphy.c b/drivers/net/phy/swphy.c
index 0551a79a2454..c88a194b4cb6 100644
--- a/drivers/net/phy/swphy.c
+++ b/drivers/net/phy/swphy.c
@@ -20,6 +20,72 @@
 
 #include "swphy.h"
 
+struct swmii_regs {
+   u16 bmcr;
+   u16 bmsr;
+   u16 lpa;
+   u16 lpagb;
+};
+
+enum {
+   SWMII_SPEED_10 = 0,
+   SWMII_SPEED_100,
+   SWMII_SPEED_1000,
+   SWMII_DUPLEX_HALF = 0,
+   SWMII_DUPLEX_FULL,
+};
+
+/*
+ * These two tables get bitwise-anded together to produce the final result.
+ * This means the speed table must contain both duplex settings, and the
+ * duplex table must contain all speed settings.
+ */
+static const struct swmii_regs speed[] = {
+   [SWMII_SPEED_10] = {
+   .bmcr  = BMCR_FULLDPLX,
+   .lpa   = LPA_10FULL | LPA_10HALF,
+   },
+   [SWMII_SPEED_100] = {
+   .bmcr  = BMCR_FULLDPLX | BMCR_SPEED100,
+   .bmsr  = BMSR_100FULL | BMSR_100HALF,
+   .lpa   = LPA_100FULL | LPA_100HALF,
+   },
+   [SWMII_SPEED_1000] = {
+   .bmcr  = BMCR_FULLDPLX | BMCR_SPEED1000,
+   .bmsr  = BMSR_ESTATEN,
+   .lpagb = LPA_1000FULL | LPA_1000HALF,
+   },
+};
+
+static const struct swmii_regs duplex[] = {
+   [SWMII_DUPLEX_HALF] = {
+   .bmcr  = ~BMCR_FULLDPLX,
+   .bmsr  = BMSR_ESTATEN | BMSR_100HALF,
+   .lpa   = LPA_10HALF | LPA_100HALF,
+   .lpagb = LPA_1000HALF,
+   },
+   [SWMII_DUPLEX_FULL] = {
+   .bmcr  = ~0,
+   .bmsr  = BMSR_ESTATEN | BMSR_100FULL,
+   .lpa   = LPA_10FULL | LPA_100FULL,
+   .lpagb = LPA_1000FULL,
+   },
+};
+
+static int swphy_decode_speed(int speed)
+{
+   switch (speed) {
+   case 1000:
+   return SWMII_SPEED_1000;
+   case 100:
+   return SWMII_SPEED_100;
+   case 10:
+   return SWMII_SPEED_10;
+   default:
+   return -EINVAL;
+   }
+}
+
 /**
  * swphy_update_regs - update MII register array with fixed phy state
  * @regs: array of 32 registers to update
@@ -30,81 +96,28 @@
  */
 int swphy_update_regs(u16 *regs, const struct fixed_phy_status *state)
 {
+   int speed_index, duplex_index;
u16 bmsr = BMSR_ANEGCAPABLE;
u16 bmcr = 0;
u16 lpagb = 0;
u16 lpa = 0;
 
-   if (state->duplex) {
-   switch (state->speed) {
-   case 1000:
-   bmsr |= BMSR_ESTATEN;
-   break;
-   case 100:
-   bmsr |= BMSR_100FULL;
-   break;
-   case 10:
-   bmsr |= BMSR_10FULL;
-   break;
-   default:
-   break;
-   }
-   } else {
-   switch (state->speed) {
-   case 1000:
-   bmsr |= BMSR_ESTATEN;
-   break;
-   case 100:
-   bmsr |= BMSR_100HALF;
-   break;
-   case 10:
-   bmsr |= BMSR_10HALF;
-   break;
-   default:
-   break;
-   }
+   speed_index = swphy_decode_speed(state->speed);
+   if (speed_index < 0) {
+   pr_warn("swphy: unknown speed\n");
+   return -EINVAL;
}
 
+   duplex_index = state->duplex ? SWMII_DUPLEX_FULL : SWMII_DUPLEX_HALF;
+
+   bmsr |= speed[speed_index].bmsr & duplex[duplex_index].bmsr;
+
if (state->link) {
bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
 
-   if (state->duplex) {
-   bmcr |= BMCR_FULLDPLX;
-
-   switch (state->speed) {
-   case 1000:
-   bmcr |= BMCR_SPEED1000;
-   lpagb |= LPA_1000FULL;
-   break;
-

[PATCH net-next 4/5] phy: generate swphy registers on the fly

2016-06-23 Thread Russell King
Generate software phy registers as and when requested, rather than
duplicating the state in fixed_phy.  This allows us to eliminate
the duplicate storage of of the same data, which is only different
in format.

As fixed_phy_update_regs() no longer updates register state, rename
it to fixed_phy_update().

Reviewed-by: Florian Fainelli 
Signed-off-by: Russell King 
---
 drivers/net/phy/fixed_phy.c | 31 +-
 drivers/net/phy/swphy.c | 47 -
 drivers/net/phy/swphy.h |  2 +-
 3 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index d84e30c46824..0dfed86bdb5a 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -26,8 +26,6 @@
 
 #include "swphy.h"
 
-#define MII_REGS_NUM 29
-
 struct fixed_mdio_bus {
struct mii_bus *mii_bus;
struct list_head phys;
@@ -35,7 +33,6 @@ struct fixed_mdio_bus {
 
 struct fixed_phy {
int addr;
-   u16 regs[MII_REGS_NUM];
struct phy_device *phydev;
struct fixed_phy_status status;
int (*link_update)(struct net_device *, struct fixed_phy_status *);
@@ -48,12 +45,10 @@ static struct fixed_mdio_bus platform_fmb = {
.phys = LIST_HEAD_INIT(platform_fmb.phys),
 };
 
-static void fixed_phy_update_regs(struct fixed_phy *fp)
+static void fixed_phy_update(struct fixed_phy *fp)
 {
if (gpio_is_valid(fp->link_gpio))
fp->status.link = !!gpio_get_value_cansleep(fp->link_gpio);
-
-   swphy_update_regs(fp->regs, &fp->status);
 }
 
 static int fixed_mdio_read(struct mii_bus *bus, int phy_addr, int reg_num)
@@ -61,29 +56,15 @@ static int fixed_mdio_read(struct mii_bus *bus, int 
phy_addr, int reg_num)
struct fixed_mdio_bus *fmb = bus->priv;
struct fixed_phy *fp;
 
-   if (reg_num >= MII_REGS_NUM)
-   return -1;
-
-   /* We do not support emulating Clause 45 over Clause 22 register reads
-* return an error instead of bogus data.
-*/
-   switch (reg_num) {
-   case MII_MMD_CTRL:
-   case MII_MMD_DATA:
-   return -1;
-   default:
-   break;
-   }
-
list_for_each_entry(fp, &fmb->phys, node) {
if (fp->addr == phy_addr) {
/* Issue callback if user registered it. */
if (fp->link_update) {
fp->link_update(fp->phydev->attached_dev,
&fp->status);
-   fixed_phy_update_regs(fp);
+   fixed_phy_update(fp);
}
-   return fp->regs[reg_num];
+   return swphy_read_reg(reg_num, &fp->status);
}
}
 
@@ -143,7 +124,7 @@ int fixed_phy_update_state(struct phy_device *phydev,
_UPD(pause);
_UPD(asym_pause);
 #undef _UPD
-   fixed_phy_update_regs(fp);
+   fixed_phy_update(fp);
return 0;
}
}
@@ -168,8 +149,6 @@ int fixed_phy_add(unsigned int irq, int phy_addr,
if (!fp)
return -ENOMEM;
 
-   memset(fp->regs, 0xFF,  sizeof(fp->regs[0]) * MII_REGS_NUM);
-
if (irq != PHY_POLL)
fmb->mii_bus->irq[phy_addr] = irq;
 
@@ -184,7 +163,7 @@ int fixed_phy_add(unsigned int irq, int phy_addr,
goto err_regs;
}
 
-   fixed_phy_update_regs(fp);
+   fixed_phy_update(fp);
 
list_add_tail(&fp->node, &fmb->phys);
 
diff --git a/drivers/net/phy/swphy.c b/drivers/net/phy/swphy.c
index 21a9bd8a7830..34f58f2349e9 100644
--- a/drivers/net/phy/swphy.c
+++ b/drivers/net/phy/swphy.c
@@ -20,6 +20,8 @@
 
 #include "swphy.h"
 
+#define MII_REGS_NUM 29
+
 struct swmii_regs {
u16 bmcr;
u16 bmsr;
@@ -110,14 +112,13 @@ int swphy_validate_state(const struct fixed_phy_status 
*state)
 EXPORT_SYMBOL_GPL(swphy_validate_state);
 
 /**
- * swphy_update_regs - update MII register array with fixed phy state
- * @regs: array of 32 registers to update
+ * swphy_read_reg - return a MII register from the fixed phy state
+ * @reg: MII register
  * @state: fixed phy status
  *
- * Update the array of MII registers with the fixed phy link, speed,
- * duplex and pause mode settings.
+ * Return the MII @reg register generated from the fixed phy state @state.
  */
-void swphy_update_regs(u16 *regs, const struct fixed_phy_status *state)
+int swphy_read_reg(int reg, const struct fixed_phy_status *state)
 {
int speed_index, duplex_index;
u16 bmsr = BMSR_ANEGCAPABLE;
@@ -125,9 +126,12 @@ void swphy_update_regs(u16 *regs, const struct 
fixed_phy_status *state)
u16 lpagb = 0;
u16 lpa = 0;
 
+   if (reg > MII_REGS_NUM)
+   return -1;
+
speed_index = sw

[PATCH net-next 5/5] phy: improve safety of fixed-phy MII register reading

2016-06-23 Thread Russell King
There is no prevention of a concurrent call to both fixed_mdio_read()
and fixed_phy_update_state(), which can result in the state being
modified while it's being inspected.  Fix this by using a seqcount
to detect modifications, and memcpy()ing the state.

We remain slightly naughty here, calling link_update() and updating
the link status within the read-side loop - which would need rework
of the design to change.

Reviewed-by: Florian Fainelli 
Signed-off-by: Russell King 
---
 drivers/net/phy/fixed_phy.c | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index 0dfed86bdb5a..b376ada83598 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "swphy.h"
 
@@ -34,6 +35,7 @@ struct fixed_mdio_bus {
 struct fixed_phy {
int addr;
struct phy_device *phydev;
+   seqcount_t seqcount;
struct fixed_phy_status status;
int (*link_update)(struct net_device *, struct fixed_phy_status *);
struct list_head node;
@@ -58,13 +60,21 @@ static int fixed_mdio_read(struct mii_bus *bus, int 
phy_addr, int reg_num)
 
list_for_each_entry(fp, &fmb->phys, node) {
if (fp->addr == phy_addr) {
-   /* Issue callback if user registered it. */
-   if (fp->link_update) {
-   fp->link_update(fp->phydev->attached_dev,
-   &fp->status);
-   fixed_phy_update(fp);
-   }
-   return swphy_read_reg(reg_num, &fp->status);
+   struct fixed_phy_status state;
+   int s;
+
+   do {
+   s = read_seqcount_begin(&fp->seqcount);
+   /* Issue callback if user registered it. */
+   if (fp->link_update) {
+   
fp->link_update(fp->phydev->attached_dev,
+   &fp->status);
+   fixed_phy_update(fp);
+   }
+   state = fp->status;
+   } while (read_seqcount_retry(&fp->seqcount, s));
+
+   return swphy_read_reg(reg_num, &state);
}
}
 
@@ -116,6 +126,7 @@ int fixed_phy_update_state(struct phy_device *phydev,
 
list_for_each_entry(fp, &fmb->phys, node) {
if (fp->addr == phydev->mdio.addr) {
+   write_seqcount_begin(&fp->seqcount);
 #define _UPD(x) if (changed->x) \
fp->status.x = status->x
_UPD(link);
@@ -125,6 +136,7 @@ int fixed_phy_update_state(struct phy_device *phydev,
_UPD(asym_pause);
 #undef _UPD
fixed_phy_update(fp);
+   write_seqcount_end(&fp->seqcount);
return 0;
}
}
@@ -149,6 +161,8 @@ int fixed_phy_add(unsigned int irq, int phy_addr,
if (!fp)
return -ENOMEM;
 
+   seqcount_init(&fp->seqcount);
+
if (irq != PHY_POLL)
fmb->mii_bus->irq[phy_addr] = irq;
 
-- 
2.1.0



Re: rstpd implementation

2016-06-23 Thread Phil

On 06/22/2016 08:12 PM, Stephen Hemminger wrote:

On Wed, 22 Jun 2016 12:44:52 -0500
ebied...@xmission.com (Eric W. Biederman) wrote:


Phil  writes:


Hi,

When looking for an RSTP daemon I found Stephen Hemminger's
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/rstp.git

with it's last commit from October 2011.

Is this implementation still in good use by anybody - or has it been
replaced/superseded by another implementation?

I don't know and when you get into user space daemons they aren't much
talked about on the kernel lists.  That said you will likely fair better
on the netdev list (cc'd).

Eric

The current one I recommend is the MSTPd done by Cumulus
  https://sourceforge.net/p/mstpd/wiki/Home/
But like all projects they could use help


Thank's a lot to Eric and Stephen for your answers.
In the meantime I also found https://github.com/ocedo/mstpd

Philipp


[PATCH] dsa: mv88e6xxx: hide unused functions

2016-06-23 Thread Arnd Bergmann
When CONFIG_NET_DSA_HWMON is disabled, we get warnings about two unused
functions whose only callers are all inside of an #ifdef:

drivers/net/dsa/mv88e6xxx.c:3257:12: 'mv88e6xxx_mdio_page_write' defined but 
not used [-Werror=unused-function]
drivers/net/dsa/mv88e6xxx.c:3244:12: 'mv88e6xxx_mdio_page_read' defined but not 
used [-Werror=unused-function]

This adds another ifdef around the function definitions. The warnings
appeared after the functions were marked 'static', but the problem
was already there before that.

Signed-off-by: Arnd Bergmann 
Fixes: 57d3231057e9 ("net: dsa: mv88e6xxx: fix style issues")
---
 drivers/net/dsa/mv88e6xxx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 9b116d8d4e23..2a95f2d6cf09 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3241,6 +3241,7 @@ unlock:
return err;
 }
 
+#ifdef CONFIG_NET_DSA_HWMON
 static int mv88e6xxx_mdio_page_read(struct dsa_switch *ds, int port, int page,
int reg)
 {
@@ -3266,6 +3267,7 @@ static int mv88e6xxx_mdio_page_write(struct dsa_switch 
*ds, int port, int page,
 
return ret;
 }
+#endif
 
 static int mv88e6xxx_port_to_mdio_addr(struct mv88e6xxx_priv_state *ps,
   int port)
-- 
2.9.0



Re: [PATCH net v2] ipv6: enforce egress device match in per table nexthop lookups

2016-06-23 Thread David Ahern

On 6/23/16 7:25 AM, Paolo Abeni wrote:

with the commit 8c14586fc320 ("net: ipv6: Use passed in table for
nexthop lookups"), net hop lookup is first performed on route creation
in the passed-in table.
However device match is not enforced in table lookup, so the found
route can be later discarded due to egress device mismatch and no
global lookup will be performed.
This cause the following to fail:

ip link add dummy1 type dummy
ip link add dummy2 type dummy
ip link set dummy1 up
ip link set dummy2 up
ip route add 2001:db8:8086::/48 dev dummy1 metric 20
ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20
ip route add 2001:db8:8086::/48 dev dummy2 metric 21
ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21
RTNETLINK answers: No route to host

This change fixes the issue enforcing device lookup in
ip6_nh_lookup_table()

v1->v2: updated commit message title

Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups")
Reported-and-tested-by: Beniamino Galvani 
Signed-off-by: Paolo Abeni 
---
 net/ipv6/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 969913d..520b788 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1782,7 +1782,7 @@ static struct rt6_info *ip6_nh_lookup_table(struct net 
*net,
};
struct fib6_table *table;
struct rt6_info *rt;
-   int flags = 0;
+   int flags = RT6_LOOKUP_F_IFACE;

table = fib6_get_table(net, cfg->fc_table);
if (!table)



Acked-by: David Ahern 


Re: [PATCH net-next 0/4] net_sched: bulk dequeue and deferred drops

2016-06-23 Thread Jesper Dangaard Brouer
On Wed, 22 Jun 2016 09:49:48 -0700
Eric Dumazet  wrote:

> On Wed, 2016-06-22 at 17:44 +0200, Jesper Dangaard Brouer wrote:
> > On Wed, 22 Jun 2016 07:55:43 -0700
> > Eric Dumazet  wrote:
> >   
> > > On Wed, 2016-06-22 at 16:47 +0200, Jesper Dangaard Brouer wrote:  
> > > > On Tue, 21 Jun 2016 23:16:48 -0700
> > > > Eric Dumazet  wrote:
> > > > 
> > > > > First patch adds an additional parameter to ->enqueue() qdisc method
> > > > > so that drops can be done outside of critical section
> > > > > (after locks are released).
> > > > > 
> > > > > Then fq_codel can have a small optimization to reduce number of cache
> > > > > lines misses during a drop event
> > > > > (possibly accumulating hundreds of packets to be freed).
> > > > > 
> > > > > A small htb change exports the backlog in class dumps.
> > > > > 
> > > > > Final patch adds bulk dequeue to qdiscs that were lacking this 
> > > > > feature.
> > > > > 
> > > > > This series brings a nice qdisc performance increase (more than 80 %
> > > > > in some cases).
> > > > 
> > > > Thanks for working on this Eric! this is great work! :-)
> > > 
> > > Thanks Jesper
> > > 
> > > I worked yesterday on bulk enqueues, but initial results are not that
> > > great.  
> > 
> > Hi Eric,
> > 
> > This is interesting work! But I think you should read Luigi Rizzo's
> > (Cc'ed) paper on title "A Fast and Practical Software Packet Scheduling
> > Architecture"[1]
> > 
> > [1] http://info.iet.unipi.it/~luigi/papers/20160511-mysched-preprint.pdf
> > 
> > Luigi will be at Netfilter Workshop next week, and will actually
> > present on topic/paper you two should talk ;-)
> > 
> > The article is not a 100% match for what we need, but there is some
> > good ideas.  The article also have a sort of "prequeue" that
> > enqueue'ing CPUs will place packets into.
> > 
> > My understanding of the article:
> > 
> > 1. transmitters submit packets to an intermediate queue
> >(replace q->enqueue call) lockless submit as queue per CPU
> >(runs in parallel)
> > 
> > 2. like we only have _one_ qdisc dequeue process, this process (called
> >arbiter) empty the intermediate queues, and then invoke q->enqueue()
> >and q->dequeue(). (in a locked session/region)
> > 
> > 3. Packets returned from q->dequeue() is placed on an outgoing
> >intermediate queue.
> > 
> > 4. the transmitter then looks to see there are any packets to drain()
> >from the outgoing queue.  This can run in parallel.
> > 
> > If the transmitter submitting a packet, detect no arbiter is running,
> > it can become the arbiter itself.  Like we do with qdisc_run_begin()
> > setting state __QDISC___STATE_RUNNING.
> > 
> > The problem with this scheme is push-back from qdisc->enqueue
> > (NET_XMIT_CN) does not "reach" us.  And push-back in-form of processes
> > blocking on qdisc root lock, but that could be handled by either
> > blocking in article's submit() or returning some congestion return code
> > from submit().   
> 
> Okay, I see that you prepare upcoming conference in Amsterdam,
> but please keep this thread about existing kernel code, not the one that
> eventually reach a new operating system in 5 years ;)
> 
> 1) We _want_ the result of the sends, obviously.

How dependent are we on the return codes?

E.g. the NET_XMIT_CN return is not that accurate, it does not mean this
packet was dropped, it could be from an unrelated flow.


> 2) We also want back pressure, without adding complex callbacks and
> ref-counting.
> 
> 3) We do not want to burn a cpu per TX queue (at least one per NUMA
> node ???) only to send few packets per second,
> Our model is still interrupt based, plus NAPI for interrupt mitigation.
>
> 4) I do not want to lock an innocent cpu to send packets from other
> threads/cpu without a tight control.

Article present two modes: 1) a dedicated CPU runs the "arbiter",
2) submitting CPU becomes the arbiter (iif not other CPU is the arbiter).

I imagine we use mode 2.  Which is almost what we already do now.
The qdisc layer only allow a single CPU to be dequeue'ing packets.  This
process can be seen as the "arbiter".  The only difference is that it
will pickup packets from an intermediate queue, and invoke q->enqueue().
(Still keeping the quota in __qdisc_run()).

 
> In the patch I sent, I basically replaced a locked operation
> (spin_lock(&q->busylock)) with another one (xchg()) , but I did not add
> yet another queue before the qdisc ones, bufferbloat forbids.

Is it really bufferbloat to introduce an intermidiate queue at this
point.  The enqueue/submit process, can see that qdisc_is_running, thus
it knows these packets will be picked up very shortly (within 200
cycles) and "arbiter" will invoke q->enqueue() allowing qdisc to react
to bufferbloat.


> The virtual queue here is one packet per cpu, which basically is the
> same than before this patch, since each cpu spinning on busylock has one
> skb to send anyway.
> 
> This is basically a simple extension of MCS locks, wher

Re: [PATCH net v2] ipv6: enforce egress device match in per table nexthop lookups

2016-06-23 Thread David Ahern

On 6/23/16 8:20 AM, David Ahern wrote:

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 969913d..520b788 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1782,7 +1782,7 @@ static struct rt6_info
*ip6_nh_lookup_table(struct net *net,
 };
 struct fib6_table *table;
 struct rt6_info *rt;
-int flags = 0;
+int flags = RT6_LOOKUP_F_IFACE;

 table = fib6_get_table(net, cfg->fc_table);
 if (!table)



Acked-by: David Ahern 


I take that back.

I think RT6_LOOKUP_F_IFACE should only be set if cfg->fc_ifindex is set.


Re: [PATCH net v2] ipv6: enforce egress device match in per table nexthop lookups

2016-06-23 Thread Paolo Abeni
On Thu, 2016-06-23 at 08:29 -0600, David Ahern wrote:
> On 6/23/16 8:20 AM, David Ahern wrote:
> >> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> >> index 969913d..520b788 100644
> >> --- a/net/ipv6/route.c
> >> +++ b/net/ipv6/route.c
> >> @@ -1782,7 +1782,7 @@ static struct rt6_info
> >> *ip6_nh_lookup_table(struct net *net,
> >>  };
> >>  struct fib6_table *table;
> >>  struct rt6_info *rt;
> >> -int flags = 0;
> >> +int flags = RT6_LOOKUP_F_IFACE;
> >>
> >>  table = fib6_get_table(net, cfg->fc_table);
> >>  if (!table)
> >>
> >
> > Acked-by: David Ahern 
> 
> I take that back.
> 
> I think RT6_LOOKUP_F_IFACE should only be set if cfg->fc_ifindex is set.

AFAICS the latter condition should not be needed. The related
information is passed all way down to rt6_score_route(), where it's
really used:

m = rt6_check_dev(rt, oif);
if (!m && (strict & RT6_LOOKUP_F_IFACE))
return RT6_NUD_FAIL_HARD;

and 'm' can be 0 only if oif is set: RT6_LOOKUP_F_IFACE has no effect
ifindex is set.

Paolo




Re: [PATCH] Maxim/driver: Add driver for maxim ds26522

2016-06-23 Thread David Miller
From: Zhao Qiang 
Date: Thu, 23 Jun 2016 09:09:45 +0800

> +MODULE_DESCRIPTION(DRV_DESC);

There is no definition of DRV_DESC, so this makes it look like
you didn't even compile this driver.


Re: [patch net-next v5 0/4] return offloaded stats as default and expose original sw stats

2016-06-23 Thread Anuradha Karuppiah
 we can't separate CPU and HW stats there. In some cases (or ASICs) HW
 counters do
 not include CPU generated packetsyou will have to add CPU
 generated pkt counters to the
 hw counters for such virtual device stats.
>>> Can you please provide and example how that could happen?
>>
>>example is the bridge vlan stats I mention below. These are usually counted
>>by attaching hw virtual counter resources. And CPU generated packets
>>in some cases maybe setup to bypass the ASIC pipeline because the CPU
>>has already made the required decisions. So, they may not be counted by
>>by such hw virtual counters.
>
> Bypass ASIC? How do the packets get on the wire?
>

Bypass the "forwarding pipeline" in the ASIC that is. Obviously the
ASIC ships the CPU generated packet out of the switch/front-panel
port. Continuing Roopa's example of vlan netdev stats To get the
HW stats counters are typically tied to the ingress and egress vlan hw
entries. All the incoming packets are subject to the ingress vlan
lookup irrespective of whether they get punted to the CPU or whether
they are forwarded to another front panel port. In that case the
ingress HW stats does represent all packets. However for CPU
originated packets egress vlan lookups are bypassed in the ASIC (this
is common forwarding option in most ASICs) and the packet shipped as
is out of front-panel port specified by the CPU. Which means these
packets will NOT be counted against the egress VLAN HW counter; hence
the need for summation.


Re: [PATCH] dsa: mv88e6xxx: hide unused functions

2016-06-23 Thread Vivien Didelot
Hi,

Arnd Bergmann  writes:

> When CONFIG_NET_DSA_HWMON is disabled, we get warnings about two unused
> functions whose only callers are all inside of an #ifdef:
>
> drivers/net/dsa/mv88e6xxx.c:3257:12: 'mv88e6xxx_mdio_page_write' defined but 
> not used [-Werror=unused-function]
> drivers/net/dsa/mv88e6xxx.c:3244:12: 'mv88e6xxx_mdio_page_read' defined but 
> not used [-Werror=unused-function]
>
> This adds another ifdef around the function definitions. The warnings
> appeared after the functions were marked 'static', but the problem
> was already there before that.
>
> Signed-off-by: Arnd Bergmann 
> Fixes: 57d3231057e9 ("net: dsa: mv88e6xxx: fix style issues")

Reviewed-by: Vivien Didelot 

David, this patch is meant for net-next. It applies cleanly *before* my
last two submissions:

1/2 http://patchwork.ozlabs.org/patch/638773/
2/2 http://patchwork.ozlabs.org/patch/638772/

Thanks,

Vivien


[PATCH] vsock: make listener child lock ordering explicit

2016-06-23 Thread Stefan Hajnoczi
There are several places where the listener and pending or accept queue
child sockets are accessed at the same time.  Lockdep is unhappy that
two locks from the same class are held.

Tell lockdep that it is safe and document the lock ordering.

Originally Claudio Imbrenda  sent a similar
patch asking whether this is safe.  I have audited the code and also
covered the vsock_pending_work() function.

Suggested-by: Claudio Imbrenda 
Signed-off-by: Stefan Hajnoczi 
---
 net/vmw_vsock/af_vsock.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index b5f1221..b96ac91 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -61,6 +61,14 @@
  * function will also cleanup rejected sockets, those that reach the connected
  * state but leave it before they have been accepted.
  *
+ * - Lock ordering for pending or accept queue sockets is:
+ *
+ * lock_sock(listener);
+ * lock_sock_nested(pending, SINGLE_DEPTH_NESTING);
+ *
+ * Using explicit nested locking keeps lockdep happy since normally only one
+ * lock of a given class may be taken at a time.
+ *
  * - Sockets created by user action will be cleaned up when the user process
  * calls close(2), causing our release implementation to be called. Our release
  * implementation will perform some cleanup then drop the last reference so our
@@ -443,7 +451,7 @@ void vsock_pending_work(struct work_struct *work)
cleanup = true;
 
lock_sock(listener);
-   lock_sock(sk);
+   lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
 
if (vsock_is_pending(sk)) {
vsock_remove_pending(listener, sk);
@@ -1292,7 +1300,7 @@ static int vsock_accept(struct socket *sock, struct 
socket *newsock, int flags)
if (connected) {
listener->sk_ack_backlog--;
 
-   lock_sock(connected);
+   lock_sock_nested(connected, SINGLE_DEPTH_NESTING);
vconnected = vsock_sk(connected);
 
/* If the listener socket has received an error, then we should
-- 
2.7.4



Re: [patch net-next v5 0/4] return offloaded stats as default and expose original sw stats

2016-06-23 Thread Jiri Pirko
Thu, Jun 23, 2016 at 05:11:26PM CEST, anurad...@cumulusnetworks.com wrote:
> we can't separate CPU and HW stats there. In some cases (or ASICs) HW
> counters do
> not include CPU generated packetsyou will have to add CPU
> generated pkt counters to the
> hw counters for such virtual device stats.
 Can you please provide and example how that could happen?
>>>
>>>example is the bridge vlan stats I mention below. These are usually counted
>>>by attaching hw virtual counter resources. And CPU generated packets
>>>in some cases maybe setup to bypass the ASIC pipeline because the CPU
>>>has already made the required decisions. So, they may not be counted by
>>>by such hw virtual counters.
>>
>> Bypass ASIC? How do the packets get on the wire?
>>
>
>Bypass the "forwarding pipeline" in the ASIC that is. Obviously the
>ASIC ships the CPU generated packet out of the switch/front-panel
>port. Continuing Roopa's example of vlan netdev stats To get the
>HW stats counters are typically tied to the ingress and egress vlan hw
>entries. All the incoming packets are subject to the ingress vlan
>lookup irrespective of whether they get punted to the CPU or whether
>they are forwarded to another front panel port. In that case the
>ingress HW stats does represent all packets. However for CPU
>originated packets egress vlan lookups are bypassed in the ASIC (this
>is common forwarding option in most ASICs) and the packet shipped as
>is out of front-panel port specified by the CPU. Which means these
>packets will NOT be counted against the egress VLAN HW counter; hence
>the need for summation.

Driver will know about this, and will provide the stats accordignly to
the core. Who else than driver should resolve this.



vmw_vsock sk_ack_backlog double decrement bug

2016-06-23 Thread Stefan Hajnoczi
Hi Jorgen,
virtio-vsock doesn't use vsock_pending_work() but I may have spotted a
problem that affects the VMCI transport.  I'm not sending a patch
because I can't test it.

1. During vsock_accept() listener->sk_ack_backlog is decremented.
2. vsock_pending_work() will decrement listener->sk_ack_backlog again if
   vsk->rejected.

The result is that sk_ack_backlog can be invalid.  It only happens in
the case where the listener socket has an error.  Maybe in practice it's
not a problem because the server application will close the listener
socket if there is an error...

Stefan


signature.asc
Description: PGP signature


[PATCH net-next V2 01/10] net/mlx5: Rate limit tables support

2016-06-23 Thread Saeed Mahameed
From: Yevgeny Petrilin 

Configuring and managing HW rate limit tables.
The HW holds a table of rate limits, each rate is
associated with an index in that table.
Later a Send Queue uses this index to set the rate limit.
Multiple Send Queues can have the same rate limit, which is
represented by a single entry in this table.
Even though a rate can be shared, each queue is being rate
limited independently of others.

The SW shadow of this table holds the rate itself,
the index in the HW table and the refcount (number of queues)
working with this rate.

The exported functions are mlx5_rl_add_rate and mlx5_rl_remove_rate.
Number of different rates and their values are derived
from HW capabilities.

Signed-off-by: Yevgeny Petrilin 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile |   5 +-
 drivers/net/ethernet/mellanox/mlx5/core/fw.c |   6 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c   |  10 ++
 drivers/net/ethernet/mellanox/mlx5/core/rl.c | 209 +++
 include/linux/mlx5/device.h  |   4 +
 include/linux/mlx5/driver.h  |  27 +++
 6 files changed, 259 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/rl.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9ea7b58..0c8a7dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -1,8 +1,9 @@
 obj-$(CONFIG_MLX5_CORE)+= mlx5_core.o
 
 mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
-   health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o   \
-   mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o 
fs_counters.o
+   health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \
+   mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
+   fs_counters.o rl.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
en_main.o en_fs.o en_ethtool.o en_tx.o en_rx.o \
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 75c7ae6..77fc1aa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -151,6 +151,12 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
return err;
}
 
+   if (MLX5_CAP_GEN(dev, qos)) {
+   err = mlx5_core_get_caps(dev, MLX5_CAP_QOS);
+   if (err)
+   return err;
+   }
+
return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index a19b593..08cae34 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1144,6 +1144,13 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, 
struct mlx5_priv *priv)
dev_err(&pdev->dev, "Failed to init flow steering\n");
goto err_fs;
}
+
+   err = mlx5_init_rl_table(dev);
+   if (err) {
+   dev_err(&pdev->dev, "Failed to init rate limiting\n");
+   goto err_rl;
+   }
+
 #ifdef CONFIG_MLX5_CORE_EN
err = mlx5_eswitch_init(dev);
if (err) {
@@ -1183,6 +1190,8 @@ err_sriov:
mlx5_eswitch_cleanup(dev->priv.eswitch);
 #endif
 err_reg_dev:
+   mlx5_cleanup_rl_table(dev);
+err_rl:
mlx5_cleanup_fs(dev);
 err_fs:
mlx5_cleanup_mkey_table(dev);
@@ -1253,6 +1262,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, 
struct mlx5_priv *priv)
mlx5_eswitch_cleanup(dev->priv.eswitch);
 #endif
 
+   mlx5_cleanup_rl_table(dev);
mlx5_cleanup_fs(dev);
mlx5_cleanup_mkey_table(dev);
mlx5_cleanup_srq_table(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/rl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
new file mode 100644
index 000..c07c28b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
@@ -0,0 +1,209 @@
+/*
+ * Copyright (c) 2013-2016, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the follow

[PATCH net-next V2 02/10] net/mlx5e: Add TXQ set max rate support

2016-06-23 Thread Saeed Mahameed
From: Yevgeny Petrilin 

Implement set_maxrate ndo.
Use the rate index from the hardware table to attach to channel SQ/TXQ.
In case of failure to configure new rate, the queue remains with
unlimited rate.

We save the configuration on priv structure and apply it each time
Send Queues are being reinitialized (after open/close) operations.

Signed-off-by: Yevgeny Petrilin 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 102 +-
 2 files changed, 102 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e8a6c33..017e047 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -88,6 +88,7 @@
 #define MLX5E_LOG_INDIR_RQT_SIZE   0x7
 #define MLX5E_INDIR_RQT_SIZE   BIT(MLX5E_LOG_INDIR_RQT_SIZE)
 #define MLX5E_MAX_NUM_CHANNELS (MLX5E_INDIR_RQT_SIZE >> 1)
+#define MLX5E_MAX_NUM_SQS  (MLX5E_MAX_NUM_CHANNELS * 
MLX5E_MAX_NUM_TC)
 #define MLX5E_TX_CQ_POLL_BUDGET128
 #define MLX5E_UPDATE_STATS_INTERVAL200 /* msecs */
 #define MLX5E_SQ_BF_BUDGET 16
@@ -354,6 +355,7 @@ struct mlx5e_sq {
struct mlx5e_channel  *channel;
inttc;
struct mlx5e_ico_wqe_info *ico_wqe_info;
+   u32rate_limit;
 } cacheline_aligned_in_smp;
 
 static inline bool mlx5e_sq_has_room_for(struct mlx5e_sq *sq, u16 n)
@@ -530,6 +532,7 @@ struct mlx5e_priv {
u32indir_rqtn;
u32indir_tirn[MLX5E_NUM_INDIR_TIRS];
struct mlx5e_direct_tirdirect_tir[MLX5E_MAX_NUM_CHANNELS];
+   u32tx_rates[MLX5E_MAX_NUM_SQS];
 
struct mlx5e_flow_steering fs;
struct mlx5e_vxlan_db  vxlan;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8b7c6f3..e5a2cef 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -702,7 +702,8 @@ static int mlx5e_enable_sq(struct mlx5e_sq *sq, struct 
mlx5e_sq_param *param)
return err;
 }
 
-static int mlx5e_modify_sq(struct mlx5e_sq *sq, int curr_state, int next_state)
+static int mlx5e_modify_sq(struct mlx5e_sq *sq, int curr_state,
+  int next_state, bool update_rl, int rl_index)
 {
struct mlx5e_channel *c = sq->channel;
struct mlx5e_priv *priv = c->priv;
@@ -722,6 +723,10 @@ static int mlx5e_modify_sq(struct mlx5e_sq *sq, int 
curr_state, int next_state)
 
MLX5_SET(modify_sq_in, in, sq_state, curr_state);
MLX5_SET(sqc, sqc, state, next_state);
+   if (update_rl && next_state == MLX5_SQC_STATE_RDY) {
+   MLX5_SET64(modify_sq_in, in, modify_bitmask, 1);
+   MLX5_SET(sqc,  sqc, packet_pacing_rate_limit_index, rl_index);
+   }
 
err = mlx5_core_modify_sq(mdev, sq->sqn, in, inlen);
 
@@ -737,6 +742,8 @@ static void mlx5e_disable_sq(struct mlx5e_sq *sq)
struct mlx5_core_dev *mdev = priv->mdev;
 
mlx5_core_destroy_sq(mdev, sq->sqn);
+   if (sq->rate_limit)
+   mlx5_rl_remove_rate(mdev, sq->rate_limit);
 }
 
 static int mlx5e_open_sq(struct mlx5e_channel *c,
@@ -754,7 +761,8 @@ static int mlx5e_open_sq(struct mlx5e_channel *c,
if (err)
goto err_destroy_sq;
 
-   err = mlx5e_modify_sq(sq, MLX5_SQC_STATE_RST, MLX5_SQC_STATE_RDY);
+   err = mlx5e_modify_sq(sq, MLX5_SQC_STATE_RST, MLX5_SQC_STATE_RDY,
+ false, 0);
if (err)
goto err_disable_sq;
 
@@ -793,7 +801,8 @@ static void mlx5e_close_sq(struct mlx5e_sq *sq)
if (mlx5e_sq_has_room_for(sq, 1))
mlx5e_send_nop(sq, true);
 
-   mlx5e_modify_sq(sq, MLX5_SQC_STATE_RDY, MLX5_SQC_STATE_ERR);
+   mlx5e_modify_sq(sq, MLX5_SQC_STATE_RDY, MLX5_SQC_STATE_ERR,
+   false, 0);
}
 
while (sq->cc != sq->pc) /* wait till sq is empty */
@@ -1024,6 +1033,79 @@ static void mlx5e_build_channeltc_to_txq_map(struct 
mlx5e_priv *priv, int ix)
ix + i * priv->params.num_channels;
 }
 
+static int mlx5e_set_sq_maxrate(struct net_device *dev,
+   struct mlx5e_sq *sq, u32 rate)
+{
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+   u16 rl_index = 0;
+   int err;
+
+   if (rate == sq->rate_limit)
+   /* nothing to do */
+   return 0;
+
+   if (sq->rate_limit)
+   /* remove current rl index to free space to next ones */
+   mlx5_rl_remove_rate(mdev, sq->rate_limit);
+
+   sq->rate_limit = 0;
+
+

[PATCH net-next V2 09/10] net/mlx5e: Use new ethtool get/set link ksettings API

2016-06-23 Thread Saeed Mahameed
From: Gal Pressman 

Use new get/set link ksettings and remove get/set settings legacy
callbacks.
This allows us to use bitmasks longer than 32 bit for supported and
advertised link modes and use modes that were previously not supported.

Signed-off-by: Gal Pressman 
CC: Ben Hutchings 
CC: David Decotigny 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   3 +
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 306 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   1 +
 3 files changed, 143 insertions(+), 167 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index b8732e6..da885c0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -634,6 +634,9 @@ enum mlx5e_link_mode {
 
 #define MLX5E_PROT_MASK(link_mode) (1 << link_mode)
 
+
+void mlx5e_build_ptys2ethtool_map(void);
+
 void mlx5e_send_nop(struct mlx5e_sq *sq, bool notify_hw);
 u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb,
   void *accel_priv, select_queue_fallback_t fallback);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index d0d3dcf..4c560e0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -48,123 +48,85 @@ static void mlx5e_get_drvinfo(struct net_device *dev,
sizeof(drvinfo->bus_info));
 }
 
-static const struct {
-   u32 supported;
-   u32 advertised;
+struct ptys2ethtool_config {
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(supported);
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(advertised);
u32 speed;
-} ptys2ethtool_table[MLX5E_LINK_MODES_NUMBER] = {
-   [MLX5E_1000BASE_CX_SGMII] = {
-   .supported  = SUPPORTED_1000baseKX_Full,
-   .advertised = ADVERTISED_1000baseKX_Full,
-   .speed  = 1000,
-   },
-   [MLX5E_1000BASE_KX] = {
-   .supported  = SUPPORTED_1000baseKX_Full,
-   .advertised = ADVERTISED_1000baseKX_Full,
-   .speed  = 1000,
-   },
-   [MLX5E_10GBASE_CX4] = {
-   .supported  = SUPPORTED_1baseKX4_Full,
-   .advertised = ADVERTISED_1baseKX4_Full,
-   .speed  = 1,
-   },
-   [MLX5E_10GBASE_KX4] = {
-   .supported  = SUPPORTED_1baseKX4_Full,
-   .advertised = ADVERTISED_1baseKX4_Full,
-   .speed  = 1,
-   },
-   [MLX5E_10GBASE_KR] = {
-   .supported  = SUPPORTED_1baseKR_Full,
-   .advertised = ADVERTISED_1baseKR_Full,
-   .speed  = 1,
-   },
-   [MLX5E_20GBASE_KR2] = {
-   .supported  = SUPPORTED_2baseKR2_Full,
-   .advertised = ADVERTISED_2baseKR2_Full,
-   .speed  = 2,
-   },
-   [MLX5E_40GBASE_CR4] = {
-   .supported  = SUPPORTED_4baseCR4_Full,
-   .advertised = ADVERTISED_4baseCR4_Full,
-   .speed  = 4,
-   },
-   [MLX5E_40GBASE_KR4] = {
-   .supported  = SUPPORTED_4baseKR4_Full,
-   .advertised = ADVERTISED_4baseKR4_Full,
-   .speed  = 4,
-   },
-   [MLX5E_56GBASE_R4] = {
-   .supported  = SUPPORTED_56000baseKR4_Full,
-   .advertised = ADVERTISED_56000baseKR4_Full,
-   .speed  = 56000,
-   },
-   [MLX5E_10GBASE_CR] = {
-   .supported  = SUPPORTED_1baseKR_Full,
-   .advertised = ADVERTISED_1baseKR_Full,
-   .speed  = 1,
-   },
-   [MLX5E_10GBASE_SR] = {
-   .supported  = SUPPORTED_1baseKR_Full,
-   .advertised = ADVERTISED_1baseKR_Full,
-   .speed  = 1,
-   },
-   [MLX5E_10GBASE_ER] = {
-   .supported  = SUPPORTED_1baseKR_Full,
-   .advertised = ADVERTISED_1baseKR_Full,
-   .speed  = 1,
-   },
-   [MLX5E_40GBASE_SR4] = {
-   .supported  = SUPPORTED_4baseSR4_Full,
-   .advertised = ADVERTISED_4baseSR4_Full,
-   .speed  = 4,
-   },
-   [MLX5E_40GBASE_LR4] = {
-   .supported  = SUPPORTED_4baseLR4_Full,
-   .advertised = ADVERTISED_4baseLR4_Full,
-   .speed  = 4,
-   },
-   [MLX5E_100GBASE_CR4] = {
-   .speed  = 10,
-   },
-   [MLX5E_100GBASE_SR4] = {
-   .speed  = 10,
-   },
-   [MLX5E_100GBASE_KR4] = {
-   .speed  = 10,
-   },
-   [MLX5E_100GBASE_LR4] = {
-   .speed  = 10,
-   },
-   [MLX5E_100BASE_TX]   = {
-   

[PATCH net-next V2 08/10] net/mlx5e: Add missing 50G baseSR2 link mode

2016-06-23 Thread Saeed Mahameed
From: Gal Pressman 

Add MLX5E_50GBASE_SR2 as ETHTOOL_LINK_MODE_5baseSR2_Full_BIT.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
Cc: Ben Hutchings 
Cc: David Decotigny 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index aa36a3a..b8732e6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -616,6 +616,7 @@ enum mlx5e_link_mode {
MLX5E_10GBASE_ER = 14,
MLX5E_40GBASE_SR4= 15,
MLX5E_40GBASE_LR4= 16,
+   MLX5E_50GBASE_SR2= 18,
MLX5E_100GBASE_CR4   = 20,
MLX5E_100GBASE_SR4   = 21,
MLX5E_100GBASE_KR4   = 22,
-- 
2.8.0



[PATCH net-next V2 03/10] net/mlx5e: Introduce net device priv flags infrastructure

2016-06-23 Thread Saeed Mahameed
From: Gal Pressman 

Introduce an infrastructure for getting/setting private net device
flags.

Currently a 'nop' priv flag is added, following patches will override
the flag will actual feature specific flags.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   | 17 +++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 59 ++
 2 files changed, 76 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 017e047..02fa4da 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -144,6 +144,22 @@ struct mlx5e_umr_wqe {
struct mlx5_wqe_data_seg   data;
 };
 
+static const char mlx5e_priv_flags[][ETH_GSTRING_LEN] = {
+   "nop",
+};
+
+enum mlx5e_priv_flag {
+   MLX5E_PFLAG_NOP = (1 << 0),
+};
+
+#define MLX5E_SET_PRIV_FLAG(priv, pflag, enable)\
+   do {\
+   if (enable) \
+   priv->pflags |= pflag;  \
+   else\
+   priv->pflags &= ~pflag; \
+   } while (0)
+
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */
 #define MLX5E_MIN_BW_ALLOC 1   /* Min percentage of BW allocation */
@@ -543,6 +559,7 @@ struct mlx5e_priv {
struct work_struct set_rx_mode_work;
struct delayed_workupdate_stats_work;
 
+   u32pflags;
struct mlx5_core_dev  *mdev;
struct net_device *netdev;
struct mlx5e_stats stats;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index fc7dcc0..f8bbc2b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -198,6 +198,8 @@ static int mlx5e_get_sset_count(struct net_device *dev, int 
sset)
   MLX5E_NUM_RQ_STATS(priv) +
   MLX5E_NUM_SQ_STATS(priv) +
   MLX5E_NUM_PFC_COUNTERS(priv);
+   case ETH_SS_PRIV_FLAGS:
+   return ARRAY_SIZE(mlx5e_priv_flags);
/* fallthrough */
default:
return -EOPNOTSUPP;
@@ -272,9 +274,12 @@ static void mlx5e_get_strings(struct net_device *dev,
  uint32_t stringset, uint8_t *data)
 {
struct mlx5e_priv *priv = netdev_priv(dev);
+   int i;
 
switch (stringset) {
case ETH_SS_PRIV_FLAGS:
+   for (i = 0; i < ARRAY_SIZE(mlx5e_priv_flags); i++)
+   strcpy(data + i * ETH_GSTRING_LEN, mlx5e_priv_flags[i]);
break;
 
case ETH_SS_TEST:
@@ -1272,6 +1277,58 @@ static int mlx5e_get_module_eeprom(struct net_device 
*netdev,
return 0;
 }
 
+typedef int (*mlx5e_pflag_handler)(struct net_device *netdev, bool enable);
+
+static int set_pflag_nop(struct net_device *netdev, bool enable)
+{
+   return 0;
+}
+
+static int mlx5e_handle_pflag(struct net_device *netdev,
+ u32 wanted_flags,
+ enum mlx5e_priv_flag flag,
+ mlx5e_pflag_handler pflag_handler)
+{
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+   bool enable = !!(wanted_flags & flag);
+   u32 changes = wanted_flags ^ priv->pflags;
+   int err;
+
+   if (!(changes & flag))
+   return 0;
+
+   err = pflag_handler(netdev, enable);
+   if (err) {
+   netdev_err(netdev, "%s private flag 0x%x failed err %d\n",
+  enable ? "Enable" : "Disable", flag, err);
+   return err;
+   }
+
+   MLX5E_SET_PRIV_FLAG(priv, flag, enable);
+   return 0;
+}
+
+static int mlx5e_set_priv_flags(struct net_device *netdev, u32 pflags)
+{
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+   int err;
+
+   mutex_lock(&priv->state_lock);
+
+   err = mlx5e_handle_pflag(netdev, pflags, MLX5E_PFLAG_NOP,
+set_pflag_nop);
+
+   mutex_unlock(&priv->state_lock);
+   return err ? -EINVAL : 0;
+}
+
+static u32 mlx5e_get_priv_flags(struct net_device *netdev)
+{
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+
+   return priv->pflags;
+}
+
 const struct ethtool_ops mlx5e_ethtool_ops = {
.get_drvinfo   = mlx5e_get_drvinfo,
.get_link  = ethtool_op_get_link,
@@ -1301,4 +1358,6 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
.set_wol   = mlx5e_set_wol,
.get_module_info   = mlx5e_get_module_info,
.get_module_eeprom = mlx5e_get_module_eeprom,
+   .get_priv_flags= mlx5e_get_priv_flags,
+   .set_priv_flags= mlx5e_set_priv_flags
 };
-- 
2.

[PATCH net-next V2 06/10] net/mlx5e: Toggle link only after modifying port parameters

2016-06-23 Thread Saeed Mahameed
From: Gal Pressman 

Add a dedicated function to toggle port link. It should be called only
after setting a port register.
Toggle will set port link to down and bring it back up in case that it's
admin status was up.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c   |  9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  7 +--
 drivers/net/ethernet/mellanox/mlx5/core/port.c   | 12 
 include/linux/mlx5/port.h|  1 +
 4 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index b2db180..e688313 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -191,7 +191,6 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
 {
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5_core_dev *mdev = priv->mdev;
-   enum mlx5_port_status ps;
u8 curr_pfc_en;
int ret;
 
@@ -200,14 +199,8 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
if (pfc->pfc_en == curr_pfc_en)
return 0;
 
-   mlx5_query_port_admin_status(mdev, &ps);
-   if (ps == MLX5_PORT_UP)
-   mlx5_set_port_admin_status(mdev, MLX5_PORT_DOWN);
-
ret = mlx5_set_port_pfc(mdev, pfc->pfc_en, pfc->pfc_en);
-
-   if (ps == MLX5_PORT_UP)
-   mlx5_set_port_admin_status(mdev, MLX5_PORT_UP);
+   mlx5_toggle_port_link(mdev);
 
return ret;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index c4be394..d0d3dcf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -795,7 +795,6 @@ static int mlx5e_set_settings(struct net_device *netdev,
u32 link_modes;
u32 speed;
u32 eth_proto_cap, eth_proto_admin;
-   enum mlx5_port_status ps;
int err;
 
speed = ethtool_cmd_speed(cmd);
@@ -829,12 +828,8 @@ static int mlx5e_set_settings(struct net_device *netdev,
if (link_modes == eth_proto_admin)
goto out;
 
-   mlx5_query_port_admin_status(mdev, &ps);
-   if (ps == MLX5_PORT_UP)
-   mlx5_set_port_admin_status(mdev, MLX5_PORT_DOWN);
mlx5_set_port_proto(mdev, link_modes, MLX5_PTYS_EN);
-   if (ps == MLX5_PORT_UP)
-   mlx5_set_port_admin_status(mdev, MLX5_PORT_UP);
+   mlx5_toggle_port_link(mdev);
 
 out:
return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 3e35611..1562e73 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -222,6 +222,18 @@ int mlx5_set_port_proto(struct mlx5_core_dev *dev, u32 
proto_admin,
 }
 EXPORT_SYMBOL_GPL(mlx5_set_port_proto);
 
+/* This function should be used after setting a port register only */
+void mlx5_toggle_port_link(struct mlx5_core_dev *dev)
+{
+   enum mlx5_port_status ps;
+
+   mlx5_query_port_admin_status(dev, &ps);
+   mlx5_set_port_admin_status(dev, MLX5_PORT_DOWN);
+   if (ps == MLX5_PORT_UP)
+   mlx5_set_port_admin_status(dev, MLX5_PORT_UP);
+}
+EXPORT_SYMBOL_GPL(mlx5_toggle_port_link);
+
 int mlx5_set_port_admin_status(struct mlx5_core_dev *dev,
   enum mlx5_port_status status)
 {
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index 9851862..4adfac1 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -67,6 +67,7 @@ int mlx5_query_port_proto_oper(struct mlx5_core_dev *dev,
   u8 local_port);
 int mlx5_set_port_proto(struct mlx5_core_dev *dev, u32 proto_admin,
int proto_mask);
+void mlx5_toggle_port_link(struct mlx5_core_dev *dev);
 int mlx5_set_port_admin_status(struct mlx5_core_dev *dev,
   enum mlx5_port_status status);
 int mlx5_query_port_admin_status(struct mlx5_core_dev *dev,
-- 
2.8.0



[PATCH net-next V2 07/10] ethtool: Add 50G baseSR2 link mode

2016-06-23 Thread Saeed Mahameed
From: Gal Pressman 

Add ETHTOOL_LINK_MODE_5baseSR2_Full_BIT bit.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
Cc: Ben Hutchings 
Cc: David Decotigny 
---
 include/uapi/linux/ethtool.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 5f030b4..b8f38e8 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1362,6 +1362,7 @@ enum ethtool_link_mode_bit_indices {
ETHTOOL_LINK_MODE_10baseSR4_Full_BIT= 37,
ETHTOOL_LINK_MODE_10baseCR4_Full_BIT= 38,
ETHTOOL_LINK_MODE_10baseLR4_ER4_Full_BIT= 39,
+   ETHTOOL_LINK_MODE_5baseSR2_Full_BIT = 40,
 
/* Last allowed bit for __ETHTOOL_LINK_MODE_LEGACY_MASK is bit
 * 31. Please do NOT define any SUPPORTED_* or ADVERTISED_*
@@ -1370,7 +1371,7 @@ enum ethtool_link_mode_bit_indices {
 */
 
__ETHTOOL_LINK_MODE_LAST
- = ETHTOOL_LINK_MODE_10baseLR4_ER4_Full_BIT,
+ = ETHTOOL_LINK_MODE_5baseSR2_Full_BIT,
 };
 
 #define __ETHTOOL_LINK_MODE_LEGACY_MASK(base_name) \
-- 
2.8.0



[PATCH net-next V2 10/10] net/mlx5e: Report correct auto negotiation and allow toggling

2016-06-23 Thread Saeed Mahameed
From: Gal Pressman 

Previous to this patch auto negotiation was reported off although it was
on by default in hardware. This patch reports the correct information to
ethtool and allows the user to toggle it on/off.

Added another parameter to set port proto function in order to pass
the auto negotiation field to the hardware.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 42 ++
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 36 ---
 include/linux/mlx5/port.h  | 15 ++--
 3 files changed, 80 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 4c560e0..39a4d96 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -702,6 +702,8 @@ static int mlx5e_get_link_ksettings(struct net_device 
*netdev,
u32 eth_proto_admin;
u32 eth_proto_lp;
u32 eth_proto_oper;
+   u8 an_disable_admin;
+   u8 an_status;
int err;
 
err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN, 1);
@@ -712,10 +714,12 @@ static int mlx5e_get_link_ksettings(struct net_device 
*netdev,
goto err_query_ptys;
}
 
-   eth_proto_cap   = MLX5_GET(ptys_reg, out, eth_proto_capability);
-   eth_proto_admin = MLX5_GET(ptys_reg, out, eth_proto_admin);
-   eth_proto_oper  = MLX5_GET(ptys_reg, out, eth_proto_oper);
-   eth_proto_lp= MLX5_GET(ptys_reg, out, eth_proto_lp_advertise);
+   eth_proto_cap= MLX5_GET(ptys_reg, out, eth_proto_capability);
+   eth_proto_admin  = MLX5_GET(ptys_reg, out, eth_proto_admin);
+   eth_proto_oper   = MLX5_GET(ptys_reg, out, eth_proto_oper);
+   eth_proto_lp = MLX5_GET(ptys_reg, out, eth_proto_lp_advertise);
+   an_disable_admin = MLX5_GET(ptys_reg, out, an_disable_admin);
+   an_status= MLX5_GET(ptys_reg, out, an_status);
 
ethtool_link_ksettings_zero_link_mode(link_ksettings, supported);
ethtool_link_ksettings_zero_link_mode(link_ksettings, advertising);
@@ -729,6 +733,18 @@ static int mlx5e_get_link_ksettings(struct net_device 
*netdev,
link_ksettings->base.port = get_connector_port(eth_proto_oper);
get_lp_advertising(eth_proto_lp, link_ksettings);
 
+   if (an_status == MLX5_AN_COMPLETE)
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+lp_advertising, Autoneg);
+
+   link_ksettings->base.autoneg = an_disable_admin ? AUTONEG_DISABLE :
+ AUTONEG_ENABLE;
+   ethtool_link_ksettings_add_link_mode(link_ksettings, supported,
+Autoneg);
+   if (!an_disable_admin)
+   ethtool_link_ksettings_add_link_mode(link_ksettings,
+advertising, Autoneg);
+
 err_query_ptys:
return err;
 }
@@ -764,9 +780,14 @@ static int mlx5e_set_link_ksettings(struct net_device 
*netdev,
 {
struct mlx5e_priv *priv= netdev_priv(netdev);
struct mlx5_core_dev *mdev = priv->mdev;
+   u32 eth_proto_cap, eth_proto_admin;
+   bool an_changes = false;
+   u8 an_disable_admin;
+   u8 an_disable_cap;
+   bool an_disable;
u32 link_modes;
+   u8 an_status;
u32 speed;
-   u32 eth_proto_cap, eth_proto_admin;
int err;
 
speed = link_ksettings->base.speed;
@@ -797,10 +818,17 @@ static int mlx5e_set_link_ksettings(struct net_device 
*netdev,
goto out;
}
 
-   if (link_modes == eth_proto_admin)
+   mlx5_query_port_autoneg(mdev, MLX5_PTYS_EN, &an_status,
+   &an_disable_cap, &an_disable_admin);
+
+   an_disable = link_ksettings->base.autoneg == AUTONEG_DISABLE;
+   an_changes = ((!an_disable && an_disable_admin) ||
+ (an_disable && !an_disable_admin));
+
+   if (!an_changes && link_modes == eth_proto_admin)
goto out;
 
-   mlx5_set_port_proto(mdev, link_modes, MLX5_PTYS_EN);
+   mlx5_set_port_ptys(mdev, an_disable, link_modes, MLX5_PTYS_EN);
mlx5_toggle_port_link(mdev);
 
 out:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 1562e73..752c081 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -202,15 +202,24 @@ int mlx5_query_port_proto_oper(struct mlx5_core_dev *dev,
 }
 EXPORT_SYMBOL_GPL(mlx5_query_port_proto_oper);
 
-int mlx5_set_port_proto(struct mlx5_core_dev *dev, u32 proto_admin,
-   int proto_mask)
+int mlx5_set_port_ptys(struct mlx5_core_dev *dev, bool an_disable,
+ 

[PATCH net-next V2 04/10] net/mlx5e: CQE based moderation

2016-06-23 Thread Saeed Mahameed
From: Tariq Toukan 

In this mode the moderation timer will restart upon
new completion (CQE) generation rather than upon interrupt
generation.

The outcome is that for bursty traffic the period timer will never
expire and thus only the moderation frames counter will dictate
interrupt generation, thus the interrupt rate will be relative
to the incoming packets size.
If the burst seizes for "moderation period" time then an interrupt
will be issued immediately.

CQE based moderation is off by default and can be controlled
via ethtool set_priv_flags.

Performance tested on ConnectX4-Lx 50G.

Less packet loss in netperf UDP and TCP tests, with no bw degradation,
for both single and multi streams, with message sizes of
64, 1024, 1472 and 32768 byte.

Signed-off-by: Tariq Toukan 
Signed-off-by: Achiad Shochat 
Signed-off-by: Saeed Mahameed 
Signed-off-by: Gal Pressman 
Signed-off-by: Gil Rockah 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   | 20 +---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 54 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 54 --
 3 files changed, 95 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 02fa4da..36f625d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -79,6 +79,7 @@
 
 #define MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ (64 * 1024)
 #define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC  0x10
+#define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE 0x3
 #define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS  0x20
 #define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC  0x10
 #define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS  0x20
@@ -145,11 +146,11 @@ struct mlx5e_umr_wqe {
 };
 
 static const char mlx5e_priv_flags[][ETH_GSTRING_LEN] = {
-   "nop",
+   "rx_cqe_moder",
 };
 
 enum mlx5e_priv_flag {
-   MLX5E_PFLAG_NOP = (1 << 0),
+   MLX5E_PFLAG_RX_CQE_BASED_MODER = (1 << 0),
 };
 
 #define MLX5E_SET_PRIV_FLAG(priv, pflag, enable)\
@@ -165,6 +166,11 @@ enum mlx5e_priv_flag {
 #define MLX5E_MIN_BW_ALLOC 1   /* Min percentage of BW allocation */
 #endif
 
+struct mlx5e_cq_moder {
+   u16 usec;
+   u16 pkts;
+};
+
 struct mlx5e_params {
u8  log_sq_size;
u8  rq_wq_type;
@@ -173,12 +179,11 @@ struct mlx5e_params {
u8  log_rq_size;
u16 num_channels;
u8  num_tc;
+   u8  rx_cq_period_mode;
bool rx_cqe_compress_admin;
bool rx_cqe_compress;
-   u16 rx_cq_moderation_usec;
-   u16 rx_cq_moderation_pkts;
-   u16 tx_cq_moderation_usec;
-   u16 tx_cq_moderation_pkts;
+   struct mlx5e_cq_moder rx_cq_moderation;
+   struct mlx5e_cq_moder tx_cq_moderation;
u16 min_rx_wqes;
bool lro_en;
u32 lro_wqe_sz;
@@ -667,6 +672,9 @@ void mlx5e_build_default_indir_rqt(struct mlx5_core_dev 
*mdev,
   int num_channels);
 int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
 
+void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
+u8 cq_period_mode);
+
 static inline void mlx5e_tx_notify_hw(struct mlx5e_sq *sq,
  struct mlx5_wqe_ctrl_seg *ctrl, int bf_sz)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index f8bbc2b..4f433d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -524,10 +524,10 @@ static int mlx5e_get_coalesce(struct net_device *netdev,
if (!MLX5_CAP_GEN(priv->mdev, cq_moderation))
return -ENOTSUPP;
 
-   coal->rx_coalesce_usecs   = priv->params.rx_cq_moderation_usec;
-   coal->rx_max_coalesced_frames = priv->params.rx_cq_moderation_pkts;
-   coal->tx_coalesce_usecs   = priv->params.tx_cq_moderation_usec;
-   coal->tx_max_coalesced_frames = priv->params.tx_cq_moderation_pkts;
+   coal->rx_coalesce_usecs   = priv->params.rx_cq_moderation.usec;
+   coal->rx_max_coalesced_frames = priv->params.rx_cq_moderation.pkts;
+   coal->tx_coalesce_usecs   = priv->params.tx_cq_moderation.usec;
+   coal->tx_max_coalesced_frames = priv->params.tx_cq_moderation.pkts;
 
return 0;
 }
@@ -545,10 +545,11 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
return -ENOTSUPP;
 
mutex_lock(&priv->state_lock);
-   priv->params.tx_cq_moderation_usec = coal->tx_coalesce_usecs;
-   priv->params.tx_cq_moderation_pkts = coal->tx_max_coalesced_frames;
-   priv->params.rx_cq_moderation_usec = coal->rx_coalesce_usecs;
-   priv->params.rx_cq_moderation_pkts = coal->rx_max_coalesced_frames;
+
+   priv->params.tx_cq_moderation.usec = coal->tx_coalesce_usecs;
+

Re: esp: Fix ESN generation under UDP encapsulation

2016-06-23 Thread David Miller
From: Steffen Klassert 
Date: Thu, 23 Jun 2016 12:40:07 +0200

> On Thu, Jun 23, 2016 at 04:25:21AM +, Blair Steven wrote:
>> This change tests okay in my setup.
>> 
>> Thanks very much
>> -Blair
> 
> David, can you please take this patch directly in the net tree?
> This is a candidate for stable.
> 
> Acked-by: Steffen Klassert 

Applied, thanks everyone.

Does the ipv6 side need the same fix?


[PATCH net-next V2 00/10] Mellanox 100G mlx5e Ethernet extensions

2016-06-23 Thread Saeed Mahameed
Hi Dave,

This series includes multiple features extensions for mlx5 Ethernet netdevice 
driver.
Namely, TX Rate limiting, RX interrupt moderation, ethtool settings.

TX Rate limiting:
- ConnectX-4 rate limiting infrastructure
- Set max rate NDO support

RX interrupt moderation:
- CQE based coalescing option (controlled via priv flags)
- Adaptive RX coalescing

ethtool settings:
- priv flags callbacks
- Support new ksettings API
- Add 50G missing link mode
- Support auto negotiation on/off

Changes since V1:
- Split ("net/mlx5e: Add 50G missing link mode to ethtool and mlx5 
driver")

Thanks,
Saeed.

Gal Pressman (6):
  net/mlx5e: Introduce net device priv flags infrastructure
  net/mlx5e: Toggle link only after modifying port parameters
  ethtool: Add 50G baseSR2 link mode
  net/mlx5e: Add missing 50G baseSR2 link mode
  net/mlx5e: Use new ethtool get/set link ksettings API
  net/mlx5e: Report correct auto negotiation and allow toggling

Gil Rockah (1):
  net/mlx5e: Support adaptive RX coalescing

Tariq Toukan (1):
  net/mlx5e: CQE based moderation

Yevgeny Petrilin (2):
  net/mlx5: Rate limit tables support
  net/mlx5e: Add TXQ set max rate support

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  73 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c |   9 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 476 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 181 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 335 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/fw.c   |   6 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  10 +
 drivers/net/ethernet/mellanox/mlx5/core/port.c |  48 ++-
 drivers/net/ethernet/mellanox/mlx5/core/rl.c   | 209 +
 include/linux/mlx5/device.h|   4 +
 include/linux/mlx5/driver.h|  27 ++
 include/linux/mlx5/port.h  |  16 +-
 include/uapi/linux/ethtool.h   |   3 +-
 15 files changed, 1179 insertions(+), 231 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/rl.c

-- 
2.8.0



[PATCH net-next V2 05/10] net/mlx5e: Support adaptive RX coalescing

2016-06-23 Thread Saeed Mahameed
From: Gil Rockah 

Striving for high message rate and low interrupt rate.

Usage:
ethtool -C  adaptive-rx on/off

Signed-off-by: Gil Rockah 
Signed-off-by: Achiad Shochat 
Signed-off-by: Saeed Mahameed 
CC: Arnd Bergmann 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  33 ++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  18 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  30 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 335 +
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   5 +
 6 files changed, 416 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 0c8a7dc..c4f450f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -7,6 +7,7 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o 
pagealloc.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
en_main.o en_fs.o en_ethtool.o en_tx.o en_rx.o \
-   en_txrx.o en_clock.o vxlan.o en_tc.o en_arfs.o
+   en_rx_am.o en_txrx.o en_clock.o vxlan.o en_tc.o \
+   en_arfs.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 36f625d..aa36a3a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -195,6 +195,7 @@ struct mlx5e_params {
 #ifdef CONFIG_MLX5_CORE_EN_DCB
struct ieee_ets ets;
 #endif
+   bool rx_am_enabled;
 };
 
 struct mlx5e_tstamp {
@@ -213,6 +214,7 @@ struct mlx5e_tstamp {
 enum {
MLX5E_RQ_STATE_POST_WQES_ENABLE,
MLX5E_RQ_STATE_UMR_WQE_IN_PROGRESS,
+   MLX5E_RQ_STATE_AM,
 };
 
 struct mlx5e_cq {
@@ -220,6 +222,7 @@ struct mlx5e_cq {
struct mlx5_cqwq   wq;
 
/* data path - accessed per napi poll */
+   u16event_ctr;
struct napi_struct*napi;
struct mlx5_core_cqmcq;
struct mlx5e_channel  *channel;
@@ -247,6 +250,30 @@ struct mlx5e_dma_info {
dma_addr_t  addr;
 };
 
+struct mlx5e_rx_am_stats {
+   int ppms; /* packets per msec */
+   int epms; /* events per msec */
+};
+
+struct mlx5e_rx_am_sample {
+   ktime_t time;
+   unsigned intpkt_ctr;
+   u16 event_ctr;
+};
+
+struct mlx5e_rx_am { /* Adaptive Moderation */
+   u8  state;
+   struct mlx5e_rx_am_statsprev_stats;
+   struct mlx5e_rx_am_sample   start_sample;
+   struct work_struct  work;
+   u8  profile_ix;
+   u8  mode;
+   u8  tune_state;
+   u8  steps_right;
+   u8  steps_left;
+   u8  tired;
+};
+
 struct mlx5e_rq {
/* data path */
struct mlx5_wq_ll  wq;
@@ -267,6 +294,8 @@ struct mlx5e_rq {
unsigned long  state;
intix;
 
+   struct mlx5e_rx_am am; /* Adaptive Moderation */
+
/* control */
struct mlx5_wq_ctrlwq_ctrl;
u8 wq_type;
@@ -637,6 +666,10 @@ void mlx5e_free_rx_fragmented_mpwqe(struct mlx5e_rq *rq,
struct mlx5e_mpw_info *wi);
 struct mlx5_cqe64 *mlx5e_get_cqe(struct mlx5e_cq *cq);
 
+void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am_work(struct work_struct *work);
+struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
+
 void mlx5e_update_stats(struct mlx5e_priv *priv);
 
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 4f433d3..c4be394 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -528,6 +528,7 @@ static int mlx5e_get_coalesce(struct net_device *netdev,
coal->rx_max_coalesced_frames = priv->params.rx_cq_moderation.pkts;
coal->tx_coalesce_usecs   = priv->params.tx_cq_moderation.usec;
coal->tx_max_coalesced_frames = priv->params.tx_cq_moderation.pkts;
+   coal->use_adaptive_rx_coalesce = priv->params.rx_am_enabled;
 
return 0;
 }
@@ -538,6 +539,10 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
struct mlx5e_priv *priv= netdev_priv(netdev);
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5e_channel *c;
+   bool re

Re: [PATCH] mpls: Add missing RCU-bh read side critical section locking in output path

2016-06-23 Thread David Miller
From: Lennert Buytenhek 
Date: Mon, 20 Jun 2016 21:05:27 +0300

> From: David Barroso 
> 
> When locally originated IP traffic hits a route that says to push
> MPLS labels, we'll get a call chain dst_output() -> lwtunnel_output()
> -> mpls_output() -> neigh_xmit() -> ___neigh_lookup_noref() where the
> last function in this chain accesses a RCU-bh protected struct
> neigh_table pointer without us ever having declared an RCU-bh read
> side critical section.
> 
> As in case of locally originated IP traffic we'll be running in process
> context, with softirqs enabled, we can be preempted by a softirq at any
> time, and RCU-bh considers the completion of a softirq as signaling
> the end of any pending read-side critical sections, so if we do get a
> softirq here, we can end up with an unexpected RCU grace period and
> all the nastiness that that comes with.
> 
> This patch makes neigh_xmit() take rcu_read_{,un}lock_bh() around the
> code that expects to be treated as an RCU-bh read side critical section.
> 
> Signed-off-by: David Barroso 
> Signed-off-by: Lennert Buytenhek 

Whilst the case that was used to discover this problem was MPLS, that
is not the subsystem where the bug exists and is being fixed.

Therefore please fix your Subject line.

Thanks.


Re: [Patch net 0/2] net_sched: bug fixes for ife action

2016-06-23 Thread David Miller
From: Cong Wang 
Date: Mon, 20 Jun 2016 13:37:17 -0700

> Cong Wang (2):
>   act_ife: only acquire tcf_lock for existing actions
>   act_ife: acquire ife_mod_lock before reading ifeoplist

Series applied, thanks.


Re: [PATCH net-next 0/4] net_sched: bulk dequeue and deferred drops

2016-06-23 Thread Luigi Rizzo
On Wed, Jun 22, 2016 at 6:49 PM, Eric Dumazet  wrote:
> On Wed, 2016-06-22 at 17:44 +0200, Jesper Dangaard Brouer wrote:
>> On Wed, 22 Jun 2016 07:55:43 -0700
>> Eric Dumazet  wrote:
>>
>> > On Wed, 2016-06-22 at 16:47 +0200, Jesper Dangaard Brouer wrote:
>> > > On Tue, 21 Jun 2016 23:16:48 -0700
>> > > Eric Dumazet  wrote:
>> > >
>> > > > First patch adds an additional parameter to ->enqueue() qdisc method
>> > > > so that drops can be done outside of critical section
>> > > > (after locks are released).
>> > > >
>> > > > Then fq_codel can have a small optimization to reduce number of cache
>> > > > lines misses during a drop event
>> > > > (possibly accumulating hundreds of packets to be freed).
>> > > >
>> > > > A small htb change exports the backlog in class dumps.
>> > > >
>> > > > Final patch adds bulk dequeue to qdiscs that were lacking this feature.
>> > > >
>> > > > This series brings a nice qdisc performance increase (more than 80 %
>> > > > in some cases).
>> > >
>> > > Thanks for working on this Eric! this is great work! :-)
>> >
>> > Thanks Jesper
>> >
>> > I worked yesterday on bulk enqueues, but initial results are not that
>> > great.
>>
>> Hi Eric,
>>
>> This is interesting work! But I think you should read Luigi Rizzo's
>> (Cc'ed) paper on title "A Fast and Practical Software Packet Scheduling
>> Architecture"[1]
>>
>> [1] http://info.iet.unipi.it/~luigi/papers/20160511-mysched-preprint.pdf
>>
>> Luigi will be at Netfilter Workshop next week, and will actually
>> present on topic/paper you two should talk ;-)
>>
>> The article is not a 100% match for what we need, but there is some
>> good ideas.  The article also have a sort of "prequeue" that
>> enqueue'ing CPUs will place packets into.
>>
>> My understanding of the article:
>>
>> 1. transmitters submit packets to an intermediate queue
>>(replace q->enqueue call) lockless submit as queue per CPU
>>(runs in parallel)
>>
>> 2. like we only have _one_ qdisc dequeue process, this process (called
>>arbiter) empty the intermediate queues, and then invoke q->enqueue()
>>and q->dequeue(). (in a locked session/region)
>>
>> 3. Packets returned from q->dequeue() is placed on an outgoing
>>intermediate queue.
>>
>> 4. the transmitter then looks to see there are any packets to drain()
>>from the outgoing queue.  This can run in parallel.
>>
>> If the transmitter submitting a packet, detect no arbiter is running,
>> it can become the arbiter itself.  Like we do with qdisc_run_begin()
>> setting state __QDISC___STATE_RUNNING.
>>
>> The problem with this scheme is push-back from qdisc->enqueue
>> (NET_XMIT_CN) does not "reach" us.  And push-back in-form of processes
>> blocking on qdisc root lock, but that could be handled by either
>> blocking in article's submit() or returning some congestion return code
>> from submit().
>
> Okay, I see that you prepare upcoming conference in Amsterdam,
> but please keep this thread about existing kernel code, not the one that
> eventually reach a new operating system in 5 years ;)
>
> 1) We _want_ the result of the sends, obviously.
>
> 2) We also want back pressure, without adding complex callbacks and
> ref-counting.
>
> 3) We do not want to burn a cpu per TX queue (at least one per NUMA
> node ???) only to send few packets per second,
> Our model is still interrupt based, plus NAPI for interrupt mitigation.
>
> 4) I do not want to lock an innocent cpu to send packets from other
> threads/cpu without a tight control.
>
> In the patch I sent, I basically replaced a locked operation
> (spin_lock(&q->busylock)) with another one (xchg()) , but I did not add
> yet another queue before the qdisc ones, bufferbloat forbids.
>
> The virtual queue here is one packet per cpu, which basically is the
> same than before this patch, since each cpu spinning on busylock has one
> skb to send anyway.
>
> This is basically a simple extension of MCS locks, where the cpu at the
> head of the queue can queue up to 16 packets, instead of queueing its
> own packet only and give queue owner ship to the following cpu.

Hi Eric (and others),

don't worry, my proposal (PSPAT) is not specifically addressing/targeting
the linux qdisc now, but at the same time it does have any of the
faults you are worried about.

My target, at a high level, is a VM hosting node where the guest VMs
may create large amounts of traffic, maybe most of it doomed to be dropped,
but still consuming theirs and system's resources by creating the
packets and pounding on the xmit calls.

The goal of PSPAT is to let those clients know very early (possibly even
before doing lookups or encapsulation) when the underlying path
to the NIC will be essentially free for transmission, at which
point the sender can complete building the packet and push it out.


To comment on your observations, PSPAT has the following features:

1) it does return the result of the send, which is run by the individual
   thread who submitted the packet when it gets gran

Re: [PATCH 2/3] can: fix oops caused by wrong rtnl dellink usage

2016-06-23 Thread Oliver Hartkopp

On 06/23/2016 03:09 PM, Sergei Shtylyov wrote:


+static void can_dellink(struct net_device *dev, struct list_head
*head)
+{
+return;


   Why?



http://marc.info/?l=linux-can&m=146651600421205&w=2

The same reason as for commit 993e6f2fd.


   I was asking just about the useless *return* statement...



Ah!

I did some investigation before whether using 'return' in empty void 
functions or not.


static void can_dellink(struct net_device *dev, struct list_head *head);

and

static void can_dellink(struct net_device *dev, struct list_head *head)
{
return;
}

do the same job, right?

But the first one looks like a forward declaration and you would try to 
find the 'implementing' function then.


Of course you can write less code and both implementations are correct - 
but this representation makes it pretty clear that here's nothing to do :-)


Regards,
Oliver




Re: [PATCH net-next V2 07/10] ethtool: Add 50G baseSR2 link mode

2016-06-23 Thread David Decotigny
On Thu, Jun 23, 2016 at 7:02 AM, Saeed Mahameed  wrote:
> From: Gal Pressman 
>
> Add ETHTOOL_LINK_MODE_5baseSR2_Full_BIT bit.
>
> Signed-off-by: Gal Pressman 
> Signed-off-by: Saeed Mahameed 
> Cc: Ben Hutchings 
> Cc: David Decotigny 
> ---
>  include/uapi/linux/ethtool.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index 5f030b4..b8f38e8 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -1362,6 +1362,7 @@ enum ethtool_link_mode_bit_indices {
> ETHTOOL_LINK_MODE_10baseSR4_Full_BIT= 37,
> ETHTOOL_LINK_MODE_10baseCR4_Full_BIT= 38,
> ETHTOOL_LINK_MODE_10baseLR4_ER4_Full_BIT= 39,
> +   ETHTOOL_LINK_MODE_5baseSR2_Full_BIT = 40,
>
> /* Last allowed bit for __ETHTOOL_LINK_MODE_LEGACY_MASK is bit
>  * 31. Please do NOT define any SUPPORTED_* or ADVERTISED_*
> @@ -1370,7 +1371,7 @@ enum ethtool_link_mode_bit_indices {
>  */
>
> __ETHTOOL_LINK_MODE_LAST
> - = ETHTOOL_LINK_MODE_10baseLR4_ER4_Full_BIT,
> + = ETHTOOL_LINK_MODE_5baseSR2_Full_BIT,
>  };
>
>  #define __ETHTOOL_LINK_MODE_LEGACY_MASK(base_name) \
> --
> 2.8.0
>

Acked-By: David Decotigny 


Re: [PATCH net-next v2 3/4] cgroup: bpf: Add bpf_skb_in_cgroup_proto

2016-06-23 Thread Martin KaFai Lau
On Thu, Jun 23, 2016 at 11:53:50AM +0200, Daniel Borkmann wrote:
> >diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >index 668e079..68753e0 100644
> >--- a/kernel/bpf/verifier.c
> >+++ b/kernel/bpf/verifier.c
> >@@ -1062,6 +1062,10 @@ static int check_map_func_compatibility(struct 
> >bpf_map *map, int func_id)
> > if (func_id != BPF_FUNC_get_stackid)
> > goto error;
> > break;
> >+case BPF_MAP_TYPE_CGROUP_ARRAY:
> >+if (func_id != BPF_FUNC_skb_in_cgroup)
> >+goto error;
> >+break;
>
> I think the BPF_MAP_TYPE_CGROUP_ARRAY case should have been fist here in
> patch 2/4, but with unconditional goto error. And this one only adds the
> 'func_id != BPF_FUNC_skb_in_cgroup' test.
I am not sure I understand.  Can you elaborate? I am probably missing
something here.

>
> > default:
> > break;
> > }
> >@@ -1081,6 +1085,10 @@ static int check_map_func_compatibility(struct 
> >bpf_map *map, int func_id)
> > if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
> > goto error;
> > break;
> >+case BPF_FUNC_skb_in_cgroup:
> >+if (map->map_type != BPF_MAP_TYPE_CGROUP_ARRAY)
> >+goto error;
> >+break;
> > default:
> > break;
> > }


Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support

2016-06-23 Thread Alexander Duyck
On Wed, Jun 22, 2016 at 9:17 PM, Yuval Mintz  wrote:
>> Then again, if you're basically saying every HW-assisted offload on
>> receive should be done under LRO flag, what would be the use case
>> where a GRO-assisted offload would help?
>
>> I.e., afaik LRO is superior to GRO in `brute force' -
>> it creates better packed packets and utilizes memory better
>> [with all the obvious cons such as inability for defragmentation].
>> So if you'd have the choice of having an adpater perform 'classic'
>> LRO aggregation or something that resembles a GRO packet,
>> what would be the gain from doing the latter?

LRO and GRO shouldn't really differ in packing or anything like that.
The big difference between the two is that LRO is destructive while
GRO is not.  Specifically in the case of GRO you should be able to
take the resultant frame, feed it through GSO, and get the original
stream of frames back out.  So you can pack the frames however you
want the only key is that you must capture all the correct offsets and
set the gso_size correct for the flow.

> Just to relate to bnx2x/qede differences in current implementation -
> when this GRO hw-offload was added to bnx2x, it has already
> supported classical LRO, and due to above statement whenever LRO
> was set driver aggregated incoming traffic as classic LRO.
> I agree that in hindsight the lack of distinction between sw/hw GRO
> was hurting us.

In the case of bnx2x it sounds like you have issues that are
significantly hurting the performance versus classic software GRO.  If
that is the case you might want to simply flip the logic for the
module parameter that Rick mentioned and just disable the hardware
assisted GRO unless it is specifically requested.

> qede isn't implementing LRO, so we could easily mark this feature
> under LRO there - but question is, given that the adapter can support
> LRO, if we're going to suffer from all the shotrages that arise from
> putting this feature under LRO, why should we bother?

The idea is to address feature isolation.  The fact is the hardware
exists outside of kernel control.  If you end up linking an internal
kernel feature to your device like this you are essentially stripping
the option of using the kernel feature.

I would prefer to see us extend LRO to support "close enough GRO"
instead of have us extend GRO to also include LRO.  That way when we
encounter issues like the FW limitation that Rick encountered he can
just go in and disable the LRO and have true GRO kick in which would
be significantly better than having to poke around through
documentation to find a module parameter that can force the feature
off.  Really the fact that you have to use a module parameter is
frowned upon as well as most drivers aren't supposed to be using those
in the netdev tree.

> You can argue that we might need a new feature bit for control
> over such a feature; If we don't do that, is there any gain in all of this?

I would argue that yes there are many cases where we will be able to
show gain.  The fact is there is a strong likelihood of the GRO on
your parts having some differences either now, or at some point in the
future as the code evolves.  As I mentioned there was already some
talk about possibly needing to push the UDP tunnel aggregation out of
GRO and perhaps handling it sometime after IP look up had verified
that the destination was in fact a local address in the namespace.  In
addition it makes the changes to include the tunnel encapsulation much
more acceptable as LRO is already naturally dropped in the routing and
bridging cases if I recall correctly.

- Alex


[4.6] kernel BUG at net/ipv6/raw.c:592

2016-06-23 Thread Dave Jones

Found this logs after a Trinity run.

kernel BUG at net/ipv6/raw.c:592!
[ cut here ]
invalid opcode:  [#1] SMP 

Modules linked in: udp_diag dccp_ipv6 dccp_ipv4 dccp sctp af_key tcp_diag 
inet_diag ip6table_filter xt_NFLOG nfnetlink_log xt_comment xt_statistic 
iptable_filter nfsv3 nfs_acl nfs fscache lockd grace autofs4 i2c_piix4 
rpcsec_gss_krb5 auth_rpcgss oid_registry sunrpc loop dummy ipmi_devintf 
iTCO_wdt iTCO_vendor_support acpi_cpufreq efivars ipmi_si ipmi_msghandler 
i2c_i801 i2c_core sg lpc_ich mfd_core button

CPU: 2 PID: 28854 Comm: trinity-c23 Not tainted 4.6.0 #1
Hardware name: Quanta Leopard-DDR3/Leopard-DDR3, BIOS F06_3A14.DDR3 05/13/2015
task: 880459cab600 ti: 880747bc4000 task.ti: 880747bc4000
RIP: 0010:[] [] rawv6_sendmsg+0xc30/0xc40
RSP: 0018:880747bc7bf8  EFLAGS: 00010282
RAX: fff2 RBX: 88080c6f2d00 RCX: 0002
RDX: 880747bc7cd8 RSI: 0030 RDI: 8803de801500
RBP: 880747bc7d90 R08: 002d R09: 0009
R10: 8803de801500 R11: 0009 R12: 0030
R13: 8803de801500 R14: 88086d67e000 R15: 88046bdac480
FS:  7fe29c566700() GS:88046fa4() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 01f0f2c0 CR3: 00080b99d000 CR4: 001406e0
Stack:
  88086d67e000 880747bc7d18 88046bdac480
 8804  880747bc7c68 88086d67e000
 8808002d 88080009  0001
 
Call Trace:
 [] ? page_fault+0x22/0x30
 [] ? bad_to_user+0x6a/0x6fa
 [] inet_sendmsg+0x67/0xa0
 [] sock_sendmsg+0x38/0x50
 [] sock_write_iter+0x78/0xd0
 [] __vfs_write+0xaa/0xe0
 [] vfs_write+0xa2/0x1a0
 [] SyS_write+0x46/0xa0 
 [] entry_SYSCALL_64_fastpath+0x13/0x8f
Code: 23 f7 ff ff f7 d0 41 01 c0 41 83 d0 00 e9 ac fd ff ff 48 8b 44 24 48 48 
8b 80 c0 01 00 00 65 48 ff 40 28 8b 51 78 d0 41 01 c0 41 83 d0 00 e9 ac fd ff 
ff 48 8b 44 24 48 48 8b 80 c0 01 00 00 65 48 ff 40 28 8b 51 78 e9 64 fe ff ff 
<0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 

RIP [] rawv6_sendmsg+0xc30/0xc40
 RSP 

 590 
 591 offset += skb_transport_offset(skb);
 592 BUG_ON(skb_copy_bits(skb, offset, &csum, 2));
 593 



Re: [PATCH v2 1/2] netfilter/nflog: nflog-range does not truncate packets

2016-06-23 Thread Pablo Neira Ayuso
On Tue, Jun 21, 2016 at 02:58:46PM -0400, Vishwanath Pai wrote:
> netfilter/nflog: nflog-range does not truncate packets
> 
> li->u.ulog.copy_len is currently ignored by the kernel, we should truncate
> the packet to either li->u.ulog.copy_len (if set) or copy_range before
> sending it to userspace. 0 is a valid input for copy_len, so add a new
> flag to indicate whether this was option was specified by the user or not.
> 
> Add two flags to indicate whether nflog-size/copy_len was set or not.
> XT_NFLOG_F_COPY_LEN is for XT_NFLOG and NFLOG_F_COPY_LEN for nfnetlink_log
> 
> On the userspace side, this was initially represented by the option
> nflog-range, this will be replaced by --nflog-size now. --nflog-range would
> still exist but does not do anything.

Applied, thanks!


[iproute PATCH v3 4/6] No need to initialize rtattr fields before parsing

2016-06-23 Thread Phil Sutter
Since parse_rtattr_flags() calls memset already, there is no need for
callers to do so themselves.

Signed-off-by: Phil Sutter 
---
 ip/ipaddress.c | 2 +-
 tc/tc_class.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 62856f2c26eba..703a56b88d257 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -439,7 +439,7 @@ static void print_num(FILE *fp, unsigned int width, 
uint64_t count)
 
 static void print_vf_stats64(FILE *fp, struct rtattr *vfstats)
 {
-   struct rtattr *vf[IFLA_VF_STATS_MAX + 1] = {};
+   struct rtattr *vf[IFLA_VF_STATS_MAX + 1];
 
if (vfstats->rta_type != IFLA_VF_STATS) {
fprintf(stderr, "BUG: rta type is %d\n", vfstats->rta_type);
diff --git a/tc/tc_class.c b/tc/tc_class.c
index 158b4b18506eb..0d6000b91f539 100644
--- a/tc/tc_class.c
+++ b/tc/tc_class.c
@@ -219,7 +219,7 @@ static void graph_cls_show(FILE *fp, char *buf, struct 
hlist_head *root_list,
 {
struct hlist_node *n, *tmp_cls;
char cls_id_str[256] = {};
-   struct rtattr *tb[TCA_MAX + 1] = {};
+   struct rtattr *tb[TCA_MAX + 1];
struct qdisc_util *q;
char str[100] = {};
 
@@ -304,7 +304,7 @@ int print_class(const struct sockaddr_nl *who,
FILE *fp = (FILE *)arg;
struct tcmsg *t = NLMSG_DATA(n);
int len = n->nlmsg_len;
-   struct rtattr *tb[TCA_MAX + 1] = {};
+   struct rtattr *tb[TCA_MAX + 1];
struct qdisc_util *q;
char abuf[256];
 
-- 
2.8.2



[iproute PATCH v3 0/6] Big C99 style initializer rework

2016-06-23 Thread Phil Sutter
This is v3 of my C99-style initializer related patch series. The changes
since v2 are:

- Flattened embedded struct's initializers:
  Since the field names are very short, I figured it makes more sense to
  keep indenting low. Also, the same style is already used in
  ip/xfrm_policy.c so take that as an example.

- Moved leftover nlmsg_seq initializing into the common place as well:
  I was unsure whether this is a good idea at first (due to the
  increment), but again it's done in ip/xfrm_policy.c as well so should
  be fine.

- Added a comma after the last field initializer as suggested by Jakub.

- Dropped patch 7 since it was NACKed.

- Eliminated checkpatch non-compliance.

- Second go at union bpf_attr in tc/tc_bpf.c:
  I figured that while it is not possible to initialize fields, gcc-3.4.6
  does not complain when setting the whole union to zero using '= {0}'.
  So I did this and thereby at least got rid of the memset calls.

For reference, here's the v2 changelog:

- Rebased onto current upstream master:
  My own commit a0a73b298a579 ("tc: m_action: Use C99 style initializers
  for struct req") contains most of the changes to tc/m_action.c already,
  so I put the remaining ones into a dedicated patch (the first one here)
  with a better description.

- Tested against gcc-3.4.6:
  This is the oldest gcc version I was able to install locally. It indeed
  does not like the former changes in tc/tc_bpf.c, so I reverted them.
  Apart from emitting many warnings, it successfully compiles the
  sources.

In the process of compatibility testing, I made a few more changes which
make sense to have:

- New patch 5 allows to conveniently override the compiler via command
  line.

- New patch 6 eliminates a warning with old gcc but looks valid in
  general.

- A warning made me look at ip/tcp_metrics.c and I found a minor code
  simplification (patch 7).

Phil Sutter (6):
  tc: m_action: Improve conversion to C99 style initializers
  Use C99 style initializers everywhere
  Replace malloc && memset by calloc
  No need to initialize rtattr fields before parsing
  Makefile: Allow to override CC
  misc/ifstat: simplify unsigned value comparison

 Makefile   |   4 +-
 bridge/fdb.c   |  25 ++--
 bridge/link.c  |  14 +++
 bridge/mdb.c   |  17 -
 bridge/vlan.c  |  17 -
 genl/ctrl.c|  44 +
 genl/genl.c|   3 +-
 ip/ip6tunnel.c |  10 ++---
 ip/ipaddress.c |  33 +++-
 ip/ipaddrlabel.c   |  21 --
 ip/iplink.c|  61 -
 ip/iplink_can.c|   4 +-
 ip/ipmaddr.c   |  25 
 ip/ipmroute.c  |   8 +---
 ip/ipneigh.c   |  30 ++-
 ip/ipnetconf.c |  10 ++---
 ip/ipnetns.c   |  39 +--
 ip/ipntable.c  |  25 
 ip/iproute.c   |  78 +
 ip/iprule.c|  22 +--
 ip/iptoken.c   |  19 -
 ip/iptunnel.c  |  31 +--
 ip/ipxfrm.c|  26 -
 ip/link_gre.c  |  18 -
 ip/link_gre6.c |  18 -
 ip/link_ip6tnl.c   |  25 +---
 ip/link_iptnl.c|  22 +--
 ip/link_vti.c  |  18 -
 ip/link_vti6.c |  18 -
 ip/xfrm_policy.c   |  99 +++
 ip/xfrm_state.c| 110 ++---
 lib/libnetlink.c   |  77 ++---
 lib/ll_map.c   |   1 -
 lib/names.c|   7 +---
 misc/arpd.c|  64 ++-
 misc/ifstat.c  |   2 +-
 misc/lnstat.c  |   6 +--
 misc/lnstat_util.c |   4 +-
 misc/ss.c  |  37 +++---
 tc/e_bpf.c |   7 +---
 tc/em_canid.c  |   4 +-
 tc/em_cmp.c|   4 +-
 tc/em_ipset.c  |   4 +-
 tc/em_meta.c   |   4 +-
 tc/em_nbyte.c  |   4 +-
 tc/em_u32.c|   4 +-
 tc/f_flow.c|   3 --
 tc/f_flower.c  |   3 +-
 tc/f_fw.c  |   6 +--
 tc/f_route.c   |   3 --
 tc/f_rsvp.c|   6 +--
 tc/f_u32.c |  12 ++
 tc/m_action.c  |  26 -
 tc/m_bpf.c |   5 +--
 tc/m_csum.c|   4 +-
 tc/m_ematch.c  |   4 +-
 tc/m_gact.c|   5 +--
 tc/m_ife.c |   5 +--
 tc/m_ipt.c |  13 ++-
 tc/m_mirred.c  |   7 +---
 tc/m_nat.c |   4 +-
 tc/m_pedit.c   |  11 ++
 tc/m_police.c  |   5 +--
 tc/q_atm.c |   3 +-
 tc/q_cbq.c |  22 +++
 tc/q_choke.c   |   4 +-
 tc/q_codel.c   |   3 +-
 tc/q_dsmark.c  |   1 -
 tc/q_fifo.c|   4 +-
 tc/q_fq_codel.c|   3 +-
 tc/q_hfsc.c|  13 ++-
 tc/q_htb.c |  15 +++-
 tc/q_netem.c   |  16 +++-
 tc/q_red.c |   4 +-
 tc/q_sfb.c |  17 -
 tc/q_sfq.c |   4 +-
 tc/q_tbf.c |   4 +-
 tc/tc.c|   9 ++---
 tc/tc_bpf.c|  58 ++---

[iproute PATCH v3 3/6] Replace malloc && memset by calloc

2016-06-23 Thread Phil Sutter
This only replaces occurrences where the newly allocated memory is
cleared completely afterwards, as in other cases it is a theoretical
performance hit although code would be cleaner this way.

Signed-off-by: Phil Sutter 
---
Changes since v2:
- Fix checkpatch errors.
---
 genl/genl.c|  3 +--
 lib/names.c|  7 ++-
 misc/lnstat.c  |  6 ++
 misc/lnstat_util.c |  4 +---
 tc/em_canid.c  |  4 ++--
 tc/m_action.c  |  3 +--
 tc/m_ipt.c | 13 -
 tc/m_pedit.c   |  3 +--
 tc/tc.c|  9 +++--
 tc/tc_bpf.c|  4 +---
 tc/tc_class.c  |  3 +--
 tc/tc_exec.c   |  3 +--
 12 files changed, 20 insertions(+), 42 deletions(-)

diff --git a/genl/genl.c b/genl/genl.c
index e33fafdf2f524..747074b029a7b 100644
--- a/genl/genl.c
+++ b/genl/genl.c
@@ -86,9 +86,8 @@ reg:
return f;
 
 noexist:
-   f = malloc(sizeof(*f));
+   f = calloc(1, sizeof(*f));
if (f) {
-   memset(f, 0, sizeof(*f));
strncpy(f->name, str, 15);
f->parse_genlopt = parse_nofopt;
f->print_genlopt = print_nofopt;
diff --git a/lib/names.c b/lib/names.c
index 3b5b0b1e1201a..fbd6503f22d42 100644
--- a/lib/names.c
+++ b/lib/names.c
@@ -54,15 +54,12 @@ struct db_names *db_names_alloc(void)
 {
struct db_names *db;
 
-   db = malloc(sizeof(*db));
+   db = calloc(1, sizeof(*db));
if (!db)
return NULL;
 
-   memset(db, 0, sizeof(*db));
-
db->size = MAX_ENTRIES;
-   db->hash = malloc(sizeof(struct db_entry *) * db->size);
-   memset(db->hash, 0, sizeof(struct db_entry *) * db->size);
+   db->hash = calloc(db->size, sizeof(struct db_entry *));
 
return db;
 }
diff --git a/misc/lnstat.c b/misc/lnstat.c
index 659a01bd69931..863fd4d9f03f2 100644
--- a/misc/lnstat.c
+++ b/misc/lnstat.c
@@ -182,10 +182,8 @@ static struct table_hdr *build_hdr_string(struct 
lnstat_file *lnstat_files,
static struct table_hdr th;
int ofs = 0;
 
-   for (i = 0; i < HDR_LINES; i++) {
-   th.hdr[i] = malloc(HDR_LINE_LENGTH);
-   memset(th.hdr[i], 0, HDR_LINE_LENGTH);
-   }
+   for (i = 0; i < HDR_LINES; i++)
+   th.hdr[i] = calloc(1, HDR_LINE_LENGTH);
 
for (i = 0; i < fps->num; i++) {
char *cname, *fname = fps->params[i].lf->name;
diff --git a/misc/lnstat_util.c b/misc/lnstat_util.c
index d918151282f55..cc54598fe1bef 100644
--- a/misc/lnstat_util.c
+++ b/misc/lnstat_util.c
@@ -173,15 +173,13 @@ static struct lnstat_file *alloc_and_open(const char 
*path, const char *file)
struct lnstat_file *lf;
 
/* allocate */
-   lf = malloc(sizeof(*lf));
+   lf = calloc(1, sizeof(*lf));
if (!lf) {
fprintf(stderr, "out of memory\n");
return NULL;
}
 
/* initialize */
-   memset(lf, 0, sizeof(*lf));
-
/* de->d_name is guaranteed to be <= NAME_MAX */
strcpy(lf->basename, file);
strcpy(lf->path, path);
diff --git a/tc/em_canid.c b/tc/em_canid.c
index 16f6ed5c0b7a4..ceb64cb933f51 100644
--- a/tc/em_canid.c
+++ b/tc/em_canid.c
@@ -106,8 +106,8 @@ static int canid_parse_eopt(struct nlmsghdr *n, struct 
tcf_ematch_hdr *hdr,
if (args == NULL)
return PARSE_ERR(args, "canid: missing arguments");
 
-   rules.rules_raw = malloc(sizeof(struct can_filter) * 
rules.rules_capacity);
-   memset(rules.rules_raw, 0, sizeof(struct can_filter) * 
rules.rules_capacity);
+   rules.rules_raw = calloc(rules.rules_capacity,
+sizeof(struct can_filter));
 
do {
if (!bstrcmp(args, "sff")) {
diff --git a/tc/m_action.c b/tc/m_action.c
index 806fdd197965d..24f8b5d855211 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -126,9 +126,8 @@ noexist:
goto restart_s;
}
 #endif
-   a = malloc(sizeof(*a));
+   a = calloc(1, sizeof(*a));
if (a) {
-   memset(a, 0, sizeof(*a));
strncpy(a->id, "noact", 15);
a->parse_aopt = parse_noaopt;
a->print_aopt = print_noaopt;
diff --git a/tc/m_ipt.c b/tc/m_ipt.c
index 098f610f9439a..d6f62bd6b32c9 100644
--- a/tc/m_ipt.c
+++ b/tc/m_ipt.c
@@ -164,16 +164,11 @@ get_target_name(const char *name)
return NULL;
 #endif
 
-   new_name = malloc(strlen(name) + 1);
-   lname = malloc(strlen(name) + 1);
-   if (new_name)
-   memset(new_name, '\0', strlen(name) + 1);
-   else
+   new_name = calloc(1, strlen(name) + 1);
+   lname = calloc(1, strlen(name) + 1);
+   if (!new_name)
exit_error(PARAMETER_PROBLEM, "get_target_name");
-
-   if (lname)
-   memset(lname, '\0', strlen(name) + 1);
-   else
+   if (!lname)
exit_error(PARAMETER_PROBLEM, "get_target_name");
 
strcpy(new_name, name);
diff --git a/tc/m_ped

[iproute PATCH v3 2/6] Use C99 style initializers everywhere

2016-06-23 Thread Phil Sutter
This big patch was compiled by vimgrepping for memset calls and changing
to C99 initializer if applicable. One notable exception is the
initialization of union bpf_attr in tc/tc_bpf.c: changing it would break
for older gcc versions (at least <=3.4.6).

Calls to memset for struct rtattr pointer fields for parse_rtattr*()
were just dropped since they are not needed.

The changes here allowed the compiler to discover some unused variables,
so get rid of them, too.

Signed-off-by: Phil Sutter 
---
Changes since v2:
- Flatten initializers.
- Leave a final comma in place.
- Fix checkpatch warnings.
- Initialize nlmsg_seq in the declaration, too.
- Use C99-style init in tc_bpf.c to get rid of the memset().
Changes since v1:
- Dropped former changes to tc/tc_bpf.c as they are incompatible to older
  gcc versions (at least <=3.4.6).
---
 bridge/fdb.c |  25 ++---
 bridge/link.c|  14 +++
 bridge/mdb.c |  17 -
 bridge/vlan.c|  17 -
 genl/ctrl.c  |  44 +-
 ip/ip6tunnel.c   |  10 ++---
 ip/ipaddress.c   |  31 +++-
 ip/ipaddrlabel.c |  21 ---
 ip/iplink.c  |  61 +-
 ip/iplink_can.c  |   4 +-
 ip/ipmaddr.c |  25 -
 ip/ipmroute.c|   8 +---
 ip/ipneigh.c |  30 ++-
 ip/ipnetconf.c   |  10 ++---
 ip/ipnetns.c |  39 +---
 ip/ipntable.c|  25 -
 ip/iproute.c |  78 ++-
 ip/iprule.c  |  22 +--
 ip/iptoken.c |  19 --
 ip/iptunnel.c|  31 +---
 ip/ipxfrm.c  |  26 -
 ip/link_gre.c|  18 -
 ip/link_gre6.c   |  18 -
 ip/link_ip6tnl.c |  25 +
 ip/link_iptnl.c  |  22 +--
 ip/link_vti.c|  18 -
 ip/link_vti6.c   |  18 -
 ip/xfrm_policy.c |  99 -
 ip/xfrm_state.c  | 110 +++
 lib/libnetlink.c |  77 ++
 lib/ll_map.c |   1 -
 misc/arpd.c  |  64 ++--
 misc/ss.c|  37 +++
 tc/e_bpf.c   |   7 +---
 tc/em_cmp.c  |   4 +-
 tc/em_ipset.c|   4 +-
 tc/em_meta.c |   4 +-
 tc/em_nbyte.c|   4 +-
 tc/em_u32.c  |   4 +-
 tc/f_flow.c  |   3 --
 tc/f_flower.c|   3 +-
 tc/f_fw.c|   6 +--
 tc/f_route.c |   3 --
 tc/f_rsvp.c  |   6 +--
 tc/f_u32.c   |  12 ++
 tc/m_bpf.c   |   5 +--
 tc/m_csum.c  |   4 +-
 tc/m_ematch.c|   4 +-
 tc/m_gact.c  |   5 +--
 tc/m_ife.c   |   5 +--
 tc/m_mirred.c|   7 +---
 tc/m_nat.c   |   4 +-
 tc/m_pedit.c |   8 +---
 tc/m_police.c|   5 +--
 tc/q_atm.c   |   3 +-
 tc/q_cbq.c   |  22 +++
 tc/q_choke.c |   4 +-
 tc/q_codel.c |   3 +-
 tc/q_dsmark.c|   1 -
 tc/q_fifo.c  |   4 +-
 tc/q_fq_codel.c  |   3 +-
 tc/q_hfsc.c  |  13 ++-
 tc/q_htb.c   |  15 +++-
 tc/q_netem.c |  16 +++-
 tc/q_red.c   |   4 +-
 tc/q_sfb.c   |  17 -
 tc/q_sfq.c   |   4 +-
 tc/q_tbf.c   |   4 +-
 tc/tc_bpf.c  |  54 ++-
 tc/tc_class.c|  31 ++--
 tc/tc_exec.c |   3 +-
 tc/tc_filter.c   |  33 ++---
 tc/tc_qdisc.c|  33 ++---
 tc/tc_stab.c |   4 +-
 tc/tc_util.c |   3 +-
 75 files changed, 532 insertions(+), 913 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index be849f980a802..59538b1e16506 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -177,16 +177,15 @@ static int fdb_show(int argc, char **argv)
struct nlmsghdr n;
struct ifinfomsgifm;
charbuf[256];
-   } req;
+   } req = {
+   .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+   .ifm.ifi_family = PF_BRIDGE,
+   };
 
char *filter_dev = NULL;
char *br = NULL;
int msg_size = sizeof(struct ifinfomsg);
 
-   memset(&req, 0, sizeof(req));
-   req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
-   req.ifm.ifi_family = PF_BRIDGE;
-
while (argc > 0) {
if ((strcmp(*argv, "brport") == 0) || strcmp(*argv, "dev") == 
0) {
NEXT_ARG();
@@ -247,7 +246,13 @@ static int fdb_modify(int cmd, int flags, int argc, char 
**argv)
struct nlmsghdr n;
struct ndmsgndm;
charbuf[256];
-   } req;
+   } req = {
+   .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
+   .n.nlmsg_flags = NLM_F_REQUEST | flags,
+   .n.nlmsg_type = cmd,
+   .ndm.ndm_family = PF_BRIDGE,
+   .ndm.ndm_state = NUD_NOARP,
+   };
char *addr = NULL;
char *d = NULL;
char abuf[ETH_ALEN];
@@ -259,14 +264,6 @@

[iproute PATCH v3 1/6] tc: m_action: Improve conversion to C99 style initializers

2016-06-23 Thread Phil Sutter
This improves my initial change in the following points:

- Flatten embedded struct's initializers.
- No need to initialize variables to zero as the key feature of C99
  initializers is to do this implicitly.
- By relocating the declaration of struct rtattr *tail, it can be
  initialized at the same time.

Fixes: a0a73b298a579 ("tc: m_action: Use C99 style initializers for struct req")
Signed-off-by: Phil Sutter 
---
Changes since v2:
- Don't drop the "superfluous" comma.
- Flatten initializers.
Changes since v1:
- Created this patch.
---
 tc/m_action.c | 23 +++
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index ea16817aefd4f..806fdd197965d 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -395,13 +395,10 @@ static int tc_action_gd(int cmd, unsigned int flags, int 
*argc_p, char ***argv_p
struct tcamsg   t;
charbuf[MAX_MSG];
} req = {
-   .n = {
-   .nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)),
-   .nlmsg_flags = NLM_F_REQUEST | flags,
-   .nlmsg_type = cmd,
-   },
+   .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)),
+   .n.nlmsg_flags = NLM_F_REQUEST | flags,
+   .n.nlmsg_type = cmd,
.t.tca_family = AF_UNSPEC,
-   .buf = { 0 }
};
 
argc -= 1;
@@ -491,23 +488,18 @@ static int tc_action_modify(int cmd, unsigned int flags, 
int *argc_p, char ***ar
int argc = *argc_p;
char **argv = *argv_p;
int ret = 0;
-
-   struct rtattr *tail;
struct {
struct nlmsghdr n;
struct tcamsg   t;
charbuf[MAX_MSG];
} req = {
-   .n = {
-   .nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)),
-   .nlmsg_flags = NLM_F_REQUEST | flags,
-   .nlmsg_type = cmd,
-   },
+   .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)),
+   .n.nlmsg_flags = NLM_F_REQUEST | flags,
+   .n.nlmsg_type = cmd,
.t.tca_family = AF_UNSPEC,
-   .buf = { 0 }
};
+   struct rtattr *tail = NLMSG_TAIL(&req.n);
 
-   tail = NLMSG_TAIL(&req.n);
argc -= 1;
argv += 1;
if (parse_action(&argc, &argv, TCA_ACT_TAB, &req.n)) {
@@ -540,7 +532,6 @@ static int tc_act_list_or_flush(int argc, char **argv, int 
event)
} req = {
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)),
.t.tca_family = AF_UNSPEC,
-   .buf = { 0 }
};
 
tail = NLMSG_TAIL(&req.n);
-- 
2.8.2



[iproute PATCH v3 5/6] Makefile: Allow to override CC

2016-06-23 Thread Phil Sutter
This makes it easier to build iproute2 with a custom compiler.

While at it, make HOSTCC default to the value of CC if not explicitly
set elsewhere.

Signed-off-by: Phil Sutter 
---
 Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 15c81ecfdca3a..fa200ddb76679 100644
--- a/Makefile
+++ b/Makefile
@@ -34,8 +34,8 @@ ADDLIB+=ipx_ntop.o ipx_pton.o
 #options for mpls
 ADDLIB+=mpls_ntop.o mpls_pton.o
 
-CC = gcc
-HOSTCC = gcc
+CC := gcc
+HOSTCC ?= $(CC)
 DEFINES += -D_GNU_SOURCE
 # Turn on transparent support for LFS
 DEFINES += -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE
-- 
2.8.2



[iproute PATCH v3 6/6] misc/ifstat: simplify unsigned value comparison

2016-06-23 Thread Phil Sutter
By directly comparing the value of both unsigned variables, casting to
signed becomes unnecessary.

This also fixes for compiling with older versions of gcc (at least
<=3.4.6) which emit the following warning:

| ifstat.c: In function `update_db':
| ifstat.c:542: warning: comparison is always false due to limited range of 
data type

Signed-off-by: Phil Sutter 
---
 misc/ifstat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/ifstat.c b/misc/ifstat.c
index abbb4e732fcef..9a44da487599e 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -539,7 +539,7 @@ static void update_db(int interval)
int i;
 
for (i = 0; i < MAXS; i++) {
-   if ((long)(h1->ival[i] - n->ival[i]) < 
0) {
+   if (h1->ival[i] < n->ival[i]) {
memset(n->ival, 0, 
sizeof(n->ival));
break;
}
-- 
2.8.2



Re: [PATCH] bridge: netfilter: spanning tree: Add masked_ether_addr_equal and neatening

2016-06-23 Thread Pablo Neira Ayuso
On Wed, Jun 15, 2016 at 01:58:45PM -0700, Joe Perches wrote:
> There is code duplication of a masked ethernet address comparison here
> so make it a separate function instead.
> 
> Miscellanea:
> 
> o Neaten alignment of FWINV macro uses to make it clearer for the reader

Applied, thanks.

> Signed-off-by: Joe Perches 
> ---
> 
> This masked_ether_addr_equal function could go into etherdevice.h,
> but I don't see another use like it in kernel code.  Is there one?

This is specific of iptables, not even nftables would use this. So I
would keep this in the iptables tree.


Re: [PATCH v2 2/2] netfilter/nflog: nflog-range does not truncate packets (userspace)

2016-06-23 Thread Pablo Neira Ayuso
On Tue, Jun 21, 2016 at 03:02:16PM -0400, Vishwanath Pai wrote:
> netfilter/nflog: nflog-range does not truncate packets
> 
> The option --nflog-range has never worked, but we cannot just fix this
> because users might be using this feature option and their behavior would
> change. Instead add a new option --nflog-size. This option works the same
> way nflog-range should have, and both of them are mutually exclusive. When
> someone uses --nflog-range we print a warning message informing them that
> this feature has no effect.
> 
> To indicate the kernel that the user has set --nflog-size we have to pass a
> new flag XT_NFLOG_F_COPY_LEN.
> 
> Also updated the man page to reflect this.

Please, send me a v3 including tests, see:

iptables/extensions/libxt_NFLOG.t

Thanks.


  1   2   >