[PATCH 1/1] forcedeth: Remove return from a void function

2017-02-23 Thread Zhu Yanjun
A void function does not need a trailing return statement, so remove it.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/nvidia/forcedeth.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 3913f07..a992595 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -3274,8 +3274,6 @@ static void nv_force_linkspeed(struct net_device *dev, int speed, int duplex)
pci_push(base);
writel(np->linkspeed, base + NvRegLinkSpeed);
pci_push(base);
-
-   return;
 }
 
 /**
-- 
2.7.4
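
For illustration only, a minimal userspace sketch of the point made in the commit message. This is not driver code: `struct fake_regs` and `force_speed()` are made-up names. Control simply falls off the end of a void function, so the trailing `return;` adds nothing.

```c
#include <assert.h>

/* Hypothetical stand-in for a device register block. */
struct fake_regs {
	unsigned int linkspeed;
};

/* Control falls off the end of a void function and returns to the
 * caller, so no trailing "return;" is needed. */
void force_speed(struct fake_regs *regs, unsigned int speed)
{
	regs->linkspeed = speed;
}
```

A caller sees the same behaviour with or without the explicit statement; only an early exit in the middle of a void function needs `return;`.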



Re: VXLAN RCU error

2017-02-23 Thread Jakub Kicinski
On Wed, 22 Feb 2017 20:30:31 -0800, Jakub Kicinski wrote:
> On Wed, 22 Feb 2017 14:27:45 -0800, Jakub Kicinski wrote:
> > [ 1571.067134] ===
> > [ 1571.071842] [ ERR: suspicious RCU usage.  ]
> > [ 1571.076546] 4.10.0-debug-03232-g12d656af4e3d #1 Tainted: GW  O   
> > [ 1571.084166] ---
> > [ 1571.088867] ../drivers/net/vxlan.c:2111 suspicious rcu_dereference_check() usage!
> > [ 1571.097286] 
> > [ 1571.097286] other info that might help us debug this:
> > [ 1571.097286] 
> > [ 1571.106305] 
> > [ 1571.106305] rcu_scheduler_active = 2, debug_locks = 1
> > [ 1571.113654] 3 locks held by ping/13826:
> > [ 1571.117968]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [] raw_sendmsg+0x14e2/0x2e40
> > [ 1571.127758]  #1:  (rcu_read_lock_bh){..}, at: [] ip_finish_output2+0x274/0x1390
> > [ 1571.138135]  #2:  (rcu_read_lock_bh){..}, at: [] __dev_queue_xmit+0x1ec/0x2750
> 
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 4e27c5b09600..8aa3e837cd6c 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2109,7 +2109,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  vxlan->cfg.port_max, true);
>  
> if (dst->sa.sa_family == AF_INET) {
> -   struct vxlan_sock *sock4 = rcu_dereference(vxlan->vn4_sock);
> +   struct vxlan_sock *sock4 = rcu_dereference_bh(vxlan->vn4_sock);
> struct rtable *rt;
> __be16 df = 0;
>  
> @@ -2148,7 +2148,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
> src_port, dst_port, xnet, !udp_sum);
>  #if IS_ENABLED(CONFIG_IPV6)
> } else {
> -   struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock);
> +   struct vxlan_sock *sock6 = rcu_dereference_bh(vxlan->vn6_sock);
>  
> ndst = vxlan6_get_route(vxlan, dev, sock6, skb,
> rdst ? rdst->remote_ifindex : 0, tos,
> 

Ugh.  Looks like this may not work even if it makes the splat go away.
synchronize_net() doesn't seem to wait for the _bh() flavor of RCU, so
we need to add a synchronize_rcu_bh() call before freeing the socket or do
a normal rcu_read_lock()/unlock() on the fast path.  Any RCU experts
want to comment? :)

FWIW geneve will need a similar fix, I presume.


Re: lib: Introduce priority array area manager

2017-02-23 Thread Jiri Pirko
Thu, Feb 23, 2017 at 08:56:26AM CET, ge...@linux-m68k.org wrote:
>Hi Jiri,
>
>On Wed, Feb 22, 2017 at 8:02 PM, Linux Kernel Mailing List
> wrote:
>> Web:
>> https://git.kernel.org/torvalds/c/44091d29f2075972aede47ef17e1e70db3d51190
>> Commit: 44091d29f2075972aede47ef17e1e70db3d51190
>> Parent: b862815c3ee7b49ec20a9ab25da55a5f0bcbb95e
>> Refname:refs/heads/master
>> Author: Jiri Pirko 
>> AuthorDate: Fri Feb 3 10:29:06 2017 +0100
>> Committer:  David S. Miller 
>> CommitDate: Fri Feb 3 16:35:42 2017 -0500
>>
>> lib: Introduce priority array area manager
>>
>> This introduces a infrastructure for management of linear priority
>> areas. Priority order in an array matters, however order of items inside
>> a priority group does not matter.
>>
>> As an initial implementation, L-sort algorithm is used. It is quite
>> trivial. More advanced algorithm called P-sort will be introduced as a
>> follow-up. The infrastructure is prepared for other algos.
>>
>> Alongside this, a testing module is introduced as well.
>>
>> Signed-off-by: Jiri Pirko 
>> Signed-off-by: David S. Miller 
>
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -550,4 +550,7 @@ config STACKDEPOT
>>  config SBITMAP
>> bool
>>
>> +config PARMAN
>> +   tristate "parman"
>
>| parman (PARMAN) [N/m/y] (NEW) ?
>|
>| There is no help available for this option.
>
>Can you please add a description for this option?
>Or drop the "parman" string if this is always selected by its kernel users, and
>never intended to be enabled by the end user.

I did it the same way other similar lib dependencies do. It does
not make sense to have a separate description for this, because it is
always only a dependency of a kernel user.

You suggest to 'drop the "parman" string'. What do you mean by that
exactly?

Thanks.

Jiri


Software loopback with phy 88E1116R and marvell MV78100 gbe

2017-02-23 Thread Paolo Minazzi
Hi all,
I have written a low-level driver for an MV78100 ARM board.
Everything works well.
I have built a physical loopback cable to cross RX and TX to test my
driver, and all is good.

I have another board (with imx6).
On this board I was able to put the PHY (Micrel) in loopback mode, so
I do not need a physical loopback. All works well.

I tried to do the same thing on the 88E1116R, setting bit 14 of register 0.
But if I do that I lose the link, and the test program does not work.
I tried to force the link in software, but it seems the controller sends
packets but is not able to receive them.
Is it possible to do such a software loopback on the 88E1116R?

Thanks in advance,
Paolo Minazzi
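
For reference, a minimal sketch of the register arithmetic involved. Bit 14 of register 0 is the loopback bit of the clause 22 Basic Mode Control Register; the bit values below match the `BMCR_*` definitions in Linux's `<linux/mii.h>`. The helper `bmcr_forced_loopback()` is a made-up name, and whether forcing speed and duplex like this is enough on the 88E1116R is an assumption that the thread does not confirm.

```c
#include <assert.h>
#include <stdint.h>

/* Basic Mode Control Register (register 0) bits, IEEE 802.3 clause 22. */
#define BMCR_SPEED1000	0x0040	/* speed selection, MSB */
#define BMCR_FULLDPLX	0x0100	/* full duplex */
#define BMCR_ANENABLE	0x1000	/* auto-negotiation enable */
#define BMCR_SPEED100	0x2000	/* speed selection, LSB */
#define BMCR_LOOPBACK	0x4000	/* bit 14: PHY loopback */

/* Return a BMCR value with loopback on, auto-negotiation off and
 * 1000 Mb/s full duplex forced. Unrelated bits of the old value are kept. */
uint16_t bmcr_forced_loopback(uint16_t old)
{
	uint16_t val = old;

	val &= (uint16_t)~(BMCR_ANENABLE | BMCR_SPEED100);
	val |= BMCR_LOOPBACK | BMCR_SPEED1000 | BMCR_FULLDPLX;
	return val;
}
```

The resulting value would then be written to register 0 through whatever MDIO write routine the driver already has.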


Re: [PATCH net-next v4 4/7] gtp: consolidate gtp socket rx path

2017-02-23 Thread Andreas Schultz
Hi Tom,

- On Feb 22, 2017, at 6:41 PM, Tom Herbert t...@herbertland.com wrote:

> On Tue, Feb 21, 2017 at 2:18 AM, Andreas Schultz  wrote:
>> Add network device to gtp context in preparation for splitting
>> the TEID from the network device.
>>
>> Use this to rework the socket rx path. Move the common RX part
>> of v0 and v1 into a helper. Also move the final rx part into
>> that helper as well.
>>
> Andreas,
> 
> How are these GTP kernel patches being tested?

We run each version in a test setup with an ePDG and an SGW that
connects to a full GGSN/P-GW instance (based on erGW).
We also run performance tests (less often) with commercial
test software.

> Is it possible to create some sort of GTP network device
> that separates out just the datapath for development in the
> same way that VXLAN did this?

We had this discussion about another patch:
(http://marc.info/?l=linux-netdev&m=148611438811696&w=2)

Currently the kernel module only supports the GGSN/P-GW side
of the GTP tunnel. This is because we check the UE IP address
in the GTP socket and use the destination IP in the network
interface to find the PDP context.

For a deployment in a real EPC, doing it the other way makes no
sense with the current code. However, for a test setup it makes
perfect sense (either to use it as a driver to test other GTP
nodes or to test our own implementation).

So, I hope that we can integrate this soonish.

libgtpnl contains two tools that can be used for testing. gtp-link
creates a network device and GTP sockets and keeps them alive.
gtp-tunnel can then be used to add PDP contexts to that. The only
missing part for a bidirectional test setup is the above-mentioned
patch with the direction flag and support for that
in the libgtpnl tools.

Adding static tunnel support into the kernel module in any form
makes IMHO no sense. GTP as defined by 3GPP always needs a control
instance and there are much better options for static tunnel
encapsulations.

Andreas

> 
> Tom
> 
>> Signed-off-by: Andreas Schultz 
>> ---
>>  drivers/net/gtp.c | 80 ++-
>>  1 file changed, 44 insertions(+), 36 deletions(-)
>>
>> diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
>> index 961fb3c..fc0fff5 100644
>> --- a/drivers/net/gtp.c
>> +++ b/drivers/net/gtp.c
>> @@ -58,6 +58,8 @@ struct pdp_ctx {
>> struct in_addr  ms_addr_ip4;
>> struct in_addr  sgsn_addr_ip4;
>>
>> +   struct net_device   *dev;
>> +
>> atomic_t tx_seq;
>> struct rcu_head rcu_head;
>>  };
>> @@ -175,6 +177,40 @@ static bool gtp_check_src_ms(struct sk_buff *skb, struct pdp_ctx *pctx,
>> return false;
>>  }
>>
>> +static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb, unsigned int hdrlen,
>> + bool xnet)
>> +{
>> +   struct pcpu_sw_netstats *stats;
>> +
>> +   if (!gtp_check_src_ms(skb, pctx, hdrlen)) {
>> +   netdev_dbg(pctx->dev, "No PDP ctx for this MS\n");
>> +   return 1;
>> +   }
>> +
>> +   /* Get rid of the GTP + UDP headers. */
>> +   if (iptunnel_pull_header(skb, hdrlen, skb->protocol, xnet))
>> +   return -1;
>> +
>> +   netdev_dbg(pctx->dev, "forwarding packet from GGSN to uplink\n");
>> +
>> +   /* Now that the UDP and the GTP header have been removed, set up the
>> +* new network header. This is required by the upper layer to
>> +* calculate the transport header.
>> +*/
>> +   skb_reset_network_header(skb);
>> +
>> +   skb->dev = pctx->dev;
>> +
>> +   stats = this_cpu_ptr(pctx->dev->tstats);
>> +   u64_stats_update_begin(&stats->syncp);
>> +   stats->rx_packets++;
>> +   stats->rx_bytes += skb->len;
>> +   u64_stats_update_end(&stats->syncp);
>> +
>> +   netif_rx(skb);
>> +   return 0;
>> +}
>> +
>>  /* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */
>>  static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb,
>>bool xnet)
>> @@ -201,13 +237,7 @@ static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb,
>> return 1;
>> }
>>
>> -   if (!gtp_check_src_ms(skb, pctx, hdrlen)) {
>> -   netdev_dbg(gtp->dev, "No PDP ctx for this MS\n");
>> -   return 1;
>> -   }
>> -
>> -   /* Get rid of the GTP + UDP headers. */
>> -   return iptunnel_pull_header(skb, hdrlen, skb->protocol, xnet);
>> +   return gtp_rx(pctx, skb, hdrlen, xnet);
>>  }
>>
>>  static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb,
>> @@ -250,13 +280,7 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb,
>> return 1;
>> }
>>
>> -   if (!gtp_check_src_ms(skb, pctx, hdrlen)) {
>> -   netdev_dbg(gtp->dev, "No PDP ctx for this MS\n");
>> -   return 1;
>> -   }
>> -
>> -

Re: lib: Introduce priority array area manager

2017-02-23 Thread Jiri Pirko
Thu, Feb 23, 2017 at 10:22:22AM CET, ge...@linux-m68k.org wrote:
>Hi Jiri,
>
>On Thu, Feb 23, 2017 at 9:32 AM, Jiri Pirko  wrote:
>> Thu, Feb 23, 2017 at 08:56:26AM CET, ge...@linux-m68k.org wrote:
>>>On Wed, Feb 22, 2017 at 8:02 PM, Linux Kernel Mailing List
>>> wrote:
>>>> Web:
>>>> https://git.kernel.org/torvalds/c/44091d29f2075972aede47ef17e1e70db3d51190
>>>> Commit: 44091d29f2075972aede47ef17e1e70db3d51190
>>>> Parent: b862815c3ee7b49ec20a9ab25da55a5f0bcbb95e
>>>> Refname:refs/heads/master
>>>> Author: Jiri Pirko
>>>> AuthorDate: Fri Feb 3 10:29:06 2017 +0100
>>>> Committer:  David S. Miller
>>>> CommitDate: Fri Feb 3 16:35:42 2017 -0500
>>>>
>>>> lib: Introduce priority array area manager
>>>>
>>>> This introduces a infrastructure for management of linear priority
>>>> areas. Priority order in an array matters, however order of items inside
>>>> a priority group does not matter.
>>>>
>>>> As an initial implementation, L-sort algorithm is used. It is quite
>>>> trivial. More advanced algorithm called P-sort will be introduced as a
>>>> follow-up. The infrastructure is prepared for other algos.
>>>>
>>>> Alongside this, a testing module is introduced as well.
>>>>
>>>> Signed-off-by: Jiri Pirko
>>>> Signed-off-by: David S. Miller
>>>
>>>> --- a/lib/Kconfig
>>>> +++ b/lib/Kconfig
>>>> @@ -550,4 +550,7 @@ config STACKDEPOT
>>>>  config SBITMAP
>>>> bool
>>>>
>>>> +config PARMAN
>>>> +   tristate "parman"
>>>
>>>| parman (PARMAN) [N/m/y] (NEW) ?
>>>|
>>>| There is no help available for this option.
>>>
>>>Can you please add a description for this option?
>>>Or drop the "parman" string if this is always selected by its kernel users, and
>>>never intended to be enabled by the end user.
>>
>> I did it in the same way other similar lib dependencies do that. Does
>> not make sense to have separate description for this, cause this is
>> always only a dependency of a kernel user.
>
>OK, so the user should not be asked about it...
>
>> You suggest to 'drop the "parman" string'. What do you mean by that
>> exactly?
>
>... and
>
>-   tristate "parman"
>+   tristate
>
>should do the trick.

Okay. I will push this through the net tree. Thanks!


Re: lib: Introduce priority array area manager

2017-02-23 Thread Geert Uytterhoeven
Hi Jiri,

On Thu, Feb 23, 2017 at 9:32 AM, Jiri Pirko  wrote:
> Thu, Feb 23, 2017 at 08:56:26AM CET, ge...@linux-m68k.org wrote:
>>On Wed, Feb 22, 2017 at 8:02 PM, Linux Kernel Mailing List
>> wrote:
>>> Web:
>>> https://git.kernel.org/torvalds/c/44091d29f2075972aede47ef17e1e70db3d51190
>>> Commit: 44091d29f2075972aede47ef17e1e70db3d51190
>>> Parent: b862815c3ee7b49ec20a9ab25da55a5f0bcbb95e
>>> Refname:refs/heads/master
>>> Author: Jiri Pirko 
>>> AuthorDate: Fri Feb 3 10:29:06 2017 +0100
>>> Committer:  David S. Miller 
>>> CommitDate: Fri Feb 3 16:35:42 2017 -0500
>>>
>>> lib: Introduce priority array area manager
>>>
>>> This introduces a infrastructure for management of linear priority
>>> areas. Priority order in an array matters, however order of items inside
>>> a priority group does not matter.
>>>
>>> As an initial implementation, L-sort algorithm is used. It is quite
>>> trivial. More advanced algorithm called P-sort will be introduced as a
>>> follow-up. The infrastructure is prepared for other algos.
>>>
>>> Alongside this, a testing module is introduced as well.
>>>
>>> Signed-off-by: Jiri Pirko 
>>> Signed-off-by: David S. Miller 
>>
>>> --- a/lib/Kconfig
>>> +++ b/lib/Kconfig
>>> @@ -550,4 +550,7 @@ config STACKDEPOT
>>>  config SBITMAP
>>> bool
>>>
>>> +config PARMAN
>>> +   tristate "parman"
>>
>>| parman (PARMAN) [N/m/y] (NEW) ?
>>|
>>| There is no help available for this option.
>>
>>Can you please add a description for this option?
>>Or drop the "parman" string if this is always selected by its kernel users, and
>>never intended to be enabled by the end user.
>
> I did it in the same way other similar lib dependencies do that. Does
> not make sense to have separate description for this, cause this is
> always only a dependency of a kernel user.

OK, so the user should not be asked about it...

> You suggest to 'drop the "parman" string'. What do you mean by that
> exactly?

... and

-   tristate "parman"
+   tristate

should do the trick.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: create drivers/net/mdio and move mdio drivers into it

2017-02-23 Thread Andrew Lunn
> Big picture is we can remove struct mii_bus,

So if you remove this, how do you represent MII as a bus? It is a bus,
clause 22 allows up to 32 devices on it, and I have boards with more
than 8 devices on the bus. Clause 45 allows many more devices on the
bus.

Andrew
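
To make the 32-device limit concrete: a clause 22 management frame carries a 5-bit PHY address and a 5-bit register address, which is where the limit comes from. The sketch below packs the 32 bits that follow the preamble of a write frame. It is a userspace illustration; `c22_write_frame()` and the `C22_*` names are made up here, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define C22_ST		0x1u	/* start of frame, binary 01 */
#define C22_OP_WRITE	0x1u	/* write opcode, binary 01 */
#define C22_TA_WRITE	0x2u	/* turnaround for a write, binary 10 */
#define C22_MAX_ADDR	32u	/* PHY and register addresses are 5 bits wide */

/* Pack a clause 22 write frame (the 32 bits after the preamble).
 * Returns 0 if the PHY or register address does not fit in 5 bits. */
uint32_t c22_write_frame(unsigned int phy, unsigned int reg, uint16_t data)
{
	if (phy >= C22_MAX_ADDR || reg >= C22_MAX_ADDR)
		return 0;

	return (C22_ST << 30) | (C22_OP_WRITE << 28) |
	       ((uint32_t)phy << 23) | ((uint32_t)reg << 18) |
	       (C22_TA_WRITE << 16) | data;
}
```

Address 31 is the last one that fits; anything above that cannot be expressed on a clause 22 bus, which is why the bus abstraction has to know about every device on it.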



Re: [PATCH net-next] net/gtp: Add udp source port generation according to flow hash

2017-02-23 Thread Andreas Schultz
Hi Tom,

- On Feb 22, 2017, at 10:47 PM, Tom Herbert t...@herbertland.com wrote:

> On Wed, Feb 22, 2017 at 1:29 PM, Or Gerlitz  wrote:
>> On Thu, Feb 16, 2017 at 11:58 PM, Andreas Schultz  wrote:
>>> Hi Or,
>>> - On Feb 16, 2017, at 3:59 PM, Or Gerlitz ogerl...@mellanox.com wrote:
>>>
>>>> Generate the source udp header according to the flow represented by
>>>> the packet we are encapsulating, as done for other udp tunnels. This
>>>> helps on the receiver side to apply RSS spreading.
>>>
>>> This might work for GTPv0-U. However, for GTPv1-U this could interfere
>>> with error handling in the user space control process when the UDP port
>>> extension header is used in error indications.
>>
>>
>> in the document you posted there's this quote "The source IP and port
>> have no meaning and can change at any time" -- I assume it refers to
>> v0? can we identify in the kernel code that we're on v0 and have the
>> patch come into play?
>>
>>> 3GPP TS 29.281 Rel 13, section 5.2.2.1 defines the UDP port extension and
>>> section 7.3.1 says that the UDP source port extension can be used to
>>> mitigate DOS attacks. This would IMHO imply that the user space control
>>> process needs to know the TEID to UDP source port mapping.
>>
>>> The other question is what this is actually hashing on. If I understand
>>> the code correctly, this will hash on the source/destination of the original
>>> flow. I would expect that a SGSN/SGW/eNodeB would like to keep flow
>>> processing on a per-TEID basis, so the port hashing should be based on the
>>> TEID.
>>
>> is it possible for packets belonging to the same TCP session or UDP
>> "pseudo session" (given pair of src/dst ip/port) to be encapsulated
>> using different TEID?
>>
>> hashing on the TEID imposes a harder requirement on the NIC HW vs.
>> just UDP based RSS.
> 
> This shouldn't be taken as a HW requirement and it's unlikely we'd add
> explicit GTP support in flow_dissector. If we can't get entropy in the
> UDP source port then IPv6 flow label is a potential alternative (so
> that should be supported in NICs for RSS).
> 
> I'll also reiterate my previous point about the need for GTP testing--
> in order for us to be able to evaluate the GTP datapath for things
> like performance or how they withstand against DDOS we really need an
> easy way to isolate the datapath.

GTP as specified is very insecure by definition. It is meant to be run
only on *private* mobile carrier and intra mobile carrier EPC networks.
Running it openly on the public internet would be extremely foolish.

There are some mechanisms in GTPv1-C on how to handle overload and
more extensive mechanisms in GTPv2-C for overload handling. The basic
guiding principle is to simply drop any traffic that it can't handle.

Anyhow, I haven't seen anything in 3GPP or GSMA documents that deals
with DDOS.

There are guidelines like the GSMA's IR.88 that describe how the intra
carrier roaming should work and what security measures should be
implemented.

Traffic coming in at Gi/SGi or from the UE could create a DDOS on the tunnel.
However, on the UE side you still have the RAN (eNodeB, SGSN, S-GW) or
an ePDG that has to apply QoS and thereby limit traffic. On the Gi/SGi
side you have the PCEF that does the same.

So in a complete 3GPP node (GGSN, P-GW) that uses the GTP tunnel
implementation, malicious traffic should be blocked before it can reach
the tunnel.

And as I stated before, the GTP tunnel module is not supposed to be
used without any of those components. So the DDOS concern should not
be handled at the tunnel level.

Andreas

> 
> Tom
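
As a rough illustration of the per-TEID alternative raised in the quoted discussion, the sketch below derives a UDP source port from the TEID alone, so that all packets of one tunnel carry the same port. It is a userspace model under stated assumptions: `teid_to_src_port()` is a made-up helper, the multiplicative constant is an arbitrary choice, and nothing here is taken from the gtp module or from the patch under review.

```c
#include <assert.h>
#include <stdint.h>

/* Map a TEID to a UDP source port in [min, max). All packets of one
 * tunnel get the same port, so receive-side hashing on the UDP header
 * keeps a tunnel on one queue. Requires max > min. */
uint16_t teid_to_src_port(uint32_t teid, uint16_t min, uint16_t max)
{
	/* Multiplicative hash to spread sequentially allocated TEIDs. */
	uint32_t hash = teid * 0x9E3779B1u;

	/* Scale the 32-bit hash into the port range without a modulo. */
	return (uint16_t)((((uint64_t)hash * (uint32_t)(max - min)) >> 32) + min);
}
```

Note that this does nothing for the concern about the UDP port extension header in GTPv1-U error indications: the control process would still need to know the mapping.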


RE: create drivers/net/mdio and move mdio drivers into it

2017-02-23 Thread YUAN Linyu


> -----Original Message-----
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Andrew Lunn
> Sent: Thursday, February 23, 2017 5:30 PM
> To: YUAN Linyu
> Cc: Florian Fainelli; David S . Miller; netdev@vger.kernel.org; cug...@163.com
> Subject: Re: create drivers/net/mdio and move mdio drivers into it
> 
> > Big picture is we can remove struct mii_bus,
> 
> So if you remove this, how do you represent MII as a bus? It is a bus,
> clause 22 allows up to 32 devices on it, and I have boards with more
> than 8 devices on the bus. Clause 45 allows many more devices on the
> bus.
> 
Add a PHY device list to mdio_device.
>   Andrew



Re: netfilter: nft_ct: add zone id set support

2017-02-23 Thread Geert Uytterhoeven
Hi Florian,

On Wed, Feb 22, 2017 at 8:02 PM, Linux Kernel Mailing List
 wrote:
> Web:
> https://git.kernel.org/torvalds/c/edee4f1e92458299505ff007733f676b00c516a1
> Commit: edee4f1e92458299505ff007733f676b00c516a1
> Parent: 5c178d81b69f08ca3195427a6ea9a46d9af23127
> Refname:refs/heads/master
> Author: Florian Westphal 
> AuthorDate: Fri Feb 3 13:35:50 2017 +0100
> Committer:  Pablo Neira Ayuso 
> CommitDate: Wed Feb 8 14:16:23 2017 +0100
>
> netfilter: nft_ct: add zone id set support
>
> zones allow tracking multiple connections sharing identical tuples,
> this is needed e.g. when tracking distinct vlans with overlapping ip
> addresses (conntrack is l2 agnostic).
>
> Thus the zone has to be set before the packet is picked up by the
> connection tracker.  This is done by means of 'conntrack templates' which
> are conntrack structures used solely to pass this info from one netfilter
> hook to the next.
>
> The iptables CT target instantiates these connection tracking templates
> once per rule, i.e. the template is fixed/tied to particular zone, can
> be read-only and therefore be re-used by as many skbs simultaneously as
> needed.
>
> We can't follow this model because we want to take the zone id from
> an sreg at rule eval time so we could e.g. fill in the zone id from
> the packets vlan id or a e.g. nftables key : value maps.
>
> To avoid cost of per packet alloc/free of the template, use a percpu
> template 'scratch' object and use the refcount to detect the (unlikely)
> case where the template is still attached to another skb (i.e., previous
> skb was nfqueued ...).
>
> Signed-off-by: Florian Westphal 
> Signed-off-by: Pablo Neira Ayuso 

> --- a/net/netfilter/nft_ct.c
> +++ b/net/netfilter/nft_ct.c

> @@ -407,6 +503,7 @@ static int nft_ct_set_init(const struct nft_ctx *ctx,
> unsigned int len;

With gcc-4.1.2 and -Os:

net/netfilter/nft_ct.c: In function ‘nft_ct_set_init’:
net/netfilter/nft_ct.c:503: warning: ‘len’ may be used uninitialized in this function

> int err;
>
> +   priv->dir = IP_CT_DIR_MAX;
> priv->key = ntohl(nla_get_be32(tb[NFTA_CT_KEY]));
> switch (priv->key) {
>  #ifdef CONFIG_NF_CONNTRACK_MARK
> @@ -426,10 +523,28 @@ static int nft_ct_set_init(const struct nft_ctx *ctx,
> return err;
> break;
>  #endif
> +#ifdef CONFIG_NF_CONNTRACK_ZONES
> +   case NFT_CT_ZONE:
> +   if (!nft_ct_tmpl_alloc_pcpu())
> +   return -ENOMEM;
> +   nft_ct_pcpu_template_refcnt++;

Unlike for the other cases of the switch statement, "len" is not initialized
here...

> +   break;
> +#endif
> default:
> return -EOPNOTSUPP;
> }
>
> +   if (tb[NFTA_CT_DIRECTION]) {
> +   priv->dir = nla_get_u8(tb[NFTA_CT_DIRECTION]);
> +   switch (priv->dir) {
> +   case IP_CT_DIR_ORIGINAL:
> +   case IP_CT_DIR_REPLY:
> +   break;
> +   default:
> +   return -EINVAL;
> +   }
> +   }
> +
> priv->sreg = nft_parse_register(tb[NFTA_CT_SREG]);
> err = nft_validate_register_load(priv->sreg, len);

... and used here, which may lead to spurious failures of
nft_validate_register_load().

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
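
The pattern behind the warning can be shown with a small userspace model. All names below (`demo_key`, `demo_set_init()`) are hypothetical, and the 16-bit width assigned for the zone case is only an assumption for illustration, not a statement about what the eventual fix does. The point is that every case that reaches the later check must assign `len` first.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical keys standing in for the cases of the real switch. */
enum demo_key { DEMO_KEY_MARK, DEMO_KEY_ZONE, DEMO_KEY_OTHER };

/* Every case that falls through to the later check assigns len;
 * unsupported keys return before len is read. */
int demo_set_init(enum demo_key key, size_t *len_out)
{
	size_t len;

	switch (key) {
	case DEMO_KEY_MARK:
		len = sizeof(uint32_t);
		break;
	case DEMO_KEY_ZONE:
		len = sizeof(uint16_t);	/* assumed width, for illustration */
		break;
	default:
		return -EOPNOTSUPP;
	}

	*len_out = len;	/* stands in for the register-load validation */
	return 0;
}
```

If the `DEMO_KEY_ZONE` case only did its allocation and then `break`, `len` would be read with an indeterminate value, which is what the compiler is complaining about.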


[patch net] lib: Remove string from parman config selection

2017-02-23 Thread Jiri Pirko
From: Jiri Pirko 

As reported by Geert, remove the string so the user does not see this
config option. The option is explicitly selected only as a dependency of
in-kernel users.

Reported-by: Geert Uytterhoeven 
Fixes: 44091d29f207 ("lib: Introduce priority array area manager")
Signed-off-by: Jiri Pirko 
---
 lib/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Kconfig b/lib/Kconfig
index 5d644f1..f355260 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -551,6 +551,6 @@ config SBITMAP
bool
 
 config PARMAN
-   tristate "parman"
+   tristate
 
 endmenu
-- 
2.7.4



Re: [patch net] lib: Remove string from parman config selection

2017-02-23 Thread Geert Uytterhoeven
Hi Jiri,

On Thu, Feb 23, 2017 at 10:57 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> As reported by Geert, remove the string so the user does not see this
> config option. The option is explicitly selected only as a dependency of
> in-kernel users.

Thanks!

> Reported-by: Geert Uytterhoeven 
> Fixes: 44091d29f207 ("lib: Introduce priority array area manager")
> Signed-off-by: Jiri Pirko 

Tested-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH net V3 1/5] net/mlx4: Change ENOTSUPP to EOPNOTSUPP

2017-02-23 Thread Tariq Toukan
From: Or Gerlitz 

As ENOTSUPP is specific to NFS, change the return error value to
EOPNOTSUPP in various places in the mlx4 driver.

Signed-off-by: Or Gerlitz 
Suggested-by: Yotam Gigi 
Reviewed-by: Matan Barak 
Signed-off-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c| 2 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c   | 2 +-
 drivers/net/ethernet/mellanox/mlx4/intf.c | 2 +-
 drivers/net/ethernet/mellanox/mlx4/main.c | 6 +++---
 drivers/net/ethernet/mellanox/mlx4/mr.c   | 2 +-
 drivers/net/ethernet/mellanox/mlx4/qp.c   | 2 +-
 drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 2 +-
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c b/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
index b04760a5034b..1dae8e40fb25 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
@@ -319,7 +319,7 @@ static int mlx4_en_ets_validate(struct mlx4_en_priv *priv, struct ieee_ets *ets)
default:
en_err(priv, "TC[%d]: Not supported TSA: %d\n",
i, ets->tc_tsa[i]);
-   return -ENOTSUPP;
+   return -EOPNOTSUPP;
}
}
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 3fe885ce1902..37e84a59e751 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -2436,7 +2436,7 @@ int mlx4_config_dev_retrieval(struct mlx4_dev *dev,
 #define CONFIG_DEV_RX_CSUM_MODE_PORT2_BIT_OFFSET   4
 
if (!(dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_CONFIG_DEV))
-   return -ENOTSUPP;
+   return -EOPNOTSUPP;
 
err = mlx4_CONFIG_DEV_get(dev, &config_dev);
if (err)
diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c b/drivers/net/ethernet/mellanox/mlx4/intf.c
index 8258d08acd8c..e00f627331cb 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -136,7 +136,7 @@ int mlx4_do_bond(struct mlx4_dev *dev, bool enable)
LIST_HEAD(bond_list);
 
if (!(dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_PORT_REMAP))
-   return -ENOTSUPP;
+   return -EOPNOTSUPP;
 
ret = mlx4_disable_rx_port_check(dev, enable);
if (ret) {
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 15ef787e71ba..683234221741 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1447,7 +1447,7 @@ int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p)
int err;
 
if (!(dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_PORT_REMAP))
-   return -ENOTSUPP;
+   return -EOPNOTSUPP;
 
mutex_lock(&priv->bond_mutex);
 
@@ -1884,7 +1884,7 @@ int mlx4_get_internal_clock_params(struct mlx4_dev *dev,
struct mlx4_priv *priv = mlx4_priv(dev);
 
if (mlx4_is_slave(dev))
-   return -ENOTSUPP;
+   return -EOPNOTSUPP;
 
if (!params)
return -EINVAL;
@@ -2384,7 +2384,7 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
 
/* Query CONFIG_DEV parameters */
err = mlx4_config_dev_retrieval(dev, &params);
-   if (err && err != -ENOTSUPP) {
+   if (err && err != -EOPNOTSUPP) {
mlx4_err(dev, "Failed to query CONFIG_DEV parameters\n");
} else if (!err) {
dev->caps.rx_checksum_flags_port[1] = params.rx_csum_flags_port_1;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
index 395b5463cfd9..db65f72879e9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
@@ -823,7 +823,7 @@ int mlx4_mw_alloc(struct mlx4_dev *dev, u32 pd, enum mlx4_mw_type type,
 !(dev->caps.flags & MLX4_DEV_CAP_FLAG_MEM_WINDOW)) ||
 (type == MLX4_MW_TYPE_2 &&
 !(dev->caps.bmme_flags & MLX4_BMME_FLAG_TYPE_2_WIN)))
-   return -ENOTSUPP;
+   return -EOPNOTSUPP;
 
index = mlx4_mpt_reserve(dev);
if (index == -1)
diff --git a/drivers/net/ethernet/mellanox/mlx4/qp.c b/drivers/net/ethernet/mellanox/mlx4/qp.c
index d1cd9c32a9ae..2d6abd4662b1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx4/qp.c
@@ -447,7 +447,7 @@ int mlx4_update_qp(struct mlx4_dev *dev, u32 qpn,
  & MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB)) {
mlx4_warn(dev,
  "Trying to set src check LB, but it isn't 
supported\n");
-   err = -ENOTSUPP;
+   err = -EOPNOTSUPP;
goto out;
}
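
For context on the error code itself: `EOPNOTSUPP` is a standard errno value available to userspace through `<errno.h>`, whereas `ENOTSUPP` (524) is defined only in the kernel's internal `include/linux/errno.h`, so userspace has no name or message for it. The sketch below models the caller-side pattern from the `mlx4_init_hca()` hunk in plain C. The names `demo_config_query()`, `demo_init()` and `DEMO_CAP_CONFIG_DEV` are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_CAP_CONFIG_DEV	0x1u	/* hypothetical capability bit */

/* Return -EOPNOTSUPP when the capability is missing. */
int demo_config_query(unsigned int caps)
{
	if (!(caps & DEMO_CAP_CONFIG_DEV))
		return -EOPNOTSUPP;
	return 0;
}

/* Treat "not supported" as non-fatal, in the spirit of the
 * mlx4_init_hca() hunk; any other error is passed up. */
int demo_init(unsigned int caps)
{
	int err = demo_config_query(caps);

	if (err && err != -EOPNOTSUPP)
		return err;
	return 0;
}
```

The producer and every caller that compares against the code have to change together, which is why the patch touches both the `return` sites and the check in `mlx4_init_hca()`.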

[PATCH net V3 3/5] net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs

2017-02-23 Thread Tariq Toukan
From: Majd Dibbiny 

In the VF driver, module parameter mlx4_log_num_mgm_entry_size was
mistakenly overwritten -- and in a manner which overrode the
device-managed flow steering option encoded in the parameter.

log_num_mgm_entry_size is a global module parameter which
affects all ConnectX-3 PFs installed on that host.
If a VF changes log_num_mgm_entry_size, this will affect all PFs
which are probed subsequent to the change (by disabling DMFS for
those PFs).

Fixes: 3c439b5586e9 ("mlx4_core: Allow choosing flow steering mode")
Signed-off-by: Majd Dibbiny 
Reviewed-by: Jack Morgenstein 
Signed-off-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 683234221741..005e1049c977 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -841,8 +841,6 @@ static int mlx4_slave_cap(struct mlx4_dev *dev)
return -EINVAL;
}
 
-   mlx4_log_num_mgm_entry_size = hca_param.log_mc_entry_sz;
-
dev->caps.hca_core_clock = hca_param.hca_core_clock;
 
memset(&dev_cap, 0, sizeof(dev_cap));
-- 
1.8.3.1



[PATCH net V3 4/5] net/mlx4_core: Use cq quota in SRIOV when creating completion EQs

2017-02-23 Thread Tariq Toukan
From: Jack Morgenstein 

When creating EQs to handle CQ completion events for the PF
or for VFs, we create enough EQE entries to handle completions
for the max number of CQs that can use that EQ.

When SRIOV is activated, the max number of CQs a VF (or the PF) can
obtain is its CQ quota (determined by the Hypervisor resource tracker).
Therefore, when creating an EQ, the number of EQE entries that the VF
should request for that EQ is the CQ quota value (and not the total
number of CQs available in the FW).

Under SRIOV, the PF must also use its CQ quota, because
the resource tracker also controls how many CQs the PF can obtain.

Using the FW total CQs instead of the CQ quota when creating EQs resulted
in wasting MTT entries, due to allocating more EQEs than were needed.

Fixes: 5a0d0a6161ae ("mlx4: Structures and init/teardown for VF resource quotas")
Signed-off-by: Jack Morgenstein 
Reported-by: Dexuan Cui 
Signed-off-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx4/eq.c   | 5 ++---
 drivers/net/ethernet/mellanox/mlx4/main.c | 3 ++-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index 39232b6a974f..07406cf2eacd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -1249,9 +1249,8 @@ int mlx4_init_eq_table(struct mlx4_dev *dev)
mlx4_warn(dev, "Failed adding irq rmap\n");
}
 #endif
-   err = mlx4_create_eq(dev, dev->caps.num_cqs -
- dev->caps.reserved_cqs +
- MLX4_NUM_SPARE_EQE,
+   err = mlx4_create_eq(dev, dev->quotas.cq +
+MLX4_NUM_SPARE_EQE,
 (dev->flags & MLX4_FLAG_MSI_X) ?
 i + 1 - !!(i > MLX4_EQ_ASYNC) : 0,
 eq);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 005e1049c977..21377c315083 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3501,6 +3501,8 @@ static int mlx4_load_one(struct pci_dev *pdev, int 
pci_dev_data,
goto err_disable_msix;
}
 
+   mlx4_init_quotas(dev);
+
err = mlx4_setup_hca(dev);
if (err == -EBUSY && (dev->flags & MLX4_FLAG_MSI_X) &&
!mlx4_is_mfunc(dev)) {
@@ -3513,7 +3515,6 @@ static int mlx4_load_one(struct pci_dev *pdev, int 
pci_dev_data,
if (err)
goto err_steer;
 
-   mlx4_init_quotas(dev);
/* When PF resources are ready arm its comm channel to enable
 * getting commands
 */
-- 
1.8.3.1
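
To make the sizing argument in the commit message concrete, here is a minimal sketch of the two EQE counts. It is not the driver code: SPARE_EQE and the CQ numbers used with it are illustrative assumptions, and both helper names are hypothetical.

```c
/* Illustrative spare-entry count; an assumption, not taken from the patch. */
#define SPARE_EQE 128u

/* Old sizing: every CQ the firmware exposes, minus the reserved ones. */
static unsigned int eqe_from_fw_total(unsigned int num_cqs,
				      unsigned int reserved_cqs)
{
	return num_cqs - reserved_cqs + SPARE_EQE;
}

/* New sizing: only the CQs this function (PF or VF) may actually obtain. */
static unsigned int eqe_from_quota(unsigned int cq_quota)
{
	return cq_quota + SPARE_EQE;
}
```

With, say, 65536 firmware CQs of which 128 are reserved, and a quota of 1024, the first form asks for 65536 entries while the second asks for 1152.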



[PATCH net V3 0/5] mlx4 misc fixes

2017-02-23 Thread Tariq Toukan
Hi Dave,

This patchset contains misc bug fixes from Eric Dumazet and our team
to the mlx4 Core and Eth drivers.

Series generated against net commit:
eee2faabc63d tcp: account for ts offset only if tsecr not zero

Thanks,
Tariq.

v3:
* Rebased, conflict solved.

v2:
* Added Eric's fix (patch 5/5).

Eric Dumazet (1):
  net/mlx4_en: Use __skb_fill_page_desc()

Eugenia Emantayev (1):
  net/mlx4: Spoofcheck and zero MAC can't coexist

Jack Morgenstein (1):
  net/mlx4_core: Use cq quota in SRIOV when creating completion EQs

Majd Dibbiny (1):
  net/mlx4_core: Fix VF overwrite of module param which disables DMFS on
new probed PFs

Or Gerlitz (1):
  net/mlx4: Change ENOTSUPP to EOPNOTSUPP

 drivers/net/ethernet/mellanox/mlx4/cmd.c   | 22 --
 drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  6 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |  8 
 drivers/net/ethernet/mellanox/mlx4/eq.c|  5 ++---
 drivers/net/ethernet/mellanox/mlx4/fw.c|  2 +-
 drivers/net/ethernet/mellanox/mlx4/intf.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx4/main.c  | 11 +--
 drivers/net/ethernet/mellanox/mlx4/mr.c|  2 +-
 drivers/net/ethernet/mellanox/mlx4/qp.c|  2 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  2 +-
 include/linux/mlx4/cmd.h   |  2 +-
 include/linux/mlx4/driver.h| 10 ++
 13 files changed, 49 insertions(+), 27 deletions(-)

-- 
1.8.3.1



[PATCH net V3 5/5] net/mlx4_en: Use __skb_fill_page_desc()

2017-02-23 Thread Tariq Toukan
From: Eric Dumazet 

Or we might miss the fact that a page was allocated from memory reserves.

Fixes: dceeab0e5258 ("mlx4: support __GFP_MEMALLOC for rx")
Signed-off-by: Eric Dumazet 
Signed-off-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index d85e6446f9d9..867292880c07 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -604,10 +604,10 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv 
*priv,
dma_sync_single_for_cpu(priv->ddev, dma, frag_info->frag_size,
DMA_FROM_DEVICE);
 
-   /* Save page reference in skb */
-   __skb_frag_set_page(&skb_frags_rx[nr], frags[nr].page);
-   skb_frag_size_set(&skb_frags_rx[nr], frag_info->frag_size);
-   skb_frags_rx[nr].page_offset = frags[nr].page_offset;
+   __skb_fill_page_desc(skb, nr, frags[nr].page,
+frags[nr].page_offset,
+frag_info->frag_size);
+
skb->truesize += frag_info->frag_stride;
frags[nr].page = NULL;
}
-- 
1.8.3.1



[PATCH net V3 2/5] net/mlx4: Spoofcheck and zero MAC can't coexist

2017-02-23 Thread Tariq Toukan
From: Eugenia Emantayev 

Spoofcheck can't be enabled if VF MAC is zero.
Vice versa, can't zero MAC if spoofcheck is on.

Fixes: 8f7ba3ca12f6 ('net/mlx4: Add set VF mac address support')
Signed-off-by: Eugenia Emantayev 
Signed-off-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c   | 22 --
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  6 +-
 include/linux/mlx4/cmd.h   |  2 +-
 include/linux/mlx4/driver.h| 10 ++
 4 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c 
b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index a49072b4fa52..e8c105164931 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -2955,7 +2956,7 @@ static bool mlx4_valid_vf_state_change(struct mlx4_dev 
*dev, int port,
return false;
 }
 
-int mlx4_set_vf_mac(struct mlx4_dev *dev, int port, int vf, u64 mac)
+int mlx4_set_vf_mac(struct mlx4_dev *dev, int port, int vf, u8 *mac)
 {
struct mlx4_priv *priv = mlx4_priv(dev);
struct mlx4_vport_state *s_info;
@@ -2964,13 +2965,22 @@ int mlx4_set_vf_mac(struct mlx4_dev *dev, int port, int 
vf, u64 mac)
if (!mlx4_is_master(dev))
return -EPROTONOSUPPORT;
 
+   if (is_multicast_ether_addr(mac))
+   return -EINVAL;
+
slave = mlx4_get_slave_indx(dev, vf);
if (slave < 0)
return -EINVAL;
 
port = mlx4_slaves_closest_port(dev, slave, port);
s_info = &priv->mfunc.master.vf_admin[slave].vport[port];
-   s_info->mac = mac;
+
+   if (s_info->spoofchk && is_zero_ether_addr(mac)) {
+   mlx4_info(dev, "MAC invalidation is not allowed when spoofchk 
is on\n");
+   return -EPERM;
+   }
+
+   s_info->mac = mlx4_mac_to_u64(mac);
mlx4_info(dev, "default mac on vf %d port %d to %llX will take effect 
only after vf restart\n",
  vf, port, s_info->mac);
return 0;
@@ -3143,6 +3153,7 @@ int mlx4_set_vf_spoofchk(struct mlx4_dev *dev, int port, 
int vf, bool setting)
struct mlx4_priv *priv = mlx4_priv(dev);
struct mlx4_vport_state *s_info;
int slave;
+   u8 mac[ETH_ALEN];
 
if ((!mlx4_is_master(dev)) ||
!(dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_FSM))
@@ -3154,6 +3165,13 @@ int mlx4_set_vf_spoofchk(struct mlx4_dev *dev, int port, 
int vf, bool setting)
 
port = mlx4_slaves_closest_port(dev, slave, port);
s_info = &priv->mfunc.master.vf_admin[slave].vport[port];
+
+   mlx4_u64_to_mac(mac, s_info->mac);
+   if (setting && !is_valid_ether_addr(mac)) {
+   mlx4_info(dev, "Illegal MAC with spoofchk\n");
+   return -EPERM;
+   }
+
s_info->spoofchk = setting;
 
return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index afee5434..61420473fe5f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2485,12 +2485,8 @@ static int mlx4_en_set_vf_mac(struct net_device *dev, 
int queue, u8 *mac)
 {
struct mlx4_en_priv *en_priv = netdev_priv(dev);
struct mlx4_en_dev *mdev = en_priv->mdev;
-   u64 mac_u64 = mlx4_mac_to_u64(mac);
 
-   if (is_multicast_ether_addr(mac))
-   return -EINVAL;
-
-   return mlx4_set_vf_mac(mdev->dev, en_priv->port, queue, mac_u64);
+   return mlx4_set_vf_mac(mdev->dev, en_priv->port, queue, mac);
 }
 
 static int mlx4_en_set_vf_vlan(struct net_device *dev, int vf, u16 vlan, u8 
qos,
diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h
index 1f3568694a57..7b74afcbbab2 100644
--- a/include/linux/mlx4/cmd.h
+++ b/include/linux/mlx4/cmd.h
@@ -308,7 +308,7 @@ int mlx4_get_counter_stats(struct mlx4_dev *dev, int 
counter_index,
 int mlx4_get_vf_stats(struct mlx4_dev *dev, int port, int vf_idx,
  struct ifla_vf_stats *vf_stats);
 u32 mlx4_comm_get_version(void);
-int mlx4_set_vf_mac(struct mlx4_dev *dev, int port, int vf, u64 mac);
+int mlx4_set_vf_mac(struct mlx4_dev *dev, int port, int vf, u8 *mac);
 int mlx4_set_vf_vlan(struct mlx4_dev *dev, int port, int vf, u16 vlan,
 u8 qos, __be16 proto);
 int mlx4_set_vf_rate(struct mlx4_dev *dev, int port, int vf, int min_tx_rate,
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index bd0e7075ea6d..e965e5090d96 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -104,4 +104,14 @@ static inline u64 mlx4_mac_to_u64(u8 *addr)
return mac;
 }
 
+static inline void mlx4_u64_to_mac(u8 *addr, u64 mac)
+{
+   int i;
+
+   for (i = ETH_ALEN; i > 0; i--) {
+   addr[i - 1] = mac & 0xFF;
+   mac >>= 8;
+   }
+}
+
 #endif 
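
The two checks in this patch can be tried out in user space. The following is a minimal sketch under stated assumptions, not the mlx4 code: struct vf_state and all function names are hypothetical stand-ins, and the address tests are reimplemented locally rather than taken from the kernel's etherdevice helpers. It shows the u64/byte-array round trip (each byte is extracted with a bitwise AND) and the rule that spoofchk and a zero MAC exclude each other.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Hypothetical per-VF state: MAC kept as a u64, plus the spoofchk flag. */
struct vf_state {
	uint64_t mac;
	bool spoofchk;
};

/* Pack six bytes into the low 48 bits, first byte most significant. */
static uint64_t mac_to_u64(const uint8_t *addr)
{
	uint64_t mac = 0;
	int i;

	for (i = 0; i < ETH_ALEN; i++)
		mac = (mac << 8) | addr[i];
	return mac;
}

/* Inverse conversion: peel off one byte at a time with a bitwise AND. */
static void u64_to_mac(uint8_t *addr, uint64_t mac)
{
	int i;

	for (i = ETH_ALEN; i > 0; i--) {
		addr[i - 1] = mac & 0xFF;
		mac >>= 8;
	}
}

static bool mac_is_zero(const uint8_t *a)
{
	return !(a[0] | a[1] | a[2] | a[3] | a[4] | a[5]);
}

/* The lowest bit of the first byte marks a group (multicast) address. */
static bool mac_is_multicast(const uint8_t *a)
{
	return a[0] & 0x01;
}

/* Refuse to zero the MAC while spoofchk is on. */
static int set_vf_mac(struct vf_state *s, const uint8_t *mac)
{
	if (mac_is_multicast(mac))
		return -EINVAL;
	if (s->spoofchk && mac_is_zero(mac))
		return -EPERM;
	s->mac = mac_to_u64(mac);
	return 0;
}

/* Refuse to enable spoofchk while the stored MAC is not a valid unicast one. */
static int set_vf_spoofchk(struct vf_state *s, bool setting)
{
	uint8_t mac[ETH_ALEN];

	u64_to_mac(mac, s->mac);
	if (setting && (mac_is_zero(mac) || mac_is_multicast(mac)))
		return -EPERM;
	s->spoofchk = setting;
	return 0;
}
```

Starting from an all-zero state, enabling spoofchk fails until a unicast MAC has been set, after which zeroing the MAC fails in turn.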

SIPHASH (was: Re: [GIT] Networking)

2017-02-23 Thread Geert Uytterhoeven
Hi Jason,

On Wed, Feb 22, 2017 at 5:31 AM, David Miller  wrote:
> 3) Introduce SIPHASH and it's usage for secure sequence numbers and
>syncookies.  From Jason A. Donenfeld.

> Jason A. Donenfeld (4):
>   siphash: add cryptographically secure PRF
>   siphash: implement HalfSipHash1-3 for hash tables
>   secure_seq: use SipHash in place of MD5
>   syncookies: use SipHash in place of SHA1

bloat-o-meter v4.10.. says:

add/remove: 338/127 grow/shrink: 604/310 up/down: 86156/-24117 (62039)
...
siphash_4u64   -5006   +5006
siphash_3u64   -4298   +4298
siphash_2u64   -3582   +3582
__siphash_unaligned-3052   +3052
__siphash_aligned  -3052   +3052
siphash_3u32   -2976   +2976
siphash_1u64   -2870   +2870
siphash_1u32   -2172   +2172
...

Do we need all of this builtin, unconditionally?

Thanks for your answer!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: Software loopback with phy 88E1116R and marvell MV78100 gbe

2017-02-23 Thread Andrew Lunn
> I tried to do the same thing on the 88E1116R, setting bit 14 of reg 0.
> But if I do that I lose the link, and the test program does not work.
> I tried to force the link in software, but it seems the controller sends
> packets but is not able to receive them.
> Is it possible to do such a software loopback on the 88E1116R?

Hi Paolo

What you are talking about here is MAC loopback. Packets are looped
back at the MAC level. The copper side will be left idle, and so the
link will be lost. This explains why you are seeing link down.

What you might need to do is extend marvell_update_link() to check if
MAC loopback is happening, and if so, say the link is up.

But this is all a bit questionable. How are you setting the PHY into
loopback? I guess the rest of the stack has no idea this is happening,
in particular the phy state machine. What happens when it comes out of
loopback? How is autoneg kicked off?

You might want to think about the big picture, and how this can be
incorporated into ethtool, and phylib. MAC loopback is pretty much
standard, so you should be able to solve this for all PHYs, not just
Marvell.

 Andrew


[PATCH 4/8] Fix bug: sometimes valid entries in hash:* types of sets were evicted

2017-02-23 Thread Pablo Neira Ayuso
From: Jozsef Kadlecsik 

Wrong index was used and therefore when shrinking a hash bucket at
deleting an entry, valid entries could be evicted as well.
Thanks to Eric Ewanco for the thorough bugreport.

Fixes netfilter bugzilla #1119

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index 1b05d4a7d5a1..f236c0bc7b3f 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -897,7 +897,7 @@ mtype_del(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
continue;
data = ahash_data(n, j, dsize);
memcpy(tmp->value + k * dsize, data, dsize);
-   set_bit(j, tmp->used);
+   set_bit(k, tmp->used);
k++;
}
tmp->pos = k;
-- 
2.1.4
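
The indexing mistake is easy to see in a stand-alone model of the bucket shrink. This is a minimal sketch, assuming a hypothetical struct bucket with a 32-bit "used" bitmap; it is not the ipset data structure. When live entries are copied into a smaller bucket, the bit that must be set is the destination index k, not the source index j.

```c
#include <stdint.h>
#include <string.h>

#define SLOTS 8

/* Hypothetical bucket: bit i of 'used' set means value[i] is live. */
struct bucket {
	uint32_t used;
	int pos;		/* one past the highest slot ever filled */
	int value[SLOTS];
};

/* Copy the live entries of src into dst, packing them to the front. */
static void bucket_compact(const struct bucket *src, struct bucket *dst)
{
	int j, k = 0;

	memset(dst, 0, sizeof(*dst));
	for (j = 0; j < src->pos; j++) {
		if (!(src->used & (1u << j)))
			continue;
		dst->value[k] = src->value[j];
		dst->used |= 1u << k;	/* mark the destination slot, not j */
		k++;
	}
	dst->pos = k;
}

/* Lookup only trusts slots below pos whose 'used' bit is set. */
static int bucket_contains(const struct bucket *b, int v)
{
	int i;

	for (i = 0; i < b->pos; i++)
		if ((b->used & (1u << i)) && b->value[i] == v)
			return 1;
	return 0;
}
```

Had the loop marked bit j instead, entries copied into slots 0 and 1 from source slots 2 and 4 would be stored but flagged as unused, so a later lookup would treat them as gone.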



[PATCH 6/8] netfilter: xt_hashlimit: Fix integer divide round to zero.

2017-02-23 Thread Pablo Neira Ayuso
From: Alban Browaeys 

Divide the divider by the multiplier before applying it to the input.
When this would "divide by zero", divide the multiplier by the divider
first, then multiply the input by this value.

Currently user2credits outputs zero when the input value is bigger than
the number of slices and lower than the scale.
This is because the user input is then integer-divided by a number
greater than itself (the scale).
That rounds down to zero, and we then multiply zero by the credits slice size.

  iptables -t filter -I INPUT --protocol tcp --match hashlimit
  --hashlimit 40/second --hashlimit-burst 20 --hashlimit-mode srcip
  --hashlimit-name syn-flood --jump RETURN

thus triggers the overflow detection code:

xt_hashlimit: overflow, try lower: 25000/20

(25000 as hashlimit avg and 20 the burst)

Here:
134217 slices of (HZ * CREDITS_PER_JIFFY) size.
50 is user input value
100 is XT_HASHLIMIT_SCALE_v2
gives: 0 as user2credits output
Setting burst to "1" typically solves the issue ...
but setting it to "40" does too!

This is on 32bit arch calling into revision 2 of hashlimit.

Signed-off-by: Alban Browaeys 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/xt_hashlimit.c | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 10063408141d..84ad5ab34558 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -463,23 +463,16 @@ static u32 xt_hashlimit_len_to_chunks(u32 len)
 /* Precision saver. */
 static u64 user2credits(u64 user, int revision)
 {
-   if (revision == 1) {
-   /* If multiplying would overflow... */
-   if (user > 0x / (HZ*CREDITS_PER_JIFFY_v1))
-   /* Divide first. */
-   return div64_u64(user, XT_HASHLIMIT_SCALE)
-   * HZ * CREDITS_PER_JIFFY_v1;
-
-   return div64_u64(user * HZ * CREDITS_PER_JIFFY_v1,
-XT_HASHLIMIT_SCALE);
-   } else {
-   if (user > 0xULL / (HZ*CREDITS_PER_JIFFY))
-   return div64_u64(user, XT_HASHLIMIT_SCALE_v2)
-   * HZ * CREDITS_PER_JIFFY;
+   u64 scale = (revision == 1) ?
+   XT_HASHLIMIT_SCALE : XT_HASHLIMIT_SCALE_v2;
+   u64 cpj = (revision == 1) ?
+   CREDITS_PER_JIFFY_v1 : CREDITS_PER_JIFFY;
 
-   return div64_u64(user * HZ * CREDITS_PER_JIFFY,
-XT_HASHLIMIT_SCALE_v2);
-   }
+   /* Avoid overflow: divide the constant operands first */
+   if (scale >= HZ * cpj)
+   return div64_u64(user, div64_u64(scale, HZ * cpj));
+
+   return user * div64_u64(HZ * cpj, scale);
 }
 
 static u32 user2credits_byte(u32 user)
-- 
2.1.4
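
The rounding problem can be reproduced with plain integer arithmetic. The sketch below is not the kernel function: credits_per_sec stands in for HZ * CREDITS_PER_JIFFY, and the scale of 1000000, the 32000 credits per second and the 0xFFFFFFFF limit used with it are illustrative assumptions chosen for this example.

```c
#include <stdint.h>

/*
 * Old approach: if user * credits_per_sec could exceed 'limit',
 * divide user by scale first. Any user value below scale then
 * truncates to zero.
 */
static uint64_t user2credits_old(uint64_t user, uint64_t scale,
				 uint64_t credits_per_sec, uint64_t limit)
{
	if (user > limit / credits_per_sec)
		return (user / scale) * credits_per_sec;

	return (user * credits_per_sec) / scale;
}

/*
 * New approach: combine the two constants first, then apply the
 * result to the user value, so the input itself is never truncated
 * to zero by the scale.
 */
static uint64_t user2credits_new(uint64_t user, uint64_t scale,
				 uint64_t credits_per_sec)
{
	if (scale >= credits_per_sec)
		return user / (scale / credits_per_sec);

	return user * (credits_per_sec / scale);
}
```

With these assumed constants, limit / credits_per_sec is 134217, so an input of 500000 takes the divide-first branch of the old code and becomes 0, while the new code yields 16129. The exact value would be 16000; the difference comes from truncating 1000000 / 32000 to 31, which is the precision the new approach trades for never collapsing to zero.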



[PATCH 3/8] netfilter: ctnetlink: Fix regression in CTA_HELP processing

2017-02-23 Thread Pablo Neira Ayuso
From: Kevin Cernekee 

Prior to Linux 4.4, it was usually harmless to send a CTA_HELP attribute
containing the name of the current helper.  That is no longer the case:
as of Linux 4.4, if ctnetlink_change_helper() returns an error from
the ct->master check, processing of the request will fail, skipping the
NFQA_EXP attribute (if present).

This patch changes the behavior to improve compatibility with user
programs that expect the kernel interface to work the way it did prior
to Linux 4.4.  If a user program specifies CTA_HELP but the argument
matches the current conntrack helper name, ignore it instead of generating
an error.

Signed-off-by: Kevin Cernekee 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_netlink.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c 
b/net/netfilter/nf_conntrack_netlink.c
index bf04b7e9d6f7..6806b5e73567 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1478,14 +1478,23 @@ static int ctnetlink_change_helper(struct nf_conn *ct,
struct nlattr *helpinfo = NULL;
int err;
 
-   /* don't change helper of sibling connections */
-   if (ct->master)
-   return -EBUSY;
-
err = ctnetlink_parse_help(cda[CTA_HELP], &helpname, &helpinfo);
if (err < 0)
return err;
 
+   /* don't change helper of sibling connections */
+   if (ct->master) {
+   /* If we try to change the helper to the same thing twice,
+* treat the second attempt as a no-op instead of returning
+* an error.
+*/
+   if (help && help->helper &&
+   !strcmp(help->helper->name, helpname))
+   return 0;
+   else
+   return -EBUSY;
+   }
+
if (!strcmp(helpname, "")) {
if (help && help->helper) {
/* we had a helper before ... */
-- 
2.1.4



[PATCH 8/8] netfilter: nfnetlink: remove static declaration from err_list

2017-02-23 Thread Pablo Neira Ayuso
From: Liping Zhang 

Otherwise, different subsystems will race to access the err_list while
holding different nfnl_lock(subsys_id) locks.

This cannot happen at present, since ->call_batch is only implemented by
nftables, so the err_list is protected by nfnl_lock(NFNL_SUBSYS_NFTABLES).

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nfnetlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index a09fa9fd8f3d..6fa448478cba 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -279,7 +279,7 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct 
nlmsghdr *nlh,
struct net *net = sock_net(skb->sk);
const struct nfnetlink_subsystem *ss;
const struct nfnl_callback *nc;
-   static LIST_HEAD(err_list);
+   LIST_HEAD(err_list);
u32 status;
int err;
 
-- 
2.1.4
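
The underlying C point is that a static local variable has a single instance shared by every invocation of the function, whereas an automatic local is private to each call. The sketch below only illustrates that sharing; it does not model the locking or the race itself, and struct err_count and both function names are hypothetical.

```c
/* Hypothetical stand-in for a per-batch error list. */
struct err_count {
	int count;
};

/* Static local: one object for all callers, state carries over. */
static int batch_with_static(int nerrs)
{
	static struct err_count errs;

	errs.count += nerrs;
	return errs.count;
}

/* Automatic local: each call gets a fresh object on its own stack. */
static int batch_with_local(int nerrs)
{
	struct err_count errs = { 0 };

	errs.count += nerrs;
	return errs.count;
}
```

Two callers holding different locks would both touch the one static object, which is why the automatic variable is the safer choice once more than one subsystem can reach this path.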



[PATCH 2/8] netfilter: ctnetlink: Fix regression in CTA_STATUS processing

2017-02-23 Thread Pablo Neira Ayuso
From: Kevin Cernekee 

The libnetfilter_conntrack userland library always sets IPS_CONFIRMED
when building a CTA_STATUS attribute.  If this toggles the bit from
0->1, the parser will return an error.  On Linux 4.4+ this will cause any
NFQA_EXP attribute in the packet to be ignored.  This breaks conntrackd's
userland helpers because they operate on unconfirmed connections.

Instead of returning -EBUSY if the user program asks to modify an
unchangeable bit, simply ignore the change.

Also, fix the logic so that user programs are allowed to clear
the bits that they are allowed to change.

Signed-off-by: Kevin Cernekee 
Signed-off-by: Pablo Neira Ayuso 
---
 include/uapi/linux/netfilter/nf_conntrack_common.h |  4 
 net/netfilter/nf_conntrack_netlink.c   | 26 +-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h 
b/include/uapi/linux/netfilter/nf_conntrack_common.h
index 6d074d14ee27..6a8e33dd4ecb 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_common.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_common.h
@@ -82,6 +82,10 @@ enum ip_conntrack_status {
IPS_DYING_BIT = 9,
IPS_DYING = (1 << IPS_DYING_BIT),
 
+   /* Bits that cannot be altered from userland. */
+   IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK |
+IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING),
+
/* Connection has fixed timeout. */
IPS_FIXED_TIMEOUT_BIT = 10,
IPS_FIXED_TIMEOUT = (1 << IPS_FIXED_TIMEOUT_BIT),
diff --git a/net/netfilter/nf_conntrack_netlink.c 
b/net/netfilter/nf_conntrack_netlink.c
index 27540455dc62..bf04b7e9d6f7 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -2270,6 +2270,30 @@ ctnetlink_glue_build(struct sk_buff *skb, struct nf_conn 
*ct,
 }
 
 static int
+ctnetlink_update_status(struct nf_conn *ct, const struct nlattr * const cda[])
+{
+   unsigned int status = ntohl(nla_get_be32(cda[CTA_STATUS]));
+   unsigned long d = ct->status ^ status;
+
+   if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY))
+   /* SEEN_REPLY bit can only be set */
+   return -EBUSY;
+
+   if (d & IPS_ASSURED && !(status & IPS_ASSURED))
+   /* ASSURED bit can only be set */
+   return -EBUSY;
+
+   /* This check is less strict than ctnetlink_change_status()
+* because callers often flip IPS_EXPECTED bits when sending
+* an NFQA_CT attribute to the kernel.  So ignore the
+* unchangeable bits but do not error out.
+*/
+   ct->status = (status & ~IPS_UNCHANGEABLE_MASK) |
+(ct->status & IPS_UNCHANGEABLE_MASK);
+   return 0;
+}
+
+static int
 ctnetlink_glue_parse_ct(const struct nlattr *cda[], struct nf_conn *ct)
 {
int err;
@@ -2280,7 +2304,7 @@ ctnetlink_glue_parse_ct(const struct nlattr *cda[], 
struct nf_conn *ct)
return err;
}
if (cda[CTA_STATUS]) {
-   err = ctnetlink_change_status(ct, cda);
+   err = ctnetlink_update_status(ct, cda);
if (err < 0)
return err;
}
-- 
2.1.4
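
The masking done by the new helper can be exercised in isolation. This is a minimal sketch, assuming hypothetical ST_* flag names and bit positions and a reduced "unchangeable" mask; it is not the conntrack definition. Bits in the mask keep their current value, the remaining bits take the requested value, and clearing a set-only bit is still rejected.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical status bits for illustration only. */
#define ST_EXPECTED	(1u << 0)
#define ST_SEEN_REPLY	(1u << 1)
#define ST_ASSURED	(1u << 2)
#define ST_CONFIRMED	(1u << 3)

/* Bits that a request may mention but never alter. */
#define ST_UNCHANGEABLE	(ST_EXPECTED | ST_CONFIRMED)

static int update_status(uint32_t *cur, uint32_t req)
{
	uint32_t d = *cur ^ req;	/* bits the request wants to flip */

	/* These two bits may be set but never cleared. */
	if ((d & ST_SEEN_REPLY) && !(req & ST_SEEN_REPLY))
		return -EBUSY;
	if ((d & ST_ASSURED) && !(req & ST_ASSURED))
		return -EBUSY;

	/* Take changeable bits from req, keep unchangeable bits from cur. */
	*cur = (req & ~ST_UNCHANGEABLE) | (*cur & ST_UNCHANGEABLE);
	return 0;
}
```

A request that sets ST_CONFIRMED on an unconfirmed entry succeeds, but the stored status keeps ST_CONFIRMED clear.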



[PATCH 7/8] netfilter: nfnetlink_queue: fix NFQA_VLAN_MAX definition

2017-02-23 Thread Pablo Neira Ayuso
From: Ken-ichirou MATSUZAWA 

Should be - 1 as in other _MAX definitions.

Signed-off-by: Ken-ichirou MATSUZAWA 
Acked-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/uapi/linux/netfilter/nfnetlink_queue.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h 
b/include/uapi/linux/netfilter/nfnetlink_queue.h
index ae30841ff94e..d42f0396fe30 100644
--- a/include/uapi/linux/netfilter/nfnetlink_queue.h
+++ b/include/uapi/linux/netfilter/nfnetlink_queue.h
@@ -36,7 +36,7 @@ enum nfqnl_vlan_attr {
NFQA_VLAN_TCI,  /* __be16 skb htons(vlan_tci) */
__NFQA_VLAN_MAX,
 };
-#define NFQA_VLAN_MAX (__NFQA_VLAN_MAX + 1)
+#define NFQA_VLAN_MAX (__NFQA_VLAN_MAX - 1)
 
 enum nfqnl_attr_type {
NFQA_UNSPEC,
-- 
2.1.4
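
The "_MAX is the sentinel minus one" convention can be checked with a small stand-alone enum. The names below are hypothetical and only mirror the shape of the enum in the patch; the per-attribute lengths are assumptions for the example.

```c
enum demo_attr {
	DEMO_ATTR_UNSPEC,
	DEMO_ATTR_PROTO,
	DEMO_ATTR_TCI,
	DEMO_ATTR_SENTINEL,	/* plays the role of the __..._MAX entry */
};

/* Highest valid attribute number: the sentinel minus one. */
#define DEMO_ATTR_MAX (DEMO_ATTR_SENTINEL - 1)

/* Tables indexed by attribute number need DEMO_ATTR_MAX + 1 slots. */
static const unsigned char demo_attr_len[DEMO_ATTR_MAX + 1] = {
	[DEMO_ATTR_PROTO] = 2,
	[DEMO_ATTR_TCI] = 2,
};

/* Return the expected payload length, or -1 for an out-of-range type. */
static int demo_attr_len_of(int type)
{
	if (type <= DEMO_ATTR_UNSPEC || type > DEMO_ATTR_MAX)
		return -1;
	return demo_attr_len[type];
}
```

With "+ 1" instead of "- 1", the maximum would point two past the last real attribute, so range checks and table sizes derived from it would accept attribute numbers that do not exist.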



[PATCH 0/8] Netfilter fixes for net

2017-02-23 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Revisit warning logic when not applying default helper assignment.
   Jiri Kosina considers that we are breaking existing setups and not warning
   our users accordingly now that automatic helper assignment has been
   turned off by default. So let's make him happy by emitting the warning
   when we find a helper but cannot attach it, instead of warning on the
   former deprecated behaviour. Patch from Jiri Kosina.

2) Two patches to fix a regression in the ctnetlink interfaces with
   nfnetlink_queue. Specifically, perform more relaxed checking in CTA_STATUS
   and do not bail out if CTA_HELP indicates the same helper that we
   already have. Patches from Kevin Cernekee.

3) A couple of bugfixes for ipset via Jozsef Kadlecsik. They address wrong
   index logic in hash set types and a null pointer exception in the
   list:set type.

4) hashlimit bails out even with correct userspace parameters due to wrong
   arithmetic in the code that avoids "divide by zero" when
   transforming the userspace timing in milliseconds to token credits.
   Patch from Alban Browaeys.

5) Fix incorrect NFQA_VLAN_MAX definition, patch from
   Ken-ichirou MATSUZAWA.

6) Do not declare the nfnetlink batch error list as static, since this
   may be used by several subsystems at the same time. Patch from
   Liping Zhang.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit cafe8df8b9bc9aa3dffa827c1a6757c6cd36f657:

  net: phy: Fix lack of reference count on PHY driver (2017-02-02 22:59:43 
-0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 3ef767e5cbd405abfd01339c7e5daaf98e037be2:

  Merge branch 'master' of git://blackhole.kfki.hu/nf (2017-02-21 14:01:05 
+0100)


Alban Browaeys (1):
  netfilter: xt_hashlimit: Fix integer divide round to zero.

Jiri Kosina (1):
  netfilter: nf_ct_helper: warn when not applying default helper assignment

Jozsef Kadlecsik (1):
  Fix bug: sometimes valid entries in hash:* types of sets were evicted

Ken-ichirou MATSUZAWA (1):
  netfilter: nfnetlink_queue: fix NFQA_VLAN_MAX definition

Kevin Cernekee (2):
  netfilter: ctnetlink: Fix regression in CTA_STATUS processing
  netfilter: ctnetlink: Fix regression in CTA_HELP processing

Liping Zhang (1):
  netfilter: nfnetlink: remove static declaration from err_list

Pablo Neira Ayuso (1):
  Merge branch 'master' of git://blackhole.kfki.hu/nf

Vishwanath Pai (1):
  netfilter: ipset: Null pointer exception in ipset list:set

 include/uapi/linux/netfilter/nf_conntrack_common.h |  4 ++
 include/uapi/linux/netfilter/nfnetlink_queue.h |  2 +-
 net/netfilter/ipset/ip_set_hash_gen.h  |  2 +-
 net/netfilter/ipset/ip_set_list_set.c  |  9 +++--
 net/netfilter/nf_conntrack_helper.c| 39 +---
 net/netfilter/nf_conntrack_netlink.c   | 43 +++---
 net/netfilter/nfnetlink.c  |  2 +-
 net/netfilter/xt_hashlimit.c   | 25 +
 8 files changed, 86 insertions(+), 40 deletions(-)


[PATCH 5/8] netfilter: ipset: Null pointer exception in ipset list:set

2017-02-23 Thread Pablo Neira Ayuso
From: Vishwanath Pai 

If we use before/after to add an element to an empty list it will cause
a kernel panic.

$> cat crash.restore
create a hash:ip
create b hash:ip
create test list:set timeout 5 size 4
add test b before a

$> ipset -R < crash.restore

Executing the above will crash the kernel.

Signed-off-by: Vishwanath Pai 
Reviewed-by: Josh Hunt 
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_list_set.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_list_set.c 
b/net/netfilter/ipset/ip_set_list_set.c
index 51077c53d76b..178d4eba013b 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -260,11 +260,14 @@ list_set_uadd(struct ip_set *set, void *value, const 
struct ip_set_ext *ext,
else
prev = e;
}
+
+   /* If before/after is used on an empty set */
+   if ((d->before > 0 && !next) ||
+   (d->before < 0 && !prev))
+   return -IPSET_ERR_REF_EXIST;
+
/* Re-add already existing element */
if (n) {
-   if ((d->before > 0 && !next) ||
-   (d->before < 0 && !prev))
-   return -IPSET_ERR_REF_EXIST;
if (!flag_exist)
return -IPSET_ERR_EXIST;
/* Update extensions */
-- 
2.1.4



[PATCH 1/8] netfilter: nf_ct_helper: warn when not applying default helper assignment

2017-02-23 Thread Pablo Neira Ayuso
From: Jiri Kosina 

Commit 3bb398d925 ("netfilter: nf_ct_helper: disable automatic helper
assignment") is causing behavior regressions in firewalls, as traffic
handled by conntrack helpers is now by default not passed through even
though it was before due to missing CT targets (which were not necessary
before this commit).

The default had to be switched off due to security reasons [1] [2] and
therefore should stay the way it is, but let's be friendly to firewall
admins and issue a warning the first time we're in a situation where a packet
would likely be passed through with the old default but we're likely going
to drop it on the floor now.

Rewrite the code a little bit as suggested by Linus, so that we avoid
spaghettiing the code even more -- namely the whole decision making
process regarding helper selection (either automatic or not) is being
separated, so that the whole logic can be simplified and code (condition)
duplication reduced.

[1] https://cansecwest.com/csw12/conntrack-attack.pdf
[2] https://home.regit.org/netfilter-en/secure-use-of-helpers/

Signed-off-by: Jiri Kosina 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_helper.c | 39 -
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index 7341adf7059d..6dc44d9b4190 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -188,6 +188,26 @@ nf_ct_helper_ext_add(struct nf_conn *ct,
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper_ext_add);
 
+static struct nf_conntrack_helper *
+nf_ct_lookup_helper(struct nf_conn *ct, struct net *net)
+{
+   if (!net->ct.sysctl_auto_assign_helper) {
+   if (net->ct.auto_assign_helper_warned)
+   return NULL;
+   if (!__nf_ct_helper_find(&ct->tuplehash[IP_CT_DIR_REPLY].tuple))
+   return NULL;
+   pr_info("nf_conntrack: default automatic helper assignment "
+   "has been turned off for security reasons and CT-based "
+   " firewall rule not found. Use the iptables CT target "
+   "to attach helpers instead.\n");
+   net->ct.auto_assign_helper_warned = 1;
+   return NULL;
+   }
+
+   return __nf_ct_helper_find(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+}
+
+
 int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
  gfp_t flags)
 {
@@ -213,21 +233,14 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct 
nf_conn *tmpl,
}
 
help = nfct_help(ct);
-   if (net->ct.sysctl_auto_assign_helper && helper == NULL) {
-   helper = 
__nf_ct_helper_find(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
-   if (unlikely(!net->ct.auto_assign_helper_warned && helper)) {
-   pr_info("nf_conntrack: automatic helper "
-   "assignment is deprecated and it will "
-   "be removed soon. Use the iptables CT target "
-   "to attach helpers instead.\n");
-   net->ct.auto_assign_helper_warned = true;
-   }
-   }
 
if (helper == NULL) {
-   if (help)
-   RCU_INIT_POINTER(help->helper, NULL);
-   return 0;
+   helper = nf_ct_lookup_helper(ct, net);
+   if (helper == NULL) {
+   if (help)
+   RCU_INIT_POINTER(help->helper, NULL);
+   return 0;
+   }
}
 
if (help == NULL) {
-- 
2.1.4



Re: [PATCH] uapi: fix linux/rds.h userspace compilation errors

2017-02-23 Thread Sergei Shtylyov

Hello!

On 2/23/2017 4:13 AM, Dmitry V. Levin wrote:


Consistently use types from linux/types.h to fix the following
linux/rds.h userspace compilation errors:

/usr/include/linux/rds.h:198:2: error: unknown type name 'u8'
  u8 rx_traces;
/usr/include/linux/rds.h:199:2: error: unknown type name 'u8'
  u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
/usr/include/linux/rds.h:203:2: error: unknown type name 'u8'
  u8 rx_traces;
/usr/include/linux/rds.h:204:2: error: unknown type name 'u8'
  u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
/usr/include/linux/rds.h:205:2: error: unknown type name 'u64'
  u64 rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX];

Fixes: 3289025a("RDS: add receive message trace used by application")


   Need at least 12 hex digits and a space before (.


Signed-off-by: Dmitry V. Levin 

[...]

MBR, Sergei



Re: Software loopback with phy 88E1116R and marvell MV78100 gbe

2017-02-23 Thread Paolo Minazzi
On Thu, Feb 23, 2017 at 11:59 AM, Andrew Lunn  wrote:
>> I tried to do the same thing on the 88E1116R, setting bit 14 of reg 0.
>> But if I do that I lose the link, and the test program does not work.
>> I tried to force the link in software, but it seems the controller sends
>> packets but is not able to receive them.
>> Is it possible to do such a software loopback on the 88E1116R?
>
> Hi Paolo
>
> What you are talking about here is MAC loopback. Packets are looped
> back at the MAC level. The copper side will be left idle, and so the
> link will be lost. This explains why you are seeing link down.

Hi Andrew,
if I understand correctly there are 3 types of loopback:
[1] loopback at the MAC level
[2] loopback at the PHY level
[3] loopback with a physical loopback cable

[1] is enabled by programming the Ethernet controller registers
[2] is enabled by programming the PHY
[3] is done at hardware level with a physical loopback cable

> What you might need to do is extend marvell_update_link() to check if
> MAC loopback is happening, and if so, say the link is up.

I agree. For example, some drivers (including the Marvell driver) check the
link before doing a TX. If there is no good link, the TX is dropped.
So I have to force the link up with a bit in a register (or by hacking the
driver) to permit TX.

> But this is all a bit questionable. How are you setting the PHY into
> loopback? I guess the rest of the stack has no idea this is happening,
> in particular the phy state machine. What happens when it comes out of
> loopback? How is autoneg kicked off?

I tried setting BIT14 of REG0 (LOOPBACK).
I also tried changing BIT12 of REG0 (autoneg enabled and disabled).
I tried resetting the PHY (BIT15 of REG0).
Nothing works.

I have 3 different Ethernet cards. With 2 of them I can do the software
loopback, with both MAC and PHY loopback.
With the Marvell gbe I'm not able to. There is no MAC loopback (it is not
documented). There is only PHY loopback.
The physical loopback (with a cable) works correctly.

> You might want to think about the big picture, and how this can be
> incorporated into ethtool, and phylib. MAC loopback is pretty much
> standard, so you should be able to solve this for all PHYs, not just
> Marvell.

It should be,
Paolo
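
For reference, the register 0 bits mentioned in this thread can be written down as masks. The sketch below only computes the 16-bit value that would be written; it performs no MDIO access, and the macro and function names are hypothetical. The bit positions are the ones quoted above (bit 15 reset, bit 14 loopback, bit 12 autoneg enable).

```c
#include <stdint.h>

#define REG0_RESET	(1u << 15)
#define REG0_LOOPBACK	(1u << 14)
#define REG0_ANENABLE	(1u << 12)

/* Return 'val' with loopback set and autoneg enabled or disabled. */
static uint16_t reg0_with_loopback(uint16_t val, int enable_autoneg)
{
	unsigned int v = val;

	v |= REG0_LOOPBACK;
	if (enable_autoneg)
		v |= REG0_ANENABLE;
	else
		v &= ~REG0_ANENABLE;

	return (uint16_t)v;
}

/* Return 'val' with the self-clearing reset bit set. */
static uint16_t reg0_with_reset(uint16_t val)
{
	return (uint16_t)(val | REG0_RESET);
}
```

Whether the PHY then reports link, and how the PHY state machine learns about the change, is the open question raised earlier in the thread; the masks alone do not answer it.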


[PATCH v2] uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors

2017-02-23 Thread Dmitry V. Levin
Include  in uapi/linux/seg6.h to fix the following
linux/seg6.h userspace compilation error:

/usr/include/linux/seg6.h:31:18: error: array type has incomplete element type 
'struct in6_addr'
  struct in6_addr segments[0];

Include <linux/seg6.h> in uapi/linux/seg6_iptunnel.h to fix
the following linux/seg6_iptunnel.h userspace compilation error:

/usr/include/linux/seg6_iptunnel.h:26:21: error: array type has incomplete 
element type 'struct ipv6_sr_hdr'
  struct ipv6_sr_hdr srh[0];

Fixes: a50a05f497a2 ("ipv6: sr: add missing Kbuild export for header files")
Signed-off-by: Dmitry V. Levin 
---
v2: fixed "Fixes:" line

 include/uapi/linux/seg6.h  | 1 +
 include/uapi/linux/seg6_iptunnel.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/uapi/linux/seg6.h b/include/uapi/linux/seg6.h
index 61df8d3..7278511 100644
--- a/include/uapi/linux/seg6.h
+++ b/include/uapi/linux/seg6.h
@@ -15,6 +15,7 @@
 #define _UAPI_LINUX_SEG6_H
 
 #include <linux/types.h>
+#include <linux/in6.h> /* For struct in6_addr. */
 
 /*
  * SRH
diff --git a/include/uapi/linux/seg6_iptunnel.h b/include/uapi/linux/seg6_iptunnel.h
index 7a7183d..b6e5a0a 100644
--- a/include/uapi/linux/seg6_iptunnel.h
+++ b/include/uapi/linux/seg6_iptunnel.h
@@ -14,6 +14,8 @@
 #ifndef _UAPI_LINUX_SEG6_IPTUNNEL_H
 #define _UAPI_LINUX_SEG6_IPTUNNEL_H
 
+#include <linux/seg6.h> /* For struct ipv6_sr_hdr. */
+
 enum {
SEG6_IPTUNNEL_UNSPEC,
SEG6_IPTUNNEL_SRH,
-- 
ldv


[PATCH v2] uapi: fix linux/rds.h userspace compilation errors

2017-02-23 Thread Dmitry V. Levin
Consistently use types from linux/types.h to fix the following
linux/rds.h userspace compilation errors:

/usr/include/linux/rds.h:198:2: error: unknown type name 'u8'
  u8 rx_traces;
/usr/include/linux/rds.h:199:2: error: unknown type name 'u8'
  u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
/usr/include/linux/rds.h:203:2: error: unknown type name 'u8'
  u8 rx_traces;
/usr/include/linux/rds.h:204:2: error: unknown type name 'u8'
  u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
/usr/include/linux/rds.h:205:2: error: unknown type name 'u64'
  u64 rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX];

Fixes: 3289025aedc0 ("RDS: add receive message trace used by application")
Signed-off-by: Dmitry V. Levin 
Acked-by: Santosh Shilimkar 
---
v2: fixed "Fixes:" line

 include/uapi/linux/rds.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index 47c03ca..198892b 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -195,14 +195,14 @@ enum rds_message_rxpath_latency {
 };
 
 struct rds_rx_trace_so {
-   u8 rx_traces;
-   u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
+   __u8 rx_traces;
+   __u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
 };
 
 struct rds_cmsg_rx_trace {
-   u8 rx_traces;
-   u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
-   u64 rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX];
+   __u8 rx_traces;
+   __u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
+   __u64 rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX];
 };
 
 /*
-- 
ldv
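
As a rough userspace illustration of why the double-underscore types
matter: linux/types.h exports __u8 and __u64 to userspace, whereas plain
u8 and u64 exist only inside the kernel. The struct and constant below are
hypothetical stand-ins, not the real definitions from linux/rds.h.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/types.h>

/* Hypothetical stand-in for RDS_MSG_RX_DGRAM_TRACE_MAX. */
#define DEMO_TRACE_MAX 3

/* Same shape as the patched struct, built only from exported types. */
struct demo_rx_trace {
	__u8  rx_traces;
	__u8  rx_trace_pos[DEMO_TRACE_MAX];
	__u64 rx_trace[DEMO_TRACE_MAX];
};

/* Number of trace slots that are safe to read, clamped to the array size. */
static unsigned int demo_trace_slots(const struct demo_rx_trace *t)
{
	assert(t != NULL);
	return t->rx_traces < DEMO_TRACE_MAX ? t->rx_traces : DEMO_TRACE_MAX;
}
```

This compiles in userspace only because every member type comes from
linux/types.h.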


Re: netfilter: nft_ct: add zone id set support

2017-02-23 Thread Florian Westphal
Geert Uytterhoeven  wrote:
> On Wed, Feb 22, 2017 at 8:02 PM, Linux Kernel Mailing List
>  wrote:
> > Web:
> > https://git.kernel.org/torvalds/c/edee4f1e92458299505ff007733f676b00c516a1
> > Commit: edee4f1e92458299505ff007733f676b00c516a1
> > Parent: 5c178d81b69f08ca3195427a6ea9a46d9af23127
> > Refname:refs/heads/master
> > Author: Florian Westphal 
> > AuthorDate: Fri Feb 3 13:35:50 2017 +0100
> > Committer:  Pablo Neira Ayuso 
> > CommitDate: Wed Feb 8 14:16:23 2017 +0100
> >
> Unlike for the other cases of the switch statement, "len" is not initialized
> here...
> 
> > +   break;
> > priv->sreg = nft_parse_register(tb[NFTA_CT_SREG]);
> > err = nft_validate_register_load(priv->sreg, len);
> 
> ... and used here, which may lead to spurious failures of
> nft_validate_register_load().

Yes, Dan reported this and a patch is queued at
http://patchwork.ozlabs.org/patch/727573/

Pablo, any reason why this is still waiting?
Do you want me to run more tests?
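
For readers following along, the class of bug Geert describes (one switch
case that leaves "len" unset before it is used) can be sketched like this.
All names and lengths below are hypothetical; this is not the nft_ct code
and not the queued fix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum demo_key { DEMO_KEY_MARK, DEMO_KEY_ZONE, DEMO_KEY_OTHER };

/* Store the register load length for a key; returns 0 or -EOPNOTSUPP. */
static int demo_key_len(enum demo_key key, unsigned int *len)
{
	assert(len != NULL);

	switch (key) {
	case DEMO_KEY_MARK:
		*len = 4;	/* illustrative: a 32-bit value */
		break;
	case DEMO_KEY_ZONE:
		*len = 2;	/* illustrative: the assignment that is easy to forget */
		break;
	default:
		return -EOPNOTSUPP;
	}
	return 0;
}
```

Every path that returns 0 assigns *len first, so a later validation step
never sees an indeterminate value.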



Re: [PATCH net 3/6] net/mlx5e: Do not reduce LRO WQE size when not using build_skb

2017-02-23 Thread Saeed Mahameed
On Wed, Feb 22, 2017 at 9:45 PM, Alexei Starovoitov
 wrote:
> On Wed, Feb 22, 2017 at 7:20 AM, Saeed Mahameed  wrote:
>> From: Tariq Toukan 
>>
>> When rq_type is Striding RQ, no room of SKB_RESERVE is needed
>> as SKB allocation is not done via build_skb.
>>
>> Fixes: e4b85508072b ("net/mlx5e: Slightly reduce hardware LRO size")
>> Signed-off-by: Tariq Toukan 
>> Signed-off-by: Saeed Mahameed 
>
> why this one is a bug fix?
> Sound like an optimization from commit log.

It is a regression since e4b85508072b ("net/mlx5e: Slightly reduce hardware LRO size"),
and because of it we see a small drop in HW LRO performance.
For the striding RQ case, we just restored the LRO size to what it was
before the offending patch.


Re: create drivers/net/mdio and move mdio drivers into it

2017-02-23 Thread Andrew Lunn
On Thu, Feb 23, 2017 at 09:51:17AM +, YUAN Linyu wrote:
> 
> 
> > -Original Message-
> > From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> > On Behalf Of Andrew Lunn
> > Sent: Thursday, February 23, 2017 5:30 PM
> > To: YUAN Linyu
> > Cc: Florian Fainelli; David S . Miller; netdev@vger.kernel.org; 
> > cug...@163.com
> > Subject: Re: create drivers/net/mdio and move mdio drivers into it
> > 
> > > Big picture is we can remove struct mii_bus,
> > 
> > So if you remove this, how do you represent MII as a bus? It is a bus,
> > clause 22 allows up to 32 devices on it, and i have boards with more
> > than 8 devices on the bus. Clause 45 allows many more devices on the
> > bus.
> > 
> add a phy device list to mdio_device.

Hi Yuan

And where do you put the bus name? The bus operations? The mutex for
accessing devices on the bus? The bus is a device, so you need a
device structure somewhere, to make it part of the device
hierarchy. Where do you put that?

I think it is time to stop this discussion; you need to submit
patches we can review, showing how you think this would work. That will
make you look at the details.

Andrew
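
To make the questions above concrete, here is a small sketch of the state
a bus object has to carry: a name, the bus operations, a lock, and the
devices by address. It is a userspace toy with hypothetical names, not
struct mii_bus; the int "locked" field merely stands in for a mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_ADDR 32	/* clause 22: up to 32 devices on one bus */

struct demo_mdio_dev {
	int addr;
};

struct demo_bus {
	const char *name;	/* bus name */
	int (*read)(struct demo_bus *bus, int addr, int reg);	/* bus operations */
	int (*write)(struct demo_bus *bus, int addr, int reg, int val);
	int locked;		/* stand-in for the mutex guarding bus access */
	struct demo_mdio_dev *devs[DEMO_MAX_ADDR];	/* devices by address */
};

/* Attach a device at its address; rejects bad or occupied addresses. */
static int demo_bus_add(struct demo_bus *bus, struct demo_mdio_dev *dev)
{
	assert(bus != NULL && dev != NULL);

	if (dev->addr < 0 || dev->addr >= DEMO_MAX_ADDR)
		return -EINVAL;
	if (bus->devs[dev->addr])
		return -EBUSY;
	bus->devs[dev->addr] = dev;
	return 0;
}
```

A bare device list would cover only the last member; the other members
still need a home somewhere.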


Re: netfilter: nft_ct: add zone id set support

2017-02-23 Thread Pablo Neira Ayuso
On Thu, Feb 23, 2017 at 12:34:35PM +0100, Florian Westphal wrote:
> Geert Uytterhoeven  wrote:
> > On Wed, Feb 22, 2017 at 8:02 PM, Linux Kernel Mailing List
> >  wrote:
> > > Web:
> > > https://git.kernel.org/torvalds/c/edee4f1e92458299505ff007733f676b00c516a1
> > > Commit: edee4f1e92458299505ff007733f676b00c516a1
> > > Parent: 5c178d81b69f08ca3195427a6ea9a46d9af23127
> > > Refname:refs/heads/master
> > > Author: Florian Westphal 
> > > AuthorDate: Fri Feb 3 13:35:50 2017 +0100
> > > Committer:  Pablo Neira Ayuso 
> > > CommitDate: Wed Feb 8 14:16:23 2017 +0100
> > >
> > Unlike for the other cases of the switch statement, "len" is not initialized
> > here...
> > 
> > > +   break;
> > > priv->sreg = nft_parse_register(tb[NFTA_CT_SREG]);
> > > err = nft_validate_register_load(priv->sreg, len);
> > 
> > ... and used here, which may lead to spurious failures of
> > nft_validate_register_load().
> 
> Yes, Dan reported this and a patch is queued at
> http://patchwork.ozlabs.org/patch/727573/
> 
> Pablo, any reason why this is still waiting?

I'm just flushing out my nf.git tree via pull request.

Once these changes are pulled, I'll fetch recent net-next changes that
were just merged via net. Then, I'll pick this so we can calm down
these compilation warnings.

Are you OK with this procedure? Thanks!


Re: netfilter: nft_ct: add zone id set support

2017-02-23 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> On Thu, Feb 23, 2017 at 12:34:35PM +0100, Florian Westphal wrote:
> > Yes, Dan reported this and a patch is queued at
> > http://patchwork.ozlabs.org/patch/727573/
> > 
> > Pablo, any reason why this is still waiting?
> 
> I'm just flushing out my nf.git tree via pull request.
> 
> Once these changes are pulled, I'll fetch recent net-next changes that
> were just merged via net. Then, I'll pick this so we can calm down
> these compilation warnings.
> 
> Are you OK with this procedure? Thanks!

Sure.


Re: [PATCH] uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors

2017-02-23 Thread Sergei Shtylyov

On 2/23/2017 4:13 AM, Dmitry V. Levin wrote:


Include <linux/in6.h> in uapi/linux/seg6.h to fix the following
linux/seg6.h userspace compilation error:

/usr/include/linux/seg6.h:31:18: error: array type has incomplete element type 
'struct in6_addr'
  struct in6_addr segments[0];

Include <linux/seg6.h> in uapi/linux/seg6_iptunnel.h to fix
the following linux/seg6_iptunnel.h userspace compilation error:

/usr/include/linux/seg6_iptunnel.h:26:21: error: array type has incomplete 
element type 'struct ipv6_sr_hdr'
  struct ipv6_sr_hdr srh[0];

Fixes: a50a05f4("ipv6: sr: add missing Kbuild export for header files")


   12 digits and space before ( as well.


Signed-off-by: Dmitry V. Levin 

[...]

MBR, Sergei
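
The format requested in the review (12 hex digits, then a space, then the
parenthesised subject) could be checked mechanically along these lines.
The function name is hypothetical, and the check deliberately accepts only
the exact 12-digit form mentioned above.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if line looks like: Fixes: <12 hex digits> ("subject") */
static int demo_fixes_tag_ok(const char *line)
{
	static const char prefix[] = "Fixes: ";
	size_t i;

	assert(line != NULL);

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return 0;
	line += sizeof(prefix) - 1;

	/* Stops at the first non-hex character, including the terminating NUL. */
	for (i = 0; i < 12; i++)
		if (!isxdigit((unsigned char)line[i]))
			return 0;

	/* Exactly one space, then the quoted subject in parentheses. */
	return strncmp(line + 12, " (\"", 3) == 0;
}
```

With this check the v1 tag is rejected and the v2 tag is accepted.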



[PATCH iproute2 master] {f,m}_bpf: dump tag over insns

2017-02-23 Thread Daniel Borkmann
We already export TCA_BPF_TAG resp. TCA_ACT_BPF_TAG from kernel commit
f1f7714ea51c ("bpf: rework prog_digest into prog_tag"), thus also dump
it when filter/actions are shown.

Signed-off-by: Daniel Borkmann 
---
 tc/f_bpf.c | 9 +
 tc/m_bpf.c | 9 +
 2 files changed, 18 insertions(+)

diff --git a/tc/f_bpf.c b/tc/f_bpf.c
index c4764d8..df8a259 100644
--- a/tc/f_bpf.c
+++ b/tc/f_bpf.c
@@ -216,6 +216,15 @@ static int bpf_print_opt(struct filter_util *qu, FILE *f,
bpf_print_ops(f, tb[TCA_BPF_OPS],
  rta_getattr_u16(tb[TCA_BPF_OPS_LEN]));
 
+   if (tb[TCA_BPF_TAG]) {
+   SPRINT_BUF(b);
+
+   fprintf(f, "tag %s ",
+   hexstring_n2a(RTA_DATA(tb[TCA_BPF_TAG]),
+ RTA_PAYLOAD(tb[TCA_BPF_TAG]),
+ b, sizeof(b)));
+   }
+
if (tb[TCA_BPF_POLICE]) {
fprintf(f, "\n");
tc_print_police(f, tb[TCA_BPF_POLICE]);
diff --git a/tc/m_bpf.c b/tc/m_bpf.c
index f043ae4..1ddc334 100644
--- a/tc/m_bpf.c
+++ b/tc/m_bpf.c
@@ -177,6 +177,15 @@ static int bpf_print_opt(struct action_util *au, FILE *f, struct rtattr *arg)
fprintf(f, " ");
}
 
+   if (tb[TCA_ACT_BPF_TAG]) {
+   SPRINT_BUF(b);
+
+   fprintf(f, "tag %s ",
+   hexstring_n2a(RTA_DATA(tb[TCA_ACT_BPF_TAG]),
+ RTA_PAYLOAD(tb[TCA_ACT_BPF_TAG]),
+ b, sizeof(b)));
+   }
+
fprintf(f, "default-action %s\n", action_n2a(parm->action));
fprintf(f, "\tindex %u ref %d bind %d", parm->index, parm->refcnt,
parm->bindcnt);
-- 
1.9.3
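
For illustration, turning a binary tag into the hex string that gets
printed can be sketched as below. demo_hex_n2a() is a hypothetical helper
written for this note; it only mirrors the argument order used in the
patch and is not iproute2's hexstring_n2a().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write len bytes of data as lowercase hex into buf and return buf.
 * Output is truncated to whole bytes if buf is too small.
 */
static char *demo_hex_n2a(const unsigned char *data, size_t len,
			  char *buf, size_t blen)
{
	size_t i;

	assert(buf != NULL && blen > 0);
	buf[0] = '\0';

	/* Each byte needs two characters plus room for the final NUL. */
	for (i = 0; i < len && 2 * i + 3 <= blen; i++)
		snprintf(buf + 2 * i, blen - 2 * i, "%02x", data[i]);
	return buf;
}
```

The caller supplies the scratch buffer, as the SPRINT_BUF(b) declaration
does in the patch.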



[PATCH net] sctp: deny peeloff operation on asocs with threads sleeping on it

2017-02-23 Thread Marcelo Ricardo Leitner
commit 2dcab5984841 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
attempted to avoid a BUG_ON call when the association being used for a
sendmsg() is blocked waiting for more sndbuf and another thread did a
peeloff operation on such asoc, moving it to another socket.

As Ben Hutchings noticed, in such a case it would return without
re-locking the socket, which would cause two unlocks in a row.

Further analysis also revealed that it could allow a double free if the
application managed to peeloff the asoc that is created during the
sendmsg call, because then sctp_sendmsg() would try to free the asoc
that was created only for that call.

This patch takes another approach. It will deny the peeloff operation
if there is a thread sleeping on the asoc, so this situation doesn't
exist anymore. This avoids the issues described above and also honors
the syscalls that are already being handled (it can be multiple sendmsg
calls).

Joint work with Xin Long.

Fixes: 2dcab5984841 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
Cc: Alexander Popov 
Cc: Ben Hutchings 
Signed-off-by: Marcelo Ricardo Leitner 
Signed-off-by: Xin Long 
---
Hi, please consider this one for -stable too. Thanks

 net/sctp/socket.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1b5d669e30292a57ed57dd920d81be2a57f97b22..d04a8b66098c8a574642b026bff990ac64c21468 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4734,6 +4734,12 @@ int sctp_do_peeloff(struct sock *sk, sctp_assoc_t id, struct socket **sockp)
if (!asoc)
return -EINVAL;
 
+   /* If there is a thread waiting on more sndbuf space for
+* sending on this asoc, it cannot be peeled.
+*/
+   if (waitqueue_active(&asoc->wait))
+   return -EBUSY;
+
/* An association cannot be branched off from an already peeled-off
 * socket, nor is this supported for tcp style sockets.
 */
@@ -7426,8 +7432,6 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
 */
release_sock(sk);
current_timeo = schedule_timeout(current_timeo);
-   if (sk != asoc->base.sk)
-   goto do_error;
lock_sock(sk);
 
*timeo_p = current_timeo;
-- 
2.9.3
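
The approach taken by the patch (refuse the peeloff while someone sleeps
on the association) can be modelled in a few lines of userspace C. The
struct and the waiter counter are hypothetical; in the kernel the check is
waitqueue_active(&asoc->wait), as shown in the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_asoc {
	int waiters;	/* threads sleeping on this association */
	int peeled;	/* set once the association has been moved */
};

/* Move the association to a new socket unless a sender sleeps on it. */
static int demo_peeloff(struct demo_asoc *asoc)
{
	if (!asoc)
		return -EINVAL;
	assert(asoc->waiters >= 0);

	if (asoc->waiters > 0)
		return -EBUSY;	/* a sender is asleep on it: refuse */

	asoc->peeled = 1;
	return 0;
}
```

Because the association never changes socket under a sleeping sender, the
sender needs no special case after it wakes up.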



[PATCH] RDS: IB: fix ifnullfree.cocci warnings

2017-02-23 Thread kbuild test robot
net/rds/ib.c:115:2-7: WARNING: NULL check before freeing functions like kfree, 
debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe 
consider reorganizing relevant code to avoid passing NULL values.

 NULL check before some freeing functions is not needed.

 Based on checkpatch warning
 "kfree(NULL) is safe this check is probably not required"
 and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

Signed-off-by: Fengguang Wu 
---

 ib.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -111,8 +111,7 @@ static void rds_ib_dev_free(struct work_
kfree(i_ipaddr);
}
 
-   if (rds_ibdev->vector_load)
-   kfree(rds_ibdev->vector_load);
+   kfree(rds_ibdev->vector_load);
 
kfree(rds_ibdev);
 }
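
The same rule holds for free() in userspace C, where free(NULL) is
defined to do nothing. A minimal sketch with hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>

struct demo_ibdev {
	int *vector_load;	/* may legitimately be NULL */
};

/* Release a device; returns NULL so callers can clear their pointer. */
static struct demo_ibdev *demo_dev_free(struct demo_ibdev *dev)
{
	assert(dev != NULL);

	free(dev->vector_load);	/* free(NULL) is a no-op, so no check needed */
	free(dev);
	return NULL;
}
```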


[PATCH net] vxlan: correctly validate VXLAN ID against VXLAN_VID_MASK

2017-02-23 Thread Matthias Schiffer
The incorrect check caused an off-by-one error: the maximum VID 0xffffff
was unusable.

Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
Signed-off-by: Matthias Schiffer 
---
 drivers/net/vxlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 556953f53437..f89428fb7389 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2675,7 +2675,7 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
 
if (data[IFLA_VXLAN_ID]) {
__u32 id = nla_get_u32(data[IFLA_VXLAN_ID]);
-   if (id >= VXLAN_VID_MASK)
+   if (id & ~VXLAN_VID_MASK)
return -ERANGE;
}
 
-- 
2.11.1
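
The difference between the two checks is easy to see in isolation. The
sketch below assumes a 24-bit mask (0xffffff); DEMO_VID_MASK and the helper
names are made up for this note.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_VID_MASK 0xffffffu	/* assumed 24-bit VXLAN identifier mask */

/* Old check: rejects everything at or above the mask, including the mask. */
static int demo_vid_ok_old(uint32_t id)
{
	return !(id >= DEMO_VID_MASK);
}

/* New check: rejects only values with bits set outside the mask. */
static int demo_vid_ok_new(uint32_t id)
{
	return !(id & ~DEMO_VID_MASK);
}

/* Boundary behaviour of both checks. */
static void demo_vid_selfcheck(void)
{
	assert(demo_vid_ok_old(0xfffffe) && demo_vid_ok_new(0xfffffe));
	assert(!demo_vid_ok_old(0xffffff));	/* the off-by-one */
	assert(demo_vid_ok_new(0xffffff));	/* maximum VID now accepted */
	assert(!demo_vid_ok_old(0x1000000) && !demo_vid_ok_new(0x1000000));
}
```

The two checks disagree only for the maximum value itself.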



[PATCH] net: stmmac: unify registers dumps methods

2017-02-23 Thread Corentin Labbe
The stmmac driver has two methods for register dumps: via ethtool and
at init (if NETIF_MSG_HW is enabled).

It is better to keep only one method, ethtool, since the other was ugly.

This patch converts all dump_regs() functions from "printing regs" to
"fill the reg_space used by ethtool".

Signed-off-by: Corentin Labbe 
Acked-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/common.h   |  4 +-
 .../net/ethernet/stmicro/stmmac/dwmac1000_core.c   | 10 +--
 .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c| 16 ++---
 .../net/ethernet/stmicro/stmmac/dwmac100_core.c| 30 +++--
 drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c | 15 ++---
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c  | 12 +---
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c   | 78 +++---
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   | 21 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  5 --
 9 files changed, 71 insertions(+), 120 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index daaafa9..7552775 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -416,7 +416,7 @@ struct stmmac_dma_ops {
/* Configure the AXI Bus Mode Register */
void (*axi)(void __iomem *ioaddr, struct stmmac_axi *axi);
/* Dump DMA registers */
-   void (*dump_regs) (void __iomem *ioaddr);
+   void (*dump_regs)(void __iomem *ioaddr, u32 *reg_space);
/* Set tx/rx threshold in the csr6 register
 * An invalid value enables the store-and-forward mode */
void (*dma_mode)(void __iomem *ioaddr, int txmode, int rxmode,
@@ -456,7 +456,7 @@ struct stmmac_ops {
/* Enable RX Queues */
void (*rx_queue_enable)(struct mac_device_info *hw, u32 queue);
/* Dump MAC registers */
-   void (*dump_regs)(struct mac_device_info *hw);
+   void (*dump_regs)(struct mac_device_info *hw, u32 *reg_space);
/* Handle extra events on specific interrupts hw dependent */
int (*host_irq_status)(struct mac_device_info *hw,
   struct stmmac_extra_stats *x);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index 91c8926..19b9b30 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -92,17 +92,13 @@ static int dwmac1000_rx_ipc_enable(struct mac_device_info *hw)
return !!(value & GMAC_CONTROL_IPC);
 }
 
-static void dwmac1000_dump_regs(struct mac_device_info *hw)
+static void dwmac1000_dump_regs(struct mac_device_info *hw, u32 *reg_space)
 {
void __iomem *ioaddr = hw->pcsr;
int i;
-   pr_info("\tDWMAC1000 regs (base addr = 0x%p)\n", ioaddr);
 
-   for (i = 0; i < 55; i++) {
-   int offset = i * 4;
-   pr_info("\tReg No. %d (offset 0x%x): 0x%08x\n", i,
-   offset, readl(ioaddr + offset));
-   }
+   for (i = 0; i < 55; i++)
+   reg_space[i] = readl(ioaddr + i * 4);
 }
 
 static void dwmac1000_set_umac_addr(struct mac_device_info *hw,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
index fbaec0f..d3654a4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
@@ -201,18 +201,14 @@ static void dwmac1000_dma_operation_mode(void __iomem *ioaddr, int txmode,
writel(csr6, ioaddr + DMA_CONTROL);
 }
 
-static void dwmac1000_dump_dma_regs(void __iomem *ioaddr)
+static void dwmac1000_dump_dma_regs(void __iomem *ioaddr, u32 *reg_space)
 {
int i;
-   pr_info(" DMA registers\n");
-   for (i = 0; i < 22; i++) {
-   if ((i < 9) || (i > 17)) {
-   int offset = i * 4;
-   pr_err("\t Reg No. %d (offset 0x%x): 0x%08x\n", i,
-  (DMA_BUS_MODE + offset),
-  readl(ioaddr + DMA_BUS_MODE + offset));
-   }
-   }
+
+   for (i = 0; i < 22; i++)
+   if ((i < 9) || (i > 17))
+   reg_space[DMA_BUS_MODE / 4 + i] =
+   readl(ioaddr + DMA_BUS_MODE + i * 4);
 }
 
 static void dwmac1000_get_hw_feature(void __iomem *ioaddr,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index 8ab5189..e370cce 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -40,28 +40,18 @@ static void dwmac100_core_init(struct mac_device_info *hw, int mtu)
 #endif
 }
 
-static void dwmac100_dump_mac_regs(struct mac_device_info *hw)
+static void dwmac100_dump_mac_regs(struct mac_device_info *hw, u32 *reg_space)
 {
vo
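
The conversion pattern visible in the hunks above (fill a caller-supplied
reg_space instead of printing) can be sketched in userspace as follows. A
plain array stands in for the readl() accesses, and DEMO_DMA_BASE_WORD is a
hypothetical word offset, not the driver's DMA_BUS_MODE / 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_DMA_BASE_WORD 8	/* hypothetical start of the DMA block, in words */
#define DEMO_DMA_NREGS 22

/* Copy DMA registers 0..8 and 18..21 into reg_space, skipping 9..17. */
static void demo_dump_dma_regs(const uint32_t *dma_regs, uint32_t *reg_space)
{
	int i;

	assert(dma_regs != NULL && reg_space != NULL);

	for (i = 0; i < DEMO_DMA_NREGS; i++)
		if (i < 9 || i > 17)
			reg_space[DEMO_DMA_BASE_WORD + i] = dma_regs[i];
}
```

Skipped registers leave their reg_space slots untouched, so the caller
should zero the buffer before the dump.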

Re: Software loopback with phy 88E1116R and marvell MV78100 gbe

2017-02-23 Thread Andrew Lunn
On Thu, Feb 23, 2017 at 12:28:10PM +0100, Paolo Minazzi wrote:
> On Thu, Feb 23, 2017 at 11:59 AM, Andrew Lunn  wrote:
> >> I tried to do the same thing on the 88E1116R, setting bit 14 of reg 0.
> >> But if I do it I lose the link, and the test program does not work.
> >> I tried to force the link in software, but it seems the controller sends
> >> packets but is not able to receive them.
> >> Is it possible to do such a software loopback on the 88E1116R?
> >
> > Hi Paolo
> >
> > What you are talking about here is MAC loopback. Packets are looped
> > back at the MAC level. The copper side will be left idle, and so the
> > link will be lost. This explains why you are seeing link down..
> 
> Hi Andrew,
> if I understand correctly, there are 3 types of loopback:
> [1] loopback at the MAC level
> [2] loopback at the phy level
> [3] loopback with a physical loopback cable

Plus there is [4], loopback in the other direction, i.e. everything
received on the copper is sent straight back out the copper.

The PHY datasheet often calls [2] above MAC loopback, since what it
receives from the MAC it loops back to the MAC. [4] can be called line
loopback: what comes in on the line is looped back to the line.
 
> [1] is enabled programming ethernet registers
> [2] is enabled programming the PHY
> [3] is done at hardware level with a physical loopback cable
> 
> > What you might need to do is extend marvell_update_link() to check if
> > MAC loopback is happening, and if so, say the link is up.
> 
> I agree. For example, some drivers (including the Marvell driver) check the
> link before doing a TX. If there is no good link, the TX is dropped.
> So I have to force the link up with a bit in a register (or by hacking the
> driver) to permit TX.

Try bit 10, register 16 for the Marvell PHY. This should force the
link up.
 
 Andrew


Re: [PATCH net-next] net/gtp: Add udp source port generation according to flow hash

2017-02-23 Thread Pablo Neira Ayuso
On Wed, Feb 22, 2017 at 01:47:17PM -0800, Tom Herbert wrote:
> On Wed, Feb 22, 2017 at 1:29 PM, Or Gerlitz  wrote:
> > On Thu, Feb 16, 2017 at 11:58 PM, Andreas Schultz  wrote:
> >> Hi Or,
> >> - On Feb 16, 2017, at 3:59 PM, Or Gerlitz ogerl...@mellanox.com wrote:
> >>
> >>> Generate the source udp header according to the flow represented by
> >>> the packet we are encapsulating, as done for other udp tunnels. This
> >>> helps on the receiver side to apply RSS spreading.
> >>
> >> This might work for GTPv0-U, However, for GTPv1-U this could interfere
> >> with error handling in the user space control process when the UDP port
> >> extension  header is used in error indications.
> >
> >
> > in the document you posted there's this quote "The source IP and port
> > have no meaning and can change at any time" -- I assume it refers to
> > v0? can we identify in the kernel code that we're on v0 and have the
> > patch come into play?
> >
> >> 3GPP TS 29.281 Rel 13, section 5.2.2.1 defines the UDP port extension and
> >> section 7.3.1 says that the UDP source port extension can be used to
> >> mitigate DOS attacks. This would IMHO imply that the user space control
> >> process needs to know the TEID to UDP source port mapping.
> >
> >> The other question is what this is actually hashing on. If I understand
> >> the code correctly, this will hash on the source/destination of the original
> >> flow. I would expect that a SGSN/SGW/eNodeB would like to keep flow
> >> processing on a per-TEID basis, so the port hashing should be based on the
> >> TEID.
> >
> > is it possible for packets belonging to the same TCP session or UDP
> > "pseudo session" (given pair of src/dst ip/port) to be encapsulated
> > using different TEID?
> >
> > hashing on the TEID imposes a harder requirement on the NIC HW vs.
> > just UDP based RSS.
> 
> This shouldn't be taken as a HW requirement and it's unlikely we'd add
> explicit GTP support in flow_dissector. If we can't get entropy in the
> UDP source port then IPv6 flow label is a potential alternative (so
> that should be supported in NICs for RSS).

According to the specs (TS 29.281, section 4.4.2.3, Encapsulated T-PDU):

"The UDP Source Port is a locally allocated port number at the sending
GTP-U entity."

Older specs that refer to GTP-U such as TS 09.60 and TS 29.060 also
state the same.

So Or's patch looks fine to me.


Re: [PATCH net-next] net/gtp: Add udp source port generation according to flow hash

2017-02-23 Thread Or Gerlitz

On 2/23/2017 3:49 PM, Pablo Neira Ayuso wrote:

"The UDP Source Port is a locally allocated port number at the sending
GTP-U entity."


and the proposed patch goes in that direction, agree?



Re: [PATCH net-next] net/gtp: Add udp source port generation according to flow hash

2017-02-23 Thread Pablo Neira Ayuso
On Thu, Feb 23, 2017 at 10:35:56AM +0100, Andreas Schultz wrote:
> - On Feb 22, 2017, at 10:47 PM, Tom Herbert t...@herbertland.com wrote:
[...]
> > This shouldn't be taken as a HW requirement and it's unlikely we'd add
> > explicit GTP support in flow_dissector. If we can't get entropy in the
> > UDP source port then IPv6 flow label is a potential alternative (so
> > that should be supported in NICs for RSS).
> > 
> > I'll also reiterate my previous point about the need for GTP testing--
> > in order for us to be able to evaluate the GTP datapath for things
> > like performance or how they withstand against DDOS we really need an
> > easy way to isolate the datapath.
> 
> GTP as specified is very insecure by definition. It is meant to be run
> only on *private* mobile carrier and intra mobile carrier EPC networks.
> Running it openly on the public internet would be extremely foolish.
> 
> There are some mechanisms in GTPv1-C on how to handle overload and
> more extensive mechanisms in GTPv2-C for overload handling. The basic
> guiding principle is to simply drop any traffic that it can't handle.
> 
> Anyhow, I haven't seen anything in 3GPP or GSMA documents that deals
> with DDOS.
> 
> There are guidelines like the GSMA's IR.88 that describe how the intra
> carrier roaming should work and what security measures should be
> implemented.
> 
> Traffic coming in at Gi/SGi or from the UE could create a DDOS on the tunnel.
> However, on the UE side you still have the RAN (eNODE, SGSN, S-GW) or
> an ePDG that has to apply QoS and thereby limit traffic. On the Gi/SGi
> side you have the PCEF that does the same.
> 
> So in a complete 3GPP node (GGSN, P-GW) that uses the GTP tunnel
> implementation, malicious traffic should be blocked before it can reach
> the tunnel.
> 
> And as I stated before, the GTP tunnel module is not supposed to be
> used without any of those components. So the DDOS concern should not
> be handled at the tunnel level.

I think that Tom's point is that this tunnel driver will have to deal
with DDOS scenarios anyway, because reality is that you can't always
block it before it can reach the tunnel.

Or's patch helps us deal with this scenario.


Re: [PATCH v3 net-next 08/14] mlx4: use order-0 pages for RX

2017-02-23 Thread Tariq Toukan



On 23/02/2017 4:18 AM, Alexander Duyck wrote:

On Wed, Feb 22, 2017 at 6:06 PM, Eric Dumazet  wrote:

On Wed, 2017-02-22 at 17:08 -0800, Alexander Duyck wrote:



Right but you were talking about using both halves one after the
other.  If that occurs you have nothing left that you can reuse.  That
was what I was getting at.  If you use up both halves you end up
having to unmap the page.



You must have misunderstood me.

Once we use both halves of a page, we _keep_ the page, we do not unmap
it.

We save the page pointer in a ring buffer of pages.
Call it the 'quarantine'

When we _need_ to replenish the RX desc, we take a look at the oldest
entry in the quarantine ring.

If page count is 1 (or pagecnt_bias if needed) -> we immediately reuse
this saved page.

If not, _then_ we unmap and release the page.
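
The 'quarantine' scheme described above can be modelled as a small FIFO
of parked pages whose oldest entry is reused only when its reference count
has dropped back to 1. This is a userspace toy with hypothetical names and
sizes, not mlx4 code; DMA mapping and pagecnt_bias are left out.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING_SIZE 4

struct demo_page {
	int refcount;
};

struct demo_quarantine {
	struct demo_page *slot[DEMO_RING_SIZE];
	unsigned int head;	/* index of the oldest entry */
	unsigned int count;	/* number of parked pages */
};

/* Park a fully used page at the tail; returns 0, or -1 if the ring is full. */
static int demo_quarantine_put(struct demo_quarantine *q, struct demo_page *p)
{
	assert(q != NULL && p != NULL);

	if (q->count == DEMO_RING_SIZE)
		return -1;
	q->slot[(q->head + q->count) % DEMO_RING_SIZE] = p;
	q->count++;
	return 0;
}

/*
 * Pop the oldest page. *reusable is set to 1 if we hold the only
 * reference, or 0 if the caller must unmap and release the page instead.
 */
static struct demo_page *demo_quarantine_get(struct demo_quarantine *q,
					     int *reusable)
{
	struct demo_page *p;

	assert(q != NULL && reusable != NULL);

	if (q->count == 0)
		return NULL;
	p = q->slot[q->head];
	q->head = (q->head + 1) % DEMO_RING_SIZE;
	q->count--;
	*reusable = (p->refcount == 1);
	return p;
}
```

Looking only at the oldest entry gives the stack the longest possible time
to drop its references before the page is inspected.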


Okay, that was what I was referring to when I mentioned a "hybrid
between the mlx5 and the Intel approach".  Makes sense.



Indeed, in mlx5 Striding RQ (mpwqe) we do something similar.
Our NIC (ConnectX-4 LX and newer) knows to write multiple _consecutive_ 
packets into the same RX buffer (page).

AFAIU, this is what Eric suggests to do in SW in mlx4.

Here are the main characteristics of our page-cache in mlx5:
1) FIFO (for higher chances of an available page).
2) If the page-cache head is busy, it is not freed. This has its pros 
and cons. We might reconsider.
3) Pages in cache have no page-to-WQE assignment (WQE is for Work Queue 
Element, == RX descriptor). They are shared for all WQEs of an RQ and 
might be used by different WQEs in different rounds.
4) Cache size is smaller than suggested, we would happily increase it to 
reflect a whole ring.


Still, performance tests over mlx5 show that on high load we quickly end 
up allocating pages as the stack does not release its ref count on time.

Increasing the cache size helps of course.
As there's no _fixed_ fair size that guarantees the availability of 
pages every ring cycle, reflecting a ring size can help, and would give 
the opportunity for users to tune their performance by setting their 
ring size according to how powerful their CPUs are, and what traffic 
type/load they're running.



Note that we would have received 4096 frames before looking at the page
count, so there is high chance both halves were consumed.

To recap on x86 :

2048 active pages would be visible by the device, because 4096 RX desc
would contain dma addresses pointing to the 4096 halves.

And 2048 pages would be in the reserve.


The buffer info layout for something like that would probably be
pretty interesting.  Basically you would be doubling up the ring so
that you handle 2 Rx descriptors per a single buffer info since you
would automatically know that it would be an even/odd setup in terms
of the buffer offsets.

If you get a chance to do something like that I would love to know the
result.  Otherwise if I get a chance I can try messing with i40e or
ixgbe some time and see what kind of impact it has.


The whole idea behind using only half the page per descriptor is to
allow us to loop through the ring before we end up reusing it again.
That buys us enough time that usually the stack has consumed the frame
before we need it again.



The same will happen really.

Best maybe is for me to send the patch ;)


I think I have the idea now.  However patches are always welcome..  :-)



Same here :-)


Re: [PATCH net-next] net/gtp: Add udp source port generation according to flow hash

2017-02-23 Thread Andreas Schultz
- On Feb 23, 2017, at 2:49 PM, pablo pa...@netfilter.org wrote:

> On Wed, Feb 22, 2017 at 01:47:17PM -0800, Tom Herbert wrote:
>> On Wed, Feb 22, 2017 at 1:29 PM, Or Gerlitz  wrote:
>> > On Thu, Feb 16, 2017 at 11:58 PM, Andreas Schultz  
>> > wrote:
>> >> Hi Or,
>> >> - On Feb 16, 2017, at 3:59 PM, Or Gerlitz ogerl...@mellanox.com wrote:
>> >>
>> >>> Generate the source udp header according to the flow represented by
>> >>> the packet we are encapsulating, as done for other udp tunnels. This
>> >>> helps on the receiver side to apply RSS spreading.
>> >>
>> >> This might work for GTPv0-U, However, for GTPv1-U this could interfere
>> >> with error handling in the user space control process when the UDP port
>> >> extension  header is used in error indications.
>> >
>> >
>> > in the document you posted there's this quote "The source IP and port
>> > have no meaning and can change at any time" -- I assume it refers to
>> > v0? can we identify in the kernel code that we're on v0 and have the
>> > patch come into play?
>> >
>> >> 3GPP TS 29.281 Rel 13, section 5.2.2.1 defines the UDP port extension and
>> >> section 7.3.1 says that the UDP source port extension can be used to
>> >> mitigate DOS attacks. This would IMHO imply that the user space control
>> >> process needs to know the TEID to UDP source port mapping.
>> >
>> >> The other question is what this is actually hashing on. If I understand
>> >> the code correctly, this will hash on the source/destination of the
>> >> original
>> >> flow. I would expect that a SGSN/SGW/eNodeB would like to keep flow
>> >> processing on a per-TEID basis, so the port hashing should be based on the
>> >> TEID.
>> >
>> > is it possible for packets belonging to the same TCP session or UDP
>> > "pseudo session" (given pair of src/dst ip/port) to be encapsulated
>> > using different TEID?
>> >
>> > hashing on the TEID imposes a harder requirement on the NIC HW vs.
>> > just UDP based RSS.
>> 
>> This shouldn't be taken as a HW requirement and it's unlikely we'd add
>> explicit GTP support in flow_dissector. If we can't get entropy in the
>> UDP source port then IPv6 flow label is a potential alternative (so
>> that should be supported in NICs for RSS).
> 
> According to specs, section 4.4.2.3 Encapsulated T-PDU, TS 29.281.
> 
> "The UDP Source Port is a locally allocated port number at the sending
> GTP-U entity."
> 
> Older specs that refer to GTP-U such as TS 09.60 and TS 29.060 also
> state the same.

It is absolutely valid to choose any sending port you want. I only
think you should know which port you sent on.

TS 29.281, Sect. 5.2.2.1 defines the UDP port extension to be used
in error indications. It provides the UDP source port of a G-PDU
that triggered an error.

If the send side does not know which port it uses to send, how
can it use this indication to correlate an error? That's the reason
I thought it would be better to add the UDP source port to the
PDP context and allow the control path to assign the source port
on context creation.

Of course, this header is optional and the receiving side is not
required to use it.

About the RSS spreading in the receive side, I would think that
a receiver would prefer to process all packets that belong to a
given TEID with the same receive instance. So keeping the UDP
source port for a given TEID stable would be beneficial. As far
as I understand it, the hash used in the patch uses the source
and destination values from the original flow. This would mean
that GTP packets that belong to the same TEID would end up with
different UDP source ports.

So what about this as a compromise: we add a UDP source port field
to the PDP context that defaults to 0 (zero). The control instance
can optionally initialize this field. When we hit the xmit code
and the port is non-zero, use that value; if it is zero, use the hash?

Regards
Andreas

> 
> So Or patch looks fine to me.
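
The compromise proposed in the message above can be sketched in plain C. This is a userspace illustration only, not the driver code: `struct pdp_ctx_sketch`, its `udp_src_port` field and `gtp_pick_src_port()` are hypothetical names, and scaling the hash into a port range is just one plausible way to map a 32-bit flow hash to a port.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down PDP context: only the field under discussion. */
struct pdp_ctx_sketch {
	uint16_t udp_src_port;	/* host byte order; 0 = not set by the control plane */
};

/*
 * Pick the UDP source port for an outgoing G-PDU.
 * A non-zero port configured on the context wins; otherwise the 32-bit
 * flow hash is scaled into the range [min, max).
 */
static uint16_t gtp_pick_src_port(const struct pdp_ctx_sketch *pctx,
				  uint32_t flow_hash,
				  uint16_t min, uint16_t max)
{
	if (pctx->udp_src_port != 0)
		return pctx->udp_src_port;

	return (uint16_t)((((uint64_t)flow_hash * (uint32_t)(max - min)) >> 32) + min);
}

/* Usage: a pinned port is returned as-is, an unset one falls back to the hash. */
void gtp_pick_src_port_example(void)
{
	struct pdp_ctx_sketch pinned = { .udp_src_port = 40000 };
	struct pdp_ctx_sketch unset = { .udp_src_port = 0 };

	assert(gtp_pick_src_port(&pinned, 0x12345678u, 32768, 61000) == 40000);
	assert(gtp_pick_src_port(&unset, 0u, 32768, 61000) == 32768);
	assert(gtp_pick_src_port(&unset, 0xffffffffu, 32768, 61000) == 60999);
}
```

With a pinned port the control process knows what to expect in an error indication; with the field left at zero the sender keeps the entropy that RSS on the receiver wants.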


ip route get oddly fails for ipv6

2017-02-23 Thread Leon Goldberg
Hey,

For some reason ip route get fails to retrieve ipv6 routes, although
the subject destination is indeed reachable:

[root@localhost tests]# ip -6 route
2001:1::/64 dev eth1  proto kernel  metric 256
fe80::/64 dev eth0  proto kernel  metric 256
fe80::/64 dev eth1  proto kernel  metric 256

[root@localhost tests]# ping6 2001:1::1
PING 2001:1::1(2001:1::1) 56 data bytes
64 bytes from 2001:1::1: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 2001:1::1: icmp_seq=2 ttl=64 time=0.055 ms
64 bytes from 2001:1::1: icmp_seq=3 ttl=64 time=0.055 ms
^C
--- 2001:1::1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.035/0.048/0.055/0.011 ms

[root@localhost tests]# ip route get 2001:1::1
[root@localhost tests]#
[root@localhost tests]# ip -6 route get 2001:1::1
[root@localhost tests]#

ipv4 ip route get works fine. Any hints?

Thanks,
Leon


Re: ip route get oddly fails for ipv6

2017-02-23 Thread Leon Goldberg
Forgot to mention:

[root@localhost tests]# uname -r
3.10.0-514.2.2.el7.x86_64
[root@localhost tests]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)


On Thu, Feb 23, 2017 at 4:43 PM, Leon Goldberg  wrote:
> Hey,
>
> For some reason ip route get fails to retrieve ipv6 routes, although
> the subject destination is indeed reachable:
>
> [root@localhost tests]# ip -6 route
> 2001:1::/64 dev eth1  proto kernel  metric 256
> fe80::/64 dev eth0  proto kernel  metric 256
> fe80::/64 dev eth1  proto kernel  metric 256
>
> [root@localhost tests]# ping6 2001:1::1
> PING 2001:1::1(2001:1::1) 56 data bytes
> 64 bytes from 2001:1::1: icmp_seq=1 ttl=64 time=0.035 ms
> 64 bytes from 2001:1::1: icmp_seq=2 ttl=64 time=0.055 ms
> 64 bytes from 2001:1::1: icmp_seq=3 ttl=64 time=0.055 ms
> ^C
> --- 2001:1::1 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2000ms
> rtt min/avg/max/mdev = 0.035/0.048/0.055/0.011 ms
>
> [root@localhost tests]# ip route get 2001:1::1
> [root@localhost tests]#
> [root@localhost tests]# ip -6 route get 2001:1::1
> [root@localhost tests]#
>
> ipv4 ip route get works fine. Any hints?
>
> Thanks,
> Leon


Re: [PATCH net v5] bpf: add helper to compare network namespaces

2017-02-23 Thread Eric W. Biederman
David Ahern  writes:

> On 2/19/17 9:17 PM, Eric W. Biederman wrote:
>>>> @@ -2597,6 +2598,39 @@ static const struct bpf_func_proto bpf_xdp_event_output_proto = {
>>>>  	.arg5_type  = ARG_CONST_STACK_SIZE,
>>>>  };
>>>>
>>>> +BPF_CALL_3(bpf_sk_netns_cmp, struct sock *, sk, u64, ns_dev, u64, ns_ino)
>>>> +{
>>>> +	return netns_cmp(sock_net(sk), ns_dev, ns_ino);
>>>> +}
>>>
>>> Is there anything that speaks against doing the comparison itself
>>> outside of the helper? Meaning, the helper would get a buffer
>>> passed from stack f.e. struct foo { u64 ns_dev; u64 ns_ino; }
>>> and fills both out with the netns info belonging to the sk/skb.
>> 
>> Yes.  The dev/ino pair is not necessarily unique so it is not at all
>> clear that the returned value would be what the program is expecting.
>
> How does the comparison inside a helper change the fact that a dev and
> inode number are compared? ie., inside or outside of a helper, the end
> result is that a bpf program has a dev/inode pair that is compared to
> that of a socket or skb.

With the comparison inside a helper, if the kernel has more than one
dev+inode that maps to the same network namespace (as we had just
recently until the inodes were moved from proc to nsfs) then the helper
can look up the dev+inode, see which network namespace it maps
to, and then compare network namespaces.  So logically the helper really
is doing more than comparing dev+inode.

With the helper doing the comparison the kernel implementation details
can change and everything will continue to work.
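
A minimal userspace sketch of that argument, under stated assumptions: the alias table, `ns_lookup()` and `netns_cmp_sketch()` are hypothetical and only model the idea that several dev+inode pairs may name one namespace; the device and inode numbers are arbitrary. It is not the kernel's netns_cmp().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a network namespace object. */
struct netns_sketch {
	int id;
};

/* One (dev, ino) alias naming a namespace; several may name the same one. */
struct ns_alias {
	uint64_t dev;
	uint64_t ino;
	const struct netns_sketch *ns;
};

/* Resolve a (dev, ino) pair to the namespace it names, or NULL if unknown. */
static const struct netns_sketch *ns_lookup(const struct ns_alias *tbl, size_t n,
					    uint64_t dev, uint64_t ino)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].dev == dev && tbl[i].ino == ino)
			return tbl[i].ns;
	return NULL;
}

/* Helper-style comparison: resolve first, then compare namespace identity. */
static bool netns_cmp_sketch(const struct ns_alias *tbl, size_t n,
			     const struct netns_sketch *sk_ns,
			     uint64_t dev, uint64_t ino)
{
	const struct netns_sketch *ns = ns_lookup(tbl, n, dev, ino);

	return ns != NULL && ns == sk_ns;
}

void netns_cmp_example(void)
{
	static const struct netns_sketch init_ns = { 1 }, other_ns = { 2 };
	/* Two aliases for init_ns, e.g. one per filesystem exposing it. */
	const struct ns_alias tbl[] = {
		{ 3, 4026531957u, &init_ns },
		{ 4, 4026531957u, &init_ns },
		{ 4, 4026532000u, &other_ns },
	};

	assert(netns_cmp_sketch(tbl, 3, &init_ns, 3, 4026531957u));
	assert(netns_cmp_sketch(tbl, 3, &init_ns, 4, 4026531957u));
	assert(!netns_cmp_sketch(tbl, 3, &init_ns, 4, 4026532000u));
}
```

A program that only received one dev+inode pair and compared the raw numbers itself would miss the second alias; the helper hides that detail.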

> Ideally, it would be nice to have a bpf equivalent to net_eq(), but it
> is not possible from a practical perspective to have bpf programs load a
> namespace reference (address really) from a given pid or fd.

Which is why I am not at all keen on support for maps etc.  It is not
clear how to do something more elegant.

If there was an environmental restriction on the bpf program where we
knew all references had to be from the perspective of the initial set of
namespaces there would be a unique dev+inode we could deal with.  But
again that obvious solution that works so often elsewhere appears to be
a non-starter here.

Eric



Re: [PATCH 00/35] treewide trivial patches converting pr_warning to pr_warn

2017-02-23 Thread Rob Herring
On Fri, Feb 17, 2017 at 1:11 AM, Joe Perches  wrote:
> There are ~4300 uses of pr_warn and ~250 uses of the older
> pr_warning in the kernel source tree.
>
> Make the use of pr_warn consistent across all kernel files.
>
> This excludes all files in tools/ as there is a separate
> define pr_warning for that directory tree and pr_warn is
> not used in tools/.
>
> Done with 'sed s/\bpr_warning\b/pr_warn/' and some emacsing.
>
> Miscellanea:
>
> o Coalesce formats and realign arguments
>
> Some files not compiled - no cross-compilers
>
> Joe Perches (35):
>   alpha: Convert remaining uses of pr_warning to pr_warn
>   ARM: ep93xx: Convert remaining uses of pr_warning to pr_warn
>   arm64: Convert remaining uses of pr_warning to pr_warn
>   arch/blackfin: Convert remaining uses of pr_warning to pr_warn
>   ia64: Convert remaining use of pr_warning to pr_warn
>   powerpc: Convert remaining uses of pr_warning to pr_warn
>   sh: Convert remaining uses of pr_warning to pr_warn
>   sparc: Convert remaining use of pr_warning to pr_warn
>   x86: Convert remaining uses of pr_warning to pr_warn
>   drivers/acpi: Convert remaining uses of pr_warning to pr_warn
>   block/drbd: Convert remaining uses of pr_warning to pr_warn
>   gdrom: Convert remaining uses of pr_warning to pr_warn
>   drivers/char: Convert remaining use of pr_warning to pr_warn
>   clocksource: Convert remaining use of pr_warning to pr_warn
>   drivers/crypto: Convert remaining uses of pr_warning to pr_warn
>   fmc: Convert remaining use of pr_warning to pr_warn
>   drivers/gpu: Convert remaining uses of pr_warning to pr_warn
>   drivers/ide: Convert remaining uses of pr_warning to pr_warn
>   drivers/input: Convert remaining uses of pr_warning to pr_warn
>   drivers/isdn: Convert remaining uses of pr_warning to pr_warn
>   drivers/macintosh: Convert remaining uses of pr_warning to pr_warn
>   drivers/media: Convert remaining use of pr_warning to pr_warn
>   drivers/mfd: Convert remaining uses of pr_warning to pr_warn
>   drivers/mtd: Convert remaining uses of pr_warning to pr_warn
>   drivers/of: Convert remaining uses of pr_warning to pr_warn
>   drivers/oprofile: Convert remaining uses of pr_warning to pr_warn
>   drivers/platform: Convert remaining uses of pr_warning to pr_warn
>   drivers/rapidio: Convert remaining use of pr_warning to pr_warn
>   drivers/scsi: Convert remaining use of pr_warning to pr_warn
>   drivers/sh: Convert remaining use of pr_warning to pr_warn
>   drivers/tty: Convert remaining uses of pr_warning to pr_warn
>   drivers/video: Convert remaining uses of pr_warning to pr_warn
>   kernel/trace: Convert remaining uses of pr_warning to pr_warn
>   lib: Convert remaining uses of pr_warning to pr_warn
>   sound/soc: Convert remaining uses of pr_warning to pr_warn

Where's the removal of pr_warning so we don't have more sneak in?

Rob


Re: [PATCH net] vxlan: correctly validate VXLAN ID against VXLAN_VID_MASK

2017-02-23 Thread Jiri Benc
On Thu, 23 Feb 2017 13:59:02 +0100, Matthias Schiffer wrote:
> The incorrect check caused an off-by-one error: the maximum VID 0xffffff
> was unusable.
> 
> Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
> Signed-off-by: Matthias Schiffer 
> ---
>  drivers/net/vxlan.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 556953f53437..f89428fb7389 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2675,7 +2675,7 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
>  
>   if (data[IFLA_VXLAN_ID]) {
>   __u32 id = nla_get_u32(data[IFLA_VXLAN_ID]);
> - if (id >= VXLAN_VID_MASK)
> + if (id & ~VXLAN_VID_MASK)
>   return -ERANGE;
>   }
>  

"if (id >= VXLAN_N_VID)" would be cleaner and the meaning more obvious.

Thanks,

 Jiri
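
The three variants of the check discussed in this thread can be compared side by side. The sketch below assumes a 24-bit ID space and that the mask is the ID count minus one; the `*_SKETCH` macros and function names are made up for illustration and stand in for the kernel's VXLAN_N_VID and VXLAN_VID_MASK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VNI_BITS_SKETCH	24
#define N_VID_SKETCH	(1u << VNI_BITS_SKETCH)	/* number of valid IDs */
#define VID_MASK_SKETCH	(N_VID_SKETCH - 1)	/* largest valid ID */

/* Original check: also rejects the largest valid ID (off by one). */
static bool vid_rejected_old(uint32_t id)
{
	return id >= VID_MASK_SKETCH;
}

/* v1 of the fix: reject if any bit outside the mask is set. */
static bool vid_rejected_v1(uint32_t id)
{
	return (id & ~VID_MASK_SKETCH) != 0;
}

/* Suggested form: compare against the number of IDs. */
static bool vid_rejected_v2(uint32_t id)
{
	return id >= N_VID_SKETCH;
}

void vid_check_example(void)
{
	assert(vid_rejected_old(0xffffffu));	/* the bug */
	assert(!vid_rejected_v1(0xffffffu));
	assert(!vid_rejected_v2(0xffffffu));
	assert(vid_rejected_v1(0x1000000u));
	assert(vid_rejected_v2(0x1000000u));
}
```

Both fixed forms accept and reject the same values; the comparison against the ID count simply states the intent more directly.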


Re: [PATCH net 0/6] Mellanox mlx5e fixes for 4.11-rc1

2017-02-23 Thread David Miller
From: Saeed Mahameed 
Date: Wed, 22 Feb 2017 17:20:10 +0200

> This series includes some important bug fixes for mlx5e driver.
> 
> Three misc fixes:
> From Mohamad, compilation fix on s390 system
> From Me, A fix for driver unload when switchdev mode is on. 
> From Tariq, HW LRO frag size optimization for when build_skb is not used
> (striding RQ mode).
> 
> Three CQE compression related fixes:
> Two fixes from Tariq and I, to correctly setup CQE compression
> parameters on driver load and on arbitrary user modifications.
> Last patch, fixes a very critical issue that was originally reported
> by Tom, where the driver reported csum errors or even page ref issues
> for when cqe compression is enabled and rapidly active.
> 
> For your convenience this series was generated on top of net-next branch:
> 005c3490e9db ('Revert "ath10k: Search SMBIOS for OEM board file extension"')

Series applied.

> for -stable:
> net/mlx5e: Register/unregister vport representors on interface (for kernel >= 
> 4.9)
> net/mlx5e: Do not reduce LRO WQE size when not using build_skb (for kernel >= 
> 4.9)
> net/mlx5e: Fix broken CQE compression initialization (for kernel >= 4.9)
> net/mlx5e: Update MPWQE stride size when modifying CQE compress state (for 
> kernel >= 4.7)
> net/mlx5e: Fix wrong CQE decompression (for kernel >= 4.7)

Queued up, thanks.


Re: [PATCH RFC v2 00/12] socket sendmsg MSG_ZEROCOPY

2017-02-23 Thread David Miller
From: Willem de Bruijn 
Date: Wed, 22 Feb 2017 11:38:49 -0500

> From: Willem de Bruijn 
> 
> RFCv2:
> 
> I have received a few requests for status and rebased code of this
> feature. We have been running this code internally, discovering and
> fixing various bugs. With net-next closed, now seems like a good time
> to share an updated patchset with fixes. The rebase from RFCv1/v4.2
> was mostly straightforward: mainly iov_iter changes. Full changelog:

I've been over this series once or twice and generally speaking it looks
fine to me.


Re: [PATCH] uapi: fix linux/llc.h userspace compilation error

2017-02-23 Thread David Miller
From: "Dmitry V. Levin" 
Date: Thu, 23 Feb 2017 01:38:26 +0300

> Include  to fix the following linux/llc.h userspace
> compilation error:
> 
> /usr/include/linux/llc.h:26:27: error: 'IFHWADDRLEN' undeclared here (not in 
> a function)
>   unsigned char   sllc_mac[IFHWADDRLEN];
> 
> Signed-off-by: Dmitry V. Levin 

Applied.


Re: [PATCH] uapi: fix linux/ip6_tunnel.h userspace compilation errors

2017-02-23 Thread David Miller
From: "Dmitry V. Levin" 
Date: Thu, 23 Feb 2017 01:38:03 +0300

> Include  and  to fix the following
> linux/ip6_tunnel.h userspace compilation errors:
> 
> /usr/include/linux/ip6_tunnel.h:23:12: error: 'IFNAMSIZ' undeclared here (not 
> in a function)
>   char name[IFNAMSIZ]; /* name of tunnel device */
> /usr/include/linux/ip6_tunnel.h:30:18: error: field 'laddr' has incomplete 
> type
>   struct in6_addr laddr; /* local tunnel end-point address */
> 
> Signed-off-by: Dmitry V. Levin 

Applied.


Re: [PATCH v2] uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors

2017-02-23 Thread David Miller
From: "Dmitry V. Levin" 
Date: Thu, 23 Feb 2017 14:30:34 +0300

> Include  in uapi/linux/seg6.h to fix the following
> linux/seg6.h userspace compilation error:
> 
> /usr/include/linux/seg6.h:31:18: error: array type has incomplete element 
> type 'struct in6_addr'
>   struct in6_addr segments[0];
> 
> Include  in uapi/linux/seg6_iptunnel.h to fix
> the following linux/seg6_iptunnel.h userspace compilation error:
> 
> /usr/include/linux/seg6_iptunnel.h:26:21: error: array type has incomplete 
> element type 'struct ipv6_sr_hdr'
>   struct ipv6_sr_hdr srh[0];
> 
> Fixes: a50a05f497a2 ("ipv6: sr: add missing Kbuild export for header files")
> Signed-off-by: Dmitry V. Levin 
> ---
> v2: fixed "Fixes:" line

Applied.


Re: [PATCH v2] uapi: fix linux/rds.h userspace compilation errors

2017-02-23 Thread David Miller
From: "Dmitry V. Levin" 
Date: Thu, 23 Feb 2017 14:35:23 +0300

> Consistently use types from linux/types.h to fix the following
> linux/rds.h userspace compilation errors:
> 
> /usr/include/linux/rds.h:198:2: error: unknown type name 'u8'
>   u8 rx_traces;
> /usr/include/linux/rds.h:199:2: error: unknown type name 'u8'
>   u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
> /usr/include/linux/rds.h:203:2: error: unknown type name 'u8'
>   u8 rx_traces;
> /usr/include/linux/rds.h:204:2: error: unknown type name 'u8'
>   u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX];
> /usr/include/linux/rds.h:205:2: error: unknown type name 'u64'
>   u64 rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX];
> 
> Fixes: 3289025aedc0 ("RDS: add receive message trace used by application")
> Signed-off-by: Dmitry V. Levin 
> Acked-by: Santosh Shilimkar 
> ---
> v2: fixed "Fixes:" line

Applied.


Re: [patch net] lib: Remove string from parman config selection

2017-02-23 Thread David Miller
From: Jiri Pirko 
Date: Thu, 23 Feb 2017 10:57:45 +0100

> From: Jiri Pirko 
> 
> As reported by Geert, remove the string so the user does not see this
> config option. The option is explicitly selected only as a dependency of
> in-kernel users.
> 
> Reported-by: Geert Uytterhoeven 
> Fixes: 44091d29f207 ("lib: Introduce priority array area manager")
> Signed-off-by: Jiri Pirko 

Applied.


Re: [PATCH 1/1] forcedeth: Remove return from a void function

2017-02-23 Thread David Miller
From: Zhu Yanjun 
Date: Thu, 23 Feb 2017 03:22:49 -0500

> In a void function, it is not necessary to append a return statement in it.
> 
> Signed-off-by: Zhu Yanjun 

Applied.


Re: [PATCH net V3 0/5] mlx4 misc fixes

2017-02-23 Thread David Miller
From: Tariq Toukan 
Date: Thu, 23 Feb 2017 12:02:40 +0200

> This patchset contains misc bug fixes from Eric Dumazet and our team
> to the mlx4 Core and Eth drivers.
> 
> Series generated against net commit:
> eee2faabc63d tcp: account for ts offset only if tsecr not zero
 ...
> v3:
> * Rebased, conflict solved.
> 
> v2:
> * Added Eric's fix (patch 5/5).

Series applied, thanks.


Re: [PATCH 0/8] Netfilter fixes for net

2017-02-23 Thread David Miller
From: Pablo Neira Ayuso 
Date: Thu, 23 Feb 2017 12:14:01 +0100

> The following patchset contains Netfilter fixes for your net tree,
> they are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks a lot!


Re: [PATCH] bpf: fix spelling mistake: "proccessed" -> "processed"

2017-02-23 Thread David Miller
From: Colin King 
Date: Thu, 23 Feb 2017 00:20:53 +

> From: Colin Ian King 
> 
> trivial fix to spelling mistake in verbose log message
> 
> Signed-off-by: Colin Ian King 

Applied.


Re: [PATCH net] vxlan: don't allow overwrite of config src addr

2017-02-23 Thread Jiri Benc
On Mon, 20 Feb 2017 17:25:28 +, Brian Russell wrote:
> When using IPv6 transport and a default dst, a pointer to the configured
> source address is passed into the route lookup. If no source address is
> configured, then the value is overwritten.
> 
> IPv6 route lookup ignores egress ifindex match if the source address is set,
> so if egress ifindex match is desired, the source address must be passed
> as any. The overwrite breaks this for subsequent lookups.
> 
> Avoid this by copying the configured address to an existing stack variable
> and pass a pointer to that instead.

It seems other patches were applied between the time you created
the patch and the time you sent it, so it no longer applies.

Feel free to add to v2:

Acked-by: Jiri Benc 

 Jiri


RE: [PATCH net-next 2/2] sctp: add support for MSG_MORE

2017-02-23 Thread David Laight
From: Xin Long
> Sent: 23 February 2017 03:46
> On Tue, Feb 21, 2017 at 10:27 PM, David Laight wrote:
> > From: Xin Long
> >> Sent: 18 February 2017 17:53
> >> This patch is to add support for MSG_MORE on sctp.
> >>
> >> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
> >> creating datamsg according to the send flag. sctp_packet_can_append_data
> >> then uses it to decide if the chunks of this msg will be sent at once or
> >> delay it.
> >>
> >> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
> >> in assoc. As sctp enqueues the chunks first, then dequeue them one by
> >> one. If it's saved in assoc, the current msg's send flag (MSG_MORE) may
> >> affect other chunks' bundling.
> >
> > I thought about that and decided that the MSG_MORE flag on the last data
> > chunk was the only one that mattered.
> > Indeed looking at any others is broken.
> >
> > Consider what happens if you have two small chunks queued, the first
> > with MSG_MORE set, the second with it clear.
> >
> > I think that sctp_outq_flush() will look at the first chunk and decide it
> > doesn't need to do anything because sctp_packet_transmit_chunk()
> > returns SCTP_XMIT_DELAY.
> > The data chunk with MSG_MORE clear won't even be looked at.
> > So the data will never be sent.

> It's not as bad as you thought; in sctp_packet_can_append_data():
> when inflight == 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
> would be still sent out.

One of us isn't understanding the other :-)

IIRC sctp_packet_can_append_data() is called for the first queued
data chunk in order to decide whether to generate a message that
consists only of data chunks.
If it returns SCTP_XMIT_OK then a message is built collecting the
rest of the queued data chunks (until the window fills).

So if I send a message with MSG_MORE set (on an idle connection)
SCTP_XMIT_DELAY is returned and a message isn't sent.

I now send a second small message, this time with MSG_MORE clear.
The message is queued, then the code looks to see if it can send anything.

sctp_packet_can_append_data() is called for the first queued chunk.
Since it has force_delay set SCTP_XMIT_DELAY is returned and no
message is built.
The second message isn't even looked at.
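
The scenario above can be modelled in a few lines. This is a deliberately simplified sketch of the behaviour as described in this message, not the SCTP code: `struct chunk_sketch` and both flush functions are hypothetical, and the model ignores the window, inflight and nodelay conditions that the quoted reply refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queued data chunk: only the MSG_MORE-derived flag. */
struct chunk_sketch {
	bool force_delay;
};

/* Decide from the head of the queue only; returns how many chunks are sent. */
static size_t flush_check_head(const struct chunk_sketch *q, size_t n)
{
	if (n == 0 || q[0].force_delay)
		return 0;
	return n;
}

/* Alternative: only the most recently queued chunk's flag matters. */
static size_t flush_check_tail(const struct chunk_sketch *q, size_t n)
{
	if (n == 0 || q[n - 1].force_delay)
		return 0;
	return n;
}

void msg_more_example(void)
{
	/* First send used MSG_MORE, second send did not. */
	const struct chunk_sketch q[] = { { true }, { false } };

	assert(flush_check_head(q, 2) == 0);	/* nothing goes out */
	assert(flush_check_tail(q, 2) == 2);	/* both go out together */
}
```

If the real transmit path behaves like the head-only model, the second send never releases the first; whether it does is exactly what is being questioned here.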

> What MSG_MORE flag actually does is ignore inflight == 0 and
> sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
> it has to respect the original logic (like !chunk->msg->can_delay
> || !sctp_packet_empty(packet) || ...)
> 
> To delay the chunks with MSG_MORE set even when inflight is 0
> is especially important here for users.

I'm not too worried about that.
Sending the first message was a cheap way to ensure something got
sent if the application lied and didn't send a subsequent message.

The change has hit Linus's tree, so I should be able to test that
and confirm what I think is going on.

David



[PATCH net 1/1] tipc: move premature initialization of stack variables

2017-02-23 Thread Jon Maloy
In the function tipc_rcv() we initialize a couple of stack variables
from the message header before that same header has been validated.
In rare cases when the arriving header is non-linear, the validation
function itself may linearize the buffer by calling pskb_may_pull(),
while the wrongly initialized stack fields are not updated accordingly.

We fix this in this commit.

Reported-by: Matthew Wong 
Signed-off-by: Jon Maloy 
---
 net/tipc/node.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index e9295fa..4512e83 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1505,19 +1505,21 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
 {
struct sk_buff_head xmitq;
struct tipc_node *n;
-   struct tipc_msg *hdr = buf_msg(skb);
-   int usr = msg_user(hdr);
+   struct tipc_msg *hdr;
int bearer_id = b->identity;
struct tipc_link_entry *le;
-   u16 bc_ack = msg_bcast_ack(hdr);
u32 self = tipc_own_addr(net);
-   int rc = 0;
+   int usr, rc = 0;
+   u16 bc_ack;
 
__skb_queue_head_init(&xmitq);
 
-   /* Ensure message is well-formed */
+   /* Ensure message is well-formed before touching the header */
if (unlikely(!tipc_msg_validate(skb)))
goto discard;
+   hdr = buf_msg(skb);
+   usr = msg_user(hdr);
+   bc_ack = msg_bcast_ack(hdr);
 
/* Handle arrival of discovery or broadcast packet */
if (unlikely(msg_non_seq(hdr))) {
-- 
2.7.4
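
The ordering problem fixed by the patch above can be illustrated with a userspace analogue. Everything here is hypothetical (`struct buf_sketch`, `validate_sketch()`, `rcv_user_sketch()`): the validator always moves the data to a fresh allocation to stand in for a linearizing pull, so any header pointer taken before validation would be stale afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define HDR_MIN_SKETCH 4

/* Hypothetical buffer whose data may be moved by validation. */
struct buf_sketch {
	unsigned char *data;
	size_t len;
};

/* Validate the header; like a linearizing pull, this may move the data. */
static bool validate_sketch(struct buf_sketch *b)
{
	unsigned char *moved;

	if (b->len < HDR_MIN_SKETCH)
		return false;

	moved = malloc(b->len);
	if (!moved)
		return false;
	memcpy(moved, b->data, b->len);
	free(b->data);		/* pointers into the old data are now stale */
	b->data = moved;
	return true;
}

/* Correct ordering: validate first, then read the header. -1 means discard. */
static int rcv_user_sketch(struct buf_sketch *b)
{
	const unsigned char *hdr;

	if (!validate_sketch(b))
		return -1;

	hdr = b->data;		/* taken only after any reallocation */
	return hdr[0] >> 4;
}

void rcv_example(void)
{
	struct buf_sketch b = { malloc(HDR_MIN_SKETCH), HDR_MIN_SKETCH };

	assert(b.data != NULL);
	memset(b.data, 0, b.len);
	b.data[0] = 0x5a;
	assert(rcv_user_sketch(&b) == 5);
	free(b.data);
}
```

Initializing `hdr` in its declaration, before the call to the validator, would reproduce the bug: the read would go through a pointer to freed memory.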



[PATCH net v2] vxlan: correctly validate VXLAN ID against VXLAN_N_VID

2017-02-23 Thread Matthias Schiffer
The incorrect check caused an off-by-one error: the maximum VID 0xffffff
was unusable.

Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
Signed-off-by: Matthias Schiffer 
---
v2: check against VXLAN_N_VID instead of changing operator


 drivers/net/vxlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 556953f53437..268c2a12e61d 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2675,7 +2675,7 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
 
if (data[IFLA_VXLAN_ID]) {
__u32 id = nla_get_u32(data[IFLA_VXLAN_ID]);
-   if (id >= VXLAN_VID_MASK)
+   if (id >= VXLAN_N_VID)
return -ERANGE;
}
 
-- 
2.11.1



Re: [PATCH net] vxlan: don't allow overwrite of config src addr

2017-02-23 Thread Jiri Benc
On Mon, 20 Feb 2017 17:25:28 +, Brian Russell wrote:
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2019,7 +2019,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  
>   dst_port = rdst->remote_port ? rdst->remote_port : vxlan->cfg.dst_port;
>   vni = rdst->remote_vni;
> - src = &vxlan->cfg.saddr;
> + local_ip = vxlan->cfg.saddr;
>   dst_cache = &rdst->dst_cache;
>   md->gbp = skb->mark;
>   ttl = vxlan->cfg.ttl;
> @@ -2052,7 +2052,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>   dst = &remote_ip;
>   dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
>   vni = tunnel_id_to_key32(info->key.tun_id);
> - src = &local_ip;
>   dst_cache = &info->dst_cache;
>   if (info->options_len)
>   md = ip_tunnel_info_opts(info);
> @@ -2061,6 +2060,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>   label = info->key.label;
>   udp_sum = !!(info->key.tun_flags & TUNNEL_CSUM);
>   }
> + src = &local_ip;

Btw, you can simplify this even more, get rid of src completely and
just use local_ip.

And please also add to v2:

Fixes: 272d96a5ab10 ("net: vxlan: lwt: Use source ip address during route lookup.")

 Jiri


[PATCH] wireless/atmel: remove time_t usage

2017-02-23 Thread Alexandre Belloni
last_qual never really holds a time. It only holds jiffies. Make it the
same type as jiffies.

Signed-off-by: Alexandre Belloni 
---
 drivers/net/wireless/atmel/atmel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/atmel/atmel.c b/drivers/net/wireless/atmel/atmel.c
index e12f62356fd1..27b110dc8cc6 100644
--- a/drivers/net/wireless/atmel/atmel.c
+++ b/drivers/net/wireless/atmel/atmel.c
@@ -513,7 +513,7 @@ struct atmel_private {
} station_state;
 
int operating_mode, power_mode;
-   time_t last_qual;
+   unsigned long last_qual;
int beacons_this_sec;
int channel;
int reg_domain, config_reg_domain;
-- 
2.11.0



Re: [PATCH net-next v4 4/7] gtp: consolidate gtp socket rx path

2017-02-23 Thread Tom Herbert
On Thu, Feb 23, 2017 at 1:19 AM, Andreas Schultz  wrote:
> Hi Tom,
>
> - On Feb 22, 2017, at 6:41 PM, Tom Herbert t...@herbertland.com wrote:
>
>> On Tue, Feb 21, 2017 at 2:18 AM, Andreas Schultz  wrote:
>>> Add network device to gtp context in preparation for splitting
>>> the TEID from the network device.
>>>
>>> Use this to rework the socket rx path. Move the common RX part
>>> of v0 and v1 into a helper. Also move the final rx part into
>>> that helper as well.
>>>
>> Andreas,
>>
>> How are these GTP kernel patches being tested?
>
> We run each version in a test setup with an ePDG and an SGW that
> connects to a full GGSN/P-GW instance (based on erGW).
> We also run performance tests (less often) with commercial
> test software.
>
>> Is it possible to create some sort of GTP network device
>> that separates out just the datapath for development in the
>> same way that VXLAN did this?
>
> We had this discussion about another patch:
> (http://marc.info/?l=linux-netdev&m=148611438811696&w=2)
>
> Currently the kernel module only supports the GGSN/P-GW side
> of the GTP tunnel. This is because we check the UE IP address
> in the GTP socket and use the destination IP in the network
> interface to find the PDP context.
>
> For a deployment in a real EPC, doing it the other way makes no
> sense with the current code. However for a test setup it makes
> perfect sense (either to use it as a driver to test other GTP
> nodes or to test our own implementation).
>
> So, I hope that we can integrate this soonish.
>
> libgtpnl contains two tools that can be used for testing. gtp-link
> creates a network device and GTP sockets and keeps them alive.
> gtp-tunnel can then be used to add PDP contexts to that. The only
> missing part for a bidirectional test setup is the above
> mentioned patch with the direction flag and support for that
> in the libgtpnl tools.
>
If there is a way for us to test this without setting up a full mobile
network please provide details on how to do that in the commit log.

> Adding static tunnel support into the kernel module in any form
> makes IMHO no sense. GTP as defined by 3GPP always needs a control
> instance and there are much better options for static tunnel
> encapsulations.
>
That misses the point. From the kernel point of view GTP is just
another encapsulation protocol and we now have a lot of experience
with those. The problem is that when you post patches to improve it or fix
issues, there is no practical way for us to perform independent analysis
or investigation. This makes GTP no different from a proprietary
protocol to us, and that severely limits the value of trying to work
with the community.

Tom

> Andreas
>
>>
>> Tom
>>
>>> Signed-off-by: Andreas Schultz 
>>> ---
>>>  drivers/net/gtp.c | 80 
>>> ++-
>>>  1 file changed, 44 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
>>> index 961fb3c..fc0fff5 100644
>>> --- a/drivers/net/gtp.c
>>> +++ b/drivers/net/gtp.c
>>> @@ -58,6 +58,8 @@ struct pdp_ctx {
>>> struct in_addr  ms_addr_ip4;
>>> struct in_addr  sgsn_addr_ip4;
>>>
>>> +   struct net_device   *dev;
>>> +
>>> atomic_t        tx_seq;
>>> struct rcu_head rcu_head;
>>>  };
>>> @@ -175,6 +177,40 @@ static bool gtp_check_src_ms(struct sk_buff *skb, struct pdp_ctx *pctx,
>>> return false;
>>>  }
>>>
>>> +static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb, unsigned int hdrlen,
>>> + bool xnet)
>>> +{
>>> +   struct pcpu_sw_netstats *stats;
>>> +
>>> +   if (!gtp_check_src_ms(skb, pctx, hdrlen)) {
>>> +   netdev_dbg(pctx->dev, "No PDP ctx for this MS\n");
>>> +   return 1;
>>> +   }
>>> +
>>> +   /* Get rid of the GTP + UDP headers. */
>>> +   if (iptunnel_pull_header(skb, hdrlen, skb->protocol, xnet))
>>> +   return -1;
>>> +
>>> +   netdev_dbg(pctx->dev, "forwarding packet from GGSN to uplink\n");
>>> +
>>> +   /* Now that the UDP and the GTP header have been removed, set up the
>>> +* new network header. This is required by the upper layer to
>>> +* calculate the transport header.
>>> +*/
>>> +   skb_reset_network_header(skb);
>>> +
>>> +   skb->dev = pctx->dev;
>>> +
>>> +   stats = this_cpu_ptr(pctx->dev->tstats);
>>> +   u64_stats_update_begin(&stats->syncp);
>>> +   stats->rx_packets++;
>>> +   stats->rx_bytes += skb->len;
>>> +   u64_stats_update_end(&stats->syncp);
>>> +
>>> +   netif_rx(skb);
>>> +   return 0;
>>> +}
>>> +
>>>  /* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */
>>>  static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb,
>>>bool xnet)
>>> @@ -201,13 +237,7 @@ static int gtp0_udp_encap_recv(struct gtp_dev *gtp, 
>>> struct
>>> sk_buff *skb,
>>> retu

Re: [PATCH] RDS: IB: fix ifnullfree.cocci warnings

2017-02-23 Thread Santosh Shilimkar

On 2/23/2017 4:47 AM, kbuild test robot wrote:

net/rds/ib.c:115:2-7: WARNING: NULL check before freeing functions like kfree, 
debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe 
consider reorganizing relevant code to avoid passing NULL values.

 NULL check before some freeing functions is not needed.

 Based on checkpatch warning
 "kfree(NULL) is safe this check is probably not required"
 and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

Signed-off-by: Fengguang Wu 
---

Acked-by: Santosh Shilimkar 
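
For readers unfamiliar with the warning, here is a userspace analogue using the C library's free(), which is defined to do nothing when passed NULL. The checkpatch text quoted above states the same for kfree(). `struct res_sketch` and `res_release()` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object owning one optional allocation. */
struct res_sketch {
	void *buf;
};

/* Release without a NULL guard: free(NULL) is a no-op. */
static void res_release(struct res_sketch *r)
{
	free(r->buf);
	r->buf = NULL;
}

void res_release_example(void)
{
	struct res_sketch r = { NULL };

	res_release(&r);	/* never allocated: still fine */

	r.buf = malloc(16);
	res_release(&r);	/* allocated (or NULL on failure): also fine */
	assert(r.buf == NULL);
}
```

Dropping the `if (ptr)` test in front of the free call changes nothing functionally; it only removes a redundant branch.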


Re: [PATCH net v2] vxlan: correctly validate VXLAN ID against VXLAN_N_VID

2017-02-23 Thread Jiri Benc
On Thu, 23 Feb 2017 17:19:41 +0100, Matthias Schiffer wrote:
> The incorrect check caused an off-by-one error: the maximum VID 0xffffff
> was unusable.
> 
> Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
> Signed-off-by: Matthias Schiffer 
> ---
> v2: check against VXLAN_N_VID instead of changing operator

Acked-by: Jiri Benc 

Thanks!

 Jiri


Re: [PATCH net-next] net/gtp: Add udp source port generation according to flow hash

2017-02-23 Thread Tom Herbert
On Thu, Feb 23, 2017 at 6:00 AM, Pablo Neira Ayuso  wrote:
> On Thu, Feb 23, 2017 at 10:35:56AM +0100, Andreas Schultz wrote:
>> - On Feb 22, 2017, at 10:47 PM, Tom Herbert t...@herbertland.com wrote:
> [...]
>> > This shouldn't be taken as a HW requirement and it's unlikely we'd add
>> > explicit GTP support in flow_dissector. If we can't get entropy in the
>> > UDP source port then IPv6 flow label is a potential alternative (so
>> > that should be supported in NICs for RSS).
>> >
>> > I'll also reiterate my previous point about the need for GTP testing--
>> > in order for us to be able to evaluate the GTP datapath for things
>> > like performance or how they withstand against DDOS we really need an
>> > easy way to isolate the datapath.
>>
>> GTP as specified is very insecure by definition. It is meant to be run
>> only on *private* mobile carrier and intra mobile carrier EPC networks.
>> Running it openly on the public internet would be extremely foolish.
>>
>> There are some mechanisms in GTPv1-C on how to handle overload and
>> more extensive mechanisms in GTPv2-C for overload handling. The basic
>> guiding principle is to simply drop any traffic that it can't handle.
>>
>> Anyhow, I haven't seen anything in 3GPP or GSMA documents that deals
>> with DDOS.
>>
>> There are guidelines like the GSMA's IR.88 that describe how the intra
>> carrier roaming should work and what security measures should be
>> implemented.
>>
>> Traffic coming in at Gi/SGi or from the UE could create a DDOS on the tunnel.
>> However, on the UE side you still have the RAN (eNODE, SGSN, S-GW) or
>> an ePDG that has to apply QoS and thereby limit traffic. On the Gi/SGi
>> side you have the PCEF that does the same.
>>
>> So in a complete 3GPP node (GGSN, P-GW) that uses the GTP tunnel
>> implementation, malicious traffic should be blocked before it can reach
>> the tunnel.
>>
>> And as I stated before, the GTP tunnel module is not supposed to be
>> used without any of those components. So the DDOS concern should not
>> be handled at the tunnel level.
>
> I think that Tom's point is that this tunnel driver will have to deal
> with DDOS scenarios anyway, because reality is that you can't always
> block it before it can reach the tunnel.
>
Right, if only we had a dime for every time someone thought their
perimeter security was sufficient only to be proven wrong!


Re: [PATCH net-next] net/gtp: Add udp source port generation according to flow hash

2017-02-23 Thread Pablo Neira Ayuso
On Thu, Feb 23, 2017 at 03:21:13PM +0100, Andreas Schultz wrote:
> - On Feb 23, 2017, at 2:49 PM, pablo pa...@netfilter.org wrote:
[...]
> > According to specs, section 4.4.2.3 Encapsulated T-PDU, TS 29.281.
> > 
> > "The UDP Source Port is a locally allocated port number at the sending
> > GTP-U entity."
> > 
> > Older specs that refer to GTP-U such as TS 09.60 and TS 29.060 also
> > state the same.
> 
> It is absolutely valid to choose any sending port you want. I only
> think you should know which port you did send on.
> 
> TS 29.281, Sect. 5.2.2.1 defines the UDP port extension to be used
> in error indications. It provides the UDP source port of a G-PDU
> that triggered an error.
> 
> If the send side does not know which port it uses to send, how
> can it use this indication to correlate an error? That's the reason
> I thought it would be better to add the UDP source port to the
> PDP context and allow the control path to assign the source port
> on context creation.
> 
> Of course, this header is optional and the receiving side is not
> required to use it.
> 
> About the RSS spreading in the receive side, I would think that
> a receiver would prefer to process all packets that belong to a
> given TEID with the same receive instance. So keeping the UDP
> source port for a given TEID stable would be beneficial. As far
> as I understand it, the hash used in the patch uses the source
> and destination values from the original flow. This would mean
> that GTP packets that belong to the same TEID would end up with
> different UDP source ports.

The receiver needs to scale up, and that happens if packets are
distributed between CPUs in a way that makes sense. I don't think it
makes sense to pass packets that belong to the same tunnel to the same
CPU; this is exactly the scenario that makes DDOS easier to trigger.

> So what about this as a compromise, we add a UDP source port field
> to the PDP context, it defaults to 0 (zero), the control instance
> can optionally initialize this field, when we hit the xmit code
> and the port is non zero, use that value, if it is zero use the hash?

You want to use this for your VRF concept? I would like you to take
the time to explain your use cases to us. How are you going to use this
mapping between tunnels and UDP source ports? An explanation would be
better than pointing at some optional (corner case) extension in the
specs whose usage is questionable.

In any case, I think we want a good default behaviour, and I think
Or's patch provides it.


Re: [PATCH net-next] net/gtp: Add udp source port generation according to flow hash

2017-02-23 Thread Tom Herbert
On Thu, Feb 23, 2017 at 6:21 AM, Andreas Schultz  wrote:
> - On Feb 23, 2017, at 2:49 PM, pablo pa...@netfilter.org wrote:
>
>> On Wed, Feb 22, 2017 at 01:47:17PM -0800, Tom Herbert wrote:
>>> On Wed, Feb 22, 2017 at 1:29 PM, Or Gerlitz  wrote:
>>> > On Thu, Feb 16, 2017 at 11:58 PM, Andreas Schultz  
>>> > wrote:
>>> >> Hi Or,
>>> >> - On Feb 16, 2017, at 3:59 PM, Or Gerlitz ogerl...@mellanox.com 
>>> >> wrote:
>>> >>
>>> >>> Generate the source udp header according to the flow represented by
>>> >>> the packet we are encapsulating, as done for other udp tunnels. This
>>> >>> helps on the receiver side to apply RSS spreading.
>>> >>
>>> >> This might work for GTPv0-U, However, for GTPv1-U this could interfere
>>> >> with error handling in the user space control process when the UDP port
>>> >> extension  header is used in error indications.
>>> >
>>> >
>>> > in the document you posted there's this quote "The source IP and port
>>> > have no meaning and can change at any time" -- I assume it refers to
>>> > v0? can we identify in the kernel code that we're on v0 and have the
>>> > patch come into play?
>>> >
>>> >> 3GPP TS 29.281 Rel 13, section 5.2.2.1 defines the UDP port extension and
>>> >> section 7.3.1 says that the UDP source port extension can be used to
>>> >> mitigate DOS attacks. This would IMHO imply that the user space control
>>> >> process needs to know the TEID to UDP source port mapping.
>>> >
>>> >> The other question is, on what is this actually hashing. When I understand
>>> >> the code correctly, this will hash on the source/destination of the original
>>> >> flow. I would expect that a SGSN/SGW/eNodeB would like to keep flow
>>> >> processing on a per TEID base, so the port hashing should be based on the TEID.
>>> >
>>> > is it possible for packets belonging to the same TCP session or UDP
>>> > "pseudo session" (given pair of src/dst ip/port) to be encapsulated
>>> > using different TEID?
>>> >
>>> > hashing on the TEID imposes a harder requirement on the NIC HW vs.
>>> > just UDP based RSS.
>>>
>>> This shouldn't be taken as a HW requirement and it's unlikely we'd add
>>> explicit GTP support in flow_dissector. If we can't get entropy in the
>>> UDP source port then IPv6 flow label is a potential alternative (so
>>> that should be supported in NICs for RSS).
>>
>> According to specs, section 4.4.2.3 Encapsulated T-PDU, TS 29.281.
>>
>> "The UDP Source Port is a locally allocated port number at the sending
>> GTP-U entity."
>>
>> Older specs that refer to GTP-U such as TS 09.60 and TS 29.060 also
>> state the same.
>
> It is absolutely valid to choose any sending port you want. I only
> think you should know which port you did send on.
>
> TS 29.281, Sect. 5.2.2.1 defines the UDP port extension to be used
> in error indications. It provides the UDP source port of a G-PDU
> that triggered an error.
>
> If the send side does not know which port it uses to send, how
> can it use this indication to correlate an error? That's the reason
> I thought it would be better to add the UDP source port to the
> PDP context and allow the control path to assign the source port
> on context creation.
>
> Of course, this header is optional and the receiving side is not
> required to use it.
>
> About the RSS spreading in the receive side, I would think that
> a receiver would prefer to process all packets that belong to a
> given TEID with the same receive instance. So keeping the UDP
> source port for a given TEID stable would be beneficial. As far
> as I understand it, the hash used in the patch uses the source
> and destination values from the original flow. This would mean
> that GTP packets that belong to the same TEID would end up with
> different UDP source ports.
>
> So what about this as a compromise, we add a UDP source port field
> to the PDP context, it defaults to 0 (zero), the control instance
> can optionally initialize this field, when we hit the xmit code
> and the port is non zero, use that value, if it is zero use the hash?
>
Sounds reasonable. It is typical in other UDP encapsulations to allow
the UDP source port to be configured like this for static tunnels at
least. If you need to set the port based on a hash over atypical
values (like TEID) I suggest you still do jhash with randomized keys
to minimize information leakage.
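
The compromise above (use the configured port when it is non-zero, otherwise
fall back to a keyed hash) could be sketched roughly as follows. This is a
userspace illustration only: mix32() is a hypothetical stand-in for the
kernel's jhash, and the function name, the seed and the min/max port range
parameters are assumptions, not part of the patch under discussion.

```c
#include <stdint.h>

/* Hypothetical keyed mixing function standing in for the kernel's jhash. */
static uint32_t mix32(uint32_t x, uint32_t seed)
{
	x ^= seed;
	x ^= x >> 16;
	x *= 0x7feb352dU;
	x ^= x >> 15;
	x *= 0x846ca68bU;
	x ^= x >> 16;
	return x;
}

/*
 * Pick the UDP source port for an encapsulated packet.
 * cfg_port: port set by the control instance, 0 means "not configured".
 * key:      value to hash (flow hash or TEID); seed: random per-boot key.
 * min/max:  inclusive range for hashed ports, caller ensures min <= max.
 */
static uint16_t gtp_pick_sport(uint16_t cfg_port, uint32_t key, uint32_t seed,
			       uint16_t min, uint16_t max)
{
	uint32_t span;

	if (cfg_port != 0)
		return cfg_port;

	span = (uint32_t)max - min + 1;
	return (uint16_t)(min + mix32(key, seed) % span);
}
```

Keeping the seed secret and random is what limits information leakage when
the hashed value is something predictable such as a TEID.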

> Regards
> Andreas
>
>>
>> So Or's patch looks fine to me.


Re: [PATCH net-next v4 4/7] gtp: consolidate gtp socket rx path

2017-02-23 Thread Harald Welte
Hi Tom,

On Thu, Feb 23, 2017 at 08:28:47AM -0800, Tom Herbert wrote:

> If there is a way for us to test this without setting up a full mobile
> network please provide details on how to do that in the commit log.

There are different ways how to do this.  With the existing in-kernel
code (that lacks the "SGSN role" patch which would be for testing only),
you can e.g. use two relatively small C-language programs from the
openggsn package [http://git.osmocom.org/openggsn/]:

* OpenGGSN with support for libgtpnl and the kernel GTP-U module
* sgsnemu (included in openggsn source tree) which implements a minimal
  SGSN-side of the tunnel.  It will
  * perform the GTP-C signaling required with OpenGGSN (which in turn
    will then instruct the kernel to open a tunnel via the netlink
    interface).
  * start a tun/tap device for the "mobile station end" of the tunnel and
    perform en/decapsulation of data between that tun/tap device and GTP
    in userspace

The wiki at https://osmocom.org/projects/openggsn/wiki/OpenGGSN contains
step-by-step instructions how to build and configure OpenGGSN with
support for kernel GTP-U. It's nothing more than autotools based
compile+install of libgtpnl followed by "./configure --enable-gtp-linux"
and "make" for OpenGGSN.

What is not described on the above page is how to use sgsnemu to
simulate the SGSN side, as within Osmocom we typically run the open
source OsmoSGSN (a more "real" SGSN).

So using two small binaries that can be compiled without much external
dependencies (and very few lines of configuration), it is possible to do
some functional testing of the kernel GTP module.  For performance
testing this of course won't work, as sgsnemu is running entirely in
userspace.

For performance testing, you would need a SGSN-side implementation of
GTP-U that performs equally well or better than the GGSN-side
implementation in the kernel.  One option is the patch that has recently
been submitted to netdev and which is under discussion.  However, then
you simply test one implementation against itself, which provides no
interoperability guarantees with other implementations, and thus also
limited in different regards.

> > Adding static tunnel support into the kernel module in any form
> > makes IMHO no sense. GTP as defined by 3GPP always need a control
> > instance and there are much better options for static tunnel
> > encapsulations.
> >
> That misses the point. From the kernel point of view GTP is just
> another encapsulation protocol and we now have a lot of experience
> with those. The problem is when you post patches to improve it or fix
> issues, if there is no practical way to perform independent  analysis
> or investigation. This makes GTP no different than a proprietary
> protocol to us and that severely limits the value of trying to work
> with the community.

I wholeheartedly agree.  That's one of the reasons why I recently posted
a RFC about what to do for (regression and other) testing of the kernel
GTP-U module.

I'll try to cook up some instructions extending
https://osmocom.org/projects/openggsn/wiki/OpenGGSN to cover also
sgsnemu for a basic use case of establishing one single tunnel.  That's
of course like a manual "HOWTO" and not yet anything that can be tested
automatically.

Regards,
Harald

-- 
- Harald Welte           http://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


Re: [PATCH net-next] net/gtp: Add udp source port generation according to flow hash

2017-02-23 Thread Harald Welte
Hi Tom,

On Thu, Feb 23, 2017 at 08:35:13AM -0800, Tom Herbert wrote:
> On Thu, Feb 23, 2017 at 6:00 AM, Pablo Neira Ayuso  
> wrote:
> > On Thu, Feb 23, 2017 at 10:35:56AM +0100, Andreas Schultz wrote:
> >> - On Feb 22, 2017, at 10:47 PM, Tom Herbert t...@herbertland.com wrote:
> >> So in a complete 3GPP node (GGSN, P-GW) that uses the GTP tunnel
> >> implementation, malicious traffic should be blocked before it can reach
> >> the tunnel.
> >>
> >> And as I stated before, the GTP tunnel module is not supposed to be
> >> used without any of those components. So the DDOS concern should not
> >> be handled at the tunnel level.
> >
> > I think that Tom's point is that this tunnel driver will have to deal
> > with DDOS scenarios anyway, because reality is that you can't always
> > block it before it can reach the tunnel.
> >
> Right, if only we had a dime for every time someone thought their
> perimeter security was sufficient only to be proven wrong!

Yes and no.  Welcome to the cellular world. Everything is designed in
a way that it assumes everyone can be trusted and that none of those
interfaces (except the radio interface) are ever exposed to the public.

There is really very little one can do to change that world.  It's like
the Internet in the early 1990ies, and they are reluctant to learn.  And
whenever a new system is designed (like the step from MAP to DIAMETER),
they make damn sure that all the security issues are inherited from the
previous standards to ensure interoperability ;)

I understand and support the motivation to design robust systems even
in the presence of broken/ignorant specs, but I think this is one of the
situations where it is useless (and IMHO impossible) to do anything
about it.

-- 
- Harald Welte           http://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


Re: [PATCH iproute2 master] {f,m}_bpf: dump tag over insns

2017-02-23 Thread Stephen Hemminger
On Thu, 23 Feb 2017 13:07:14 +0100
Daniel Borkmann  wrote:

> We already export TCA_BPF_TAG resp. TCA_ACT_BPF_TAG from kernel commit
> f1f7714ea51c ("bpf: rework prog_digest into prog_tag"), thus also dump
> it when filter/actions are shown.
> 
> Signed-off-by: Daniel Borkmann 

Applied


Re: [PATCH iproute2] ip: Add support for MPLS netconf

2017-02-23 Thread Stephen Hemminger
On Tue, 21 Feb 2017 09:23:31 -0800
David Ahern  wrote:

> Add support for MPLS netconf to ip monitor and ip netconf commands.
> Changes to header files not included as those are typically pulled
> in by a header sync with the kernel.
> 
> Signed-off-by: David Ahern 

Applied


Re: [PATCH net-next v3 0/3] net: core: Two Helper function about socket information

2017-02-23 Thread Willem de Bruijn
On Wed, Feb 22, 2017 at 11:42 PM, Chenbo Feng
 wrote:
> From: Chenbo Feng 
>
> Introduce two eBPF helper functions to get the socket cookie and
> socket uid for each packet. The helper functions are useful when
> the *sk field inside sk_buff is not empty. These helper functions
> can be used on socket and uid based traffic monitoring programs.

Net-next is closed. But we can review as RFC.


Re: [PATCH iproute2] tc: flower: Fix parsing ip address

2017-02-23 Thread Stephen Hemminger
On Wed, 22 Feb 2017 16:05:01 +0200
Roi Dayan  wrote:

> Fix order of arguments when passed to __flower_parse_ip_addr.
> 
> Fixes: f888f4e20534 ("tc: flower: Support matching ARP")
> Signed-off-by: Roi Dayan 
> Reviewed-by: Paul Blakey 

Applied


RE: [PATCH 00/11] Omni-Path Virtual Network Interface Controller (VNIC)

2017-02-23 Thread Harold Cook
Thank you! I guess we are on the right track...



--- Harold


-Original Message-
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On 
Behalf Of Vishwanathapura, Niranjana
Sent: Wednesday, February 22, 2017 11:07 PM
To: dledf...@redhat.com
Cc: linux-r...@vger.kernel.org; netdev@vger.kernel.org; 
dennis.dalessan...@intel.com; ira.we...@intel.com
Subject: [PATCH 00/11] Omni-Path Virtual Network Interface Controller (VNIC)

Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC) feature 
supports Ethernet functionality over Omni-Path fabric by encapsulating the 
Ethernet packets between HFI nodes.

Architecture
=============
The patterns of exchanges of Omni-Path encapsulated Ethernet packets involve 
one or more virtual Ethernet switches overlaid on the Omni-Path fabric 
topology. A subset of HFI nodes on the Omni-Path fabric are permitted to 
exchange encapsulated Ethernet packets across a particular virtual Ethernet 
switch. The virtual Ethernet switches are logical abstractions achieved by 
configuring the HFI nodes on the fabric for header generation and processing. 
In the simplest configuration all HFI nodes across the fabric exchange 
encapsulated Ethernet packets over a single virtual Ethernet switch. A virtual 
Ethernet switch is effectively an independent Ethernet network. The 
configuration is performed by an Ethernet Manager (EM) which is part of the 
trusted Fabric Manager (FM) application. HFI nodes can have multiple VNICs each 
connected to a different virtual Ethernet switch. The below diagram presents a 
case of two virtual Ethernet switches with two HFI nodes.

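                               +-------------------+
                               |   Subnet/         |
                               |   Ethernet        |
                               |   Manager         |
                               +-------------------+
                                  /          /
                                /           /
                              /            /
                            /             /
  +-----------------------------+  +------------------------------+
  |  Virtual Ethernet Switch    |  |  Virtual Ethernet Switch     |
  |  +---------+    +---------+ |  | +---------+    +---------+   |
  |  | VPORT   |    |  VPORT  | |  | |  VPORT  |    |  VPORT  |   |
  +--+---------+----+---------+-+  +-+---------+----+---------+---+
           |                 \        /                 |
           |                   \    /                   |
           |                     \/                     |
           |                    /  \                    |
           |                  /      \                  |
       +-----------+------------+  +-----------+------------+
       |   VNIC    |    VNIC    |  |    VNIC   |    VNIC    |
       +-----------+------------+  +-----------+------------+
       |          HFI           |  |          HFI           |
       +------------------------+  +------------------------+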


The Omni-Path encapsulated Ethernet packet format is as described below.

Bits          Field

Quad Word 0:
0-19          SLID (lower 20 bits)
20-30         Length (in Quad Words)
31            BECN bit
32-51         DLID (lower 20 bits)
52-56         SC (Service Class)
57-59         RC (Routing Control)
60            FECN bit
61-62         L2 (=10, 16B format)
63            LT (=1, Link Transfer Head Flit)

Quad Word 1:
0-7           L4 type (=0x78 ETHERNET)
8-11          SLID[23:20]
12-15         DLID[23:20]
16-31         PKEY
32-47         Entropy
48-63         Reserved

Quad Word 2:
0-15          Reserved
16-31         L4 header
32-63         Ethernet Packet

Quad Words 3 to N-1:
0-63          Ethernet packet (pad extended)

Quad Word N (last):
0-23          Ethernet packet (pad extended)
24-55         ICRC
56-61         Tail
62-63         LT (=01, Link Transfer Tail Flit)

Ethernet packet is padded on the transmit side to ensure that the VNIC OPA 
packet is quad word aligned. The 'Tail' field contains the number of bytes 
padded. On the receive side the 'Tail' field is read and the padding is removed 
(along with ICRC, Tail and OPA header) before passing packet up the network 
stack.

The L4 header field contains the virtual Ethernet switch id the VNIC port 
belongs to. On the receive side, this field is used to de-multiplex the 
received VNIC packets to different VNIC ports.
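
As a rough illustration of the layout above, the following userspace sketch
packs Quad Word 0 and computes quad-word padding. It assumes bit 0 is the
least significant bit of a 64-bit word and reads "L2 (=10)" as binary; the
helper names are hypothetical and this is not the HFI1 driver code. Which
bytes count toward the aligned length is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Place val into bits [lo, lo + width) of word; bit 0 = LSB (assumed), width < 64. */
static uint64_t put_bits(uint64_t word, unsigned lo, unsigned width, uint64_t val)
{
	uint64_t mask = ((UINT64_C(1) << width) - 1) << lo;

	return (word & ~mask) | ((val << lo) & mask);
}

/* Build Quad Word 0 from the table; BECN, RC and FECN are left at zero. */
static uint64_t opa_qw0(uint32_t slid, uint32_t dlid, uint16_t len_qw, uint8_t sc)
{
	uint64_t qw = 0;

	qw = put_bits(qw, 0, 20, slid);     /* SLID, lower 20 bits */
	qw = put_bits(qw, 20, 11, len_qw);  /* length in quad words */
	qw = put_bits(qw, 32, 20, dlid);    /* DLID, lower 20 bits */
	qw = put_bits(qw, 52, 5, sc);       /* service class */
	qw = put_bits(qw, 61, 2, 2);        /* L2 = binary 10, 16B format */
	qw = put_bits(qw, 63, 1, 1);        /* LT = 1, head flit */
	return qw;
}

/* Bytes of padding needed to round len up to a multiple of 8 (one quad word). */
static size_t qw_pad(size_t len)
{
	return (8 - (len & 7)) & 7;
}
```

The value returned by qw_pad() is the kind of count the 'Tail' field carries,
so the receiver can strip the same number of bytes again.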

Driver Design
=============
Intel OPA VNIC software design is presented in the below diagram.
OPA VNIC functionality has a HW dependent component and a HW independent 
component.

The support has been added for IB device to allocate and free the RDMA netdev 
devices. The RDMA netdev supports interfacing with the network stack thus 
creating standard network interfaces. OPA_VNIC is an RDMA netdev device type.

The HW dependent VNIC functionality is part of the HFI1 driver. It implements 
the verbs to allocate and free the OPA_VNIC RDMA netdev.
It involves HW resource allocation/management for VNIC functionality.
It interfaces with the network stack and

Re: [PATCH net-next v3 2/3] Add a eBPF helper function to retrieve socket uid

2017-02-23 Thread Willem de Bruijn
> +BPF_CALL_1(bpf_get_socket_uid, struct sk_buff *, skb)
> +{
> +   struct sock *sk = sk_to_full_sk(skb->sk);
> +   kuid_t kuid = sock_net_uid(dev_net(skb->dev), sk);

dev_net cannot handle a NULL skb->dev
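
One way to avoid the dereference, sketched in userspace with simplified
stand-in structs (these are not the kernel definitions, and skb_net_safe()
is a hypothetical name): resolve the namespace from the socket when there is
one, and only touch the device pointer after checking it.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types, for illustration only. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct sock { struct net *sk_net; };
struct sk_buff { struct net_device *dev; struct sock *sk; };

/* Prefer the socket's namespace; fall back to the device only if it is set. */
static struct net *skb_net_safe(const struct sk_buff *skb)
{
	if (skb->sk)
		return skb->sk->sk_net;
	if (skb->dev)
		return skb->dev->nd_net;
	return NULL;
}
```

A caller would then treat a NULL result as "no namespace available" and
return a fallback value instead of crashing.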


Re: [PATCH 00/35] treewide trivial patches converting pr_warning to pr_warn

2017-02-23 Thread Joe Perches
On Thu, 2017-02-23 at 09:28 -0600, Rob Herring wrote:
> On Fri, Feb 17, 2017 at 1:11 AM, Joe Perches  wrote:
> > There are ~4300 uses of pr_warn and ~250 uses of the older
> > pr_warning in the kernel source tree.
> > 
> > Make the use of pr_warn consistent across all kernel files.
> > 
> > This excludes all files in tools/ as there is a separate
> > define pr_warning for that directory tree and pr_warn is
> > not used in tools/.
> > 
> > Done with 'sed s/\bpr_warning\b/pr_warn/' and some emacsing.
[]
> Where's the removal of pr_warning so we don't have more sneak in?

After all of these actually get applied,
and maybe a cycle or two later, one would
get sent.



Re: [PATCH net-next] net/gtp: Add udp source port generation according to flow hash

2017-02-23 Thread Andreas Schultz


- On Feb 23, 2017, at 5:42 PM, pablo pa...@netfilter.org wrote:

> On Thu, Feb 23, 2017 at 03:21:13PM +0100, Andreas Schultz wrote:
>> - On Feb 23, 2017, at 2:49 PM, pablo pa...@netfilter.org wrote:
> [...]
>> > According to specs, section 4.4.2.3 Encapsulated T-PDU, TS 29.281.
>> > 
>> > "The UDP Source Port is a locally allocated port number at the sending
>> > GTP-U entity."
>> > 
>> > Older specs that refer to GTP-U such as TS 09.60 and TS 29.060 also
>> > state the same.
>> 
>> It is absolutely valid to choose any sending port you want. I only
>> think you should know which port you did send on.
>> 
>> TS 29.281, Sect. 5.2.2.1 defines the UDP port extension to be used
>> in error indications. It provides the UDP source port of a G-PDU
>> that triggered an error.
>> 
>> If the send side does not know which port it uses to send, how
>> can it use this indication to correlate an error? That's the reason
>> I thought it would be better to add the UDP source port to the
>> PDP context and allow the control path to assign the source port
>> on context creation.
>> 
>> Of course, this header is optional and the receiving side is not
>> required to use it.
>> 
>> About the RSS spreading in the receive side, I would think that
>> a receiver would prefer to process all packets that belong to a
>> given TEID with the same receive instance. So keeping the UDP
>> source port for a given TEID stable would be beneficial. As far
>> as I understand it, the hash used in the patch uses the source
>> and destination values from the original flow. This would mean
>> that GTP packets that belong to the same TEID would end up with
>> different UDP source ports.
> 
> The receiver needs to scale up, and that happens if packets are
> distributed between CPUs in a way that makes sense. I don't think it
> makes sense to pass packets that belong to the same tunnel to the same
> CPU, this is exactly the scenario that makes DDOS easier to trigger.

When we are talking about the xmit path, then currently none of the
receivers we are talking to is going to be Linux and we have no
idea how they will behave nor do we have any influence on them. Do
we really need to make assumptions about other vendors' implementations?

Traces on live GRX networks show that about 90% of the SGSN/S-GW
that would talk to us always use the default GTP-U port as source
port. Some multi chassis GSNs seem to assign source port ranges to
chassis, but that has nothing to do with DDOS protection.

So, even when we do the randomization on TX, it won't help our receiver
scale up. We have to deal with what the other (100% non-Linux) side is
going to throw at us.

>> So what about this as a compromise, we add a UDP source port field
>> to the PDP context, it defaults to 0 (zero), the control instance
>> can optionally initialize this field, when we hit the xmit code
>> and the port is non zero, use that value, if it is zero use the hash?
> 
> You want to use this for your VRF concept? I would like you to take
> the time to explain your use cases to us.

I only want the normal multi APN per GGSN/P-GW setup that every mobile
carrier on this world is running on the big commercial vendor boxes, nothing
more and nothing less.

I have tried to explain this multiple times already, but it seems I
failed every time to put it in intelligible form. The last attempt
was here: http://marc.info/?l=linux-netdev&m=148700022003294&w=2

> How are you going to use this mapping between tunnels and UDP source ports?

There is no mapping between tunnels and UDP source ports. UDP source
ports do not matter to the receiving side at all in GTP tunnels. There
is one error handling case that can benefit from knowing the source port
and I think that case shouldn't be discarded lightly.

A GTP-U tunnel is only defined by its destination IP, destination UDP port
(which is constant 2152) and its TEID. This also means that a GTP tunnel
is a one-way construct. Only the GTP-C instance knows that a PDP context
is actually two GTP-U tunnels, one in each direction.
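
To make that definition concrete, here is a small sketch of a lookup key
that deliberately contains no source address or source port. The struct and
function names are hypothetical, and IPv4 in host byte order is assumed
purely for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

#define GTPU_PORT 2152  /* constant GTP-U destination port */

/* One direction of a PDP context, identified without any source fields. */
struct gtpu_tunnel {
	uint32_t dst_ip;    /* IPv4 destination, host byte order (assumed) */
	uint16_t dst_port;  /* expected to be GTPU_PORT */
	uint32_t teid;      /* tunnel endpoint identifier */
};

/* A packet belongs to the tunnel if destination IP, port and TEID all match. */
static bool gtpu_tunnel_match(const struct gtpu_tunnel *t,
			      uint32_t dst_ip, uint16_t dst_port, uint32_t teid)
{
	return t->dst_ip == dst_ip && t->dst_port == dst_port && t->teid == teid;
}
```

A PDP context as seen by GTP-C would then hold two such keys, one per
direction.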

We had the discussion if the source IP does play a role in this. The
3GPP specifications do not make a 100% clear statement for GTP-U on that.
So the case can be made that the GTP-U tunnel pair should be symmetric
(the destination IP of one tunnel is the source of the other). There
are various other statements in the documents that imply that this is
not always the case.

I thought we had come to the conclusion (with Harald) to make the reverse
GSN peer filter an option. Then every control path implementation has
the choice to use it or not. The same should IMHO apply to the source port
randomization.

For GTP-C the situation is clear, the control instance has to accept
traffic of a given TEID from any source, otherwise certain handover
procedures would not work.

> An explanation would be better than searching at some optional
> (corner case) extension in the specs whose usage is questionable.
> 
> In any case, I think we want a good defau

Re: [PATCH net-next v3 3/3] A Sample of using socket cookie and uid for traffic monitoring

2017-02-23 Thread Willem de Bruijn
> +/* This test is a demo of using get_socket_uid and get_socket_cookie
> + * helper function to do per socket based network traffic monitoring.
> + * It requires iptable version higher then 1.6.1. to load pined eBPF
> + * program into the xt_bpf match.

iptable -> iptables (everywhere)
pined -> pinned

> + * Clean up: if using shell script, the script file will delete the iptables
> + * rule and unmount the bpf program when exit. Else the iptables rule need
> + * to be deleted using:
> + *   iptables -D INPUT -m bpf --object-pinned ${mnt_dir}/bpf_prog -j ACCEPT

this is already in the wrapper script

> +   struct bpf_insn prog[] = {
> +   /*
> +* it for future usage. value stored in R6 to R10 will not be
> +* reset after a bpf helper function call.

garbled comment

> +*/
> +   BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
> +   /*
> +* pc1: BPF_FUNC_get_socket_cookie takes one parameter,
> +* R1: sk_buff
> +*/
> +   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> +   BPF_FUNC_get_socket_cookie),
> +   /* pc2-4: save &socketCookie to r7 for future usage*/
> +   BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
> +   BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
> +   BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, -8),
> +   /*
> +* pc5-8: set up the registers for BPF_FUNC_map_lookup_elem,
> +* it takes two parameters (R1: map_fd,  R2: &socket_cookie)
> +*/
> +   BPF_LD_MAP_FD(BPF_REG_1, map_fd),
> +   BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
> +   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> +   BPF_FUNC_map_lookup_elem),
> +   /*
> +* pc9. if r0 != 0x0, go to pc+14, since we have the cookie
> +* stored already
> +* Otherwise do pc10-22 to setup a new data entry.
> +*/
> +   BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 14),
> +   BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
> +   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> +   BPF_FUNC_get_socket_uid),
> +   /*
> +* Place a struct stats in the R10 stack and sequentially
> +* place the member value into the memory. Packets value
> +* is set by directly place a IMM value 1 into the stack.
> +*/
> +   BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0,
> +   -32 + offsetof(struct stats, uid)),
> +   BPF_ST_MEM(BPF_DW, BPF_REG_10,
> +   -32 + offsetof(struct stats, packets), 1),
> +   /*
> +* __sk_buff is a special struct used for eBPF program to
> +* directly access some sk_buff field.
> +*/
> +   BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
> +   offsetof(struct __sk_buff, len)),
> +   BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_1,
> +   -32 + offsetof(struct stats, bytes)),
> +   /*
> +* add new map entry using BPF_FUNC_map_update_elem, it takes
> +* 4 parameters (R1: map_fd, R2: &socket_cookie, R3: &stats)

R4: flags

> +*/
> +   BPF_LD_MAP_FD(BPF_REG_1, map_fd),
> +   BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
> +   BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
> +   BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, -32),
> +   BPF_MOV64_IMM(BPF_REG_4, 0),
> +   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> +   BPF_FUNC_map_update_elem),
> +   BPF_JMP_IMM(BPF_JA, 0, 0, 5),
> +   /*
> +* pc24-30 update the packet info to a exist data entry, it 
> can
> +* be done by directly write to pointers instead of using
> +* BPF_FUNC_map_update_elem helper function
> +*/
> +   BPF_MOV64_REG(BPF_REG_9, BPF_REG_0),
> +   BPF_MOV64_IMM(BPF_REG_1, 1),
> +   BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_9, 
> BPF_REG_1,
> +   offsetof(struct stats, packets), 0),

BPF_STX_XADD here and below

> +   BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
> +   offsetof(struct __sk_buff, len)),
> +   BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_9, 
> BPF_REG_1,
> + offsetof(struct stats, bytes), 0),
> +   BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_6,
> +   offsetof(struct __sk_buff, len)),

No requirement to return len. It goes unused, can just return non-zero
immediate.

> +static void p

[PATCH 0/7] net: stmmac: Fixes and Tegra186 support

2017-02-23 Thread Thierry Reding
From: Thierry Reding 

Hi everyone,

This series of patches starts with a few cleanups that I ran across while
adding Tegra186 support to the stmmac driver. It then adds code for FIFO
size parsing from feature registers and finally enables support for the
incarnation of the Synopsys DWC QOS IP found on NVIDIA Tegra186 SoCs.

This is based on next-20170223.

Thanks,
Thierry

Thierry Reding (7):
  net: stmmac: Rename clk_ptp_ref clock to ptp_ref
  net: stmmac: Balance PTP reference clock enable/disable
  net: stmmac: Check for DMA mapping errors
  net: stmmac: Parse FIFO sizes from feature registers
  net: stmmac: Program RX queue size and flow control
  net: stmmac: dwc-qos: Split out ->probe() and ->remove()
  net: stmmac: dwc-qos: Add Tegra186 support

 Documentation/devicetree/bindings/net/stmmac.txt   |   6 +-
 drivers/net/ethernet/stmicro/stmmac/common.h   |   3 +
 .../ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c| 366 +++--
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h   |  12 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c   |  45 ++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |   9 +
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   3 +-
 7 files changed, 411 insertions(+), 33 deletions(-)

-- 
2.11.1



[PATCH 5/7] net: stmmac: Program RX queue size and flow control

2017-02-23 Thread Thierry Reding
From: Thierry Reding 

Program the receive queue size based on the RX FIFO size and enable
hardware flow control for large FIFOs.

Signed-off-by: Thierry Reding 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h | 12 +++
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c | 43 ++--
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index db45134fddf0..9acc1f1252b3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -180,6 +180,7 @@ enum power_event {
 #define MTL_OP_MODE_TSF        BIT(1)
 
 #define MTL_OP_MODE_TQS_MASK   GENMASK(24, 16)
+#define MTL_OP_MODE_TQS_SHIFT  16
 
 #define MTL_OP_MODE_TTC_MASK   0x70
 #define MTL_OP_MODE_TTC_SHIFT  4
@@ -193,6 +194,17 @@ enum power_event {
 #define MTL_OP_MODE_TTC_384    (6 << MTL_OP_MODE_TTC_SHIFT)
 #define MTL_OP_MODE_TTC_512    (7 << MTL_OP_MODE_TTC_SHIFT)
 
+#define MTL_OP_MODE_RQS_MASK   GENMASK(29, 20)
+#define MTL_OP_MODE_RQS_SHIFT  20
+
+#define MTL_OP_MODE_RFD_MASK   GENMASK(19, 14)
+#define MTL_OP_MODE_RFD_SHIFT  14
+
+#define MTL_OP_MODE_RFA_MASK   GENMASK(13, 8)
+#define MTL_OP_MODE_RFA_SHIFT  8
+
+#define MTL_OP_MODE_EHFC   BIT(7)
+
 #define MTL_OP_MODE_RTC_MASK   0x18
 #define MTL_OP_MODE_RTC_SHIFT  3
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index 8d249f3b34c8..03d230201960 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -185,8 +185,9 @@ static void dwmac4_rx_watchdog(void __iomem *ioaddr, u32 
riwt)
 }
 
 static void dwmac4_dma_chan_op_mode(void __iomem *ioaddr, int txmode,
-   int rxmode, u32 channel)
+   int rxmode, u32 channel, int rxfifosz)
 {
+   unsigned int rqs = rxfifosz / 256 - 1;
u32 mtl_tx_op, mtl_rx_op, mtl_rx_int;
 
/* Following code only done for channel 0, other channels not yet
@@ -252,6 +253,44 @@ static void dwmac4_dma_chan_op_mode(void __iomem *ioaddr, 
int txmode,
mtl_rx_op |= MTL_OP_MODE_RTC_128;
}
 
+   mtl_rx_op &= ~MTL_OP_MODE_RQS_MASK;
+   mtl_rx_op |= rqs << MTL_OP_MODE_RQS_SHIFT;
+
+   /* enable flow control only if each channel gets 4 KiB or more FIFO */
+   if (rxfifosz >= 4096) {
+   unsigned int rfd, rfa;
+
+   mtl_rx_op |= MTL_OP_MODE_EHFC;
+
+   switch (rxfifosz) {
+   case 4096:
+   rfd = 0x03;
+   rfa = 0x01;
+   break;
+
+   case 8192:
+   rfd = 0x06;
+   rfa = 0x0a;
+   break;
+
+   case 16384:
+   rfd = 0x06;
+   rfa = 0x12;
+   break;
+
+   default:
+   rfd = 0x06;
+   rfa = 0x1e;
+   break;
+   }
+
+   mtl_rx_op &= ~MTL_OP_MODE_RFD_MASK;
+   mtl_rx_op |= rfd << MTL_OP_MODE_RFD_SHIFT;
+
+   mtl_rx_op &= ~MTL_OP_MODE_RFA_MASK;
+   mtl_rx_op |= rfa << MTL_OP_MODE_RFA_SHIFT;
+   }
+
writel(mtl_rx_op, ioaddr + MTL_CHAN_RX_OP_MODE(channel));
 
/* Enable MTL RX overflow */
@@ -264,7 +303,7 @@ static void dwmac4_dma_operation_mode(void __iomem *ioaddr, 
int txmode,
  int rxmode, int rxfifosz)
 {
/* Only Channel 0 is actually configured and used */
-   dwmac4_dma_chan_op_mode(ioaddr, txmode, rxmode, 0);
+   dwmac4_dma_chan_op_mode(ioaddr, txmode, rxmode, 0, rxfifosz);
 }
 
 static void dwmac4_get_hw_feature(void __iomem *ioaddr,
-- 
2.11.1
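
As a quick sanity check of the arithmetic in this patch, here is a user-space sketch (not driver code) of how the RX operation mode word appears to be assembled. The field positions are copied from the dwmac4.h hunk above; the function name rx_op_mode_sketch() is hypothetical, and the sketch assumes rxfifosz is a non-zero multiple of 256 so that "rxfifosz / 256 - 1" does not wrap.

```c
#include <stdint.h>

/* Field layout copied from the dwmac4.h hunk in this patch. */
#define RQS_SHIFT 20
#define RQS_MASK  (0x3ffu << RQS_SHIFT)  /* GENMASK(29, 20) */
#define RFD_SHIFT 14
#define RFD_MASK  (0x3fu << RFD_SHIFT)   /* GENMASK(19, 14) */
#define RFA_SHIFT 8
#define RFA_MASK  (0x3fu << RFA_SHIFT)   /* GENMASK(13, 8) */
#define EHFC      (1u << 7)              /* BIT(7) */

/*
 * Hypothetical helper: takes the current register value and the RX FIFO
 * size in bytes, returns the value the patch would write back.
 * Assumes rxfifosz is a non-zero multiple of 256.
 */
static uint32_t rx_op_mode_sketch(uint32_t reg, unsigned int rxfifosz)
{
	/* Queue size is expressed in 256-byte blocks, minus one. */
	unsigned int rqs = rxfifosz / 256 - 1;

	reg &= ~RQS_MASK;
	reg |= rqs << RQS_SHIFT;

	/* Flow control only for FIFOs of 4 KiB or more, as in the patch. */
	if (rxfifosz >= 4096) {
		unsigned int rfd, rfa;

		reg |= EHFC;

		switch (rxfifosz) {
		case 4096:
			rfd = 0x03;
			rfa = 0x01;
			break;
		case 8192:
			rfd = 0x06;
			rfa = 0x0a;
			break;
		case 16384:
			rfd = 0x06;
			rfa = 0x12;
			break;
		default:
			rfd = 0x06;
			rfa = 0x1e;
			break;
		}

		reg &= ~RFD_MASK;
		reg |= rfd << RFD_SHIFT;

		reg &= ~RFA_MASK;
		reg |= rfa << RFA_SHIFT;
	}

	return reg;
}
```

For example, a 4096-byte FIFO gives rqs = 15 and turns flow control on, while a 2048-byte FIFO gives rqs = 7 and leaves EHFC, RFD and RFA untouched.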



[PATCH 6/7] net: stmmac: dwc-qos: Split out ->probe() and ->remove()

2017-02-23 Thread Thierry Reding
From: Thierry Reding 

Split out the binding specific parts of ->probe() and ->remove() to
enable the driver to support variants of the binding. This is useful in
order to keep backwards-compatibility while making it easy for a sub-
driver to deal only with the updated bindings rather than having to add
compatibility quirks all over the place.

Signed-off-by: Thierry Reding 
---
 .../ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c| 114 -
 1 file changed, 88 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
index 1a3fa3d9f855..5071d3c15adc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -106,13 +107,70 @@ static int dwc_eth_dwmac_config_dt(struct platform_device 
*pdev,
return 0;
 }
 
+static void *dwc_qos_probe(struct platform_device *pdev,
+  struct plat_stmmacenet_data *plat_dat,
+  struct stmmac_resources *stmmac_res)
+{
+   int err;
+
+   plat_dat->stmmac_clk = devm_clk_get(&pdev->dev, "apb_pclk");
+   if (IS_ERR(plat_dat->stmmac_clk)) {
+   dev_err(&pdev->dev, "apb_pclk clock not found.\n");
+   return ERR_CAST(plat_dat->stmmac_clk);
+   }
+
+   clk_prepare_enable(plat_dat->stmmac_clk);
+
+   plat_dat->pclk = devm_clk_get(&pdev->dev, "phy_ref_clk");
+   if (IS_ERR(plat_dat->pclk)) {
+   dev_err(&pdev->dev, "phy_ref_clk clock not found.\n");
+   err = PTR_ERR(plat_dat->pclk);
+   goto disable;
+   }
+
+   clk_prepare_enable(plat_dat->pclk);
+
+   return NULL;
+
+disable:
+   clk_disable_unprepare(plat_dat->stmmac_clk);
+   return ERR_PTR(err);
+}
+
+static int dwc_qos_remove(struct platform_device *pdev)
+{
+   struct net_device *ndev = platform_get_drvdata(pdev);
+   struct stmmac_priv *priv = netdev_priv(ndev);
+
+   clk_disable_unprepare(priv->plat->pclk);
+   clk_disable_unprepare(priv->plat->stmmac_clk);
+
+   return 0;
+}
+
+struct dwc_eth_dwmac_data {
+   void *(*probe)(struct platform_device *pdev,
+  struct plat_stmmacenet_data *data,
+  struct stmmac_resources *res);
+   int (*remove)(struct platform_device *pdev);
+};
+
+static const struct dwc_eth_dwmac_data dwc_qos_data = {
+   .probe = dwc_qos_probe,
+   .remove = dwc_qos_remove,
+};
+
 static int dwc_eth_dwmac_probe(struct platform_device *pdev)
 {
+   const struct dwc_eth_dwmac_data *data;
struct plat_stmmacenet_data *plat_dat;
struct stmmac_resources stmmac_res;
struct resource *res;
+   void *priv;
int ret;
 
+   data = of_device_get_match_data(&pdev->dev);
+
memset(&stmmac_res, 0, sizeof(struct stmmac_resources));
 
/**
@@ -138,39 +196,26 @@ static int dwc_eth_dwmac_probe(struct platform_device 
*pdev)
if (IS_ERR(plat_dat))
return PTR_ERR(plat_dat);
 
-   plat_dat->stmmac_clk = devm_clk_get(&pdev->dev, "apb_pclk");
-   if (IS_ERR(plat_dat->stmmac_clk)) {
-   dev_err(&pdev->dev, "apb_pclk clock not found.\n");
-   ret = PTR_ERR(plat_dat->stmmac_clk);
-   plat_dat->stmmac_clk = NULL;
-   goto err_remove_config_dt;
-   }
-   clk_prepare_enable(plat_dat->stmmac_clk);
-
-   plat_dat->pclk = devm_clk_get(&pdev->dev, "phy_ref_clk");
-   if (IS_ERR(plat_dat->pclk)) {
-   dev_err(&pdev->dev, "phy_ref_clk clock not found.\n");
-   ret = PTR_ERR(plat_dat->pclk);
-   plat_dat->pclk = NULL;
-   goto err_out_clk_dis_phy;
+   priv = data->probe(pdev, plat_dat, &stmmac_res);
+   if (IS_ERR(priv)) {
+   ret = PTR_ERR(priv);
+   dev_err(&pdev->dev, "failed to probe subdriver: %d\n", ret);
+   goto remove_config;
}
-   clk_prepare_enable(plat_dat->pclk);
 
ret = dwc_eth_dwmac_config_dt(pdev, plat_dat);
if (ret)
-   goto err_out_clk_dis_aper;
+   goto remove;
 
ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
if (ret)
-   goto err_out_clk_dis_aper;
+   goto remove;
 
-   return 0;
+   return ret;
 
-err_out_clk_dis_aper:
-   clk_disable_unprepare(plat_dat->pclk);
-err_out_clk_dis_phy:
-   clk_disable_unprepare(plat_dat->stmmac_clk);
-err_remove_config_dt:
+remove:
+   data->remove(pdev);
+remove_config:
stmmac_remove_config_dt(pdev, plat_dat);
 
return ret;
@@ -178,11 +223,28 @@ static int dwc_eth_dwmac_probe(struct platform_device 
*pdev)
 
 static int dwc_eth_dwmac_remove(struct platform_device *pdev)
 {
-   return stmmac_pltfr_

[PATCH 4/7] net: stmmac: Parse FIFO sizes from feature registers

2017-02-23 Thread Thierry Reding
From: Thierry Reding 

New versions of this core encode the FIFO sizes in one of the feature
registers. Use these sizes as default, but still allow device tree to
override them for backwards compatibility.

Signed-off-by: Thierry Reding 
---
 drivers/net/ethernet/stmicro/stmmac/common.h  | 3 +++
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  | 2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 144fe84e8a53..6ac653845d82 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -324,6 +324,9 @@ struct dma_features {
unsigned int number_tx_queues;
/* Alternate (enhanced) DESC mode */
unsigned int enh_desc;
+   /* TX and RX FIFO sizes */
+   unsigned int tx_fifo_size;
+   unsigned int rx_fifo_size;
 };
 
 /* GMAC TX FIFO is 8K, Rx FIFO is 16K */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index 377d1b44d4f2..8d249f3b34c8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -296,6 +296,8 @@ static void dwmac4_get_hw_feature(void __iomem *ioaddr,
hw_cap = readl(ioaddr + GMAC_HW_FEATURE1);
dma_cap->av = (hw_cap & GMAC_HW_FEAT_AVSEL) >> 20;
dma_cap->tsoen = (hw_cap & GMAC_HW_TSOEN) >> 18;
+   dma_cap->tx_fifo_size = 128 << ((hw_cap >> 6) & 0x1f);
+   dma_cap->rx_fifo_size = 128 << ((hw_cap >> 0) & 0x1f);
/* MAC HW feature2 */
hw_cap = readl(ioaddr + GMAC_HW_FEATURE2);
/* TX and RX number of channels */
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index d7387919bdb6..291e34f0ca94 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1281,6 +1281,9 @@ static void stmmac_dma_operation_mode(struct stmmac_priv 
*priv)
 {
int rxfifosz = priv->plat->rx_fifo_size;
 
+   if (rxfifosz == 0)
+   rxfifosz = priv->dma_cap.rx_fifo_size;
+
if (priv->plat->force_thresh_dma_mode)
priv->hw->dma->dma_mode(priv->ioaddr, tc, tc, rxfifosz);
else if (priv->plat->force_sf_dma_mode || priv->plat->tx_coe) {
-- 
2.11.1
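
To make the decoding in this patch easier to follow, here is a small user-space sketch of the same arithmetic. It is an illustration only: the helper names are hypothetical, and the bit positions are simply those used in the dwmac4_dma.c hunk above (a 5-bit field at bit 6 for TX and at bit 0 for RX, each encoding the size as 128 bytes shifted left by the field value).

```c
#include <stdint.h>

/* TX FIFO size in bytes, decoded as in the patch: 128 << bits [10:6]. */
static unsigned int tx_fifo_size_sketch(uint32_t hw_cap)
{
	return 128u << ((hw_cap >> 6) & 0x1f);
}

/* RX FIFO size in bytes, decoded as in the patch: 128 << bits [4:0]. */
static unsigned int rx_fifo_size_sketch(uint32_t hw_cap)
{
	return 128u << (hw_cap & 0x1f);
}

/*
 * Mirrors the fallback added to stmmac_dma_operation_mode(): a non-zero
 * device tree value wins, otherwise the hardware-reported size is used.
 */
static unsigned int pick_rx_fifo_size(unsigned int dt_size,
				      unsigned int hw_size)
{
	return dt_size ? dt_size : hw_size;
}
```

With a field value of 5 the decoded size is 4096 bytes, and with 7 it is 16384 bytes. A device tree that specifies rx-fifo-depth still takes precedence over the decoded value.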



[PATCH 3/7] net: stmmac: Check for DMA mapping errors

2017-02-23 Thread Thierry Reding
From: Thierry Reding 

When DMA mapping an SKB fragment, the mapping must be checked for
errors, otherwise the DMA debug code will complain upon unmap.

Signed-off-by: Thierry Reding 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 6b7a5ce19589..d7387919bdb6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2072,6 +2072,8 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, 
struct net_device *dev)
des = skb_frag_dma_map(priv->device, frag, 0,
   skb_frag_size(frag),
   DMA_TO_DEVICE);
+   if (dma_mapping_error(priv->device, des))
+   goto dma_map_err;
 
stmmac_tso_allocator(priv, des, skb_frag_size(frag),
 (i == nfrags - 1));
-- 
2.11.1



[PATCH 7/7] net: stmmac: dwc-qos: Add Tegra186 support

2017-02-23 Thread Thierry Reding
From: Thierry Reding 

The NVIDIA Tegra186 SoC contains an instance of the Synopsys DWC
ethernet QOS IP core. The binding that it uses is slightly different
from existing ones because of the integration (clocks, resets, ...).

Signed-off-by: Thierry Reding 
---
 .../ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c| 252 +
 1 file changed, 252 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
index 5071d3c15adc..54dfbdc48f6d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -22,10 +23,24 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "stmmac_platform.h"
 
+struct tegra_eqos {
+   struct device *dev;
+   void __iomem *regs;
+
+   struct reset_control *rst;
+   struct clk *clk_master;
+   struct clk *clk_slave;
+   struct clk *clk_tx;
+   struct clk *clk_rx;
+
+   struct gpio_desc *reset;
+};
+
 static int dwc_eth_dwmac_config_dt(struct platform_device *pdev,
   struct plat_stmmacenet_data *plat_dat)
 {
@@ -148,6 +163,237 @@ static int dwc_qos_remove(struct platform_device *pdev)
return 0;
 }
 
+#define SDMEMCOMPPADCTRL 0x8800
+#define  SDMEMCOMPPADCTRL_PAD_E_INPUT_OR_E_PWRD BIT(31)
+
+#define AUTO_CAL_CONFIG 0x8804
+#define  AUTO_CAL_CONFIG_START BIT(31)
+#define  AUTO_CAL_CONFIG_ENABLE BIT(29)
+
+#define AUTO_CAL_STATUS 0x880c
+#define  AUTO_CAL_STATUS_ACTIVE BIT(31)
+
+static void tegra_eqos_fix_speed(void *priv, unsigned int speed)
+{
+   struct tegra_eqos *eqos = priv;
+   unsigned long rate = 12500;
+   bool needs_calibration = false;
+   unsigned int i;
+   u32 value;
+
+   switch (speed) {
+   case SPEED_1000:
+   needs_calibration = true;
+   rate = 12500;
+   break;
+
+   case SPEED_100:
+   needs_calibration = true;
+   rate = 2500;
+   break;
+
+   case SPEED_10:
+   rate = 250;
+   break;
+
+   default:
+   dev_err(eqos->dev, "invalid speed %u\n", speed);
+   break;
+   }
+
+   if (needs_calibration) {
+   /* calibrate */
+   value = readl(eqos->regs + SDMEMCOMPPADCTRL);
+   value |= SDMEMCOMPPADCTRL_PAD_E_INPUT_OR_E_PWRD;
+   writel(value, eqos->regs + SDMEMCOMPPADCTRL);
+
+   udelay(1);
+
+   value = readl(eqos->regs + AUTO_CAL_CONFIG);
+   value |= AUTO_CAL_CONFIG_START | AUTO_CAL_CONFIG_ENABLE;
+   writel(value, eqos->regs + AUTO_CAL_CONFIG);
+
+   for (i = 0; i <= 10; i++) {
+   value = readl(eqos->regs + AUTO_CAL_STATUS);
+   if (value & AUTO_CAL_STATUS_ACTIVE)
+   break;
+
+   udelay(1);
+   }
+
+   if ((value & AUTO_CAL_STATUS_ACTIVE) == 0) {
+   dev_err(eqos->dev, "calibration did not start\n");
+   goto failed;
+   }
+
+   for (i = 0; i <= 10; i++) {
+   value = readl(eqos->regs + AUTO_CAL_STATUS);
+   if ((value & AUTO_CAL_STATUS_ACTIVE) == 0)
+   break;
+
+   udelay(20);
+   }
+
+   if (value & AUTO_CAL_STATUS_ACTIVE) {
+   dev_err(eqos->dev, "calibration didn't finish\n");
+   goto failed;
+   }
+
+   failed:
+   value = readl(eqos->regs + SDMEMCOMPPADCTRL);
+   value &= ~SDMEMCOMPPADCTRL_PAD_E_INPUT_OR_E_PWRD;
+   writel(value, eqos->regs + SDMEMCOMPPADCTRL);
+   } else {
+   value = readl(eqos->regs + AUTO_CAL_CONFIG);
+   value &= ~AUTO_CAL_CONFIG_ENABLE;
+   writel(value, eqos->regs + AUTO_CAL_CONFIG);
+   }
+
+   clk_set_rate(eqos->clk_tx, rate);
+}
+
+static int tegra_eqos_init(struct platform_device *pdev, void *priv)
+{
+   struct tegra_eqos *eqos = priv;
+   unsigned long rate;
+   u32 value;
+
+   rate = clk_get_rate(eqos->clk_slave);
+
+   value = readl(eqos->regs + 0xdc);
+   value = (rate / 100) - 1;
+   writel(value, eqos->regs + 0xdc);
+
+   return 0;
+}
+
+static void *tegra_eqos_probe(struct platform_device *pdev,
+ struct plat_stmmacenet_data *data,
+ struct stmmac_resources *res)
+{
+   struct tegra_eqos *eqos;
+   int err;
+
+   eqos = devm_kzalloc(&pdev->dev, sizeof(*eqos), GFP_KERNEL);
+   if (!eqos) {
+   err = -ENOMEM;
+   goto error;
+   }
+

[PATCH 1/7] net: stmmac: Rename clk_ptp_ref clock to ptp_ref

2017-02-23 Thread Thierry Reding
From: Thierry Reding 

There aren't currently any users of the "clk_ptp_ref", but there are
other references to "ptp_ref", so I'm leaning towards considering that a
typo. Fix it.

Signed-off-by: Thierry Reding 
---
 Documentation/devicetree/bindings/net/stmmac.txt  | 6 +++---
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt 
b/Documentation/devicetree/bindings/net/stmmac.txt
index d3bfc2b30fb5..11b27dfd1627 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -28,9 +28,9 @@ Optional properties:
   clocks may be specified in derived bindings.
 - clock-names: One name for each entry in the clocks property, the
   first one should be "stmmaceth" and the second one should be "pclk".
-- clk_ptp_ref: this is the PTP reference clock; in case of the PTP is
-  available this clock is used for programming the Timestamp Addend Register.
-  If not passed then the system clock will be used and this is fine on some
+- ptp_ref: this is the PTP reference clock; in case of the PTP is available
+  this clock is used for programming the Timestamp Addend Register. If not
+  passed then the system clock will be used and this is fine on some
   platforms.
 - tx-fifo-depth: See ethernet.txt file in the same directory
 - rx-fifo-depth: See ethernet.txt file in the same directory
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 433a84239a68..5b18355c0d2b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -359,7 +359,7 @@ stmmac_probe_config_dt(struct platform_device *pdev, const 
char **mac)
clk_prepare_enable(plat->pclk);
 
/* Fall-back to main clock in case of no PTP ref is passed */
-   plat->clk_ptp_ref = devm_clk_get(&pdev->dev, "clk_ptp_ref");
+   plat->clk_ptp_ref = devm_clk_get(&pdev->dev, "ptp_ref");
if (IS_ERR(plat->clk_ptp_ref)) {
plat->clk_ptp_rate = clk_get_rate(plat->stmmac_clk);
plat->clk_ptp_ref = NULL;
-- 
2.11.1



[PATCH 2/7] net: stmmac: Balance PTP reference clock enable/disable

2017-02-23 Thread Thierry Reding
From: Thierry Reding 

clk_prepare_enable() and clk_disable_unprepare() for this clock aren't
properly balanced, which can trigger a WARN_ON() in the common clock
framework.

Signed-off-by: Thierry Reding 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 -
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3cbe09682afe..6b7a5ce19589 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1711,6 +1711,10 @@ static int stmmac_hw_setup(struct net_device *dev, bool 
init_ptp)
stmmac_mmc_setup(priv);
 
if (init_ptp) {
+   ret = clk_prepare_enable(priv->plat->clk_ptp_ref);
+   if (ret < 0)
+   netdev_warn(priv->dev, "failed to enable PTP reference 
clock: %d\n", ret);
+
ret = stmmac_init_ptp(priv);
if (ret == -EOPNOTSUPP)
netdev_warn(priv->dev, "PTP not supported by HW\n");
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 5b18355c0d2b..d285d6cfbd0d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -365,7 +365,6 @@ stmmac_probe_config_dt(struct platform_device *pdev, const 
char **mac)
plat->clk_ptp_ref = NULL;
dev_warn(&pdev->dev, "PTP uses main clock\n");
} else {
-   clk_prepare_enable(plat->clk_ptp_ref);
plat->clk_ptp_rate = clk_get_rate(plat->clk_ptp_ref);
dev_dbg(&pdev->dev, "PTP rate %d\n", plat->clk_ptp_rate);
}
-- 
2.11.1
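
The imbalance this patch addresses can be illustrated with a toy enable counter. This is a user-space model, not the common clock framework: struct toy_clk and its helpers are hypothetical, and the sketch assumes that the matching clk_disable_unprepare() runs every time the interface is closed, which is not shown in this patch.

```c
/* Toy stand-in for a clock with an enable count; not the kernel's struct clk. */
struct toy_clk {
	int enable_count;
	int warnings;	/* number of unbalanced disables seen */
};

static void toy_clk_enable(struct toy_clk *clk)
{
	clk->enable_count++;
}

static void toy_clk_disable(struct toy_clk *clk)
{
	if (clk->enable_count == 0) {
		/* This is where the real framework would WARN. */
		clk->warnings++;
		return;
	}

	clk->enable_count--;
}

/* Old scheme (as assumed here): enable once at probe, disable on every close. */
static int unbalanced_cycles(int cycles)
{
	struct toy_clk clk = { 0, 0 };
	int i;

	toy_clk_enable(&clk);		/* probe */

	for (i = 0; i < cycles; i++)
		toy_clk_disable(&clk);	/* close */

	return clk.warnings;
}

/* New scheme: enable on every open, disable on every close. */
static int balanced_cycles(int cycles)
{
	struct toy_clk clk = { 0, 0 };
	int i;

	for (i = 0; i < cycles; i++) {
		toy_clk_enable(&clk);	/* open */
		toy_clk_disable(&clk);	/* close */
	}

	return clk.warnings;
}
```

In this model, three open/close cycles under the old scheme trip the check twice, because only the first disable has a matching enable. Under the new scheme the count returns to zero after every cycle.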



Re: [PATCH net-next] net/gtp: Add udp source port generation according to flow hash

2017-02-23 Thread David Miller
From: Harald Welte 
Date: Thu, 23 Feb 2017 17:50:51 +0100

> I understand and support the motivation to design robust systems even
> in the presence of broken/ignorant specs, but I think this is one of the
> situations where it is useless (and IMHO impossible) to do anything
> about it.

I think avoiding trying to do something reasonable about this and just
saying "this is how cellular networks are" is not acceptable.

We know how to properly strengthen tunnelling implementations in the
kernel against DDoS attacks, GTP can be treated similarly.


Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE

2017-02-23 Thread Marcelo Ricardo Leitner
On Thu, Feb 23, 2017 at 04:04:10PM +, David Laight wrote:
> From: Xin Long
> > Sent: 23 February 2017 03:46
> > On Tue, Feb 21, 2017 at 10:27 PM, David Laight  
> > wrote:
> > > From: Xin Long
> > >> Sent: 18 February 2017 17:53
> > >> This patch is to add support for MSG_MORE on sctp.
> > >>
> > >> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
> > >> creating datamsg according to the send flag. sctp_packet_can_append_data
> > >> then uses it to decide if the chunks of this msg will be sent at once or
> > >> delay it.
> > >>
> > >> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
> > >> in assoc. As sctp enqueues the chunks first, then dequeue them one by
> > >> one. If it's saved in assoc, the current msg's send flag (MSG_MORE) may
> > >> affect other chunks' bundling.
> > >
> > > I thought about that and decided that the MSG_MORE flag on the last data
> > > chunk was the only one that mattered.
> > > Indeed looking at any others is broken.
> > >
> > > Consider what happens if you have two small chunks queued, the first
> > > with MSG_MORE set, the second with it clear.
> > >
> > > I think that sctp_outq_flush() will look at the first chunk and decide it
> > > doesn't need to do anything because sctp_packet_transmit_chunk()
> > > returns SCTP_XMIT_DELAY.
> > > The data chunk with MSG_MORE clear won't even be looked at.
> > > So the data will never be sent.
> 
> > It's not that bad as you thought, in sctp_packet_can_append_data():
> > when inflight == 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
> > would be still sent out.
> 
> One of us isn't understanding the other :-)
> 
> IIRC sctp_packet_can_append_data() is called for the first queued
> data chunk in order to decide whether to generate a message that

Perhaps here lies the source of the confusion?
sctp_packet_can_append_data() is called for all queued data chunks, and
not just the first one.

sctp_outq_flush
  (retransmissions here, omitted for simplicity)
  /* Finally, transmit new packets.  */
  while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
sctp_packet_transmit_chunk
  sctp_packet_append_chunk
sctp_packet_can_append_data
__sctp_packet_append_chunk

So chunks are checked one by one.

> consists only of data chunks.

That's not really its purpose. It's to check if it can append a data
chunk to the packet being prepared, while respecting asoc state, cwnd,
etc.

HTH!

  Marcelo

> If it returns SCTP_XMIT_OK then a message is built collecting the
> rest of the queued data chunks (until the window fills).
> 
> So if I send a message with MSG_MORE set (on an idle connection)
> SCTP_XMIT_DELAY is returned and a message isn't sent.
> 
> I now send a second small message, this time with MSG_MORE clear.
> The message is queued, then the code looks to see if it can send anything.
> 
> sctp_packet_can_append_data() is called for the first queued chunk.
> Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> message is built.
> The second message isn't even looked at.
> 
> > What MSG_MORE flag actually does is ignore inflight == 0 and
> > sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
> > it has to respect the original logic (like !chunk->msg->can_delay
> > || !sctp_packet_empty(packet) || ...)
> > 
> > To delay the chunks with MSG_MORE set even when inflight is 0
> > is especially important here for users.
> 
> I'm not too worried about that.
> Sending the first message was a cheap way to ensure something got
> sent if the application lied and didn't send a subsequent message.
> 
> The change has hit Linus's tree, so I should be able to test that
> and confirm what I think is going on.
> 
>   David
> 

