Re: Fw: [Bug 195969] New: ipsec icmp and udp works, tcp doesn't work

2017-06-06 Thread Steffen Klassert
Cc the reporter.

On Tue, Jun 06, 2017 at 12:14:39PM -0700, Stephen Hemminger wrote:
> 
> 
> Begin forwarded message:
> 
> Date: Sat, 03 Jun 2017 06:25:05 +
> From: bugzilla-dae...@bugzilla.kernel.org
> To: step...@networkplumber.org
> Subject: [Bug 195969] New: ipsec icmp and udp works, tcp doesn't work
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=195969
> 
>             Bug ID: 195969
>            Summary: ipsec icmp and udp works, tcp doesn't work
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 4.11.3-1-ARCH
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: step...@networkplumber.org
>           Reporter: d...@djagoo.io
>         Regression: No
> 
> A few days ago I updated to 4.11.3-1-ARCH. After that my VPN access to our
> corporate network was broken.
> 
> The connection is established and I can use UDP (i.e. DNS) and ICMP. All TCP
> connections I tried (ssh, smb, http...) failed.

Is this with UDP encapsulation?

If so, you could try this patch:

Subject: [PATCH v4.11] esp4: Fix udpencap for local TCP packets.

Locally generated TCP packets are usually cloned, so we
do skb_cow_data() on these packets. After that we need to
reload the pointer to the esp header. With udpencap this
header sits at an offset from skb_transport_header, so take
this offset into account.

This is a backport of:
commit 0e78a87306a ("esp4: Fix udpencap for local TCP packets.")

Fixes: 67d349ed603 ("net/esp4: Fix invalid esph pointer crash")
Fixes: fca11ebde3f0 ("esp4: Reorganize esp_output")
Reported-by: Don Bowman 
Signed-off-by: Steffen Klassert 
---
 net/ipv4/esp4.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index b1e2444..9708a32 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -212,6 +212,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 	u8 *iv;
 	u8 *tail;
 	u8 *vaddr;
+	int esph_offset;
 	int blksize;
 	int clen;
 	int alen;
@@ -392,12 +393,14 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 	}
 
 cow:
+	esph_offset = (unsigned char *)esph - skb_transport_header(skb);
+
 	err = skb_cow_data(skb, tailen, &trailer);
 	if (err < 0)
 		goto error;
 	nfrags = err;
 	tail = skb_tail_pointer(trailer);
-	esph = ip_esp_hdr(skb);
+	esph = (struct ip_esp_hdr *)(skb_transport_header(skb) + esph_offset);
 
 skip_cow:
 	esp_output_fill_trailer(tail, tfclen, plen, proto);
-- 
2.7.4
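
The pointer reload described in the commit message can be illustrated outside the kernel. What follows is a minimal userspace sketch, not kernel code: `struct pkt`, `pkt_grow()` and `hdr_after_grow()` are hypothetical names, and a malloc/copy/free sequence merely stands in for the data copy that skb_cow_data() may perform. It shows why a pointer that sits at a nonzero offset from the transport header (8 bytes here, the size of a UDP header) has to be rebuilt from the new base plus the saved offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: one heap buffer plus the
 * offset of the transport header inside it. */
struct pkt {
	unsigned char *data;
	size_t len;
	size_t transport_off;
};

static unsigned char *pkt_transport_header(const struct pkt *p)
{
	return p->data + p->transport_off;
}

/* Stand-in for skb_cow_data(): the payload moves to a new buffer,
 * so every pointer into the old buffer becomes stale. */
static int pkt_grow(struct pkt *p, size_t extra)
{
	unsigned char *n = malloc(p->len + extra);

	if (!n)
		return -1;
	memcpy(n, p->data, p->len);
	memset(n + p->len, 0, extra);
	free(p->data);
	p->data = n;
	p->len += extra;
	return 0;
}

/* Save the header's offset from the transport header, grow the
 * buffer, then rebuild the header pointer from the new base. */
static unsigned char *hdr_after_grow(struct pkt *p, unsigned char *hdr,
				     size_t extra)
{
	ptrdiff_t off = hdr - pkt_transport_header(p);

	assert(off >= 0);
	if (pkt_grow(p, extra) < 0)
		return NULL;
	return pkt_transport_header(p) + off;
}
```

Recomputing the pointer as "transport header + 0" after the copy, which is roughly what the pre-patch reload amounted to, would land on the UDP header rather than on the ESP header in this model.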



Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-06 Thread Eric Dumazet
On Tue, 2017-06-06 at 18:34 -0600, David Ahern wrote:
> On 6/6/17 6:27 PM, Eric Dumazet wrote:
> > Good catch, but it looks like similar fix is needed a few lines before.
> > 
> > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> > index deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865 100644
> > --- a/net/ipv6/ip6_fib.c
> > +++ b/net/ipv6/ip6_fib.c
> > @@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb,
> >  
> >  		read_lock_bh(&table->tb6_lock);
> >  		res = fib6_walk(net, w);
> > -		read_unlock_bh(&table->tb6_lock);
> >  		if (res > 0) {
> >  			cb->args[4] = 1;
> >  			cb->args[5] = w->root->fn_sernum;
> >  		}
> > +		read_unlock_bh(&table->tb6_lock);
> 
> indeed. tunnel vision on Ben's problem

BTW, bug was already Ben's problem when Patrick tried to fix it
in commit 2bec5a369ee79 ("ipv6: fib: fix crash when changing large fib
while dumping it")  seven years ago ;)
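
The change discussed in this thread widens the read-locked region so that w->root->fn_sernum is only read while tb6_lock is held. A minimal single-threaded model of that invariant is sketched here. `struct table`, `table_read_lock()`, `snapshot_sernum()` and `dump_table()` are hypothetical names, and the lock is reduced to a flag so that the rule can be checked with a lockdep-style test; this is not the kernel's locking API.

```c
#include <assert.h>
#include <stdbool.h>

struct table {
	bool read_locked;	/* models read_lock_bh()/read_unlock_bh() */
	unsigned int sernum;	/* models w->root->fn_sernum */
};

static void table_read_lock(struct table *t)
{
	assert(!t->read_locked);
	t->read_locked = true;
}

static void table_read_unlock(struct table *t)
{
	assert(t->read_locked);
	t->read_locked = false;
}

/* Refuses to read the serial number unless the lock is held;
 * the pre-patch ordering would hit the "return false" path. */
static bool snapshot_sernum(const struct table *t, unsigned int *out)
{
	if (!t->read_locked)
		return false;
	*out = t->sernum;
	return true;
}

/* Fixed ordering: walk, record the serial number, then unlock. */
static bool dump_table(struct table *t, unsigned int *saved)
{
	bool ok;

	table_read_lock(t);
	/* ... the tree walk would happen here ... */
	ok = snapshot_sernum(t, saved);
	table_read_unlock(t);
	return ok;
}
```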





Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-06 Thread Saeed Mahameed
On Tue, Jun 6, 2017 at 7:17 PM, Jason Gunthorpe
 wrote:
> On Tue, Jun 06, 2017 at 06:52:15AM +, Ilan Tayari wrote:
>
>> So neither the host stack nor the network are aware of them.
>> They exist momentarily only on the internal traces on the board and not
>> anywhere else.
>
> Is that really true? If you are creating RoCE QPs then the RDMA
> stack sees this stuff, and now we have buried an RDMA ULP inside an
> ethernet driver, which seems really wonky..

It is not an ethernet driver; mlx5_core provides both RDMA and
ethernet interfaces to both mlx5_ib and the mlx5e netdevice.

So it is perfectly capable of creating QPs on its own; after all, it is
the one creating QPs for the RDMA stack :).

rdma_create_qp->mlx5_ib_create_qp->mlx5_core_create_qp.


>
>> I don't mind explaining further, but I think you will just see it in the
>> patchset when we submit.
>
> You described exactly what I thought.. I just disagree with you that
> an ethernet connected and controlled IP accelerator is 'part of the
> NIC', even if it happens to be colocated on the same circuit board.
>
> Jason


Re: [PATCH 2/2] tcp: md5: add fields to the tcp_md5sig struct to set a key address prefix

2017-06-06 Thread Eric Dumazet
On Tue, 2017-06-06 at 17:54 -0700, Ivan Delalande wrote:
> Replace padding in the socket option structure tcp_md5sig with a new
> flag field and address prefix length so it can be specified when
> configuring a new key with the TCP_MD5SIG socket option.
> 
> Signed-off-by: Bob Gilligan 
> Signed-off-by: Eric Mowat 
> Signed-off-by: Ivan Delalande 
> ---
>  include/uapi/linux/tcp.h |  6 +++++-
>  net/ipv4/tcp_ipv4.c      | 13 +++++++++++--
>  net/ipv6/tcp_ipv6.c      | 20 +++++++++++++++-----
>  3 files changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 38a2b07afdff..52ac30aa0652 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -234,9 +234,13 @@ enum {
>  /* for TCP_MD5SIG socket option */
>  #define TCP_MD5SIG_MAXKEYLEN 80
>  
> +/* tcp_md5sig flags */
> +#define TCP_MD5SIG_FLAG_PREFIX	1	/* address prefix length */
> +
>  struct tcp_md5sig {
>  	struct __kernel_sockaddr_storage tcpm_addr;	/* address associated */
> -	__u16	__tcpm_pad1;				/* zero */
> +	__u8	tcpm_flags;				/* flags */
> +	__u8	tcpm_prefixlen;				/* address prefix */
>  	__u16	tcpm_keylen;				/* key length */
>  	__u32	__tcpm_pad2;				/* zero */
>  	__u8	tcpm_key[TCP_MD5SIG_MAXKEYLEN];		/* key (binary) */
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 51ca3bd5a8a3..2b1bb67b3388 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1069,6 +1069,7 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char __user *optval,
>  {
>  	struct tcp_md5sig cmd;
>  	struct sockaddr_in *sin = (struct sockaddr_in *)&cmd.tcpm_addr;
> +	u8 prefixlen;
>  
>  	if (optlen < sizeof(cmd))
>  		return -EINVAL;
> @@ -1079,15 +1080,23 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char __user *optval,
>   if (sin->sin_family != AF_INET)
>   return -EINVAL;
>  
> + if (cmd.tcpm_flags & TCP_MD5SIG_FLAG_PREFIX) {
> + prefixlen = cmd.tcpm_prefixlen;
> + if (prefixlen > 32)
> + return -EINVAL;
> + } else {
> + prefixlen = 32;
> + }

This will break any application that did not clear the __tcpm_pad1
field: whatever bytes it left there would now be parsed as tcpm_flags
and tcpm_prefixlen.

You need to find another way to maintain compatibility with old
applications.
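
The compatibility concern can be made concrete with a small userspace sketch. The struct names `old_tail` and `new_tail` and the helpers `v4_prefixlen()` and `old_request_result()` are hypothetical; only the field order and the "prefixlen > 32" check are taken from the patch quoted above. The sketch reinterprets the bytes an old application would have written into __tcpm_pad1 under the new layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MD5SIG_FLAG_PREFIX 1	/* mirrors TCP_MD5SIG_FLAG_PREFIX in the patch */

/* The bytes following the address, as an old application sees them. */
struct old_tail {
	uint16_t pad1;		/* documented as "zero", but never checked */
	uint16_t keylen;
};

/* The same bytes as the patched kernel would interpret them. */
struct new_tail {
	uint8_t flags;
	uint8_t prefixlen;
	uint16_t keylen;
};

_Static_assert(sizeof(struct old_tail) == sizeof(struct new_tail),
	       "layouts must overlay exactly");

/* Mirrors the IPv4 check in the patch: prefix length or -EINVAL. */
static int v4_prefixlen(const struct new_tail *cmd)
{
	if (cmd->flags & MD5SIG_FLAG_PREFIX)
		return cmd->prefixlen > 32 ? -EINVAL : cmd->prefixlen;
	return 32;
}

/* Reinterpret an old-layout request under the new layout. */
static int old_request_result(const struct old_tail *old)
{
	struct new_tail cmd;

	memcpy(&cmd, old, sizeof(cmd));
	return v4_prefixlen(&cmd);
}
```

An application that zeroed the structure keeps the old /32 behaviour, while one that left stack garbage in the padding can suddenly get -EINVAL or, worse, a key installed for a wider prefix than intended.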





Re: [PATCH 7/7] mlx5: Do not build eswitch_offloads if CONFIG_MLX5_EN_ESWITCH_OFFLOADS is set

2017-06-06 Thread Saeed Mahameed
On Wed, Jun 7, 2017 at 12:46 AM, Jes Sorensen  wrote:
> On 06/05/2017 05:53 PM, Saeed Mahameed wrote:
>>
>> On Mon, Jun 5, 2017 at 11:51 PM, Jes Sorensen  wrote:
>>>
>>> On 06/03/2017 03:37 PM, Or Gerlitz wrote:


>>>> On Fri, Jun 2, 2017 at 11:22 PM, Jes Sorensen wrote:
>>>>> On 05/28/2017 02:03 AM, Or Gerlitz wrote:
>>>>>> On Sun, May 28, 2017 at 5:23 AM, Jes Sorensen wrote:
>>>>>>> On 05/27/2017 05:02 PM, Or Gerlitz wrote:
>>>>>>>> On Sat, May 27, 2017 at 12:16 AM, Jes Sorensen wrote:
>>>>>>>>> This gets rid of the temporary #ifdef spaghetti and allows the code
>>>>>>>>> to compile without offload support enabled.
>>>>>>>>
>>>>>>>> I am pretty sure we can do that exercise you're up to without any
>>>>>>>> spaghetti cooking and even put more code under that CONFIG directive
>>>>>>>> (en_rep.c), I'll take that with Saeed.
>>>>>>>
>>>>>>> I want to avoid adding #ifdef CONFIG_foo to the main code in order to
>>>>>>> keep it readable. I did it gradually to make sure I didn't break
>>>>>>> anything and to allow for it to be bisected in case something did
>>>>>>> break. If we can move out more code from places like en_rep.c into
>>>>>>> eswitch_offload.c and get it disabled that way that would be great,
>>>>>>> but I like to limit the number of #ifdefs we add to the actual code.
>>>>>>
>>>>>> FWIW (see below), squashing your seven patches to one resulted in a
>>>>>> fairly simple/clear patch, so if we go that way, no need to have seven
>>>>>> commits just for this piece.
>>>>>
>>>>> Squashing patches into jumbo patches is inherently broken and bad
>>>>> coding practice! It makes it way more complicated to debug and bisect
>>>>> in case a minor detail broke in the process.
>>>>
>>>> Not that pure LOC ##-s is the only/deep measurement, but your overall
>>>> changes in the seven patch series account to:
>>>>
>>>>    5 files changed, 94 insertions(+), 3 deletions(-)
>>>>
>>>> and by no means is this jumbo or inherently broken and badly coded, so
>>>> please slow down, I looked with care at the resulting patch and
>>>> said it's basically ok.
>>>
>>>
>>>
>>> Squashing patches for the sake of squashing patches is inherently broken
>>> and
>>> bad. So please calm down and stop this mangling of other peoples'
>>> patches.
>>>
>>> If you want an alternative, put up a proposal and look at it for
>>> comparison
>>> somewhere.
>>
>>
>> Hey Jes,
>>
>> It is not just about squashing patches, I am working on a series of
>> patches to allow compiling out eswitch/eswitch_offloads/en_rep.c/en_tc
>> altogether, it will come out cleaner as it will remove all ethernet
>> sriov/eswitch VF representors and eswitch tc offloads stuff with one
>> kconfig flag, and yet preserve standard QoS functionality from en_tc.
>
>
> Saeed,
>
> I realize it is not just about squashing patches, however doing that to
> someone else's patches is just broken. The Linux kernel way is to build on
> top of patches, if they are valid, rather than throwing them all away and
> doing it from scratch again bottom up. If there was something actually wrong
> with my patches, and I would love to understand if that is the case, since I
> don't know 1/100th of the hardware details that you know, then please share
> those details.

Hey Jes,

Sorry for the inconvenience. I have been working on very similar patches
since before you posted yours.
Your patches are fine, but as I said before, removing eswitch as is
will introduce a small regression in Multi-PF configuration.

The issue is that lately we have been having tons of discussions about
exactly this, and about how to do the driver breakdown in a way that
makes everyone happy, so things are moving relatively slowly, but
my work on eswitch is converging.

>
>> BTW today you can just remove eswitch from driver and non sriov
>> configuration will perfectly work with no issues.
>> Even multi PF configuration will also work, but without l2 mac table,
>> which means PFs can only see packets with their own static (permanent)
>> mac addresses, user configured macs will not work on Multi PF
>> configuration.
>
>
> It sounds like this shakes up things a little and we will have things moved
> to where they actually belong in the hierarchy so that will be a good thing
> in the end :)
>
>> For that i will take the l2 table (ConnectX PF mac table) logic out of
>> eswitch as it is not really an eswitch logic, and move it to core
>> driver to allow Multi PF configuration to work without eswitch.
>
>
> Sounds good.
>
>> I will post some patches for you to review by end of week.
>
>
> Could we please start seeing 

Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-06 Thread Saeed Mahameed
On Wed, Jun 7, 2017 at 1:44 AM, Alexei Starovoitov
 wrote:
> On Tue, Jun 06, 2017 at 03:01:51PM -0400, David Miller wrote:
>> From: Alexei Starovoitov 
>> Date: Tue, 6 Jun 2017 11:55:33 -0700
>>
>> > If in the future mlx will make it into the nic in a way that
>> > encryption shares all memory management logic and there is no fpga
>> > at all then it indeed will be similar to tc offload. Right now it's
>> > not and needs different sw architecture.
>>
>> If the visible effect is identical, I fundamentally disagree with you.
>>
>> I don't care if there is a frog sitting on the PHY that transforms
>> the packets, it's all the same if the visible behavior is identical.
>
> that frog is a good example why we disagree.
> I need to check the pulse of that frog and last time it ate.
> In production I cannot have magical creatures do stuff for me.
> I need to monitor all components, debug and mitigate the issues.

Every HW vendor has its own magical creatures. You don't want to
have a kernel object representing every HW unit!
That's just not scalable.

> If encryption is done by the nic, I get all the monitoring and
> debugging as part of the standard tools. When it's a frog
> hidden by the nic, I cannot do much when the fire erupts,
> hence frog and production environment don't mix.
> To move things forward...

Let's assume there was no FPGA but the ASIC provided the encryption
feature, and still a fire erupts.
What would you have done differently? Only the vendor will know how
to debug, regardless of which creature went nuts!

Frog or cat, the kernel shouldn't care much: it is all HW
implementation detail. We should use the entry point to this
device (PCI device/netdevice)
as the abstraction point, and standardizing visibility/debuggability
should be per 'kernel<->stack<->driver<->HW'
features/flows/transactions.
Full driver<->HW visibility/standardization would eliminate the
need for vendor-specific drivers, with each vendor implementing only
standard HW that works with generic drivers.

Regarding debuggability in the ConnectX architecture, there is a
well-defined health reporting and monitoring mechanism between FW and
driver, and it is up to the FW to check the pulse of every frog it is
babysitting and report back to the driver.
Believe me, you don't want the driver to know about every single wire
in the chip.

> how about marking the whole thing CONFIG_EXPERIMENTAL instead of revert?
> Right now it's effectively non-production==experimental code and
> I want to make it clear.
>


Re: [PATCH net-next v2 5/5] net: dsa: Stop accessing ds->dst->cpu_dp in drivers

2017-06-06 Thread Florian Fainelli
2017-06-06 19:11 GMT-07:00 Vivien Didelot :
> Hi Florian,
>
> Florian Fainelli  writes:
>
>> +static inline struct dsa_port *dsa_ds_get_cpu_dp(struct dsa_switch *ds)
>> +{
>> +	return &ds->ports[fls(ds->cpu_port_mask) - 1];
>> +}
>
> So as I said in v2, now that a driver is guaranteed that dp->cpu_dp is
> correctly assigned at setup time, isn't it better (especially for future
> multi-CPU support) to provide a helper which returns the CPU port for a
> given port? i.e. dsa_get_cpu_port(struct dsa_switch *ds, int port).
>
> Or is there something blocking? I might be wrong.

mt7530.c needs access to the CPU port at ops->setup() time which is
why this is still here.

>
> Note that I'm suggesting s/dsa_ds_get_cpu_dp/dsa_get_cpu_port/ since
> public DSA API does not need to use variable shortcuts such as ds or dp,
> but that's a minor suggestion?

We also have a dsa_dst_get_cpu_dp() but sure dsa_get_cpu_port() works for me.
-- 
Florian


Re: [PATCH net-next v2 1/5] net: dsa: Remove master_netdev and use dst->cpu_dp->netdev

2017-06-06 Thread Vivien Didelot
Florian Fainelli  writes:

> In preparation for supporting multiple CPU ports, remove
> dst->master_netdev and ds->master_netdev and replace them with only one
> instance of the common object we have for a port: struct
> dsa_port::netdev. ds->master_netdev is currently write only and would be
> helpful in the case where we have two switches, both with CPU ports, and
> also connected within each other, which the multi-CPU port patch series
> would address.
>
> While at it, introduce a helper function used in net/dsa/slave.c to
> immediately get a reference on the master network device called
> dsa_master_netdev().
>
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot 


Re: [PATCH net-next v2 0/5] net: dsa: Multi-CPU ground work (v2)

2017-06-06 Thread Vivien Didelot
Hi Florian,

Since you'll respin a v3, there are still a few typos here:

Florian Fainelli  writes:

> This patch series prepares the ground for adding mutliple CPU port support to

   multiple

> DSA, and starts by removing redundant pieces of information such as
> master_netdev which is cpu_dp->ethernet. Finally drivers are moved away from

 cpu_dp->netdev

> directly accessing ds->dst->cpu_dp and use appropriate helper functions.


Thanks,

Vivien


Re: [PATCH net-next v2 5/5] net: dsa: Stop accessing ds->dst->cpu_dp in drivers

2017-06-06 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> +static inline struct dsa_port *dsa_ds_get_cpu_dp(struct dsa_switch *ds)
> +{
> +	return &ds->ports[fls(ds->cpu_port_mask) - 1];
> +}

So as I said in v2, now that a driver is guaranteed that dp->cpu_dp is
correctly assigned at setup time, isn't it better (especially for future
multi-CPU support) to provide a helper which returns the CPU port for a
given port? i.e. dsa_get_cpu_port(struct dsa_switch *ds, int port).

Or is there something blocking? I might be wrong.

Note that I'm suggesting s/dsa_ds_get_cpu_dp/dsa_get_cpu_port/ since
public DSA API does not need to use variable shortcuts such as ds or dp,
but that's a minor suggestion?

Thanks,

Vivien
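
For readers unfamiliar with the fls(mask) - 1 idiom in the helper under review, here is a small userspace sketch. `fls_u32()` is a portable stand-in for the kernel's fls(), and `highest_cpu_port()` is a hypothetical name; neither is part of the DSA API.

```c
#include <assert.h>

/* Portable equivalent of the kernel's fls(): 1-based index of the
 * most significant set bit, 0 when no bit is set. */
static int fls_u32(unsigned int x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Index of the highest-numbered port set in the mask, or -1 when
 * the mask is empty. */
static int highest_cpu_port(unsigned int cpu_port_mask)
{
	return fls_u32(cpu_port_mask) - 1;
}
```

Two properties follow from this. With an empty mask the index is -1, which would be out of bounds as an array index. With several CPU ports set, only the highest-numbered one is ever returned, which is one way to read the multi-CPU concern raised above.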


[PATCH net-next] net: fec: Clear and enable MIB counters on imx51

2017-06-06 Thread Andrew Lunn
Both the IMX51 and IMX53 datasheets indicate that the MIB counters
should be cleared during setup. Otherwise random numbers are returned
via ethtool -S.  Add a quirk and a function to do this.

Tested on an IMX51.

Signed-off-by: Andrew Lunn 
---
 drivers/net/ethernet/freescale/fec.h      |  4 ++++
 drivers/net/ethernet/freescale/fec_main.c | 27 ++++++++++++++++++++++++---
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 5ea740b4cf14..38c7b21e5d63 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -446,6 +446,10 @@ struct bufdesc_ex {
 #define FEC_QUIRK_HAS_COALESCE		(1 << 13)
 /* Interrupt doesn't wake CPU from deep idle */
 #define FEC_QUIRK_ERR006687		(1 << 14)
+/* The MIB counters should be cleared and enabled during
+ * initialisation.
+ */
+#define FEC_QUIRK_MIB_CLEAR		(1 << 15)
 
 struct bufdesc_prop {
 	int qid;
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index f7c8649fd28f..297fd196c879 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -89,10 +89,10 @@ static struct platform_device_id fec_devtype[] = {
.driver_data = 0,
}, {
.name = "imx25-fec",
-   .driver_data = FEC_QUIRK_USE_GASKET,
+   .driver_data = FEC_QUIRK_USE_GASKET | FEC_QUIRK_MIB_CLEAR,
}, {
.name = "imx27-fec",
-   .driver_data = 0,
+   .driver_data = FEC_QUIRK_MIB_CLEAR,
}, {
.name = "imx28-fec",
.driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_SWAP_FRAME |
@@ -184,6 +184,9 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 #define FEC_RACC_SHIFT16   BIT(7)
 #define FEC_RACC_OPTIONS   (FEC_RACC_IPDIS | FEC_RACC_PRODIS)
 
+/* MIB Control Register */
+#define FEC_MIB_CTRLSTAT_DISABLE   BIT(31)
+
 /*
  * The 5270/5271/5280/5282/532x RX control register also contains maximum frame
  * size bits. Other FEC hardware does not, so we need to take that into
@@ -2356,6 +2359,21 @@ static int fec_enet_get_sset_count(struct net_device *dev, int sset)
}
 }
 
+static void fec_enet_clear_ethtool_stats(struct net_device *dev)
+{
+   struct fec_enet_private *fep = netdev_priv(dev);
+   int i;
+
+   /* Disable MIB statistics counters */
+   writel(FEC_MIB_CTRLSTAT_DISABLE, fep->hwp + FEC_MIB_CTRLSTAT);
+
+   for (i = 0; i < ARRAY_SIZE(fec_stats); i++)
+   writel(0, fep->hwp + fec_stats[i].offset);
+
+   /* Don't disable MIB statistics counters */
+   writel(0, fep->hwp + FEC_MIB_CTRLSTAT);
+}
+
 #else  /* !defined(CONFIG_M5272) */
 #define FEC_STATS_SIZE 0
 static inline void fec_enet_update_ethtool_stats(struct net_device *dev)
@@ -3182,7 +3200,10 @@ static int fec_enet_init(struct net_device *ndev)
 
fec_restart(ndev);
 
-   fec_enet_update_ethtool_stats(ndev);
+   if (fep->quirks & FEC_QUIRK_MIB_CLEAR)
+   fec_enet_clear_ethtool_stats(ndev);
+   else
+   fec_enet_update_ethtool_stats(ndev);
 
return 0;
 }
-- 
2.11.0
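
The disable / clear / re-enable sequence that the patch adds can be modelled in userspace. This is only a sketch under stated assumptions: the register layout (`REG_MIB_CTRL` at word 0 followed by eight counters) and the name `mib_clear_and_enable()` are hypothetical, and a plain array stands in for the memory-mapped registers that the driver accesses with writel().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIB_CTRLSTAT_DISABLE (UINT32_C(1) << 31)	/* mirrors BIT(31) in the patch */

/* Hypothetical register file: word 0 is the MIB control register,
 * the remaining words are statistics counters. */
enum { REG_MIB_CTRL = 0, REG_FIRST_COUNTER = 1, REG_COUNT = 9 };

static void mib_clear_and_enable(volatile uint32_t *regs)
{
	size_t i;

	/* Stop the counters so they cannot change while being cleared. */
	regs[REG_MIB_CTRL] = MIB_CTRLSTAT_DISABLE;

	/* Overwrite whatever power-on garbage the counters hold. */
	for (i = REG_FIRST_COUNTER; i < REG_COUNT; i++)
		regs[i] = 0;

	/* Clear the disable bit again so counting resumes from zero. */
	regs[REG_MIB_CTRL] = 0;
}
```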



RE: [Intel-wired-lan] [PATCH v2 1/1] e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails

2017-06-06 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On Behalf
> Of Jeff Kirsher
> Sent: Tuesday, June 6, 2017 1:46 PM
> To: David Miller ; Nikula, Jani
> 
> Cc: Ursulin, Tvrtko ; daniel.vet...@ffwll.ch; intel-
> g...@lists.freedesktop.org; linux-ker...@vger.kernel.org;
> jani.nik...@linux.intel.com; ch...@chris-wilson.co.uk; Ertman, David M
> ; intel-wired-...@lists.osuosl.org; dri-
> de...@lists.freedesktop.org; netdev@vger.kernel.org; airl...@gmail.com
> Subject: Re: [Intel-wired-lan] [PATCH v2 1/1] e1000e: Undo
> e1000e_pm_freeze if __e1000_shutdown fails
> 
> On Fri, 2017-06-02 at 14:14 -0400, David Miller wrote:
> > From: Jani Nikula 
> > Date: Wed, 31 May 2017 18:50:43 +0300
> >
> > > From: Chris Wilson 
> > >
> > > An error during suspend (e100e_pm_suspend),
> >
> >  ...
> > > lead to complete failure:
> >
> >  ...
> > > The unwind failures stems from commit 2800209994f8 ("e1000e:
> > > Refactor PM
> > > flows"), but it may be a later patch that introduced the non-
> > > recoverable
> > > behaviour.
> > >
> > > Fixes: 2800209994f8 ("e1000e: Refactor PM flows")
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99847
> > > Cc: Tvrtko Ursulin 
> > > Cc: Jeff Kirsher 
> > > Cc: Dave Ertman 
> > > Cc: Bruce Allan 
> > > Cc: intel-wired-...@lists.osuosl.org
> > > Cc: netdev@vger.kernel.org
> > > Signed-off-by: Chris Wilson 
> > > [Jani: bikeshed repainted]
> > > Signed-off-by: Jani Nikula 
> >
> > Jeff, please make sure this gets submitted to me soon.
> 
> Expect it later tonight, just finishing up testing.

Tested-by: Aaron Brown 


Re: [PATCH v4] net: don't call strlen on non-terminated string in dev_set_alias()

2017-06-06 Thread David Miller
From: Florian Westphal 
Date: Tue, 6 Jun 2017 23:57:35 +0200

> David Miller  wrote:
>> From: Alexander Potapenko 
>> Date: Tue,  6 Jun 2017 15:56:54 +0200
>> 
>> > KMSAN reported a use of uninitialized memory in dev_set_alias(),
>> > which was caused by calling strlcpy() (which in turn called strlen())
>> > on the user-supplied non-terminated string.
>> > 
>> > Signed-off-by: Alexander Potapenko 
>> 
>> We should not be allowing non-NULL terminated strings for the
>> IFLA_IFALIAS attribute.  It's defined as type NLA_STRING in
>> the ifla_policy[] array.
> 
> Unfortunately NLA_STRING doesn't check for NUL byte, only
> NLA_NUL_STRING does this.
> 
> So unless you think we can change kernel and make NLA_STRING
> behave like NLA_NUL_STRING I think patch is correct.

Ok, I missed that, thanks for the clarification.

I'll apply this and queue it up for -stable, thanks.
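
The underlying rule, that a length-delimited attribute payload must never be handed to strlen()-based helpers, can be sketched in plain C. `copy_attr_string()` is a hypothetical helper, not the function touched by the patch; it copies exactly the advertised number of bytes and adds the terminator itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a length-delimited, possibly unterminated payload into dst and
 * NUL-terminate it. Returns the number of bytes copied, or -1 if the
 * payload plus terminator does not fit. */
static int copy_attr_string(char *dst, size_t dst_size,
			    const char *payload, size_t payload_len)
{
	if (dst_size == 0 || payload_len >= dst_size)
		return -1;
	memcpy(dst, payload, payload_len);	/* never reads past payload_len */
	dst[payload_len] = '\0';
	return (int)payload_len;
}
```

Unlike strlcpy(), which scans the source until it finds a NUL byte, this never touches memory beyond the length the sender declared.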


[PATCH 1/2] tcp: md5: add an address prefix for key lookup

2017-06-06 Thread Ivan Delalande
This allows the keys used for TCP MD5 signature to be used for a whole
range of addresses, specified with a prefix length, instead of only one
address as is currently the case.
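
The lookup rule this introduces is a longest-prefix match over the configured keys. A minimal IPv4-only userspace sketch of that rule follows; `struct md5_key`, `make_mask()` and `lookup_best()` are hypothetical names, addresses are kept in host byte order for simplicity, and a flat array replaces the kernel's RCU-protected list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct md5_key {
	uint32_t addr;		/* IPv4 address, host byte order in this sketch */
	uint8_t prefixlen;	/* 0..32 */
};

/* Netmask with the top prefixlen bits set; /0 matches everything. */
static uint32_t make_mask(uint8_t prefixlen)
{
	assert(prefixlen <= 32);
	if (prefixlen == 0)
		return 0;
	return (uint32_t)(~UINT32_C(0) << (32 - prefixlen));
}

/* Return the matching key with the longest prefix, or NULL. */
static const struct md5_key *lookup_best(const struct md5_key *keys,
					 size_t n, uint32_t addr)
{
	const struct md5_key *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t mask = make_mask(keys[i].prefixlen);

		if ((keys[i].addr & mask) != (addr & mask))
			continue;
		if (!best || keys[i].prefixlen > best->prefixlen)
			best = &keys[i];
	}
	return best;
}
```

With keys for 10.0.0.0/8, 10.1.0.0/16 and 10.1.2.3/32, a peer at 10.1.2.3 selects the /32 key, 10.1.9.9 selects the /16 key, and 10.9.9.9 falls back to the /8 key.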

Signed-off-by: Bob Gilligan 
Signed-off-by: Eric Mowat 
Signed-off-by: Ivan Delalande 
---
 include/net/tcp.h   |  6 +++--
 net/ipv4/tcp_ipv4.c | 68 ++---
 net/ipv6/tcp_ipv6.c | 12 ++
 3 files changed, 70 insertions(+), 16 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 38a7427ae902..2b68023ab095 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1395,6 +1395,7 @@ struct tcp_md5sig_key {
u8  keylen;
u8  family; /* AF_INET or AF_INET6 */
union tcp_md5_addr  addr;
+   u8  prefixlen;
u8  key[TCP_MD5SIG_MAXKEYLEN];
struct rcu_head rcu;
 };
@@ -1438,9 +1439,10 @@ struct tcp_md5sig_pool {
 int tcp_v4_md5_hash_skb(char *md5_hash, const struct tcp_md5sig_key *key,
const struct sock *sk, const struct sk_buff *skb);
 int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
-  int family, const u8 *newkey, u8 newkeylen, gfp_t gfp);
+  int family, u8 prefixlen, const u8 *newkey, u8 newkeylen,
+  gfp_t gfp);
 int tcp_md5_do_del(struct sock *sk, const union tcp_md5_addr *addr,
-  int family);
+  int family, u8 prefixlen);
 struct tcp_md5sig_key *tcp_v4_md5_lookup(const struct sock *sk,
 const struct sock *addr_sk);
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5ab2aac5ca19..51ca3bd5a8a3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -80,6 +80,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -906,6 +907,9 @@ struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk,
struct tcp_md5sig_key *key;
unsigned int size = sizeof(struct in_addr);
const struct tcp_md5sig_info *md5sig;
+   __be32 mask;
+   struct tcp_md5sig_key *best_match = NULL;
+   bool match;
 
/* caller either holds rcu_read_lock() or socket lock */
md5sig = rcu_dereference_check(tp->md5sig_info,
@@ -919,12 +923,55 @@ struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk,
hlist_for_each_entry_rcu(key, &md5sig->head, node) {
if (key->family != family)
continue;
-   if (!memcmp(&key->addr, addr, size))
+
+   if (family == AF_INET) {
+   mask = inet_make_mask(key->prefixlen);
+   match = (key->addr.a4.s_addr & mask) ==
+   (addr->a4.s_addr & mask);
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (family == AF_INET6) {
+   match = ipv6_prefix_equal(&key->addr.a6, &addr->a6,
+ key->prefixlen);
+#endif
+   } else {
+   match = false;
+   }
+
+   if (match && (!best_match ||
+ key->prefixlen > best_match->prefixlen))
+   best_match = key;
+   }
+   return best_match;
+}
+EXPORT_SYMBOL(tcp_md5_do_lookup);
+
+struct tcp_md5sig_key *tcp_md5_do_lookup_exact(const struct sock *sk,
+  const union tcp_md5_addr *addr,
+  int family, u8 prefixlen)
+{
+   const struct tcp_sock *tp = tcp_sk(sk);
+   struct tcp_md5sig_key *key;
+   unsigned int size = sizeof(struct in_addr);
+   const struct tcp_md5sig_info *md5sig;
+
+   /* caller either holds rcu_read_lock() or socket lock */
+   md5sig = rcu_dereference_check(tp->md5sig_info,
+  lockdep_sock_is_held(sk));
+   if (!md5sig)
+   return NULL;
+#if IS_ENABLED(CONFIG_IPV6)
+   if (family == AF_INET6)
+   size = sizeof(struct in6_addr);
+#endif
+   hlist_for_each_entry_rcu(key, &md5sig->head, node) {
+   if (key->family != family)
+   continue;
+   if (!memcmp(&key->addr, addr, size) &&
+   key->prefixlen == prefixlen)
return key;
}
return NULL;
 }
-EXPORT_SYMBOL(tcp_md5_do_lookup);
 
 struct tcp_md5sig_key *tcp_v4_md5_lookup(const struct sock *sk,
 const struct sock *addr_sk)
@@ -938,14 +985,15 @@ EXPORT_SYMBOL(tcp_v4_md5_lookup);
 
 /* This can be called on a newly created socket, from other files */
 int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
-  int family, const u8 *newkey, u8 newkeylen, gfp_t gfp)
+  int family, u8 prefixlen, const u8 *newkey, u8 

[PATCH 2/2] tcp: md5: add fields to the tcp_md5sig struct to set a key address prefix

2017-06-06 Thread Ivan Delalande
Replace padding in the socket option structure tcp_md5sig with a new
flag field and address prefix length so it can be specified when
configuring a new key with the TCP_MD5SIG socket option.

Signed-off-by: Bob Gilligan 
Signed-off-by: Eric Mowat 
Signed-off-by: Ivan Delalande 
---
 include/uapi/linux/tcp.h |  6 +++++-
 net/ipv4/tcp_ipv4.c      | 13 +++++++++++--
 net/ipv6/tcp_ipv6.c      | 20 +++++++++++++++-----
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 38a2b07afdff..52ac30aa0652 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -234,9 +234,13 @@ enum {
 /* for TCP_MD5SIG socket option */
 #define TCP_MD5SIG_MAXKEYLEN   80
 
+/* tcp_md5sig flags */
+#define TCP_MD5SIG_FLAG_PREFIX 1   /* address prefix length */
+
 struct tcp_md5sig {
 	struct __kernel_sockaddr_storage tcpm_addr;	/* address associated */
-	__u16	__tcpm_pad1;				/* zero */
+	__u8	tcpm_flags;				/* flags */
+	__u8	tcpm_prefixlen;				/* address prefix */
 	__u16	tcpm_keylen;				/* key length */
 	__u32	__tcpm_pad2;				/* zero */
 	__u8	tcpm_key[TCP_MD5SIG_MAXKEYLEN];		/* key (binary) */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 51ca3bd5a8a3..2b1bb67b3388 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1069,6 +1069,7 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char __user *optval,
 {
 	struct tcp_md5sig cmd;
 	struct sockaddr_in *sin = (struct sockaddr_in *)&cmd.tcpm_addr;
+	u8 prefixlen;
 
 	if (optlen < sizeof(cmd))
 		return -EINVAL;
@@ -1079,15 +1080,23 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char __user *optval,
 	if (sin->sin_family != AF_INET)
 		return -EINVAL;
 
+	if (cmd.tcpm_flags & TCP_MD5SIG_FLAG_PREFIX) {
+		prefixlen = cmd.tcpm_prefixlen;
+		if (prefixlen > 32)
+			return -EINVAL;
+	} else {
+		prefixlen = 32;
+	}
+
 	if (!cmd.tcpm_keylen)
 		return tcp_md5_do_del(sk, (union tcp_md5_addr *)&sin->sin_addr.s_addr,
-				      AF_INET, 32);
+				      AF_INET, prefixlen);
 
 	if (cmd.tcpm_keylen > TCP_MD5SIG_MAXKEYLEN)
 		return -EINVAL;
 
 	return tcp_md5_do_add(sk, (union tcp_md5_addr *)&sin->sin_addr.s_addr,
-			      AF_INET, 32, cmd.tcpm_key, cmd.tcpm_keylen,
+			      AF_INET, prefixlen, cmd.tcpm_key, cmd.tcpm_keylen,
 			      GFP_KERNEL);
 }
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 5cf19dab60aa..f293fc69e88b 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -519,6 +519,7 @@ static int tcp_v6_parse_md5_keys(struct sock *sk, char __user *optval,
 {
 	struct tcp_md5sig cmd;
 	struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&cmd.tcpm_addr;
+	u8 prefixlen;
 
 	if (optlen < sizeof(cmd))
 		return -EINVAL;
@@ -529,12 +530,21 @@ static int tcp_v6_parse_md5_keys(struct sock *sk, char __user *optval,
 	if (sin6->sin6_family != AF_INET6)
 		return -EINVAL;
 
+	if (cmd.tcpm_flags & TCP_MD5SIG_FLAG_PREFIX) {
+		prefixlen = cmd.tcpm_prefixlen;
+		if (prefixlen > 128 || (ipv6_addr_v4mapped(&sin6->sin6_addr) &&
+					prefixlen > 32))
+			return -EINVAL;
+	} else {
+		prefixlen = ipv6_addr_v4mapped(&sin6->sin6_addr) ? 32 : 128;
+	}
+
 	if (!cmd.tcpm_keylen) {
 		if (ipv6_addr_v4mapped(&sin6->sin6_addr))
 			return tcp_md5_do_del(sk, (union tcp_md5_addr *)&sin6->sin6_addr.s6_addr32[3],
-					      AF_INET, 32);
+					      AF_INET, prefixlen);
 		return tcp_md5_do_del(sk, (union tcp_md5_addr *)&sin6->sin6_addr,
-				      AF_INET6, 128);
+				      AF_INET6, prefixlen);
 	}
 
 	if (cmd.tcpm_keylen > TCP_MD5SIG_MAXKEYLEN)
@@ -542,12 +552,12 @@ static int tcp_v6_parse_md5_keys(struct sock *sk, char __user *optval,
 
 	if (ipv6_addr_v4mapped(&sin6->sin6_addr))
 		return tcp_md5_do_add(sk, (union tcp_md5_addr *)&sin6->sin6_addr.s6_addr32[3],
-				      AF_INET, 32, cmd.tcpm_key,
+				      AF_INET, prefixlen, cmd.tcpm_key,
 				      cmd.tcpm_keylen, GFP_KERNEL);
 
 	return tcp_md5_do_add(sk, (union tcp_md5_addr *)&sin6->sin6_addr,
-			      AF_INET6, 128, cmd.tcpm_key, cmd.tcpm_keylen,
- 

Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-06 Thread Andrew Lunn
On Tue, Jun 06, 2017 at 03:44:53PM -0700, Alexei Starovoitov wrote:
> On Tue, Jun 06, 2017 at 03:01:51PM -0400, David Miller wrote:
> > From: Alexei Starovoitov 
> > Date: Tue, 6 Jun 2017 11:55:33 -0700
> > 
> > > If in the future mlx will make it into the nic in a way that
> > > encryption shares all memory management logic and there is no fpga
> > > at all then it indeed will be similar to tc offload. Right now it's
> > > not and needs different sw architecture.
> > 
> > If the visible effect is identical, I fundamentally disagree with you.
> > 
> > I don't care if there is a frog sitting on the PHY that transforms
> > the packets, it's all the same if the visible behavior is identical.
> 
> that frog is a good example why we disagree.
> I need to check the pulse of that frog and last time it ate.

It is probably over-engineered for a single frog, but maybe you could
use a modified RFC 2795?

Andrew


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-06 Thread Ben Greear

On 06/06/2017 05:27 PM, Eric Dumazet wrote:
> On Tue, 2017-06-06 at 18:00 -0600, David Ahern wrote:
> > On 6/6/17 3:06 PM, Ben Greear wrote:
> > > This bug has been around forever, and we recently got an intern and
> > > stuck him with
> > > trying to reproduce it on the latest kernel.  It is still here.  I'm not
> > > super excited
> > > about trying to fix this, but we can easily test patches if someone has a
> > > patch to try.
> >
> > Can you try this (whitespace damaged on paste, but it is moving the lock
> > ahead of the fn_sernum check):
> >
> > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> > index deea901746c8..7a44c49055c0 100644
> > --- a/net/ipv6/ip6_fib.c
> > +++ b/net/ipv6/ip6_fib.c
> > @@ -378,6 +378,7 @@ static int fib6_dump_table(struct fib6_table *table,
> > struct sk_buff *skb,
> > cb->args[5] = w->root->fn_sernum;
> > }
> > } else {
> > +   read_lock_bh(&table->tb6_lock);
> > if (cb->args[5] != w->root->fn_sernum) {
> > /* Begin at the root if the tree changed */
> > cb->args[5] = w->root->fn_sernum;
> > @@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table,
> > struct sk_buff *skb,
> > } else
> > w->skip = 0;
> >
> > -   read_lock_bh(&table->tb6_lock);
> > res = fib6_walk_continue(w);
> > read_unlock_bh(&table->tb6_lock);
> > if (res <= 0) {
>
> Good catch, but it looks like similar fix is needed a few lines before.

We will test this tomorrow.

Thanks,
Ben

> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 
> deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865
>  100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, 
> struct sk_buff *skb,
>
>   read_lock_bh(&table->tb6_lock);
>   res = fib6_walk(net, w);
> - read_unlock_bh(&table->tb6_lock);
>   if (res > 0) {
>   cb->args[4] = 1;
>   cb->args[5] = w->root->fn_sernum;
>   }
> + read_unlock_bh(&table->tb6_lock);
>   } else {
> + read_lock_bh(&table->tb6_lock);
>   if (cb->args[5] != w->root->fn_sernum) {
>   /* Begin at the root if the tree changed */
>   cb->args[5] = w->root->fn_sernum;
> @@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, struct 
> sk_buff *skb,
>   } else
>   w->skip = 0;
>
> - read_lock_bh(&table->tb6_lock);
>   res = fib6_walk_continue(w);
>   read_unlock_bh(&table->tb6_lock);
>   if (res <= 0) {





--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-06 Thread David Ahern
On 6/6/17 6:27 PM, Eric Dumazet wrote:
> Good catch, but it looks like similar fix is needed a few lines before.
> 
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 
> deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865
>  100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, 
> struct sk_buff *skb,
>  
>   read_lock_bh(&table->tb6_lock);
>   res = fib6_walk(net, w);
> - read_unlock_bh(&table->tb6_lock);
>   if (res > 0) {
>   cb->args[4] = 1;
>   cb->args[5] = w->root->fn_sernum;
>   }
> + read_unlock_bh(&table->tb6_lock);

indeed. tunnel vision on Ben's problem


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-06 Thread Eric Dumazet
On Tue, 2017-06-06 at 18:00 -0600, David Ahern wrote:
> On 6/6/17 3:06 PM, Ben Greear wrote:
> > This bug has been around forever, and we recently got an intern and
> > stuck him with
> > trying to reproduce it on the latest kernel.  It is still here.  I'm not
> > super excited
> > about trying to fix this, but we can easily test patches if someone has a
> > patch to try.
> 
> Can you try this (whitespace damaged on paste, but it is moving the lock
> ahead of the fn_sernum check):
> 
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index deea901746c8..7a44c49055c0 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -378,6 +378,7 @@ static int fib6_dump_table(struct fib6_table *table,
> struct sk_buff *skb,
> cb->args[5] = w->root->fn_sernum;
> }
> } else {
> +   read_lock_bh(&table->tb6_lock);
> if (cb->args[5] != w->root->fn_sernum) {
> /* Begin at the root if the tree changed */
> cb->args[5] = w->root->fn_sernum;
> @@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table,
> struct sk_buff *skb,
> } else
> w->skip = 0;
> 
> -   read_lock_bh(&table->tb6_lock);
> res = fib6_walk_continue(w);
> read_unlock_bh(&table->tb6_lock);
> if (res <= 0) {


Good catch, but it looks like similar fix is needed a few lines before.

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 
deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865
 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, 
struct sk_buff *skb,
 
read_lock_bh(&table->tb6_lock);
res = fib6_walk(net, w);
-   read_unlock_bh(&table->tb6_lock);
if (res > 0) {
cb->args[4] = 1;
cb->args[5] = w->root->fn_sernum;
}
+   read_unlock_bh(&table->tb6_lock);
} else {
+   read_lock_bh(&table->tb6_lock);
if (cb->args[5] != w->root->fn_sernum) {
/* Begin at the root if the tree changed */
cb->args[5] = w->root->fn_sernum;
@@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, struct 
sk_buff *skb,
} else
w->skip = 0;
 
-   read_lock_bh(&table->tb6_lock);
res = fib6_walk_continue(w);
read_unlock_bh(&table->tb6_lock);
if (res <= 0) {
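
Both diffs in this thread move reads of w->root->fn_sernum under the table lock, so the serial number cannot change between the check and the walk. As a rough userspace sketch of that idea (not the kernel code): `struct table` and `dump_resume()` are hypothetical names, and a plain pthread mutex stands in for the kernel's read lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical miniature of a table whose tree carries a serial number. */
struct table {
    pthread_mutex_t lock;
    int sernum;            /* bumped by writers whenever the tree changes */
};

/*
 * Decide whether a resumed dump must restart from the root.
 * The serial number is read only while the lock is held, so a concurrent
 * writer cannot change the tree between the check and the walk.
 * Returns true if the walk restarted, false if it continued.
 */
static bool dump_resume(struct table *t, int *saved_sernum)
{
    bool restarted = false;

    pthread_mutex_lock(&t->lock);
    if (*saved_sernum != t->sernum) {
        *saved_sernum = t->sernum;
        restarted = true;
    }
    /* ... continue walking the tree here, still under the lock ... */
    pthread_mutex_unlock(&t->lock);

    return restarted;
}
```

Taking the lock before the comparison is the whole point: the unpatched code compared first and locked afterwards.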




Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-06 Thread Tobias Diedrich
Oleksij Rempel wrote:
> Yes, this is a "normal" problem. The firmware has no error handler for PCI
> bus related exceptions. So if we fail to read the PCI bus the first time, we
> have the choice to Oops and stall or Oops and reboot ASAP. So we reboot
> and provide a kernel "firmware panic!" message.
> Anyone who is able and willing to fix this is welcome.
> 
> > *
> > Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
> > exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
[...]

>memdmp 50ae78 50ae88

50ae78: 6c10 0412 6aa2 0c02 0088 20c0 2008 1940  l...j..@

[...copy to bin...]
$ bin/objdump -b binary -m xtensa  -D /tmp/memdump.bin 
[..]
   0:   6c1004  entry   a1, 32
   3:   126aa2  l32r    a2, 0xfffdaa8c
   6:   0c0200  memw
   9:   8820    l32i.n  a8, a2, 0  <-- Exception cause PC still points at load
   b:   c020    movi.n  a2, 0
   d:   081940  extui   a9, a8, 1, 1

Judging from that, it should be fairly simple to at least implement
some sort of retry, possibly after triggering a PCIe link retrain?
There are some related PCIe root complex registers that may point to
what exactly failed if they were dumped.

The root complex registers live at 0x0004 and I think match the
registers described for the root complex in the AR9344 datasheet.

PCIE_INT_MASK would map to 0x40050 and has a bit for SYS_ERR:
"A system error. The RC Core asserts CFG_SYS_ERR_RC if any device in
the hierarchy reports any of the following errors and the associated
enable bit is set in the Root Control register: ERR_COR, ERR_FATAL,
ERR_NONFATAL."

AFAICS link retrain can be done by setting bit3 (INIT_RST,
"Application request to initiate a training reset") in
PCIE_APP (0x4).

See sboot/magpie_1_1/sboot/cmnos/eeprom/src/cmnos_eeprom.c (which
flips some bits in the RC to enable the PCIe bus for reading the
EEPROM).

The root complex pci configuration space is at 0x2 which could
have further error details:
>memdmp 2 20200

02: a02a 168c 0010 0006  0001 0001   .*..
020010:          
020020:          
020030:    0040    01ff  ...@
020040: 5bc3 5001        [.P.
020050: 0080 7005        ..p.
020060:          
020070: 0042 0010  8701  2010 0013 4411  .BD.
020080: 3011    00c0 03c0    0...
020090:    0010      
0200a0:          
0200b0:          
0200c0:          
0200d0:          
0200e0:          
0200f0:          
020100: 1401 0001     0006 2030  ...0
020110:    2000  00a0    
020120:          
020130:          
020140: 0001 0002        
020150:   8000 00ff      
020160:          
020170:          
020180:          
020190:          
0201a0:          
0201b0:          
0201c0:          
0201d0:          
0201e0:          
0201f0:          

Transformed into something suitable for feeding into lspci -F:

00:00.0 Description filled in by lspci
00: 8c 16 2a a0 06 00 10 00 01 00 00 00 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 00 42 00 01 87 00 00 10 20 00 00 11 44 13 00
80: 00 00 11 30 00 00 00 00 c0 03 c0 00 00 00 00 00
90: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 
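
The conversion between the two dumps above amounts to printing each 32-bit word of the memdmp output byte by byte, lowest byte first. Here is a small sketch of that step, assuming the target format is the hex layout written by lspci -x (which lspci -F reads back); `format_row()` is a hypothetical helper, and the word values in the test are derived from the first row of the lspci-style dump.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format one 16-byte row of config space: a two-digit offset followed by
 * the bytes of each 32-bit word in little-endian order.
 * `words` holds four values as printed by the firmware's memdmp command.
 */
static void format_row(unsigned int offset, const uint32_t words[4],
                       char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%02x:", offset);

    for (int i = 0; i < 4 && n > 0 && (size_t)n < outlen; i++) {
        uint32_t w = words[i];

        n += snprintf(out + n, outlen - n, " %02x %02x %02x %02x",
                      (unsigned)(w & 0xff), (unsigned)((w >> 8) & 0xff),
                      (unsigned)((w >> 16) & 0xff), (unsigned)(w >> 24));
    }
}
```

For example, the word 0xa02a168c comes out as "8c 16 2a a0", i.e. vendor ID 168c followed by device ID a02a.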

Re: [PATCH net-next v2 0/5] net: dsa: Multi-CPU ground work (v2)

2017-06-06 Thread Florian Fainelli
On 06/06/2017 05:03 PM, Florian Fainelli wrote:
> Hi all,
> 
> This patch series prepares the ground for adding multiple CPU port support to
> DSA, and starts by removing redundant pieces of information such as
> master_netdev, which is cpu_dp->netdev. Finally, drivers are moved away from
> directly accessing ds->dst->cpu_dp and use appropriate helper functions.
> 
> Note that if you have Device Tree blobs/platform configurations that are
> currently listing multiple CPU ports, the proposed behavior in
> dsa_ds_get_cpu_dp() will be to return the last bit set in ds->cpu_port_mask.
> 
> Future plans include:
> - making dst->cpu_dp a flexible data structure (array, list, you name it)
> - having the ability for drivers to return a default/preferred CPU port (if
>   necessary)
> 
> Changes in v2:
> 
> - added Reviewed-by tags
> - assign port->cpu_dp earlier before ops->setup() has run

There are some hunks in patch 5 that actually belong in patch 3, I will
post a v3 after getting some more feedback.

> 
> Florian Fainelli (5):
>   net: dsa: Remove master_netdev and use dst->cpu_dp->netdev
>   net: dsa: Relocate master ethtool operations
>   net: dsa: Associate slave network device with CPU port
>   net: dsa: Introduce dsa_dst_get_cpu_dp()
>   net: dsa: Stop accessing ds->dst->cpu_dp in drivers
> 
>  drivers/net/dsa/b53/b53_common.c |  4 +--
>  drivers/net/dsa/bcm_sf2.c        | 10 +---
>  drivers/net/dsa/mt7530.c         |  6 +++--
>  drivers/net/dsa/mv88e6060.c      |  2 +-
>  drivers/net/dsa/qca8k.c          |  2 +-
>  include/net/dsa.h                | 29 +-
>  net/dsa/dsa.c                    | 19 --
>  net/dsa/dsa2.c                   | 27
>  net/dsa/dsa_priv.h               | 10
>  net/dsa/legacy.c                 | 23 ++---
>  net/dsa/slave.c                  | 53
>  net/dsa/tag_brcm.c               |  5 ++--
>  net/dsa/tag_ksz.c                |  5 ++--
>  net/dsa/tag_qca.c                |  3 ++-
>  net/dsa/tag_trailer.c            |  5 ++--
>  15 files changed, 107 insertions(+), 96 deletions(-)
> 


-- 
Florian


[PATCH net-next v2 2/5] net: dsa: Relocate master ethtool operations

2017-06-06 Thread Florian Fainelli
Relocate master_ethtool_ops and master_orig_ethtool_ops into struct
dsa_port in order to be both consistent, and make things self contained
within the dsa_port structure.

This is a preliminary change to supporting multiple CPU port interfaces.

Reviewed-by: Vivien Didelot 
Signed-off-by: Florian Fainelli 
---
 include/net/dsa.h | 17 +
 net/dsa/dsa.c     | 16 ++--
 net/dsa/slave.c   | 16 
 3 files changed, 19 insertions(+), 30 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index b2fb53f5e28e..7e93869819f9 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -122,12 +122,6 @@ struct dsa_switch_tree {
 */
struct dsa_platform_data*pd;
 
-   /*
-* Reference to network device to use, and which tagging
-* protocol to use.
-*/
-   struct net_device   *master_netdev;
-
/* Copy of tag_ops->rcv for faster access in hot path */
struct sk_buff *(*rcv)(struct sk_buff *skb,
   struct net_device *dev,
@@ -135,12 +129,6 @@ struct dsa_switch_tree {
   struct net_device *orig_dev);
 
/*
-* Original copy of the master netdev ethtool_ops
-*/
-   struct ethtool_ops  master_ethtool_ops;
-   const struct ethtool_ops *master_orig_ethtool_ops;
-
-   /*
 * The switch port to which the CPU is attached.
 */
struct dsa_port *cpu_dp;
@@ -189,6 +177,11 @@ struct dsa_port {
u8  stp_state;
struct net_device   *bridge_dev;
struct devlink_port devlink_port;
+   /*
+* Original copy of the master netdev ethtool_ops
+*/
+   struct ethtool_ops  ethtool_ops;
+   const struct ethtool_ops *orig_ethtool_ops;
 };
 
 struct dsa_switch {
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index eaab1affeeeb..2665a66e833d 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -118,15 +118,16 @@ int dsa_cpu_port_ethtool_setup(struct dsa_port *cpu_dp)
struct net_device *master;
struct ethtool_ops *cpu_ops;
 
-   master = ds->dst->cpu_dp->netdev;
+   master = cpu_dp->netdev;
+
cpu_ops = devm_kzalloc(ds->dev, sizeof(*cpu_ops), GFP_KERNEL);
if (!cpu_ops)
return -ENOMEM;
 
-   memcpy(&ds->dst->master_ethtool_ops, master->ethtool_ops,
+   memcpy(&cpu_dp->ethtool_ops, master->ethtool_ops,
   sizeof(struct ethtool_ops));
-   ds->dst->master_orig_ethtool_ops = master->ethtool_ops;
-   memcpy(cpu_ops, &ds->dst->master_ethtool_ops,
+   cpu_dp->orig_ethtool_ops = master->ethtool_ops;
+   memcpy(cpu_ops, &cpu_dp->ethtool_ops,
   sizeof(struct ethtool_ops));
dsa_cpu_port_ethtool_init(cpu_ops);
master->ethtool_ops = cpu_ops;
@@ -136,12 +137,7 @@ int dsa_cpu_port_ethtool_setup(struct dsa_port *cpu_dp)
 
 void dsa_cpu_port_ethtool_restore(struct dsa_port *cpu_dp)
 {
-   struct dsa_switch *ds = cpu_dp->ds;
-   struct net_device *master;
-
-   master = ds->dst->cpu_dp->netdev;
-
-   master->ethtool_ops = ds->dst->master_orig_ethtool_ops;
+   cpu_dp->netdev->ethtool_ops = cpu_dp->orig_ethtool_ops;
 }
 
 void dsa_cpu_dsa_destroy(struct dsa_port *port)
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index d52c9ceb0566..ea4ed0285922 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -523,10 +523,10 @@ static void dsa_cpu_port_get_ethtool_stats(struct 
net_device *dev,
s8 cpu_port = dst->cpu_dp->index;
int count = 0;
 
-   if (dst->master_ethtool_ops.get_sset_count) {
-   count = dst->master_ethtool_ops.get_sset_count(dev,
+   if (dst->cpu_dp->ethtool_ops.get_sset_count) {
+   count = dst->cpu_dp->ethtool_ops.get_sset_count(dev,
   ETH_SS_STATS);
-   dst->master_ethtool_ops.get_ethtool_stats(dev, stats, data);
+   dst->cpu_dp->ethtool_ops.get_ethtool_stats(dev, stats, data);
}
 
if (ds->ops->get_ethtool_stats)
@@ -539,8 +539,8 @@ static int dsa_cpu_port_get_sset_count(struct net_device 
*dev, int sset)
struct dsa_switch *ds = dst->cpu_dp->ds;
int count = 0;
 
-   if (dst->master_ethtool_ops.get_sset_count)
-   count += dst->master_ethtool_ops.get_sset_count(dev, sset);
+   if (dst->cpu_dp->ethtool_ops.get_sset_count)
+   count += dst->cpu_dp->ethtool_ops.get_sset_count(dev, sset);
 
if (sset == ETH_SS_STATS && ds->ops->get_sset_count)
count += ds->ops->get_sset_count(ds);
@@ -564,10 +564,10 @@ static void dsa_cpu_port_get_strings(struct net_device 
*dev,
/* We do not want to be NULL-terminated, since this is a prefix */
pfx[sizeof(pfx) - 1] = '_';
 
-   if (dst->master_ethtool_ops.get_sset_count) {
-  

[PATCH net-next v2 5/5] net: dsa: Stop accessing ds->dst->cpu_dp in drivers

2017-06-06 Thread Florian Fainelli
Out of the few drivers that do access ds->dst->cpu_dp, there is only a
handful for which we cannot substitute that for either an existing and
equivalent piece of information (b53, bcm_sf2, qca8k), and there is only
one for which we need to introduce a helper: mt7530. We do introduce
dsa_ds_get_cpu_dp() which reads the CPU port from ds->cpu_port_mask.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c |  4 ++--
 drivers/net/dsa/bcm_sf2.c        | 10 ++
 drivers/net/dsa/mt7530.c         |  4 +++-
 drivers/net/dsa/mv88e6060.c      |  2 +-
 drivers/net/dsa/qca8k.c          |  2 +-
 include/net/dsa.h                |  6 ++
 net/dsa/dsa2.c                   | 11 +++
 net/dsa/legacy.c                 |  1 +
 net/dsa/slave.c                  |  1 -
 9 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index e68d368e20ac..faec6fcacd31 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1341,7 +1341,7 @@ EXPORT_SYMBOL(b53_fdb_dump);
 int b53_br_join(struct dsa_switch *ds, int port, struct net_device *br)
 {
struct b53_device *dev = ds->priv;
-   s8 cpu_port = ds->dst->cpu_dp->index;
+   s8 cpu_port = dev->cpu_port;
u16 pvlan, reg;
unsigned int i;
 
@@ -1387,7 +1387,7 @@ void b53_br_leave(struct dsa_switch *ds, int port, struct 
net_device *br)
 {
struct b53_device *dev = ds->priv;
struct b53_vlan *vl = &dev->vlans[0];
-   s8 cpu_port = ds->dst->cpu_dp->index;
+   s8 cpu_port = dev->cpu_port;
unsigned int i;
u16 pvlan, reg, pvid;
 
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 76e98e8ed315..9744100d0276 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -227,7 +227,7 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int 
port,
  struct phy_device *phy)
 {
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
-   s8 cpu_port = ds->dst->cpu_dp->index;
+   s8 cpu_port = priv->dev->cpu_port;
unsigned int i;
u32 reg;
 
@@ -806,8 +806,9 @@ static int bcm_sf2_sw_resume(struct dsa_switch *ds)
 static void bcm_sf2_sw_get_wol(struct dsa_switch *ds, int port,
   struct ethtool_wolinfo *wol)
 {
-   struct net_device *p = ds->dst[ds->index].cpu_dp->netdev;
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
+   struct dsa_port *cpu_dp = ds->ports[port].cpu_dp;
+   struct net_device *p = cpu_dp->netdev;
struct ethtool_wolinfo pwol;
 
/* Get the parent device WoL settings */
@@ -829,9 +830,10 @@ static void bcm_sf2_sw_get_wol(struct dsa_switch *ds, int 
port,
 static int bcm_sf2_sw_set_wol(struct dsa_switch *ds, int port,
  struct ethtool_wolinfo *wol)
 {
-   struct net_device *p = ds->dst[ds->index].cpu_dp->netdev;
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
-   s8 cpu_port = ds->dst->cpu_dp->index;
+   struct dsa_port *cpu_dp = ds->ports[port].cpu_dp;
+   struct net_device *p = cpu_dp->netdev;
+   s8 cpu_port = cpu_dp->index;
struct ethtool_wolinfo pwol;
 
p->ethtool_ops->get_wol(p, &pwol);
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 1e46418a3b74..9b1b76c7b927 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -907,6 +907,7 @@ static int
 mt7530_setup(struct dsa_switch *ds)
 {
struct mt7530_priv *priv = ds->priv;
+   struct dsa_port *cpu_dp;
int ret, i;
u32 id, val;
struct device_node *dn;
@@ -916,7 +917,8 @@ mt7530_setup(struct dsa_switch *ds)
 * controller also is the container for two GMACs nodes representing
 * as two netdev instances.
 */
-   dn = ds->dst->cpu_dp->netdev->dev.of_node->parent;
+   cpu_dp = dsa_ds_get_cpu_dp(ds);
+   dn = cpu_dp->netdev->dev.of_node->parent;
priv->ethernet = syscon_node_to_regmap(dn);
if (IS_ERR(priv->ethernet))
return PTR_ERR(priv->ethernet);
diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index dce7fa57eb55..621cdc46ad81 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -176,7 +176,7 @@ static int mv88e6060_setup_port(struct dsa_switch *ds, int 
p)
  ((p & 0xf) << PORT_VLAN_MAP_DBNUM_SHIFT) |
   (dsa_is_cpu_port(ds, p) ?
ds->enabled_port_mask :
-   BIT(ds->dst->cpu_dp->index)));
+   BIT(ds->ports[p].cpu_dp->index)));
 
/* Port Association Vector: when learning source addresses
 * of packets, add the address to the address database using
diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index b3bee7eab45f..68b45298f6d1 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ 

[PATCH net-next v2 0/5] net: dsa: Multi-CPU ground work (v2)

2017-06-06 Thread Florian Fainelli
Hi all,

This patch series prepares the ground for adding multiple CPU port support to
DSA, and starts by removing redundant pieces of information such as
master_netdev, which is cpu_dp->netdev. Finally, drivers are moved away from
directly accessing ds->dst->cpu_dp and use appropriate helper functions.

Note that if you have Device Tree blobs/platform configurations that are
currently listing multiple CPU ports, the proposed behavior in
dsa_ds_get_cpu_dp() will be to return the last bit set in ds->cpu_port_mask.
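
To illustrate the note above: a rough sketch of picking a port from a bitmask, reading "last bit set" as the highest-numbered set bit (that reading is an assumption, as is the helper name `last_cpu_port()`; this is not the code from patch 5).

```c
#include <stdint.h>

/*
 * Return the index of the highest set bit in `mask`, or -1 if no bit is set.
 * With a mask of CPU-capable ports this picks the highest-numbered port.
 */
static int last_cpu_port(uint32_t mask)
{
    int port = -1;

    for (int i = 0; mask != 0; i++, mask >>= 1) {
        if (mask & 1)
            port = i;
    }
    return port;
}
```

So a configuration listing CPU ports 5 and 8 (mask 0x120) would resolve to port 8 under this reading.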

Future plans include:
- making dst->cpu_dp a flexible data structure (array, list, you name it)
- having the ability for drivers to return a default/preferred CPU port (if
  necessary)

Changes in v2:

- added Reviewed-by tags
- assign port->cpu_dp earlier before ops->setup() has run

Florian Fainelli (5):
  net: dsa: Remove master_netdev and use dst->cpu_dp->netdev
  net: dsa: Relocate master ethtool operations
  net: dsa: Associate slave network device with CPU port
  net: dsa: Introduce dsa_dst_get_cpu_dp()
  net: dsa: Stop accessing ds->dst->cpu_dp in drivers

 drivers/net/dsa/b53/b53_common.c |  4 +--
 drivers/net/dsa/bcm_sf2.c        | 10 +---
 drivers/net/dsa/mt7530.c         |  6 +++--
 drivers/net/dsa/mv88e6060.c      |  2 +-
 drivers/net/dsa/qca8k.c          |  2 +-
 include/net/dsa.h                | 29 +-
 net/dsa/dsa.c                    | 19 --
 net/dsa/dsa2.c                   | 27
 net/dsa/dsa_priv.h               | 10
 net/dsa/legacy.c                 | 23 ++---
 net/dsa/slave.c                  | 53
 net/dsa/tag_brcm.c               |  5 ++--
 net/dsa/tag_ksz.c                |  5 ++--
 net/dsa/tag_qca.c                |  3 ++-
 net/dsa/tag_trailer.c            |  5 ++--
 15 files changed, 107 insertions(+), 96 deletions(-)

-- 
2.9.3



[PATCH net-next v2 4/5] net: dsa: Introduce dsa_dst_get_cpu_dp()

2017-06-06 Thread Florian Fainelli
Introduce a helper function which will return a reference to the CPU
port used in a dsa_switch_tree. Right now this is a singleton, but this
will change once we introduce multi-CPU port support, so ease the
transition by converting the affected code paths.

Reviewed-by: Vivien Didelot 
Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa_priv.h    |  5 +
 net/dsa/slave.c       | 31 ---
 net/dsa/tag_brcm.c    |  5 ++---
 net/dsa/tag_ksz.c     |  5 ++---
 net/dsa/tag_qca.c     |  3 ++-
 net/dsa/tag_trailer.c |  5 ++---
 6 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 7c2326f3b538..49b4b047aed0 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -188,4 +188,9 @@ static inline struct net_device *dsa_master_netdev(struct 
dsa_slave_priv *p)
return p->dp->cpu_dp->netdev;
 }
 
+static inline struct dsa_port *dsa_dst_get_cpu_dp(struct dsa_switch_tree *dst)
+{
+   return dst->cpu_dp;
+}
+
 #endif
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index de1ab41cfd38..a73c1de398b5 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -519,14 +519,14 @@ static void dsa_cpu_port_get_ethtool_stats(struct 
net_device *dev,
   uint64_t *data)
 {
struct dsa_switch_tree *dst = dev->dsa_ptr;
-   struct dsa_switch *ds = dst->cpu_dp->ds;
-   s8 cpu_port = dst->cpu_dp->index;
+   struct dsa_port *cpu_dp = dsa_dst_get_cpu_dp(dst);
+   struct dsa_switch *ds = cpu_dp->ds;
+   s8 cpu_port = cpu_dp->index;
int count = 0;
 
-   if (dst->cpu_dp->ethtool_ops.get_sset_count) {
-   count = dst->cpu_dp->ethtool_ops.get_sset_count(dev,
-  ETH_SS_STATS);
-   dst->cpu_dp->ethtool_ops.get_ethtool_stats(dev, stats, data);
+   if (cpu_dp->ethtool_ops.get_sset_count) {
+   count = cpu_dp->ethtool_ops.get_sset_count(dev, ETH_SS_STATS);
+   cpu_dp->ethtool_ops.get_ethtool_stats(dev, stats, data);
}
 
if (ds->ops->get_ethtool_stats)
@@ -536,11 +536,12 @@ static void dsa_cpu_port_get_ethtool_stats(struct 
net_device *dev,
 static int dsa_cpu_port_get_sset_count(struct net_device *dev, int sset)
 {
struct dsa_switch_tree *dst = dev->dsa_ptr;
-   struct dsa_switch *ds = dst->cpu_dp->ds;
+   struct dsa_port *cpu_dp = dsa_dst_get_cpu_dp(dst);
+   struct dsa_switch *ds = cpu_dp->ds;
int count = 0;
 
-   if (dst->cpu_dp->ethtool_ops.get_sset_count)
-   count += dst->cpu_dp->ethtool_ops.get_sset_count(dev, sset);
+   if (cpu_dp->ethtool_ops.get_sset_count)
+   count += cpu_dp->ethtool_ops.get_sset_count(dev, sset);
 
if (sset == ETH_SS_STATS && ds->ops->get_sset_count)
count += ds->ops->get_sset_count(ds);
@@ -552,8 +553,9 @@ static void dsa_cpu_port_get_strings(struct net_device *dev,
 uint32_t stringset, uint8_t *data)
 {
struct dsa_switch_tree *dst = dev->dsa_ptr;
-   struct dsa_switch *ds = dst->cpu_dp->ds;
-   s8 cpu_port = dst->cpu_dp->index;
+   struct dsa_port *cpu_dp = dsa_dst_get_cpu_dp(dst);
+   struct dsa_switch *ds = cpu_dp->ds;
+   s8 cpu_port = cpu_dp->index;
int len = ETH_GSTRING_LEN;
int mcount = 0, count;
unsigned int i;
@@ -564,10 +566,9 @@ static void dsa_cpu_port_get_strings(struct net_device 
*dev,
/* We do not want to be NULL-terminated, since this is a prefix */
pfx[sizeof(pfx) - 1] = '_';
 
-   if (dst->cpu_dp->ethtool_ops.get_sset_count) {
-   mcount = dst->cpu_dp->ethtool_ops.get_sset_count(dev,
-   ETH_SS_STATS);
-   dst->cpu_dp->ethtool_ops.get_strings(dev, stringset, data);
+   if (cpu_dp->ethtool_ops.get_sset_count) {
+   mcount = cpu_dp->ethtool_ops.get_sset_count(dev, ETH_SS_STATS);
+   cpu_dp->ethtool_ops.get_strings(dev, stringset, data);
}
 
if (stringset == ETH_SS_STATS && ds->ops->get_strings) {
diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index c03860907f28..d7ef2b35e61e 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -93,12 +93,11 @@ static struct sk_buff *brcm_tag_rcv(struct sk_buff *skb, 
struct net_device *dev,
struct net_device *orig_dev)
 {
struct dsa_switch_tree *dst = dev->dsa_ptr;
-   struct dsa_switch *ds;
+   struct dsa_port *cpu_dp = dsa_dst_get_cpu_dp(dst);
+   struct dsa_switch *ds = cpu_dp->ds;
int source_port;
u8 *brcm_tag;
 
-   ds = dst->cpu_dp->ds;
-
if (unlikely(!pskb_may_pull(skb, BRCM_TAG_LEN)))
return NULL;
 
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index b94a334a1d02..c41a24e83e83 

[PATCH net-next v2 1/5] net: dsa: Remove master_netdev and use dst->cpu_dp->netdev

2017-06-06 Thread Florian Fainelli
In preparation for supporting multiple CPU ports, remove
dst->master_netdev and ds->master_netdev and replace them with only one
instance of the common object we have for a port: struct
dsa_port::netdev. ds->master_netdev is currently write only and would be
helpful in the case where we have two switches, both with CPU ports, and
also connected within each other, which the multi-CPU port patch series
would address.

While at it, introduce a helper function used in net/dsa/slave.c to
immediately get a reference on the master network device called
dsa_master_netdev().

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c |  4 ++--
 drivers/net/dsa/mt7530.c  |  4 ++--
 include/net/dsa.h         |  5 -
 net/dsa/dsa.c             |  9 ++---
 net/dsa/dsa2.c            | 18 +++---
 net/dsa/dsa_priv.h        |  5 +
 net/dsa/legacy.c          | 22 +-
 net/dsa/slave.c           | 20 +---
 8 files changed, 40 insertions(+), 47 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 687a8bae5d73..76e98e8ed315 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -806,7 +806,7 @@ static int bcm_sf2_sw_resume(struct dsa_switch *ds)
 static void bcm_sf2_sw_get_wol(struct dsa_switch *ds, int port,
   struct ethtool_wolinfo *wol)
 {
-   struct net_device *p = ds->dst[ds->index].master_netdev;
+   struct net_device *p = ds->dst[ds->index].cpu_dp->netdev;
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
struct ethtool_wolinfo pwol;
 
@@ -829,7 +829,7 @@ static void bcm_sf2_sw_get_wol(struct dsa_switch *ds, int 
port,
 static int bcm_sf2_sw_set_wol(struct dsa_switch *ds, int port,
  struct ethtool_wolinfo *wol)
 {
-   struct net_device *p = ds->dst[ds->index].master_netdev;
+   struct net_device *p = ds->dst[ds->index].cpu_dp->netdev;
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
s8 cpu_port = ds->dst->cpu_dp->index;
struct ethtool_wolinfo pwol;
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 25e00d5e0eec..1e46418a3b74 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -912,11 +912,11 @@ mt7530_setup(struct dsa_switch *ds)
struct device_node *dn;
struct mt7530_dummy_poll p;
 
-   /* The parent node of master_netdev which holds the common system
+   /* The parent node of cpu_dp->netdev which holds the common system
 * controller also is the container for two GMACs nodes representing
 * as two netdev instances.
 */
-   dn = ds->master_netdev->dev.of_node->parent;
+   dn = ds->dst->cpu_dp->netdev->dev.of_node->parent;
priv->ethernet = syscon_node_to_regmap(dn);
if (IS_ERR(priv->ethernet))
return PTR_ERR(priv->ethernet);
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 2effb0af9d7c..b2fb53f5e28e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -227,11 +227,6 @@ struct dsa_switch {
s8  rtable[DSA_MAX_SWITCHES];
 
/*
-* The lower device this switch uses to talk to the host
-*/
-   struct net_device *master_netdev;
-
-   /*
 * Slave mii_bus and devices for the individual ports.
 */
u32 dsa_port_mask;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index fdc448b30e56..eaab1affeeeb 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -118,10 +118,7 @@ int dsa_cpu_port_ethtool_setup(struct dsa_port *cpu_dp)
struct net_device *master;
struct ethtool_ops *cpu_ops;
 
-   master = ds->dst->master_netdev;
-   if (ds->master_netdev)
-   master = ds->master_netdev;
-
+   master = ds->dst->cpu_dp->netdev;
cpu_ops = devm_kzalloc(ds->dev, sizeof(*cpu_ops), GFP_KERNEL);
if (!cpu_ops)
return -ENOMEM;
@@ -142,9 +139,7 @@ void dsa_cpu_port_ethtool_restore(struct dsa_port *cpu_dp)
struct dsa_switch *ds = cpu_dp->ds;
struct net_device *master;
 
-   master = ds->dst->master_netdev;
-   if (ds->master_netdev)
-   master = ds->master_netdev;
+   master = ds->dst->cpu_dp->netdev;
 
master->ethtool_ops = ds->dst->master_orig_ethtool_ops;
 }
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index cd13bb54a30c..2674bdf03fef 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -337,7 +337,7 @@ static int dsa_ds_apply(struct dsa_switch_tree *dst, struct 
dsa_switch *ds)
return err;
 
if (ds->ops->set_addr) {
-   err = ds->ops->set_addr(ds, dst->master_netdev->dev_addr);
+   err = ds->ops->set_addr(ds, dst->cpu_dp->netdev->dev_addr);
if (err < 0)
return err;
}
@@ -444,7 +444,7 @@ static int dsa_dst_apply(struct dsa_switch_tree *dst)
 * sent to the tag 

[PATCH net-next v2 3/5] net: dsa: Associate slave network device with CPU port

2017-06-06 Thread Florian Fainelli
In preparation for supporting multiple CPU ports with DSA, have the
dsa_port structure know which CPU it is associated with. This will be
important in order to make sure the correct CPU is used for transmission
of the frames. If not for functional reasons, for performance (e.g: load
balancing) and forwarding decisions.

Reviewed-by: Vivien Didelot 
Signed-off-by: Florian Fainelli 
---
 include/net/dsa.h  | 1 +
 net/dsa/dsa_priv.h | 2 +-
 net/dsa/slave.c    | 5 -
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 7e93869819f9..58969b9a090c 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -171,6 +171,7 @@ struct dsa_port {
struct dsa_switch   *ds;
unsigned int        index;
const char          *name;
+   struct dsa_port *cpu_dp;
struct net_device   *netdev;
struct device_node  *dn;
unsigned int        ageing_time;
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 5c510f4ba0ce..7c2326f3b538 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -185,7 +185,7 @@ extern const struct dsa_device_ops trailer_netdev_ops;
 
 static inline struct net_device *dsa_master_netdev(struct dsa_slave_priv *p)
 {
-   return p->dp->ds->dst->cpu_dp->netdev;
+   return p->dp->cpu_dp->netdev;
 }
 
 #endif
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index ea4ed0285922..de1ab41cfd38 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1139,9 +1139,11 @@ int dsa_slave_create(struct dsa_switch *ds, struct 
device *parent,
struct net_device *master;
struct net_device *slave_dev;
struct dsa_slave_priv *p;
+   struct dsa_port *cpu_dp;
int ret;
 
-   master = ds->dst->cpu_dp->netdev;
+   cpu_dp = ds->dst->cpu_dp;
+   master = cpu_dp->netdev;
 
slave_dev = alloc_netdev(sizeof(struct dsa_slave_priv), name,
 NET_NAME_UNKNOWN, ether_setup);
@@ -1176,6 +1178,7 @@ int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
p->old_duplex = -1;
 
ds->ports[port].netdev = slave_dev;
+   p->dp->cpu_dp = cpu_dp;
ret = register_netdev(slave_dev);
if (ret) {
netdev_err(master, "error %d registering interface %s\n",
-- 
2.9.3
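
To illustrate the change above, here is a minimal user-space sketch of the new lookup path. It is not kernel code: the structures are trimmed-down stand-ins and `master_of()` is a hypothetical name that mirrors what dsa_master_netdev() does after this patch, namely following the per-port cpu_dp pointer instead of the single tree-wide one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct net_device {
	const char *name;
};

struct dsa_port {
	struct dsa_port *cpu_dp;    /* CPU port this port transmits through */
	struct net_device *netdev;  /* slave netdev, or the master for a CPU port */
};

/* Follow the per-port CPU pointer to find the master network device. */
static struct net_device *master_of(const struct dsa_port *dp)
{
	assert(dp && dp->cpu_dp);
	return dp->cpu_dp->netdev;
}
```

With two CPU ports, two user ports can then resolve to different masters, which a single dst->cpu_dp pointer cannot express.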



Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-06 Thread David Ahern
On 6/6/17 3:06 PM, Ben Greear wrote:
> This bug has been around forever, and we recently got an intern and
> stuck him with trying to reproduce it on the latest kernel.  It is still
> here.  I'm not super excited about trying to fix this, but we can easily
> test patches if someone has a patch to try.

Can you try this (whitespace damaged on paste, but it is moving the lock
ahead of the fn_sernum check):

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index deea901746c8..7a44c49055c0 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -378,6 +378,7 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb,
cb->args[5] = w->root->fn_sernum;
}
} else {
+   read_lock_bh(&table->tb6_lock);
if (cb->args[5] != w->root->fn_sernum) {
/* Begin at the root if the tree changed */
cb->args[5] = w->root->fn_sernum;
@@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb,
} else
w->skip = 0;

-   read_lock_bh(&table->tb6_lock);
res = fib6_walk_continue(w);
read_unlock_bh(&table->tb6_lock);
if (res <= 0) {
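
The ordering suggested above can be shown with a toy user-space model. Everything here is hypothetical (the `toy_*` names, the boolean standing in for tb6_lock, the integer standing in for fn_sernum); it only shows the serial-number check happening while the lock is held, so the tree cannot change between the check and the walk.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: 'locked' stands in for tb6_lock, 'sernum' for fn_sernum. */
struct toy_table {
	bool locked;
	unsigned int sernum;
};

struct toy_walker {
	unsigned int seen_sernum;
	bool restarted;
};

static void toy_lock(struct toy_table *t)
{
	assert(!t->locked);
	t->locked = true;
}

static void toy_unlock(struct toy_table *t)
{
	assert(t->locked);
	t->locked = false;
}

/* Resume a dump: take the lock first, then compare serial numbers. */
static void toy_dump_continue(struct toy_table *t, struct toy_walker *w)
{
	toy_lock(t);
	w->restarted = (w->seen_sernum != t->sernum);
	if (w->restarted)
		w->seen_sernum = t->sernum;  /* begin at the root again */
	/* ... the tree walk would happen here, still under the lock ... */
	toy_unlock(t);
}
```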


[PATCH] i40evf: remove redundant null check on key

2017-06-06 Thread Colin King
From: Colin Ian King 

key has previously been null checked so the subsequent null check
is redundant as key can never be null at that point, so remove it.

Detected by CoverityScan, CID#1357164 ("Logically dead code")

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index 9bb2cc7dd4e4..838e57c6e176 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -732,9 +732,7 @@ static int i40evf_set_rxfh(struct net_device *netdev, const u32 *indir,
if (!indir)
return 0;
 
-   if (key) {
-   memcpy(adapter->rss_key, key, adapter->rss_key_size);
-   }
+   memcpy(adapter->rss_key, key, adapter->rss_key_size);
 
/* Each 32 bits pointed by 'indir' is stored with a lut entry */
for (i = 0; i < adapter->rss_lut_size; i++)
-- 
2.11.0



Re: [PATCH] ravb: Fix use-after-free on `ifconfig eth0 down`

2017-06-06 Thread Eugeniu Rosca
On Tue, Jun 06, 2017 at 12:35:30PM +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 6/6/2017 1:08 AM, Eugeniu Rosca wrote:
> 
> >Commit a47b70ea86bd ("ravb: unmap descriptors when freeing rings") has
> >introduced the issue seen in [1] reproduced on H3ULCB board.
> >
> >Fix this by relocating the RX skb ringbuffer free operation, so that
> >swiotlb page unmapping can be done first. Freeing of aligned TX buffers
> >is not relevant to the issue seen in [1]. Still, reposition TX free
> >calls as well, to have all kfree() operations performed consistently
> >_after_ dma_unmap_*()/dma_free_*().
> 
>Perhaps it's a material of a separate cleanup patch?

Many thanks for feedback. For the moment, with a number of sanitizers
and debugging options enabled (UBSAN, KASAN, KMEMLEAK, DMA_API_DEBUG), I
couldn't find any other obvious ravb driver failures in basic usecases
(didn't stress-test it though).

Regarding the reordering of kfree vs dma_* API calls, which might be
needed in other parts of the driver, this possibly will be highlighted
by special usecases like repetitive suspend/resume or the like. I will
happily share any other fixes, if such are developed on our side.

Best regards,
Eugeniu.


[PATCH] net: wireless: intel: iwlwifi: dvm: remove unused defines

2017-06-06 Thread Seraphime Kirkovski
Those constants have been unused for quite some time now.

Signed-off-by: Seraphime Kirkovski 
---
 I've compile-tested it.

 drivers/net/wireless/intel/iwlwifi/dvm/commands.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/dvm/commands.h b/drivers/net/wireless/intel/iwlwifi/dvm/commands.h
index 2ab2773655a8..37d2ba5ae852 100644
--- a/drivers/net/wireless/intel/iwlwifi/dvm/commands.h
+++ b/drivers/net/wireless/intel/iwlwifi/dvm/commands.h
@@ -1446,13 +1446,6 @@ struct agg_tx_status {
  *   or rate table color was changed during frame retries
  * refer tlc rate info
  */
-
-#define IWL50_TX_RES_INIT_RATE_INDEX_POS   0
-#define IWL50_TX_RES_INIT_RATE_INDEX_MSK   0x0f
-#define IWL50_TX_RES_RATE_TABLE_COLOR_POS  4
-#define IWL50_TX_RES_RATE_TABLE_COLOR_MSK  0x70
-#define IWL50_TX_RES_INV_RATE_INDEX_MSK0x80
-
 /* refer to ra_tid */
 #define IWLAGN_TX_RES_TID_POS  0
 #define IWLAGN_TX_RES_TID_MSK  0x0f
-- 
2.11.0



Re: SYN cookies: validity range

2017-06-06 Thread Eric Dumazet
On Wed, 2017-06-07 at 00:12 +0200, Juan José Echevarria wrote:
> Hi,
> 
> This is my first post, hope I'm not using the mailing list wrongly.
> 
> As proposed in an old thread
> (https://www.spinics.net/lists/netdev/msg329144.html), when we send
> SYN cookies and then exit this mode, tcp_synq_no_recent_overflow()
> keeps returning false for cookies received up to 2 minutes later
> (TCP_SYNCOOKIE_VALID).
> 
> Although the rest of the SYN cookie code allows ACKs to be aged up to 2
> minutes, we should not accept cookies for that long once the SYN cookie
> episode is over. As we don't keep state, an ACK carrying a previous
> cookie will be accepted as a valid third packet of a TCP handshake.
> 
> In this scenario, the validity range allows a client to skip waiting for
> a SYN-ACK most of the time. A client could intentionally send the
> required number of packets to fill the queue (e.g. with a spoofed IP
> address). Then it could open a connection, collect the cookie, and
> reuse it to speed up the opening of successive connections for 2
> minutes. This cheat, especially attractive against low-end devices where
> the SYN queue is rather small, may behave similarly to TCP Fast Open,
> but without the awareness of the server.

No idea why someone would use this unreliable convoluted way, instead of
Fast Open ;)

> 
> Decreasing TCP_SYNCOOKIE_VALID would prevent the replay of cookies.

It will also prevent connections from innocent users with RTT of say 30
seconds (seen in real world conditions)







Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-06 Thread Alexei Starovoitov
On Tue, Jun 06, 2017 at 03:01:51PM -0400, David Miller wrote:
> From: Alexei Starovoitov 
> Date: Tue, 6 Jun 2017 11:55:33 -0700
> 
> > If in the future mlx will make it into the nic in a way that
> > encryption shares all memory management logic and there is no fpga
> > at all then it indeed will be similar to tc offload. Right now it's
> > not and needs different sw architecture.
> 
> If the visible effect is identical, I fundamentally disagree with you.
> 
> I don't care if there is a frog sitting on the PHY that transforms
> the packets, it's all the same if the visible behavior is identical.

that frog is a good example why we disagree.
I need to check the pulse of that frog and last time it ate.
In production I cannot have magical creatures do stuff for me.
I need to monitor all components, debug and mitigate the issues.
If encryption is done by the nic, I get all the monitoring and
debugging as part of the standard tools. When it's a frog
hidden by the nic, I cannot do much when the fire erupts,
hence frog and production environment don't mix.
To move things forward...
how about marking the whole thing CONFIG_EXPERIMENTAL instead of revert?
Right now it's effectively non-production==experimental code and
I want to make it clear.



Re: [PATCH net-next v2 2/2] bpf: Remove the capability check for cgroup skb eBPF program

2017-06-06 Thread Chenbo Feng


On 06/06/2017 09:56 AM, Daniel Borkmann wrote:
> On 06/02/2017 01:42 AM, Alexei Starovoitov wrote:
>> On Wed, May 31, 2017 at 06:16:00PM -0700, Chenbo Feng wrote:
>>> From: Chenbo Feng
>>>
>>> Currently loading a cgroup skb eBPF program requires the CAP_SYS_ADMIN
>>> capability, while attaching the program to a cgroup only requires the
>>> user to have the CAP_NET_ADMIN privilege. We can skip the capability
>>> check when loading the program, just like for socket filter programs,
>>> to make the capability requirement consistent.
>>>
>>> Change since v1:
>>> Change the code style in order to be compliant with checkpatch.pl
>>> preference
>>>
>>> Signed-off-by: Chenbo Feng
>>
>> as far as I can see they're indeed the same as socket filters, so
>> Acked-by: Alexei Starovoitov
>>
>> but I don't quite understand how it helps, since as you said
>> attaching such unpriv fd to cgroup still requires root.
>> Do you have more patches to follow?
>
> Hmm, when we relax this from capable(CAP_SYS_ADMIN) to unprivileged,
> then we must at least also zero out the not-yet-initialized memory
> for the mac header for egress case in __cgroup_bpf_run_filter_skb().

Do you mean something like:

if (type == BPF_CGROUP_INET_EGRESS) {
        offset = skb_network_header(skb) - skb_mac_header(skb);
        memset(skb_mac_header(skb), 0, offset);
}

And could you explain more on why we need to do this if we remove the
CAP_SYS_ADMIN check? I thought we still cannot directly access the
sk_buff without using the bpf_skb_load_bytes helper, and we still need
CAP_NET_ADMIN in order to attach and run the program on the egress side,
right?
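
For reference, the zeroing idea discussed here can be sketched in user space as follows. This is only an illustration under assumptions: `struct toy_pkt` and `toy_clear_mac_header()` are hypothetical, and plain offsets into a flat buffer stand in for skb_mac_header() and skb_network_header().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat packet buffer; offsets stand in for skb header pointers. */
struct toy_pkt {
	unsigned char data[64];
	size_t mac_off;  /* start of the link-layer header */
	size_t net_off;  /* start of the network header */
};

/* Zero the link-layer header area so a filter cannot read stale bytes. */
static void toy_clear_mac_header(struct toy_pkt *p)
{
	assert(p->mac_off <= p->net_off && p->net_off <= sizeof(p->data));
	memset(p->data + p->mac_off, 0, p->net_off - p->mac_off);
}
```

Only the bytes between the two offsets are cleared; the network header and payload are left untouched.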





SYN cookies: validity range

2017-06-06 Thread Juan José Echevarria
Hi,

This is my first post, hope I'm not using the mailing list wrongly.

As proposed in an old thread
(https://www.spinics.net/lists/netdev/msg329144.html), when we send
SYN cookies and then exit this mode, tcp_synq_no_recent_overflow()
keeps returning false for cookies received up to 2 minutes later
(TCP_SYNCOOKIE_VALID).

Although the rest of the SYN cookie code allows ACKs to be aged up to 2
minutes, we should not accept cookies for that long once the SYN cookie
episode is over. As we don't keep state, an ACK carrying a previous
cookie will be accepted as a valid third packet of a TCP handshake.

In this scenario, the validity range allows a client to skip waiting for
a SYN-ACK most of the time. A client could intentionally send the
required number of packets to fill the queue (e.g. with a spoofed IP
address). Then it could open a connection, collect the cookie, and
reuse it to speed up the opening of successive connections for 2
minutes. This cheat, especially attractive against low-end devices where
the SYN queue is rather small, may behave similarly to TCP Fast Open,
but without the awareness of the server.

Decreasing TCP_SYNCOOKIE_VALID would prevent the replay of cookies.
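
The window being discussed can be modelled with a small user-space sketch. It is a simplification under assumptions: time is counted in whole seconds rather than jiffies, and the names `SYNCOOKIE_VALID_SECS`, `no_recent_overflow()` and `cookie_ack_considered()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define SYNCOOKIE_VALID_SECS 120UL  /* the 2 minutes mentioned above */

/* True once the last SYN queue overflow is older than the validity window. */
static bool no_recent_overflow(unsigned long now, unsigned long last_overflow)
{
	return now - last_overflow > SYNCOOKIE_VALID_SECS;
}

/* An ACK carrying a cookie is only looked at while the window is open. */
static bool cookie_ack_considered(unsigned long now, unsigned long last_overflow)
{
	return !no_recent_overflow(now, last_overflow);
}
```

In this model, shrinking the constant shortens the replay window, and it equally shortens the time a slow but legitimate client has to return its ACK.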


Re: [PATCH v4] net: don't call strlen on non-terminated string in dev_set_alias()

2017-06-06 Thread Florian Westphal
David Miller  wrote:
> From: Alexander Potapenko 
> Date: Tue,  6 Jun 2017 15:56:54 +0200
> 
> > KMSAN reported a use of uninitialized memory in dev_set_alias(),
> > which was caused by calling strlcpy() (which in turn called strlen())
> > on the user-supplied non-terminated string.
> > 
> > Signed-off-by: Alexander Potapenko 
> 
> We should not be allowing non-NULL terminated strings for the
> IFLA_IFALIAS attribute.  It's defined as type NLA_STRING in
> the ifla_policy[] array.

Unfortunately NLA_STRING doesn't check for NUL byte, only
NLA_NUL_STRING does this.

So unless you think we can change kernel and make NLA_STRING
behave like NLA_NUL_STRING I think patch is correct.
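
As an illustration of the problem, here is a user-space sketch of copying a payload that carries a length but may lack a terminating NUL. The function `copy_alias()` is hypothetical and is not the fix from the patch; it only shows a copy that never reads beyond the stated payload length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most 'len' bytes of a possibly unterminated payload into 'dst'
 * and always terminate the result. Returns the number of bytes stored,
 * not counting the terminator, or 0 if 'dst' has no room at all.
 */
static size_t copy_alias(char *dst, size_t dst_size,
			 const char *payload, size_t len)
{
	const char *nul;
	size_t n;

	if (dst_size == 0)
		return 0;

	/* memchr() stays within 'len' bytes, unlike strlen(). */
	nul = memchr(payload, '\0', len);
	n = nul ? (size_t)(nul - payload) : len;
	if (n >= dst_size)
		n = dst_size - 1;

	memcpy(dst, payload, n);
	dst[n] = '\0';
	return n;
}
```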


[PATCH net] ibmvnic: Return failure on attempted mtu change

2017-06-06 Thread John Allen
Changing the mtu is currently not supported in the ibmvnic driver.

Implement .ndo_change_mtu in the driver so that attempting to use ifconfig
to change the mtu will fail and present the user with an error message.

Signed-off-by: John Allen 
---
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 4f2d329..8ff6c74 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1468,6 +1468,11 @@ static void ibmvnic_netpoll_controller(struct net_device *dev)
 }
 #endif

+static int ibmvnic_change_mtu(struct net_device *netdev, int new_mtu)
+{
+   return -EOPNOTSUPP;
+}
+
 static const struct net_device_ops ibmvnic_netdev_ops = {
.ndo_open   = ibmvnic_open,
.ndo_stop   = ibmvnic_close,
@@ -1479,6 +1484,7 @@ static void ibmvnic_netpoll_controller(struct net_device *dev)
 #ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller= ibmvnic_netpoll_controller,
 #endif
+   .ndo_change_mtu = ibmvnic_change_mtu,
 };

 /* ethtool functions */
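
The effect of wiring up such a callback can be sketched in user space. This is a simplified model under assumptions: the `toy_*` types and `toy_set_mtu()` are hypothetical, and the helper merely imitates a core that calls the driver callback when one is present and otherwise stores the new value itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_dev;

struct toy_dev_ops {
	int (*change_mtu)(struct toy_dev *dev, int new_mtu);
};

struct toy_dev {
	const struct toy_dev_ops *ops;
	int mtu;
};

/* Driver callback that refuses every MTU change. */
static int toy_reject_mtu(struct toy_dev *dev, int new_mtu)
{
	(void)dev;
	(void)new_mtu;
	return -EOPNOTSUPP;
}

/* Hypothetical core helper: defer to the driver when it has a callback. */
static int toy_set_mtu(struct toy_dev *dev, int new_mtu)
{
	if (dev->ops && dev->ops->change_mtu)
		return dev->ops->change_mtu(dev, new_mtu);

	dev->mtu = new_mtu;
	return 0;
}
```

Without the callback the request silently succeeds in this model; with it the caller gets an error and the stored MTU stays unchanged.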



Re: [PATCH 7/7] mlx5: Do not build eswitch_offloads if CONFIG_MLX5_EN_ESWITCH_OFFLOADS is set

2017-06-06 Thread Jes Sorensen

On 06/05/2017 05:53 PM, Saeed Mahameed wrote:
> On Mon, Jun 5, 2017 at 11:51 PM, Jes Sorensen  wrote:
>> On 06/03/2017 03:37 PM, Or Gerlitz wrote:
>>> On Fri, Jun 2, 2017 at 11:22 PM, Jes Sorensen  wrote:
>>>> On 05/28/2017 02:03 AM, Or Gerlitz wrote:
>>>>> On Sun, May 28, 2017 at 5:23 AM, Jes Sorensen wrote:
>>>>>> On 05/27/2017 05:02 PM, Or Gerlitz wrote:
>>>>>>> On Sat, May 27, 2017 at 12:16 AM, Jes Sorensen wrote:
>>>>>>>> This gets rid of the temporary #ifdef spaghetti and allows the
>>>>>>>> code to compile without offload support enabled.
>>>>>>>
>>>>>>> I am pretty sure we can do that exercise you're up to without any
>>>>>>> spaghetti cooking and even put more code under that CONFIG
>>>>>>> directive (en_rep.c), I'll take that with Saeed.
>>>>>>
>>>>>> I want to avoid adding #ifdef CONFIG_foo to the main code in order
>>>>>> to keep it readable. I did it gradually to make sure I didn't break
>>>>>> anything and to allow for it to be bisected in case something did
>>>>>> break. If we can move out more code from places like en_rep.c into
>>>>>> eswitch_offload.c and get it disabled that way that would be great,
>>>>>> but I like to limit the number of #ifdefs we add to the actual code.
>>>>>
>>>>> FWIW (see below), squashing your seven patches to one resulted in a
>>>>> fairly simple/clear patch, so if we go that way, no need to have
>>>>> seven commits just for this piece.
>>>>
>>>> Squashing patches into jumbo patches is inherently broken and bad
>>>> coding practice! It makes it way more complicated to debug and bisect
>>>> in case a minor detail broke in the process.
>>>
>>> Not that pure LOC ##-s is the only/deep measurement, but your overall
>>> changes in the seven patch series account to:
>>>
>>>    5 files changed, 94 insertions(+), 3 deletions(-)
>>>
>>> and by no mean this is jumbo or inherently broken and bad coded, so
>>> please slow down please, I looked with care on the resulted patch and
>>> said it's basically ok.
>>
>> Squashing patches for the sake of squashing patches is inherently
>> broken and bad. So please calm down and stop this mangling of other
>> peoples' patches.
>>
>> If you want an alternative, put up a proposal and look at it for
>> comparison somewhere.
>
> Hey Jes,
>
> It is not just about squashing patches, I am working on a series of
> patches to allow compiling out eswitch/eswitch_offloads/en_rep.c/en_tc
> altogether, it will come out cleaner as it will remove all ethernet
> sriov/eswitch VF representors and eswitch tc offloads stuff with one
> kconfig flag, and yet preserve standard QoS functionality from en_tc.

Saeed,

I realize it is not just about squashing patches, however doing that to
someone else's patches is just broken. The Linux kernel way is to build
on top of patches, if they are valid, rather than throwing them all away
and doing it from scratch again bottom up. If there was something
actually wrong with my patches, and I would love to understand if that
is the case, since I don't know 1/100th of the hardware details that you
know, then please share those details.

> BTW today you can just remove eswitch from driver and non sriov
> configuration will perfectly work with no issues.
> Even multi PF configuration will also work, but without l2 mac table,
> which means PFs can only see packets with their own static (permanent)
> mac addresses, user configured macs will not work on Multi PF
> configuration.

It sounds like this shakes up things a little and we will have things
moved to where they actually belong in the hierarchy so that will be a
good thing in the end :)

> For that i will take the l2 table (ConnectX PF mac table) logic out of
> eswitch as it is not really an eswitch logic, and move it to core
> driver to allow Multi PF configuration to work without eswitch.

Sounds good.

> I will post some patches for you to review by end of week.

Could we please start seeing this stuff happen in a public git tree so
it is possible to follow and contribute to the development? It is very
frustrating having to wait for things to appear and not knowing
whether a patch is integrated or needs to be revised when you have
things building on top of it.

Jes


Re: [PATCH v4] net: don't call strlen on non-terminated string in dev_set_alias()

2017-06-06 Thread Alexander Potapenko
On Tue, Jun 6, 2017 at 10:36 PM, David Miller  wrote:
> From: Alexander Potapenko 
> Date: Tue,  6 Jun 2017 15:56:54 +0200
>
>> KMSAN reported a use of uninitialized memory in dev_set_alias(),
>> which was caused by calling strlcpy() (which in turn called strlen())
>> on the user-supplied non-terminated string.
>>
>> Signed-off-by: Alexander Potapenko 
>
> We should not be allowing non-NULL terminated strings for the
> IFLA_IFALIAS attribute.  It's defined as type NLA_STRING in
> the ifla_policy[] array.

Sorry, I couldn't determine from RFC 2233 whether ifAlias is
zero-terminated or not, but looking at validate_nla() I see that
NLA_STRING is supposed to be such.
I'll check what's going on.

> Please figure out why we aren't enforcing the attribute policy
> properly, rather than adding a workaround.

Guess the string has been previously claimed to be non-terminated
here: https://patchwork.ozlabs.org/patch/996/

> Thanks.



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-06 Thread Ben Greear

Hello,

This bug has been around forever, and we recently got an intern and stuck
him with trying to reproduce it on the latest kernel.  It is still here.
I'm not super excited about trying to fix this, but we can easily test
patches if someone has a patch to try.

Test case is to create 1000 mac-vlans and bring them up, with user-space
tools running lots of 'dump' related commands as part of bringing up the
interfaces and configuring some special source-based routing tables.

(gdb) l *(inet6_dump_fib+0x109)
0x192f9 is in inet6_dump_fib (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:392).
387         } else
388                 w->skip = 0;
389
390         read_lock_bh(&table->tb6_lock);
391         res = fib6_walk_continue(w);
392         read_unlock_bh(&table->tb6_lock);
393         if (res <= 0) {
394                 fib6_walker_unlink(net, w);
395                 cb->args[4] = 0;
396         }

(gdb) l *(fib6_walk_continue+0x76)
0x188c6 is in fib6_walk_continue (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
1588                if (fn == w->root)
1589                        return 0;
1590                pn = fn->parent;
1591                w->node = pn;
1592 #ifdef CONFIG_IPV6_SUBTREES
1593                if (FIB6_SUBTREE(pn) == fn) {
1594                        WARN_ON(!(fn->fn_flags & RTN_ROOT));
1595                        w->state = FWS_L;
1596                        continue;
1597                }

[root@ct524-ffb0 ~]# BUG: unable to handle kernel NULL pointer dereference at 0018
IP: fib6_walk_continue+0x76/0x180 [ipv6]
PGD 3d9226067
P4D 3d9226067
PUD 3d9020067
PMD 0

Oops:  [#1] PREEMPT SMP
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c bnep fuse macvlan pktgen cfg80211 ipmi_ssif iTCO_wdt iTCO_vendor_support 
coretemp intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass joydev i2c_i801 ie31200_edac intel_pch_thermal shpchp hci_uart ipmi_si btbcm 
btqca ipmi_devintf btintel ipmi_msghandler bluetooth pinctrl_sunrisepoint acpi_als pinctrl_intel video tpm_tis intel_lpss_acpi kfifo_buf tpm_tis_core intel_lpss 
industrialio tpm acpi_pad acpi_power_meter sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ast drm_kms_helper ttm drm igb hwmon ptp pps_core dca 
i2c_algo_bit i2c_hid i2c_core ipv6 crc_ccitt [last unloaded: nf_conntrack]

CPU: 1 PID: 996 Comm: ip Not tainted 4.12.0-rc4+ #32
Hardware name: Supermicro Super Server/X11SSM-F, BIOS 1.0b 12/29/2015
task: 8803d4d61dc0 task.stack: c9000970c000
RIP: 0010:fib6_walk_continue+0x76/0x180 [ipv6]
RSP: 0018:c9000970fbb8 EFLAGS: 00010283
RAX: 8803de84b020 RBX: 8803e0756f00 RCX: 
RDX:  RSI: c9000970fc00 RDI: 81eee280
RBP: c9000970fbc0 R08: 0008 R09: 8803d4fbbf31
R10: c9000970fb68 R11:  R12: 0001
R13: 0001 R14: 8803e0756f00 R15: 8803d9345b18
FS:  7f32ca4ec700() GS:88047784() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0018 CR3: 0003ddacc000 CR4: 003406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 inet6_dump_fib+0x109/0x290 [ipv6]
 netlink_dump+0x11d/0x290
 netlink_recvmsg+0x260/0x3f0
 sock_recvmsg+0x38/0x40
 ___sys_recvmsg+0xe9/0x230
 ? alloc_pages_vma+0x9d/0x260
 ? page_add_new_anon_rmap+0x88/0xc0
 ? lru_cache_add_active_or_unevictable+0x31/0xb0
 ? __handle_mm_fault+0xce3/0xf70
 __sys_recvmsg+0x3d/0x70
 ? __sys_recvmsg+0x3d/0x70
 SyS_recvmsg+0xd/0x20
 do_syscall_64+0x56/0xc0
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f32c9e21050
RSP: 002b:7fff96401de8 EFLAGS: 0246 ORIG_RAX: 002f
RAX: ffda RBX:  RCX: 7f32c9e21050
RDX:  RSI: 7fff96401e50 RDI: 0004
RBP: 7fff96405e74 R08: 3fe4 R09: 
R10: 7fff96401e90 R11: 0246 R12: 0064f3a0
R13: 7fff96405ee0 R14: 3fe4 R15: 
Code: f6 40 2a 04 74 11 8b 53 30 85 d2 0f 84 02 01 00 00 83 ea 01 89 53 30 c7 43 28 04 00 00 00 48 39 43 10 74 33 48 8b 10 48 89 53 18 <48> 39 42 18 0f 84 a3 00 00 00 48 39 42 08 0f 84 ae 00 00 00 48

RIP: fib6_walk_continue+0x76/0x180 [ipv6] RSP: c9000970fbb8
CR2: 0018
---[ end trace 5ebbc4ee97bea64e ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Rebooting in 10 seconds..
ACPI MEMORY or I/O RESET_REG.


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



[PATCH net-next 1/5] net: dsa: mv88e6xxx: define membership on VLAN add

2017-06-06 Thread Vivien Didelot
Define the target port membership of the VLAN entry in
mv88e6xxx_port_vlan_add where ds is scoped.

Allow the DSA core to later call the port_vlan_add operation for CPU or
DSA ports by using the Unmodified membership for these ports, as in the
current behavior.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 117f275e3fb6..93078bbe3cb5 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1274,7 +1274,7 @@ mv88e6xxx_port_vlan_prepare(struct dsa_switch *ds, int port,
 }
 
 static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port,
-   u16 vid, bool untagged)
+   u16 vid, u8 member)
 {
struct mv88e6xxx_vtu_entry vlan;
int err;
@@ -1283,9 +1283,7 @@ static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port,
if (err)
return err;
 
-   vlan.member[port] = untagged ?
-   GLOBAL_VTU_DATA_MEMBER_TAG_UNTAGGED :
-   GLOBAL_VTU_DATA_MEMBER_TAG_TAGGED;
+   vlan.member[port] = member;
 
return mv88e6xxx_vtu_loadpurge(chip, &vlan);
 }
@@ -1297,15 +1295,23 @@ static void mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port,
struct mv88e6xxx_chip *chip = ds->priv;
bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
bool pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID;
+   u8 member;
u16 vid;
 
if (!chip->info->max_vid)
return;
 
+   if (dsa_is_dsa_port(ds, port) || dsa_is_cpu_port(ds, port))
+   member = GLOBAL_VTU_DATA_MEMBER_TAG_UNMODIFIED;
+   else if (untagged)
+   member = GLOBAL_VTU_DATA_MEMBER_TAG_UNTAGGED;
+   else
+   member = GLOBAL_VTU_DATA_MEMBER_TAG_TAGGED;
+
mutex_lock(&chip->reg_lock);
 
for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid)
-   if (_mv88e6xxx_port_vlan_add(chip, port, vid, untagged))
+   if (_mv88e6xxx_port_vlan_add(chip, port, vid, member))
netdev_err(ds->ports[port].netdev,
   "failed to add VLAN %d%c\n",
   vid, untagged ? 'u' : 't');
-- 
2.13.0
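
The membership decision introduced by this patch can be summarised in a small user-space sketch. The enum names and `toy_vlan_member()` are hypothetical stand-ins for the GLOBAL_VTU_DATA_MEMBER_TAG_* values and the dsa_is_*_port() helpers; only the decision order is taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the VTU member tags and the port roles. */
enum toy_member { TOY_UNMODIFIED, TOY_UNTAGGED, TOY_TAGGED };
enum toy_port_kind { TOY_USER, TOY_CPU, TOY_DSA };

/* CPU and DSA ports pass frames unmodified; user ports follow the flag. */
static enum toy_member toy_vlan_member(enum toy_port_kind kind, bool untagged)
{
	if (kind == TOY_CPU || kind == TOY_DSA)
		return TOY_UNMODIFIED;

	return untagged ? TOY_UNTAGGED : TOY_TAGGED;
}
```

Note that the untagged flag is ignored for CPU and DSA ports in this sketch, matching the order of the checks in the patch.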



[PATCH net-next 0/5] net: dsa: add cross-chip VLAN support

2017-06-06 Thread Vivien Didelot
The current code in DSA does not support cross-chip VLAN. This means
that in a multi-chip environment such as this one (similar to ZII Rev B)

     [CPU].................... (mdio)
(eth0) |   :       :          :
      _|_____    _______    _______
     [__sw0__]--[__sw1__]--[__sw2__]
      |  |  |    |  |  |    |  |  |
      v  v  v    v  v  v    v  v  v
      p1 p2 p3   p4 p5 p6   p7 p8 p9

adding a VLAN to p9 won't be enough to reach the CPU, until at least one
port of sw0 and sw1 join the VLAN as well and become aware of the VID.

This patchset makes the DSA core program the VLAN on the CPU and DSA
links itself, which brings seamlessly cross-chip VLAN support to DSA.

With this series applied*, the hardware VLAN tables of a 3-switch setup
look like this after adding a VLAN to only one port of the end switch:

# cat /sys/class/net/br0/bridge/default_pvid 
42
# cat /sys/kernel/debug/mv88e6xxx/sw{0,1,2}/vtu
# ip link set up master br0 dev lan6
# cat /sys/kernel/debug/mv88e6xxx/sw{0,1,2}/vtu
 VID  FID  SID  0  1  2  3  4  5  6
  42    1    0  x  x  x  x  x  =  =
 VID  FID  SID  0  1  2  3  4  5  6
  42    1    0  x  x  x  x  x  =  =
 VID  FID  SID  0  1  2  3  4  5  6  7  8  9
  42    1    0  u  x  x  x  x  x  x  x  x  =

('x' is excluded, 'u' is untagged, '=' is unmodified DSA and CPU ports.)

Completely removing a VLAN entry (which is currently the responsibility
of drivers anyway) is not supported yet since it requires some caching.

(*) the output is shown from this out-of-tree debugfs patch:
https://github.com/vivien/linux/commit/7b61a684b9d6b6a499135a587c7f62a1fddceb8b.patch

Vivien Didelot (5):
  net: dsa: mv88e6xxx: define membership on VLAN add
  net: dsa: check VLAN capability of every switch
  net: dsa: add CPU and DSA ports as VLAN members
  net: dsa: mv88e6xxx: exclude all ports in new VLAN
  net: dsa: mv88e6xxx: do not purge a VTU entry

 drivers/net/dsa/mv88e6xxx/chip.c | 38 +++---
 net/dsa/switch.c | 30 --
 2 files changed, 35 insertions(+), 33 deletions(-)

-- 
2.13.0



[PATCH net-next 3/5] net: dsa: add CPU and DSA ports as VLAN members

2017-06-06 Thread Vivien Didelot
In a multi-chip switch fabric, it is currently the responsibility of the
driver to add the CPU or DSA (interconnecting chips together) ports as
members of a new VLAN entry. This makes the drivers more complicated.

We want the DSA drivers to be stupid and the DSA core to be the one
responsible for caring about the abstracted switch logic and topology.

Make the DSA core program the CPU and DSA ports as part of the VLAN.

This makes all chips in the data path aware of VIDs spanning the whole
fabric and thus seamlessly adds support for cross-chip VLAN.

Signed-off-by: Vivien Didelot 
---
 net/dsa/switch.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index f235ae1e9777..f913cdfe6585 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -166,6 +166,9 @@ static int dsa_switch_vlan_add(struct dsa_switch *ds,
bitmap_zero(members, ds->num_ports);
if (ds->index == info->sw_index)
set_bit(info->port, members);
+   for (port = 0; port < ds->num_ports; ++port)
+   if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
+   set_bit(port, members);
 
if (switchdev_trans_ph_prepare(trans)) {
if (!ds->ops->port_vlan_prepare || !ds->ops->port_vlan_add)
-- 
2.13.0
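
A user-space sketch of the member mask built above, under stated assumptions: a 32-bit integer replaces the kernel bitmap, and `toy_vlan_members()` with its `target_port` convention (-1 when the request targets another switch) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum toy_port_kind { TOY_USER, TOY_CPU, TOY_DSA };

/*
 * Build the VLAN member mask for one switch. 'target_port' is the user
 * port being added, or -1 when the request targets another switch.
 */
static uint32_t toy_vlan_members(const enum toy_port_kind *kinds,
				 int num_ports, int target_port)
{
	uint32_t members = 0;
	int port;

	assert(num_ports > 0 && num_ports <= 32);

	if (target_port >= 0)
		members |= UINT32_C(1) << target_port;

	/* CPU and DSA links always join, so transit chips learn the VID. */
	for (port = 0; port < num_ports; port++)
		if (kinds[port] == TOY_CPU || kinds[port] == TOY_DSA)
			members |= UINT32_C(1) << port;

	return members;
}
```

For a transit switch with DSA links on ports 5 and 6, the mask is non-empty even when no local user port is being added.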



[PATCH net-next 4/5] net: dsa: mv88e6xxx: exclude all ports in new VLAN

2017-06-06 Thread Vivien Didelot
Now that the DSA core adds the CPU and DSA ports itself to the new VLAN
entry, there is no need to include them as members of this VLAN when
initializing a new VTU entry.

As of now, initialize a new VTU entry with all ports excluded.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 93078bbe3cb5..522f023bb17e 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1159,11 +1159,10 @@ static int mv88e6xxx_vtu_get(struct mv88e6xxx_chip *chip, u16 vid,
entry->valid = true;
entry->vid = vid;
 
-   /* Include only CPU and DSA ports */
+   /* Exclude all ports */
for (i = 0; i < mv88e6xxx_num_ports(chip); ++i)
-   entry->member[i] = dsa_is_normal_port(chip->ds, i) ?
-   GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER :
-   GLOBAL_VTU_DATA_MEMBER_TAG_UNMODIFIED;
+   entry->member[i] =
+   GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER;
 
return mv88e6xxx_atu_new(chip, &entry->fid);
}
-- 
2.13.0



[PATCH net-next 2/5] net: dsa: check VLAN capability of every switch

2017-06-06 Thread Vivien Didelot
Now that the VLAN object is propagated to every switch chip of the
switch fabric, we can easily ensure that they all support the required
VLAN operations before modifying an entry on a single switch.

To achieve that, remove the condition skipping other target switches,
and add a bitmap of VLAN members, eventually containing the target port,
if we are programming the switch target.

This will allow us to easily add other VLAN members, such as the DSA or
CPU ports (to introduce cross-chip VLAN support) or the other port
members if we want to reduce hardware accesses later.

Signed-off-by: Vivien Didelot 
---
 net/dsa/switch.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index d8e5c311ee7c..f235ae1e9777 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -159,19 +159,27 @@ static int dsa_switch_vlan_add(struct dsa_switch *ds,
 {
const struct switchdev_obj_port_vlan *vlan = info->vlan;
struct switchdev_trans *trans = info->trans;
+   DECLARE_BITMAP(members, ds->num_ports);
+   int port, err;
 
-   /* Do not care yet about other switch chips of the fabric */
-   if (ds->index != info->sw_index)
-   return 0;
+   /* Build a mask of VLAN members */
+   bitmap_zero(members, ds->num_ports);
+   if (ds->index == info->sw_index)
+   set_bit(info->port, members);
 
if (switchdev_trans_ph_prepare(trans)) {
if (!ds->ops->port_vlan_prepare || !ds->ops->port_vlan_add)
return -EOPNOTSUPP;
 
-   return ds->ops->port_vlan_prepare(ds, info->port, vlan, trans);
+   for_each_set_bit(port, members, ds->num_ports) {
+   err = ds->ops->port_vlan_prepare(ds, port, vlan, trans);
+   if (err)
+   return err;
+   }
}
 
-   ds->ops->port_vlan_add(ds, info->port, vlan, trans);
+   for_each_set_bit(port, members, ds->num_ports)
+   ds->ops->port_vlan_add(ds, port, vlan, trans);
 
return 0;
 }
@@ -181,14 +189,13 @@ static int dsa_switch_vlan_del(struct dsa_switch *ds,
 {
const struct switchdev_obj_port_vlan *vlan = info->vlan;
 
-   /* Do not care yet about other switch chips of the fabric */
-   if (ds->index != info->sw_index)
-   return 0;
-
if (!ds->ops->port_vlan_del)
return -EOPNOTSUPP;
 
-   return ds->ops->port_vlan_del(ds, info->port, vlan);
+   if (ds->index == info->sw_index)
+   return ds->ops->port_vlan_del(ds, info->port, vlan);
+
+   return 0;
 }
 
 static int dsa_switch_event(struct notifier_block *nb,
-- 
2.13.0
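
For readers following along outside the kernel tree, here is a minimal userspace sketch of the member-mask idea in the patch above. It is an illustration under stated assumptions, not the DSA code: a plain unsigned long stands in for DECLARE_BITMAP()/for_each_set_bit(), and struct vlan_info, build_members(), apply_to_members() and mark_visited() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

#define MAX_PORTS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for the notifier info delivered to every switch. */
struct vlan_info {
	int sw_index;	/* switch that owns the target port */
	int port;	/* target port on that switch */
};

/* Build the member mask: only the target port, and only on its own switch. */
static unsigned long build_members(int my_index, const struct vlan_info *info)
{
	unsigned long members = 0;

	assert(info->port >= 0 && info->port < MAX_PORTS);
	if (my_index == info->sw_index)
		members |= 1UL << info->port;
	return members;
}

/* Call op() for every set bit and stop at the first error. */
static int apply_to_members(unsigned long members, int num_ports,
			    int (*op)(int port))
{
	int port, err;

	for (port = 0; port < num_ports && port < MAX_PORTS; port++) {
		if (!(members & (1UL << port)))
			continue;
		err = op(port);
		if (err)
			return err;
	}
	return 0;
}

/* Example op: remember which ports were visited. */
static unsigned long visited;

static int mark_visited(int port)
{
	visited |= 1UL << port;
	return 0;
}
```

With an empty mask the loop simply does nothing, which is why a switch that does not own the port can share the same code path and still return 0.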



[PATCH net-next 5/5] net: dsa: mv88e6xxx: do not purge a VTU entry

2017-06-06 Thread Vivien Didelot
The mv88e6xxx driver currently tries to be smart and remove by itself a
VLAN entry from the VTU when the driven switch sees no user ports as
members of the VLAN.

This is bad in a multi-chip switch fabric, since a chip in between
others may have no bridge port members, but still needs to be aware of
the VID in order to correctly pass frames in the data path.

Remove the code purging a VTU entry when updating a port membership.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 522f023bb17e..64c0f88f9e79 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1325,9 +1325,8 @@ static void mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port,
 static int _mv88e6xxx_port_vlan_del(struct mv88e6xxx_chip *chip,
int port, u16 vid)
 {
-   struct dsa_switch *ds = chip->ds;
struct mv88e6xxx_vtu_entry vlan;
-   int i, err;
+   int err;
 
err = mv88e6xxx_vtu_get(chip, vid, &vlan, false);
if (err)
@@ -1339,18 +1338,6 @@ static int _mv88e6xxx_port_vlan_del(struct mv88e6xxx_chip *chip,
 
vlan.member[port] = GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER;
 
-   /* keep the VLAN unless all ports are excluded */
-   vlan.valid = false;
-   for (i = 0; i < mv88e6xxx_num_ports(chip); ++i) {
-   if (dsa_is_cpu_port(ds, i) || dsa_is_dsa_port(ds, i))
-   continue;
-
-   if (vlan.member[i] != GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER) {
-   vlan.valid = true;
-   break;
-   }
-   }
-
err = mv88e6xxx_vtu_loadpurge(chip, &vlan);
if (err)
return err;
-- 
2.13.0



Re: [PATCH net-next v2 3/3] udp: try to avoid 2 cache miss on dequeue

2017-06-06 Thread Eric Dumazet
On Tue, 2017-06-06 at 16:23 +0200, Paolo Abeni wrote:
> when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> fields are on cold cachelines.
> If the skb are linear and the kernel don't need to compute the udp
> csum, only a handful of skb fields are required by udp_recvmsg().
> Since we already use skb->dev_scratch to cache hot data, and
> there are 32 bits unused on 64 bit archs, use such field to cache
> as much data as we can, and try to prefetch on dequeue the relevant
> fields that are left out.

Acked-by: Eric Dumazet 





[GIT] Networking

2017-06-06 Thread David Miller

1) Made TCP congestion control documentation match current reality,
   from Anmol Sarma.

2) Various build warning and failure fixes from Arnd Bergmann.

3) Fix SKB list leak in ipv6_gso_segment().

4) Use after free in ravb driver, from Eugeniu Rosca.

5) Don't use udp_poll() in ping protocol driver, from Eric Dumazet.

6) Don't crash in PCI error recovery of cxgb4 driver, from Guilherme
   G. Piccoli.

7) _SRC_NAT_DONE_BIT needs to be cleared using atomics, from Liping Zhang.

8) Use after free in vxlan deletion, from Mark Bloch.

9) Fix ordering of NAPI poll enabled in ethoc driver, from Max Filippov.

10) Fix stmmac hangs with TSO, from Niklas Cassel.

11) Fix crash in CALIPSO ipv6, from Richard Haines.

12) Clear nh_flags properly on mpls link up.  From Roopa Prabhu.

13) Fix regression in sk_err socket error queue handling, noticed by
    ping applications.  From Soheil Hassas Yeganeh.

14) Update mlx4/mlx5 MAINTAINERS information.

Please pull, thanks a lot!

The following changes since commit e2a9aa5ab2a4d1fb05fcdfa9661d54e437093297:

  Merge tag 'led_fixes_for_4-12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds (2017-05-26 14:02:30 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 1d3028f4c16487d63861ab6c68451768a7a109df:

  net: stmmac: fix a broken u32 less than zero check (2017-06-06 16:26:28 -0400)


Andrew Lunn (2):
  net: dsa: mv88e6xxx: Add eeprom-length to binding
  net: dsa: mv88e6xxx: Add missing static to stub functions

Anmol Sarma (1):
  net: Update TCP congestion control documentation

Arend Van Spriel (1):
  brcmfmac: fix alignment configuration on host using 64-bit DMA

Arnd Bergmann (2):
  net: dsa: mv88e6xxx: Add missing static to stub functions
  net/mlx5: avoid build warning for uniprocessor

Ben Hutchings (1):
  ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()

Bjorn Andersson (1):
  wcn36xx: Close SMD channel on device removal

Björn Töpel (1):
  i40e/i40evf: proper update of the page_offset field

Chopra, Manish (1):
  qlcnic: Fix tunnel offload for 82xx adapters

Colin Ian King (2):
  net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
  net: stmmac: fix a broken u32 less than zero check

David S. Miller (8):
  Merge git://git.kernel.org/.../pablo/nf
  Merge branch 'mlx4-mlx5-MAINTAINERS-update'
  Merge branch 'ARM-imx6ul-14x14-evk-Fix-suspend-over-nfs-by-phy'
  Merge tag 'mac80211-for-davem-2017-06-02' of git://git.kernel.org/.../jberg/mac80211
  ipv6: Fix leak in ipv6_gso_segment().
  Revert "sit: reload iphdr in ipip6_rcv"
  Merge branch '40GbE' of git://git.kernel.org/.../jkirsher/net-queue
  Merge tag 'wireless-drivers-for-davem-2017-06-06' of git://git.kernel.org/.../kvalo/wireless-drivers

Davide Caratti (1):
  netfilter: conntrack: fix false CRC32c mismatch using paged skb

Douglas Caetano dos Santos (1):
  tcp: reinitialize MTU probing when setting MSS in a TCP repair

Emmanuel Grumbach (1):
  iwlwifi: mvm: fix firmware debug restart recording

Eric Dumazet (1):
  net: ping: do not abuse udp_poll()

Eric Garver (1):
  geneve: fix needed_headroom and max_mtu for collect_metadata

Eugeniu Rosca (1):
  ravb: Fix use-after-free on `ifconfig eth0 down`

Firo Yang (1):
  hdlcdrv: Fix divide by zero in hdlcdrv_ioctl

Florian Fainelli (3):
  net: systemport: Fix missing Wake-on-LAN interrupt for SYSTEMPORT Lite
  net: dsa: Move dsa_switch_{suspend,resume} out of legacy.c
  net: dsa: Fix stale cpu_switch reference after unbind then bind

Ganesh Goudar (2):
  cxgb4: update latest firmware version supported
  cxgb4: avoid enabling napi twice to the same queue

Gregory Greenman (1):
  iwlwifi: mvm: rs: start using LQ command color

Guilherme G. Piccoli (1):
  cxgb4: avoid crash on PCI error recovery path

Gustavo A. R. Silva (1):
  net: freescale: fix potential null pointer dereference

Haim Dreyfuss (1):
  iwlwifi: mvm: Fix command queue number on d0i3 flow

Haishuang Yan (2):
  sit: reload iphdr in ipip6_rcv
  devlink: fix potential memort leak

Ido Shamay (1):
  net/mlx4: Check if Granular QoS per VF has been enabled before updating QP qos_vport

Jia-Ju Bai (3):
  isdn: Fix a sleep-in-atomic bug
  qlcnic: Fix a sleep-in-atomic bug in qlcnic_82xx_hw_write_wx_2M and qlcnic_82xx_hw_read_wx_2M
  mISDN: Fix a sleep-in-atomic bug

Johannes Berg (4):
  mac80211: fix TX aggregation start/stop callback race
  mac80211: fix dropped counter in multiqueue RX
  iwlwifi: tt: move ucode_loaded check under mutex
  iwlwifi: mvm: clear new beacon command template struct

Kalle Valo (1):
  Merge tag 'iwlwifi-for-kalle-2017-06-05' of 

Re: [PATCH net-next v2 2/3] udp: avoid a cache miss on dequeue

2017-06-06 Thread Eric Dumazet
On Tue, 2017-06-06 at 16:23 +0200, Paolo Abeni wrote:
> Since UDP no more uses sk->destructor, we can clear completely
> the skb head state before enqueuing. Amend and use
> skb_release_head_state() for that.
> 
> All head states share a single cacheline, which is not
> normally used/accesses on dequeue. We can avoid entirely accessing
> such cacheline implementing and using in the UDP code a specialized
> skb free helper which ignores the skb head state.
> 
> This saves a cacheline miss at skb deallocation time.

Acked-by: Eric Dumazet 




Re: [PATCH v2 1/1] e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails

2017-06-06 Thread Jeff Kirsher
On Fri, 2017-06-02 at 14:14 -0400, David Miller wrote:
> From: Jani Nikula 
> Date: Wed, 31 May 2017 18:50:43 +0300
> 
> > From: Chris Wilson 
> > 
> > An error during suspend (e100e_pm_suspend),
> 
>  ...
> > lead to complete failure:
> 
>  ...
> > The unwind failures stems from commit 2800209994f8 ("e1000e:
> > Refactor PM
> > flows"), but it may be a later patch that introduced the non-
> > recoverable
> > behaviour.
> > 
> > Fixes: 2800209994f8 ("e1000e: Refactor PM flows")
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99847
> > Cc: Tvrtko Ursulin 
> > Cc: Jeff Kirsher 
> > Cc: Dave Ertman 
> > Cc: Bruce Allan 
> > Cc: intel-wired-...@lists.osuosl.org
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Chris Wilson 
> > [Jani: bikeshed repainted]
> > Signed-off-by: Jani Nikula 
> 
> Jeff, please make sure this gets submitted to me soon.

Expect it later tonight, just finishing up testing.



Re: [PATCH net-next v2 1/2] bpf: Allow CGROUP_SKB eBPF program to access sk_buff

2017-06-06 Thread David Miller
From: Daniel Borkmann 
Date: Tue, 06 Jun 2017 22:27:15 +0200

> On 06/06/2017 10:26 PM, David Miller wrote:
>> From: Chenbo Feng 
>> Date: Tue, 6 Jun 2017 13:24:11 -0700
>>
>>> On Tue, Jun 6, 2017 at 9:40 AM, Daniel Borkmann 
>>> wrote:
>>>
>>>> On 06/06/2017 02:04 PM, Daniel Borkmann wrote:
>>>>
>>>>> On 06/01/2017 03:15 AM, Chenbo Feng wrote:
>>>>>
>>>>>> From: Chenbo Feng 
>>>>>>
>>>>>> This allows cgroup eBPF program to classify packet based on their
>>>>>> protocol or other detail information. Currently program need
>>>>>> CAP_NET_ADMIN privilege to attach a cgroup eBPF program, and A
>>>>>> process with CAP_NET_ADMIN can already see all packets on the system,
>>>>>> for example, by creating an iptables rules that causes the packet to
>>>>>> be passed to userspace via NFLOG.
>>>>>>
>>>>>> Signed-off-by: Chenbo Feng 
>>>>>>
>>>>>
>>>>> Sorry, but I am puzzled what above change log has to do with the
>>>>> below diff?! Back then we decided not to add BPF_PROG_TYPE_CGROUP_SKB
>>>>> to may_access_skb(), since one can already use bpf_skb_load_bytes()
>>>>> helper to access pkt data, which is a much more flexible interface.
>>>>> Mind to elaborate why you cannot use bpf_skb_load_bytes() instead?
>>>>>
>>>>
>>>> See my other email [1], this one is also problematic wrt SKF_LL_OFF.
>>>>
>>>> [1] http://patchwork.ozlabs.org/patch/771946/
>>>
>>>
>>> Oh sorry I just find out the bpf_skb_load_bytes helper already can
>>> achieve
>>> the goal. There is no point to add my patch then. Thanks you for
>>> pointing
>>> it out and fixing it.
>>
>> If something now needs to be reverted, you need to send that revert to
>> me.
> 
> It's sitting here: http://patchwork.ozlabs.org/patch/771946/

I see that now, applied to net-next, thanks!


Re: [PATCH net-next] bpf: cgroup skb progs cannot access ld_abs/ind

2017-06-06 Thread David Miller
From: Daniel Borkmann 
Date: Tue,  6 Jun 2017 18:38:04 +0200

> Commit fb9a307d11d6 ("bpf: Allow CGROUP_SKB eBPF program to
> access sk_buff") enabled programs of BPF_PROG_TYPE_CGROUP_SKB
> type to use ld_abs/ind instructions. However, at this point,
> we cannot use them, since offsets relative to SKF_LL_OFF will
> end up pointing skb_mac_header(skb) out of bounds since in the
> egress path it is not yet set at that point in time, but only
> after __dev_queue_xmit() did a general reset on the mac header.
> bpf_internal_load_pointer_neg_helper() will then end up reading
> data from a wrong offset.
> 
> BPF_PROG_TYPE_CGROUP_SKB programs can use bpf_skb_load_bytes()
> already to access packet data, which is also more flexible than
> the insns carried over from cBPF.
> 
> Fixes: fb9a307d11d6 ("bpf: Allow CGROUP_SKB eBPF program to access sk_buff")
> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 

Aha, I see, applied.

Thanks!


Re: [PATCH v4] net: don't call strlen on non-terminated string in dev_set_alias()

2017-06-06 Thread David Miller
From: Alexander Potapenko 
Date: Tue,  6 Jun 2017 15:56:54 +0200

> KMSAN reported a use of uninitialized memory in dev_set_alias(),
> which was caused by calling strlcpy() (which in turn called strlen())
> on the user-supplied non-terminated string.
> 
> Signed-off-by: Alexander Potapenko 

We should not be allowing non-NULL terminated strings for the
IFLA_IFALIAS attribute.  It's defined as type NLA_STRING in
the ifla_policy[] array.

Please figure out why we aren't enforcing the attribute policy
properly, rather than adding a workaround.

Thanks.
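
Independent of where the check ultimately belongs, the hazard itself is easy to show in userspace. The sketch below is only an illustration (copy_bounded() is a hypothetical helper, not a kernel or netlink API): it copies from a buffer that may lack a terminating NUL by trusting the known payload length rather than calling strlen() on the source.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy at most len bytes from src, which may not be NUL-terminated, into
 * dst (capacity dst_size) and always terminate the result.
 * Returns the number of bytes copied, not counting the terminator.
 */
static size_t copy_bounded(char *dst, size_t dst_size,
			   const char *src, size_t len)
{
	const char *nul;
	size_t n;

	assert(dst_size > 0);
	/* memchr() never reads past src[len - 1], unlike strlen(). */
	nul = memchr(src, '\0', len);
	n = nul ? (size_t)(nul - src) : len;
	if (n >= dst_size)
		n = dst_size - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```

The caller supplies the payload length, so a four-byte attribute holding "eth0" with no terminator is copied without touching the byte after it.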


Re: [PATCH] net: stmmac: fix a broken u32 less than zero check

2017-06-06 Thread David Miller
From: Colin King 
Date: Tue,  6 Jun 2017 14:10:49 +0100

> From: Colin Ian King 
> 
> The check that queue is less or equal to zero is always true
> because queue is a u32; queue is decremented and will wrap around
> and never go -ve. Fix this by making queue an int.
> 
> Detected by CoverityScan, CID#1428988 ("Unsigned compared against 0")
> 
> Signed-off-by: Colin Ian King 

Applied, thanks Colin.
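
The wraparound described in the commit message can be reproduced in userspace. This sketch assumes a countdown loop of the form `while (--queue >= 0)`, which may not match the driver code exactly; unwind_signed() and decrement_u32() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Count how many slots an unwind loop visits when the index is signed. */
static int unwind_signed(int queue)
{
	int visited = 0;

	assert(queue >= 0);
	while (--queue >= 0)	/* terminates once queue reaches -1 */
		visited++;
	return visited;
}

/* With an unsigned index, decrementing 0 wraps instead of going negative. */
static uint32_t decrement_u32(uint32_t queue)
{
	return queue - 1;
}
```

Because decrement_u32(0) yields UINT32_MAX rather than -1, a `>= 0` test on a u32 can never fail, which is why switching the variable to int makes the loop terminate.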


Re: [PATCH net-next v2 1/2] bpf: Allow CGROUP_SKB eBPF program to access sk_buff

2017-06-06 Thread Daniel Borkmann

On 06/06/2017 10:26 PM, David Miller wrote:
> From: Chenbo Feng 
> Date: Tue, 6 Jun 2017 13:24:11 -0700
>
>> On Tue, Jun 6, 2017 at 9:40 AM, Daniel Borkmann 
>> wrote:
>>
>>> On 06/06/2017 02:04 PM, Daniel Borkmann wrote:
>>>
>>>> On 06/01/2017 03:15 AM, Chenbo Feng wrote:
>>>>
>>>>> From: Chenbo Feng 
>>>>>
>>>>> This allows cgroup eBPF program to classify packet based on their
>>>>> protocol or other detail information. Currently program need
>>>>> CAP_NET_ADMIN privilege to attach a cgroup eBPF program, and A
>>>>> process with CAP_NET_ADMIN can already see all packets on the system,
>>>>> for example, by creating an iptables rules that causes the packet to
>>>>> be passed to userspace via NFLOG.
>>>>>
>>>>> Signed-off-by: Chenbo Feng 
>>>>
>>>> Sorry, but I am puzzled what above change log has to do with the
>>>> below diff?! Back then we decided not to add BPF_PROG_TYPE_CGROUP_SKB
>>>> to may_access_skb(), since one can already use bpf_skb_load_bytes()
>>>> helper to access pkt data, which is a much more flexible interface.
>>>> Mind to elaborate why you cannot use bpf_skb_load_bytes() instead?
>>>
>>> See my other email [1], this one is also problematic wrt SKF_LL_OFF.
>>>
>>>    [1] http://patchwork.ozlabs.org/patch/771946/
>>
>> Oh sorry I just find out the bpf_skb_load_bytes helper already can achieve
>> the goal. There is no point to add my patch then. Thanks you for pointing
>> it out and fixing it.
>
> If something now needs to be reverted, you need to send that revert to me.

It's sitting here: http://patchwork.ozlabs.org/patch/771946/


Re: [PATCH net-next v2 1/2] bpf: Allow CGROUP_SKB eBPF program to access sk_buff

2017-06-06 Thread David Miller
From: Chenbo Feng 
Date: Tue, 6 Jun 2017 13:24:11 -0700

> On Tue, Jun 6, 2017 at 9:40 AM, Daniel Borkmann 
> wrote:
> 
>> On 06/06/2017 02:04 PM, Daniel Borkmann wrote:
>>
>>> On 06/01/2017 03:15 AM, Chenbo Feng wrote:
>>>
>>>> From: Chenbo Feng 
>>>>
>>>> This allows cgroup eBPF program to classify packet based on their
>>>> protocol or other detail information. Currently program need
>>>> CAP_NET_ADMIN privilege to attach a cgroup eBPF program, and A
>>>> process with CAP_NET_ADMIN can already see all packets on the system,
>>>> for example, by creating an iptables rules that causes the packet to
>>>> be passed to userspace via NFLOG.
>>>>
>>>> Signed-off-by: Chenbo Feng 
>>>>
>>>
>>> Sorry, but I am puzzled what above change log has to do with the
>>> below diff?! Back then we decided not to add BPF_PROG_TYPE_CGROUP_SKB
>>> to may_access_skb(), since one can already use bpf_skb_load_bytes()
>>> helper to access pkt data, which is a much more flexible interface.
>>> Mind to elaborate why you cannot use bpf_skb_load_bytes() instead?
>>>
>>
>> See my other email [1], this one is also problematic wrt SKF_LL_OFF.
>>
>>   [1] http://patchwork.ozlabs.org/patch/771946/
> 
> 
> Oh sorry I just find out the bpf_skb_load_bytes helper already can achieve
> the goal. There is no point to add my patch then. Thanks you for pointing
> it out and fixing it.

If something now needs to be reverted, you need to send that revert to me.

Thanks.


Re: [PATCH net] net: stmmac: fix completely hung TX when using TSO

2017-06-06 Thread David Miller
From: Niklas Cassel 
Date: Tue, 6 Jun 2017 09:25:00 +0200

> stmmac_tso_allocator can fail to set the Last Descriptor bit
> on a descriptor that actually was the last descriptor.
> 
> This happens when the buffer of the last descriptor ends
> up having a size of exactly TSO_MAX_BUFF_SIZE.
> 
> When the IP eventually reaches the next last descriptor,
> which actually has the bit set, the DMA will hang.
> 
> When the DMA hangs, we get a tx timeout, however,
> since stmmac does not do a complete reset of the IP
> in stmmac_tx_timeout, we end up in a state with
> completely hung TX.
> 
> Signed-off-by: Niklas Cassel 

Applied and queued up for -stable, thank you.
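
The boundary case in the commit message is a classic off-by-one. The sketch below is a userspace illustration only: MAX_BUFF_SIZE and count_last_flags() are hypothetical, and the comparison shown is an assumed shape of the bug, not a copy of the stmmac code. It splits a payload into fixed-size descriptors and counts how many get the "last" flag.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BUFF_SIZE 16	/* hypothetical per-descriptor limit */

/*
 * Split total_len bytes into descriptors of at most MAX_BUFF_SIZE bytes
 * and return how many descriptors are flagged as last. When strict is
 * true the test uses '<' (the off-by-one variant); otherwise '<='.
 */
static int count_last_flags(int total_len, bool strict)
{
	int remaining = total_len;
	int last_flags = 0;

	assert(total_len > 0);
	while (remaining > 0) {
		int size = remaining >= MAX_BUFF_SIZE ? MAX_BUFF_SIZE : remaining;
		bool last = strict ? remaining < MAX_BUFF_SIZE
				   : remaining <= MAX_BUFF_SIZE;

		if (last)
			last_flags++;
		remaining -= size;
	}
	return last_flags;
}
```

With the strict comparison, a payload whose final chunk is exactly MAX_BUFF_SIZE bytes ends up with no descriptor flagged at all, which matches the situation the commit message describes.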


Re: [PATCH net-next] tun: use symmetric hash

2017-06-06 Thread David Miller
From: Jason Wang 
Date: Tue,  6 Jun 2017 14:09:49 +0800

> Tun actually expects a symmetric hash for queue selecting to work
> correctly, otherwise packets belongs to a single flow may be
> redirected to the wrong queue. So this patch switch to use
> __skb_get_hash_symmetric().
> 
> Signed-off-by: Jason Wang 

Applied.
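
To see why symmetry matters for queue selection, consider this userspace sketch. It is not __skb_get_hash_symmetric(); struct flow, mix(), flow_hash_symmetric() and pick_queue() are hypothetical, and the mixer is an arbitrary stand-in. It shows one common way to get a symmetric hash: put the two endpoints in a canonical order before hashing.

```c
#include <assert.h>
#include <stdint.h>

struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Simple integer mixer, chosen only for illustration. */
static uint32_t mix(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 0x9e3779b1u;
	return h ^ (h >> 15);
}

/* Order the endpoints canonically so both directions hash identically. */
static uint32_t flow_hash_symmetric(const struct flow *f)
{
	uint32_t a_addr = f->saddr, b_addr = f->daddr;
	uint16_t a_port = f->sport, b_port = f->dport;
	uint32_t h;

	if (a_addr > b_addr || (a_addr == b_addr && a_port > b_port)) {
		uint32_t t_addr = a_addr;
		uint16_t t_port = a_port;

		a_addr = b_addr;
		b_addr = t_addr;
		a_port = b_port;
		b_port = t_port;
	}
	h = mix(0, a_addr);
	h = mix(h, b_addr);
	return mix(h, ((uint32_t)a_port << 16) | b_port);
}

/* Map a flow to one of numqueues queues. */
static unsigned int pick_queue(const struct flow *f, unsigned int numqueues)
{
	assert(numqueues > 0);
	return flow_hash_symmetric(f) % numqueues;
}
```

Both directions of a connection then map to the same queue index, which is the property the commit message says tun relies on.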


Re: [PATCH] net: ethoc: enable NAPI before poll may be scheduled

2017-06-06 Thread David Miller
From: Max Filippov 
Date: Mon,  5 Jun 2017 18:31:16 -0700

> ethoc_reset enables device interrupts, ethoc_interrupt may schedule a
> NAPI poll before NAPI is enabled in the ethoc_open, which results in
> device being unable to send or receive anything until it's closed and
> reopened. In case the device is flooded with ingress packets it may be
> unable to recover at all.
> Move napi_enable above ethoc_reset in the ethoc_open to fix that.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Max Filippov 

Applied and queued up for -stable.


Re: [PATCH v2 3/4] net: macb: macb.c changed to macb_main.c

2017-06-06 Thread Richard Cochran
On Tue, Jun 06, 2017 at 03:00:15PM -0400, David Miller wrote:
> He's adjusting the Makefile so that it build macb_main.c into macb.o

Duh, sorry, brain shutting down...

Thanks,
Richard


Re: [PATCH 1/2] hsr: fix coding style issues

2017-06-06 Thread David Miller

Please do not mix cleanups with legitimate bug fixes.  Also, when posting
a multi-patch series, you must always provide an appropriate "[PATCH 0/N]"
header posting that describes what your series is doing at a high level,
how it is doing it, and why it is doing it that way.

For this, submit the erroneous warning removal against 'net' as a single
patch.  And then once that propagates into the 'net-next' tree you can
submit the coding style cleanups against 'net-next', thanks.


Re: [PATCH net] net: bridge: fix a null pointer dereference in br_afspec

2017-06-06 Thread David Miller
From: Nikolay Aleksandrov 
Date: Tue,  6 Jun 2017 01:26:24 +0300

> We might call br_afspec() with p == NULL which is a valid use case if
> the action is on the bridge device itself, but the bridge tunnel code
> dereferences the p pointer without checking, so check if p is null
> first.
> 
> Reported-by: Gustavo A. R. Silva 
> Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support")
> Signed-off-by: Nikolay Aleksandrov 

Applied and queued up for -stable, thanks Nikolay.


Re: [PATCH net-next] net: dsa: mv88e6xxx: fix 6085 frame mode masking

2017-06-06 Thread David Miller
From: Vivien Didelot 
Date: Mon,  5 Jun 2017 18:17:16 -0400

> The register bits used for the frame mode were masked with DSA (0x1)
> instead of the mask value (0x3) in the 6085 implementation of
> port_set_frame_mode. Fix this.
> 
> Fixes: 56995cbc3540 ("net: dsa: mv88e6xxx: Refactor CPU and DSA port setup")
> Signed-off-by: Vivien Didelot 

Applied, thanks Vivien.
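
The effect of clearing a two-bit field with a one-bit value can be checked in userspace. In this sketch the macro names, the bit position (9:8) and both helpers are assumptions made for illustration; only the 0x1 versus 0x3 distinction comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-bit field in a 16-bit port register; position assumed. */
#define FRAME_MODE_SHIFT	8
#define FRAME_MODE_MASK		(0x3u << FRAME_MODE_SHIFT)
#define FRAME_MODE_DSA		(0x1u << FRAME_MODE_SHIFT)

/* Correct read-modify-write: clear the whole field, then set the mode. */
static uint16_t set_frame_mode(uint16_t reg, uint16_t mode)
{
	assert((mode & ~FRAME_MODE_MASK) == 0);
	reg &= (uint16_t)~FRAME_MODE_MASK;
	return reg | mode;
}

/* Buggy variant: clears only the bit that equals the DSA mode value. */
static uint16_t set_frame_mode_buggy(uint16_t reg, uint16_t mode)
{
	reg &= (uint16_t)~FRAME_MODE_DSA;
	return reg | mode;
}
```

Starting from a register whose field holds 0x3, the buggy variant leaves the upper bit of the field set no matter which mode is requested.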


Re: [PATCH] ravb: Fix use-after-free on `ifconfig eth0 down`

2017-06-06 Thread David Miller
From: Eugeniu Rosca 
Date: Tue, 6 Jun 2017 00:08:10 +0200

> Commit a47b70ea86bd ("ravb: unmap descriptors when freeing rings") has
> introduced the issue seen in [1] reproduced on H3ULCB board.
> 
> Fix this by relocating the RX skb ringbuffer free operation, so that
> swiotlb page unmapping can be done first. Freeing of aligned TX buffers
> is not relevant to the issue seen in [1]. Still, reposition TX free
> calls as well, to have all kfree() operations performed consistently
> _after_ dma_unmap_*()/dma_free_*().
> 
> [1] Console screenshot with the problem reproduced:
> 
> salvator-x login: root
> root@salvator-x:~# ifconfig eth0 up
> Micrel KSZ9031 Gigabit PHY e680.ethernet-:00: \
>attached PHY driver [Micrel KSZ9031 Gigabit PHY]   \
>(mii_bus:phy_addr=e680.ethernet-:00, irq=235)
> IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> root@salvator-x:~#
> root@salvator-x:~# ifconfig eth0 down
> ==
> BUG: KASAN: use-after-free in swiotlb_tbl_unmap_single+0xc4/0x35c
...
> ==
> Disabling lock debugging due to kernel taint
> root@salvator-x:~#
> 
> Fixes: a47b70ea86bd ("ravb: unmap descriptors when freeing rings")
> Signed-off-by: Eugeniu Rosca 

Applied and queued up for -stable, thanks.


Re: [PATCH net-next 00/16] nfp: ctrl vNIC

2017-06-06 Thread Jakub Kicinski
On Tue, 6 Jun 2017 11:17:57 +0200, Jiri Pirko wrote:
> >> >What were your plans with pre-netdev config?
> >> 
> >> We need to pass come initial resource division. Generally the consensus
> >> is to have these options exposed through devlink, let the user configure
> >> them all and then to have a trigger that would cause driver
> >> re-orchestration according to the new values. The flow would look like
> >> this:
> >> 
> >> -driver loads with defaults, inits hw and instantiates netdevs
> >> -driver exposes config options via devlink
> >> -user sets up the options
> >> -user pushes the "go" trigger
> >> -upon the trigger command, devlink calls the driver re-init callback
> >> -driver shuts down the current instances, re-initializes hw,
> >>  re-instantiates the netdevs
> >> 
> >> Makes sense?  
> >
> >I like the idea of a "go"/apply/reload trigger and extending devlink.
> >Do you plan on adding a way to persist the settings?  I'm concerned NIC
> >users may want to boot into the right mode once it's set, without
> >reloads and reconfigs upon boot.  Also is there going to be a way to
> >query the pending/running config?  Sounds like we may want to expose
> >three value sets - persistent/default, running and pending/to be
> >applied.  
> 
> I don't think it is a good idea to introduce any kind of configuration
> persistency in HW. I believe that user is the master and he has all
> needed info. He can store it persistently, but it is up to him. 
> 
> So basicaly during boot, we need the devlink configuration to happen
> early on, before the netdevices get configured. udev? Not sure how
> exactly to do this. Have to ask around :)

Happy to hear that.  Now there is two of us, I'll try again with the
marketing dept :)


Re: [PATCH net 3/3] netvsc: fix RCU warning from set_multicast

2017-06-06 Thread David Miller
From: Stephen Hemminger 
Date: Mon,  5 Jun 2017 14:10:10 -0700

> + nvdev = rtnl_dereference(ndevctx->nvdev);
> + if (nvdev)
> + rdev = nvdev->extension;
> +
> + if (rdev) {
> + if (ndev->flags & IFF_PROMISC)
> + rndis_filter_set_packet_filter(rdev,
> +
> NDIS_PACKET_TYPE_PROMISCUOUS);
> + else
> + rndis_filter_set_packet_filter(rdev,
> +
> NDIS_PACKET_TYPE_BROADCAST |
> +
> NDIS_PACKET_TYPE_ALL_MULTICAST |
> +
> NDIS_PACKET_TYPE_DIRECTED);
> + }
> + }

Stephen, please at least compile test your code.

This is getting ridiculous.



Re: [PATCH v2] arm: eBPF JIT compiler

2017-06-06 Thread Shubham Bansal
Hi Russell, Alexei, David, Daniel, Kees,

Any update on this patch moving forward?
Best,
Shubham Bansal


On Wed, May 31, 2017 at 12:49 AM, Kees Cook  wrote:
> Forwarding this to net-dev and eBPF folks, who weren't on CC...
>
> -Kees
>
> On Thu, May 25, 2017 at 4:13 PM, Shubham Bansal
>  wrote:
>> The JIT compiler emits ARM 32 bit instructions. Currently, It supports
>> eBPF only. Classic BPF is supported because of the conversion by BPF
>> core.
>>
>> This patch is essentially changing the current implementation of JIT
>> compiler of Berkeley Packet Filter from classic to internal with almost
>> all instructions from eBPF ISA supported except the following
>> BPF_ALU64 | BPF_DIV | BPF_K
>> BPF_ALU64 | BPF_DIV | BPF_X
>> BPF_ALU64 | BPF_MOD | BPF_K
>> BPF_ALU64 | BPF_MOD | BPF_X
>> BPF_STX | BPF_XADD | BPF_W
>> BPF_STX | BPF_XADD | BPF_DW
>> BPF_JMP | BPF_CALL
>>
>> Implementation is using scratch space to emulate 64 bit eBPF ISA on 32 bit
>> ARM because of deficiency of general purpose registers on ARM. Currently,
>> only LITTLE ENDIAN machines are supported in this eBPF JIT Compiler.
>>
>> Tested on ARMv7 with QEMU by me (Shubham Bansal).
>> Tested on ARMv5 by Andrew Lunn (and...@lunn.ch).
>> Expected to work on ARMv6 as well, as its a part ARMv7 and part ARMv5.
>> Although, a proper testing is not done for ARMv6.
>>
>> Both of these testing are done with and without CONFIG_FRAME_POINTER
>> separately for LITTLE ENDIAN machine.
>>
>> For testing:
>>
>> 1. JIT is enabled with
>> echo 1 > /proc/sys/net/core/bpf_jit_enable
>> 2. Constant Blinding can be enabled along with JIT using
>> echo 1 > /proc/sys/net/core/bpf_jit_enable
>> echo 2 > /proc/sys/net/core/bpf_jit_harden
>>
>> See Documentation/networking/filter.txt for more information.
>>
>> Result : test_bpf: Summary: 314 PASSED, 0 FAILED, [278/306 JIT'ed]
>>
>> Signed-off-by: Shubham Bansal 
>> ---
>>  Documentation/networking/filter.txt |4 +-
>>  arch/arm/Kconfig|2 +-
>>  arch/arm/net/bpf_jit_32.c   | 2404 
>> ---
>>  arch/arm/net/bpf_jit_32.h   |  108 +-
>>  4 files changed, 1713 insertions(+), 805 deletions(-)
>>
>> diff --git a/Documentation/networking/filter.txt 
>> b/Documentation/networking/filter.txt
>> index b69b205..01165ac 100644
>> --- a/Documentation/networking/filter.txt
>> +++ b/Documentation/networking/filter.txt
>> @@ -596,8 +596,8 @@ skb pointer). All constraints and restrictions from 
>> bpf_check_classic() apply
>>  before a conversion to the new layout is being done behind the scenes!
>>
>>  Currently, the classic BPF format is being used for JITing on most 32-bit
>> -architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64 perform 
>> JIT
>> -compilation from eBPF instruction set.
>> +architectures, whereas x86-64, aarch64, arm, s390x, powerpc64, sparc64 
>> perform
>> +JIT compilation from eBPF instruction set.
>>
>>  Some core changes of the new internal format:
>>
>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>> index 8a7ab5e..13ade46 100644
>> --- a/arch/arm/Kconfig
>> +++ b/arch/arm/Kconfig
>> @@ -47,7 +47,7 @@ config ARM
>> select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
>> select HAVE_ARCH_TRACEHOOK
>> select HAVE_ARM_SMCCC if CPU_V7
>> -   select HAVE_CBPF_JIT
>> +   select HAVE_EBPF_JIT
>> select HAVE_CC_STACKPROTECTOR
>> select HAVE_CONTEXT_TRACKING
>> select HAVE_C_RECORDMCOUNT
>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>> index 93d0b6d..c7476e5 100644
>> --- a/arch/arm/net/bpf_jit_32.c
>> +++ b/arch/arm/net/bpf_jit_32.c
>> @@ -1,13 +1,15 @@
>>  /*
>> - * Just-In-Time compiler for BPF filters on 32bit ARM
>> + * Just-In-Time compiler for eBPF filters on 32bit ARM
>>   *
>>   * Copyright (c) 2011 Mircea Gherzan 
>> + * Copyright (c) 2017 Shubham Bansal 
>>   *
>>   * This program is free software; you can redistribute it and/or modify it
>>   * under the terms of the GNU General Public License as published by the
>>   * Free Software Foundation; version 2 of the License.
>>   */
>>
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -23,44 +25,91 @@
>>
>>  #include "bpf_jit_32.h"
>>
>> +int bpf_jit_enable __read_mostly;
>> +
>> +#define STACK_OFFSET(k)(k)
>> +#define TMP_REG_1  (MAX_BPF_JIT_REG + 0)   /* TEMP Register 1 */
>> +#define TMP_REG_2  (MAX_BPF_JIT_REG + 1)   /* TEMP Register 2 */
>> +#define TCALL_CNT  (MAX_BPF_JIT_REG + 2)   /* Tail Call Count */
>> +
>> +/* Flags used for JIT optimization */
>> +#define SEEN_CALL  (1 << 0)
>> +
>> +#define FLAG_IMM_OVERFLOW  (1 << 0)
>> +
>>  /*
>> - * ABI:
>> + * Map eBPF registers to ARM 32bit registers or stack scratch space.
>> + *
>> + * 1. First 

Re: [PATCH v3 net-next 0/8] Introduce bpf ID

2017-06-06 Thread David Miller
From: Martin KaFai Lau 
Date: Mon, 5 Jun 2017 12:15:45 -0700

> This patch series:
> 1) Introduce ID for both bpf_prog and bpf_map.
> 2) Add bpf commands to iterate the prog IDs and map
>IDs of the system.
> 3) Add bpf commands to get a prog/map fd from an ID
> 4) Add bpf command to get prog/map info from a fd.
>The prog/map info is a jump start in this patchset
>and it is not meant to be a complete list.  They can
>be extended in the future patches.

Looks good, series applied, thanks Martin.


Re: [PATCH net-next 0/5] net: dsa: Multi-CPU ground work

2017-06-06 Thread Florian Fainelli
On 06/06/2017 11:25 AM, Vivien Didelot wrote:
> Hi Florian,
> 
> Florian Fainelli  writes:
> 
>> This patch series prepares the ground for adding mutliple CPU port support to
> 
>multiple
>
>> DSA, and starts by removing redundant pieces of information such as
>> master_netdev which is cpu_dp->ethernet. Finally drivers are moved away from
> 
>  cpu_dp->netdev
>  
>> directly accessing ds->dst->cpu_dp and use appropriate helper functions.
>>
>> Note that if you have Device Tree blobs/platform configurations that are
>> currently listing multiple CPU ports, the proposed behavior in
>> dsa_ds_get_cpu_dp() will be to return the last bit set in ds->cpu_port_mask.
>>
>> Future plans include:
>> - making dst->cpu_dp a flexible data structure (array, list, you name it)
>> - having the ability for drivers to return a default/preferred CPU port (if
>>   necessary)
> 
> The overall patchset looks good. I have questions for future work
> though.
> 
> I am still not sure that we need CPU port references in
> dsa_switch_tree. When device tree or pdata is parsed, we have allocated
> dsa_switch and dsa_port structures. We should be able validate and
> assign all ds->ports[x].cpu_dp, before setting up the switches and
> creating the slave devices. What do you think?

True, we should be able to do that, thanks for the suggestion.

> 
> Also I see dsa_ptr becoming a pointer to the associated dsa_port, and
> dsa_port should contain the tagging ops for quick access. That is more
> rigorous with the physical representation and much easier for
> transparent multi-CPU port support.

Ultimately, I agree we should probably have dev->dsa_ptr be the actual
CPU port within the switch, and from the switch be able to go to the
collection of switches (dst). This should indeed be a bit more optimized
as there should be less traversal of structures in such a case.

Thanks!
-- 
Florian


[PATCH net-next] tcp: add TCPMemoryPressuresChrono counter

2017-06-06 Thread Eric Dumazet
From: Eric Dumazet 

DRAM supply shortage and poor memory pressure tracking in TCP
stack makes any change in SO_SNDBUF/SO_RCVBUF (or equivalent autotuning
limits) and tcp_mem[] quite hazardous.

TCPMemoryPressures SNMP counter is an indication of tcp_mem sysctl
limits being hit, but only tracking number of transitions.

If TCP stack behavior under stress was perfect :
1) It would maintain memory usage close to the limit.
2) Memory pressure state would be entered for short times.

We certainly prefer 100 events lasting 10ms compared to one event
lasting 200 seconds.

This patch adds a new SNMP counter tracking cumulative duration of
memory pressure events, given in ms units.

$ cat /proc/sys/net/ipv4/tcp_mem
3088    4117    6176
$ grep TCP /proc/net/sockstat
TCP: inuse 180 orphan 0 tw 2 alloc 234 mem 4140
$ nstat -n ; sleep 10 ; nstat |grep Pressure
TcpExtTCPMemoryPressures        1700
TcpExtTCPMemoryPressuresChrono  5209
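
The difference between counting transitions and accumulating duration can be sketched in userspace as follows. This is a single-threaded illustration with a caller-supplied millisecond clock, not the code in this patch; struct pressure_stats, enter_pressure() and leave_pressure() are hypothetical names.

```c
#include <assert.h>

struct pressure_stats {
	unsigned long since_ms;		/* 0 when idle, else time of entry */
	unsigned long transitions;	/* how many times pressure was entered */
	unsigned long chrono_ms;	/* cumulative time under pressure */
};

/* Record the entry time once; repeated calls while under pressure are no-ops. */
static void enter_pressure(struct pressure_stats *s, unsigned long now_ms)
{
	assert(now_ms != 0);	/* 0 is reserved for "not under pressure" */
	if (s->since_ms)
		return;
	s->since_ms = now_ms;
	s->transitions++;
}

/* On exit, add the elapsed time to the cumulative counter. */
static void leave_pressure(struct pressure_stats *s, unsigned long now_ms)
{
	if (!s->since_ms)
		return;
	s->chrono_ms += now_ms - s->since_ms;
	s->since_ms = 0;
}
```

Two episodes of 10 ms and 200 ms give transitions == 2 and chrono_ms == 210, so the second counter separates many short events from a few long ones.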

Signed-off-by: Eric Dumazet 
---
 include/net/sock.h|   22 ++
 include/net/tcp.h |3 ++-
 include/uapi/linux/snmp.h |1 +
 net/core/sock.c   |   20 
 net/decnet/af_decnet.c|2 +-
 net/ipv4/proc.c   |1 +
 net/ipv4/tcp.c|   27 +++
 net/ipv4/tcp_ipv4.c   |1 +
 net/ipv6/tcp_ipv6.c   |1 +
 net/sctp/socket.c |2 +-
 10 files changed, 53 insertions(+), 27 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 3467d9e89e7dba1c35fa44a6268a28735f795319..858891c36f94ad2577726d6d21cf871dbcd55d98 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1080,6 +1080,7 @@ struct proto {
bool    (*stream_memory_free)(const struct sock *sk);
/* Memory pressure */
void    (*enter_memory_pressure)(struct sock *sk);
+   void    (*leave_memory_pressure)(struct sock *sk);
atomic_long_t   *memory_allocated;  /* Current allocated memory. */
struct percpu_counter   *sockets_allocated; /* Current number of sockets. */
/*
@@ -1088,7 +1089,7 @@ struct proto {
 * All the __sk_mem_schedule() is of this nature: accounting
 * is strict, actions are advisory and have some latency.
 */
-   int *memory_pressure;
+   unsigned long   *memory_pressure;
long    *sysctl_mem;
int *sysctl_wmem;
int *sysctl_rmem;
@@ -1193,25 +1194,6 @@ static inline bool sk_under_memory_pressure(const struct sock *sk)
return !!*sk->sk_prot->memory_pressure;
 }
 
-static inline void sk_leave_memory_pressure(struct sock *sk)
-{
-   int *memory_pressure = sk->sk_prot->memory_pressure;
-
-   if (!memory_pressure)
-   return;
-
-   if (*memory_pressure)
-   *memory_pressure = 0;
-}
-
-static inline void sk_enter_memory_pressure(struct sock *sk)
-{
-   if (!sk->sk_prot->enter_memory_pressure)
-   return;
-
-   sk->sk_prot->enter_memory_pressure(sk);
-}
-
 static inline long
 sk_memory_allocated(const struct sock *sk)
 {
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 82462db97183abebb33628eb5e04a5c5f04ea873..03482a5a069a18c776bd2071f0d74c8e56c0bed2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -279,7 +279,7 @@ extern int sysctl_tcp_pacing_ca_ratio;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
-extern int tcp_memory_pressure;
+extern unsigned long tcp_memory_pressure;
 
 /* optimized version of sk_under_memory_pressure() for TCP sockets */
 static inline bool tcp_under_memory_pressure(const struct sock *sk)
@@ -1322,6 +1322,7 @@ extern void tcp_openreq_init_rwin(struct request_sock *req,
  const struct dst_entry *dst);
 
 void tcp_enter_memory_pressure(struct sock *sk);
+void tcp_leave_memory_pressure(struct sock *sk);
 
 static inline int keepalive_intvl_when(const struct tcp_sock *tp)
 {
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 95cffcb21dfdba7c974706131d0f43e21435e82d..d8569329579816213255169d0c183f4400835f7b 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -228,6 +228,7 @@ enum
LINUX_MIB_TCPABORTONLINGER, /* TCPAbortOnLinger */
LINUX_MIB_TCPABORTFAILED,   /* TCPAbortFailed */
LINUX_MIB_TCPMEMORYPRESSURES,   /* TCPMemoryPressures */
+   LINUX_MIB_TCPMEMORYPRESSURESCHRONO, /* TCPMemoryPressuresChrono */
LINUX_MIB_TCPSACKDISCARD,   /* TCPSACKDiscard */
LINUX_MIB_TCPDSACKIGNOREDOLD,   /* TCPSACKIgnoredOld */
LINUX_MIB_TCPDSACKIGNOREDNOUNDO,/* TCPSACKIgnoredNoUndo */
diff --git a/net/core/sock.c b/net/core/sock.c
index 

Re: [PATCH] net/ipv6: Fix CALIPSO causing GPF with datagram support

2017-06-06 Thread David Miller
From: Richard Haines 
Date: Mon,  5 Jun 2017 16:44:40 +0100

> When using CALIPSO with IPPROTO_UDP it is possible to trigger a GPF as the
> IP header may have moved.
> 
> Also update the payload length after adding the CALIPSO option.
> 
> Signed-off-by: Richard Haines 

Applied and queued up for -stable, thank you Richard.


Re: [PATCH] cxgb4: implement ndo_set_vf_rate()

2017-06-06 Thread David Miller
From: Ganesh Goudar 
Date: Mon,  5 Jun 2017 18:34:20 +0530

> Implement ndo_set_vf_rate() for mgmt interface to support rate-limiting
> of VF traffic using 'ip' command.
> 
> Based on the original work of Kumar Sanghvi 
> 
> Signed-off-by: Ganesh Goudar 

Applied to net-next, thanks Ganesh.


Re: [PATCH][V2] net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value

2017-06-06 Thread David Miller
From: Colin King 
Date: Mon,  5 Jun 2017 10:04:52 +0100

> From: Colin Ian King 
> 
> The current comparison of entry < 0 will never be true since entry is an
> unsigned integer. Make entry an int to ensure -ve error return values
> from the call to jumbo_frm are correctly being caught.
> 
> Detected by CoverityScan, CID#1238760 ("Macro compares unsigned to 0")
> 
> Signed-off-by: Colin Ian King 

I know Julia asked for more comments, but I'm going to apply this
as-is for now as it is correct.

Thanks Colin.
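
A small standalone C sketch of the problem described in the quoted commit
message. The helper fake_jumbo_frm() and its -ENOMEM return value are
hypothetical stand-ins and are not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for a callee returning an index or -errno. */
static int fake_jumbo_frm(int fail)
{
	assert(fail == 0 || fail == 1);
	return fail ? -ENOMEM : 3;
}

/*
 * Buggy pattern: storing the result in an unsigned variable turns the
 * negative errno into a very large positive value, so a later
 * "entry < 0" test can never be true.
 */
static unsigned int entry_as_unsigned(int fail)
{
	return (unsigned int)fake_jumbo_frm(fail);
}

/* Fixed pattern: keep the result signed so "< 0" can be true. */
static int error_seen_signed(int fail)
{
	int entry = fake_jumbo_frm(fail);

	return entry < 0;
}
```

In this sketch entry_as_unsigned(1) is larger than INT_MAX, while
error_seen_signed(1) reports the failure as expected.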


Re: [PATCH] ppp: mppe: Use vsnprintf extension %phN

2017-06-06 Thread David Miller
From: Joe Perches 
Date: Mon,  5 Jun 2017 05:22:50 -0700

> Using this extension reduces the object size.
> 
> $ size drivers/net/ppp/ppp_mppe.o*
>    text    data     bss     dec     hex filename
>    5683     216       8    5907    1713 drivers/net/ppp/ppp_mppe.o.new
>    5808     216       8    6032    1790 drivers/net/ppp/ppp_mppe.o.old
> 
> Signed-off-by: Joe Perches 

Applied to net-next, thanks Joe.
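
For readers unfamiliar with the extension: in the kernel's printk formats,
%*phN prints a small buffer as hex digits with no separator. A rough
userspace approximation, using the hypothetical helper name hex_nosep(),
might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace approximation only: write len bytes of buf as lowercase hex
 * digits with no separator. out must hold at least 2 * len + 1 bytes.
 */
static void hex_nosep(char *out, const unsigned char *buf, size_t len)
{
	size_t i;

	assert(out != NULL && buf != NULL);
	for (i = 0; i < len; i++)
		snprintf(out + 2 * i, 3, "%02x", buf[i]);
	out[2 * len] = '\0';
}
```

In the kernel the formatting lives in vsnprintf itself, so a driver needs a
single format specifier rather than a loop like the one above.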


Fw: [Bug 195969] New: ipsec icmp and udp works, tcp doesn't work

2017-06-06 Thread Stephen Hemminger


Begin forwarded message:

Date: Sat, 03 Jun 2017 06:25:05 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 195969] New: ipsec icmp and udp works, tcp doesn't work


https://bugzilla.kernel.org/show_bug.cgi?id=195969

Bug ID: 195969
   Summary: ipsec icmp and udp works, tcp doesn't work
   Product: Networking
   Version: 2.5
Kernel Version: 4.11.3-1-ARCH
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: step...@networkplumber.org
  Reporter: d...@djagoo.io
Regression: No

A few days ago I updated to 4.11.3-1-ARCH. After that my VPN access to our
corporate network was broken.

The connection is established and I can use UDP (e.g. DNS) and ICMP. All TCP
connections I tried (ssh, smb, http...) failed.

On the AUR page, the comment "MartinDiehl commented on 2017-05-25 19:57" reports
the same error.

https://aur.archlinux.org/packages/strongswan/

And I found a bug report on redhat bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=1458222

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [PATCH v1] net: phy: Delete unused function phy_ethtool_gset

2017-06-06 Thread David Miller
From: Yuval Shaia 
Date: Mon,  5 Jun 2017 10:18:40 +0300

> It's unused, so remove it.
> 
> Signed-off-by: Yuval Shaia 
> ---
> v0 -> v1:
>   * Add commit message
>   * Update Documentation/networking/phy.txt
>   * Modify commit header message

Applied to net-next, thanks.


Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-06 Thread David Miller
From: Alexei Starovoitov 
Date: Tue, 6 Jun 2017 11:55:33 -0700

> If in the future mlx will make it into the nic in a way that
> encryption shares all memory management logic and there is no fpga
> at all then it indeed will be similar to tc offload. Right now it's
> not and needs different sw architecture.

If the visible effect is identical, I fundamentally disagree with you.

I don't care if there is a frog sitting on the PHY that transforms
the packets, it's all the same if the visible behavior is identical.


Re: [PATCH v2 3/4] net: macb: macb.c changed to macb_main.c

2017-06-06 Thread David Miller
From: Richard Cochran 
Date: Tue, 6 Jun 2017 20:39:33 +0200

> On Fri, Jun 02, 2017 at 03:27:41PM +0100, Rafal Ozieblo wrote:
>>  drivers/net/ethernet/cadence/macb.c      | 3568 --
>>  drivers/net/ethernet/cadence/macb_main.c | 3568 ++
> 
> You deleted macb.c, and so ...

Rename macb.c to macb_main.c

>> +macb-y  := macb_main.o
>>  
>>  obj-$(CONFIG_MACB) += macb.o
> 
> how does this rule make sense any more?

He's adjusting the Makefile so that it builds macb_main.c into macb.o


Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-06 Thread Alexei Starovoitov
On Tue, Jun 06, 2017 at 02:38:24PM -0400, David Miller wrote:
> From: Alexei Starovoitov 
> Date: Tue, 6 Jun 2017 11:34:59 -0700
> 
> > fpga is a separate device with its own phy and mac layers, its
> > own queues, packet parsing and rdma logic.
> 
> Because that's how they bolted it onto the ASIC in current
> implementation, it might not always be that way and be fully
> integrated in the future.
> 
> And I stress the word "implementation" as in "implementation detail"
> the visible behavior is going to be the same, the difference is how
> the thing is hooked up and maybe how you program it.

whether fpga is a separate chip or part of the same asic makes no difference.
They are still different devices from sw point of view.
If in the future mlx will make it into the nic in a way that encryption shares
all memory management logic and there is no fpga at all then it indeed will
be similar to tc offload. Right now it's not and needs different sw architecture.



Re: More BPF verifier questions

2017-06-06 Thread Edward Cree
On 05/06/17 08:06, Y Song wrote:
> On Fri, Jun 2, 2017 at 7:42 AM, Edward Cree  wrote:
>> Test "helper access to variable memory: stack, bitwise AND + JMP, correct
>>  bounds" is listed as expected to pass, but it passes zero in the 'size'
>>  argument, an ARG_CONST_SIZE, to bpf_probe_read; I believe this should fail
>>  (and with my WIP patch it does).
> Probably a typo or mis-statement. "size" is not passed in with "zero", but
> with an unknown value. Hence, it probably should fail.
>
>   BPF_MOV64_IMM(BPF_REG_2, 16),
>   BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128),
>   BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128),
>   BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 64),
>   BPF_MOV64_IMM(BPF_REG_4, 0),
>   BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
>   BPF_MOV64_IMM(BPF_REG_3, 0),
>   BPF_EMIT_CALL(BPF_FUNC_probe_read),
So, in fact this unknown value is really 16 & 64 == 0, but the verifier doesn't
 know that and concludes that it's either 0 or 64 (after the AND).  But then
 what I didn't spot before, and now have, is that the BPF_JGE tests if
 0 >= size.  Since we're in the false branch, that means size > 0, and so
 we're fine.
The test case is correct, and now that I've fixed the min/max tracking in my
 patches, the verifier accepts it again.
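
The bounds reasoning above can be sketched in plain C. This is a
deliberately simplified model with hypothetical names (struct u_range,
range_and_imm, range_jge_false); it is not the verifier's code and tracks
only an unsigned minimum and maximum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, much-simplified stand-in for the verifier's bounds state. */
struct u_range {
	uint64_t min;
	uint64_t max;
};

/* x & mask can never exceed mask (or x's own maximum), and may be 0. */
static struct u_range range_and_imm(struct u_range r, uint64_t mask)
{
	struct u_range out = { 0, r.max < mask ? r.max : mask };

	return out;
}

/*
 * Fall-through (false) branch of "if (lhs >= reg) goto ..." with a
 * constant lhs and an unsigned comparison: not taken means reg > lhs.
 */
static struct u_range range_jge_false(uint64_t lhs, struct u_range reg)
{
	assert(lhs < UINT64_MAX);
	if (reg.min <= lhs)
		reg.min = lhs + 1;
	return reg;
}
```

Starting from an unknown value, the AND with 64 gives the range [0, 64],
and the false branch of the comparison against 0 narrows it to [1, 64], so
a zero size is excluded on that path.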

-Ed


Re: [PATCH v2 3/4] net: macb: macb.c changed to macb_main.c

2017-06-06 Thread Richard Cochran
On Fri, Jun 02, 2017 at 03:27:41PM +0100, Rafal Ozieblo wrote:
>  drivers/net/ethernet/cadence/macb.c      | 3568 --
>  drivers/net/ethernet/cadence/macb_main.c | 3568 ++

You deleted macb.c, and so ...

> diff --git a/drivers/net/ethernet/cadence/Makefile b/drivers/net/ethernet/cadence/Makefile
> index 4ba7559..31ea6e3 100644
> --- a/drivers/net/ethernet/cadence/Makefile
> +++ b/drivers/net/ethernet/cadence/Makefile
> @@ -1,6 +1,7 @@
>  #
>  # Makefile for the Atmel network device drivers.
>  #
> +macb-y   := macb_main.o
>  
>  obj-$(CONFIG_MACB) += macb.o

how does this rule make sense any more?

Thanks,
Richard


Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-06 Thread David Miller
From: Alexei Starovoitov 
Date: Tue, 6 Jun 2017 11:34:59 -0700

> fpga is a separate device with its own phy and mac layers, its
> own queues, packet parsing and rdma logic.

Because that's how they bolted it onto the ASIC in current
implementation, it might not always be that way and be fully
integrated in the future.

And I stress the word "implementation" as in "implementation detail"
the visible behavior is going to be the same, the difference is how
the thing is hooked up and maybe how you program it.


Re: [PATCH v2 4/4] net: macb: Add hardware PTP support

2017-06-06 Thread Richard Cochran
On Tue, Jun 06, 2017 at 08:54:50AM +, Rafal Ozieblo wrote:
> Would "ENOTSUP" be sufficient ?

You mean EOPNOTSUPP?  Yes, sounds ok to me.  EFAULT is surely wrong.

Thanks,
Richard


Re: [PATCH net-next 5/5] net: dsa: Stop accessing ds->dst->cpu_dp in drivers

2017-06-06 Thread Vivien Didelot
Florian Fainelli  writes:

> On 06/06/2017 11:09 AM, Vivien Didelot wrote:
>> Florian Fainelli  writes:
>> 
>>> -   phy_mode = of_get_phy_mode(ds->dst->cpu_dp->dn);
>>> +   phy_mode = of_get_phy_mode(ds->ports[QCA8K_CPU_PORT].dn);
>> 
>> Is it necessary to use QCA8K_CPU_PORT?
>> 
>>> +static inline struct dsa_port *dsa_ds_get_cpu_dp(struct dsa_switch *ds)
>>> +{
>>> +   return &ds->ports[fls(ds->cpu_port_mask) - 1];
>>> +}
>> 
>> Wouldn't it be better to return the CPU port for a given port?
>> Something like return ds->ports[port].cpu_dp, so that we ease the
>> introduction of multiple CPU port a bit more?
>
> ds->ports[port].cpu_dp only gets assigned at dsa_slave_create() time,
> which is after ops->setup() has been called, hence this helper function
> in case you need it earlier (e.g: like mv88e6060).

I see no reason why we cannot assign ds->ports[x].cpu_dp before calling
ops->setup().

Even though it can be changed later, the DSA core can assign a dedicated
CPU port to each port and only after that, call into the driver ops.

To me the DSA topology (dsa_ports, CPU, etc.) should be determined
before calling into the hardware and exposing slave interfaces to the
user. Otherwise it is confusing. Am I wrong?

Thanks,

Vivien


Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-06 Thread Alexei Starovoitov
On Tue, Jun 06, 2017 at 01:47:26PM -0400, David Miller wrote:
> From: Alexei Starovoitov 
> Date: Tue, 6 Jun 2017 10:42:35 -0700
> 
> > so it's like rdma, but without using kernel rdma stack?
> 
> No sockets here, just transformation rules.  It's like offloading
> a complex TC rule to hardware version of that transformation.
> 
> Yes, there is state, but I argue that it is no different than TC
> offloading rules.  What if TC had "hash" and "crypt" operations
> and we attached them to appropriate u32 matches?  You wouldn't
> be able to tell the difference.

there is huge difference in underlying hw.
fpga is a separate device with its own phy and mac layers, its
own queues, packet parsing and rdma logic.
Whereas tc offload is happening within the same hw queues/memory/stats
management logic. My understanding is that when I do 'ethtool -L' to
change number of queues or 'ethtool -G' it changes the memory layout
that tc offload is operating on as well.
When I do 'ethtool -S' it shows me the stats for the device
that tc offload rules are integral part of.
Whereas fpga is a different physical device with its own
buffers and such. We can add 'ethtool -G_fpga, -L_fpga', etc
but this type of discussion needs to happen _before_ the whole
thing is merged. It will never happen after the fact.
Just look at mlx responses, they still don't acknowledge the issue
and instead pushing for ipsec, tls (in other words: new features)
instead of addressing production issues that are obviously
not glamorous to work on and fix.

> I think you are way over-obsessed with this FPGA offload thing,
> quite frankly.

if we didn't have issues with eswitch that drops packets and
we don't even know how many, I wouldn't be complaining.
There is a discussion going on to add few counters for
eswitch visibility, but it's taking forever and it's not at
the point of exposing eswitch as a kernel object.
Why? because it's hard to refactor it now into something like devlink
or whatever new abstraction that would be needed.



Re: [PATCH v2 4/4] net: macb: Add hardware PTP support

2017-06-06 Thread Richard Cochran
On Fri, Jun 02, 2017 at 03:28:10PM +0100, Rafal Ozieblo wrote:
> +static s32 gem_get_ptp_max_adj(void)
> +{
> + return 64E6;
> +}

This is a floating point constant.  Please use integer instead.

> +
> +static int gem_get_ts_info(struct net_device *dev,
> +struct ethtool_ts_info *info)
> +{
> + struct macb *bp = netdev_priv(dev);
> +
> + ethtool_op_get_ts_info(dev, info);

This default is misguided.

> + if ((bp->hw_dma_cap & HW_DMA_CAP_PTP) == 0)
> + return 0;

Try this: 

if ((bp->hw_dma_cap & HW_DMA_CAP_PTP) == 0) {
	ethtool_op_get_ts_info(dev, info);
	return 0;
}

> + info->so_timestamping =
> + SOF_TIMESTAMPING_TX_SOFTWARE |
> + SOF_TIMESTAMPING_RX_SOFTWARE |
> + SOF_TIMESTAMPING_SOFTWARE |
> + SOF_TIMESTAMPING_TX_HARDWARE |
> + SOF_TIMESTAMPING_RX_HARDWARE |
> + SOF_TIMESTAMPING_RAW_HARDWARE;
> + info->tx_types =
> + (1 << HWTSTAMP_TX_ONESTEP_SYNC) |
> + (1 << HWTSTAMP_TX_OFF) |
> + (1 << HWTSTAMP_TX_ON);
> + info->rx_filters =
> + (1 << HWTSTAMP_FILTER_NONE) |
> + (1 << HWTSTAMP_FILTER_ALL);
> + info->phc_index = -1;
> +
> + if (bp->ptp_clock)
> + info->phc_index = ptp_clock_index(bp->ptp_clock);

Like this please:

info->phc_index = bp->ptp_clock ? ptp_clock_index(bp->ptp_clock) : -1;

> +
> + return 0;
> +}
> +
> +static struct macb_ptp_info gem_ptp_info = {
> + .ptp_init= gem_ptp_init,
> + .ptp_remove  = gem_ptp_remove,
> + .get_ptp_max_adj = gem_get_ptp_max_adj,
> + .get_tsu_rate= gem_get_tsu_rate,
> + .get_ts_info = gem_get_ts_info,
> + .get_hwtst   = gem_get_hwtst,
> + .set_hwtst   = gem_set_hwtst,
> +};
> +#endif
> +
>  static int macb_get_ts_info(struct net_device *netdev,
>   struct ethtool_ts_info *info)
>  {
> @@ -2636,12 +2707,16 @@ static void macb_configure_caps(struct macb *bp,
>   dcfg = gem_readl(bp, DCFG2);
>   if ((dcfg & (GEM_BIT(RX_PKT_BUFF) | GEM_BIT(TX_PKT_BUFF))) == 0)
>   bp->caps |= MACB_CAPS_FIFO_MODE;
> - if (IS_ENABLED(CONFIG_MACB_USE_HWSTAMP) && gem_has_ptp(bp)) {
> +#ifdef CONFIG_MACB_USE_HWSTAMP
> + if (gem_has_ptp(bp)) {
>   if (!GEM_BFEXT(TSU, gem_readl(bp, DCFG5)))
>   pr_err("GEM doesn't support hardware ptp.\n");
> - else
> + else {
>   bp->hw_dma_cap |= HW_DMA_CAP_PTP;
> + bp->ptp_info = &gem_ptp_info;
> + }
>   }
> +#endif
>   }
>  
>   dev_dbg(&bp->pdev->dev, "Cadence caps 0x%08x\n", bp->caps);
> @@ -3247,7 +3322,9 @@ static const struct macb_config np4_config = {
>  };
>  
>  static const struct macb_config zynqmp_config = {
> - .caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_JUMBO,
> + .caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE |
> + MACB_CAPS_JUMBO |
> + MACB_CAPS_GEM_HAS_PTP,
>   .dma_burst_length = 16,
>   .clk_init = macb_clk_init,
>   .init = macb_init,
> @@ -3281,7 +3358,9 @@ MODULE_DEVICE_TABLE(of, macb_dt_ids);
>  #endif /* CONFIG_OF */
>  
>  static const struct macb_config default_gem_config = {
> - .caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_JUMBO,
> + .caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE |
> + MACB_CAPS_JUMBO |
> + MACB_CAPS_GEM_HAS_PTP,
>   .dma_burst_length = 16,
>   .clk_init = macb_clk_init,
>   .init = macb_init,

> diff --git a/drivers/net/ethernet/cadence/macb_ptp.c b/drivers/net/ethernet/cadence/macb_ptp.c
> new file mode 100755
> index 000..d536970
> --- /dev/null
> +++ b/drivers/net/ethernet/cadence/macb_ptp.c
> @@ -0,0 +1,512 @@
> +/**
> + * 1588 PTP support for Cadence GEM device.
> + *
> + * Copyright (C) 2017 Cadence Design Systems - http://www.cadence.com
> + *
> + * Authors: Rafal Ozieblo 
> + *  Bartosz Folta 
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2  of
> + * the License as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

Re: [PATCH net-next 0/5] net: dsa: Multi-CPU ground work

2017-06-06 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> This patch series prepares the ground for adding mutliple CPU port support to

   multiple
   
> DSA, and starts by removing redundant pieces of information such as
> master_netdev which is cpu_dp->ethernet. Finally drivers are moved away from

 cpu_dp->netdev
 
> directly accessing ds->dst->cpu_dp and use appropriate helper functions.
>
> Note that if you have Device Tree blobs/platform configurations that are
> currently listing multiple CPU ports, the proposed behavior in
> dsa_ds_get_cpu_dp() will be to return the last bit set in ds->cpu_port_mask.
>
> Future plans include:
> - making dst->cpu_dp a flexible data structure (array, list, you name it)
> - having the ability for drivers to return a default/preferred CPU port (if
>   necessary)

The overall patchset looks good. I have questions for future work
though.

I am still not sure that we need CPU port references in
dsa_switch_tree. When device tree or pdata is parsed, we have allocated
dsa_switch and dsa_port structures. We should be able validate and
assign all ds->ports[x].cpu_dp, before setting up the switches and
creating the slave devices. What do you think?

Also I see dsa_ptr becoming a pointer to the associated dsa_port, and
dsa_port should contain the tagging ops for quick access. That is more
rigorous with the physical representation and much easier for
transparent multi-CPU port support.

Thanks,

Vivien


Re: [PATCH net-next 5/5] net: dsa: Stop accessing ds->dst->cpu_dp in drivers

2017-06-06 Thread Florian Fainelli
On 06/06/2017 11:09 AM, Vivien Didelot wrote:
> Florian Fainelli  writes:
> 
>> -phy_mode = of_get_phy_mode(ds->dst->cpu_dp->dn);
>> +phy_mode = of_get_phy_mode(ds->ports[QCA8K_CPU_PORT].dn);
> 
> Is it necessary to use QCA8K_CPU_PORT?
> 
>> +static inline struct dsa_port *dsa_ds_get_cpu_dp(struct dsa_switch *ds)
>> +{
>> +return &ds->ports[fls(ds->cpu_port_mask) - 1];
>> +}
> 
> Wouldn't it be better to return the CPU port for a given port?
> Something like return ds->ports[port].cpu_dp, so that we ease the
> introduction of multiple CPU port a bit more?

ds->ports[port].cpu_dp only gets assigned at dsa_slave_create() time,
which is after ops->setup() has been called, hence this helper function
in case you need it earlier (e.g: like mv88e6060).
-- 
Florian


Re: [PATCH net-next 5/5] net: dsa: Stop accessing ds->dst->cpu_dp in drivers

2017-06-06 Thread Vivien Didelot
Florian Fainelli  writes:

> - phy_mode = of_get_phy_mode(ds->dst->cpu_dp->dn);
> + phy_mode = of_get_phy_mode(ds->ports[QCA8K_CPU_PORT].dn);

Is it necessary to use QCA8K_CPU_PORT?

> +static inline struct dsa_port *dsa_ds_get_cpu_dp(struct dsa_switch *ds)
> +{
> + return &ds->ports[fls(ds->cpu_port_mask) - 1];
> +}

Wouldn't it be better to return the CPU port for a given port?
Something like return ds->ports[port].cpu_dp, so that we ease the
introduction of multiple CPU port a bit more?
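
For reference, the lookup in the quoted helper can be modelled in userspace
as below. fls_u32() is a hypothetical stand-in for the kernel's fls(), and
last_cpu_port() is an illustrative name. With several bits set in the mask,
the highest-numbered port wins, as the cover letter for this series notes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's fls(): 1-based index of the most
 * significant set bit, 0 when no bit is set.
 */
static int fls_u32(uint32_t x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Index of the highest-numbered CPU port present in the mask. */
static int last_cpu_port(uint32_t cpu_port_mask)
{
	assert(cpu_port_mask != 0);	/* fls() - 1 would be -1 otherwise */
	return fls_u32(cpu_port_mask) - 1;
}
```

A per-port cpu_dp pointer, as suggested above, would replace this single
mask-wide answer with one answer per port.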

Thanks,

Vivien



Re: [PATCH net-next 1/5] net: dsa: Remove master_netdev and use dst->cpu_dp->netdev

2017-06-06 Thread Florian Fainelli
On 06/06/2017 10:24 AM, Vivien Didelot wrote:
> Hi Florian,
> 
> Florian Fainelli  writes:
> 
>> -struct net_device *p = ds->dst[ds->index].master_netdev;
>> +struct net_device *p = ds->dst[ds->index].cpu_dp->netdev;
> 
>ds->dst->cpu_dp->netdev
> 
> ds->dst is not an array anymore, lucky sf2 switch index is always 0 ;-)
> 
>>  struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
>>  struct ethtool_wolinfo pwol;
>>  
>> @@ -829,7 +829,7 @@ static void bcm_sf2_sw_get_wol(struct dsa_switch *ds, int port,
>>  static int bcm_sf2_sw_set_wol(struct dsa_switch *ds, int port,
>>struct ethtool_wolinfo *wol)
>>  {
>> -struct net_device *p = ds->dst[ds->index].master_netdev;
>> +struct net_device *p = ds->dst[ds->index].cpu_dp->netdev;
> 
> same here.

It's changed in patch 5, so I did not bother doing an intermediate
change considering we ditch this eventually.

Thanks!

> 
> Thanks,
> 
> Vivien
> 


-- 
Florian


Re: [PATCH net-next 4/5] net: dsa: Introduce dsa_dst_get_cpu_dp()

2017-06-06 Thread Vivien Didelot
Florian Fainelli  writes:

> Introduce a helper function which will return a reference to the CPU
> port used in a dsa_switch_tree. Right now this is a singleton, but this
> will change once we introduce multi-CPU port support, so ease the
> transition by converting the affected code paths.
>
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot 


Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-06 Thread David Miller
From: Alexei Starovoitov 
Date: Tue, 6 Jun 2017 10:42:35 -0700

> so it's like rdma, but without using kernel rdma stack?

No sockets here, just transformation rules.  It's like offloading
a complex TC rule to hardware version of that transformation.

Yes, there is state, but I argue that it is no different than TC
offloading rules.  What if TC had "hash" and "crypt" operations
and we attached them to appropriate u32 matches?  You wouldn't
be able to tell the difference.

I think you are way over-obsessed with this FPGA offload thing,
quite frankly.


Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-06 Thread Alexei Starovoitov
On Tue, Jun 06, 2017 at 10:17:09AM -0600, Jason Gunthorpe wrote:
> On Tue, Jun 06, 2017 at 06:52:15AM +, Ilan Tayari wrote:
> 
> > So neither the host stack nor the network are aware of them.
> > They exist momentarily only on the internal traces on the board and not
> > anywhere else.
> 
> Is that really true? If you are creating rocee QPs' then the RDMA
> stack sees this stuff and now we have buried a RDMA ULP inside an
> ethernet driver which seems really wonky..
> 
> > I don't mind explaining further, but I think you will just see it in the
> > patchset when we submit.
> 
> You described exactly what I thought.. I just disagree with you that
> an ethernet connected and controlled IP accelerator is 'part of the
> NIC', even if it happens to be colocated on the same circuit board.

+1

what Ilan described is a kernel bypass done by hw.
This is non starter in production. Same as eswitch this fpga is not
represented as a kernel object, there is no way to debug it.
NIC crafts roce packets back and forth?!
so it's like rdma, but without using kernel rdma stack?
When hw ipsec or tls will mysteriously drop or mangle the packets
how this can be debugged? Does fpga have attached ddr to
store/forward the packets? How memory issues will be reported?
No MCE errors ever? Buffer overflow? How many receive queues inside fpga?
How health check of fpga itself will be done? Through roce packets?
I would buy the lack of kernel visibility if this fpga+nic combo
was a prototype, but it's being presented as a production device
with subsequent changes to core networking stack and that's where
I have a problem with its sw architecture.



Re: [PATCH net-next 3/5] net: dsa: Associate slave network device with CPU port

2017-06-06 Thread Vivien Didelot
Florian Fainelli  writes:

> In preparation for supporting multiple CPU ports with DSA, have the
> dsa_slave_priv structure know which CPU it is associated with. This will

  dsa_port
  
> be important in order to make sure the correct CPU is used for
> transmission of the frames. If not for functional reasons, for
> performance (e.g: load balancing) and forwarding decisions.
>
> Signed-off-by: Florian Fainelli 
> ---
>  include/net/dsa.h  | 1 +
>  net/dsa/dsa_priv.h | 2 +-
>  net/dsa/slave.c| 5 -
>  3 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 7e93869819f9..58969b9a090c 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -171,6 +171,7 @@ struct dsa_port {
>   struct dsa_switch   *ds;
>   unsigned int        index;
>   const char          *name;
> + struct dsa_port     *cpu_dp;
>   struct net_device   *netdev;
>   struct device_node  *dn;
>   unsigned int        ageing_time;
> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> index 5c510f4ba0ce..7c2326f3b538 100644
> --- a/net/dsa/dsa_priv.h
> +++ b/net/dsa/dsa_priv.h
> @@ -185,7 +185,7 @@ extern const struct dsa_device_ops trailer_netdev_ops;
>  
>  static inline struct net_device *dsa_master_netdev(struct dsa_slave_priv *p)
>  {
> - return p->dp->ds->dst->cpu_dp->netdev;
> + return p->dp->cpu_dp->netdev;
>  }
>  
>  #endif
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index ea4ed0285922..de1ab41cfd38 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -1139,9 +1139,11 @@ int dsa_slave_create(struct dsa_switch *ds, struct device *parent,


>   struct net_device *master;
>   struct net_device *slave_dev;
>   struct dsa_slave_priv *p;
> + struct dsa_port *cpu_dp;
>   int ret;
>  
> - master = ds->dst->cpu_dp->netdev;
> + cpu_dp = ds->dst->cpu_dp;
> + master = cpu_dp->netdev;

You may assign them when declaring them, but no big deal.

>  
>   slave_dev = alloc_netdev(sizeof(struct dsa_slave_priv), name,
>NET_NAME_UNKNOWN, ether_setup);
> @@ -1176,6 +1178,7 @@ int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
>   p->old_duplex = -1;
>  
>   ds->ports[port].netdev = slave_dev;
> + p->dp->cpu_dp = cpu_dp;
>   ret = register_netdev(slave_dev);
>   if (ret) {
>   netdev_err(master, "error %d registering interface %s\n",

Reviewed-by: Vivien Didelot 


Re: [PATCH net-next 2/5] net: dsa: Relocate master ethtool operations

2017-06-06 Thread Vivien Didelot
Florian Fainelli  writes:

> Relocate master_ethtool_ops and master_orig_ethtool_ops into struct
> dsa_port in order to be both consistent, and make things self contained
> within the dsa_port structure.
>
> This is a preliminary change to supporting multiple CPU port interfaces.
>
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot 


Re: [PATCH net-next 1/5] net: dsa: Remove master_netdev and use dst->cpu_dp->netdev

2017-06-06 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> - struct net_device *p = ds->dst[ds->index].master_netdev;
> + struct net_device *p = ds->dst[ds->index].cpu_dp->netdev;

   ds->dst->cpu_dp->netdev

ds->dst is not an array anymore, lucky sf2 switch index is always 0 ;-)

>   struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
>   struct ethtool_wolinfo pwol;
>  
> @@ -829,7 +829,7 @@ static void bcm_sf2_sw_get_wol(struct dsa_switch *ds, int port,
>  static int bcm_sf2_sw_set_wol(struct dsa_switch *ds, int port,
> struct ethtool_wolinfo *wol)
>  {
> - struct net_device *p = ds->dst[ds->index].master_netdev;
> + struct net_device *p = ds->dst[ds->index].cpu_dp->netdev;

same here.

Thanks,

Vivien


Re: [PATCH net-next 0/9] s390/net updates

2017-06-06 Thread David Miller
From: Julian Wiedmann 
Date: Tue,  6 Jun 2017 14:33:41 +0200

> please apply the following qeth updates for net-next.
> 
> Aside from some janitorial changes, this adds early setup for virtualized
> HiperSockets devices - building upon the code that landed via -net earlier.

Series applied, thanks.


[PATCH net-next 2/5] net: dsa: Relocate master ethtool operations

2017-06-06 Thread Florian Fainelli
Relocate master_ethtool_ops and master_orig_ethtool_ops into struct
dsa_port in order to be both consistent, and make things self contained
within the dsa_port structure.

This is a preliminary change to supporting multiple CPU port interfaces.

Signed-off-by: Florian Fainelli 
---
 include/net/dsa.h | 17 +
 net/dsa/dsa.c | 16 ++--
 net/dsa/slave.c   | 16 
 3 files changed, 19 insertions(+), 30 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index b2fb53f5e28e..7e93869819f9 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -122,12 +122,6 @@ struct dsa_switch_tree {
 */
struct dsa_platform_data *pd;
 
-   /*
-* Reference to network device to use, and which tagging
-* protocol to use.
-*/
-   struct net_device   *master_netdev;
-
/* Copy of tag_ops->rcv for faster access in hot path */
struct sk_buff *(*rcv)(struct sk_buff *skb,
   struct net_device *dev,
@@ -135,12 +129,6 @@ struct dsa_switch_tree {
   struct net_device *orig_dev);
 
/*
-* Original copy of the master netdev ethtool_ops
-*/
-   struct ethtool_ops  master_ethtool_ops;
-   const struct ethtool_ops *master_orig_ethtool_ops;
-
-   /*
 * The switch port to which the CPU is attached.
 */
struct dsa_port *cpu_dp;
@@ -189,6 +177,11 @@ struct dsa_port {
u8  stp_state;
struct net_device   *bridge_dev;
struct devlink_port devlink_port;
+   /*
+* Original copy of the master netdev ethtool_ops
+*/
+   struct ethtool_ops  ethtool_ops;
+   const struct ethtool_ops *orig_ethtool_ops;
 };
 
 struct dsa_switch {
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index eaab1affeeeb..2665a66e833d 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -118,15 +118,16 @@ int dsa_cpu_port_ethtool_setup(struct dsa_port *cpu_dp)
struct net_device *master;
struct ethtool_ops *cpu_ops;
 
-   master = ds->dst->cpu_dp->netdev;
+   master = cpu_dp->netdev;
+
cpu_ops = devm_kzalloc(ds->dev, sizeof(*cpu_ops), GFP_KERNEL);
if (!cpu_ops)
return -ENOMEM;
 
-   memcpy(&ds->dst->master_ethtool_ops, master->ethtool_ops,
+   memcpy(&cpu_dp->ethtool_ops, master->ethtool_ops,
   sizeof(struct ethtool_ops));
-   ds->dst->master_orig_ethtool_ops = master->ethtool_ops;
-   memcpy(cpu_ops, &ds->dst->master_ethtool_ops,
+   cpu_dp->orig_ethtool_ops = master->ethtool_ops;
+   memcpy(cpu_ops, &cpu_dp->ethtool_ops,
   sizeof(struct ethtool_ops));
dsa_cpu_port_ethtool_init(cpu_ops);
master->ethtool_ops = cpu_ops;
@@ -136,12 +137,7 @@ int dsa_cpu_port_ethtool_setup(struct dsa_port *cpu_dp)
 
 void dsa_cpu_port_ethtool_restore(struct dsa_port *cpu_dp)
 {
-   struct dsa_switch *ds = cpu_dp->ds;
-   struct net_device *master;
-
-   master = ds->dst->cpu_dp->netdev;
-
-   master->ethtool_ops = ds->dst->master_orig_ethtool_ops;
+   cpu_dp->netdev->ethtool_ops = cpu_dp->orig_ethtool_ops;
 }
 
 void dsa_cpu_dsa_destroy(struct dsa_port *port)
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index d52c9ceb0566..ea4ed0285922 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -523,10 +523,10 @@ static void dsa_cpu_port_get_ethtool_stats(struct net_device *dev,
s8 cpu_port = dst->cpu_dp->index;
int count = 0;
 
-   if (dst->master_ethtool_ops.get_sset_count) {
-   count = dst->master_ethtool_ops.get_sset_count(dev,
+   if (dst->cpu_dp->ethtool_ops.get_sset_count) {
+   count = dst->cpu_dp->ethtool_ops.get_sset_count(dev,
   ETH_SS_STATS);
-   dst->master_ethtool_ops.get_ethtool_stats(dev, stats, data);
+   dst->cpu_dp->ethtool_ops.get_ethtool_stats(dev, stats, data);
}
 
if (ds->ops->get_ethtool_stats)
@@ -539,8 +539,8 @@ static int dsa_cpu_port_get_sset_count(struct net_device *dev, int sset)
struct dsa_switch *ds = dst->cpu_dp->ds;
int count = 0;
 
-   if (dst->master_ethtool_ops.get_sset_count)
-   count += dst->master_ethtool_ops.get_sset_count(dev, sset);
+   if (dst->cpu_dp->ethtool_ops.get_sset_count)
+   count += dst->cpu_dp->ethtool_ops.get_sset_count(dev, sset);
 
if (sset == ETH_SS_STATS && ds->ops->get_sset_count)
count += ds->ops->get_sset_count(ds);
@@ -564,10 +564,10 @@ static void dsa_cpu_port_get_strings(struct net_device *dev,
/* We do not want to be NULL-terminated, since this is a prefix */
pfx[sizeof(pfx) - 1] = '_';
 
-   if (dst->master_ethtool_ops.get_sset_count) {
-   mcount = dst->master_ethtool_ops.get_sset_count(dev,

[PATCH v2 net-next 4/4] tls: Documentation

2017-06-06 Thread Dave Watson
Add documentation for the tcp ULP tls interface.

Signed-off-by: Boris Pismenny 
Signed-off-by: Dave Watson 
---
 Documentation/networking/tls.txt | 135 +++
 1 file changed, 135 insertions(+)
 create mode 100644 Documentation/networking/tls.txt

diff --git a/Documentation/networking/tls.txt b/Documentation/networking/tls.txt
new file mode 100644
index 0000000..77ed006
--- /dev/null
+++ b/Documentation/networking/tls.txt
@@ -0,0 +1,135 @@
+Overview
+========
+
+Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
+TCP. TLS provides end-to-end data integrity and confidentiality.
+
+User interface
+==============
+
+Creating a TLS connection
+-------------------------
+
+First create a new TCP socket and set the TLS ULP.
+
+  sock = socket(AF_INET, SOCK_STREAM, 0);
+  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
+
+Setting the TLS ULP allows us to set/get TLS socket options. Currently
+only the symmetric encryption is handled in the kernel.  After the TLS
+handshake is complete, we have all the parameters required to move the
+data-path to the kernel. There is a separate socket option for moving
+the transmit and the receive into the kernel.
+
+  /* From linux/tls.h */
+  struct tls_crypto_info {
+          unsigned short version;
+          unsigned short cipher_type;
+  };
+
+  struct tls12_crypto_info_aes_gcm_128 {
+          struct tls_crypto_info info;
+          unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
+          unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+          unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
+          unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
+  };
+
+
+  struct tls12_crypto_info_aes_gcm_128 crypto_info;
+
+  crypto_info.info.version = TLS_1_2_VERSION;
+  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
+  memcpy(crypto_info.rec_seq, seq_number_write,
+   TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
+  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
+  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
+
+  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));
+
+Sending TLS application data
+----------------------------
+
+After setting the TLS_TX socket option all application data sent over this
+socket is encrypted using TLS and the parameters provided in the socket option.
+For example, we can send an encrypted hello world record as follows:
+
+  const char *msg = "hello world\n";
+  send(sock, msg, strlen(msg), 0);
+
+send() data is directly encrypted from the userspace buffer provided
+to the encrypted kernel send buffer if possible.
+
+The sendfile system call will send the file's data over TLS records of maximum
+length (2^14).
+
+  file = open(filename, O_RDONLY);
+  fstat(file, &stat);
+  sendfile(sock, file, &offset, stat.st_size);
+
+TLS records are created and sent after each send() call, unless
+MSG_MORE is passed.  MSG_MORE will delay creation of a record until
+MSG_MORE is not passed, or the maximum record size is reached.
+
+The kernel will need to allocate a buffer for the encrypted data.
+This buffer is allocated at the time send() is called, such that
+either the entire send() call will return -ENOMEM (or block waiting
+for memory), or the encryption will always succeed.  If send() returns
+-ENOMEM and some data was left on the socket buffer from a previous
+call using MSG_MORE, the MSG_MORE data is left on the socket buffer.
+
+Send TLS control messages
+-------------------------
+
+Other than application data, TLS has control messages such as alert
+messages (record type 21) and handshake messages (record type 22), etc.
+These messages can be sent over the socket by providing the TLS record type
+via a CMSG. For example the following function sends @data of @length bytes
+using a record of type @record_type.
+
+  /* send TLS control message using record_type */
+  static int klts_send_ctrl_message(int sock, unsigned char record_type,
+                                    void *data, size_t length)
+  {
+        struct msghdr msg = {0};
+        int cmsg_len = sizeof(record_type);
+        struct cmsghdr *cmsg;
+        char buf[CMSG_SPACE(cmsg_len)];
+        struct iovec msg_iov;   /* Vector of data to send/receive into.  */
+
+        msg.msg_control = buf;
+        msg.msg_controllen = sizeof(buf);
+        cmsg = CMSG_FIRSTHDR(&msg);
+        cmsg->cmsg_level = SOL_TLS;
+        cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
+        cmsg->cmsg_len = CMSG_LEN(cmsg_len);
+        *CMSG_DATA(cmsg) = record_type;
+        msg.msg_controllen = cmsg->cmsg_len;
+
+        msg_iov.iov_base = data;
+        msg_iov.iov_len = length;
+        msg.msg_iov = &msg_iov;
+        msg.msg_iovlen = 1;
+
+        return sendmsg(sock, &msg, 0);
+  }
+
+Control message data should be 

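
For readers who want to experiment with the control-message layout described
in the TLS documentation patch above without a kTLS-capable socket, the
following is a minimal sketch that only builds the msghdr and its ancillary
data; it does not call sendmsg(). The helper name tls_prepare_record() is
hypothetical and is not part of the patch. The fallback values for SOL_TLS
and TLS_SET_RECORD_TYPE are assumptions taken from linux/tls.h so that the
sketch builds where that header is unavailable.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Assumed fallback values (from linux/tls.h) so the sketch builds without it. */
#ifndef SOL_TLS
#define SOL_TLS 282
#endif
#ifndef TLS_SET_RECORD_TYPE
#define TLS_SET_RECORD_TYPE 1
#endif

/*
 * Hypothetical helper: fill *msg so that a later sendmsg() on a TLS_TX
 * socket would send `data` as one TLS record of type `record_type`.
 * `cbuf` must be suitably aligned for struct cmsghdr and hold at least
 * CMSG_SPACE(1) bytes. Returns 0 on success, -1 if cbuf is too small.
 */
static int tls_prepare_record(struct msghdr *msg, struct iovec *iov,
                              void *cbuf, size_t cbuf_len,
                              unsigned char record_type,
                              void *data, size_t length)
{
	struct cmsghdr *cmsg;

	if (cbuf_len < CMSG_SPACE(sizeof(record_type)))
		return -1;

	memset(msg, 0, sizeof(*msg));
	memset(cbuf, 0, cbuf_len);

	/* Ancillary data: a single one-byte cmsg carrying the record type. */
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(record_type));
	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_TLS;
	cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
	cmsg->cmsg_len = CMSG_LEN(sizeof(record_type));
	*CMSG_DATA(cmsg) = record_type;

	/* Payload: one iovec pointing at the caller's buffer. */
	iov->iov_base = data;
	iov->iov_len = length;
	msg->msg_iov = iov;
	msg->msg_iovlen = 1;

	return 0;
}
```

With a socket that already has TLS_TX configured, the prepared msghdr would
be passed to sendmsg(sock, &msg, 0); for example, record type 21 with a
two-byte payload corresponds to an alert record as described in the patch.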