date:20160617

Re: [PATCH 2/2] net: ethernet: et131x: use phy_ethtool_{get|set}_link_ksettings

2016-06-17 Thread David Miller

From: Philippe Reynes 
Date: Fri, 17 Jun 2016 23:32:15 +0200

> There are two generics functions phy_ethtool_{get|set}_link_ksettings,
> so we can use them instead of defining the same code in the driver.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH 1/2] net: ethernet: et131x: use phydev from struct net_device

2016-06-17 Thread David Miller

From: Philippe Reynes 
Date: Fri, 17 Jun 2016 23:32:14 +0200

> The private structure contain a pointer to phydev, but the structure
> net_device already contain such pointer. So we can remove the pointer
> phydev in the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH net] RDS: TCP: rds_tcp_accept_one() should transition socket from RESETTING to UP

2016-06-17 Thread David Miller

From: Sowmini Varadhan 
Date: Fri, 17 Jun 2016 16:12:12 -0400

> The state of the rds_connection after rds_tcp_reset_callbacks() would
> be RDS_CONN_RESETTING and this is the value that should be passed
> by rds_tcp_accept_one()  to rds_connect_path_complete() to transition
> the socket to RDS_CONN_UP.
> 
> Fixes: b5c21c0947c1 ("RDS: TCP: fix race windows in send-path quiescence
> by rds_tcp_accept_one()")
> Signed-off-by: Sowmini Varadhan 

Applied.

Re: [PATCH -next] RDS: TCP: Fix non static symbol warnings

2016-06-17 Thread David Miller

From: weiyj...@163.com
Date: Fri, 17 Jun 2016 18:12:46 +

> From: Wei Yongjun 
> 
> Fixes the following sparse warnings:
> 
> net/rds/tcp.c:59:5: warning:
>  symbol 'rds_tcp_min_sndbuf' was not declared. Should it be static?
> net/rds/tcp.c:60:5: warning:
>  symbol 'rds_tcp_min_rcvbuf' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Applied.

Re: [PATCH] gtp: remove unused including

2016-06-17 Thread David Miller

From: weiyj...@163.com
Date: Fri, 17 Jun 2016 17:33:30 +

> From: Wei Yongjun 
> 
> Remove including  that don't need it.
> 
> Signed-off-by: Wei Yongjun 

Applied.

Re: [PATCH] net:liquidio: remove unused including

2016-06-17 Thread David Miller

From: weiyj...@163.com
Date: Fri, 17 Jun 2016 17:49:33 +

> From: Wei Yongjun 
> 
> Remove including  that don't need it.
> 
> Signed-off-by: Wei Yongjun 

Applied.

Re: [PATCH] net: tilegx: use correct timespec64 type

2016-06-17 Thread David Miller

From: Arnd Bergmann 
Date: Fri, 17 Jun 2016 18:15:30 +0200

> The conversion to the 64-bit time based ptp methods left two instances
> of 'struct timespec' in place. This is harmless because 64-bit
> architectures define timespec64 as timespec, and this driver is
> not used on 32-bit machines.
> 
> However, using 'struct timespec64' directly is obviously the right
> thing to do, and will help us remove 'struct timespec' in the future.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: b9acf24f779c ("ptp: tilegx: convert to the 64 bit get/set time 
> methods.")

Applied, thanks.

Re: [very-RFC 0/8] TSN driver for the kernel

2016-06-17 Thread Takashi Sakamoto

Hi,

Sorry to be late. In this weekday, I have little time for this thread
because working for alsa-lib[1]. Besides, I'm not full-time developer
for this kind of work. In short, I use my limited private time for this
discussion.

On Jun 15 2016 17:06, Richard Cochran wrote:
> On Wed, Jun 15, 2016 at 12:15:24PM +0900, Takashi Sakamoto wrote:
>>> On Mon, Jun 13, 2016 at 01:47:13PM +0200, Richard Cochran wrote:
 I have seen audio PLL/multiplier chips that will take, for example, a
 10 kHz input and produce your 48 kHz media clock.  With the right HW
 design, you can tell your PTP Hardware Clock to produce a 1 PPS,
 and you will have a synchronized AVB endpoint.  The software is all
 there already.  Somebody should tell the ALSA guys about it.
>>
>> Just from my curiosity, could I ask you more explanation for it in ALSA
>> side?
> 
> (Disclaimer: I really don't know too much about ALSA, expect that is
> fairly big and complex ;)

In this morning, I read IEEE 1722:2011 and realized that it quite
roughly refers to IEC 61883-1/6 and includes much ambiguities to end
applications.

(In my opinion, the author just focuses on packet with timestamps,
without enough considering about how to implement endpoint applications
which perform semi-real sampling, fetching and queueing and so on, so as
you. They're satisfied just by handling packet with timestamp, without
enough consideration about actual hardware/software applications.)

> Here is what I think ALSA should provide:
> 
> - The DA and AD clocks should appear as attributes of the HW device.
> 
> - There should be a method for measuring the DA/AD clock rate with
>   respect to both the system time and the PTP Hardware Clock (PHC)
>   time.
> 
> - There should be a method for adjusting the DA/AD clock rate if
>   possible.  If not, then ALSA should fall back to sample rate
>   conversion.
> 
> - There should be a method to determine the time delay from the point
>   when the audio data are enqueued into ALSA until they pass through
>   the D/A converter.  If this cannot be known precisely, then the
>   library should provide an estimate with an error bound.
> 
> - I think some AVB use cases will need to know the time delay from A/D
>   until the data are available to the local application.  (Distributed
>   microphones?  I'm not too sure about that.)
> 
> - If the DA/AD clocks are connected to other clock devices in HW,
>   there should be a way to find this out in SW.  For example, if SW
>   can see the PTP-PHC-PLL-DA relationship from the above example, then
>   it knows how to synchronize the DA clock using the network.
> 
>   [ Implementing this point involves other subsystems beyond ALSA.  It
> isn't really necessary for people designing AVB systems, since
> they know their designs, but it would be nice to have for writing
> generic applications that can deal with any kind of HW setup. ]

Depends on which subsystem decides "AVTP presentation time"[3]. This
value is dominant to the number of events included in an IEC 61883-1
packet. If this TSN subsystem decides it, most of these items don't need
to be in ALSA.

As long as I know, the number of AVTPDU per second seems not to be
fixed. So each application is not allowed to calculate the timestamp by
its own way unless TSN implementation gives the information to each
applications.

For your information, in current ALSA implementation of IEC 61883-1/6 on
IEEE 1394 bus, the presentation timestamp is decided in ALSA side. The
number of isochronous packet transmitted per second is fixed by 8,000 in
IEEE 1394, and the number of data blocks in an IEC 61883-1 packet is
deterministic according to 'sampling transfer frequency' in IEC 61883-6
and isochronous cycle count passed from Linux FireWire subsystem.

In the TSN subsystem, like FireWire subsystem, callback for filling
payload should have information of 'when the packet is scheduled to be
transmitted'. With the information, each application can calculate the
number of event in the packet and presentation timestamp. Of cource,
this timestamp should be handled as 'avtp_timestamp' in packet queueing.

>> In ALSA, sampling rate conversion should be in userspace, not in kernel
>> land. In alsa-lib, sampling rate conversion is implemented in shared object.
>> When userspace applications start playbacking/capturing, depending on PCM
>> node to access, these applications load the shared object and convert PCM
>> frames from buffer in userspace to mmapped DMA-buffer, then commit them.
> 
> The AVB use case places an additional requirement on the rate
> conversion.  You will need to adjust the frequency on the fly, as the
> stream is playing.  I would guess that ALSA doesn't have that option?

In ALSA kernel/userspace interfaces , the specification cannot be
supported, at all.

Please explain about this requirement, where it comes from, which
specification and clause describe it (802.1AS or 802.1Q?). As long as I
read IEEE 1722, I cannot

Re: pull-request: can-next 2016-06-17,pull-request: can-next 2016-06-17

2016-06-17 Thread David Miller

From: Marc Kleine-Budde 
Date: Fri, 17 Jun 2016 16:31:46 +0200

> this is a pull request of 14 patches for net-next/master.
> 
> Geert Uytterhoeven contributes a patch that adds a file patterns for
> CAN device tree bindings to MAINTAINERS. A patch by Alexander Aring
> fixes warnings when building without proc support. A patch by me 
> improves the sample point calculation. Marek Vasut's patch converts
> the slcan driver to use CAN_MTU. A patch by William Breathitt Gray
> converts the tscan1 driver to use module_isa_driver.
> 
> Two patches by Maximilian Schneider for the gs_usb driver fix coding
> style and add support for set_phys_id callback. 5 patches by Oliver
> Hartkopp add support for CANFD to the bcm. And finally two patches
> by Ramesh Shanmugasundaram, which add support for the rcar_canfd
> driver.

Pulled, thanks Marc.

esp: Fix ESN generation under UDP encapsulation

2016-06-17 Thread Herbert Xu

On Fri, Jun 17, 2016 at 12:24:29PM +0200, Steffen Klassert wrote:
> On Wed, Jun 15, 2016 at 12:44:54AM +, Blair Steven wrote:
> > The restoration is happening - but being actioned on the wrong location.
> > 
> > The destination IP address is being saved and restored, and the SPI 
> > being written directly after the destination IP address. From my 
> > understanding though, the ESN shuffling should have saved and restored 
> > the UDP source / dest ports + SPI.
> 
> Yes, looks like we copy with a wrong offset if udp encapsulation
> is used, skb_transport_header() does not point to the esp header
> in this case. Ccing Herbert, he changed this part when switching
> to the new AEAD interface with
> commit 7021b2e1cddd ("esp4: Switch to new AEAD interface").

Thanks for catching this!

I think rather than changing the transport header (which isn't
quite right because UDP still is the transport protocol), we can
just save the offset locally.  Something like this:

---8<---
Blair Steven noticed that ESN in conjunction with UDP encapsulation
is broken because we set the temporary ESP header to the wrong spot.

This patch fixes this by first of all using the right spot, i.e.,
4 bytes off the real ESP header, and then saving this information
so that after encryption we can restore it properly.

Fixes: 7021b2e1cddd ("esp4: Switch to new AEAD interface")
Reported-by: Blair Steven 
Signed-off-by: Herbert Xu 

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 4779374..d95631d 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -23,6 +23,11 @@ struct esp_skb_cb {
void *tmp;
 };
 
+struct esp_output_extra {
+   __be32 seqhi;
+   u32 esphoff;
+};
+
 #define ESP_SKB_CB(__skb) ((struct esp_skb_cb *)&((__skb)->cb[0]))
 
 static u32 esp4_get_mtu(struct xfrm_state *x, int mtu);
@@ -35,11 +40,11 @@ static u32 esp4_get_mtu(struct xfrm_state *x, int mtu);
  *
  * TODO: Use spare space in skb for this where possible.
  */
-static void *esp_alloc_tmp(struct crypto_aead *aead, int nfrags, int seqhilen)
+static void *esp_alloc_tmp(struct crypto_aead *aead, int nfrags, int extralen)
 {
unsigned int len;
 
-   len = seqhilen;
+   len = extralen;
 
len += crypto_aead_ivsize(aead);
 
@@ -57,15 +62,16 @@ static void *esp_alloc_tmp(struct crypto_aead *aead, int 
nfrags, int seqhilen)
return kmalloc(len, GFP_ATOMIC);
 }
 
-static inline __be32 *esp_tmp_seqhi(void *tmp)
+static inline void *esp_tmp_extra(void *tmp)
 {
-   return PTR_ALIGN((__be32 *)tmp, __alignof__(__be32));
+   return PTR_ALIGN(tmp, __alignof__(struct esp_output_extra));
 }
-static inline u8 *esp_tmp_iv(struct crypto_aead *aead, void *tmp, int seqhilen)
+
+static inline u8 *esp_tmp_iv(struct crypto_aead *aead, void *tmp, int extralen)
 {
return crypto_aead_ivsize(aead) ?
-  PTR_ALIGN((u8 *)tmp + seqhilen,
-crypto_aead_alignmask(aead) + 1) : tmp + seqhilen;
+  PTR_ALIGN((u8 *)tmp + extralen,
+crypto_aead_alignmask(aead) + 1) : tmp + extralen;
 }
 
 static inline struct aead_request *esp_tmp_req(struct crypto_aead *aead, u8 
*iv)
@@ -99,7 +105,7 @@ static void esp_restore_header(struct sk_buff *skb, unsigned 
int offset)
 {
struct ip_esp_hdr *esph = (void *)(skb->data + offset);
void *tmp = ESP_SKB_CB(skb)->tmp;
-   __be32 *seqhi = esp_tmp_seqhi(tmp);
+   __be32 *seqhi = esp_tmp_extra(tmp);
 
esph->seq_no = esph->spi;
esph->spi = *seqhi;
@@ -107,7 +113,11 @@ static void esp_restore_header(struct sk_buff *skb, 
unsigned int offset)
 
 static void esp_output_restore_header(struct sk_buff *skb)
 {
-   esp_restore_header(skb, skb_transport_offset(skb) - sizeof(__be32));
+   void *tmp = ESP_SKB_CB(skb)->tmp;
+   struct esp_output_extra *extra = esp_tmp_extra(tmp);
+
+   esp_restore_header(skb, skb_transport_offset(skb) + extra->esphoff -
+   sizeof(__be32));
 }
 
 static void esp_output_done_esn(struct crypto_async_request *base, int err)
@@ -121,6 +131,7 @@ static void esp_output_done_esn(struct crypto_async_request 
*base, int err)
 static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 {
int err;
+   struct esp_output_extra *extra;
struct ip_esp_hdr *esph;
struct crypto_aead *aead;
struct aead_request *req;
@@ -137,8 +148,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff 
*skb)
int tfclen;
int nfrags;
int assoclen;
-   int seqhilen;
-   __be32 *seqhi;
+   int extralen;
__be64 seqno;
 
/* skb is pure payload to encrypt */
@@ -166,21 +176,21 @@ static int esp_output(struct xfrm_state *x, struct 
sk_buff *skb)
nfrags = err;
 
assoclen = sizeof(*esph);
-   seqhilen = 0;
+   extralen = 0;
 
if (x->props.flags & XFRM_STATE_ESN) {
-   seqhilen

Re: [patch net 0/2] mlxsw: couple of fixes

2016-06-17 Thread David Miller

From: Jiri Pirko 
Date: Fri, 17 Jun 2016 15:09:04 +0200

> Couple of slowpath tx stats fixes for Spectrum and SwitchX-2.

Series applied, thanks Jiri.

Re: [PATCH 0/6] eBPF JIT for PPC64

2016-06-17 Thread mpe

On 2016-06-13 15:40, Naveen N. Rao wrote:

On 2016/06/10 10:47PM, David Miller wrote:

From: "Naveen N. Rao" 
Date: Tue,  7 Jun 2016 19:02:17 +0530

> Please note that patch [2] is a pre-requisite for this patchset, and is
> not yet upstream.
 ...
> [1] http://thread.gmane.org/gmane.linux.kernel/2188694
> [2] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/96514

Because of #2 I don't think I can take this directly into the 
networking

tree, right?

Therefore, how would you like this to be merged?

Hi David,
Thanks for asking. Yes, I think it is better to take this through the
powerpc tree as all the changes are contained within arch/powerpc,
unless Michael Ellerman feels differently.

Michael?

Yeah I was planning to take it.

I put it in my test tree last night but it broke the build for some 
configs.

Once that is fixed I'll take it via powerpc#next.

cheers

Re: [PATCH net 1/1] tipc: fix socket timer deadlock

2016-06-17 Thread David Miller

From: Jon Maloy 
Date: Fri, 17 Jun 2016 06:35:57 -0400

> We sometimes observe a 'deadly embrace' type deadlock occurring
> between mutually connected sockets on the same node. This happens
> when the one-hour peer supervision timers happen to expire
> simultaneously in both sockets.
> 
> The scenario is as follows:
 ...
> Further analysis reveals that there are three different locations in the
> socket code where tipc_sk_respond() is called within the context of the
> socket lock, with ensuing risk of similar deadlocks.
> 
> We now solve this by passing a buffer queue along with all upcalls where
> sk_lock.slock may potentially be held. Response or rejected message
> buffers are accumulated into this queue instead of being sent out
> directly, and only sent once we know we are safely outside the slock
> context.
> 
> Reported-by: GUNA 
> Acked-by: Ying Xue 
> Signed-off-by: Jon Maloy 

Applied, thanks Jon.

Re: [PATCH v2 net-next 0/3] net sched action timestamp improvements

2016-06-17 Thread David Miller

From: Jamal Hadi Salim 
Date: Fri, 17 Jun 2016 07:15:27 -0400

> Ok, mystery solved. Dave, this patch series is missing from current
> net-next.

Then what is:

commit 53eb440f4ada034ea43b295891feec3df0fa7a29
Author: Jamal Hadi Salim 
Date:   Mon Jun 6 06:32:54 2016 -0400

net sched actions: introduce timestamp for firsttime use

Useful to know when the action was first used for accounting
(and debugging)

Signed-off-by: Jamal Hadi Salim

Re: [PATCH v3 0/2] net: ethernet: ti: cpsw: delete rx_descs property

2016-06-17 Thread David Miller

From: Ivan Khoronzhuk 
Date: Fri, 17 Jun 2016 13:25:38 +0300

> There is no reason in rx_descs property because davinici_cpdma
> driver splits pool of descriptors equally between tx and rx channels.
> So, this patch series makes driver to use available number of
> descriptors for rx channels.
> 
> Based on
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
> master
> 
> Since v2:
> - add declaration of buf_num in correct order

Series applied, thanks.

Re: [patch -next] tipc: potential shift wrapping bug in map_set()

2016-06-17 Thread David Miller

From: Dan Carpenter 
Date: Fri, 17 Jun 2016 12:22:26 +0300

> "up_map" is a u64 type but we're not using the high 32 bits.
> 
> Fixes: 35c55c9877f8 ('tipc: add neighbor monitoring framework')
> Signed-off-by: Dan Carpenter 

Applied.

Re: [PATCH net-next 0/3] net: vrf: Fix ipv6 source address selection

2016-06-17 Thread David Miller

From: David Ahern 
Date: Thu, 16 Jun 2016 16:24:23 -0700

> IPv6 address selection is currently messed up for several use cases such
> as unnumbered deployments with global addresses on the VRF device and none
> on the enslaved devices.
> 
> Update the source address selection to consider the real output route as
> opposed to the VRF route that sends packets to the VRF device first (ie.,
> implement get_saddr6 similar to the IPv4 method) and update the IPv6
> address selection to consider L3 domains and preference for addresses on
> the VRF device).

Series applied.

Re: [PATCH] net: lantiq_etop: remove unused variable

2016-06-17 Thread David Miller

From: Sudip Mukherjee 
Date: Thu, 16 Jun 2016 22:19:31 +0100

> The variable i was declared but was never used and we were getting a
> build warning for that.
> 
> Signed-off-by: Sudip Mukherjee 

Applied, thanks.

Re: [PATCH net-next 0/8] tou: Transports over UDP - part I

2016-06-17 Thread Tom Herbert

> We also now have to debug against every single userland TCP
> implementation someone can come up with, the barrier to entry is
> insanely low with TOU.  Maybe this sounds great to you, but to me
> it is quite terrifying
>
No, it doesn't sound great, but the major problem we have is that
Android and to some extent iOS & Windows take a long time to update
the kernel, and it can take an _extremely_ long time if we need them
to actively enable features that are needed by applications. For
instance, TFO was put in the Linux several years ago, but it still
hasn't been enabled in Android and only fairly recently enabled in
iOS. If we (e.g. Facebook) implement a userspace stack in clients and
control the stack in our servers we can roll out a feature like that
in a couple of months. I don't see anyway to fix other than trying to
take control of our own destiny.  It seems like it's either this or
use something like QUIC which bypasses the kernel completely and
discards TCP-- we have far too much invested to commit to that
alternative at this point.

> This sounds really great on paper, however I find it hard to believe
> that makers of middleware boxes are just going to throw their hands in
> the air and say "oh well, we lose."
>
Happy Eyeballs is implied. There's pretty good data that the majority
(>90%) of the Internet will pass UDP without issue, QUIC is already
seeing good deployment and there are several UDP based protocols
productively deployed. There is also an effort in IETF, PLUS,  to
define a substrate layer that makes UDP based transport protocol
palatable to middleboxes with some sort of signaling.

> Rather, I think people are going to start adding rules to block TOU
> tunnels entirely because they cannot inspect nor conditionally
> filter/rewrite the contents.  This is even more likely if Joe Random
> and so easily can do their own userland TCP stack over TOU.
>
Unfortunately, encryption is the only proven solution to protocol
ossification. If the network doesn't see it, it can't ossify it.

> BTW, I have a question about the pseudo header checksum.  If the outer
> IP addresses can change, due to NAT or whatever, and therefore the
> session key is used for connection demuxing instead of the addresses,
> why don't we also have to use the session key instead of the IP
> addresses for the pseudo header used in checksum calculations?

Yes, the inner checksum will need to be handled differently in case of
NAT. Need to update draft to take that into account.

Thanks,
Tom

Re: [net-next PATCH v3 00/17] Future-proof tunnel offload handlers

2016-06-17 Thread David Miller

From: Alexander Duyck 
Date: Thu, 16 Jun 2016 12:20:35 -0700

> s patch is meant to address two things.  First we are currently using
> the ndo_add/del_vxlan_port calls with VXLAN-GPE tunnels and we cannot
> really support that as it is likely to cause more harm than good since
> VXLAN-GPE can support tunnels without a MAC address on the inner header.
> 
> As such we need to add a new offload to advertise this, but in doing so it
> would mean introducing 3 new functions for the driver to request the ports,
> and then for the tunnel to push the changes to add and delete the ports to
> the device.  However instead of taking that approach I think it would be
> much better if we just made one common function for fetching the ports, and
> provided a generic means to push the tunnels to the device.  So in order to
> make this work this patch set does several things.
 ...

Series applied, thanks Alexander.

Tom, I've heard your arguments, but I think your fears are unfounded.
Look at what Alexander's patches are actually doing.

First, he's fixing a bug.  Hardware that supports VXLAN offloading
doesn't support VXLAN-GPE, yet we were sending such things to the
VXLAN offload paths.

Second, he's eliminating a metric ton of Kconfig garbage, as drivers
had to have all kinds of contrived dependencies to support UDP tunnel
offloads.

Third, he's consolidating several driver NDO methods into just two.

And finally, he added a big comment explaining that new tunnel types
should not be added to the tunnel type list, and those that exist
should only be used for RX.

Therefore, this isn't openning the door for new random offloads, quite
the contrary.  Instead, if it making clearer what the existing
facilitites support, and putting an explicit cap on them.

Thanks.

Re: [PATCH net-next 0/8] tou: Transports over UDP - part I

2016-06-17 Thread David Miller

From: Tom Herbert 
Date: Thu, 16 Jun 2016 10:51:54 -0700

> The goal of this work is twofold:
> 
> 1) Allow applications to run their own transport layer stack (i.e.from
>userspace). This eliminates dependencies on the OS (e.g. solves a
>major dependency issue for Facebook on clients).

Clients need to support TOU in their kernels, it's a similar kind of
dependency, but of course you only need to propagate it once rather
than once per desired stack change.

We also now have to debug against every single userland TCP
implementation someone can come up with, the barrier to entry is
insanely low with TOU.  Maybe this sounds great to you, but to me
it is quite terrifying

This has a monumental impact upon maintainence of our TCP stack.

I suspect a lot of bug reports will go straight to /dev/null once it
is clear that it is a TOU TCP connection.

For TCP the tight integration into the kernel is a benefit, because it
limits the number of variables at stake when trying to analyze
problems.

> 2) Make transport layer headers (all of L4) invisible to the network
>so that they can't do intrusive actions at L4. This will be enforced
>with DTLS in use.

This achievement is only met when DTLS is enabled, and it isn't enabled
by default.

This sounds really great on paper, however I find it hard to believe
that makers of middleware boxes are just going to throw their hands in
the air and say "oh well, we lose."

Rather, I think people are going to start adding rules to block TOU
tunnels entirely because they cannot inspect nor conditionally
filter/rewrite the contents.  This is even more likely if Joe Random
and so easily can do their own userland TCP stack over TOU.

BTW, I have a question about the pseudo header checksum.  If the outer
IP addresses can change, due to NAT or whatever, and therefore the
session key is used for connection demuxing instead of the addresses,
why don't we also have to use the session key instead of the IP
addresses for the pseudo header used in checksum calculations?

[PATCH] ibmvnic: fix to use list_for_each_safe() when delete items

2016-06-17 Thread weiyj_lk

From: Wei Yongjun 

Since we will remove items off the list using list_del() we need
to use a safe version of the list_for_each() macro aptly named
list_for_each_safe().

Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 864cb21..0b6a922 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3141,14 +3141,14 @@ static void handle_request_ras_comp_num_rsp(union 
ibmvnic_crq *crq,
 
 static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
 {
-   struct ibmvnic_inflight_cmd *inflight_cmd;
+   struct ibmvnic_inflight_cmd *inflight_cmd, *tmp1;
struct device *dev = >vdev->dev;
-   struct ibmvnic_error_buff *error_buff;
+   struct ibmvnic_error_buff *error_buff, *tmp2;
unsigned long flags;
unsigned long flags2;
 
spin_lock_irqsave(>inflight_lock, flags);
-   list_for_each_entry(inflight_cmd, >inflight, list) {
+   list_for_each_entry_safe(inflight_cmd, tmp1, >inflight, list) {
switch (inflight_cmd->crq.generic.cmd) {
case LOGIN:
dma_unmap_single(dev, adapter->login_buf_token,
@@ -3165,8 +3165,8 @@ static void ibmvnic_free_inflight(struct ibmvnic_adapter 
*adapter)
break;
case REQUEST_ERROR_INFO:
spin_lock_irqsave(>error_list_lock, flags2);
-   list_for_each_entry(error_buff, >errors,
-   list) {
+   list_for_each_entry_safe(error_buff, tmp2,
+>errors, list) {
dma_unmap_single(dev, error_buff->dma,
 error_buff->len,
 DMA_FROM_DEVICE);

Re: [PATCH 0/7] Netfilter fixes for net

2016-06-17 Thread David Miller

From: Pablo Neira Ayuso 
Date: Fri, 17 Jun 2016 20:25:12 +0200

> The following patchset contains Netfilter fixes for your net tree,
> they are rather small patches but fixing several outstanding bugs in
> nf_conntrack and nf_tables, as well as minor problems with missing
> SYNPROXY header uapi installation:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks Pablo.

[PATCH iproute2 -master] ss: Add tcp_info fields data_segs_in/out

2016-06-17 Thread Martin KaFai Lau

tcp_info fields, data_segs_in and data_segs_out, have been added to the
kernel in commit a44d6eacdaf5 ("tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In")
since kernel 4.6.

This patch supports those fileds in ss:

ESTAB  801736 360face:face:face:face::1:22  
face:face:face:face::face:46779
 cubic wscale:9,7 rto:223 rtt:22.195/8.202 ato:40 mss:1428 cwnd:11 
ssthresh:7 bytes_acked:203649 bytes_received:334034603 segs_out:18513 
segs_in:241825 data_segs_out:4192 data_segs_in:241672 send 5.7Mbps lastsnd:2 
lastack:3 pacing_rate 6.8Mbps unacked:10 retrans:0/1 rcv_rtt:29.375 
rcv_space:1241704 minrtt:0.013

Signed-off-by: Martin KaFai Lau 
---
 misc/ss.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/misc/ss.c b/misc/ss.c
index 9c456d4..02be7e7 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -766,6 +766,8 @@ struct tcpstat {
unsigned long long  bytes_received;
unsigned intsegs_out;
unsigned intsegs_in;
+   unsigned intdata_segs_out;
+   unsigned intdata_segs_in;
unsigned intunacked;
unsigned intretrans;
unsigned intretrans_total;
@@ -1716,6 +1718,10 @@ static void tcp_stats_print(struct tcpstat *s)
printf(" segs_out:%u", s->segs_out);
if (s->segs_in)
printf(" segs_in:%u", s->segs_in);
+   if (s->data_segs_out)
+   printf(" data_segs_out:%u", s->data_segs_out);
+   if (s->data_segs_in)
+   printf(" data_segs_in:%u", s->data_segs_in);
 
if (s->dctcp && s->dctcp->enabled) {
struct dctcpstat *dctcp = s->dctcp;
@@ -2022,6 +2028,8 @@ static void tcp_show_info(const struct nlmsghdr *nlh, 
struct inet_diag_msg *r,
s.bytes_received = info->tcpi_bytes_received;
s.segs_out = info->tcpi_segs_out;
s.segs_in = info->tcpi_segs_in;
+   s.data_segs_out = info->tcpi_data_segs_out;
+   s.data_segs_in = info->tcpi_data_segs_in;
s.not_sent = info->tcpi_notsent_bytes;
if (info->tcpi_min_rtt && info->tcpi_min_rtt != ~0U)
s.min_rtt = (double) info->tcpi_min_rtt / 1000;
-- 
2.5.1

Re: [iproute PATCH 1/3] Use C99 style initializers everywhere

2016-06-17 Thread Stephen Hemminger

On Sat, 18 Jun 2016 02:02:21 +0200
Phil Sutter  wrote:

> On Fri, Jun 17, 2016 at 11:57:53AM -0700, Stephen Hemminger wrote:
> > It makes sense that if you can build a kernel with old toolchain, that
> > iproute2 needs to be buildable as well.
> > 
> > The current kernels are documented to require 3.2 or later.  
> 
> So in your opinion we should stay compatible to gcc-3.2? Clarifying
> requirements like this one would make sense in order to know what to
> check against.

I want to have the least user complaints possible.
Since the main reason people build new iproute2 utilities is to work with 
features
of newer kernels. It makes sense to keep the requirements the same. Not sure if 
kernel
still builds with gcc-3.2, has anyone tried.

Re: [for-next 0/1] [pull request] Mellanox ConnectX-4 Shared Code

2016-06-17 Thread Doug Ledford

On 6/16/2016 12:37 AM, David Miller wrote:
> From: Leon Romanovsky 
> Date: Tue, 14 Jun 2016 13:29:11 +0300
> 
>> are available in the git repository at:
>>
>>   g...@gitolite.kernel.org:pub/scm/linux/kernel/git/leon/linux-rdma 
>> tags/shared
> 
> You need to learn how to publish a public URL of your tree.  That's your
> private one that only you can access.
> 
> Anyways, I pulled the correct URL into net-next.
> 

Likewise, pulled into my mlx5-4.8 topic branch.

-- 
Doug Ledford 
GPG Key ID: 0E572FDD



signature.asc
Description: OpenPGP digital signature

[PATCH v3 net-next v3 10/14] net: dsa: mv88e6xxx: add SMI init helper

2016-06-17 Thread Vivien Didelot

Add an helper function to isolate SMI specific assignations and checks.

This function will later help choosing the different SMI accesses based
of the compatible info.

Since the chip structure is already allocated in the legacy probe, use
the mv88e6xxx_reg_read access routine instead of __mv88e6xxx_reg_read.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 113092a..4e24ac5 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3616,6 +3616,19 @@ static struct mv88e6xxx_priv_state 
*mv88e6xxx_alloc_chip(struct device *dev)
return ps;
 }
 
+static int mv88e6xxx_smi_init(struct mv88e6xxx_priv_state *ps,
+ struct mii_bus *bus, int sw_addr)
+{
+   /* ADDR[0] pin is unavailable externally and considered zero */
+   if (sw_addr & 0x1)
+   return -EINVAL;
+
+   ps->bus = bus;
+   ps->sw_addr = sw_addr;
+
+   return 0;
+}
+
 static const char *mv88e6xxx_drv_probe(struct device *dsa_dev,
   struct device *host_dev, int sw_addr,
   void **priv)
@@ -3635,7 +3648,11 @@ static const char *mv88e6xxx_drv_probe(struct device 
*dsa_dev,
if (!ps)
return NULL;
 
-   id = __mv88e6xxx_reg_read(bus, sw_addr, REG_PORT(0), PORT_SWITCH_ID);
+   err = mv88e6xxx_smi_init(ps, bus, sw_addr);
+   if (err)
+   goto free;
+
+   id = mv88e6xxx_reg_read(ps, REG_PORT(0), PORT_SWITCH_ID);
if (id < 0)
goto free;
 
@@ -3648,8 +3665,6 @@ static const char *mv88e6xxx_drv_probe(struct device 
*dsa_dev,
 
name = info->name;
 
-   ps->bus = bus;
-   ps->sw_addr = sw_addr;
ps->info = info;
 
err = mv88e6xxx_mdio_register(ps, NULL);
@@ -3741,8 +3756,9 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
if (!ps)
return -ENOMEM;
 
-   ps->bus = mdiodev->bus;
-   ps->sw_addr = mdiodev->addr;
+   err = mv88e6xxx_smi_init(ps, mdiodev->bus, mdiodev->addr);
+   if (err)
+   return err;
 
id = mv88e6xxx_reg_read(ps, REG_PORT(0), PORT_SWITCH_ID);
if (id < 0)
-- 
2.8.3

[PATCH v3 net-next v3 01/14] net: dsa: mv88e6xxx: fix style issues

2016-06-17 Thread Vivien Didelot

This patch fixes 5 style problems reported by checkpatch:

WARNING: suspect code indent for conditional statements (8, 24)
#492: FILE: drivers/net/dsa/mv88e6xxx.c:492:
+   if (phydev->link)
+   reg |= PORT_PCS_CTRL_LINK_UP;

CHECK: Logical continuations should be on the previous line
#1318: FILE: drivers/net/dsa/mv88e6xxx.c:1318:
+oldstate == PORT_CONTROL_STATE_FORWARDING)
+   && (state == PORT_CONTROL_STATE_DISABLED ||

CHECK: multiple assignments should be avoided
#1662: FILE: drivers/net/dsa/mv88e6xxx.c:1662:
+   vlan->vid_begin = vlan->vid_end = next.vid;

WARNING: line over 80 characters
#2097: FILE: drivers/net/dsa/mv88e6xxx.c:2097:
+  const struct switchdev_obj_port_vlan 
*vlan,

WARNING: suspect code indent for conditional statements (16, 32)
#2734: FILE: drivers/net/dsa/mv88e6xxx.c:2734:
+   if (mv88e6xxx_6352_family(ps) || mv88e6xxx_6351_family(ps) ||
[...]
+   reg |= PORT_CONTROL_EGRESS_ADD_TAG;

total: 0 errors, 3 warnings, 2 checks, 3805 lines checked

It also rebases and integrates changes sent by Ben Dooks [1]:

The driver has a number of functions that are not exported or
declared elsewhere, so make them static to avoid the following
warnings from sparse:

drivers/net/dsa/mv88e6xxx.c:113:5: warning: symbol 'mv88e6xxx_reg_read' was 
not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:167:5: warning: symbol 'mv88e6xxx_reg_write' 
was not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:231:5: warning: symbol 'mv88e6xxx_set_addr' was 
not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:367:6: warning: symbol 
'mv88e6xxx_ppu_state_init' was not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:3157:5: warning: symbol 
'mv88e6xxx_phy_page_read' was not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:3169:5: warning: symbol 
'mv88e6xxx_phy_page_write' was not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:3583:26: warning: symbol 
'mv88e6xxx_switch_driver' was not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:3621:5: warning: symbol 'mv88e6xxx_probe' was 
not declared. Should it be static?

[1] http://patchwork.ozlabs.org/patch/632708/

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx.c | 42 +++---
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index ee06055..1972ec5 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -111,7 +111,8 @@ static int _mv88e6xxx_reg_read(struct mv88e6xxx_priv_state 
*ps,
return ret;
 }
 
-int mv88e6xxx_reg_read(struct mv88e6xxx_priv_state *ps, int addr, int reg)
+static int mv88e6xxx_reg_read(struct mv88e6xxx_priv_state *ps, int addr,
+ int reg)
 {
int ret;
 
@@ -165,8 +166,8 @@ static int _mv88e6xxx_reg_write(struct mv88e6xxx_priv_state 
*ps, int addr,
return __mv88e6xxx_reg_write(ps->bus, ps->sw_addr, addr, reg, val);
 }
 
-int mv88e6xxx_reg_write(struct mv88e6xxx_priv_state *ps, int addr,
-   int reg, u16 val)
+static int mv88e6xxx_reg_write(struct mv88e6xxx_priv_state *ps, int addr,
+  int reg, u16 val)
 {
int ret;
 
@@ -229,7 +230,7 @@ static int mv88e6xxx_set_addr_indirect(struct dsa_switch 
*ds, u8 *addr)
return 0;
 }
 
-int mv88e6xxx_set_addr(struct dsa_switch *ds, u8 *addr)
+static int mv88e6xxx_set_addr(struct dsa_switch *ds, u8 *addr)
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
 
@@ -370,7 +371,7 @@ static void mv88e6xxx_ppu_access_put(struct 
mv88e6xxx_priv_state *ps)
mutex_unlock(>ppu_mutex);
 }
 
-void mv88e6xxx_ppu_state_init(struct mv88e6xxx_priv_state *ps)
+static void mv88e6xxx_ppu_state_init(struct mv88e6xxx_priv_state *ps)
 {
mutex_init(>ppu_mutex);
INIT_WORK(>ppu_work, mv88e6xxx_ppu_reenable_work);
@@ -490,7 +491,7 @@ static void mv88e6xxx_adjust_link(struct dsa_switch *ds, 
int port,
 
reg |= PORT_PCS_CTRL_FORCE_LINK;
if (phydev->link)
-   reg |= PORT_PCS_CTRL_LINK_UP;
+   reg |= PORT_PCS_CTRL_LINK_UP;
 
if (mv88e6xxx_6065_family(ps) && phydev->speed > SPEED_100)
goto out;
@@ -1314,9 +1315,9 @@ static int _mv88e6xxx_port_state(struct 
mv88e6xxx_priv_state *ps, int port,
 * Blocking or Listening state.
 */
if ((oldstate == PORT_CONTROL_STATE_LEARNING ||
-oldstate == PORT_CONTROL_STATE_FORWARDING)
-   && (state == PORT_CONTROL_STATE_DISABLED ||
-   state ==

[PATCH v3 net-next v3 02/14] net: dsa: mv88e6xxx: remove redundant assignments

2016-06-17 Thread Vivien Didelot

The chip->ds and ds->slave_mii_bus assignments are common to both legacy
and new MDIO probing and are already done in the later setup code.

Remove the duplicated assignments from the MDIO probing code.

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 1972ec5..673283a 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3708,7 +3708,6 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
ds->priv = ps;
ds->dev = dev;
ps->dev = dev;
-   ps->ds = ds;
ps->bus = mdiodev->bus;
ps->sw_addr = mdiodev->addr;
mutex_init(>smi_mutex);
@@ -3748,8 +3747,6 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
if (err)
return err;
 
-   ds->slave_mii_bus = ps->mdio_bus;
-
dev_set_drvdata(dev, ds);
 
err = dsa_register_switch(ds, mdiodev->dev.of_node);
-- 
2.8.3

[PATCH v3 net-next v3 03/14] net: dsa: mv88e6xxx: use already declared variables

2016-06-17 Thread Vivien Didelot

In the MDIO probing function, dev is already assigned to >dev
and np is already assigned to mdiodev->dev.of_node, so use them.

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 673283a..b3170ea 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3728,7 +3728,7 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
if (!ps->info)
return -ENODEV;
 
-   ps->reset = devm_gpiod_get(>dev, "reset", GPIOD_ASIS);
+   ps->reset = devm_gpiod_get(dev, "reset", GPIOD_ASIS);
if (IS_ERR(ps->reset)) {
err = PTR_ERR(ps->reset);
if (err == -ENOENT) {
@@ -3743,13 +3743,13 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
!of_property_read_u32(np, "eeprom-length", _len))
ps->eeprom_len = eeprom_len;
 
-   err = mv88e6xxx_mdio_register(ps, mdiodev->dev.of_node);
+   err = mv88e6xxx_mdio_register(ps, np);
if (err)
return err;
 
dev_set_drvdata(dev, ds);
 
-   err = dsa_register_switch(ds, mdiodev->dev.of_node);
+   err = dsa_register_switch(ds, np);
if (err) {
mv88e6xxx_mdio_unregister(ps);
return err;
-- 
2.8.3

[PATCH v3 net-next v3 08/14] net: dsa: mv88e6xxx: rename smi_mutex to reg_lock

2016-06-17 Thread Vivien Didelot

The chip smi_mutex mutex is used to protect the access to the internal
switch registers, not only the Multi-chip Addressing Mode, as commented.

Other registers access (like management frames) may use this mutex.

Since we will isolate SMI-specific pieces of code, avoid the confusion
now by renaming smi_mutex to reg_lock. No functional changes here.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 120 ++--
 drivers/net/dsa/mv88e6xxx.h |   7 +--
 2 files changed, 62 insertions(+), 65 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 75e5408..2c86172 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -29,10 +29,10 @@
 #include 
 #include "mv88e6xxx.h"
 
-static void assert_smi_lock(struct mv88e6xxx_priv_state *ps)
+static void assert_reg_lock(struct mv88e6xxx_priv_state *ps)
 {
-   if (unlikely(!mutex_is_locked(>smi_mutex))) {
-   dev_err(ps->dev, "SMI lock not held!\n");
+   if (unlikely(!mutex_is_locked(>reg_lock))) {
+   dev_err(ps->dev, "Switch registers lock not held!\n");
dump_stack();
}
 }
@@ -99,7 +99,7 @@ static int _mv88e6xxx_reg_read(struct mv88e6xxx_priv_state 
*ps,
 {
int ret;
 
-   assert_smi_lock(ps);
+   assert_reg_lock(ps);
 
ret = __mv88e6xxx_reg_read(ps->bus, ps->sw_addr, addr, reg);
if (ret < 0)
@@ -116,9 +116,9 @@ static int mv88e6xxx_reg_read(struct mv88e6xxx_priv_state 
*ps, int addr,
 {
int ret;
 
-   mutex_lock(>smi_mutex);
+   mutex_lock(>reg_lock);
ret = _mv88e6xxx_reg_read(ps, addr, reg);
-   mutex_unlock(>smi_mutex);
+   mutex_unlock(>reg_lock);
 
return ret;
 }
@@ -158,7 +158,7 @@ static int __mv88e6xxx_reg_write(struct mii_bus *bus, int 
sw_addr, int addr,
 static int _mv88e6xxx_reg_write(struct mv88e6xxx_priv_state *ps, int addr,
int reg, u16 val)
 {
-   assert_smi_lock(ps);
+   assert_reg_lock(ps);
 
dev_dbg(ps->dev, "-> addr: 0x%.2x reg: 0x%.2x val: 0x%.4x\n",
addr, reg, val);
@@ -171,9 +171,9 @@ static int mv88e6xxx_reg_write(struct mv88e6xxx_priv_state 
*ps, int addr,
 {
int ret;
 
-   mutex_lock(>smi_mutex);
+   mutex_lock(>reg_lock);
ret = _mv88e6xxx_reg_write(ps, addr, reg, val);
-   mutex_unlock(>smi_mutex);
+   mutex_unlock(>reg_lock);
 
return ret;
 }
@@ -320,7 +320,7 @@ static void mv88e6xxx_ppu_reenable_work(struct work_struct 
*ugly)
 
ps = container_of(ugly, struct mv88e6xxx_priv_state, ppu_work);
 
-   mutex_lock(>smi_mutex);
+   mutex_lock(>reg_lock);
 
if (mutex_trylock(>ppu_mutex)) {
if (mv88e6xxx_ppu_enable(ps) == 0)
@@ -328,7 +328,7 @@ static void mv88e6xxx_ppu_reenable_work(struct work_struct 
*ugly)
mutex_unlock(>ppu_mutex);
}
 
-   mutex_unlock(>smi_mutex);
+   mutex_unlock(>reg_lock);
 }
 
 static void mv88e6xxx_ppu_reenable_timer(unsigned long _ps)
@@ -477,7 +477,7 @@ static void mv88e6xxx_adjust_link(struct dsa_switch *ds, 
int port,
if (!phy_is_pseudo_fixed_link(phydev))
return;
 
-   mutex_lock(>smi_mutex);
+   mutex_lock(>reg_lock);
 
ret = _mv88e6xxx_reg_read(ps, REG_PORT(port), PORT_PCS_CTRL);
if (ret < 0)
@@ -528,7 +528,7 @@ static void mv88e6xxx_adjust_link(struct dsa_switch *ds, 
int port,
_mv88e6xxx_reg_write(ps, REG_PORT(port), PORT_PCS_CTRL, reg);
 
 out:
-   mutex_unlock(>smi_mutex);
+   mutex_unlock(>reg_lock);
 }
 
 static int _mv88e6xxx_stats_wait(struct mv88e6xxx_priv_state *ps)
@@ -753,11 +753,11 @@ static void mv88e6xxx_get_ethtool_stats(struct dsa_switch 
*ds, int port,
int ret;
int i, j;
 
-   mutex_lock(>smi_mutex);
+   mutex_lock(>reg_lock);
 
ret = _mv88e6xxx_stats_snapshot(ps, port);
if (ret < 0) {
-   mutex_unlock(>smi_mutex);
+   mutex_unlock(>reg_lock);
return;
}
for (i = 0, j = 0; i < ARRAY_SIZE(mv88e6xxx_hw_stats); i++) {
@@ -768,7 +768,7 @@ static void mv88e6xxx_get_ethtool_stats(struct dsa_switch 
*ds, int port,
}
}
 
-   mutex_unlock(>smi_mutex);
+   mutex_unlock(>reg_lock);
 }
 
 static int mv88e6xxx_get_regs_len(struct dsa_switch *ds, int port)
@@ -787,7 +787,7 @@ static void mv88e6xxx_get_regs(struct dsa_switch *ds, int 
port,
 
memset(p, 0xff, 32 * sizeof(u16));
 
-   mutex_lock(>smi_mutex);
+   mutex_lock(>reg_lock);
 
for (i = 0; i < 32; i++) {
int ret;
@@ -797,7 +797,7 @@ static void mv88e6xxx_get_regs(struct dsa_switch *ds, int 
port,
p[i] = ret;
}
 
-   mutex_unlock(>smi_mutex);
+   mutex_unlock(>reg_lock);
 }
 
 static int _mv88e6xxx_wait(struct mv88e6xxx_priv_state *ps, int reg, int

[PATCH v3 net-next v3 06/14] net: dsa: mv88e6xxx: use gpio get optional variant

2016-06-17 Thread Vivien Didelot

Use the optional variant to get the reset GPIO line, instead of checking
for the -ENOENT error.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index ad7735d..ec28465 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3744,16 +3744,9 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
if (!ps->info)
return -ENODEV;
 
-   ps->reset = devm_gpiod_get(dev, "reset", GPIOD_ASIS);
-   if (IS_ERR(ps->reset)) {
-   err = PTR_ERR(ps->reset);
-   if (err == -ENOENT) {
-   /* Optional, so not an error */
-   ps->reset = NULL;
-   } else {
-   return err;
-   }
-   }
+   ps->reset = devm_gpiod_get_optional(dev, "reset", GPIOD_ASIS);
+   if (IS_ERR(ps->reset))
+   return PTR_ERR(ps->reset);
 
if (mv88e6xxx_has(ps, MV88E6XXX_FLAG_EEPROM) &&
!of_property_read_u32(np, "eeprom-length", _len))
-- 
2.8.3

[PATCH v3 net-next v3 05/14] net: dsa: mv88e6xxx: add switch register helpers

2016-06-17 Thread Vivien Didelot

The mixed assignments, allocations and registrations in the probe code
make it hard to follow the logic and figure out what is DSA or chip
specific.

Extract the struct dsa_switch related code in a simple
mv88e6xxx_register_switch helper function.

For symmetry in the code, add a mv88e6xxx_unregister_switch function.

Signed-off-by: Vivien Didelot 
Reviewed-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx.c | 41 -
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 4b4bffc..ad7735d 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3690,30 +3690,48 @@ static struct dsa_switch_driver mv88e6xxx_switch_driver 
= {
.port_fdb_dump  = mv88e6xxx_port_fdb_dump,
 };
 
+static int mv88e6xxx_register_switch(struct mv88e6xxx_priv_state *ps,
+struct device_node *np)
+{
+   struct device *dev = ps->dev;
+   struct dsa_switch *ds;
+
+   ds = devm_kzalloc(dev, sizeof(*ds), GFP_KERNEL);
+   if (!ds)
+   return -ENOMEM;
+
+   ds->dev = dev;
+   ds->priv = ps;
+   ds->drv = _switch_driver;
+
+   dev_set_drvdata(dev, ds);
+
+   return dsa_register_switch(ds, np);
+}
+
+static void mv88e6xxx_unregister_switch(struct mv88e6xxx_priv_state *ps)
+{
+   dsa_unregister_switch(ps->ds);
+}
+
 static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 {
struct device *dev = >dev;
struct device_node *np = dev->of_node;
struct mv88e6xxx_priv_state *ps;
int id, prod_num, rev;
-   struct dsa_switch *ds;
u32 eeprom_len;
int err;
 
-   ds = devm_kzalloc(dev, sizeof(*ds) + sizeof(*ps), GFP_KERNEL);
-   if (!ds)
+   ps = devm_kzalloc(dev, sizeof(*ps), GFP_KERNEL);
+   if (!ps)
return -ENOMEM;
 
-   ps = (struct mv88e6xxx_priv_state *)(ds + 1);
-   ds->priv = ps;
-   ds->dev = dev;
ps->dev = dev;
ps->bus = mdiodev->bus;
ps->sw_addr = mdiodev->addr;
mutex_init(>smi_mutex);
 
-   ds->drv = _switch_driver;
-
id = mv88e6xxx_reg_read(ps, REG_PORT(0), PORT_SWITCH_ID);
if (id < 0)
return id;
@@ -3745,9 +3763,7 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
if (err)
return err;
 
-   dev_set_drvdata(dev, ds);
-
-   err = dsa_register_switch(ds, np);
+   err = mv88e6xxx_register_switch(ps, np);
if (err) {
mv88e6xxx_mdio_unregister(ps);
return err;
@@ -3764,8 +3780,7 @@ static void mv88e6xxx_remove(struct mdio_device *mdiodev)
struct dsa_switch *ds = dev_get_drvdata(>dev);
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
 
-   dsa_unregister_switch(ds);
-
+   mv88e6xxx_unregister_switch(ps);
mv88e6xxx_mdio_unregister(ps);
 }
 
-- 
2.8.3

[PATCH v3 net-next v3 04/14] net: dsa: mv88e6xxx: do not increment bus refcount

2016-06-17 Thread Vivien Didelot

The MDIO device probe and remove functions are respectively incrementing
and decrementing the bus refcount themselves. Since these bus level
actions are out of the device scope, remove them.

Signed-off-by: Vivien Didelot 
Acked-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index b3170ea..4b4bffc 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3712,8 +3712,6 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
ps->sw_addr = mdiodev->addr;
mutex_init(>smi_mutex);
 
-   get_device(>bus->dev);
-
ds->drv = _switch_driver;
 
id = mv88e6xxx_reg_read(ps, REG_PORT(0), PORT_SWITCH_ID);
@@ -3767,7 +3765,6 @@ static void mv88e6xxx_remove(struct mdio_device *mdiodev)
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
 
dsa_unregister_switch(ds);
-   put_device(>bus->dev);
 
mv88e6xxx_mdio_unregister(ps);
 }
-- 
2.8.3

[PATCH v3 net-next v3 09/14] net: dsa: mv88e6xxx: add chip allocation helper

2016-06-17 Thread Vivien Didelot

Add an helper function to allocate the chip structure at the beginning
of the probe functions. It will be used to initialize the SMI access.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 39 +++
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 2c86172..113092a 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3601,6 +3601,21 @@ static const struct mv88e6xxx_info 
*mv88e6xxx_lookup_info(unsigned int prod_num)
return NULL;
 }
 
+static struct mv88e6xxx_priv_state *mv88e6xxx_alloc_chip(struct device *dev)
+{
+   struct mv88e6xxx_priv_state *ps;
+
+   ps = devm_kzalloc(dev, sizeof(*ps), GFP_KERNEL);
+   if (!ps)
+   return NULL;
+
+   ps->dev = dev;
+
+   mutex_init(>reg_lock);
+
+   return ps;
+}
+
 static const char *mv88e6xxx_drv_probe(struct device *dsa_dev,
   struct device *host_dev, int sw_addr,
   void **priv)
@@ -3616,32 +3631,30 @@ static const char *mv88e6xxx_drv_probe(struct device 
*dsa_dev,
if (!bus)
return NULL;
 
+   ps = mv88e6xxx_alloc_chip(dsa_dev);
+   if (!ps)
+   return NULL;
+
id = __mv88e6xxx_reg_read(bus, sw_addr, REG_PORT(0), PORT_SWITCH_ID);
if (id < 0)
-   return NULL;
+   goto free;
 
prod_num = (id & 0xfff0) >> 4;
rev = id & 0x000f;
 
info = mv88e6xxx_lookup_info(prod_num);
if (!info)
-   return NULL;
+   goto free;
 
name = info->name;
 
-   ps = devm_kzalloc(dsa_dev, sizeof(*ps), GFP_KERNEL);
-   if (!ps)
-   return NULL;
-
ps->bus = bus;
ps->sw_addr = sw_addr;
ps->info = info;
-   ps->dev = dsa_dev;
-   mutex_init(>reg_lock);
 
err = mv88e6xxx_mdio_register(ps, NULL);
if (err)
-   return NULL;
+   goto free;
 
*priv = ps;
 
@@ -3649,6 +3662,10 @@ static const char *mv88e6xxx_drv_probe(struct device 
*dsa_dev,
 prod_num, name, rev);
 
return name;
+free:
+   devm_kfree(dsa_dev, ps);
+
+   return NULL;
 }
 
 static struct dsa_switch_driver mv88e6xxx_switch_driver = {
@@ -3720,14 +3737,12 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
u32 eeprom_len;
int err;
 
-   ps = devm_kzalloc(dev, sizeof(*ps), GFP_KERNEL);
+   ps = mv88e6xxx_alloc_chip(dev);
if (!ps)
return -ENOMEM;
 
-   ps->dev = dev;
ps->bus = mdiodev->bus;
ps->sw_addr = mdiodev->addr;
-   mutex_init(>reg_lock);
 
id = mv88e6xxx_reg_read(ps, REG_PORT(0), PORT_SWITCH_ID);
if (id < 0)
-- 
2.8.3

[PATCH v3 net-next v3 07/14] net: dsa: mv88e6xxx: remove table args in info lookup

2016-06-17 Thread Vivien Didelot

The mv88e6xxx_table array and the mv88e6xxx_lookup_info function are
static, so remove the table and size arguments from the lookup function.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index ec28465..75e5408 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3590,15 +3590,13 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
},
 };
 
-static const struct mv88e6xxx_info *
-mv88e6xxx_lookup_info(unsigned int prod_num, const struct mv88e6xxx_info 
*table,
- unsigned int num)
+static const struct mv88e6xxx_info *mv88e6xxx_lookup_info(unsigned int 
prod_num)
 {
int i;
 
-   for (i = 0; i < num; ++i)
-   if (table[i].prod_num == prod_num)
-   return [i];
+   for (i = 0; i < ARRAY_SIZE(mv88e6xxx_table); ++i)
+   if (mv88e6xxx_table[i].prod_num == prod_num)
+   return _table[i];
 
return NULL;
 }
@@ -3625,8 +3623,7 @@ static const char *mv88e6xxx_drv_probe(struct device 
*dsa_dev,
prod_num = (id & 0xfff0) >> 4;
rev = id & 0x000f;
 
-   info = mv88e6xxx_lookup_info(prod_num, mv88e6xxx_table,
-ARRAY_SIZE(mv88e6xxx_table));
+   info = mv88e6xxx_lookup_info(prod_num);
if (!info)
return NULL;
 
@@ -3739,8 +3736,7 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
prod_num = (id & 0xfff0) >> 4;
rev = id & 0x000f;
 
-   ps->info = mv88e6xxx_lookup_info(prod_num, mv88e6xxx_table,
-ARRAY_SIZE(mv88e6xxx_table));
+   ps->info = mv88e6xxx_lookup_info(prod_num);
if (!ps->info)
return -ENODEV;
 
-- 
2.8.3

[PATCH v3 net-next v3 13/14] net: dsa: mv88e6xxx: add addressing mode to info

2016-06-17 Thread Vivien Didelot

When the SMI address of the switch chip is zero, the chip assumes to be
the only one on the SMI master bus and thus responds to all its known
SMI devices addresses (port registers, Global2, etc.)

When its SMI address is not zero, some chips (e.g. 88E6352) use an
indirect access through two SMI Command and Data registers.

Other models (e.g. 88E6060) using less than 16 internal SMI addresses
always use a direct access.

Add a capability flag to describe chips supporting the (indirect)
Multi-chip Addressing Mode, and a low-level API to access the registers
via SMI.

Other accesses (like Ethernet management frames) may be added later.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 135 ++--
 drivers/net/dsa/mv88e6xxx.h |  19 ++-
 2 files changed, 123 insertions(+), 31 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index cadd1e3..4d32f5a 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -38,15 +38,75 @@ static void assert_reg_lock(struct mv88e6xxx_priv_state *ps)
}
 }
 
-/* If the switch's ADDR[4:0] strap pins are strapped to zero, it will
- * use all 32 SMI bus addresses on its SMI bus, and all switch registers
- * will be directly accessible on some {device address,register address}
- * pair.  If the ADDR[4:0] pins are not strapped to zero, the switch
- * will only respond to SMI transactions to that specific address, and
- * an indirect addressing mechanism needs to be used to access its
- * registers.
+/* The switch ADDR[4:1] configuration pins define the chip SMI device address
+ * (ADDR[0] is always zero, thus only even SMI addresses can be strapped).
+ *
+ * When ADDR is all zero, the chip uses Single-chip Addressing Mode, assuming 
it
+ * is the only device connected to the SMI master. In this mode it responds to
+ * all 32 possible SMI addresses, and thus maps directly the internal devices.
+ *
+ * When ADDR is non-zero, the chip uses Multi-chip Addressing Mode, allowing
+ * multiple devices to share the SMI interface. In this mode it responds to 
only
+ * 2 registers, used to indirectly access the internal SMI devices.
  */
-static int mv88e6xxx_reg_wait_ready(struct mii_bus *bus, int sw_addr)
+
+struct mv88e6xxx_smi_ops {
+   int (*read)(struct mii_bus *bus, int sw_addr,
+   int addr, int reg, u16 *val);
+   int (*write)(struct mii_bus *bus, int sw_addr,
+int addr, int reg, u16 val);
+};
+
+static int mv88e6xxx_smi_read(struct mv88e6xxx_priv_state *ps,
+ int addr, int reg, u16 *val)
+{
+   if (!ps->smi_ops)
+   return -EOPNOTSUPP;
+
+   return ps->smi_ops->read(ps->bus, ps->sw_addr, addr, reg, val);
+}
+
+static int mv88e6xxx_smi_write(struct mv88e6xxx_priv_state *ps,
+  int addr, int reg, u16 val)
+{
+   if (!ps->smi_ops)
+   return -EOPNOTSUPP;
+
+   return ps->smi_ops->write(ps->bus, ps->sw_addr, addr, reg, val);
+}
+
+static int mv88e6xxx_smi_single_chip_read(struct mii_bus *bus, int sw_addr,
+ int addr, int reg, u16 *val)
+{
+   int ret;
+
+   ret = mdiobus_read_nested(bus, addr, reg);
+   if (ret < 0)
+   return ret;
+
+   *val = ret & 0x;
+
+   return 0;
+}
+
+static int mv88e6xxx_smi_single_chip_write(struct mii_bus *bus, int sw_addr,
+  int addr, int reg, u16 val)
+{
+   int ret;
+
+   ret = mdiobus_write_nested(bus, addr, reg, val);
+   if (ret < 0)
+   return ret;
+
+   return 0;
+}
+
+static const struct mv88e6xxx_smi_ops mv88e6xxx_smi_single_chip_ops = {
+   .read = mv88e6xxx_smi_single_chip_read,
+   .write = mv88e6xxx_smi_single_chip_write,
+};
+
+static int mv88e6xxx_smi_multi_chip_wait(struct mii_bus *bus, int sw_addr)
 {
int ret;
int i;
@@ -63,16 +123,13 @@ static int mv88e6xxx_reg_wait_ready(struct mii_bus *bus, 
int sw_addr)
return -ETIMEDOUT;
 }
 
-static int __mv88e6xxx_reg_read(struct mii_bus *bus, int sw_addr, int addr,
-   int reg)
+static int mv88e6xxx_smi_multi_chip_read(struct mii_bus *bus, int sw_addr,
+int addr, int reg, u16 *val)
 {
int ret;
 
-   if (sw_addr == 0)
-   return mdiobus_read_nested(bus, addr, reg);
-
/* Wait for the bus to become free. */
-   ret = mv88e6xxx_reg_wait_ready(bus, sw_addr);
+   ret = mv88e6xxx_smi_multi_chip_wait(bus, sw_addr);
if (ret < 0)
return ret;
 
@@ -83,7 +140,7 @@ static int __mv88e6xxx_reg_read(struct mii_bus *bus, int 
sw_addr, int addr,
return ret;
 
/* Wait for the read command to complete. */
-   ret = mv88e6xxx_reg_wait_ready(bus, sw_addr);
+   ret =

[PATCH v3 net-next v3 14/14] net: dsa: mv88e6xxx: add port base address to info

2016-06-17 Thread Vivien Didelot

The switch ID is located at address 0x3 of every Port Registers bank.

But not all Marvell switches have their Port Registers SMI Addresses
starting at 0x10. 88E6060 starts at 0x8 and 88E6390 starts at 0x0.

Add this data in the info structure and use it in the detection code.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 19 ++-
 drivers/net/dsa/mv88e6xxx.h |  1 +
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 4d32f5a..aeb96db 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3511,6 +3511,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6085",
.num_databases = 4096,
.num_ports = 10,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6097,
},
 
@@ -3520,6 +3521,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6095/88E6095F",
.num_databases = 256,
.num_ports = 11,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6095,
},
 
@@ -3529,6 +3531,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6123",
.num_databases = 4096,
.num_ports = 3,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6165,
},
 
@@ -3538,6 +3541,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6131",
.num_databases = 256,
.num_ports = 8,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6185,
},
 
@@ -3547,6 +3551,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6161",
.num_databases = 4096,
.num_ports = 6,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6165,
},
 
@@ -3556,6 +3561,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6165",
.num_databases = 4096,
.num_ports = 6,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6165,
},
 
@@ -3565,6 +3571,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6171",
.num_databases = 4096,
.num_ports = 7,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6351,
},
 
@@ -3574,6 +3581,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6172",
.num_databases = 4096,
.num_ports = 7,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6352,
},
 
@@ -3583,6 +3591,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6175",
.num_databases = 4096,
.num_ports = 7,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6351,
},
 
@@ -3592,6 +3601,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6176",
.num_databases = 4096,
.num_ports = 7,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6352,
},
 
@@ -3601,6 +3611,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6185",
.num_databases = 256,
.num_ports = 10,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6185,
},
 
@@ -3610,6 +3621,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6240",
.num_databases = 4096,
.num_ports = 7,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6352,
},
 
@@ -3619,6 +3631,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6320",
.num_databases = 4096,
.num_ports = 7,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6320,
},
 
@@ -3628,6 +3641,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6321",
.num_databases = 4096,
.num_ports = 7,
+   .port_base_addr = 0x10,
.flags = MV88E6XXX_FLAGS_FAMILY_6320,
},
 
@@ -3637,6 +3651,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.name = "Marvell 88E6350",

[PATCH v3 net-next v3 11/14] net: dsa: mv88e6xxx: add detection helper

2016-06-17 Thread Vivien Didelot

Extract the common detection code which assigns the info structure to
the chip given the read switch ID.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 64 +
 1 file changed, 30 insertions(+), 34 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 4e24ac5..de92add 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -3601,6 +3601,30 @@ static const struct mv88e6xxx_info 
*mv88e6xxx_lookup_info(unsigned int prod_num)
return NULL;
 }
 
+static int mv88e6xxx_detect(struct mv88e6xxx_priv_state *ps)
+{
+   const struct mv88e6xxx_info *info;
+   int id, prod_num, rev;
+
+   id = mv88e6xxx_reg_read(ps, REG_PORT(0), PORT_SWITCH_ID);
+   if (id < 0)
+   return id;
+
+   prod_num = (id & 0xfff0) >> 4;
+   rev = id & 0x000f;
+
+   info = mv88e6xxx_lookup_info(prod_num);
+   if (!info)
+   return -ENODEV;
+
+   ps->info = info;
+
+   dev_info(ps->dev, "switch 0x%x detected: %s, revision %u\n",
+ps->info->prod_num, ps->info->name, rev);
+
+   return 0;
+}
+
 static struct mv88e6xxx_priv_state *mv88e6xxx_alloc_chip(struct device *dev)
 {
struct mv88e6xxx_priv_state *ps;
@@ -3633,11 +3657,8 @@ static const char *mv88e6xxx_drv_probe(struct device 
*dsa_dev,
   struct device *host_dev, int sw_addr,
   void **priv)
 {
-   const struct mv88e6xxx_info *info;
struct mv88e6xxx_priv_state *ps;
struct mii_bus *bus;
-   const char *name;
-   int id, prod_num, rev;
int err;
 
bus = dsa_host_dev_to_mii_bus(host_dev);
@@ -3652,31 +3673,17 @@ static const char *mv88e6xxx_drv_probe(struct device 
*dsa_dev,
if (err)
goto free;
 
-   id = mv88e6xxx_reg_read(ps, REG_PORT(0), PORT_SWITCH_ID);
-   if (id < 0)
-   goto free;
-
-   prod_num = (id & 0xfff0) >> 4;
-   rev = id & 0x000f;
-
-   info = mv88e6xxx_lookup_info(prod_num);
-   if (!info)
+   err = mv88e6xxx_detect(ps);
+   if (err)
goto free;
 
-   name = info->name;
-
-   ps->info = info;
-
err = mv88e6xxx_mdio_register(ps, NULL);
if (err)
goto free;
 
*priv = ps;
 
-   dev_info(>bus->dev, "switch 0x%x probed: %s, revision %u\n",
-prod_num, name, rev);
-
-   return name;
+   return ps->info->name;
 free:
devm_kfree(dsa_dev, ps);
 
@@ -3748,7 +3755,6 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
struct device *dev = >dev;
struct device_node *np = dev->of_node;
struct mv88e6xxx_priv_state *ps;
-   int id, prod_num, rev;
u32 eeprom_len;
int err;
 
@@ -3760,16 +3766,9 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
if (err)
return err;
 
-   id = mv88e6xxx_reg_read(ps, REG_PORT(0), PORT_SWITCH_ID);
-   if (id < 0)
-   return id;
-
-   prod_num = (id & 0xfff0) >> 4;
-   rev = id & 0x000f;
-
-   ps->info = mv88e6xxx_lookup_info(prod_num);
-   if (!ps->info)
-   return -ENODEV;
+   err = mv88e6xxx_detect(ps);
+   if (err)
+   return err;
 
ps->reset = devm_gpiod_get_optional(dev, "reset", GPIOD_ASIS);
if (IS_ERR(ps->reset))
@@ -3789,9 +3788,6 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
return err;
}
 
-   dev_info(dev, "switch 0x%x probed: %s, revision %u\n",
-prod_num, ps->info->name, rev);
-
return 0;
 }
 
-- 
2.8.3

[PATCH v3 net-next v3 12/14] net: dsa: mv88e6xxx: pass compatible info

2016-06-17 Thread Vivien Didelot

After allocating the chip structure, pass it a compatible info pointer.

The compatible info structure will be used later to describe how to
access the switch registers and where to read the switch ID.

For the standard MDIO probe, get it from the device node data. For the
legacy DSA driver probing, pass it the 88E6085 info.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index de92add..cadd1e3 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3617,6 +3618,7 @@ static int mv88e6xxx_detect(struct mv88e6xxx_priv_state 
*ps)
if (!info)
return -ENODEV;
 
+   /* Update the compatible info with the probed one */
ps->info = info;
 
dev_info(ps->dev, "switch 0x%x detected: %s, revision %u\n",
@@ -3669,6 +3671,9 @@ static const char *mv88e6xxx_drv_probe(struct device 
*dsa_dev,
if (!ps)
return NULL;
 
+   /* Legacy SMI probing will only support chips similar to 88E6085 */
+   ps->info = _table[MV88E6085];
+
err = mv88e6xxx_smi_init(ps, bus, sw_addr);
if (err)
goto free;
@@ -3754,14 +3759,21 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 {
struct device *dev = >dev;
struct device_node *np = dev->of_node;
+   const struct mv88e6xxx_info *compat_info;
struct mv88e6xxx_priv_state *ps;
u32 eeprom_len;
int err;
 
+   compat_info = of_device_get_match_data(dev);
+   if (!compat_info)
+   return -EINVAL;
+
ps = mv88e6xxx_alloc_chip(dev);
if (!ps)
return -ENOMEM;
 
+   ps->info = compat_info;
+
err = mv88e6xxx_smi_init(ps, mdiodev->bus, mdiodev->addr);
if (err)
return err;
@@ -3801,7 +3813,10 @@ static void mv88e6xxx_remove(struct mdio_device *mdiodev)
 }
 
 static const struct of_device_id mv88e6xxx_of_match[] = {
-   { .compatible = "marvell,mv88e6085" },
+   {
+   .compatible = "marvell,mv88e6085",
+   .data = _table[MV88E6085],
+   },
{ /* sentinel */ },
 };
 
-- 
2.8.3

[PATCH v3 net-next v3 00/14] net: dsa: mv88e6xxx: probe compatible

2016-06-17 Thread Vivien Didelot

This patchset factorizes the legacy DSA probing and the new MDIO probing
functions with a few helpers.

This will allow us to use a compatible chip info to describe how to access the
SMI device and its switch ID register at probe time.

For the legacy probe, we fix the compatible info to 88E6085. For the MDIO
probe, we will use the compatible info from the device node data.

The first 5 patches are already reviewed.
The next 3 patches are cosmetics, with no functional changes.
The next 4 patches add probe helpers and pass the compatible info.
The last 2 patches extend the info structure to store the SMI access mode and
the location of the switch ID register.

Changes since v2 [2]:

  - do not guess compatible model in legacy probe
  - add low level SMI API using a chip structure
  - allocate before probe and detection
  - add 3 cosmetic patches

Changes since v1 [1]:

  - merge style fix from Ben Dooks
  - add Acked-by/Reviewed-by tags
  - drop one compatible string per model
  - detect the SMI device based on the compatible info
  - add an SMI ops structure

[1] https://lkml.org/lkml/2016/6/8/1201
[2] https://lkml.org/lkml/2016/6/14/671

Vivien Didelot (14):
  net: dsa: mv88e6xxx: fix style issues
  net: dsa: mv88e6xxx: remove redundant assignments
  net: dsa: mv88e6xxx: use already declared variables
  net: dsa: mv88e6xxx: do not increment bus refcount
  net: dsa: mv88e6xxx: add switch register helpers
  net: dsa: mv88e6xxx: use gpio get optional variant
  net: dsa: mv88e6xxx: remove table args in info lookup
  net: dsa: mv88e6xxx: rename smi_mutex to reg_lock
  net: dsa: mv88e6xxx: add chip allocation helper
  net: dsa: mv88e6xxx: add SMI init helper
  net: dsa: mv88e6xxx: add detection helper
  net: dsa: mv88e6xxx: pass compatible info
  net: dsa: mv88e6xxx: add addressing mode to info
  net: dsa: mv88e6xxx: add port base address to info

 drivers/net/dsa/mv88e6xxx.c | 508 
 drivers/net/dsa/mv88e6xxx.h |  27 ++-
 2 files changed, 343 insertions(+), 192 deletions(-)

-- 
2.8.3

Re: [iproute PATCH 1/3] Use C99 style initializers everywhere

2016-06-17 Thread Phil Sutter

Hi,

[Replying to multiple mails at once due to laziness.]

On Fri, Jun 17, 2016 at 06:09:20PM +0200, Daniel Borkmann wrote:
> Hmm, seems like a lot of stuff ...

It is. At some point I thought about maybe hack something in cocci
instead, but that would probably have taken longer given the code
diversity. :/

I know this is crap to review, but splitting it up into a 100 patches
doesn't make much sense, either.

> Please have a look at commit 8f80d450c3cb ("tc: fix compilation with old gcc 
> (< 4.6)") ...
> 
> Your changes effectively revert them again. Here, and some other parts of the 
> bpf frontend
> code bits.

Oh, good catch! Thanks a lot for pointing this out. It definitely needs
to be sorted prior to applying my mess.

On Fri, Jun 17, 2016 at 11:57:53AM -0700, Stephen Hemminger wrote:
> It makes sense that if you can build a kernel with old toolchain, that
> iproute2 needs to be buildable as well.
> 
> The current kernels are documented to require 3.2 or later.

So in your opinion we should stay compatible to gcc-3.2? Clarifying
requirements like this one would make sense in order to know what to
check against.

On Fri, Jun 17, 2016 at 02:47:51PM -0600, David Ahern wrote:
> On 6/17/16 2:36 PM, Daniel Borkmann wrote:
[...]
> > I just pointed to the fact that this would basically undo their changes
> > that they've submitted some time ago to the BPF frontend, reintroducing
> > the issue for them. Unfortunately, the anonymous struct cannot be named
> > due to uapi reasons. It should have been named from the very beginning,
> > but unfortunately too late now. So I would suggest to just leave those
> > affected parts as is.
> 
> I was referring to Phil's patch. All of the struct {} req; 
> initializations should be fine if you name the structs, but then need to 
> run it through whatever compiler version the 6wind folks care about to 
> verify that is true.

I'm not so sure about that. What I did regarding the anonymous struct
req is not new in iproute2 code base: There is the GENL_REQUEST macro
which expands to an identical construct and it's there since end of
2012.

Commit 8f80d450c3cb changes only the initializers of union bpf_attr, so
maybe the problem is limited to anonymous structs in unions? Anyway, I
guess defining which minimum gcc version to depend on and testing
against it is the only real solution here.

Thanks, Phil

Re: [PATCH net-next 12/18] IB/mlx5: Add kernel offload flow-tag

2016-06-17 Thread Eric Dumazet

On Fri, Jun 17, 2016 at 3:31 PM, Saeed Mahameed
 wrote:
> On Fri, Jun 17, 2016 at 7:00 PM, Alexei Starovoitov
>  wrote:
>> On Fri, Jun 17, 2016 at 05:43:53PM +0300, Saeed Mahameed wrote:
>>> From: Maor Gottlieb 
>>>
>>> Add kernel offload flow tag for packets that will bypass the kernel
>>> stack, e.g (RoCE/RDMA/RAW ETH (DPDK), etc ..).
>>
>> so the whole series is an elaborate way to enable dpdk? how nice.
>
> NO, God forbids! the whole series has nothing to do with dpdk!
> Please read the cover letter.
>
> Quoting my own words from the cover letter:
> "This patch set introduces mlx5 RoCE/RDMA packet sniffer, it allows
> mlx5e netdevice to receive RoCE/RDMA or RAW ETH traffic which isn't
> supposed to be passed to the kernel stack, for sniffing and diagnostics
> purposes."
>
> We simply want to selectively be able to see RoCE/RDMA ETH standard
> traffic in tcpdump, for diagnostic purposes.
> so in order to not overwhelm the kernel TCP/IP stack with this
> traffic, this patch in particular
> configures ConnectX4 hardware to tag those packets, so in downstream
> patches mlx5 netdevice will mark the SKBs of those packets
> to skip the TCP/IP stack and go only to tcpdump.
>
> DPDK is not enabled/disabled or even slightly affected in this series.
> It was just given as an example in this patch commit message for
> traffic that can be sniffed in standard tools such as tcpdump.
>
> Today there are some bad usages and abuse to skb->protocol where some
> device drivers set skb->protocol = 0xff to skip the kernel TCP/IP
> processing for the same diagnostic purposes.
> In this series we are just trying to do the right thing.

Well, your patch adds an extra test in kernel fast path, just to ease
the life of people using kernel bypass,
but willing to use tcpdump because they can not figure to do this in
user space properly.

Please find a way  _not_ adding a single instruction in kernel fast path.

Thanks.

Re: [PATCH net-next 13/18] net: Add offload kernel net stack packet type

2016-06-17 Thread Saeed Mahameed

On Fri, Jun 17, 2016 at 6:15 PM, Daniel Borkmann  wrote:
> On 06/17/2016 04:43 PM, Saeed Mahameed wrote:
>>
>> From: Maor Gottlieb 
>>
>> Add new packet type to skip kernel specific protocol handlers.
>>
>> This is needed so device drivers can pass packets up to user space
>> (af_packet/tcpdump, etc..) without the need for them to go through
>> the whole kernel data path.
>>
>> Signed-off-by: Maor Gottlieb 
>> CC: David S. Miller 
>> CC: Patrick McHardy 
>> CC: Eric Dumazet 
>> Signed-off-by: Saeed Mahameed 
>> ---
>>   include/linux/skbuff.h | 6 +++---
>>   include/uapi/linux/if_packet.h | 1 +
>>   net/core/dev.c | 4 
>>   3 files changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index dc0fca7..359724e 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -694,14 +694,14 @@ struct sk_buff {
>>
>>   /* if you move pkt_type around you also must adapt those constants */
>>   #ifdef __BIG_ENDIAN_BITFIELD
>> -#define PKT_TYPE_MAX   (7 << 5)
>> +#define PKT_TYPE_MAX   (8 << 5)
>>   #else
>> -#define PKT_TYPE_MAX   7
>> +#define PKT_TYPE_MAX   8
>>   #endif
>
>
> Aehm ... did you actually test this with BPF ?!
>
> PKT_TYPE_MAX is a mask (naming could be better no doubt), see also function
> convert_skb_access():
>
> [...]
> case SKF_AD_PKTTYPE:
> *insn++ = BPF_LDX_MEM(BPF_B, dst_reg, src_reg,
> PKT_TYPE_OFFSET());
> *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, PKT_TYPE_MAX);
> #ifdef __BIG_ENDIAN_BITFIELD
> *insn++ = BPF_ALU32_IMM(BPF_RSH, dst_reg, 5);
> #endif
> break;
> [...]
>
> Also, dunno if it's worth burning a skb bit for one driver.
>

oops! it didn't occur to me that it was used as a mask!
does this mean we are out of PKT_TYPE flags ?
Maybe we can use skb->mark instead ? any ideas ?

Also i am not sure it is a one driver thing, i know about other
examples where some device drivers set skb->protocol to 0x to
achieve the same ! which i didn't like.

Re: [PATCH net-next 1/8] net: Change SKB_GSO_DODGY to be a tx_flag

2016-06-17 Thread Tom Herbert

On Thu, Jun 16, 2016 at 11:58 AM, Alexander Duyck
 wrote:
> On Thu, Jun 16, 2016 at 10:51 AM, Tom Herbert  wrote:
>> This replaces gso_type SKB_GSO_DODGY with a new tx_flag named
>> SKBTX_UNTRUSTED_SOURCE. This more generically desrcibes the skb
>> being created from a untrusted source as a characteristic of and skbuff.
>> This also frees up one gso_type flag bit.
>>
>> Signed-off-by: Tom Herbert 
>
> Instead of leaving this bit in the shared_info why not look at moving
> it into the sk_buff itself?  It seems like this might be a better
> candidate for something like that as a large part of what the dodgy
> bit represents is that the header offsets are likely not set correctly
> and need to be parsed out and updated.  It might make more sense to
> place this in the slot just after remcsum_offload.  That way once all
> the header offsets have been updated you could just update this one
> bit to indicate that the header offsets stored in this sk_buff are
> valid.
>
I don't really understand what the point of SKB_GSO_DODGY is. Seems
like we should be verifying the correct values are set up front in the
vnet, not relying on the core stack to have to worry about this narrow
use case. Fields in the skbuff should be set correctly all the time in
the stack as an invariant I think, if there not correct it is the
fault of the code setting the fields.

> I also don't see where these changes address any changes needed to
> skb_gso_ok in order to actually trigger the partial walk though the
> GSO code.  You probably need to look at adding a statement there to do
> a check for your untrusted source bit versus the GSO_ROBUST feature.
> It probably doesn't need to be much, just something like tacking on a
> "&& (!skb_is_untrustued(skb) || (features & NETIF_F_GSO_ROBUST)" to
> the conditional statement.
>
> - Alex

Re: [PATCH net-next 12/18] IB/mlx5: Add kernel offload flow-tag

2016-06-17 Thread Saeed Mahameed

On Fri, Jun 17, 2016 at 7:00 PM, Alexei Starovoitov
 wrote:
> On Fri, Jun 17, 2016 at 05:43:53PM +0300, Saeed Mahameed wrote:
>> From: Maor Gottlieb 
>>
>> Add kernel offload flow tag for packets that will bypass the kernel
>> stack, e.g (RoCE/RDMA/RAW ETH (DPDK), etc ..).
>
> so the whole series is an elaborate way to enable dpdk? how nice.

NO, God forbids! the whole series has nothing to do with dpdk!
Please read the cover letter.

Quoting my own words from the cover letter:
"This patch set introduces mlx5 RoCE/RDMA packet sniffer, it allows
mlx5e netdevice to receive RoCE/RDMA or RAW ETH traffic which isn't
supposed to be passed to the kernel stack, for sniffing and diagnostics
purposes."

We simply want to selectively be able to see RoCE/RDMA ETH standard
traffic in tcpdump, for diagnostic purposes.
so in order to not overwhelm the kernel TCP/IP stack with this
traffic, this patch in particular
configures ConnectX4 hardware to tag those packets, so in downstream
patches mlx5 netdevice will mark the SKBs of those packets
to skip the TCP/IP stack and go only to tcpdump.

DPDK is not enabled/disabled or even slightly affected in this series.
It was just given as an example in this patch commit message for
traffic that can be sniffed in standard tools such as tcpdump.

Today there are some bad usages and abuse to skb->protocol where some
device drivers set skb->protocol = 0xff to skip the kernel TCP/IP
processing for the same diagnostic purposes.
In this series we are just trying to do the right thing.

Re: act_mirred: remove spinlock in fast path

2016-06-17 Thread Eric Dumazet

On Fri, Jun 17, 2016 at 2:59 PM, Cong Wang  wrote:

> Generally speaking I worry about we change multiple fields in a struct
> meanwhile we could still read them any time in the middle, we may
> get them correct for some easy case, but it is hard to insure the
> correctness when the struct becomes large.
>
> I am thinking to make more tc actions lockless, so this problem
> comes up immediately for other complex cases than mirred.

I certainly wont object to a patch.

Also note that instead of RCU with a pointer and the usual kfree_rcu() stuff,
we now can use seqcount_latch infra which might allow to not increase
memory foot print.

Re: act_mirred: remove spinlock in fast path

2016-06-17 Thread Cong Wang

On Fri, Jun 17, 2016 at 2:40 PM, Eric Dumazet  wrote:
> On Fri, Jun 17, 2016 at 2:35 PM, Cong Wang  wrote:
>> On Fri, Jun 17, 2016 at 2:24 PM, Eric Dumazet  wrote:
>>> Well, I added a READ_ONCE() to read tcf_action once.
>>>
>>> Adding rcu here would mean adding a pointer and extra cache line, to
>>> deref the values.
>>>
>>> IMHO the race here has no effect . You either read the old or new value.
>>
>> Sure, the point is we may read a new ->tcf_action and an old ->tcfm_eaction,
>> this is what I am worrying.
>>
>> If that is not a good example, what about new ->tcf_action and 
>> ->tcfm_eaction,
>> with an old ->tcfm_ifindex?
>>
>>>
>>> If the packet is processed before or after the 'change' it would have
>>> the same 'race'
>>>
>>
>> Why? As long as the change is like a transaction, we are safe.
>>
>>> All these fields are integers, they never are 'partially written'.
>>>
>>> The only case m->tcfm_eaction could be read twice is in the error
>>> path. Who cares ?
>>
>> This is not what I worry about. I guess you miss read eaction with action.
>
> No I did not. I am referring to the fact that we currently might read
> m->tcfm_eaction multiple times.
>
> Please explain what would be wrong reading a wrong pair of values ?
>
> One packet might come to a wrong device in the unlikely case an admin
> change all the fields during an update ?

Yes, that is what in my mind, since I only did code review, not actually
saw any real problem (mostly because here we don't use standalone actions).

>
> Is it going to crash or reveal highly sensitive security data ?
>
> If yes, then please send a patch. I considered all this when writing
> my patch and maybe I was wrong.

I don't know.

Generally speaking I worry about we change multiple fields in a struct
meanwhile we could still read them any time in the middle, we may
get them correct for some easy case, but it is hard to insure the
correctness when the struct becomes large.

I am thinking to make more tc actions lockless, so this problem
comes up immediately for other complex cases than mirred.

Re: [PATCH net-next v6] tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In

2016-06-17 Thread Martin KaFai Lau

On Thu, Jun 16, 2016 at 09:30:28PM -0700, Eric Dumazet wrote:
> On Mon, 2016-03-14 at 10:52 -0700, Martin KaFai Lau wrote:
> > Per RFC4898, they count segments sent/received
> > containing a positive length data segment (that includes
> > retransmission segments carrying data).  Unlike
> > tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments
> > carrying no data (e.g. pure ack).
> >
> > The patch also updates the segs_in in tcp_fastopen_add_skb()
> > so that segs_in >= data_segs_in property is kept.
> >
> > Together with retransmission data, tcpi_data_segs_out
> > gives a better signal on the rxmit rate.
> >
> > v6: Rebase on the latest net-next
> >
> > v5: Eric pointed out that checking skb->len is still needed in
> > tcp_fastopen_add_skb() because skb can carry a FIN without data.
> > Hence, instead of open coding segs_in and data_segs_in, tcp_segs_in()
> > helper is used.  Comment is added to the fastopen case to explain why
> > segs_in has to be reset and tcp_segs_in() has to be called before
> > __skb_pull().
> >
> > v4: Add comment to the changes in tcp_fastopen_add_skb()
> > and also add remark on this case in the commit message.
> >
> > v3: Add const modifier to the skb parameter in tcp_segs_in()
> >
> > v2: Rework based on recent fix by Eric:
> > commit a9d99ce28ed3 ("tcp: fix tcpi_segs_in after connection establishment")
> >
> > Signed-off-by: Martin KaFai Lau 
> > Cc: Chris Rapier 
> > Cc: Eric Dumazet 
> > Cc: Marcelo Ricardo Leitner 
> > Cc: Neal Cardwell 
> > Cc: Yuchung Cheng 
> > Acked-by: Eric Dumazet 
> > ---
>
> Hi Martin
>
> Have you sent the iproute2 ss corresponding patch ?
Sorry, I did not.  I will work on it shortly.

Thanks for bringing it up!

Re: act_mirred: remove spinlock in fast path

2016-06-17 Thread Eric Dumazet

On Fri, Jun 17, 2016 at 2:35 PM, Cong Wang  wrote:
> On Fri, Jun 17, 2016 at 2:24 PM, Eric Dumazet  wrote:
>> Well, I added a READ_ONCE() to read tcf_action once.
>>
>> Adding rcu here would mean adding a pointer and extra cache line, to
>> deref the values.
>>
>> IMHO the race here has no effect . You either read the old or new value.
>
> Sure, the point is we may read a new ->tcf_action and an old ->tcfm_eaction,
> this is what I am worrying.
>
> If that is not a good example, what about new ->tcf_action and ->tcfm_eaction,
> with an old ->tcfm_ifindex?
>
>>
>> If the packet is processed before or after the 'change' it would have
>> the same 'race'
>>
>
> Why? As long as the change is like a transaction, we are safe.
>
>> All these fields are integers, they never are 'partially written'.
>>
>> The only case m->tcfm_eaction could be read twice is in the error
>> path. Who cares ?
>
> This is not what I worry about. I guess you miss read eaction with action.

No I did not. I am referring to the fact that we currently might read
m->tcfm_eaction multiple times.

Please explain what would be wrong reading a wrong pair of values ?

One packet might come to a wrong device in the unlikely case an admin
change all the fields during an update ?

Is it going to crash or reveal highly sensitive security data ?

If yes, then please send a patch. I considered all this when writing
my patch and maybe I was wrong.

Re: act_mirred: remove spinlock in fast path

2016-06-17 Thread Cong Wang

On Fri, Jun 17, 2016 at 2:24 PM, Eric Dumazet  wrote:
> Well, I added a READ_ONCE() to read tcf_action once.
>
> Adding rcu here would mean adding a pointer and extra cache line, to
> deref the values.
>
> IMHO the race here has no effect . You either read the old or new value.

Sure, the point is we may read a new ->tcf_action and an old ->tcfm_eaction,
this is what I am worrying.

If that is not a good example, what about new ->tcf_action and ->tcfm_eaction,
with an old ->tcfm_ifindex?

>
> If the packet is processed before or after the 'change' it would have
> the same 'race'
>

Why? As long as the change is like a transaction, we are safe.

> All these fields are integers, they never are 'partially written'.
>
> The only case m->tcfm_eaction could be read twice is in the error
> path. Who cares ?

This is not what I worry about. I guess you miss read eaction with action.

[PATCH 1/2] net: ethernet: et131x: use phydev from struct net_device

2016-06-17 Thread Philippe Reynes

The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phydev in the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/agere/et131x.c |   48 +-
 1 files changed, 18 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/agere/et131x.c 
b/drivers/net/ethernet/agere/et131x.c
index 30defe6..d194992 100644
--- a/drivers/net/ethernet/agere/et131x.c
+++ b/drivers/net/ethernet/agere/et131x.c
@@ -440,7 +440,6 @@ struct et131x_adapter {
struct net_device *netdev;
struct pci_dev *pdev;
struct mii_bus *mii_bus;
-   struct phy_device *phydev;
struct napi_struct napi;
 
/* Flags that indicate current state of the adapter */
@@ -864,7 +863,7 @@ static void et1310_config_mac_regs2(struct et131x_adapter 
*adapter)
 {
int32_t delay = 0;
struct mac_regs __iomem *mac = >regs->mac;
-   struct phy_device *phydev = adapter->phydev;
+   struct phy_device *phydev = adapter->netdev->phydev;
u32 cfg1;
u32 cfg2;
u32 ifctrl;
@@ -1035,7 +1034,7 @@ static void et1310_setup_device_for_unicast(struct 
et131x_adapter *adapter)
 static void et1310_config_rxmac_regs(struct et131x_adapter *adapter)
 {
struct rxmac_regs __iomem *rxmac = >regs->rxmac;
-   struct phy_device *phydev = adapter->phydev;
+   struct phy_device *phydev = adapter->netdev->phydev;
u32 sa_lo;
u32 sa_hi = 0;
u32 pf_ctrl = 0;
@@ -1230,7 +1229,7 @@ out:
 
 static int et131x_mii_read(struct et131x_adapter *adapter, u8 reg, u16 *value)
 {
-   struct phy_device *phydev = adapter->phydev;
+   struct phy_device *phydev = adapter->netdev->phydev;
 
if (!phydev)
return -EIO;
@@ -1311,7 +1310,7 @@ static void et1310_phy_read_mii_bit(struct et131x_adapter 
*adapter,
 
 static void et1310_config_flow_control(struct et131x_adapter *adapter)
 {
-   struct phy_device *phydev = adapter->phydev;
+   struct phy_device *phydev = adapter->netdev->phydev;
 
if (phydev->duplex == DUPLEX_HALF) {
adapter->flow = FLOW_NONE;
@@ -1456,7 +1455,7 @@ static int et131x_mdio_write(struct mii_bus *bus, int 
phy_addr,
 static void et1310_phy_power_switch(struct et131x_adapter *adapter, bool down)
 {
u16 data;
-   struct  phy_device *phydev = adapter->phydev;
+   struct  phy_device *phydev = adapter->netdev->phydev;
 
et131x_mii_read(adapter, MII_BMCR, );
data &= ~BMCR_PDOWN;
@@ -1469,7 +1468,7 @@ static void et1310_phy_power_switch(struct et131x_adapter 
*adapter, bool down)
 static void et131x_xcvr_init(struct et131x_adapter *adapter)
 {
u16 lcr2;
-   struct  phy_device *phydev = adapter->phydev;
+   struct  phy_device *phydev = adapter->netdev->phydev;
 
/* Set the LED behavior such that LED 1 indicates speed (off =
 * 10Mbits, blink = 100Mbits, on = 1000Mbits) and LED 2 indicates
@@ -2111,7 +2110,7 @@ static int et131x_init_recv(struct et131x_adapter 
*adapter)
 /* et131x_set_rx_dma_timer - Set the heartbeat timer according to line rate */
 static void et131x_set_rx_dma_timer(struct et131x_adapter *adapter)
 {
-   struct phy_device *phydev = adapter->phydev;
+   struct phy_device *phydev = adapter->netdev->phydev;
 
/* For version B silicon, we do not use the RxDMA timer for 10 and 100
 * Mbits/s line rates. We do not enable and RxDMA interrupt coalescing.
@@ -2426,7 +2425,7 @@ static int nic_send_packet(struct et131x_adapter 
*adapter, struct tcb *tcb)
struct sk_buff *skb = tcb->skb;
u32 nr_frags = skb_shinfo(skb)->nr_frags + 1;
struct skb_frag_struct *frags = _shinfo(skb)->frags[0];
-   struct phy_device *phydev = adapter->phydev;
+   struct phy_device *phydev = adapter->netdev->phydev;
dma_addr_t dma_addr;
struct tx_ring *tx_ring = >tx_ring;
 
@@ -2794,17 +2793,13 @@ static void et131x_handle_send_pkts(struct 
et131x_adapter *adapter)
 static int et131x_get_settings(struct net_device *netdev,
   struct ethtool_cmd *cmd)
 {
-   struct et131x_adapter *adapter = netdev_priv(netdev);
-
-   return phy_ethtool_gset(adapter->phydev, cmd);
+   return phy_ethtool_gset(netdev->phydev, cmd);
 }
 
 static int et131x_set_settings(struct net_device *netdev,
   struct ethtool_cmd *cmd)
 {
-   struct et131x_adapter *adapter = netdev_priv(netdev);
-
-   return phy_ethtool_sset(adapter->phydev, cmd);
+   return phy_ethtool_sset(netdev->phydev, cmd);
 }
 
 static int et131x_get_regs_len(struct net_device *netdev)
@@ -3098,7 +3093,7 @@ err_out:
 static void et131x_error_timer_handler(unsigned long data)
 {
struct et131x_adapter *adapter = (struct et131x_adapter

[PATCH 2/2] net: ethernet: et131x: use phy_ethtool_{get|set}_link_ksettings

2016-06-17 Thread Philippe Reynes

There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/agere/et131x.c |   16 ++--
 1 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/agere/et131x.c 
b/drivers/net/ethernet/agere/et131x.c
index d194992..cd7e2e5 100644
--- a/drivers/net/ethernet/agere/et131x.c
+++ b/drivers/net/ethernet/agere/et131x.c
@@ -2790,18 +2790,6 @@ static void et131x_handle_send_pkts(struct 
et131x_adapter *adapter)
spin_unlock_irqrestore(>tcb_send_qlock, flags);
 }
 
-static int et131x_get_settings(struct net_device *netdev,
-  struct ethtool_cmd *cmd)
-{
-   return phy_ethtool_gset(netdev->phydev, cmd);
-}
-
-static int et131x_set_settings(struct net_device *netdev,
-  struct ethtool_cmd *cmd)
-{
-   return phy_ethtool_sset(netdev->phydev, cmd);
-}
-
 static int et131x_get_regs_len(struct net_device *netdev)
 {
 #define ET131X_REGS_LEN 256
@@ -2974,12 +2962,12 @@ static void et131x_get_drvinfo(struct net_device 
*netdev,
 }
 
 static struct ethtool_ops et131x_ethtool_ops = {
-   .get_settings   = et131x_get_settings,
-   .set_settings   = et131x_set_settings,
.get_drvinfo= et131x_get_drvinfo,
.get_regs_len   = et131x_get_regs_len,
.get_regs   = et131x_get_regs,
.get_link   = ethtool_op_get_link,
+   .get_link_ksettings = phy_ethtool_get_link_ksettings,
+   .set_link_ksettings = phy_ethtool_set_link_ksettings,
 };
 
 /* et131x_hwaddr_init - set up the MAC Address */
-- 
1.7.4.4

Re: act_mirred: remove spinlock in fast path

2016-06-17 Thread Eric Dumazet

On Fri, Jun 17, 2016 at 2:03 PM, Cong Wang  wrote:
> Hi, Eric
>
> During code review, I notice we might have some problem after we go
> lockless for the fast path in act_mirred.
>
> That is, what prevents us from the following possible race condition?
>
> change a standalone action with tcf_mirred_init():
>   // search for an existing action in hash
>   // found it and got struct tcf_common
>   m = to_mirred(a);
>   m->tcf_action = parm->action;
>   // Interrupted by BH
>
> tcf_mirred() jumps in:
>   rcu_read_lock()
>   retval = READ_ONCE(m->tcf_action);
>   if (m->tcfm_eaction != TCA_EGRESS_MIRROR)
>   
>   rcu_unread_lock()
>
> now go back to tcf_mirred_init():
>   m->tcfm_eaction = parm->eaction;
>   
>
> IOW, the fast path could read a partially written change which could
> be a problem? We need to allocate a new copy and then replace the old
> one with it via RCU, don't we?
>
> I can work on some patches, I want to make sure I don't miss anything here.
>
> Thanks!

Well, I added a READ_ONCE() to read tcf_action once.

Adding rcu here would mean adding a pointer and extra cache line, to
deref the values.

IMHO the race here has no effect . You either read the old or new value.

If the packet is processed before or after the 'change' it would have
the same 'race'

All these fields are integers, they never are 'partially written'.

The only case m->tcfm_eaction could be read twice is in the error
path. Who cares ?

Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-17 Thread Francois Romieu

Netanel Belgazal  :
[...]

More stuff below.

> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> new file mode 100644
> index 000..357e10f
> --- /dev/null
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
[...]
> +static int ena_get_dev_stats(struct ena_com_dev *ena_dev,
> +  struct ena_admin_aq_get_stats_cmd *get_cmd,
> +  struct ena_admin_acq_get_stats_resp *get_resp,

At first sight it should be possible avoid one pointer argument with:

struct ena_something {
struct ena_admin_aq_get_stats_cmd cmd;
struct ena_admin_acq_get_stats_resp resp;

};

[...]
> +int ena_com_get_dev_extended_stats(struct ena_com_dev *ena_dev, char *buff,
> +u32 len)

Where is it used ?

> +{
> + int ret = 0;
> + struct ena_admin_aq_get_stats_cmd get_cmd;
> + struct ena_admin_acq_get_stats_resp get_resp;
> + void *virt_addr;

int rc = -ENOMEM;
char *str;

> + dma_addr_t phys_addr;
> +
> + virt_addr = dma_alloc_coherent(ena_dev->dmadev,
> +len,
> +_addr,
> +GFP_KERNEL | __GFP_ZERO);
> + if (!virt_addr) {
> + ret = -ENOMEM;
> + goto done;
> + }
> + memset(_cmd, 0x0, sizeof(get_cmd));
> + ret = ena_com_mem_addr_set(ena_dev,
> +_cmd.u.control_buffer.address,
> +phys_addr);
> + if (unlikely(ret)) {
> + ena_trc_err("memory address set failed\n");
> + return ret;
> + }
> + get_cmd.u.control_buffer.length = len;
> +
> + get_cmd.device_id = ena_dev->stats_func;
> + get_cmd.queue_idx = ena_dev->stats_queue;
> +
> + ret = ena_get_dev_stats(ena_dev, _cmd, _resp,
> + ENA_ADMIN_GET_STATS_TYPE_EXTENDED);
> + if (ret < 0)
> + goto free_ext_stats_mem;
> +
> + ret = snprintf(buff, len, "%s", (char *)virt_addr);

So ENA_ADMIN_GET_STATS_TYPE_EXTENDED fills a buffer with bytes in the
[0; 127] range, right ?

> +
> +free_ext_stats_mem:
> + dma_free_coherent(ena_dev->dmadev, len, virt_addr, phys_addr);
> +done:
> + return ret;
> +}
> +
> +int ena_com_set_dev_mtu(struct ena_com_dev *ena_dev, int mtu)
> +{
> + struct ena_com_admin_queue *admin_queue;
> + struct ena_admin_set_feat_cmd cmd;
> + struct ena_admin_set_feat_resp resp;
> + int ret = 0;

Should be -ENODEV or left uninitialized.

> +
> + if (unlikely(!ena_dev)) {
> + ena_trc_err("%s : ena_dev is NULL\n", __func__);
> + return -ENODEV;
> + }

Does it mean that ena_com_dev may go away while the net_device is in use ?

> +
> + if (!ena_com_check_supported_feature_id(ena_dev, ENA_ADMIN_MTU)) {
> + ena_trc_info("Feature %d isn't supported\n", ENA_ADMIN_MTU);
> + return -EPERM;
> + }

rc = ena_com_check_supported_feature_id(...
if (rc < 0) {
ena_trc_info(...
goto out;
}
> +
> + memset(, 0x0, sizeof(cmd));
> + admin_queue = _dev->admin_queue;
> +
> + cmd.aq_common_descriptor.opcode = ENA_ADMIN_SET_FEATURE;
> + cmd.aq_common_descriptor.flags = 0;
> + cmd.feat_common.feature_id = ENA_ADMIN_MTU;
> + cmd.u.mtu.mtu = mtu;
> +
> + ret = ena_com_execute_admin_command(admin_queue,
> + (struct ena_admin_aq_entry *),
> + sizeof(cmd),
> + (struct ena_admin_acq_entry *),
> + sizeof(resp));
> +
> + if (unlikely(ret)) {
> + ena_trc_err("Failed to set mtu %d. error: %d\n", mtu, ret);
> + return -EINVAL;

ena_com_execute_admin_command should return a proper status code.

[...]
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h 
b/drivers/net/ethernet/amazon/ena/ena_com.h
new file mode 100644
index 000..e49ba43
--- /dev/null
[...]
+static inline void ena_com_update_intr_reg(struct ena_eth_io_intr_reg 
*intr_reg,
+  u32 rx_delay_interval,
+  u32 tx_delay_interval,
+  bool unmask)
+{
+   intr_reg->intr_control = 0;
+   intr_reg->intr_control |= rx_delay_interval &
+   ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
+
+   intr_reg->intr_control |=
+   (tx_delay_interval << ENA_ETH_IO_INTR_REG_TX_INTR_DELAY_SHIFT)
+   & ENA_ETH_IO_INTR_REG_RX_INTR_DELAY_MASK;
  ^^ TX ?

Extra: you should not return NETDEV_TX_BUSY in the xmit handler while
queueing is still enabled. Please drop packet and return NETDEV_TX_OK.

-- 
Ueimor

act_mirred: remove spinlock in fast path

2016-06-17 Thread Cong Wang

Hi, Eric

During code review, I notice we might have some problem after we go
lockless for the fast path in act_mirred.

That is, what prevents us from the following possible race condition?

change a standalone action with tcf_mirred_init():
  // search for an existing action in hash
  // found it and got struct tcf_common
  m = to_mirred(a);
  m->tcf_action = parm->action;
  // Interrupted by BH

tcf_mirred() jumps in:
  rcu_read_lock()
  retval = READ_ONCE(m->tcf_action);
  if (m->tcfm_eaction != TCA_EGRESS_MIRROR)
  
  rcu_unread_lock()

now go back to tcf_mirred_init():
  m->tcfm_eaction = parm->eaction;
  

IOW, the fast path could read a partially written change which could
be a problem? We need to allocate a new copy and then replace the old
one with it via RCU, don't we?

I can work on some patches, I want to make sure I don't miss anything here.

Thanks!

Re: [PATCH net] RDS: TCP: rds_tcp_accept_one() should transition socket from RESETTING to UP

2016-06-17 Thread Santosh Shilimkar


On 6/17/2016 1:12 PM, Sowmini Varadhan wrote:

The state of the rds_connection after rds_tcp_reset_callbacks() would
be RDS_CONN_RESETTING and this is the value that should be passed
by rds_tcp_accept_one()  to rds_connect_path_complete() to transition
the socket to RDS_CONN_UP.

Fixes: b5c21c0947c1 ("RDS: TCP: fix race windows in send-path quiescence
by rds_tcp_accept_one()")
Signed-off-by: Sowmini Varadhan 
---

Acked-by: Santosh Shilimkar

Re: [iproute PATCH 1/3] Use C99 style initializers everywhere

2016-06-17 Thread David Ahern


On 6/17/16 2:36 PM, Daniel Borkmann wrote:


Once again, you have to ask Nicolas or Julien for their RHEL version,
not me ... please have a look at 8f80d450c3cb. ;)


I did, just ran right over the Author ;-)



I just pointed to the fact that this would basically undo their changes
that they've submitted some time ago to the BPF frontend, reintroducing
the issue for them. Unfortunately, the anonymous struct cannot be named
due to uapi reasons. It should have been named from the very beginning,
but unfortunately too late now. So I would suggest to just leave those
affected parts as is.


I was referring to Phil's patch. All of the struct {} req; 
initializations should be fine if you name the structs, but then need to 
run it through whatever compiler version the 6wind folks care about to 
verify that is true.

Re: [PATCH RFC] brcmfmac: support deleting MBSS AP interfaces

2016-06-17 Thread Rafał Miłecki

On 17 June 2016 at 21:00, Arend van Spriel  wrote:
> On 17-06-16 14:30, Rafał Miłecki wrote:
>> On 1 June 2016 at 21:00, Arend van Spriel  
>> wrote:
>>> On 01-06-16 16:36, Rafał Miłecki wrote:
 We already support adding extra (AP) interfaces so it also makes an
 obvious sense to allow deleting them.

 Adding a new interface is implemented by sending request to firmware for
 creating a new BSS and waiting for a proper event. Ideally deleting
 interface should be handled in a similar way. There should be a request
 to firmware for deleting BSS and firmware should respond with an event.

 Unfortunately it doesn't seem to work with recent firmwares. They never
 seem to delete BSS and never send BRCMF_E_IF_DEL. As a workaround this
 patch deletes Linux interface while keeping a track of BSSes present in
 a firmware. If there is request for adding a new interface this code is
 capable of reusing existing BSS-es.
>>>
>>> It is not so much an issue of recent firmware. Actually, on recent
>>> firmware 7.x.y.z and higher there are other command to create *and*
>>> delete additional interfaces. On the other hand we aim to support a
>>> large number of devices going back to bcm4329 so we have to come up with
>>> a scheme to use the new commands or fallback to old api. Let's hope we
>>> can reuse much of this effort you put in.
>>
>> You gave me a complex puzzle there :D It took me a while to find out
>> what API you meant.
>
> Actually, the puzzle was supposed to be for me, but I like your
> curiosity and persistence in digging up the (partial) info.
>
>> Finally I found an interesting wlioctl.h in SDK 9.10.178.27 that gave
>> me some clue. I got this SDK from ASUS RT-AC1200G+ open souce tarball.
>> There are 2 interesting structs:
>>
>> typedef struct wl_interface_create {
>> uint16 ver; /* version of this struct */
>> uint32  flags; /* flags that defines the operation */
>> struct ether_addr   mac_addr; /* Optional Mac address */
>> uint32  wlc_index; /* Optional wlc index */
>> } wl_interface_create_t;
>>
>> typedef struct wl_interface_info {
>> uint16 ver; /* version of this struct */
>> struct ether_addrmac_addr; /* MAC address of the interface */
>> char ifname[BCM_MSG_IFNAME_MAX]; /* name of interface */
>> uint8 bsscfgidx; /* source bsscfg index */
>> } wl_interface_info_t;
>>
>> I couldn't find any corresponding WLC_* in wlioctl_defs.h, so I guess
>> I should use WLC_SET_VAR (or WLC_SET_VAR as you prefer) with some
>
> (huh)? anyway the api indeed uses what we call an iovar, ie.
> string-based ioctl.

That's what I meant, sorry for wrong naming :)


>> string. Any tip what would it be? Something like
>> "wl_interface_create"? Can you reveal such a small secret?
>
> It is "interface_create" and "interface_remove".

It (almost) works, thanks! I just hit some bug in brcmfmac in handling
events. It can be exposed by deleting 2 interfaces quickly, one by
one, it seems. I'll debug this.

-- 
Rafał

Re: [iproute PATCH 1/3] Use C99 style initializers everywhere

2016-06-17 Thread Daniel Borkmann


On 06/17/2016 10:15 PM, David Ahern wrote:

On 6/17/16 12:57 PM, Stephen Hemminger wrote:

On Fri, 17 Jun 2016 16:58:14 +
Nicolas Dichtel  wrote:

Le 17/06/2016 18:46, Daniel Borkmann a écrit :

On 06/17/2016 06:34 PM, Stephen Hemminger wrote:

On Fri, 17 Jun 2016 16:09:20 +
Daniel Borkmann  wrote:


Please have a look at commit 8f80d450c3cb ("tc: fix compilation with old gcc
(< 4.6)") ...

Your changes effectively revert them again. Here, and some other parts of the
bpf frontend
code bits.


GCC 4.6 is 3 years old. So perhaps it is time to move on.
Maybe add a GCC version check in the makefile, to fail cleanly.


Well, you don't have to ask me but rather the patch submitters (Cc).

I haven't used RHEL in quite a while, but I could imagine it might
be related to built it there perhaps.

Yes. For some specific arch, we have only old toolchains.

The rule was always to be backward compatible with old kernels. It implies to
also support the compilation with old toolchains ;-)


It makes sense that if you can build a kernel with old toolchain, that
iproute2 needs to be buildable as well.

The current kernels are documented to require 3.2 or later.


Daniel's patch mentions anonymous structs so naming them should be fine.

Daniel: What OS were you using with the 4.6 gcc? I lost my range of OS VMs when I 
changed employers last year. At this point I don't recall which OS uses < 4.6.


Once again, you have to ask Nicolas or Julien for their RHEL version,
not me ... please have a look at 8f80d450c3cb. ;)

I just pointed to the fact that this would basically undo their changes
that they've submitted some time ago to the BPF frontend, reintroducing
the issue for them. Unfortunately, the anonymous struct cannot be named
due to uapi reasons. It should have been named from the very beginning,
but unfortunately too late now. So I would suggest to just leave those
affected parts as is.

Thanks,
Daniel

[PATCH net] RDS: TCP: rds_tcp_accept_one() should transition socket from RESETTING to UP

2016-06-17 Thread Sowmini Varadhan

The state of the rds_connection after rds_tcp_reset_callbacks() would
be RDS_CONN_RESETTING and this is the value that should be passed
by rds_tcp_accept_one()  to rds_connect_path_complete() to transition
the socket to RDS_CONN_UP.

Fixes: b5c21c0947c1 ("RDS: TCP: fix race windows in send-path quiescence
by rds_tcp_accept_one()")
Signed-off-by: Sowmini Varadhan 
---
 net/rds/tcp_listen.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index 686b1d0..245542c 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -138,7 +138,7 @@ int rds_tcp_accept_one(struct socket *sock)
rds_tcp_reset_callbacks(new_sock, conn);
conn->c_outgoing = 0;
/* rds_connect_path_complete() marks RDS_CONN_UP */
-   rds_connect_path_complete(conn, RDS_CONN_DISCONNECTING);
+   rds_connect_path_complete(conn, RDS_CONN_RESETTING);
}
} else {
rds_tcp_set_callbacks(new_sock, conn);
-- 
1.7.1

Re: [iproute PATCH 1/3] Use C99 style initializers everywhere

2016-06-17 Thread David Ahern


On 6/17/16 12:57 PM, Stephen Hemminger wrote:

On Fri, 17 Jun 2016 16:58:14 +
Nicolas Dichtel  wrote:


Le 17/06/2016 18:46, Daniel Borkmann a écrit :

On 06/17/2016 06:34 PM, Stephen Hemminger wrote:

On Fri, 17 Jun 2016 16:09:20 +
Daniel Borkmann  wrote:


Please have a look at commit 8f80d450c3cb ("tc: fix compilation with old gcc
(< 4.6)") ...

Your changes effectively revert them again. Here, and some other parts of the
bpf frontend
code bits.


GCC 4.6 is 3 years old. So perhaps it is time to move on.
Maybe add a GCC version check in the makefile, to fail cleanly.


Well, you don't have to ask me but rather the patch submitters (Cc).

I haven't used RHEL in quite a while, but I could imagine it might
be related to built it there perhaps.

Yes. For some specific arch, we have only old toolchains.

The rule was always to be backward compatible with old kernels. It implies to
also support the compilation with old toolchains ;-)


It makes sense that if you can build a kernel with old toolchain, that
iproute2 needs to be buildable as well.

The current kernels are documented to require 3.2 or later.




Daniel's patch mentions anonymous structs so naming them should be fine.

Daniel: What OS were you using with the 4.6 gcc? I lost my range of OS 
VMs when I changed employers last year. At this point I don't recall 
which OS uses < 4.6.

[net-next,v3] openvswitch: Add packet len info to upcall.

2016-06-17 Thread William Tu

The commit f2a4d086ed4c ("openvswitch: Add packet truncation support.")
introduces packet truncation before sending to userspace upcall receiver.
This patch passes up the skb->len before truncation so that the upcall
receiver knows the original packet size. Potentially this will be used
by sFlow, where OVS translates sFlow config header=N to a sample action,
truncating packet to N byte in kernel datapath. Thus, only N bytes instead
of full-packet size is copied from kernel to userspace, saving the
kernel-to-userspace bandwidth.

Signed-off-by: William Tu 
Cc: Pravin Shelar 
---
v2->v3:
- remove platform specific name.
- fix issue when receiving GSO packet
- remove skblen in struct dp_upcall_info
v1->v2:
- pass skb->len to userspace instead of cutlen
---
 include/uapi/linux/openvswitch.h |  2 ++
 net/openvswitch/datapath.c   | 11 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 8274675..d95a301 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -166,6 +166,7 @@ enum ovs_packet_cmd {
  * output port is actually a tunnel port. Contains the output tunnel key
  * extracted from the packet as nested %OVS_TUNNEL_KEY_ATTR_* attributes.
  * @OVS_PACKET_ATTR_MRU: Present for an %OVS_PACKET_CMD_ACTION and
+ * @OVS_PACKET_ATTR_LEN: Packet size before truncation.
  * %OVS_PACKET_ATTR_USERSPACE action specify the Maximum received fragment
  * size.
  *
@@ -185,6 +186,7 @@ enum ovs_packet_attr {
OVS_PACKET_ATTR_PROBE,  /* Packet operation is a feature probe,
   error logging should be suppressed. */
OVS_PACKET_ATTR_MRU,/* Maximum received IP fragment size. */
+   OVS_PACKET_ATTR_LEN,/* Packet size before truncation. */
__OVS_PACKET_ATTR_MAX
 };
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 6739342..623fc04 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -387,7 +387,8 @@ static size_t upcall_msg_size(const struct dp_upcall_info 
*upcall_info,
 {
size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
+ nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
-   + nla_total_size(ovs_key_attr_size()); /* OVS_PACKET_ATTR_KEY */
+   + nla_total_size(ovs_key_attr_size()) /* OVS_PACKET_ATTR_KEY */
+   + nla_total_size(sizeof(unsigned int)); /* OVS_PACKET_ATTR_LEN 
*/
 
/* OVS_PACKET_ATTR_USERDATA */
if (upcall_info->userdata)
@@ -514,6 +515,14 @@ static int queue_userspace_packet(struct datapath *dp, 
struct sk_buff *skb,
pad_packet(dp, user_skb);
}
 
+   /* Add OVS_PACKET_ATTR_LEN */
+   if (nla_put_u32(user_skb, OVS_PACKET_ATTR_LEN,
+   skb->len)) {
+   err = -ENOBUFS;
+   goto out;
+   }
+   pad_packet(dp, user_skb);
+
/* Only reserve room for attribute header, packet data is added
 * in skb_zerocopy() */
if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0))) {
-- 
2.5.0

Re: [PATCH] net: tilegx: use correct timespec64 type

2016-06-17 Thread Richard Cochran

On Fri, Jun 17, 2016 at 06:15:30PM +0200, Arnd Bergmann wrote:
> The conversion to the 64-bit time based ptp methods left two instances
> of 'struct timespec' in place. This is harmless because 64-bit
> architectures define timespec64 as timespec, and this driver is
> not used on 32-bit machines.
> 
> However, using 'struct timespec64' directly is obviously the right
> thing to do, and will help us remove 'struct timespec' in the future.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: b9acf24f779c ("ptp: tilegx: convert to the 64 bit get/set time 
> methods.")

Acked-by: Richard Cochran

Re: [PATCH iproute2 net-next 3/5] bridge: add json support for bridge fdb show

2016-06-17 Thread Anuradha Karuppiah

On Fri, Jun 17, 2016 at 12:04 PM, Stephen Hemminger
 wrote:
> On Fri, 17 Jun 2016 11:04:54 -0700
> Anuradha Karuppiah  wrote:
>
>> On Tue, May 31, 2016 at 12:22 PM, Anuradha Karuppiah
>>  wrote:
>> > On Tue, May 31, 2016 at 11:59 AM, Stephen Hemminger
>> >  wrote:
>> >> On Fri, 27 May 2016 21:37:14 -0700
>> >> Roopa Prabhu  wrote:
>> >>
>> >>> Sample output:
>> >>> $bridge -j fdb show
>> >>> [{
>> >>> "mac": "44:38:39:00:69:88",
>> >>> "dev": "swp2s0",
>> >>> "vlan": 2,
>> >>> "master": "br0",
>> >>> "state": "permanent"
>> >>> },{
>> >>> "mac": "00:02:00:00:00:01",
>> >>> "dev": "swp2s0",
>> >>> "vlan": 2,
>> >>> "master": "br0"
>> >>> },{
>> >>> "mac": "00:02:00:00:00:02",
>> >>> "dev": "swp2s1",
>> >>> "vlan": 2,
>> >>> "master": "br0"
>> >>> },{
>> >>> "mac": "44:38:39:00:69:89",
>> >>> "dev": "swp2s1",
>> >>> "master": "br0",
>> >>> "state": "permanent"
>> >>> },{
>> >>> "mac": "44:38:39:00:69:89",
>> >>> "dev": "swp2s1",
>> >>> "vlan": 2,
>> >>> "master": "br0",
>> >>> "state": "permanent"
>> >>> },{
>> >>> "mac": "44:38:39:00:69:88",
>> >>> "dev": "br0",
>> >>> "master": "br0",
>> >>> "state": "permanent"
>> >>> }
>> >>> ]
>> >>
>> >> In most JSON I have seen, the output would be:
>> >>
>> >> {
>> >>   "fdb" : [
>> >>  {
>> >>  "mac": "44:38:39:00:69:88",
>> >>  "dev": "swp2s0",
>> >>  "vlan": 2,
>> >>  "master": "br0",
>> >>  "state": "permanent"
>> >>  },
>> >> ...
>> >>]
>> >> }
>> >>
>> >> I.e never a bare array.
>> >>
>> > Yes Stephen, Adding an extra level would be one way to force the
>> > format to json-object. And that would definitely be the way to do it
>> > if we ever added a top level json dump - something like - "bridge -j
>> > show".
>> >
>> > But in the case of "bridge -j fdb show" that level is redundant. To be
>> > consistent we would have to add that extra level to all json dumps
>> > (even if they were already objects; such as the "bridge -j vlan
>> > show").The google json style guide recommends against adding hierarchy
>> > unless needed. And it is not that uncommon in java to have a
>> > json-array of objects for e.g. http://json-schema.org/example1.html
>> > talks about a schema that is an "array of products".
>> >
>> > What do you recommend?
>>
>> Hi Stephen,
>> We did a bit more digging around and found that other folks use json
>> output with top level array as well. Here’s a docker networks json
>> output sample -
>>
>> vagrant@host-21 ~ $ docker network inspect red
>> [
>> {
>> "Name": "red",
>> "Id": 
>> "d2fff9bafd7564c4012aa49f322fcd8f5743cc5ceb465dc218af5ba22c920981",
>> "Scope": "global",
>> "Driver": "overlay",
>> "EnableIPv6": false,
>> "IPAM": {
>> "Driver": "default",
>> "Options": {},
>> "Config": [
>> {
>> "Subnet": "10.252.20.0/24"
>> }
>> ]
>> },
>> "Internal": false,
>> "Containers": {
>> 
>> "c92084c1ebfb4f0a601537298c273078862207e3b564787ddd6ef564efbaca47":
>> {
>> "Name": "ctr21",
>> "EndpointID":
>> "e7468a70f13f1ea7b15445ab555374892ac41f71ea9023af1d9ede668bfd8742",
>> "MacAddress": "02:42:0a:fc:14:03",
>> "IPv4Address": "10.252.20.3/24",
>> "IPv6Address": ""
>> },
>> 
>> "ep-9bfc004b4046512f0f0104fe022d3686f5237ae08475741f8d520552cbb63d45":
>> {
>> "Name": "ctr22",
>> "EndpointID":
>> "9bfc004b4046512f0f0104fe022d3686f5237ae08475741f8d520552cbb63d45",
>> "MacAddress": "02:42:0a:fc:14:02",
>> "IPv4Address": "10.252.20.2/24",
>> "IPv6Address": ""
>> }
>> },
>> "Options": {},
>> "Labels": {}
>> }
>> ]
>>
>> Adding an additional namespace to all the json outputs (just to avoid
>> a top-level json-array for some) seems redundant. If a namespace is
>> needed for other reasons we can definitely add it. So we think it
>> would be better to just go with the top-level json-array for a
>> list/set-of-objects outputs.
>
> Ok, resubmit. I never claimed to be a JSON expert :=)

will do :)

Re: [PATCH iproute2 net-next 3/5] bridge: add json support for bridge fdb show

2016-06-17 Thread Stephen Hemminger

On Fri, 17 Jun 2016 11:04:54 -0700
Anuradha Karuppiah  wrote:

> On Tue, May 31, 2016 at 12:22 PM, Anuradha Karuppiah
>  wrote:
> > On Tue, May 31, 2016 at 11:59 AM, Stephen Hemminger
> >  wrote:
> >> On Fri, 27 May 2016 21:37:14 -0700
> >> Roopa Prabhu  wrote:
> >>
> >>> Sample output:
> >>> $bridge -j fdb show
> >>> [{
> >>> "mac": "44:38:39:00:69:88",
> >>> "dev": "swp2s0",
> >>> "vlan": 2,
> >>> "master": "br0",
> >>> "state": "permanent"
> >>> },{
> >>> "mac": "00:02:00:00:00:01",
> >>> "dev": "swp2s0",
> >>> "vlan": 2,
> >>> "master": "br0"
> >>> },{
> >>> "mac": "00:02:00:00:00:02",
> >>> "dev": "swp2s1",
> >>> "vlan": 2,
> >>> "master": "br0"
> >>> },{
> >>> "mac": "44:38:39:00:69:89",
> >>> "dev": "swp2s1",
> >>> "master": "br0",
> >>> "state": "permanent"
> >>> },{
> >>> "mac": "44:38:39:00:69:89",
> >>> "dev": "swp2s1",
> >>> "vlan": 2,
> >>> "master": "br0",
> >>> "state": "permanent"
> >>> },{
> >>> "mac": "44:38:39:00:69:88",
> >>> "dev": "br0",
> >>> "master": "br0",
> >>> "state": "permanent"
> >>> }
> >>> ]
> >>
> >> In most JSON I have seen, the output would be:
> >>
> >> {
> >>   "fdb" : [
> >>  {
> >>  "mac": "44:38:39:00:69:88",
> >>  "dev": "swp2s0",
> >>  "vlan": 2,
> >>  "master": "br0",
> >>  "state": "permanent"
> >>  },
> >> ...
> >>]
> >> }
> >>
> >> I.e never a bare array.
> >>
> > Yes Stephen, Adding an extra level would be one way to force the
> > format to json-object. And that would definitely be the way to do it
> > if we ever added a top level json dump - something like - "bridge -j
> > show".
> >
> > But in the case of "bridge -j fdb show" that level is redundant. To be
> > consistent we would have to add that extra level to all json dumps
> > (even if they were already objects; such as the "bridge -j vlan
> > show").The google json style guide recommends against adding hierarchy
> > unless needed. And it is not that uncommon in java to have a
> > json-array of objects for e.g. http://json-schema.org/example1.html
> > talks about a schema that is an "array of products".
> >
> > What do you recommend?
> 
> Hi Stephen,
> We did a bit more digging around and found that other folks use json
> output with top level array as well. Here’s a docker networks json
> output sample -
> 
> vagrant@host-21 ~ $ docker network inspect red
> [
> {
> "Name": "red",
> "Id": 
> "d2fff9bafd7564c4012aa49f322fcd8f5743cc5ceb465dc218af5ba22c920981",
> "Scope": "global",
> "Driver": "overlay",
> "EnableIPv6": false,
> "IPAM": {
> "Driver": "default",
> "Options": {},
> "Config": [
> {
> "Subnet": "10.252.20.0/24"
> }
> ]
> },
> "Internal": false,
> "Containers": {
> 
> "c92084c1ebfb4f0a601537298c273078862207e3b564787ddd6ef564efbaca47":
> {
> "Name": "ctr21",
> "EndpointID":
> "e7468a70f13f1ea7b15445ab555374892ac41f71ea9023af1d9ede668bfd8742",
> "MacAddress": "02:42:0a:fc:14:03",
> "IPv4Address": "10.252.20.3/24",
> "IPv6Address": ""
> },
> 
> "ep-9bfc004b4046512f0f0104fe022d3686f5237ae08475741f8d520552cbb63d45":
> {
> "Name": "ctr22",
> "EndpointID":
> "9bfc004b4046512f0f0104fe022d3686f5237ae08475741f8d520552cbb63d45",
> "MacAddress": "02:42:0a:fc:14:02",
> "IPv4Address": "10.252.20.2/24",
> "IPv6Address": ""
> }
> },
> "Options": {},
> "Labels": {}
> }
> ]
> 
> Adding an additional namespace to all the json outputs (just to avoid
> a top-level json-array for some) seems redundant. If a namespace is
> needed for other reasons we can definitely add it. So we think it
> would be better to just go with the top-level json-array for a
> list/set-of-objects outputs.

Ok, resubmit. I never claimed to be a JSON expert :=)

Re: [PATCH RFC] brcmfmac: support deleting MBSS AP interfaces

2016-06-17 Thread Arend van Spriel

On 17-06-16 14:30, Rafał Miłecki wrote:
> On 1 June 2016 at 21:00, Arend van Spriel  
> wrote:
>> On 01-06-16 16:36, Rafał Miłecki wrote:
>>> We already support adding extra (AP) interfaces so it also makes an
>>> obvious sense to allow deleting them.
>>>
>>> Adding a new interface is implemented by sending request to firmware for
>>> creating a new BSS and waiting for a proper event. Ideally deleting
>>> interface should be handled in a similar way. There should be a request
>>> to firmware for deleting BSS and firmware should respond with an event.
>>>
>>> Unfortunately it doesn't seem to work with recent firmwares. They never
>>> seem to delete BSS and never send BRCMF_E_IF_DEL. As a workaround this
>>> patch deletes Linux interface while keeping a track of BSSes present in
>>> a firmware. If there is request for adding a new interface this code is
>>> capable of reusing existing BSS-es.
>>
>> It is not so much an issue of recent firmware. Actually, on recent
>> firmware 7.x.y.z and higher there are other command to create *and*
>> delete additional interfaces. On the other hand we aim to support a
>> large number of devices going back to bcm4329 so we have to come up with
>> a scheme to use the new commands or fallback to old api. Let's hope we
>> can reuse much of this effort you put in.
> 
> You gave me a complex puzzle there :D It took me a while to find out
> what API you meant.

Actually, the puzzle was supposed to be for me, but I like your
curiosity and persistence in digging up the (partial) info.

> Finally I found an interesting wlioctl.h in SDK 9.10.178.27 that gave
> me some clue. I got this SDK from ASUS RT-AC1200G+ open souce tarball.
> There are 2 interesting structs:
> 
> typedef struct wl_interface_create {
> uint16 ver; /* version of this struct */
> uint32  flags; /* flags that defines the operation */
> struct ether_addr   mac_addr; /* Optional Mac address */
> uint32  wlc_index; /* Optional wlc index */
> } wl_interface_create_t;
> 
> typedef struct wl_interface_info {
> uint16 ver; /* version of this struct */
> struct ether_addrmac_addr; /* MAC address of the interface */
> char ifname[BCM_MSG_IFNAME_MAX]; /* name of interface */
> uint8 bsscfgidx; /* source bsscfg index */
> } wl_interface_info_t;
> 
> I couldn't find any corresponding WLC_* in wlioctl_defs.h, so I guess
> I should use WLC_SET_VAR (or WLC_SET_VAR as you prefer) with some

(huh)? anyway the api indeed uses what we call an iovar, ie.
string-based ioctl.

> string. Any tip what would it be? Something like
> "wl_interface_create"? Can you reveal such a small secret?

It is "interface_create" and "interface_remove".

Regards,
Arend

Re: [iproute PATCH 1/3] Use C99 style initializers everywhere

2016-06-17 Thread Stephen Hemminger

On Fri, 17 Jun 2016 16:58:14 +
Nicolas Dichtel  wrote:

> Le 17/06/2016 18:46, Daniel Borkmann a écrit :
> > On 06/17/2016 06:34 PM, Stephen Hemminger wrote:
> >> On Fri, 17 Jun 2016 16:09:20 +
> >> Daniel Borkmann  wrote:
> >>
> >>> Please have a look at commit 8f80d450c3cb ("tc: fix compilation with old 
> >>> gcc
> >>> (< 4.6)") ...
> >>>
> >>> Your changes effectively revert them again. Here, and some other parts of 
> >>> the
> >>> bpf frontend
> >>> code bits.
> >>
> >> GCC 4.6 is 3 years old. So perhaps it is time to move on.
> >> Maybe add a GCC version check in the makefile, to fail cleanly.
> > 
> > Well, you don't have to ask me but rather the patch submitters (Cc).
> > 
> > I haven't used RHEL in quite a while, but I could imagine it might
> > be related to built it there perhaps.
> Yes. For some specific arch, we have only old toolchains.
> 
> The rule was always to be backward compatible with old kernels. It implies to
> also support the compilation with old toolchains ;-)

It makes sense that if you can build a kernel with old toolchain, that
iproute2 needs to be buildable as well.

The current kernels are documented to require 3.2 or later.

Re: [RFC PATCH 1/1] net/ipv4: Enable flow-based ECMP

2016-06-17 Thread Jean He

Richard Laing  alliedtelesis.co.nz> writes:

> >>> Fair enough, I will look at making it a sysctl option. I guess the
> >>> default can be the current behaviour.

> Cheers
> Richard
> 

Hi Richard,

Could you please share if this new patch for per-flow IPv4 ECMP has landed?

If so, which version should I use? 

And is there a sysctl option that we can just use to enable per-flow hashing?

Many thanks!
Jean

Re: [PATCH net-next v3] tcp: use RFC6298 compliant TCP RTO calculation

2016-06-17 Thread Yuchung Cheng

On Fri, Jun 17, 2016 at 11:32 AM, David Miller  wrote:
>
> From: Daniel Metz 
> Date: Wed, 15 Jun 2016 20:00:03 +0200
>
> > This patch adjusts Linux RTO calculation to be RFC6298 Standard
> > compliant. MinRTO is no longer added to the computed RTO, RTO damping
> > and overestimation are decreased.
>  ...
>
> Yuchung, I assume I am waiting for you to do the testing you said
> you would do for this patch, right?
Yes I spent the last two days resolving some unrelated glitches to
start my testing on Web servers. I should be able to get some results
over the weekend.

I will test
0) current Linux
1) this patch
2) RFC6298 with min_RTO=1sec
3) RFC6298 with minimum RTTVAR of 200ms (so it is more like current
Linux style of min RTO which only applies to RTTVAR)

and collect the TCP latency (how long to send an HTTP response) and
(spurious) timeout & retransmission stats.

I didn't respond to Hagen's email yet b/c I thought data would help
the discussion better :-)

Re: Per-flow IPv4 ECMP

2016-06-17 Thread Jean He

Peter Nørlund  ordbogen.com> writes:

> > This is being worked on currently by Peter Nørlund
> > https://lwn.net/Articles/657431/
> 

Hi Peter,

Following up on this thread, has the new patch for supporting per-flow IPv4 
ECMP landed? 

Which version should I be able to use that has this feature?

Thanks,
Jean

Re: [PATCH -next] RDS: TCP: Fix non static symbol warnings

2016-06-17 Thread Santosh Shilimkar


On 6/17/2016 11:12 AM, weiyj...@163.com wrote:

From: Wei Yongjun 

Fixes the following sparse warnings:

net/rds/tcp.c:59:5: warning:
 symbol 'rds_tcp_min_sndbuf' was not declared. Should it be static?
net/rds/tcp.c:60:5: warning:
 symbol 'rds_tcp_min_rcvbuf' was not declared. Should it be static?

Signed-off-by: Wei Yongjun 
---

Look ok.
Acked-by: Santosh Shilimkar

Re: PATCH 1/1] AX.25: Close socket connection on session completion

2016-06-17 Thread David Miller

From: Basil Gunn 
Date: Thu, 16 Jun 2016 09:42:30 -0700

> A socket connection made in ax.25 is not closed when session is
> completed.  The heartbeat timer is stopped prematurely and this is
> where the socket gets closed. Allow heatbeat timer to run to close
> socket. Symptom occurs in kernels >= 4.2.0
> 
> Originally sent 6/15/2016. Resend with distribution list matching
> scripts/maintainer.pl output.
> 
> Signed-off-by: Basil Gunn 

What changed in 4.2.x that broke this?

Re: [PATCH net-next v3] tcp: use RFC6298 compliant TCP RTO calculation

2016-06-17 Thread David Miller

From: Daniel Metz 
Date: Wed, 15 Jun 2016 20:00:03 +0200

> This patch adjusts Linux RTO calculation to be RFC6298 Standard
> compliant. MinRTO is no longer added to the computed RTO, RTO damping
> and overestimation are decreased.
 ...

Yuchung, I assume I am waiting for you to do the testing you said
you would do for this patch, right?

Re: [PATCH -next] RDS: TCP: Fix non static symbol warnings

2016-06-17 Thread Sowmini Varadhan

On (06/17/16 18:12), weiyj...@163.com wrote:
> From: Wei Yongjun 
> 
> Fixes the following sparse warnings:
> 
> net/rds/tcp.c:59:5: warning:
>  symbol 'rds_tcp_min_sndbuf' was not declared. Should it be static?
> net/rds/tcp.c:60:5: warning:
>  symbol 'rds_tcp_min_rcvbuf' was not declared. Should it be static?

Acked-by: Sowmini Varadhan

[PATCH 5/7] netfilter: nf_tables: fix a wrong check to skip the inactive rules

2016-06-17 Thread Pablo Neira Ayuso

From: Liping Zhang 

nft_genmask_cur has already done left-shift operator on the gencursor,
so there's no need to do left-shift operator on it again.

Fixes: ea4bd995b0f2 ("netfilter: nf_tables: add transaction helper functions")
Cc: Patrick McHardy 
Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_tables_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index e9f8dff..fb8b589 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -143,7 +143,7 @@ next_rule:
list_for_each_entry_continue_rcu(rule, >rules, list) {
 
/* This rule is not active, skip. */
-   if (unlikely(rule->genmask & (1 << gencursor)))
+   if (unlikely(rule->genmask & gencursor))
continue;
 
rulenum++;
-- 
2.1.4

[PATCH 4/7] netfilter: nf_tables: fix wrong destroy anonymous sets if binding fails

2016-06-17 Thread Pablo Neira Ayuso

From: Liping Zhang 

When we add a nft rule like follows:
  # nft add rule filter test tcp dport vmap {1: jump test}
-ELOOP error will be returned, and the anonymous set will be
destroyed.

But after that, nf_tables_abort will also try to remove the
element and destroy the set, which was already destroyed and
freed.

If we add a nft wrong rule, nft_tables_abort will do the cleanup
work rightly, so nf_tables_set_destroy call here is redundant and
wrong, remove it.

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_tables_api.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 0fd6998..2c88187 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2958,13 +2958,8 @@ int nf_tables_bind_set(const struct nft_ctx *ctx, struct 
nft_set *set,
iter.fn = nf_tables_bind_check_setelem;
 
set->ops->walk(ctx, set, );
-   if (iter.err < 0) {
-   /* Destroy anonymous sets if binding fails */
-   if (set->flags & NFT_SET_ANONYMOUS)
-   nf_tables_set_destroy(ctx, set);
-
+   if (iter.err < 0)
return iter.err;
-   }
}
 bind:
binding->chain = ctx->chain;
-- 
2.1.4

[PATCH 6/7] netfilter: xt_SYNPROXY: add missing header to Kbuild

2016-06-17 Thread Pablo Neira Ayuso

Matt Whitlock says:

 Without this line, the file xt_SYNPROXY.h does not get installed in
 /usr/include/linux/netfilter/, and thus user-space programs cannot make
 use of it.

Reported-by: Matt Whitlock 
Signed-off-by: Pablo Neira Ayuso 
---
 include/uapi/linux/netfilter/Kbuild | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/netfilter/Kbuild 
b/include/uapi/linux/netfilter/Kbuild
index 1d973d2..cd26d7a 100644
--- a/include/uapi/linux/netfilter/Kbuild
+++ b/include/uapi/linux/netfilter/Kbuild
@@ -33,6 +33,7 @@ header-y += xt_NFLOG.h
 header-y += xt_NFQUEUE.h
 header-y += xt_RATEEST.h
 header-y += xt_SECMARK.h
+header-y += xt_SYNPROXY.h
 header-y += xt_TCPMSS.h
 header-y += xt_TCPOPTSTRIP.h
 header-y += xt_TEE.h
-- 
2.1.4

[PATCH 2/7] netfilter: nf_tables: fix wrong check of NFT_SET_MAP in nf_tables_bind_set

2016-06-17 Thread Pablo Neira Ayuso

From: Liping Zhang 

We should check "i" is used as a dictionary or not, "binding" is already
checked before.

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_tables_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 7b7aa87..492f6f8 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2946,7 +2946,7 @@ int nf_tables_bind_set(const struct nft_ctx *ctx, struct 
nft_set *set,
 * jumps are already validated for that chain.
 */
list_for_each_entry(i, >bindings, list) {
-   if (binding->flags & NFT_SET_MAP &&
+   if (i->flags & NFT_SET_MAP &&
i->chain == binding->chain)
goto bind;
}
-- 
2.1.4

[PATCH 3/7] netfilter: nf_tables: reject loops from set element jump to chain

2016-06-17 Thread Pablo Neira Ayuso

Liping Zhang says:

"Users may add such a wrong nft rules successfully, which will cause an
endless jump loop:

  # nft add rule filter test tcp dport vmap {1: jump test}

This is because before we commit, the element in the current anonymous
set is inactive, so osp->walk will skip this element and miss the
validate check."

To resolve this problem, this patch passes the generation mask to the
walk function through the iter container structure depending on the code
path:

1) If we're dumping the elements, then we have to check if the element
   is active in the current generation. Thus, we check for the current
   bit in the genmask.

2) If we're checking for loops, then we have to check if the element is
   active in the next generation, as we're in the middle of a
   transaction. Thus, we check for the next bit in the genmask.

Based on original patch from Liping Zhang.

Reported-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
Tested-by: Liping Zhang 
---
 include/net/netfilter/nf_tables.h |  1 +
 net/netfilter/nf_tables_api.c | 15 +--
 net/netfilter/nft_hash.c  |  3 +--
 net/netfilter/nft_rbtree.c|  3 +--
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h 
b/include/net/netfilter/nf_tables.h
index 0922354..f7c291f 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -167,6 +167,7 @@ struct nft_set_elem {
 
 struct nft_set;
 struct nft_set_iter {
+   u8  genmask;
unsigned intcount;
unsigned intskip;
int err;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 492f6f8..0fd6998 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2951,6 +2951,7 @@ int nf_tables_bind_set(const struct nft_ctx *ctx, struct 
nft_set *set,
goto bind;
}
 
+   iter.genmask= nft_genmask_next(ctx->net);
iter.skip   = 0;
iter.count  = 0;
iter.err= 0;
@@ -3192,12 +3193,13 @@ static int nf_tables_dump_set(struct sk_buff *skb, 
struct netlink_callback *cb)
if (nest == NULL)
goto nla_put_failure;
 
-   args.cb = cb;
-   args.skb= skb;
-   args.iter.skip  = cb->args[0];
-   args.iter.count = 0;
-   args.iter.err   = 0;
-   args.iter.fn= nf_tables_dump_setelem;
+   args.cb = cb;
+   args.skb= skb;
+   args.iter.genmask   = nft_genmask_cur(ctx.net);
+   args.iter.skip  = cb->args[0];
+   args.iter.count = 0;
+   args.iter.err   = 0;
+   args.iter.fn= nf_tables_dump_setelem;
set->ops->walk(, set, );
 
nla_nest_end(skb, nest);
@@ -4284,6 +4286,7 @@ static int nf_tables_check_loops(const struct nft_ctx 
*ctx,
binding->chain != chain)
continue;
 
+   iter.genmask= nft_genmask_next(ctx->net);
iter.skip   = 0;
iter.count  = 0;
iter.err= 0;
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 6fa0165..f39c53a 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -189,7 +189,6 @@ static void nft_hash_walk(const struct nft_ctx *ctx, const 
struct nft_set *set,
struct nft_hash_elem *he;
struct rhashtable_iter hti;
struct nft_set_elem elem;
-   u8 genmask = nft_genmask_cur(read_pnet(>pnet));
int err;
 
err = rhashtable_walk_init(>ht, , GFP_KERNEL);
@@ -218,7 +217,7 @@ static void nft_hash_walk(const struct nft_ctx *ctx, const 
struct nft_set *set,
goto cont;
if (nft_set_elem_expired(>ext))
goto cont;
-   if (!nft_set_elem_active(>ext, genmask))
+   if (!nft_set_elem_active(>ext, iter->genmask))
goto cont;
 
elem.priv = he;
diff --git a/net/netfilter/nft_rbtree.c b/net/netfilter/nft_rbtree.c
index f762094..7201d57 100644
--- a/net/netfilter/nft_rbtree.c
+++ b/net/netfilter/nft_rbtree.c
@@ -211,7 +211,6 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
struct nft_rbtree_elem *rbe;
struct nft_set_elem elem;
struct rb_node *node;
-   u8 genmask = nft_genmask_cur(read_pnet(>pnet));
 
spin_lock_bh(_rbtree_lock);
for (node = rb_first(>root); node != NULL; node = rb_next(node)) {
@@ -219,7 +218,7 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
 
if (iter->count < iter->skip)
goto cont;
-   if (!nft_set_elem_active(>ext, genmask))
+   if

[PATCH 0/7] Netfilter fixes for net

2016-06-17 Thread Pablo Neira Ayuso

Hi David,

The following patchset contains Netfilter fixes for your net tree,
they are rather small patches but fixing several outstanding bugs in
nf_conntrack and nf_tables, as well as minor problems with missing
SYNPROXY header uapi installation:

1) Oneliner not to leak conntrack kmemcache on module removal, this
   problem was introduced in the previous merge window, patch from
   Florian Westphal.

2) Two fixes for insufficient ruleset loop validation, one due to
   incorrect flag check in nf_tables_bind_set() and another related to
   silly wrong generation mask logic from the walk path, from Liping
   Zhang.

3) Fix double-free of anonymous sets on error, this fix simplifies the
   code to let the abort path take care of releasing the set object,
   also from Liping Zhang.

4) The introduction of helper function for transactions broke the skip
   inactive rules logic from the nft_do_chain(), again from Liping
   Zhang.

5) Two patches to install uapi xt_SYNPROXY.h header and calm down
   kbuild robot due to missing #include .

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit 61e0979a497b07f5a82f3050e37ecc7093e2971d:

  Merge branch 'ovs-notifications' (2016-06-14 22:21:45 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 1463847e93fe693e89c52b03ab4ede6800d717c1:

  netfilter: xt_SYNPROXY: include missing  (2016-06-17 13:47:40 
+0200)


Florian Westphal (1):
  netfilter: conntrack: destroy kmemcache on module removal

Liping Zhang (3):
  netfilter: nf_tables: fix wrong check of NFT_SET_MAP in nf_tables_bind_set
  netfilter: nf_tables: fix wrong destroy anonymous sets if binding fails
  netfilter: nf_tables: fix a wrong check to skip the inactive rules

Pablo Neira Ayuso (3):
  netfilter: nf_tables: reject loops from set element jump to chain
  netfilter: xt_SYNPROXY: add missing header to Kbuild
  netfilter: xt_SYNPROXY: include missing 

 include/net/netfilter/nf_tables.h  |  1 +
 include/uapi/linux/netfilter/Kbuild|  1 +
 include/uapi/linux/netfilter/xt_SYNPROXY.h |  2 ++
 net/netfilter/nf_conntrack_core.c  |  2 ++
 net/netfilter/nf_tables_api.c  | 24 +++-
 net/netfilter/nf_tables_core.c |  2 +-
 net/netfilter/nft_hash.c   |  3 +--
 net/netfilter/nft_rbtree.c |  3 +--
 8 files changed, 20 insertions(+), 18 deletions(-)

[PATCH 1/7] netfilter: conntrack: destroy kmemcache on module removal

2016-06-17 Thread Pablo Neira Ayuso

From: Florian Westphal 

I forgot to move the kmem_cache_destroy into the exit path.

Fixes: 0c5366b3a8c7 ("netfilter: conntrack: use single slab cache)
Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index db2312e..f204274 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1544,6 +1544,8 @@ void nf_conntrack_cleanup_end(void)
nf_conntrack_tstamp_fini();
nf_conntrack_acct_fini();
nf_conntrack_expect_fini();
+
+   kmem_cache_destroy(nf_conntrack_cachep);
 }
 
 /*
-- 
2.1.4

[PATCH 7/7] netfilter: xt_SYNPROXY: include missing

2016-06-17 Thread Pablo Neira Ayuso

./usr/include/linux/netfilter/xt_SYNPROXY.h:11: found __[us]{8,16,32,64} type 
without #include 

Reported-by: kbuild test robot 
Signed-off-by: Pablo Neira Ayuso 
---
 include/uapi/linux/netfilter/xt_SYNPROXY.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/netfilter/xt_SYNPROXY.h 
b/include/uapi/linux/netfilter/xt_SYNPROXY.h
index 2d59fba..ca67e61 100644
--- a/include/uapi/linux/netfilter/xt_SYNPROXY.h
+++ b/include/uapi/linux/netfilter/xt_SYNPROXY.h
@@ -1,6 +1,8 @@
 #ifndef _XT_SYNPROXY_H
 #define _XT_SYNPROXY_H
 
+#include 
+
 #define XT_SYNPROXY_OPT_MSS0x01
 #define XT_SYNPROXY_OPT_WSCALE 0x02
 #define XT_SYNPROXY_OPT_SACK_PERM  0x04
-- 
2.1.4

[PATCH -next] RDS: TCP: Fix non static symbol warnings

2016-06-17 Thread weiyj_lk

From: Wei Yongjun 

Fixes the following sparse warnings:

net/rds/tcp.c:59:5: warning:
 symbol 'rds_tcp_min_sndbuf' was not declared. Should it be static?
net/rds/tcp.c:60:5: warning:
 symbol 'rds_tcp_min_rcvbuf' was not declared. Should it be static?

Signed-off-by: Wei Yongjun 
---
 net/rds/tcp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 86187da..49fab0c 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -56,8 +56,8 @@ static int rds_tcp_skbuf_handler(struct ctl_table *ctl, int 
write,
 void __user *buffer, size_t *lenp,
 loff_t *fpos);
 
-int rds_tcp_min_sndbuf = SOCK_MIN_SNDBUF;
-int rds_tcp_min_rcvbuf = SOCK_MIN_RCVBUF;
+static int rds_tcp_min_sndbuf = SOCK_MIN_SNDBUF;
+static int rds_tcp_min_rcvbuf = SOCK_MIN_RCVBUF;
 
 static struct ctl_table rds_tcp_sysctl_table[] = {
 #defineRDS_TCP_SNDBUF  0

Re: [PATCH iproute2 net-next 3/5] bridge: add json support for bridge fdb show

2016-06-17 Thread Anuradha Karuppiah

On Tue, May 31, 2016 at 12:22 PM, Anuradha Karuppiah
 wrote:
> On Tue, May 31, 2016 at 11:59 AM, Stephen Hemminger
>  wrote:
>> On Fri, 27 May 2016 21:37:14 -0700
>> Roopa Prabhu  wrote:
>>
>>> Sample output:
>>> $bridge -j fdb show
>>> [{
>>> "mac": "44:38:39:00:69:88",
>>> "dev": "swp2s0",
>>> "vlan": 2,
>>> "master": "br0",
>>> "state": "permanent"
>>> },{
>>> "mac": "00:02:00:00:00:01",
>>> "dev": "swp2s0",
>>> "vlan": 2,
>>> "master": "br0"
>>> },{
>>> "mac": "00:02:00:00:00:02",
>>> "dev": "swp2s1",
>>> "vlan": 2,
>>> "master": "br0"
>>> },{
>>> "mac": "44:38:39:00:69:89",
>>> "dev": "swp2s1",
>>> "master": "br0",
>>> "state": "permanent"
>>> },{
>>> "mac": "44:38:39:00:69:89",
>>> "dev": "swp2s1",
>>> "vlan": 2,
>>> "master": "br0",
>>> "state": "permanent"
>>> },{
>>> "mac": "44:38:39:00:69:88",
>>> "dev": "br0",
>>> "master": "br0",
>>> "state": "permanent"
>>> }
>>> ]
>>
>> In most JSON I have seen, the output would be:
>>
>> {
>>   "fdb" : [
>>  {
>>  "mac": "44:38:39:00:69:88",
>>  "dev": "swp2s0",
>>  "vlan": 2,
>>  "master": "br0",
>>  "state": "permanent"
>>  },
>> ...
>>]
>> }
>>
>> I.e never a bare array.
>>
> Yes Stephen, Adding an extra level would be one way to force the
> format to json-object. And that would definitely be the way to do it
> if we ever added a top level json dump - something like - "bridge -j
> show".
>
> But in the case of "bridge -j fdb show" that level is redundant. To be
> consistent we would have to add that extra level to all json dumps
> (even if they were already objects; such as the "bridge -j vlan
> show").The google json style guide recommends against adding hierarchy
> unless needed. And it is not that uncommon in java to have a
> json-array of objects for e.g. http://json-schema.org/example1.html
> talks about a schema that is an "array of products".
>
> What do you recommend?

Hi Stephen,
We did a bit more digging around and found that other folks use json
output with top level array as well. Here’s a docker networks json
output sample -

vagrant@host-21 ~ $ docker network inspect red
[
{
"Name": "red",
"Id": 
"d2fff9bafd7564c4012aa49f322fcd8f5743cc5ceb465dc218af5ba22c920981",
"Scope": "global",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": {},
"Config": [
{
"Subnet": "10.252.20.0/24"
}
]
},
"Internal": false,
"Containers": {
"c92084c1ebfb4f0a601537298c273078862207e3b564787ddd6ef564efbaca47":
{
"Name": "ctr21",
"EndpointID":
"e7468a70f13f1ea7b15445ab555374892ac41f71ea9023af1d9ede668bfd8742",
"MacAddress": "02:42:0a:fc:14:03",
"IPv4Address": "10.252.20.3/24",
"IPv6Address": ""
},

"ep-9bfc004b4046512f0f0104fe022d3686f5237ae08475741f8d520552cbb63d45":
{
"Name": "ctr22",
"EndpointID":
"9bfc004b4046512f0f0104fe022d3686f5237ae08475741f8d520552cbb63d45",
"MacAddress": "02:42:0a:fc:14:02",
"IPv4Address": "10.252.20.2/24",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {}
}
]

Adding an additional namespace to all the json outputs (just to avoid
a top-level json-array for some) seems redundant. If a namespace is
needed for other reasons we can definitely add it. So we think it
would be better to just go with the top-level json-array for a
list/set-of-objects outputs.

thanks
Anuradha.

[PATCH] net:liquidio: remove unused including

2016-06-17 Thread weiyj_lk

From: Wei Yongjun 

Remove including  that don't need it.

Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/cavium/liquidio/octeon_device.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c 
b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index 8e23e3f..ba04375 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -19,7 +19,6 @@
 * This file may also be available under a different license from Cavium.
 * Contact Cavium, Inc. for more information
 **/
-#include 
 #include 
 #include 
 #include

[PATCH] gtp: remove unused including

2016-06-17 Thread weiyj_lk

From: Wei Yongjun 

Remove including  that don't need it.

Signed-off-by: Wei Yongjun 
---
 drivers/net/gtp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 4e976a0..97e0cbc 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -16,7 +16,6 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include 
-#include 
 #include 
 #include 
 #include

Re: [BUG] act_ife: sleeping functions called in atomic context

2016-06-17 Thread Cong Wang

On Fri, Jun 17, 2016 at 4:07 AM, Jamal Hadi Salim  wrote:
> On 16-06-17 01:38 AM, Cong Wang wrote:
>>
>> On Thu, Jun 16, 2016 at 7:14 PM, Cong Wang 
>> wrote:
>>>
>>>
>>> I think we can just remove that tcf_lock, I am testing a patch now.
>>
>>
>> Please try the attached patch, I will do more tests tomorrow.
>>
>> Thanks!
>>
>
> Cong, What tree are you using? I dont see the time aggregation patches
> that I sent (and Dave took in) in your changes.

My patch is against -net. (I see you already figured out your patch is
missing in -net-next.)

Or are you suggesting to rebase it for -net-next? I think it fixes some real
bug so -net is better, although it is slightly large as a bug fix.

>
> Comments:
> Is GFP_ATOMIC really necessary? Thats user->kernel interface. GFP_KERNEL
> should be sufficient.

I added a read_lock(ife_mod_lock), this is why we need
GFP_ATOMIC.

Again, don't worry, this change should be in a separated patch,
you will not miss it again when I send them formally. ;)

> Also, it would be nice to kill the lock - but this feels like two
> patches in one. 1) to fix the alloc not to be under the lock 2) to
> kill said lock. Maybe split it as such for easier review.
> I am using this action extensively so will be happy to test.
> I think my patch is a good beginning to #1 - if you fix the forgotten
> unlock and ensure we lock around updating ife fields when it exists
> already (you said it in your earlier email and I thought about
> that afterwards).

Yes, it makes sense too. Your patch is smaller, if you plan to
backport it to stable, we can use your patch for -net  and -stable
and I am happy to rebase mine for -net-next.

I am fine with either way.

Thanks!

Re: [BUG] act_ife: sleeping functions called in atomic context

2016-06-17 Thread Cong Wang

On Fri, Jun 17, 2016 at 4:05 AM, Alexey Khoroshilov
 wrote:
> On 17.06.2016 08:38, Cong Wang wrote:
>> On Thu, Jun 16, 2016 at 7:14 PM, Cong Wang  wrote:
>>>
>>> I think we can just remove that tcf_lock, I am testing a patch now.
>>
>> Please try the attached patch, I will do more tests tomorrow.
>>
>> Thanks!
>>
>
> Looks good with two notes:
> 1. add_metainfo() still contains
> ret = ops->alloc(mi, metaval);
> that allocates memory with GFP_KERNEL.
> So, I would add gfpflag argument to alloc() operation.

I thought about this too, but we just allocate 32+ bytes here,
not sure if it is really worth to pass a gfp flag.

>
> 2. It makes sense to mention ife_mod_lock in the comment before
> add_metainfo(), because ife_mod_lock is the reason to use GFP_ATOMIC there.

Don't worry, it is in a separated patch, I will explain this
in the changelog. (I sent a combined patch just for review/tests.)

Thanks!

Re: [patch net-next v4 0/4] return offloaded stats as default and expose original sw stats

2016-06-17 Thread Florian Fainelli

On 06/17/2016 08:42 AM, Jiri Pirko wrote:
> Fri, Jun 17, 2016 at 05:35:53PM CEST, d...@cumulusnetworks.com wrote:
>> On 6/17/16 8:54 AM, Jamal Hadi Salim wrote:
>>> On 16-06-17 10:05 AM, Jiri Pirko wrote:
 Fri, Jun 17, 2016 at 03:48:35PM CEST, d...@cumulusnetworks.com wrote:
> On 6/17/16 2:24 AM, Jiri Pirko wrote:
>>
>>>

 That is problematic. Existing apps depend on rtnetlink stats. But if we
 don't count offloaded forwarded packets, the apps don't see anything.
 Therefore I believe that this patchset approach is better. The existing
 apps continue to work and future apps can use newly introduces sw_stats
 to query slowpath traffic. Makes sense to me.

>>>
>>> I agree with Jiri. It is a bad idea to depend on ethtool for any of
>>> this stuff. Is there a way we can tag netlink stats instead
>>> to indicate they are hardware or software?
>>
>> Right, old API but the key here is that low level h/w stats are returned by a
>> different API.
>>
>> By default ip, ifconfig, snmpd, etc all continue to get traditional S/W stats
>> - counters as seen by the CPU.
> 
> Yep. And I believe that for offloaded forwarding, this tools should see
> hw counters, as they show what is going on in real.

If your NIC is offloading packets today, these tools typically won't see
these stats, but ethtool -S likely will report what is going on under
the hood.

Do we actually need to tell apart SW maintained from HW maintained
stats, or at the end all that matters is just, as DaveM pointed out,
getting the information, and in the case of an Ethernet switch, return
HW stats by default and supplement with SW stats whenever we have them,
all in the same namespace?
-- 
Florian

Re: [iproute PATCH 1/3] Use C99 style initializers everywhere

2016-06-17 Thread Nicolas Dichtel

Le 17/06/2016 18:46, Daniel Borkmann a écrit :
> On 06/17/2016 06:34 PM, Stephen Hemminger wrote:
>> On Fri, 17 Jun 2016 16:09:20 +
>> Daniel Borkmann  wrote:
>>
>>> Please have a look at commit 8f80d450c3cb ("tc: fix compilation with old gcc
>>> (< 4.6)") ...
>>>
>>> Your changes effectively revert them again. Here, and some other parts of 
>>> the
>>> bpf frontend
>>> code bits.
>>
>> GCC 4.6 is 3 years old. So perhaps it is time to move on.
>> Maybe add a GCC version check in the makefile, to fail cleanly.
> 
> Well, you don't have to ask me but rather the patch submitters (Cc).
> 
> I haven't used RHEL in quite a while, but I could imagine it might
> be related to built it there perhaps.
Yes. For some specific arch, we have only old toolchains.

The rule was always to be backward compatible with old kernels. It implies to
also support the compilation with old toolchains ;-)

Re: [PATCH net-next 0/8] tou: Transports over UDP - part I

2016-06-17 Thread Tom Herbert

On Thu, Jun 16, 2016 at 4:15 PM, Hannes Frederic Sowa
 wrote:
> On 16.06.2016 19:51, Tom Herbert wrote:
>> Transports over UDP is intended to encapsulate TCP and other transport
>> protocols directly and securely in UDP.
>>
>> The goal of this work is twofold:
>>
>> 1) Allow applications to run their own transport layer stack (i.e.from
>>userspace). This eliminates dependencies on the OS (e.g. solves a
>>major dependency issue for Facebook on clients).
>>
>> 2) Make transport layer headers (all of L4) invisible to the network
>>so that they can't do intrusive actions at L4. This will be enforced
>>with DTLS in use.
>>
>> Note that #1 is really about running a transport stack in userspace
>> applications in clients, not necessarily servers. For servers we
>> intend to modified the kernel stack in order to leverage existing
>> implementation for building scalable serves (hence these patches).
>>
>> This is described in more detail in the Internet Draft:
>> https://tools.ietf.org/html/draft-herbert-transports-over-udp-00
>>
>> In Part I we implement a straightforward encapsulation of TCP in GUE.
>> The implements the basic mechanics of TOU encapsulation for TCP,
>> however does not yet implement the IP addressing interactions so
>> therefore so this is not robust to use in the presence of NAT.
>> TOU is enabled per socket with a new socket option. This
>> implementation includes GSO, GRO, and RCO support.
>>
>> These patches also establish the baseline performance of TOU
>> and isolate the performance cost of UDP encapsulation. Performance
>> results are below.
>>
>> Tested: Various cases of TOU with IPv4, IPv6 using TCP_STREAM and
>> TCP_RR. Also, tested IPIP for comparing TOU encapsulation to IP
>> tunneling.
>
> Thinking about middleboxes again:
>
> E.g. https://tools.ietf.org/html/rfc6347#section-4.2.3 states that DTLS
> packets are not allowed to be fragmented. Because of this and
> furthermore because of the impossibility of clamp-mss-to-pmtu to work
> anymore, do you have any idea on how reliable this can work?
>
> Or is your plan to use a smaller MSS on all paths by default?
>
Normal PMTU discovery mechanisms are applicable to prevent
fragmentation. The overhead is accounted for in the MSS (similar to
overhead of TCP options of IPv6 extension headers). Besides that,
RFC6347 describes how fragmentation should be avoided, it does not
explicitly forbid fragmentation, no IP protocol can outright forbid
it. At most they could try to require DF bit is always set but that
won't always be obeyed like when packets are tunneled in the network.

Tom

> Thanks,
> Hannes
>

Re: [PATCH net-next 12/18] IB/mlx5: Add kernel offload flow-tag

2016-06-17 Thread John Fastabend

On 16-06-17 09:00 AM, Alexei Starovoitov wrote:
> On Fri, Jun 17, 2016 at 05:43:53PM +0300, Saeed Mahameed wrote:
>> From: Maor Gottlieb 
>>
>> Add kernel offload flow tag for packets that will bypass the kernel
>> stack, e.g (RoCE/RDMA/RAW ETH (DPDK), etc ..).
> 
> so the whole series is an elaborate way to enable dpdk? how nice.
> NACK.
> 

Well there is certainly room for augmenting the af_packet interfac with
hardware support.

Some things on my list (its a bit behind a few other things though) is
to align queues to af_packet sockets, put those queues in busy poll and
if possible look at zero copy RX. The problem with zero copy rx is it
bypasses the stack but we should be able to detect hooks being added on
ingress and disable it dynamically. Maybe I could look at this in a few
months but think about it for me I'm a bit busy lately. Also it requires
the driver to translate descriptor formats but I'm not convinced it is
that costly to do.

For DPDK why not just use SR-IOV like everyone else and bind a VF to
your favorite user space implementation (DPDK, NETMAP, PFRING, foobar0)
even Windows if you like. DPDK even supports this as far as I know.
Then you don't need to bother kernel folks at all. And you don't have
any overhead except from whatever your usermode code does.

Here's a really old patch I wrote that I would like to revisit at some
point,

---

>> This adds ndo ops for upper layer objects to request direct DMA from
>> the network interface into memory "slots". The slots must be DMA'able
>> memory given by a page/offset/size vector in a packet_ring_buffer
>> structure.
>>
>> The PF_PACKET socket interface can use these ndo_ops to do zerocopy
>> RX from the network device into memory mapped userspace memory. For
>> this to work ixgbe encodes the correct descriptor blocks and headers
>> so that existing PF_PACKET applications work without any modification.
>> This only supports the V2 header formats. And works by mapping a ring
>> of the network device to these slots.
>>
>> V3 header formats added bulk polling via socket calls and timers
>> used in the polling interface to return every n milliseconds. Currently,
>> I don't see any way to support this in hardware because we can't
>> know if the hardware is in the middle of a DMA operation or not
>> on a slot. So when a timer fires I don't know how to advance the
>> descriptor ring leaving empty descriptors similar to how the software
>> ring works. The easiest (best?) route is to simply not support this.
>>
>> The ndo operations and new socket option PACKET_RX_DIRECT work by
>> selecting a queue_index to run the direct dma operations over. Once
>> setsockopt returns sucessfully the indicated queue is mapped
>> directly to the requesting application and can not be used for
>> other purposes. Also any kernel layers such as BPF will be bypassed
>> and need to be implemented in the hardware via some other mechanism
>> such as flow classifier or future offload interfaces.
>>
>> Users steer traffic to the selected queue using flow director or
>> VMDQ. This needs to implemented through the ioctl flow classifier
>> interface (ethtool) or macvlan+hardware offload via netlink the
>> command line tool ip also supports macvlan+hardware_offload.
>>
>> The new socket option added to PF_PACKET is called PACKET_RX_DIRECT.
>> It takes a single unsigned int value specifying the queue index,
>>
>>  setsockopt(sock, SOL_PACKET, PACKET_RX_DIRECT,
>> _index, sizeof(queue_index));
>>
>> To test this I modified the tool psock_tpacket in the selftests
>> kernel directory here:
>>
>>  ./tools/testing/selftests/net/psock_tpacket.c
>>
>> Running this tool opens a socket and listend for packets over
>> the PACKET_RX_DIRECT enabled socket.
>>
>> One more caveat is the ring size of ixgbe and the size used by
>> the software socket need to be the same. There is currently an
>> ethtool to configure this and a warnding is pushed to dmesg when
>> it is not the case. To set use an ioctl directly or call
>>
>>  ethtool -G ethx rx 
>>
>> where  is the number of configured slots. The demo program
>> psock_tpacket needs size=2400.
>>
>> Known Limitations (TBD):
>>
>>  (1) Users are required to match the number of rx ring
>>  slots with ethtool to the number requested by the
>>  setsockopt PF_PACKET layout. In the future we could
>>  possibly do this automatically. More importantly this
>>  setting is currently global for all rings and needs
>>  to be per queue.
>>
>>  (2) Users need to configure Flow director or VMDQ (macvlan)
>>  to steer traffic to the correct queues. I don't believe
>>  this needs to be changed it seems to be a good mechanism
>>  for driving ddma.
>>
>>  (3) Not supporting timestamps or priv space yet
>>
>>  (4) Not supporting BPF yet...
>>
>>  (5) Only RX supported so far. TX already supports direct DMA
>>

Re: [iproute PATCH 1/3] Use C99 style initializers everywhere

2016-06-17 Thread Daniel Borkmann


On 06/17/2016 06:34 PM, Stephen Hemminger wrote:

On Fri, 17 Jun 2016 16:09:20 +
Daniel Borkmann  wrote:


Please have a look at commit 8f80d450c3cb ("tc: fix compilation with old gcc (< 
4.6)") ...

Your changes effectively revert them again. Here, and some other parts of the 
bpf frontend
code bits.


GCC 4.6 is 3 years old. So perhaps it is time to move on.
Maybe add a GCC version check in the makefile, to fail cleanly.


Well, you don't have to ask me but rather the patch submitters (Cc).

I haven't used RHEL in quite a while, but I could imagine it might
be related to built it there perhaps.

Re: [iproute PATCH 1/3] Use C99 style initializers everywhere

2016-06-17 Thread Stephen Hemminger

On Fri, 17 Jun 2016 16:09:20 +
Daniel Borkmann  wrote:

> Please have a look at commit 8f80d450c3cb ("tc: fix compilation with old gcc 
> (< 4.6)") ...
> 
> Your changes effectively revert them again. Here, and some other parts of the 
> bpf frontend
> code bits.

GCC 4.6 is 3 years old. So perhaps it is time to move on.
Maybe add a GCC version check in the makefile, to fail cleanly.

[PATCH] net: tilegx: use correct timespec64 type

2016-06-17 Thread Arnd Bergmann

The conversion to the 64-bit time based ptp methods left two instances
of 'struct timespec' in place. This is harmless because 64-bit
architectures define timespec64 as timespec, and this driver is
not used on 32-bit machines.

However, using 'struct timespec64' directly is obviously the right
thing to do, and will help us remove 'struct timespec' in the future.

Signed-off-by: Arnd Bergmann 
Fixes: b9acf24f779c ("ptp: tilegx: convert to the 64 bit get/set time methods.")
---
 drivers/net/ethernet/tile/tilegx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/tile/tilegx.c 
b/drivers/net/ethernet/tile/tilegx.c
index 0a15acc075b3..11213a38c795 100644
--- a/drivers/net/ethernet/tile/tilegx.c
+++ b/drivers/net/ethernet/tile/tilegx.c
@@ -462,7 +462,7 @@ static void tile_tx_timestamp(struct sk_buff *skb, int 
instance)
if (unlikely((shtx->tx_flags & SKBTX_HW_TSTAMP) != 0)) {
struct mpipe_data *md = _data[instance];
struct skb_shared_hwtstamps shhwtstamps;
-   struct timespec ts;
+   struct timespec64 ts;
 
shtx->tx_flags |= SKBTX_IN_PROGRESS;
gxio_mpipe_get_timestamp(>context, );
@@ -886,9 +886,9 @@ static struct ptp_clock_info ptp_mpipe_caps = {
 /* Sync mPIPE's timestamp up with Linux system time and register PTP clock. */
 static void register_ptp_clock(struct net_device *dev, struct mpipe_data *md)
 {
-   struct timespec ts;
+   struct timespec64 ts;
 
-   getnstimeofday();
+   ktime_get_ts64();
gxio_mpipe_set_timestamp(>context, );
 
mutex_init(>ptp_lock);
-- 
2.9.0

Re: [PATCH 1/2] mlx5: only register devlink when ethernet is available

2016-06-17 Thread Leon Romanovsky

On Fri, Jun 17, 2016 at 05:02:33PM +0200, Arnd Bergmann wrote:
> On Friday, June 17, 2016 5:50:14 PM CEST Saeed Mahameed wrote:
> > On Wed, Jun 15, 2016 at 11:50 PM, Arnd Bergmann  wrote:
> > > On Wednesday, June 15, 2016 7:04:54 PM CEST Saeed Mahameed wrote:
> > > Ok, I see. It would be nice if the process had a way to avoid build 
> > > regressions
> > > in linux-next, in particular if you already have a fix by the time a patch
> > > that introduces a problem gets added.
> > >
> > 
> > The reason we added this tree is to get 0-day testing but currently it
> > makes some unwanted noise
> > so we will remove it until we figure it out.
> 
> I think you can simply ask Fengguang Wu to add your git tree to the list
> of trees he pulls from for the 0-day test bot.

It is not 0-day only, but linux-next too. It works flawlessly for RDMA
topics and Doug receives cleaned and fully tested patches. Sadly enough,
it didn't work well for mlx5 net part.

Till further notice, I removed mlx5 net part (submission queue) from my
tree and from linux-next.


signature.asc
Description: Digital signature

Re: [iproute PATCH 1/3] Use C99 style initializers everywhere

2016-06-17 Thread Daniel Borkmann


On 06/17/2016 05:56 PM, Phil Sutter wrote:

This big patch was compiled by vimgrepping for memset calls and changing
to C99 initializer if applicable.

Calls to memset for struct rtattr pointer fields for parse_rtattr*()
were just dropped since they are not needed.

The changes here allowed the compiler to discover some unused variables,
so get rid of them, too.

Signed-off-by: Phil Sutter 
---
  bridge/fdb.c |  29 +++--
  bridge/link.c|  16 +++
  bridge/mdb.c |  19 -
  bridge/vlan.c|  19 -
  genl/ctrl.c  |  48 +
  ip/ip6tunnel.c   |  10 ++---
  ip/ipaddress.c   |  31 ++
  ip/ipaddrlabel.c |  23 +-
  ip/iplink.c  |  67 ++---
  ip/iplink_can.c  |   4 +-
  ip/ipmaddr.c |  27 +---
  ip/ipmroute.c|   8 +---
  ip/ipneigh.c |  36 
  ip/ipnetconf.c   |  12 +++---
  ip/ipnetns.c |  45 ++-
  ip/ipntable.c|  27 +---
  ip/iproute.c |  85 
  ip/iprule.c  |  26 +--
  ip/iptoken.c |  21 -
  ip/iptunnel.c|  31 --
  ip/ipxfrm.c  |  26 +++
  ip/link_gre.c|  22 +-
  ip/link_gre6.c   |  22 +-
  ip/link_ip6tnl.c |  29 ++---
  ip/link_iptnl.c  |  26 +--
  ip/link_vti.c|  22 +-
  ip/link_vti6.c   |  22 +-
  ip/xfrm_policy.c | 110 ++-
  ip/xfrm_state.c  | 128 +++
  lib/libnetlink.c |  74 ++--
  lib/ll_map.c |   1 -
  misc/arpd.c  |  68 ++---
  misc/ss.c|  41 --
  tc/e_bpf.c   |   7 +--
  tc/em_cmp.c  |   4 +-
  tc/em_ipset.c|   4 +-
  tc/em_meta.c |   4 +-
  tc/em_nbyte.c|   4 +-
  tc/em_u32.c  |   4 +-
  tc/f_flow.c  |   3 --
  tc/f_flower.c|   3 +-
  tc/f_fw.c|   6 +--
  tc/f_route.c |   3 --
  tc/f_rsvp.c  |   6 +--
  tc/f_u32.c   |  12 ++
  tc/m_action.c|  51 +-
  tc/m_bpf.c   |   5 +--
  tc/m_csum.c  |   4 +-
  tc/m_ematch.c|   4 +-
  tc/m_gact.c  |   5 +--
  tc/m_ife.c   |   5 +--
  tc/m_mirred.c|   7 +--
  tc/m_nat.c   |   4 +-
  tc/m_pedit.c |   8 +---
  tc/m_police.c|   5 +--
  tc/q_atm.c   |   3 +-
  tc/q_cbq.c   |  22 +++---
  tc/q_choke.c |   4 +-
  tc/q_codel.c |   3 +-
  tc/q_dsmark.c|   1 -
  tc/q_fifo.c  |   4 +-
  tc/q_fq_codel.c  |   3 +-
  tc/q_hfsc.c  |  13 ++
  tc/q_htb.c   |  15 +++
  tc/q_netem.c |  16 +++
  tc/q_red.c   |   4 +-
  tc/q_sfb.c   |  17 
  tc/q_sfq.c   |   4 +-
  tc/q_tbf.c   |   4 +-
  tc/tc_bpf.c  |  96 +
  tc/tc_class.c|  33 ++
  tc/tc_exec.c |   3 +-
  tc/tc_filter.c   |  35 ++-
  tc/tc_qdisc.c|  35 ++-
  tc/tc_stab.c |   4 +-
  tc/tc_util.c |   3 +-
  76 files changed, 699 insertions(+), 956 deletions(-)

[...]

Hmm, seems like a lot of stuff ...

[...]

diff --git a/tc/tc_bpf.c b/tc/tc_bpf.c
index 86c6069b68ec4..8a264531dcdd4 100644
--- a/tc/tc_bpf.c
+++ b/tc/tc_bpf.c
@@ -86,13 +86,12 @@ static int bpf(int cmd, union bpf_attr *attr, unsigned int 
size)
  static int bpf_map_update(int fd, const void *key, const void *value,
  uint64_t flags)
  {
-   union bpf_attr attr;
-
-   memset(, 0, sizeof(attr));
-   attr.map_fd = fd;
-   attr.key = bpf_ptr_to_u64(key);
-   attr.value = bpf_ptr_to_u64(value);
-   attr.flags = flags;
+   union bpf_attr attr = {
+   .map_fd = fd,
+   .key = bpf_ptr_to_u64(key),
+   .value = bpf_ptr_to_u64(value),
+   .flags = flags
+   };

return bpf(BPF_MAP_UPDATE_ELEM, , sizeof(attr));

[...]

Please have a look at commit 8f80d450c3cb ("tc: fix compilation with old gcc (< 
4.6)") ...

Your changes effectively revert them again. Here, and some other parts of the 
bpf frontend
code bits.

Thanks,
Daniel

Re: [PATCH resend] net: sock: Add option for memory optimized hints.

2016-06-17 Thread Eric Dumazet

On Fri, 2016-06-17 at 16:39 +0200, peter enderborg wrote:

> It is mainly for af_unix sockets, and the effect is
> quite significant when you hit a compaction, or with
> this patch avoid get in to compaction, but it
> can also be used for reducing the pressure on memory
> for tcp.

BTW, TCP always attempt order-3 allocations, even if you do a write(fd,
buffer, 4000)


So really your patch wont help.

We need to fix the mm layer (if needed), not add various works around.

Re: [PATCH resend] net: sock: Add option for memory optimized hints.

2016-06-17 Thread Eric Dumazet

On Fri, 2016-06-17 at 16:39 +0200, peter enderborg wrote:
> On 06/17/2016 04:14 PM, Eric Dumazet wrote:
> > On Fri, 2016-06-17 at 15:58 +0200, peter enderborg wrote:
> >> From: Peter Enderborg 
> >>
> >> When sending data the socket allocates memory for
> >> payload on a cache or a page alloc. The page alloc
> >> then might trigger compation that takes long time.
> >> This can be avoided with smaller chunks. But
> >> userspace can not know what is the right size for
> >> the smaller sends. For this we add a SIZEHINT
> >> getsocketopt where the userspace can get the size
> >> for send that will fit into one page (order 0) or
> >> the max for a slab cache allocation.
> >
> > For which kind of sockets exactly you hit a problem ?
> >
> > Sorry, this patch is probably not helping in any way.
> >
> It is mainly for af_unix sockets, and the effect is
> quite significant when you hit a compaction, or with
> this patch avoid get in to compaction, but it
> can also be used for reducing the pressure on memory
> for tcp. And the patches you suggested have been
> applied (with the addition "af_unix: fix bug on large send()")
> I see that there is a lot of other compaction fixes
> recently but the problem are still there. And of course
> to make any difference you need to change your
> userland application too. But in our Qualcomm/Google
> bastard to kernel. It makes a huge difference on the
> behaviour of send(). But I also does not see this as
> perfect solution. A wake-up function that has
> the buffers reserved would be better.Or a pre allocated
> send buffer would also be better. But I dont expect that
> linux will have a real-time socket implementation in
> near future.

I have no evidence the problem you describe still exists in current
linux kernels.

Please patch your kernels, but do not send networking patches that seem
to work around a mm-layer problem, without notifying mm maintainers.

1 2 >

1 - 100 of 183 matches

Mail list logo