Re: Recurring trace from tcp_fragment()

2015-05-30 Thread Grant Zhang
Thank you Neal. Most likely I will test the patch on Monday and report
back the result.

As for the TcpExtTCPSACKReneging counter, attached is the captured
counter value on a 1-second interval for 10 minutes.

Thanks,

Grant




reneg.log
Description: Binary data




 On May 30, 2015, at 10:29 AM, Neal Cardwell ncardw...@google.com wrote:
 
 On Fri, May 29, 2015 at 3:53 PM, Grant Zhang gzh...@fastly.com wrote:
 Hi Neal,
 
 I will be more than happy to test the patch. Please send it my way.
 
 Great. Thank you so much for being willing to do this. Attached is a
 patch for testing. I generated it and tested it relative to Linux
 v3.14.39, since your stack trace seemed to suggest that you were
 seeing this on some variant of v3.14.39. (Newer kernels would need a
 slightly different patch, since the reneging code path has changed a
 little since 3.14.)
 
 Can you please try it out and see if it makes that warning go away?
 
 Also, I would be interested in seeing the value of your
 TcpExtTCPSACKReneging counter, and some sense of how fast that value
 is increasing, on a machine that's seeing this issue:
  nstat -z -a | grep Reneg
 
 Thanks!
 
 neal
 0001-RFC-for-tests-on-v3.14.39-tcp-resegment-skbs-that-we.patch



Re: Ingress tc filters with IPSec

2015-05-30 Thread jsulli...@opensourcedevel.com

 On May 30, 2015 at 2:24 AM John A. Sullivan III
 jsulli...@opensourcedevel.com wrote:


 On Sat, 2015-05-30 at 01:52 -0400, John A. Sullivan III wrote:
  Argh! yet another obstacle from my ignorance. We are attempting ingress
  traffic shaping using IFB interfaces on traffic coming via GRE / IPSec.
  Filters and hash tables are working fine with plain GRE including
  stripping the header. We even got the ematch filter working so that the
  ESP packets are the only packets not redirected to IFB.
 
  But, regardless of whether we redirect ESP packets to IFB, the filters
  never see the decrypted packets. I thought the packets passed through
  the interface twice - first encrypted and then decrypted. However,
  tcpdump only shows the ESP packets on the interface.
 
  How do we apply filters to the packets after decryption? Thanks - John

 I see what changed. In the past, this seemed to work but we were using
 tunnel mode. We were trying to use transport mode in this application
 but that seems to prevent the decrypted packet contents from appearing
 again on the interface. Reverting to tunnel mode made the contents
 visible again and our filters are working as expected - John

Alas, this is still a problem since we are using VRRP and the tunnel end points
are the virtual IP addresses.  That makes StrongSWAN choke on selector matching
in tunnel mode so back to trying to make transport mode work.

I am guessing we do not see the second pass of the packet because it is only
encrypted and not encapsulated.  So my hunch is that we need to pass the ESP
packet into the ifb qdisc but need to look elsewhere in the packet for the filter
matching information.  We know that matching on the normal offsets does not work
so I am hoping the decrypted packet is decipherable by the filter matching logic
but just still has all the ESP transport header attached.

Normally, to extract the contents of my GRE tunnel, I would place them into a
separate hash table with the GRE header stripped off and then filter them into
TCP and UDP hash tables:

tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47
0xff match u16 0x0800 0xffff at 22 link 11: offset at 0 mask 0f00 shift 6 plus 4
eat

So we match the GRE protocol and determine that GRE is carrying an IP packet.
 With the ESP transport header and IV (AES = 16B) interposed between the IP
header and the GRE header, I suppose the first part of this filter becomes:

tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47
0xff match u16 0x0800 0xffff at 46

but what do I do with the second half to find the start of the TCP/UDP header?
Is it still offset at 0 because tc filter somehow knows where the interior IP
header starts or should it be offset at 48 to account for the GRE + ESP headers?
Or is there a better way to filter ingress traffic on GRE/IPSec tunnels? Thanks
- John
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
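
For reference, the byte offsets discussed in the message above can be tallied with a short sketch. It assumes the simplest case: an outer IPv4 header without options (20 bytes), an 8-byte ESP header (SPI plus sequence number), the 16-byte IV mentioned above, and a basic 4-byte GRE header with no optional fields. The function name is hypothetical. The sketch only adds up header lengths; it does not say whether the u32 classifier actually sees decrypted bytes at these positions, which is the open question here.

```python
# Header sizes in bytes. All values are assumptions for the simplest case:
# IPv4 without options, ESP with a 16-byte IV, GRE without optional fields.
IP_HDR = 20
ESP_HDR = 8      # SPI (4) + sequence number (4)
ESP_IV = 16      # AES IV, as stated in the message
GRE_HDR = 4      # flags/version (2) + protocol type (2)


def gre_offsets(esp=False):
    """Return (offset of GRE protocol type, offset of inner IP header)."""
    gre_start = IP_HDR + (ESP_HDR + ESP_IV if esp else 0)
    proto_type = gre_start + 2          # protocol type follows 2 flag bytes
    inner_ip = gre_start + GRE_HDR
    return proto_type, inner_ip


if __name__ == "__main__":
    print("plain GRE:", gre_offsets())
    print("GRE inside ESP transport:", gre_offsets(esp=True))
```

Under these assumptions the GRE protocol type sits at byte 22 for plain GRE and at byte 46 with ESP in front, and the inner IP header would start at byte 48, which matches the numbers used in the filters above.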


Re: Ingress tc filters with IPSec

2015-05-30 Thread jsulli...@opensourcedevel.com

 On May 30, 2015 at 4:12 PM jsulli...@opensourcedevel.com
 jsulli...@opensourcedevel.com wrote:



  On May 30, 2015 at 2:24 AM John A. Sullivan III
  jsulli...@opensourcedevel.com wrote:
 
 
  On Sat, 2015-05-30 at 01:52 -0400, John A. Sullivan III wrote:
   Argh! yet another obstacle from my ignorance. We are attempting ingress
   traffic shaping using IFB interfaces on traffic coming via GRE / IPSec.
   Filters and hash tables are working fine with plain GRE including
   stripping the header. We even got the ematch filter working so that the
   ESP packets are the only packets not redirected to IFB.
  
   But, regardless of whether we redirect ESP packets to IFB, the filters
   never see the decrypted packets. I thought the packets passed through
   the interface twice - first encrypted and then decrypted. However,
   tcpdump only shows the ESP packets on the interface.
  
   How do we apply filters to the packets after decryption? Thanks - John
 
  I see what changed. In the past, this seemed to work but we were using
  tunnel mode. We were trying to use transport mode in this application
  but that seems to prevent the decrypted packet contents from appearing
  again on the interface. Reverting to tunnel mode made the contents
  visible again and our filters are working as expected - John

 Alas, this is still a problem since we are using VRRP and the tunnel end points
 are the virtual IP addresses. That makes StrongSWAN choke on selector matching
 in tunnel mode so back to trying to make transport mode work.

 I am guessing we do not see the second pass of the packet because it is only
 encrypted and not encapsulated. So my hunch is that we need to pass the ESP
 packet into the ifb qdisc but need to look elsewhere in the packet for the filter
 matching information. We know that matching on the normal offsets does not work
 so I am hoping the decrypted packet is decipherable by the filter matching logic
 but just still has all the ESP transport header attached.

 Normally, to extract the contents of my GRE tunnel, I would place them into a
 separate hash table with the GRE header stripped off and then filter them into
 TCP and UDP hash tables:

 tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47
 0xff match u16 0x0800 0xffff at 22 link 11: offset at 0 mask 0f00 shift 6 plus 4
 eat

 So we match the GRE protocol and determine that GRE is carrying an IP packet.
 With the ESP transport header and IV (AES = 16B) interposed between the IP
 header and the GRE header, I suppose the first part of this filter becomes:

 tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47
 0xff match u16 0x0800 0xffff at 46

 but what do I do with the second half to find the start of the TCP/UDP header?
 Is it still offset at 0 because tc filter somehow knows where the interior IP
 header starts or should it be offset at 48 to account for the GRE + ESP headers?
 Or is there a better way to filter ingress traffic on GRE/IPSec tunnels? Thanks
 - John

Alas, this is not working.  I set a continue action for the ESP traffic:

tc filter replace dev ifb0 parent 11:0 protocol ip prio 1 u32 match ip protocol
50 0xff action continue

and that seems to be matching:

filter parent 11: protocol ip pref 1 u32 fh 802::800 order 2048 key ht 802 bkt 0
terminal flowid ???  (rule hit 3130003 success 2931853)
  match 0032/00ff at 8 (success 2931853 ) 
action order 1: gact action continue
 random type none pass val 0
 index 1 ref 1 bind 1 installed 294 sec

And I even reduced the GRE filter to just look for the GRE protocol in the IP
header:

tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47
0xff link 11: offset at 48 mask 0f00 shift 6 plus 4 eat

but it does not appear to be matching at all:

filter parent 11: protocol ip pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0
link 11:  (rule hit 3130012 success 0)
  match 002f/00ff at 8 (success 0 ) 
offset 0f00>>6 at 48 plus 4  eat 

Any suggestions about how to traffic shape ingress traffic coming off an ESP
Transport connection? Thanks - John
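
As a side note on the `offset at N mask 0f00 shift 6 plus 4` clause used in the filters above: as far as I understand the u32 classifier, it reads the 16-bit word at position N, applies the mask, shifts the result right, and then adds the `plus` constant to get the number of bytes to skip before the linked table is consulted. A minimal sketch of that arithmetic follows. The function name is hypothetical and the sample word 0x4500 is just an illustrative start of an IPv4 header with IHL 5.

```python
def u32_next_offset(halfword, mask=0x0F00, shift=6, plus=4):
    """Model of the u32 'offset at ... mask ... shift ... plus ...' arithmetic.

    halfword is the 16-bit big-endian value read at the 'at' position.
    """
    return ((halfword & mask) >> shift) + plus


if __name__ == "__main__":
    # 0x45 = version 4, IHL 5; 0x00 = TOS. IHL 5 means a 20-byte IP header,
    # so the result is 20 bytes of IP plus the 4 bytes given as 'plus'.
    print(u32_next_offset(0x4500))
```

Masking with 0f00 isolates the IHL nibble, and shifting right by 6 instead of 8 multiplies it by 4, which turns the IHL into a header length in bytes.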


[PATCH 81/98] include/uapi/linux/openvswitch.h: use __u32 from linux/types.h

2015-05-30 Thread Mikko Rapeli
Fixes userspace compiler error:

error: unknown type name ‘uint32_t’

Signed-off-by: Mikko Rapeli mikko.rap...@iki.fi
---
 include/uapi/linux/openvswitch.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index bbd49a0..0ab8eca 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -586,8 +586,8 @@ enum ovs_hash_alg {
  * @hash_basis: basis used for computing hash.
  */
 struct ovs_action_hash {
-   uint32_t  hash_alg; /* One of ovs_hash_alg. */
-   uint32_t  hash_basis;
+   __u32  hash_alg; /* One of ovs_hash_alg. */
+   __u32  hash_basis;
 };
 
 /**
-- 
2.1.4



Re: Recurring trace from tcp_fragment()

2015-05-30 Thread Neal Cardwell
On Fri, May 29, 2015 at 3:53 PM, Grant Zhang gzh...@fastly.com wrote:
 Hi Neal,

 I will be more than happy to test the patch. Please send it my way.

Great. Thank you so much for being willing to do this. Attached is a
patch for testing. I generated it and tested it relative to Linux
v3.14.39, since your stack trace seemed to suggest that you were
seeing this on some variant of v3.14.39. (Newer kernels would need a
slightly different patch, since the reneging code path has changed a
little since 3.14.)

Can you please try it out and see if it makes that warning go away?

Also, I would be interested in seeing the value of your
TcpExtTCPSACKReneging counter, and some sense of how fast that value
is increasing, on a machine that's seeing this issue:
  nstat -z -a | grep Reneg

Thanks!

neal


0001-RFC-for-tests-on-v3.14.39-tcp-resegment-skbs-that-we.patch
Description: Binary data
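
To get "some sense of how fast that value is increasing", periodic samples of the counter can be turned into per-interval deltas. A minimal sketch, assuming the samples are absolute counter values captured at a fixed interval (the function name is hypothetical and the numbers below are placeholders, not data from this thread):

```python
def counter_rates(samples, interval_s=1.0):
    """Return the per-second increase between consecutive counter samples."""
    return [(b - a) / interval_s for a, b in zip(samples, samples[1:])]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    samples = [1000, 1004, 1004, 1011]
    print(counter_rates(samples))
```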


[PATCH net-next 1/3] s390/bpf: fix stack allocation

2015-05-30 Thread Alexei Starovoitov
From: Michael Holzheu holz...@linux.vnet.ibm.com

On s390x we have to provide 160 bytes stack space before we can call
the next function. From the 160 bytes that we got from the previous
function we only use 11 * 8 bytes and have 160 - 11 * 8 bytes left.
Currently for BPF we allocate additional 160 - 11 * 8 bytes for the
next function. This is wrong because then the next function only gets:

 (160 - 11 * 8) + (160 - 11 * 8) = 2 * 72 = 144 bytes

Fix this and allocate enough memory for the next function.
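
The arithmetic above can be checked with a short sketch. It assumes MAX_BPF_STACK is 512 bytes, which is not stated in this message but is consistent with the "aghi %r15,-616" shown in patch 2/3 of this series. The helper name and the constant names other than MAX_BPF_STACK are hypothetical.

```python
# Assumption: MAX_BPF_STACK was 512 bytes in kernels of this era.
MAX_BPF_STACK = 512
CALLER_AREA = 160                                  # bytes provided by the caller
USED_OF_CALLER = 11 * 8                            # old backchain + r15..r6
UNUSED_OF_CALLER = CALLER_AREA - USED_OF_CALLER    # 72 bytes left over
OWN_NEEDS = MAX_BPF_STACK + 8 + 4 + 4              # BPF stack plus scratch fields


def callee_space(stk_off):
    """Bytes left for the next function after the JITed program's own use."""
    return UNUSED_OF_CALLER + stk_off - OWN_NEEDS


OLD_STK_OFF = OWN_NEEDS + UNUSED_OF_CALLER                   # before the patch
NEW_STK_OFF = (OWN_NEEDS + CALLER_AREA) - UNUSED_OF_CALLER   # after the patch

if __name__ == "__main__":
    print("old:", OLD_STK_OFF, "->", callee_space(OLD_STK_OFF), "bytes for callee")
    print("new:", NEW_STK_OFF, "->", callee_space(NEW_STK_OFF), "bytes for callee")
```

With these assumptions the old definition leaves 144 bytes for the next function and the new one leaves the required 160.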

Cc: sta...@vger.kernel.org # 4.0+
Signed-off-by: Michael Holzheu holz...@linux.vnet.ibm.com
Acked-by: Heiko Carstens heiko.carst...@de.ibm.com
Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
 arch/s390/net/bpf_jit.h |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/s390/net/bpf_jit.h b/arch/s390/net/bpf_jit.h
index ba8593a515ba..de156ba3bd71 100644
--- a/arch/s390/net/bpf_jit.h
+++ b/arch/s390/net/bpf_jit.h
@@ -48,7 +48,9 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
  * We get 160 bytes stack space from calling function, but only use
  * 11 * 8 byte (old backchain + r15 - r6) for storing registers.
  */
-#define STK_OFF (MAX_BPF_STACK + 8 + 4 + 4 + (160 - 11 * 8))
+#define STK_SPACE  (MAX_BPF_STACK + 8 + 4 + 4 + 160)
+#define STK_160_UNUSED (160 - 11 * 8)
+#define STK_OFF        (STK_SPACE - STK_160_UNUSED)
 #define STK_OFF_TMP    160 /* Offset of tmp buffer on stack */
 #define STK_OFF_HLEN   168 /* Offset of SKB header length on stack */
 
-- 
1.7.9.5



[PATCH net-next 2/3] s390/bpf: fix bpf frame pointer setup

2015-05-30 Thread Alexei Starovoitov
From: Michael Holzheu holz...@linux.vnet.ibm.com

Currently the bpf frame pointer is set to the old r15. This is
wrong because of packed stack. Fix this and adjust the frame pointer
to respect packed stack. This now generates a prolog like the following:

 3ff8001c3fa: eb67f0480024   stmg    %r6,%r7,72(%r15)
 3ff8001c400: ebcff0780024   stmg    %r12,%r15,120(%r15)
 3ff8001c406: b904001f       lgr     %r1,%r15      <- load backchain
 3ff8001c40a: 41d0f048       la      %r13,72(%r15) <- load adjusted bfp
 3ff8001c40e: a7fbfd98       aghi    %r15,-616
 3ff8001c412: e310f0980024   stg     %r1,152(%r15) <- save backchain

Cc: sta...@vger.kernel.org # 4.0+
Signed-off-by: Michael Holzheu holz...@linux.vnet.ibm.com
Acked-by: Heiko Carstens heiko.carst...@de.ibm.com
Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
 arch/s390/net/bpf_jit_comp.c |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 20c146d1251a..55423d8be580 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -384,13 +384,16 @@ static void bpf_jit_prologue(struct bpf_jit *jit)
 	}
 	/* Setup stack and backchain */
 	if (jit->seen & SEEN_STACK) {
-		/* lgr %bfp,%r15 (BPF frame pointer) */
-		EMIT4(0xb9040000, BPF_REG_FP, REG_15);
+		if (jit->seen & SEEN_FUNC)
+			/* lgr %w1,%r15 (backchain) */
+			EMIT4(0xb9040000, REG_W1, REG_15);
+		/* la %bfp,STK_160_UNUSED(%r15) (BPF frame pointer) */
+		EMIT4_DISP(0x41000000, BPF_REG_FP, REG_15, STK_160_UNUSED);
 		/* aghi %r15,-STK_OFF */
 		EMIT4_IMM(0xa70b0000, REG_15, -STK_OFF);
 		if (jit->seen & SEEN_FUNC)
-			/* stg %bfp,152(%r15) (backchain) */
-			EMIT6_DISP_LH(0xe3000000, 0x0024, BPF_REG_FP, REG_0,
+			/* stg %w1,152(%r15) (backchain) */
+			EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0,
 				      REG_15, 152);
 	}
 	/*
-- 
1.7.9.5



[PATCH net-next 3/3] s390/bpf: implement bpf_tail_call() helper

2015-05-30 Thread Alexei Starovoitov
From: Michael Holzheu holz...@linux.vnet.ibm.com

bpf_tail_call() arguments:

 - ctx......: Context pointer
 - jmp_table: One of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table
 - index....: Index in the jump table

In this implementation s390x JIT does stack unwinding and jumps into the
callee program prologue. Caller and callee use the same stack.

With this patch a tail call generates the following code on s390x:

 if (index >= array->map.max_entries)
         goto out
 03ff8001c7e4: e31030100016   llgf    %r1,16(%r3)
 03ff8001c7ea: ec41001fa065   clgrj   %r4,%r1,10,3ff8001c828

 if (tail_call_cnt++ > MAX_TAIL_CALL_CNT)
         goto out;
 03ff8001c7f0: a7080001       lhi     %r0,1
 03ff8001c7f4: eb10f25000fa   laal    %r1,%r0,592(%r15)
 03ff8001c7fa: ec120017207f   clij    %r1,32,2,3ff8001c828

 prog = array->prog[index];
 if (prog == NULL)
         goto out;
 03ff8001c800: eb140003000d   sllg    %r1,%r4,3
 03ff8001c806: e31310800004   lg      %r1,128(%r3,%r1)
 03ff8001c80c: ec18000e007d   clgij   %r1,0,8,3ff8001c828

 Restore registers before calling function
 03ff8001c812: eb68f2980004   lmg     %r6,%r8,664(%r15)
 03ff8001c818: ebbff2c00004   lmg     %r11,%r15,704(%r15)

 goto *(prog->bpf_func + tail_call_start);
 03ff8001c81e: e31100200004   lg      %r1,32(%r1,%r0)
 03ff8001c824: 47f01006       bc      15,6(%r1)
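
The three checks in the listing above can be modelled in a few lines. This is only a sketch of the control flow, not the JIT itself: the function name is hypothetical, the jump table is a plain list, an empty slot is represented by None, and the limit of 32 is taken from the immediate operand of the clij instruction.

```python
MAX_TAIL_CALL_CNT = 32   # limit visible as the immediate in the clij above


def tail_call_target(progs, index, state):
    """Return the program to jump to, or None to fall through ("goto out").

    progs: list standing in for the BPF_MAP_TYPE_PROG_ARRAY jump table
    state: dict holding the per-invocation 'tail_call_cnt'
    """
    if index >= len(progs):                 # index out of range: goto out
        return None
    cnt = state["tail_call_cnt"]
    state["tail_call_cnt"] = cnt + 1        # post-increment, like laal
    if cnt > MAX_TAIL_CALL_CNT:             # too many chained tail calls
        return None
    return progs[index]                     # None models an empty slot


if __name__ == "__main__":
    state = {"tail_call_cnt": 0}
    print(tail_call_target(["prog_a", None], 0, state), state)
```

Note that the bounds check happens before the counter is touched, so an out-of-range index does not consume one of the allowed tail calls in this model.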

Reviewed-by: Martin Schwidefsky schwidef...@de.ibm.com
Signed-off-by: Michael Holzheu holz...@linux.vnet.ibm.com
Acked-by: Heiko Carstens heiko.carst...@de.ibm.com
Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
 arch/s390/net/bpf_jit.h  |   10 +++-
 arch/s390/net/bpf_jit_comp.c |  106 +-
 2 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/arch/s390/net/bpf_jit.h b/arch/s390/net/bpf_jit.h
index de156ba3bd71..f6498eec9ee1 100644
--- a/arch/s390/net/bpf_jit.h
+++ b/arch/s390/net/bpf_jit.h
@@ -28,6 +28,9 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
  *           | old backchain |     |
  *           +---------------+     |
  *           |   r15 - r6    |     |
+ *           +---------------+     |
+ *           | 4 byte align  |     |
+ *           | tail_call_cnt |     |
  * BFP    -> +===============+     |
  *           |               |     |
  *           |   BPF stack   |     |
@@ -46,14 +49,17 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
  * R15    -> +---------------+ + low
  *
  * We get 160 bytes stack space from calling function, but only use
- * 11 * 8 byte (old backchain + r15 - r6) for storing registers.
+ * 12 * 8 byte for old backchain, r15..r6, and tail_call_cnt.
  */
 #define STK_SPACE  (MAX_BPF_STACK + 8 + 4 + 4 + 160)
-#define STK_160_UNUSED (160 - 11 * 8)
+#define STK_160_UNUSED (160 - 12 * 8)
 #define STK_OFF        (STK_SPACE - STK_160_UNUSED)
 #define STK_OFF_TMP    160 /* Offset of tmp buffer on stack */
 #define STK_OFF_HLEN   168 /* Offset of SKB header length on stack */
 
+#define STK_OFF_R6 (160 - 11 * 8)  /* Offset of r6 on stack */
+#define STK_OFF_TCCNT  (160 - 12 * 8)  /* Offset of tail_call_cnt on stack */
+
 /* Offset to skip condition code check */
 #define OFF_OK 4
 
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 55423d8be580..d3766dd67e23 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -21,6 +21,7 @@
 #include <linux/netdevice.h>
 #include <linux/filter.h>
 #include <linux/init.h>
+#include <linux/bpf.h>
 #include <asm/cacheflush.h>
 #include <asm/dis.h>
 #include "bpf_jit.h"
@@ -40,6 +41,8 @@ struct bpf_jit {
 	int base_ip;		/* Base address for literal pool */
 	int ret0_ip;		/* Address of return 0 */
 	int exit_ip;		/* Address of exit */
+	int tail_call_start;	/* Tail call start offset */
+	int labels[1];		/* Labels for local jumps */
 };
 
 #define BPF_SIZE_MAX   4096/* Max size for program */
@@ -49,6 +52,7 @@ struct bpf_jit {
 #define SEEN_RET0  4   /* ret0_ip points to a valid return 0 */
 #define SEEN_LITERAL   8   /* code uses literals */
 #define SEEN_FUNC  16  /* calls C functions */
+#define SEEN_TAIL_CALL 32  /* code uses tail calls */
 #define SEEN_STACK (SEEN_FUNC | SEEN_MEM | SEEN_SKB)
 
 /*
@@ -60,6 +64,7 @@ struct bpf_jit {
 #define REG_L  (__MAX_BPF_REG+3)   /* Literal pool register */
 #define REG_15 (__MAX_BPF_REG+4)   /* Register 15 */
 #define REG_0  REG_W0  /* Register 0 */
+#define REG_1  REG_W1  /* Register 1 */
 #define REG_2  BPF_REG_1   /* Register 2 */
 #define REG_14 BPF_REG_0   /* Register 14 */
 
@@ -223,6 +228,24 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
REG_SET_SEEN(b3);   \
 })
 
+#define EMIT6_PCREL_LABEL(op1, op2, 

[PATCH net-next 0/3] s390/bpf: implement bpf_tail_call JIT support

2015-05-30 Thread Alexei Starovoitov
This set is for net-next tree.

Patch 3 adds bpf_tail_call() support for s390x JIT. It has
a dependency on patches 1 and 2 that will also be submitted
to stable via Martin Schwidefsky.

Michael Holzheu (3):
  s390/bpf: fix stack allocation
  s390/bpf: fix bpf frame pointer setup
  s390/bpf: implement bpf_tail_call() helper

 arch/s390/net/bpf_jit.h  |   12 -
 arch/s390/net/bpf_jit_comp.c |  117 +++---
 2 files changed, 121 insertions(+), 8 deletions(-)

-- 
1.7.9.5



Re: Ingress tc filters with IPSec

2015-05-30 Thread John A. Sullivan III
On Sat, 2015-05-30 at 01:52 -0400, John A. Sullivan III wrote:
 Argh! yet another obstacle from my ignorance.  We are attempting ingress
 traffic shaping using IFB interfaces on traffic coming via GRE / IPSec.
 Filters and hash tables are working fine with plain GRE including
 stripping the header.  We even got the ematch filter working so that the
 ESP packets are the only packets not redirected to IFB.
 
 But, regardless of whether we redirect ESP packets to IFB, the filters
 never see the decrypted packets.  I thought the packets passed through
 the interface twice - first encrypted and then decrypted.  However,
 tcpdump only shows the ESP packets on the interface.
 
 How do we apply filters to the packets after decryption? Thanks - John

I see what changed.  In the past, this seemed to work but we were using
tunnel mode.  We were trying to use transport mode in this application
but that seems to prevent the decrypted packet contents from appearing
again on the interface.  Reverting to tunnel mode made the contents
visible again and our filters are working as expected - John



Re: [PATCH] if_vlan: fix vlaue -> value typo

2015-05-30 Thread David Miller
From: Vivien Didelot vivien.dide...@savoirfairelinux.com
Date: Wed, 27 May 2015 21:07:26 -0400

 Fixes vlaue for value in include/linux/if_vlan.h.
 
 Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com

Applied, thanks.


Re: [PATCH net-next] bpf: allow BPF programs access skb->skb_iif and skb->dev->ifindex fields

2015-05-30 Thread David Miller
From: Alexei Starovoitov a...@plumgrid.com
Date: Wed, 27 May 2015 15:30:39 -0700

 classic BPF already exposes skb->dev->ifindex via SKF_AD_IFINDEX extension.
 Allow eBPF program to access it as well. Note that classic aborts execution
 of the program if 'skb->dev == NULL' (which is inconvenient for program
 writers), whereas eBPF returns zero in such case.
 Also expose the 'skb_iif' field, since programs triggered by redirected
 packet need to know the original interface index.
 Summary:
 __skb->ifindex         -> skb->dev->ifindex
 __skb->ingress_ifindex -> skb->skb_iif
 
 Signed-off-by: Alexei Starovoitov a...@plumgrid.com

Applied, thank you.


Re: Recurring trace from tcp_fragment()

2015-05-30 Thread Neal Cardwell
On Sat, May 30, 2015 at 2:52 PM, Grant Zhang gzh...@fastly.com wrote:
 Thank you Neal. Most likely I will test the patch on Monday and report
 back the result.

 As for the TcpExtTCPSACKReneging counter, attached is the captured
 counter value on a 1-second interval for 10 minutes.

OK, great. Those TcpExtTCPSACKReneging values look consistent with the
theory underlying the patch, so that's a good sign.

Thanks!

neal


Re: [net-next 00/14][pull request] Intel Wired LAN Driver Updates 2015-05-28

2015-05-30 Thread David Miller
From: Jeff Kirsher jeffrey.t.kirs...@intel.com
Date: Thu, 28 May 2015 04:25:25 -0700

 This series contains updates to ethtool, ixgbe, i40e and i40evf.
 
 John adds helper routines for ethtool to pass VF to rx_flow_spec.  Since
 the ring_cookie is 64 bits wide which is much larger than what could be
 used for actual queue index values, provide helper routines to pack a VF
 index into the cookie.  Then John provides a ixgbe patch to allow flow
 director to use the entire queue space.
 
 Neerav provides a i40e patch to collect XOFF Rx stats, where it was not
 being collected before.
 
 Anjali provides ATR support for tunneled packets, as well as stats to
 count tunnel ATR hits.  Cleaned up PF struct members which are
 unnecessary, since we can use the stat index macro directly.  Cleaned
 up flow director ATR/SB messages to a higher debug level since they
 are not useful unless silicon validation is happening.
 
 Greg provides a patch to disable offline diagnostics if VFs are enabled
 since ethtool offline diagnostic tests are not designed (out of scope)
 to disable VF functions for testing and re-enable afterward.  Also cleans
 up TODO comment that is no longer needed.
 
 Vasu provides a fix for an FCoE EOF case where i40e_fcoe_ctxt_eof() may be
 called before i40e_fcoe_eof_is_supported() is called.
 
 Jesse adds skb->xmit_more support for i40evf.  Then provides a performance
 enhancement for i40evf by inlining some functions which provides a 15%
 gain in small packet performance.  Also cleans up the use of time_stamp
 since it is no longer used to determine if there is a tx_hang and was
 a part of a previous tx_hang design which is no longer used.

Pulled, thanks Jeff.
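
The idea of packing a VF index into the 64-bit ring_cookie, as described in the series summary above, can be sketched as follows. The bit layout is an assumption made for illustration (queue index in the low 32 bits, VF index in the next 8 bits); the summary does not spell it out, and the function and constant names are hypothetical.

```python
# Assumed layout (not stated in this thread): the low 32 bits of the 64-bit
# ring_cookie carry the queue index and the next 8 bits carry the VF index.
RING_MASK = 0x00000000FFFFFFFF
VF_MASK = 0x000000FF00000000
VF_SHIFT = 32


def pack_ring_cookie(queue, vf):
    """Combine a queue index and a VF index into one 64-bit cookie."""
    return ((vf << VF_SHIFT) & VF_MASK) | (queue & RING_MASK)


def unpack_ring_cookie(cookie):
    """Split a cookie back into (queue, vf)."""
    return cookie & RING_MASK, (cookie & VF_MASK) >> VF_SHIFT


if __name__ == "__main__":
    cookie = pack_ring_cookie(queue=5, vf=3)
    print(hex(cookie), unpack_ring_cookie(cookie))
```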


Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware

2015-05-30 Thread John Fastabend

On 05/30/2015 02:00 AM, Jiri Pirko wrote:

Fri, May 29, 2015 at 05:39:46PM CEST, sfel...@gmail.com wrote:

On Fri, May 29, 2015 at 12:50 AM, Jiri Pirko j...@resnulli.us wrote:

Thu, May 21, 2015 at 07:46:54AM CEST, sfel...@gmail.com wrote:

On Tue, May 19, 2015 at 1:28 PM, David Miller da...@davemloft.net wrote:

From: Andy Gospodarek go...@cumulusnetworks.com
Date: Tue, 19 May 2015 15:47:32 -0400


Are you actually saying that if users complain loudly enough about
the current behavior (not the change Roopa has proposed) that you
would be open to considering a change to the current behavior?


I am saying that we have a contract with users not to break existing
behavior.  Full stop.


After rehearing David's argument, we should probably explore option d)
which is a refinement on the fib_offload_disable mechanism we have
today.  fib_offload_disable is global for all routes.  Once we hit a
HW install problem, the global flag is set and all routes fallback to
SW.  We did this because we can't allow the failed route to exist in
SW and not in HW because it could mess up LPM searches (HW could hit
on a lesser prefix even when SW has the true LPM, because HW gets
first shot at match).  The refinement on fib_offload_disable is this:
make it per-related-prefix rather than global, and on a HW install
problem, set the flag for the related-prefix and uninstall only those
routes from HW.  Related-prefix (is there a correct term for this?)
are routes to the same dst addr but with different prefix lengths.  I
haven't parsed the fib_trie structure to see how routes are organized,
but I suspect since it's optimized for lookup the related-prefix
tracking is already there and we can build on that.
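
The LPM concern described above can be illustrated with a small sketch: if a more specific route exists in software but failed to install in hardware, a hardware lookup would settle on the shorter prefix. The route values and the helper name are made up for illustration; the lookup is a naive linear scan, not how fib_trie works.

```python
import ipaddress


def lpm(table, addr):
    """Longest-prefix match: return the most specific network containing addr."""
    ip = ipaddress.ip_address(addr)
    hits = [n for n in table if ip in n]
    return max(hits, key=lambda n: n.prefixlen) if hits else None


# Illustrative routes only.
sw_routes = [ipaddress.ip_network("10.0.0.0/8"),
             ipaddress.ip_network("10.1.0.0/16")]
# Suppose installing the /16 into hardware failed, so HW only holds the /8.
hw_routes = [ipaddress.ip_network("10.0.0.0/8")]

if __name__ == "__main__":
    print("SW:", lpm(sw_routes, "10.1.2.3"))
    print("HW:", lpm(hw_routes, "10.1.2.3"))
```

Because hardware gets the first shot at the match, the packet would follow the /8 even though software holds the true longest prefix, which is why the related /8 would have to leave hardware as well under the per-related-prefix refinement.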


This looks interesting. However, I'm not sure that it is acceptable for
user to experience this hw evict of random entries. User knows what
entries are essential to have in hw. With your solution, I can see no way
user can actually say what should be offloaded or not. Kernel just
automagically decides.


The default eviction policy could be based on RTA_PRIORITY: evict
lower priority routes first.  It would be up to the device driver to
decide between two routes of same priority.

To help device driver make the decision, we could have eviction policy options:

Priority-based (default)
Prefer IPv6 over IPv4
Prefer IPv4 over IPv6
Prefer single path over multipath
Prefer longer prefix lengths over shorter
Optimize for resource utilization

These are portable across different switches.   They're in terms a
user understands.  It's up to the device driver which truly
understands the device constraints to translate the user's eviction
policy choices into something that makes sense to that device.
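
A priority-based default policy along these lines might look like the sketch below. It is only an illustration under stated assumptions: routes are (prefix, metric) tuples, a larger metric is treated as lower priority (as with the usual route metric), ties keep their original order, and the function name is hypothetical.

```python
def pick_evictions(routes, capacity):
    """Return the routes to evict so that at most `capacity` remain.

    routes: list of (prefix, metric) tuples. Assumption: a larger metric
    means a lower-priority route, as with the usual route metric.
    """
    excess = max(0, len(routes) - capacity)
    # Stable sort, so routes with equal metric keep their original order.
    by_low_priority_first = sorted(routes, key=lambda r: r[1], reverse=True)
    return by_low_priority_first[:excess]


if __name__ == "__main__":
    # Illustrative routes only.
    routes = [("10.0.0.0/8", 0), ("10.1.0.0/16", 100), ("10.2.0.0/16", 50)]
    print(pick_evictions(routes, capacity=2))
```

How to break ties between routes of equal priority is left open here; in the proposal above that decision would fall to the device driver.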


This sounds tempting... You plan to throw in some patches, or should I
take care of that?



This is encoding specific policies into the kernel. I was hoping to
avoid this and let user space develop whatever policy it wants. If you
use Jiri's proposed NLM_F_SKIP_{KERNEL|OFFLOAD} flags you get this.

Also I don't understand the "truly understands the device constraints"
comment. We can export a model of the device and know how many rules
of each type will fit exactly into the table. This doesn't seem like
much of a problem to me. In fact the driver developer should know this
anyway.

Part of my motivation here is I really don't want to get stuck with a
case where each driver writer gets to translate the eviction policy
onto their device in some device specific and slightly different way.
It means every developer has to write a new mapping and get it correct.
At very least we should put a layer in switchdev that reads the table
out of the driver and does the mapping so we have it one spot. At least
then the kernel is enforcing policy the same on all devices. Better
still IMO would be to develop the policy in user space and have a
library/tool that does this so we don't end up with a bunch of policy
blobs in the kernel. The 6 above is a good start but over time more
policy blobs will surely pop up. I would for example put 'optimize for
throughput' on the list.

.John

--
John Fastabend Intel Corporation


Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-4 100G Ethernet driver

2015-05-30 Thread David Miller
From: Amir Vadai am...@mellanox.com
Date: Thu, 28 May 2015 22:28:37 +0300

 This patchset extends the mlx5_core driver to support Ethernet
 functionality. The Ethernet functionality in the mlx5 driver is
 integrated into the core driver and not as separated driver. The
 IB functionality remains in the mlx5_ib driver as before.

Applied, thanks.


Re: [PATCH 3/7] net: dsa: ar8xxx: add regmap support

2015-05-30 Thread Sergey Ryazanov
2015-05-29 20:59 GMT+03:00 Andrew Lunn and...@lunn.ch:
 On Fri, May 29, 2015 at 10:36:49AM -0700, Mathieu Olivari wrote:
 Alternatively, we could have something similar to what happens for the phy
 in the wireless subsystems. Wireless PHYs are not registered as net_device
 but they can still be listed, queried or configured through netlink.

 It is a reasonable idea, but you retrieve most of the useful
 information using ethtool. That, as far as i know, operates on
 net_devices, not phys.

Maybe it's time to rework Ethernet cards handling to decouple
Network interfaces from Ethernet ports?

-- 
Sergey


Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()

2015-05-30 Thread Greg KH
On Sun, May 31, 2015 at 11:53:47AM +0900, Greg KH wrote:
 On Mon, May 25, 2015 at 11:02:27AM -0500, Larry Finger wrote:
  On 05/23/2015 04:16 PM, Larry Finger wrote:
  The driver is reporting a warning at kernel/time/timer.c:1096 due to calling
  del_timer_sync() while in interrupt mode. Such warnings are fixed by calling
  del_timer() instead.
  
  Signed-off-by: Larry Finger larry.fin...@lwfinger.net
  Cc: Stable sta...@vger.kernel.org
  Cc: Haggi Eran haggai.e...@gmail.com
  ---
  
  Greg,
  
  Please drop this patch. The same fixes were submitted as
  https://lkml.org/lkml/2015/5/15/226.
 
 That's not working for me at the moment, what was the subject: name?  I
 think I already applied it to the testing tree...

Nevermind, found it...


Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()

2015-05-30 Thread Greg KH
On Mon, May 25, 2015 at 11:02:27AM -0500, Larry Finger wrote:
 On 05/23/2015 04:16 PM, Larry Finger wrote:
 The driver is reporting a warning at kernel/time/timer.c:1096 due to calling
 del_timer_sync() while in interrupt mode. Such warnings are fixed by calling
 del_timer() instead.
 
 Signed-off-by: Larry Finger larry.fin...@lwfinger.net
 Cc: Stable sta...@vger.kernel.org
 Cc: Haggi Eran haggai.e...@gmail.com
 ---
 
 Greg,
 
 Please drop this patch. The same fixes were submitted as
 https://lkml.org/lkml/2015/5/15/226.

That's not working for me at the moment, what was the subject: name?  I
think I already applied it to the testing tree...

thanks,

greg k-h


Re: [PATCH V2 0/5] Add support for QCA IPQ806x Ethernet GMAC controller

2015-05-30 Thread David Miller
From: Mathieu Olivari math...@codeaurora.org
Date: Wed, 27 May 2015 11:02:45 -0700

 This patch set adds support for the integrated Ethernet GMAC controller
 on QCA IPQ806x SoC. This controller is based on a Gigabit Synopsys
 DesignWare IP, already supported in the stmmac driver located in
 drivers/net/ethernet/stmicro/stmmac.
 
 This change is done as a follow-up to the following thread:
 *http://www.spinics.net/lists/netdev/msg311265.html
 While the previous attempt created a new driver to drive this controller,
 this new post leverages the existing stmmac driver by implementing the
 SoC-specific glue for it.
 
 Aside from the pure stmmac glue layer, we have a couple of related
 patches:
 *IPQ806x NSS clock addition is cherry-picked and refreshed from the
  following thread: https://lkml.org/lkml/2014/8/6/390
 *phy-handle and fixed-link support are also added in this change set so the
  driver can be fully functional on platforms using device-trees as well as
  ethernet switches.
 
 V2:
  *Fix MODULE_LICENSE to Dual BSD/GPL as the dwmac-ipq806x.c is using
   ISC license.

Series applied to net-next, thanks.


Re: [PATCH RFC 2/3] dsa: Add support for multiple cpu ports.

2015-05-30 Thread Sergey Ryazanov
2015-05-30 15:09 GMT+03:00 Bjørn Mork bj...@mork.no:
 Andrew Lunn and...@lunn.ch writes:

 Some boards have two CPU interfaces connected to the switch, e.g. WiFi
 access points, with 1 port labeled WAN, 4 ports labeled lan1-lan4, and
 two ports connected to the SoC.

 This patch extends DSA to allow both CPU ports to be used. The cpu
 node in the DSA tree can now have a phandle to the host interface it
 connects to. Each user port can have a phandle to a cpu port which
 should be used for traffic between the port and the CPU. Thus simple
 load sharing over the two CPU ports can be achieved.

 Signed-off-by: Andrew Lunn and...@lunn.ch
 ---
  Documentation/devicetree/bindings/net/dsa/dsa.txt |  66 -
  drivers/net/dsa/mv88e6xxx.c                       |   8 +-
  include/net/dsa.h                                 |  28 +-
  net/dsa/dsa.c                                     | 109 ++
  net/dsa/dsa_priv.h                                |   6 ++
  net/dsa/slave.c                                   |  10 +-
  net/dsa/tag_brcm.c                                |   2 +-
  net/dsa/tag_dsa.c                                 |   2 +-
  net/dsa/tag_edsa.c                                |   2 +-
  net/dsa/tag_trailer.c                             |   2 +-
  10 files changed, 206 insertions(+), 29 deletions(-)

 diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
 index f0b4cd72411d..34f7f18026e5 100644
 --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
 +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
 @@ -58,13 +58,24 @@ Optionnal property:
 Documentation/devicetree/bindings/net/ethernet.txt
 for details.

 +- ethernet   : Optional for cpu ports. A phandle to an ethernet
 +               device which will be used by this CPU port for
 +               passing packets to/from the host. If not present,
 +               the port will use the dsa,ethernet property
 +               defined above.
 +
 +- cpu        : Option for non cpu/dsa ports. A phandle to a
 +               cpu port, which will be used for passing packets
 +               from this port to the host. If not present, the first
 +               cpu port will be used.
 +


Forgive my intrusion. Maybe I can answer some of your questions.

 I'm in deep water here, but this scheme sounds a little too static to me
 if I understand your proposal correctly.  Why would you want to create a
 static mapping of CPU ports to external ports for any given device?

The vendor already assumes that this mapping is static, and the DT just
describes this assumption. A single switch chip with two ports connected
to the CPU is cheaper on such devices than a switch chip plus a dedicated
phy chip. In other words, one of the switch ports is just used as an
independent phy, and Andrew's patch makes it possible to describe this
situation precisely.

 To me, that's part of the switch VLAN configuration.

AFAIK DSA is designed to allow L3 routing between ports, as opposed to
switching and VLANs at L2.
DSA makes the hardware designer's work easier by providing more
configurable chips. If so, interconnection tasks should be resolved by
the kernel in a plug-and-play manner, just as the kernel assigns
memory regions to PCI devices :)

 My experience with these devices is limited to running OpenWRT on an
 WRT1900AC, having a Marvell 88E6172 switch.  And using the OpenWRT
 switch API of course. There I've found it very useful to be able to mix
 and match the two CPU ports as I like with the external ports. How you
 want the CPU ports used is not as much depending on device properties as
 on your network configuration, IMHO.  How many and which links do you
 have?  What bandwidth are they? Trunks or not?  Etc.  You cannot describe
 these answers as device properties, because they aren't.

Nobody forbids running a custom kernel with a custom DT for a custom setup :)

 You can currently configure this as you like in OpenWRT using their
 usual swconfig tool.  The CPU ports are added or removed from VLANs like
 any other port on the switch, and that feels very natural for me as an
 end user.  The only distinction necessary to know, is your 'ethernet'
 property above:  Which host device is this switch port connected to.

 So I wonder: Do you plan to put all of the switch config into DT?  Where
 does that stop? How about trunking between external ports and CPU ports?
 Will every VLAN in the trunk have to go into DT too?

IMHO VLANs shouldn't be described by DT. VLANs are part of the network
configuration and should be configured by the end user, if he needs them.
At the same time, DSA configuration is part of the hw configuration, and
that's why it is placed in DT.

In any case, Andrew, as the author, could give a better explanation. So
let's wait for his answer.

-- 
Sergey


Re: [PATCH net-next] net: limit tcp/udp rmem/wmem to SOCK_{RCV,SND}BUF_MIN

2015-05-30 Thread David Miller
From: Sorin Dumitru so...@returnze.ro
Date: Wed, 27 May 2015 22:16:49 +0300

 This is similar to b1cb59cf2efe ("net: sysctl_net_core: check SNDBUF
 and RCVBUF for min length"). I don't think too small values can cause
 crashes in the case of udp and tcp, but I've seen this set to too
 small values which triggered awful performance. It also makes the
 setting consistent across all the wmem/rmem sysctls.
 
 Signed-off-by: Sorin Dumitru sdumi...@ixiacom.com

Applied, thank you.


Re: [PATCH V2 net-next 1/1] hv_netvsc: Properly size the vrss queues

2015-05-30 Thread David Miller
From: K. Y. Srinivasan k...@microsoft.com
Date: Wed, 27 May 2015 13:16:57 -0700

 The current algorithm for deciding on the number of VRSS channels is
 not optimal since we open up the min of number of CPUs online and the
 number of VRSS channels the host is offering. So on a 32 VCPU guest
 we could potentially open 32 VRSS subchannels. Experimentation has
 shown that it is best to limit the number of VRSS channels to the number
 of CPUs within a NUMA node.
 
 Here is the new algorithm for deciding on the number of sub-channels we
 would open up:
 1) Pick the minimum of what the host is offering and what the driver
    in the guest is specifying as the default value.
 2) Pick the minimum of (1) and the number of CPUs in the NUMA
    node the primary channel is bound to.
 
 
 Signed-off-by: K. Y. Srinivasan k...@microsoft.com

Applied, thanks.
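
For readers of the archive, the two-step selection described in the commit
message above can be sketched as follows. This is a minimal illustration
only, not the driver's code: the function pick_num_subchannels() and its
parameter names are hypothetical, and the real driver obtains these values
from the host offer, a driver default, and the NUMA topology.

```c
/* Smaller of two unsigned values. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper mirroring the two steps in the commit message:
 *   1) min(what the host offers, the guest driver's default)
 *   2) min(result of step 1, CPUs in the primary channel's NUMA node)
 */
static unsigned int pick_num_subchannels(unsigned int host_offered,
					 unsigned int driver_default,
					 unsigned int cpus_in_node)
{
	unsigned int n = min_uint(host_offered, driver_default); /* step 1 */

	return min_uint(n, cpus_in_node);                        /* step 2 */
}
```

With these assumed inputs, a host offering 32 channels, a driver default
of 8, and 4 CPUs in the NUMA node would yield 4 sub-channels.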


Re: [PATCH net-next] tipc: unconditionally put sock refcnt when sock timer to be deleted is pending

2015-05-30 Thread David Miller
From: Ying Xue ying@windriver.com
Date: Thu, 28 May 2015 13:19:22 +0800

 As sock refcnt is taken when sock timer is started in
 sk_reset_timer(), the sock refcnt should be put when sock timer
 to be deleted is in pending state no matter what probing_state
 value of tipc sock is.
 
 Reviewed-by: Erik Hugne erik.hu...@ericsson.com
 Reviewed-by: Jon Maloy jon.ma...@ericsson.com
 Signed-off-by: Ying Xue ying@windriver.com

Applied, thanks.


Re: [PATCH] can: mcp251x: not correct register address

2015-05-30 Thread Jakub Kicinski
On Mon, 25 May 2015 08:57:48 +0200, Tomas Krcka wrote:
 This patch corrects addresses of acceptance filters.
 These registers are not in use, but values should be correct.
 Tested with MCP2515 and am3352 and also checked datasheets for MCP2515
 and MCP2510.
 
 Signed-off-by: Tomas Krcka tomas.kr...@nkgroup.cz
 
 ---
   drivers/net/can/spi/mcp251x.c |9 +
   1 files changed, 5 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
 index bf63fee..c1a95a3 100644
 --- a/drivers/net/can/spi/mcp251x.c
 +++ b/drivers/net/can/spi/mcp251x.c
 @@ -190,10 +190,11 @@
   #define RXBEID0_OFF 4
   #define RXBDLC_OFF  5
   #define RXBDAT_OFF  6
 -#define RXFSIDH(n) ((n) * 4)
 -#define RXFSIDL(n) ((n) * 4 + 1)
 -#define RXFEID8(n) ((n) * 4 + 2)
 -#define RXFEID0(n) ((n) * 4 + 3)
 +#define RXFSID(n) ((n < 3) ? 0 : 4)
 +#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
 +#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
 +#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
 +#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))
   #define RXMSIDH(n) ((n) * 4 + 0x20)
   #define RXMSIDL(n) ((n) * 4 + 0x21)
   #define RXMEID8(n) ((n) * 4 + 0x22)

I think your patch was corrupted.  It doesn't apply because you have
extra space before each surviving #define.


Re: [PATCH] can: mcp251x: not correct register address

2015-05-30 Thread Tomas Krcka
You are right, sorry for that. I'll send v2.

Thanks.

2015-05-30 9:41 GMT+02:00 Jakub Kicinski moorr...@wp.pl:
 On Mon, 25 May 2015 08:57:48 +0200, Tomas Krcka wrote:
 This patch corrects addresses of acceptance filters.
 These registers are not in use, but values should be correct.
 Tested with MCP2515 and am3352 and also checked datasheets for MCP2515
 and MCP2510.

 Signed-off-by: Tomas Krcka tomas.kr...@nkgroup.cz

 ---
   drivers/net/can/spi/mcp251x.c |9 +
   1 files changed, 5 insertions(+), 4 deletions(-)

 diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
 index bf63fee..c1a95a3 100644
 --- a/drivers/net/can/spi/mcp251x.c
 +++ b/drivers/net/can/spi/mcp251x.c
 @@ -190,10 +190,11 @@
   #define RXBEID0_OFF 4
   #define RXBDLC_OFF  5
   #define RXBDAT_OFF  6
 -#define RXFSIDH(n) ((n) * 4)
 -#define RXFSIDL(n) ((n) * 4 + 1)
 -#define RXFEID8(n) ((n) * 4 + 2)
 -#define RXFEID0(n) ((n) * 4 + 3)
 +#define RXFSID(n) ((n < 3) ? 0 : 4)
 +#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
 +#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
 +#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
 +#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))
   #define RXMSIDH(n) ((n) * 4 + 0x20)
   #define RXMSIDL(n) ((n) * 4 + 0x21)
   #define RXMEID8(n) ((n) * 4 + 0x22)

 I think your patch was corrupted.  It doesn't apply because you have
 extra space before each surviving #define.


Re: [PATCH net-next] bpf: add missing rcu protection when releasing programs from prog_array

2015-05-30 Thread Daniel Borkmann

On 05/30/2015 01:22 AM, Alexei Starovoitov wrote:
...

Like __sk_filter_release() and __bpf_prog_release() should be removed.


The whole filter cleanup procedure needs to be simplified a bit, got a
bit too complicated over time, agreed.


Of course, it's a grey line when to introduce a helper and when not to,
but just because two lines are close enough between two functions it
doesn't mean that helper is warranted. In this bpf_prog_put() case
I think helper is not needed _today_. If it grows, we'll reconsider.


Yes, that's what I meant.


Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware

2015-05-30 Thread Jiri Pirko
Fri, May 29, 2015 at 05:39:46PM CEST, sfel...@gmail.com wrote:
On Fri, May 29, 2015 at 12:50 AM, Jiri Pirko j...@resnulli.us wrote:
 Thu, May 21, 2015 at 07:46:54AM CEST, sfel...@gmail.com wrote:
On Tue, May 19, 2015 at 1:28 PM, David Miller da...@davemloft.net wrote:
 From: Andy Gospodarek go...@cumulusnetworks.com
 Date: Tue, 19 May 2015 15:47:32 -0400

 Are you actually saying that if users complain loudly enough about
 the current behavior (not the change Roopa has proposed) that you
 would be open to considering a change to the current behavior?

 I am saying that we have a contract with users not to break existing
 behavior.  Full stop.

After rehearing David's argument, we should probably explore option d)
which is a refinement on the fib_offload_disable mechanism we have
today.  fib_offload_disable is global for all routes.  Once we hit a
HW install problem, the global flag is set and all routes fallback to
SW.  We did this because we can't allow the failed route to exist in
SW and not in HW because it could mess up LPM searches (HW could hit
on a lesser prefix even when SW has the true LPM, because HW gets
first shot at match).  The refinement on fib_offload_disable is this:
make it per-related-prefix rather than global, and on a HW install
problem, set the flag for the related-prefix and uninstall only those
routes from HW.  Related-prefix (is there a correct term for this?)
are routes to the same dst addr but with different prefix lengths.  I
haven't parsed the fib_trie structure to see how routes are organized,
but I suspect since it's optimized for lookup the related-prefix
tracking is already there and we can build on that.

 This looks interesting. However, I'm not sure it is acceptable for the
 user to experience this hw eviction of random entries. The user knows which
 entries are essential to have in hw. With your solution, I can see no way
 for the user to say what should be offloaded and what should not. The kernel
 just automagically decides.

The default eviction policy could be based on RTA_PRIORITY: evict
lower priority routes first.  It would be up to the device driver to
decide between two routes of same priority.

To help device driver make the decision, we could have eviction policy options:

Priority-based (default)
Prefer IPv6 over IPv4
Prefer IPv4 over IPv6
Prefer single path over multipath
Prefer longer prefix lengths over shorter
Optimize for resource utilization

These are portable across different switches.   They're in terms a
user understands.  It's up to the device driver, which truly
understands the device constraints, to translate the user's eviction
policy choices into something that makes sense to that device.

This sounds tempting... You plan to throw in some patches, or should I
take care of that?
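
To make the LPM concern quoted above concrete, here is a minimal userspace
sketch. Everything in it is an assumption for illustration: the route
struct, the lpm_len() helper, and the two tiny tables are hypothetical and
unrelated to the kernel's fib_trie. It models a /24 route that exists in
SW but failed to install in HW, so a HW lookup matches the shorter /16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry: network address (host byte order) and length. */
struct route {
	uint32_t prefix;
	int len; /* prefix length, 0..32 */
};

/* Build an IPv4 address in host byte order from four octets. */
#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Netmask for a prefix length; len == 0 is special-cased to avoid a 32-bit shift. */
static uint32_t mask_for(int len)
{
	return len == 0 ? 0 : ~(uint32_t)0 << (32 - len);
}

/* Return the length of the longest matching prefix, or -1 if none matches. */
static int lpm_len(const struct route *tbl, size_t n, uint32_t dst)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if ((dst & mask_for(tbl[i].len)) == tbl[i].prefix &&
		    tbl[i].len > best)
			best = tbl[i].len;
	}
	return best;
}

/* SW holds both routes; the /24 is assumed to have failed to install in HW. */
static const struct route sw_table[] = {
	{ IP4(10, 0, 0, 0), 16 },
	{ IP4(10, 0, 1, 0), 24 },
};
static const struct route hw_table[] = {
	{ IP4(10, 0, 0, 0), 16 },
};
```

Looking up 10.0.1.5 in sw_table matches the /24, while the same lookup in
hw_table matches only the /16. Since HW gets the first shot at the packet,
it would be forwarded on the less specific route, which is why the related
shorter prefixes would also have to leave HW.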


Re: [PATCH RFC 2/3] dsa: Add support for multiple cpu ports.

2015-05-30 Thread Bjørn Mork
Andrew Lunn and...@lunn.ch writes:

 Some boards have two CPU interfaces connected to the switch, e.g. WiFi
 access points, with 1 port labeled WAN, 4 ports labeled lan1-lan4, and
 two ports connected to the SoC.

 This patch extends DSA to allow both CPU ports to be used. The cpu
 node in the DSA tree can now have a phandle to the host interface it
 connects to. Each user port can have a phandle to a cpu port which
 should be used for traffic between the port and the CPU. Thus simple
 load sharing over the two CPU ports can be achieved.

 Signed-off-by: Andrew Lunn and...@lunn.ch
 ---
  Documentation/devicetree/bindings/net/dsa/dsa.txt |  66 -
  drivers/net/dsa/mv88e6xxx.c                       |   8 +-
  include/net/dsa.h                                 |  28 +-
  net/dsa/dsa.c                                     | 109 ++
  net/dsa/dsa_priv.h                                |   6 ++
  net/dsa/slave.c                                   |  10 +-
  net/dsa/tag_brcm.c                                |   2 +-
  net/dsa/tag_dsa.c                                 |   2 +-
  net/dsa/tag_edsa.c                                |   2 +-
  net/dsa/tag_trailer.c                             |   2 +-
  10 files changed, 206 insertions(+), 29 deletions(-)

 diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
 index f0b4cd72411d..34f7f18026e5 100644
 --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
 +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
 @@ -58,13 +58,24 @@ Optionnal property:
 Documentation/devicetree/bindings/net/ethernet.txt
 for details.
  
 +- ethernet   : Optional for cpu ports. A phandle to an ethernet
 +               device which will be used by this CPU port for
 +               passing packets to/from the host. If not present,
 +               the port will use the dsa,ethernet property
 +               defined above.
 +
 +- cpu        : Option for non cpu/dsa ports. A phandle to a
 +               cpu port, which will be used for passing packets
 +               from this port to the host. If not present, the first
 +               cpu port will be used.
 +

I'm in deep water here, but this scheme sounds a little too static to me
if I understand your proposal correctly.  Why would you want to create a
static mapping of CPU ports to external ports for any given device?  To
me, that's part of the switch VLAN configuration.

My experience with these devices is limited to running OpenWRT on an
WRT1900AC, having a Marvell 88E6172 switch.  And using the OpenWRT
switch API of course. There I've found it very useful to be able to mix
and match the two CPU ports as I like with the external ports. How you
want the CPU ports used is not as much depending on device properties as
on your network configuration, IMHO.  How many and which links do you
have?  What bandwidth are they? Trunks or not?  Etc.  You cannot describe
these answers as device properties, because they aren't.

You can currently configure this as you like in OpenWRT using their
usual swconfig tool.  The CPU ports are added or removed from VLANs like
any other port on the switch, and that feels very natural for me as an
end user.  The only distinction necessary to know, is your 'ethernet'
property above:  Which host device is this switch port connected to.

So I wonder: Do you plan to put all of the switch config into DT?  Where
does that stop? How about trunking between external ports and CPU ports?
Will every VLAN in the trunk have to go into DT too?


Bjørn


[PATCH v2] can: mcp251x: not correct register address

2015-05-30 Thread Tomas Krcka
v2: fix of corrupted patch

This patch corrects addresses of acceptance filters.
These registers are not in use, but values should be correct.
Tested with MCP2515 and am3352 and also checked datasheets for MCP2515
and MCP2510.

Signed-off-by: Tomas Krcka tomas.kr...@nkgroup.cz 

---
 drivers/net/can/spi/mcp251x.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index bf63fee..c1a95a3 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -190,10 +190,11 @@
 #define RXBEID0_OFF 4
 #define RXBDLC_OFF  5
 #define RXBDAT_OFF  6
-#define RXFSIDH(n) ((n) * 4)
-#define RXFSIDL(n) ((n) * 4 + 1)
-#define RXFEID8(n) ((n) * 4 + 2)
-#define RXFEID0(n) ((n) * 4 + 3)
+#define RXFSID(n) ((n < 3) ? 0 : 4)
+#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
+#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
+#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
+#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))
 #define RXMSIDH(n) ((n) * 4 + 0x20)
 #define RXMSIDL(n) ((n) * 4 + 0x21)
 #define RXMEID8(n) ((n) * 4 + 0x22)
-- 
1.7.5.4
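
As a quick sanity check of the corrected offsets, the macros from the patch
can be evaluated in userspace. This is a sketch for illustration, not part
of the patch: rxfsidh_addr() and print_filter_addrs() are hypothetical
helpers, the argument of RXFSID() is parenthesized here for macro hygiene,
and the expected values assume the MCP2515 register map, in which
RXF0SIDH..RXF2SIDH sit at 0x00, 0x04, 0x08 and RXF3SIDH..RXF5SIDH at 0x10,
0x14, 0x18.

```c
#include <stdio.h>

/* Corrected macros from the patch: filters 3..5 sit 4 bytes further up. */
#define RXFSID(n)  (((n) < 3) ? 0 : 4)
#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))

/* Hypothetical helper so the computed address can be checked as a value. */
static int rxfsidh_addr(int n)
{
	return RXFSIDH(n);
}

/* Print the first register address of each of the six acceptance filters. */
static void print_filter_addrs(void)
{
	for (int n = 0; n < 6; n++)
		printf("RXF%dSIDH = 0x%02x\n", n, rxfsidh_addr(n));
}
```

The old macros, (n) * 4 with no extra offset, would place filter 3 at 0x0c
instead of 0x10, which is the gap the patch accounts for.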



[PATCH 42/98] include/uapi/linux/if_tunnel.h: include linux/if.h, linux/ip.h and linux/in6.h

2015-05-30 Thread Mikko Rapeli
Fixes userspace compilation errors like:

error: field ‘iph’ has incomplete type
error: field ‘prefix’ has incomplete type

Signed-off-by: Mikko Rapeli mikko.rap...@iki.fi
---
 include/uapi/linux/if_tunnel.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index bd3cc11..2a36080 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -2,6 +2,9 @@
 #define _UAPI_IF_TUNNEL_H_
 
 #include <linux/types.h>
+#include <linux/if.h>
+#include <linux/ip.h>
+#include <linux/in6.h>
 #include <asm/byteorder.h>
 
 
-- 
2.1.4



[PATCH 41/98] include/uapi/linux/if_pppox.h: include linux/if.h

2015-05-30 Thread Mikko Rapeli
Fixes userspace compilation error:

error: ‘IFNAMSIZ’ undeclared here (not in a function)

Signed-off-by: Mikko Rapeli mikko.rap...@iki.fi
---
 include/uapi/linux/if_pppox.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h
index e128769..473c3c4 100644
--- a/include/uapi/linux/if_pppox.h
+++ b/include/uapi/linux/if_pppox.h
@@ -21,6 +21,7 @@
 #include <asm/byteorder.h>
 
 #include <linux/socket.h>
+#include <linux/if.h>
 #include <linux/if_ether.h>
 #include <linux/if_pppol2tp.h>
 
-- 
2.1.4



[PATCH 48/98] include/uapi/linux/if_pppox.h: include linux/in.h and linux/in6.h

2015-05-30 Thread Mikko Rapeli
Fixes userspace compilation errors:

error: field ‘addr’ has incomplete type
 struct sockaddr_in addr; /* IP address and port to send to */

error: field ‘addr’ has incomplete type
 struct sockaddr_in6 addr; /* IP address and port to send to */

Signed-off-by: Mikko Rapeli mikko.rap...@iki.fi
---
 include/uapi/linux/if_pppox.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h
index 473c3c4..d37bbb1 100644
--- a/include/uapi/linux/if_pppox.h
+++ b/include/uapi/linux/if_pppox.h
@@ -24,6 +24,8 @@
 #include <linux/if.h>
 #include <linux/if_ether.h>
 #include <linux/if_pppol2tp.h>
+#include <linux/in.h>
+#include <linux/in6.h>
 
 /* For user-space programs to pick up these definitions
  * which they wouldn't get otherwise without defining __KERNEL__
-- 
2.1.4



[PATCH 47/98] include/uapi/linux/if_pppol2tp.h: include linux/in.h and linux/in6.h

2015-05-30 Thread Mikko Rapeli
Fixes userspace compilation errors like:

error: field ‘addr’ has incomplete type
 struct sockaddr_in addr; /* IP address and port to send to */
^
error: field ‘addr’ has incomplete type
 struct sockaddr_in6 addr; /* IP address and port to send to */

Signed-off-by: Mikko Rapeli mikko.rap...@iki.fi
---
 include/uapi/linux/if_pppol2tp.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_pppol2tp.h b/include/uapi/linux/if_pppol2tp.h
index 163e8ad..4bd1f55 100644
--- a/include/uapi/linux/if_pppol2tp.h
+++ b/include/uapi/linux/if_pppol2tp.h
@@ -16,7 +16,8 @@
 #define _UAPI__LINUX_IF_PPPOL2TP_H
 
 #include <linux/types.h>
-
+#include <linux/in.h>
+#include <linux/in6.h>
 
 /* Structure used to connect() the socket to a particular tunnel UDP
  * socket over IPv4.
-- 
2.1.4



[PATCH 84/98] include/uapi/linux/atm_zatm.h: include linux/time.h

2015-05-30 Thread Mikko Rapeli
Fixes userspace compile error:

error: field ‘real’ has incomplete type
 struct timeval real;  /* real (wall-clock) time */

Signed-off-by: Mikko Rapeli mikko.rap...@iki.fi
---
 include/uapi/linux/atm_zatm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/atm_zatm.h b/include/uapi/linux/atm_zatm.h
index 10f0fa2..adbaa6c 100644
--- a/include/uapi/linux/atm_zatm.h
+++ b/include/uapi/linux/atm_zatm.h
@@ -14,6 +14,7 @@
 
 #include <linux/atmapi.h>
 #include <linux/atmioc.h>
+#include <linux/time.h>
 
 #define ZATM_GETPOOL   _IOW('a',ATMIOC_SARPRV+1,struct atmif_sioc)
/* get pool statistics */
-- 
2.1.4



[PATCH net] udp: fix behavior of wrong checksums

2015-05-30 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com

We have two problems in UDP stack related to bogus checksums :

1) We return -EAGAIN to application even if receive queue is not empty.
   This breaks applications using edge trigger epoll()

2) Under UDP flood, we can loop forever without yielding to other
   processes, potentially hanging the host, especially on non SMP.


This patch is an attempt to make things better.

We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Willem de Bruijn will...@google.com
---
 net/ipv4/udp.c |    6 ++----
 net/ipv6/udp.c |    6 ++----
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d10b7e0112eb..1c92ea67baef 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1345,10 +1345,8 @@ csum_copy_err:
 	}
 	unlock_sock_fast(sk, slow);
 
-	if (noblock)
-		return -EAGAIN;
-
-	/* starting over for a new packet */
+	/* starting over for a new packet, but check if we need to yield */
+	cond_resched();
 	msg->msg_flags &= ~MSG_TRUNC;
 	goto try_again;
 }
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index c2ec41617a35..e51fc3eee6db 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -525,10 +525,8 @@ csum_copy_err:
 	}
 	unlock_sock_fast(sk, slow);
 
-	if (noblock)
-		return -EAGAIN;
-
-	/* starting over for a new packet */
+	/* starting over for a new packet, but check if we need to yield */
+	cond_resched();
 	msg->msg_flags &= ~MSG_TRUNC;
 	goto try_again;
 }
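
To illustrate problem 1) from the commit message, here is a minimal sketch
of the drain loop that an edge-triggered epoll() consumer typically runs
after a wakeup. It is an assumption about application code, not something
from the patch, and drain_datagrams() is a hypothetical name. Such a loop
treats EAGAIN as "the receive queue is empty", so an -EAGAIN returned
while good datagrams are still queued leaves them unread until a new
packet produces another edge.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Drain a non-blocking datagram socket after an EPOLLET wakeup.
 * Returns the number of datagrams read, or -1 on an unexpected error.
 */
static int drain_datagrams(int fd)
{
	char buf[2048];
	int count = 0;

	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), 0);

		if (n >= 0) {
			count++; /* one datagram consumed */
			continue;
		}
		if (errno == EINTR)
			continue; /* interrupted, just retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return count; /* caller assumes the queue is empty here */
		return -1;
	}
}
```

The loop only goes back to epoll_wait() once recv() reports EAGAIN, which
is why the kernel must not return that error while the queue still holds
valid packets.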

