Re: Recurring trace from tcp_fragment()
Thank you Neal. Most likely I will test the patch on Monday and report back the result.

As for the TcpExtTCPSACKReneging counter, attached is the captured counter value on a 1-second interval for 10 minutes.

Thanks,
Grant

reneg.log
Description: Binary data

On May 30, 2015, at 10:29 AM, Neal Cardwell ncardw...@google.com wrote:

On Fri, May 29, 2015 at 3:53 PM, Grant Zhang gzh...@fastly.com wrote:

Hi Neal, I will be more than happy to test the patch. Please send it my way.

Great. Thank you so much for being willing to do this. Attached is a patch for testing. I generated it and tested it relative to Linux v3.14.39, since your stack trace seemed to suggest that you were seeing this on some variant of v3.14.39. (Newer kernels would need a slightly different patch, since the reneging code path has changed a little since 3.14.)

Can you please try it out and see if it makes that warning go away?

Also, I would be interested in seeing the value of your TcpExtTCPSACKReneging counter, and some sense of how fast that value is increasing, on a machine that's seeing this issue:

  nstat -z -a | grep Reneg

Thanks!
neal

0001-RFC-for-tests-on-v3.14.39-tcp-resegment-skbs-that-we.patch
Re: Ingress tc filters with IPSec
On May 30, 2015 at 2:24 AM John A. Sullivan III jsulli...@opensourcedevel.com wrote:

On Sat, 2015-05-30 at 01:52 -0400, John A. Sullivan III wrote:

Argh! yet another obstacle from my ignorance. We are attempting ingress traffic shaping using IFB interfaces on traffic coming via GRE / IPSec. Filters and hash tables are working fine with plain GRE including stripping the header. We even got the ematch filter working so that the ESP packets are the only packets not redirected to IFB. But, regardless of whether we redirect ESP packets to IFB, the filters never see the decrypted packets. I thought the packets passed through the interface twice - first encrypted and then decrypted. However, tcpdump only shows the ESP packets on the interface. How do we apply filters to the packets after decryption? Thanks - John

I see what changed. In the past, this seemed to work but we were using tunnel mode. We were trying to use transport mode in this application but that seems to prevent the decrypted packet contents from appearing again on the interface. Reverting to tunnel mode made the contents visible again and our filters are working as expected - John

Alas, this is still a problem since we are using VRRP and the tunnel end points are the virtual IP addresses. That makes StrongSWAN choke on selector matching in tunnel mode so back to trying to make transport mode work.

I am guessing we do not see the second pass of the packet because it is only encrypted and not encapsulated. So my hunch is that we need to pass the ESP packet into the ifb qdisc but need to look elsewhere in the packet for the filter matching information. We know that matching on the normal offsets does not work so I am hoping the decrypted packet is decipherable by the filter matching logic but just still has all the ESP transport header attached.
Normally, to extract the contents of my GRE tunnel, I would place them into a separate hash table with the GRE header stripped off and then filter them into TCP and UDP hash tables:

  tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47 0xff match u16 0x0800 0xffff at 22 link 11: offset at 0 mask 0f00 shift 6 plus 4 eat

So we match the GRE protocol and determine that GRE is carrying an IP packet. With the ESP transport header and IV (AES = 16B) interposed between the IP header and the GRE header, I suppose the first part of this filter becomes:

  tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47 0xff match u16 0x0800 0xffff at 46

but what do I do with the second half to find the start of the TCP/UDP header? Is it still offset at 0 because tc filter somehow knows where the interior IP header starts or should it be offset at 48 to account for the GRE + ESP headers?

Or is there a better way to filter ingress traffic on GRE/IPSec tunnels? Thanks - John

--
To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
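The offsets 22, 46 and 48 used in the filters above can be re-derived with a little arithmetic. The following is only a sketch under stated assumptions: a 20-byte IPv4 header without options, an 8-byte ESP header (SPI plus sequence number), the 16-byte IV mentioned in the mail, and a basic 4-byte GRE header with no optional fields. The function and constant names are made up for illustration. It says nothing about whether the u32 classifier ever sees the decrypted bytes at those positions, which is the open question in this thread.

```c
/* Header sizes in bytes; all of these are assumptions for illustration. */
enum {
    IPV4_HDR_LEN  = 20, /* IPv4 header without options */
    ESP_HDR_LEN   = 8,  /* SPI (4) + sequence number (4) */
    ESP_IV_LEN    = 16, /* IV size quoted in the mail for AES */
    GRE_HDR_LEN   = 4,  /* flags/version (2) + protocol type (2) */
    GRE_PROTO_OFF = 2   /* protocol type field inside the GRE header */
};

/* Bytes sitting between the outer IP header and the GRE header. */
static int esp_overhead(int with_esp)
{
    return with_esp ? ESP_HDR_LEN + ESP_IV_LEN : 0;
}

/* Offset of the GRE protocol type, counted from the outer IP header. */
static int gre_proto_offset(int with_esp)
{
    return IPV4_HDR_LEN + esp_overhead(with_esp) + GRE_PROTO_OFF;
}

/* Offset of the inner IP header, counted from the outer IP header. */
static int inner_ip_offset(int with_esp)
{
    return IPV4_HDR_LEN + esp_overhead(with_esp) + GRE_HDR_LEN;
}
```

With these assumptions gre_proto_offset() yields 22 for plain GRE and 46 with ESP in between, and inner_ip_offset() yields 48 with ESP, matching the numbers in the mail.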
Re: Ingress tc filters with IPSec
On May 30, 2015 at 4:12 PM jsulli...@opensourcedevel.com jsulli...@opensourcedevel.com wrote:

On May 30, 2015 at 2:24 AM John A. Sullivan III jsulli...@opensourcedevel.com wrote:

On Sat, 2015-05-30 at 01:52 -0400, John A. Sullivan III wrote:

Argh! yet another obstacle from my ignorance. We are attempting ingress traffic shaping using IFB interfaces on traffic coming via GRE / IPSec. Filters and hash tables are working fine with plain GRE including stripping the header. We even got the ematch filter working so that the ESP packets are the only packets not redirected to IFB. But, regardless of whether we redirect ESP packets to IFB, the filters never see the decrypted packets. I thought the packets passed through the interface twice - first encrypted and then decrypted. However, tcpdump only shows the ESP packets on the interface. How do we apply filters to the packets after decryption? Thanks - John

I see what changed. In the past, this seemed to work but we were using tunnel mode. We were trying to use transport mode in this application but that seems to prevent the decrypted packet contents from appearing again on the interface. Reverting to tunnel mode made the contents visible again and our filters are working as expected - John

Alas, this is still a problem since we are using VRRP and the tunnel end points are the virtual IP addresses. That makes StrongSWAN choke on selector matching in tunnel mode so back to trying to make transport mode work.

I am guessing we do not see the second pass of the packet because it is only encrypted and not encapsulated. So my hunch is that we need to pass the ESP packet into the ifb qdisc but need to look elsewhere in the packet for the filter matching information. We know that matching on the normal offsets does not work so I am hoping the decrypted packet is decipherable by the filter matching logic but just still has all the ESP transport header attached.
Normally, to extract the contents of my GRE tunnel, I would place them into a separate hash table with the GRE header stripped off and then filter them into TCP and UDP hash tables:

  tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47 0xff match u16 0x0800 0xffff at 22 link 11: offset at 0 mask 0f00 shift 6 plus 4 eat

So we match the GRE protocol and determine that GRE is carrying an IP packet. With the ESP transport header and IV (AES = 16B) interposed between the IP header and the GRE header, I suppose the first part of this filter becomes:

  tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47 0xff match u16 0x0800 0xffff at 46

but what do I do with the second half to find the start of the TCP/UDP header? Is it still offset at 0 because tc filter somehow knows where the interior IP header starts or should it be offset at 48 to account for the GRE + ESP headers?

Or is there a better way to filter ingress traffic on GRE/IPSec tunnels? Thanks - John

Alas, this is not working. I set a continue action for the ESP traffic:

  tc filter replace dev ifb0 parent 11:0 protocol ip prio 1 u32 match ip protocol 50 0xff action continue

and that seems to be matching:

  filter parent 11: protocol ip pref 1 u32 fh 802::800 order 2048 key ht 802 bkt 0 terminal flowid ??? (rule hit 3130003 success 2931853)
    match 00320000/00ff0000 at 8 (success 2931853 )
      action order 1: gact action continue
       random type none pass val 0
       index 1 ref 1 bind 1 installed 294 sec

And I even reduced the GRE filter to just look for the GRE protocol in the IP header:

  tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47 0xff link 11: offset at 48 mask 0f00 shift 6 plus 4 eat

but it does not appear to be matching at all:

  filter parent 11: protocol ip pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 link 11: (rule hit 3130012 success 0)
    match 002f0000/00ff0000 at 8 (success 0 )
      offset 0f00>>6 at 48 plus 4 eat

Any suggestions about how to traffic shape ingress traffic coming off an ESP Transport connection? Thanks - John
[PATCH 81/98] include/uapi/linux/openvswitch.h: use __u32 from linux/types.h
Fixes userspace compiler error:

error: unknown type name ‘uint32_t’

Signed-off-by: Mikko Rapeli mikko.rap...@iki.fi
---
 include/uapi/linux/openvswitch.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index bbd49a0..0ab8eca 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -586,8 +586,8 @@ enum ovs_hash_alg {
  * @hash_basis: basis used for computing hash.
  */
 struct ovs_action_hash {
-	uint32_t hash_alg; /* One of ovs_hash_alg. */
-	uint32_t hash_basis;
+	__u32 hash_alg; /* One of ovs_hash_alg. */
+	__u32 hash_basis;
 };

 /**
--
2.1.4
Re: Recurring trace from tcp_fragment()
On Fri, May 29, 2015 at 3:53 PM, Grant Zhang gzh...@fastly.com wrote:

Hi Neal, I will be more than happy to test the patch. Please send it my way.

Great. Thank you so much for being willing to do this. Attached is a patch for testing. I generated it and tested it relative to Linux v3.14.39, since your stack trace seemed to suggest that you were seeing this on some variant of v3.14.39. (Newer kernels would need a slightly different patch, since the reneging code path has changed a little since 3.14.)

Can you please try it out and see if it makes that warning go away?

Also, I would be interested in seeing the value of your TcpExtTCPSACKReneging counter, and some sense of how fast that value is increasing, on a machine that's seeing this issue:

  nstat -z -a | grep Reneg

Thanks!
neal

0001-RFC-for-tests-on-v3.14.39-tcp-resegment-skbs-that-we.patch
Description: Binary data
[PATCH net-next 1/3] s390/bpf: fix stack allocation
From: Michael Holzheu holz...@linux.vnet.ibm.com

On s390x we have to provide 160 bytes stack space before we can call the next function. From the 160 bytes that we got from the previous function we only use 11 * 8 bytes and have 160 - 11 * 8 bytes left. Currently for BPF we allocate additional 160 - 11 * 8 bytes for the next function. This is wrong because then the next function only gets:

(160 - 11 * 8) + (160 - 11 * 8) = 2 * 72 = 144 bytes

Fix this and allocate enough memory for the next function.

Cc: sta...@vger.kernel.org # 4.0+
Signed-off-by: Michael Holzheu holz...@linux.vnet.ibm.com
Acked-by: Heiko Carstens heiko.carst...@de.ibm.com
Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
 arch/s390/net/bpf_jit.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/s390/net/bpf_jit.h b/arch/s390/net/bpf_jit.h
index ba8593a515ba..de156ba3bd71 100644
--- a/arch/s390/net/bpf_jit.h
+++ b/arch/s390/net/bpf_jit.h
@@ -48,7 +48,9 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
  * We get 160 bytes stack space from calling function, but only use
  * 11 * 8 byte (old backchain + r15 - r6) for storing registers.
  */
-#define STK_OFF (MAX_BPF_STACK + 8 + 4 + 4 + (160 - 11 * 8))
+#define STK_SPACE       (MAX_BPF_STACK + 8 + 4 + 4 + 160)
+#define STK_160_UNUSED  (160 - 11 * 8)
+#define STK_OFF         (STK_SPACE - STK_160_UNUSED)
 #define STK_OFF_TMP     160 /* Offset of tmp buffer on stack */
 #define STK_OFF_HLEN    168 /* Offset of SKB header length on stack */

--
1.7.9.5
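The arithmetic in the commit message above can be checked with a small model. This is a sketch, not kernel code: MAX_BPF_STACK is assumed to be 512 (the mail does not state it), and the helper names are invented for illustration. callee_space() computes how many bytes remain for the next function once the BPF stack and the 8 + 4 + 4 bytes of temporaries have been carved out of the new allocation plus the unused part of the caller's 160 bytes.

```c
enum {
    MAX_BPF_STACK = 512,     /* assumed value, not given in the mail */
    ABI_SAVE_AREA = 160,     /* bytes every caller provides on s390x */
    REGS_SAVED    = 11 * 8   /* old backchain + r15 - r6 */
};

/* Part of the caller-provided 160 bytes the JIT does not use. */
static int stk_160_unused(void)
{
    return ABI_SAVE_AREA - REGS_SAVED;
}

/* Stack decrement before the fix. */
static int stk_off_old(void)
{
    return MAX_BPF_STACK + 8 + 4 + 4 + (160 - 11 * 8);
}

/* Stack decrement after the fix: STK_SPACE - STK_160_UNUSED. */
static int stk_off_new(void)
{
    return (MAX_BPF_STACK + 8 + 4 + 4 + 160) - stk_160_unused();
}

/* Bytes left over for the next function's save area. */
static int callee_space(int stk_off)
{
    return stk_off + stk_160_unused() - (MAX_BPF_STACK + 8 + 4 + 4);
}
```

Under these assumptions the old definition leaves 144 bytes for the callee and the new one leaves the required 160. The new decrement works out to 616, which agrees with the "aghi %r15,-616" shown in the prolog of patch 2/3.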
[PATCH net-next 2/3] s390/bpf: fix bpf frame pointer setup
From: Michael Holzheu holz...@linux.vnet.ibm.com

Currently the bpf frame pointer is set to the old r15. This is wrong because of packed stack. Fix this and adjust the frame pointer to respect packed stack. This now generates a prolog like the following:

 3ff8001c3fa: eb67f0480024 stmg %r6,%r7,72(%r15)
 3ff8001c400: ebcff0780024 stmg %r12,%r15,120(%r15)
 3ff8001c406: b904001f     lgr  %r1,%r15      <- load backchain
 3ff8001c40a: 41d0f048     la   %r13,72(%r15) <- load adjusted bfp
 3ff8001c40e: a7fbfd98     aghi %r15,-616
 3ff8001c412: e310f0980024 stg  %r1,152(%r15) <- save backchain

Cc: sta...@vger.kernel.org # 4.0+
Signed-off-by: Michael Holzheu holz...@linux.vnet.ibm.com
Acked-by: Heiko Carstens heiko.carst...@de.ibm.com
Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
 arch/s390/net/bpf_jit_comp.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 20c146d1251a..55423d8be580 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -384,13 +384,16 @@ static void bpf_jit_prologue(struct bpf_jit *jit)
 	}
 	/* Setup stack and backchain */
 	if (jit->seen & SEEN_STACK) {
-		/* lgr %bfp,%r15 (BPF frame pointer) */
-		EMIT4(0xb9040000, BPF_REG_FP, REG_15);
+		if (jit->seen & SEEN_FUNC)
+			/* lgr %w1,%r15 (backchain) */
+			EMIT4(0xb9040000, REG_W1, REG_15);
+		/* la %bfp,STK_160_UNUSED(%r15) (BPF frame pointer) */
+		EMIT4_DISP(0x41000000, BPF_REG_FP, REG_15, STK_160_UNUSED);
 		/* aghi %r15,-STK_OFF */
 		EMIT4_IMM(0xa70b0000, REG_15, -STK_OFF);
 		if (jit->seen & SEEN_FUNC)
-			/* stg %bfp,152(%r15) (backchain) */
-			EMIT6_DISP_LH(0xe3000000, 0x0024, BPF_REG_FP, REG_0,
+			/* stg %w1,152(%r15) (backchain) */
+			EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0,
 				      REG_15, 152);
 	}
 	/*
--
1.7.9.5
[PATCH net-next 3/3] s390/bpf: implement bpf_tail_call() helper
From: Michael Holzheu holz...@linux.vnet.ibm.com

bpf_tail_call() arguments:
 - ctx: Context pointer
 - jmp_table: One of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table
 - index: Index in the jump table

In this implementation s390x JIT does stack unwinding and jumps into the callee program prologue. Caller and callee use the same stack.

With this patch a tail call generates the following code on s390x:

 if (index >= array->map.max_entries)
         goto out
 03ff8001c7e4: e31030100016 llgf  %r1,16(%r3)
 03ff8001c7ea: ec41001fa065 clgrj %r4,%r1,10,3ff8001c828

 if (tail_call_cnt++ > MAX_TAIL_CALL_CNT)
         goto out;
 03ff8001c7f0: a7080001     lhi   %r0,1
 03ff8001c7f4: eb10f25000fa laal  %r1,%r0,592(%r15)
 03ff8001c7fa: ec120017207f clij  %r1,32,2,3ff8001c828

 prog = array->prog[index];
 if (prog == NULL)
         goto out;
 03ff8001c800: eb140003000d sllg  %r1,%r4,3
 03ff8001c806: e31310800004 lg    %r1,128(%r3,%r1)
 03ff8001c80c: ec18000e007d clgij %r1,0,8,3ff8001c828

 Restore registers before calling function
 03ff8001c812: eb68f2980004 lmg   %r6,%r8,664(%r15)
 03ff8001c818: ebbff2c00004 lmg   %r11,%r15,704(%r15)

 goto *(prog->bpf_func + tail_call_start);
 03ff8001c81e: e31100200004 lg    %r1,32(%r1,%r0)
 03ff8001c824: 47f01006     bc    15,6(%r1)

Reviewed-by: Martin Schwidefsky schwidef...@de.ibm.com
Signed-off-by: Michael Holzheu holz...@linux.vnet.ibm.com
Acked-by: Heiko Carstens heiko.carst...@de.ibm.com
Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
 arch/s390/net/bpf_jit.h      |  10 +++-
 arch/s390/net/bpf_jit_comp.c | 106 +-
 2 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/arch/s390/net/bpf_jit.h b/arch/s390/net/bpf_jit.h
index de156ba3bd71..f6498eec9ee1 100644
--- a/arch/s390/net/bpf_jit.h
+++ b/arch/s390/net/bpf_jit.h
@@ -28,6 +28,9 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
  *            | old backchain |     |
  *            +---------------+     |
  *            |   r15 - r6    |     |
+ *            +---------------+     |
+ *            | 4 byte align  |     |
+ *            | tail_call_cnt |     |
  * BFP ->     +===============+     |
  *            |               |     |
  *            |   BPF stack   |     |
@@ -46,14 +49,17 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
  * R15 ->     +---------------+     + low
  *
  * We get 160 bytes stack space from calling function, but only use
- * 11 * 8 byte (old backchain + r15 - r6) for storing registers.
+ * 12 * 8 byte for old backchain, r15..r6, and tail_call_cnt.
  */
 #define STK_SPACE       (MAX_BPF_STACK + 8 + 4 + 4 + 160)
-#define STK_160_UNUSED  (160 - 11 * 8)
+#define STK_160_UNUSED  (160 - 12 * 8)
 #define STK_OFF         (STK_SPACE - STK_160_UNUSED)
 #define STK_OFF_TMP     160 /* Offset of tmp buffer on stack */
 #define STK_OFF_HLEN    168 /* Offset of SKB header length on stack */

+#define STK_OFF_R6      (160 - 11 * 8) /* Offset of r6 on stack */
+#define STK_OFF_TCCNT   (160 - 12 * 8) /* Offset of tail_call_cnt on stack */
+
 /* Offset to skip condition code check */
 #define OFF_OK          4

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 55423d8be580..d3766dd67e23 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -21,6 +21,7 @@
 #include <linux/netdevice.h>
 #include <linux/filter.h>
 #include <linux/init.h>
+#include <linux/bpf.h>
 #include <asm/cacheflush.h>
 #include <asm/dis.h>
 #include "bpf_jit.h"
@@ -40,6 +41,8 @@ struct bpf_jit {
 	int base_ip;		/* Base address for literal pool */
 	int ret0_ip;		/* Address of return 0 */
 	int exit_ip;		/* Address of exit */
+	int tail_call_start;	/* Tail call start offset */
+	int labels[1];		/* Labels for local jumps */
 };

 #define BPF_SIZE_MAX	4096	/* Max size for program */
@@ -49,6 +52,7 @@ struct bpf_jit {
 #define SEEN_RET0	4	/* ret0_ip points to a valid return 0 */
 #define SEEN_LITERAL	8	/* code uses literals */
 #define SEEN_FUNC	16	/* calls C functions */
+#define SEEN_TAIL_CALL	32	/* code uses tail calls */
 #define SEEN_STACK	(SEEN_FUNC | SEEN_MEM | SEEN_SKB)

 /*
@@ -60,6 +64,7 @@ struct bpf_jit {
 #define REG_L		(__MAX_BPF_REG+3)	/* Literal pool register */
 #define REG_15		(__MAX_BPF_REG+4)	/* Register 15 */
 #define REG_0		REG_W0			/* Register 0 */
+#define REG_1		REG_W1			/* Register 1 */
 #define REG_2		BPF_REG_1		/* Register 2 */
 #define REG_14		BPF_REG_0		/* Register 14 */

@@ -223,6 +228,24 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
 	REG_SET_SEEN(b3);					\
 })

+#define EMIT6_PCREL_LABEL(op1, op2,
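The three checks that the generated s390x code performs before jumping can be modelled in plain C. This is a hypothetical user-space sketch, not the JIT itself: struct prog_model, struct prog_array_model and tail_call_target() are invented names, the array is fixed at four slots for brevity, and the limit of 32 is taken from the "clij %r1,32,2,..." instruction in the listing above.

```c
#include <stddef.h>

#define MAX_TAIL_CALL_CNT 32 /* limit as seen in the disassembly above */

/* Hypothetical stand-ins for the kernel's program and prog-array types. */
struct prog_model {
    int id;
};

struct prog_array_model {
    unsigned int max_entries;
    struct prog_model *prog[4];
};

/*
 * Return the program to jump to, or NULL if execution should fall
 * through ("goto out" in the commit message). The order of the checks
 * mirrors the generated code: bounds check, call-count check with
 * post-increment, then NULL check on the selected slot.
 */
static struct prog_model *tail_call_target(struct prog_array_model *array,
                                           unsigned int index,
                                           unsigned int *tail_call_cnt)
{
    if (index >= array->max_entries)
        return NULL;
    if ((*tail_call_cnt)++ > MAX_TAIL_CALL_CNT)
        return NULL;
    return array->prog[index]; /* may itself be NULL */
}
```

Note that an out-of-range index leaves the counter untouched, because the bounds check comes first, while an empty slot still consumes one count.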
[PATCH net-next 0/3] s390/bpf: implement bpf_tail_call JIT support
This set is for net-next tree. Patch 3 adds bpf_tail_call() support for s390x JIT. It has a dependency on patches 1 and 2 that will also be submitted to stable via Martin Schwidefsky. Michael Holzheu (3): s390/bpf: fix stack allocation s390/bpf: fix bpf frame pointer setup s390/bpf: implement bpf_tail_call() helper arch/s390/net/bpf_jit.h | 12 - arch/s390/net/bpf_jit_comp.c | 117 +++--- 2 files changed, 121 insertions(+), 8 deletions(-) -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ingress tc filters with IPSec
On Sat, 2015-05-30 at 01:52 -0400, John A. Sullivan III wrote:

Argh! yet another obstacle from my ignorance. We are attempting ingress traffic shaping using IFB interfaces on traffic coming via GRE / IPSec. Filters and hash tables are working fine with plain GRE including stripping the header. We even got the ematch filter working so that the ESP packets are the only packets not redirected to IFB. But, regardless of whether we redirect ESP packets to IFB, the filters never see the decrypted packets. I thought the packets passed through the interface twice - first encrypted and then decrypted. However, tcpdump only shows the ESP packets on the interface. How do we apply filters to the packets after decryption? Thanks - John

I see what changed. In the past, this seemed to work but we were using tunnel mode. We were trying to use transport mode in this application but that seems to prevent the decrypted packet contents from appearing again on the interface. Reverting to tunnel mode made the contents visible again and our filters are working as expected - John
Re: [PATCH] if_vlan: fix vlaue -> value typo
From: Vivien Didelot vivien.dide...@savoirfairelinux.com Date: Wed, 27 May 2015 21:07:26 -0400 Fixes vlaue for value in include/linux/if_vlan.h. Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] bpf: allow BPF programs access skb->skb_iif and skb->dev->ifindex fields
From: Alexei Starovoitov a...@plumgrid.com
Date: Wed, 27 May 2015 15:30:39 -0700

classic BPF already exposes skb->dev->ifindex via SKF_AD_IFINDEX extension. Allow eBPF program to access it as well. Note that classic aborts execution of the program if 'skb->dev == NULL' (which is inconvenient for program writers), whereas eBPF returns zero in such case. Also expose the 'skb_iif' field, since programs triggered by redirected packet need to know the original interface index.

Summary:
__skb->ifindex -> skb->dev->ifindex
__skb->ingress_ifindex -> skb->skb_iif

Signed-off-by: Alexei Starovoitov a...@plumgrid.com

Applied, thank you.
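The difference in NULL handling described above can be illustrated with a small user-space model. This is only a sketch: the structs are simplified stand-ins for the kernel's sk_buff and net_device, and the three function names are hypothetical, not the kernel's accessors.

```c
#include <stddef.h>

/* Simplified stand-ins for net_device and sk_buff. */
struct net_device_model {
    int ifindex;
};

struct skb_model {
    struct net_device_model *dev;
    int skb_iif;
};

/*
 * Classic BPF behaviour as described in the mail: the program is
 * aborted when skb->dev is NULL. Returns 0 on success, -1 for abort.
 */
static int classic_ifindex(const struct skb_model *skb, int *out)
{
    if (skb->dev == NULL)
        return -1;
    *out = skb->dev->ifindex;
    return 0;
}

/* eBPF behaviour as described in the mail: read as zero instead. */
static int ebpf_ifindex(const struct skb_model *skb)
{
    return skb->dev ? skb->dev->ifindex : 0;
}

/* __skb->ingress_ifindex maps straight onto skb->skb_iif. */
static int ebpf_ingress_ifindex(const struct skb_model *skb)
{
    return skb->skb_iif;
}
```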
Re: Recurring trace from tcp_fragment()
On Sat, May 30, 2015 at 2:52 PM, Grant Zhang gzh...@fastly.com wrote: Thank you Neal. Most likely I will test the patch on Monday and report back the result. As for the TcpExtTCPSACKReneging counter, attached is the captured counter value on a 1-second interval for 10 minutes. OK, great. Those TcpExtTCPSACKReneging values look consistent with the theory underlying the patch, so that's a good sign. Thanks! neal -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [net-next 00/14][pull request] Intel Wired LAN Driver Updates 2015-05-28
From: Jeff Kirsher jeffrey.t.kirs...@intel.com
Date: Thu, 28 May 2015 04:25:25 -0700

This series contains updates to ethtool, ixgbe, i40e and i40evf.

John adds helper routines for ethtool to pass VF to rx_flow_spec. Since the ring_cookie is 64 bits wide which is much larger than what could be used for actual queue index values, provide helper routines to pack a VF index into the cookie. Then John provides an ixgbe patch to allow flow director to use the entire queue space.

Neerav provides an i40e patch to collect XOFF Rx stats, where it was not being collected before.

Anjali provides ATR support for tunneled packets, as well as stats to count tunnel ATR hits. Cleaned up PF struct members which are unnecessary, since we can use the stat index macro directly. Cleaned up flow director ATR/SB messages to a higher debug level since they are not useful unless silicon validation is happening.

Greg provides a patch to disable offline diagnostics if VFs are enabled since ethtool offline diagnostic tests are not designed (out of scope) to disable VF functions for testing and re-enable afterward. Also cleans up TODO comment that is no longer needed.

Vasu provides a fix for an FCoE EOF case where i40e_fcoe_ctxt_eof() may be called before i40e_fcoe_eof_is_supported() is called.

Jesse adds skb->xmit_more support for i40evf. Then provides a performance enhancement for i40evf by inlining some functions which provides a 15% gain in small packet performance. Also cleans up the use of time_stamp since it is no longer used to determine if there is a tx_hang and was a part of a previous tx_hang design which is no longer used.

Pulled, thanks Jeff.
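The idea of packing a VF index into the 64-bit ring_cookie can be sketched as follows. The bit layout here (queue index in the low 32 bits, VF index in the next 8 bits) and all names are assumptions chosen for illustration. The cover letter does not spell out the layout, so the patch itself remains the reference for the real helpers.

```c
#include <stdint.h>

/* Assumed layout: bits 0-31 queue index, bits 32-39 VF index. */
#define COOKIE_RING_MASK 0x00000000FFFFFFFFULL
#define COOKIE_VF_MASK   0x000000FF00000000ULL
#define COOKIE_VF_SHIFT  32

/* Combine a queue index and a VF index into one 64-bit cookie. */
static uint64_t cookie_pack(uint32_t ring, uint8_t vf)
{
    return ((uint64_t)vf << COOKIE_VF_SHIFT) | ring;
}

/* Extract the queue index from a cookie. */
static uint32_t cookie_ring(uint64_t cookie)
{
    return (uint32_t)(cookie & COOKIE_RING_MASK);
}

/* Extract the VF index from a cookie. */
static uint8_t cookie_vf(uint64_t cookie)
{
    return (uint8_t)((cookie & COOKIE_VF_MASK) >> COOKIE_VF_SHIFT);
}
```

A cookie that only carries a plain queue index keeps its old meaning under this layout, since the VF bits are then zero.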
Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware
On 05/30/2015 02:00 AM, Jiri Pirko wrote: Fri, May 29, 2015 at 05:39:46PM CEST, sfel...@gmail.com wrote: On Fri, May 29, 2015 at 12:50 AM, Jiri Pirko j...@resnulli.us wrote: Thu, May 21, 2015 at 07:46:54AM CEST, sfel...@gmail.com wrote: On Tue, May 19, 2015 at 1:28 PM, David Miller da...@davemloft.net wrote: From: Andy Gospodarek go...@cumulusnetworks.com Date: Tue, 19 May 2015 15:47:32 -0400 Are you actually saying that if users complain loudly enough about the current behavior (not the change Roopa has proposed) that you would be open to considering a change the current behavior? I am saying that we have a contract with users not to break existing behavior. Full stop. After rehearing David's argument, we should probably explore option d) which is a refinement on the fib_offload_disable mechanism we have today. fib_offload_disable is global for all routes. Once we hit a HW install problem, the global flag is set and all routes fallback to SW. We did this because we can't allow the failed route to exist in SW and not in HW because it could mess up LPM searches (HW could hit on a lesser prefix even when SW has the true LPM, because HW gets first shot at match). The refinement on fib_offload_disable is this: make it per-related-prefix rather than global, and on a HW install problem, set the flag for the related-prefix and uninstall only those routes from HW. Related-prefix (is there a correct term for this?) are routes to the same dst addr but with different prefix lengths. I haven't parsed the fib_trie structure to see how routes are organized, but I suspect since it's optimized for lookup the related-prefix tracking is already there and we can build on that. This looks interesting. However, I'm not sure that it is acceptable for user to experience this hw evict of random entries. User knows what entries are essential to have in hw. With your solution, I can see no way user can actually say what should be offloaded or not. Kernel just automagically decides. 
The default eviction policy could be based on RTA_PRIORITY: evict lower priority routes first. It would be up to the device driver to decide between two routes of same priority. To help device driver make the decision, we could have eviction policy options:

 - Priority-based (default)
 - Prefer IPv6 over IPv4
 - Prefer IPv4 over IPv6
 - Prefer single path over multipath
 - Prefer longer prefix lengths over shorter
 - Optimize for resource utilization

These are portable across different switches. They're in terms a user understands. It's up to the device driver which truly understands the device constraints to translate the user's eviction policy choices into something that makes sense to that device.

This sounds tempting... You plan to throw in some patches, or should I take care of that?

This is encoding specific policies into the kernel. I was hoping to avoid this and let user space develop whatever policy it wants. If you use Jiri's proposed NLM_F_SKIP_{KERNEL|OFFLOAD} flags you get this.

Also I don't understand the "truly understands the device constraints" comment. We can export a model of the device and know how many rules of each type will fit exactly into the table. This doesn't seem like much of a problem to me. In fact the driver developer should know this anyway.

Part of my motivation here is I really don't want to get stuck with a case where each driver writer gets to translate the eviction policy onto their device in some device specific and slightly different way. It means every developer has to write a new mapping and get it correct. At the very least we should put a layer in switchdev that reads the table out of the driver and does the mapping so we have it in one spot. At least then the kernel is enforcing policy the same on all devices.

Better still IMO would be to develop the policy in user space and have a library/tool that does this so we don't end up with a bunch of policy blobs in the kernel. The 6 above are a good start but over time more policy blobs will surely pop up. I would for example put 'optimize for throughput' on the list.

.John

--
John Fastabend Intel Corporation
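The longest-prefix-match concern raised earlier in this thread (hardware hitting on a shorter prefix while software holds the true LPM) can be shown with a toy lookup. This is an illustrative linear scan over a small array, not the kernel's fib_trie, and the route values are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

struct route_model {
    uint32_t prefix;  /* network address, host byte order */
    int      len;     /* prefix length, 0..32 */
    int      nexthop; /* opaque next-hop id */
};

/* Return the longest matching route for dst, or NULL if none match. */
static const struct route_model *lpm_lookup(const struct route_model *tbl,
                                            size_t n, uint32_t dst)
{
    const struct route_model *best = NULL;

    for (size_t i = 0; i < n; i++) {
        /* A shift by 32 is undefined, so handle len == 0 separately. */
        uint32_t mask = tbl[i].len ? ~(uint32_t)0 << (32 - tbl[i].len) : 0;

        if ((dst & mask) == tbl[i].prefix &&
            (best == NULL || tbl[i].len > best->len))
            best = &tbl[i];
    }
    return best;
}
```

If software holds 10.1.0.0/16 and 10.1.2.0/24 but only the /16 made it into hardware, a packet to 10.1.2.3 matches the /16 in hardware and never reaches the software table where the /24 would have won. That is why routes to the same destination with different prefix lengths have to be offloaded or evicted together.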
Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-4 100G Ethernet driver
From: Amir Vadai am...@mellanox.com Date: Thu, 28 May 2015 22:28:37 +0300 This patchset extends the mlx5_core driver to support Ethernet functionality. The Ethernet functionality in the mlx5 driver is integrated into the core driver and not as separated driver. The IB functionality remains in the mlx5_ib driver as before. Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/7] net: dsa: ar8xxx: add regmap support
2015-05-29 20:59 GMT+03:00 Andrew Lunn and...@lunn.ch:

On Fri, May 29, 2015 at 10:36:49AM -0700, Mathieu Olivari wrote:

Alternatively, we could have something similar to what happens for the phy in the wireless subsystems. Wireless PHYs are not registered as net_device but they can still be listed, queried or configured through netlink.

It is a reasonable idea, but you retrieve most of the useful information using ethtool. That, as far as I know, operates on net_devices, not phys.

Maybe it's time to rework Ethernet cards handling to decouple Network interfaces from Ethernet ports?

--
Sergey
Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()
On Sun, May 31, 2015 at 11:53:47AM +0900, Greg KH wrote: On Mon, May 25, 2015 at 11:02:27AM -0500, Larry Finger wrote: On 05/23/2015 04:16 PM, Larry Finger wrote: The driver is reporting a warning at kernel/time/timer.c:1096 due to calling del_timer_sync() while in interrupt mode. Such warnings are fixed by calling del_timer() instead. Signed-off-by: Larry Finger larry.fin...@lwfinger.net Cc: Stable sta...@vger.kernel.org Cc: Haggi Eran haggai.e...@gmail.com --- Greg, Please drop this patch. The same fixes were submitted as https://lkml.org/lkml/2015/5/15/226. That's not working for me at the moment, what was the subject: name? I think I already applied it to the testing tree... Nevermind, found it... -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()
On Mon, May 25, 2015 at 11:02:27AM -0500, Larry Finger wrote: On 05/23/2015 04:16 PM, Larry Finger wrote: The driver is reporting a warning at kernel/time/timer.c:1096 due to calling del_timer_sync() while in interrupt mode. Such warnings are fixed by calling del_timer() instead. Signed-off-by: Larry Finger larry.fin...@lwfinger.net Cc: Stable sta...@vger.kernel.org Cc: Haggi Eran haggai.e...@gmail.com --- Greg, Please drop this patch. The same fixes were submitted as https://lkml.org/lkml/2015/5/15/226. That's not working for me at the moment, what was the subject: name? I think I already applied it to the testing tree... thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 0/5] Add support for QCA IPQ806x Ethernet GMAC controller
From: Mathieu Olivari math...@codeaurora.org
Date: Wed, 27 May 2015 11:02:45 -0700

This patch set adds support for the integrated Ethernet GMAC controller on QCA IPQ806x SoC. This controller is based on a Gigabit Synopsys DesignWare IP, already supported in the stmmac driver located in drivers/net/ethernet/stmicro/stmmac.

This change is done as a follow-up to the following thread:
 * http://www.spinics.net/lists/netdev/msg311265.html

While previous attempt was creating a new driver to drive this controller, this new post leverages the existing stmmac driver by implementing the SoC specific glue to it.

Aside from the pure stmmac glue layer, we have a couple of related patches:
 * IPQ806x NSS clock addition is cherry-picked and refreshed from the following thread: https://lkml.org/lkml/2014/8/6/390
 * phy-handle and fixed-link support are also added in this change set so the driver can be fully functional on platforms using device-trees as well as ethernet switches.

V2:
 * Fix MODULE_LICENSE to Dual BSD/GPL as the dwmac-ipq806x.c is using ISC license.

Series applied to net-next, thanks.
Re: [PATCH RFC 2/3] dsa: Add support for multiple cpu ports.
2015-05-30 15:09 GMT+03:00 Bjørn Mork bj...@mork.no:
> Andrew Lunn and...@lunn.ch writes:
>
>> Some boards have two CPU interfaces connected to the switch, e.g. WiFi access points, with 1 port labeled WAN, 4 ports labeled lan1-lan4, and two ports connected to the SoC.
>>
>> This patch extends DSA to allow both CPU ports to be used. The cpu node in the DSA tree can now have a phandle to the host interface it connects to. Each user port can have a phandle to a cpu port which should be used for traffic between the port and the CPU. Thus simple load sharing over the two CPU ports can be achieved.
>>
>> Signed-off-by: Andrew Lunn and...@lunn.ch
>> ---
>>  Documentation/devicetree/bindings/net/dsa/dsa.txt | 66 -
>>  drivers/net/dsa/mv88e6xxx.c | 8 +-
>>  include/net/dsa.h | 28 +-
>>  net/dsa/dsa.c | 109 ++
>>  net/dsa/dsa_priv.h | 6 ++
>>  net/dsa/slave.c | 10 +-
>>  net/dsa/tag_brcm.c | 2 +-
>>  net/dsa/tag_dsa.c | 2 +-
>>  net/dsa/tag_edsa.c | 2 +-
>>  net/dsa/tag_trailer.c | 2 +-
>>  10 files changed, 206 insertions(+), 29 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
>> index f0b4cd72411d..34f7f18026e5 100644
>> --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
>> +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
>> @@ -58,13 +58,24 @@ Optionnal property:
>>  Documentation/devicetree/bindings/net/ethernet.txt for details.
>>
>> +- ethernet : Optional for cpu ports. A phandle to an ethernet
>> +  device which will be used by this CPU port for
>> +  passing packets to/from the host. If not present,
>> +  the port will use the dsa,ethernet property
>> +  defined above.
>> +
>> +- cpu: Option for non cpu/dsa ports. A phandle to a
>> +  cpu port, which will be used for passing packets
>> +  from this port to the host. If not present, the first
>> +  cpu port will be used.
>> +

Forgive my intrusion. Maybe I can answer some of your questions.

> I'm in deep water here, but this scheme sounds a little too static to me if I understand your proposal correctly.
>
> Why would you want to create a static mapping of CPU ports to external ports for any given device?

The vendor already assumes that this mapping is static, and DT just describes this assumption. On such devices, a single switch chip with two ports connected to the CPU is cheaper than a switch chip plus a dedicated PHY chip. In other words, one of the switch ports is simply used as an independent PHY, and Andrew's patch makes it possible to describe this situation precisely.

> To me, that's part of the switch VLAN configuration.

AFAIK DSA is designed to allow L3 routing between ports, as opposed to switching and VLANs at L2. DSA facilitates the work of the hardware designer by providing more configurable chips. If so, then interconnection tasks should be resolved by the kernel in a plug-and-play manner, just as the kernel assigns memory regions to PCI devices :)

> My experience with these devices is limited to running OpenWRT on a WRT1900AC, having a Marvell 88E6172 switch. And using the OpenWRT switch API of course. There I've found it very useful to be able to mix and match the two CPU ports as I like with the external ports.
>
> How you want the CPU ports used depends not so much on device properties as on your network configuration, IMHO. How many and which links do you have? What bandwidth are they? Trunks or not? Etc. You cannot describe these answers as device properties, because they aren't.

Nothing prevents running a custom kernel with a custom DT for a custom setup :)

> You can currently configure this as you like in OpenWRT using their usual swconfig tool. The CPU ports are added or removed from VLANs like any other port on the switch, and that feels very natural for me as an end user. The only distinction necessary to know is your 'ethernet' property above: Which host device is this switch port connected to.
>
> So I wonder: Do you plan to put all of the switch config into DT? Where does that stop? How about trunking between external ports and CPU ports? Will every VLAN in the trunk have to go into DT too?

IMHO VLANs shouldn't be described by DT. VLANs are part of the network configuration and should be configured by the end user, if needed. At the same time, DSA configuration is part of the hardware configuration, which is why it is placed in DT.

In any case, Andrew as the author could give a better explanation. So let's wait for his answer.

-- Sergey
Re: [PATCH net-next] net: limit tcp/udp rmem/wmem to SOCK_{RCV,SND}BUF_MIN
From: Sorin Dumitru so...@returnze.ro
Date: Wed, 27 May 2015 22:16:49 +0300

> This is similar to b1cb59cf2efe(net: sysctl_net_core: check SNDBUF and RCVBUF for min length).
>
> I don't think too small values can cause crashes in the case of udp and tcp, but I've seen this set to too small values which triggered awful performance. It also makes the setting consistent across all the wmem/rmem sysctls.
>
> Signed-off-by: Sorin Dumitru sdumi...@ixiacom.com

Applied, thank you.
Re: [PATCH V2 net-next 1/1] hv_netvsc: Properly size the vrss queues
From: K. Y. Srinivasan k...@microsoft.com
Date: Wed, 27 May 2015 13:16:57 -0700

> The current algorithm for deciding on the number of VRSS channels is not optimal since we open up the min of number of CPUs online and the number of VRSS channels the host is offering. So on a 32 VCPU guest we could potentially open 32 VRSS subchannels. Experimentation has shown that it is best to limit the number of VRSS channels to the number of CPUs within a NUMA node.
>
> Here is the new algorithm for deciding on the number of sub-channels we would open up:
> 1) Pick the minimum of what the host is offering and what the driver in the guest is specifying as the default value.
> 2) Pick the minimum of (1) and the number of CPUs in the NUMA node the primary channel is bound to.
>
> Signed-off-by: K. Y. Srinivasan k...@microsoft.com

Applied, thanks.
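The two-step selection described in the quoted commit message can be sketched in a few lines of C. This is only an illustration of the arithmetic, not the hv_netvsc code: the function and parameter names (vrss_channel_count, host_offered, driver_default, cpus_in_node) are hypothetical.

```c
#include <stdint.h>

/* Smaller of two unsigned values. */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper, not taken from the driver.
 *
 * host_offered:   number of VRSS channels the host offers
 * driver_default: default channel count specified by the guest driver
 * cpus_in_node:   CPUs in the NUMA node the primary channel is bound to
 */
static uint32_t vrss_channel_count(uint32_t host_offered,
				   uint32_t driver_default,
				   uint32_t cpus_in_node)
{
	/* Step 1: minimum of the host offer and the driver default. */
	uint32_t n = min_u32(host_offered, driver_default);

	/* Step 2: minimum of step 1 and the CPUs in the NUMA node. */
	return min_u32(n, cpus_in_node);
}
```

For instance, with an assumed host offer of 64, a driver default of 8 and 4 CPUs in the node, the result is capped at 4 by the NUMA node size.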
Re: [PATCH net-next] tipc: unconditionally put sock refcnt when sock timer to be deleted is pending
From: Ying Xue ying@windriver.com
Date: Thu, 28 May 2015 13:19:22 +0800

> As sock refcnt is taken when sock timer is started in sk_reset_timer(), the sock refcnt should be put when sock timer to be deleted is in pending state no matter what probing_state value of tipc sock is.
>
> Reviewed-by: Erik Hugne erik.hu...@ericsson.com
> Reviewed-by: Jon Maloy jon.ma...@ericsson.com
> Signed-off-by: Ying Xue ying@windriver.com

Applied, thanks.
Re: [PATCH] can: mcp251x: not correct register address
On Mon, 25 May 2015 08:57:48 +0200, Tomas Krcka wrote:
> This patch corrects addresses of acceptance filters. These registers are not in use, but values should be correct. Tested with MCP2515 and am3352 and also checked datasheets for MCP2515 and MCP2510.
>
> Signed-off-by: Tomas Krcka tomas.kr...@nkgroup.cz
> ---
>  drivers/net/can/spi/mcp251x.c | 9 +++++----
>  1 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
> index bf63fee..c1a95a3 100644
> --- a/drivers/net/can/spi/mcp251x.c
> +++ b/drivers/net/can/spi/mcp251x.c
> @@ -190,10 +190,11 @@
>  #define RXBEID0_OFF 4
>  #define RXBDLC_OFF 5
>  #define RXBDAT_OFF 6
> -#define RXFSIDH(n) ((n) * 4)
> -#define RXFSIDL(n) ((n) * 4 + 1)
> -#define RXFEID8(n) ((n) * 4 + 2)
> -#define RXFEID0(n) ((n) * 4 + 3)
> +#define RXFSID(n) ((n < 3) ? 0 : 4)
> +#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
> +#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
> +#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
> +#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))
>  #define RXMSIDH(n) ((n) * 4 + 0x20)
>  #define RXMSIDL(n) ((n) * 4 + 0x21)
>  #define RXMEID8(n) ((n) * 4 + 0x22)

I think your patch was corrupted. It doesn't apply because you have extra space before each surviving #define.
Re: [PATCH] can: mcp251x: not correct register address
You are right, sorry for that. I'll send v2. Thanks.

2015-05-30 9:41 GMT+02:00 Jakub Kicinski moorr...@wp.pl:
> I think your patch was corrupted. It doesn't apply because you have extra space before each surviving #define.
Re: [PATCH net-next] bpf: add missing rcu protection when releasing programs from prog_array
On 05/30/2015 01:22 AM, Alexei Starovoitov wrote:
...
> Like __sk_filter_release() and __bpf_prog_release() should be removed.

The whole filter cleanup procedure needs to be simplified a bit, got a bit too complicated over time, agreed.

> Of course, it's a grey line when to introduce a helper and when not to, but just because two lines are close enough between two functions it doesn't mean that helper is warranted. In this bpf_prog_put() case I think helper is not needed _today_. If it grows, we'll reconsider.

Yes, that's what I meant.
Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware
Fri, May 29, 2015 at 05:39:46PM CEST, sfel...@gmail.com wrote:
> On Fri, May 29, 2015 at 12:50 AM, Jiri Pirko j...@resnulli.us wrote:
>> Thu, May 21, 2015 at 07:46:54AM CEST, sfel...@gmail.com wrote:
>>> On Tue, May 19, 2015 at 1:28 PM, David Miller da...@davemloft.net wrote:
>>>> From: Andy Gospodarek go...@cumulusnetworks.com
>>>> Date: Tue, 19 May 2015 15:47:32 -0400
>>>>
>>>>> Are you actually saying that if users complain loudly enough about the current behavior (not the change Roopa has proposed) that you would be open to considering a change to the current behavior?
>>>>
>>>> I am saying that we have a contract with users not to break existing behavior. Full stop.
>>>
>>> After rehearing David's argument, we should probably explore option d) which is a refinement on the fib_offload_disable mechanism we have today. fib_offload_disable is global for all routes. Once we hit a HW install problem, the global flag is set and all routes fall back to SW. We did this because we can't allow the failed route to exist in SW and not in HW because it could mess up LPM searches (HW could hit on a lesser prefix even when SW has the true LPM, because HW gets first shot at match).
>>>
>>> The refinement on fib_offload_disable is this: make it per-related-prefix rather than global, and on a HW install problem, set the flag for the related-prefix and uninstall only those routes from HW. Related-prefix (is there a correct term for this?) are routes to the same dst addr but with different prefix lengths. I haven't parsed the fib_trie structure to see how routes are organized, but I suspect since it's optimized for lookup the related-prefix tracking is already there and we can build on that.
>>
>> This looks interesting. However, I'm not sure that it is acceptable for the user to experience this HW eviction of random entries. The user knows which entries are essential to have in HW. With your solution, I can see no way the user can actually say what should be offloaded or not. The kernel just automagically decides.
>
> The default eviction policy could be based on RTA_PRIORITY: evict lower priority routes first. It would be up to the device driver to decide between two routes of same priority. To help the device driver make the decision, we could have eviction policy options:
>
>   Priority-based (default)
>   Prefer IPv6 over IPv4
>   Prefer IPv4 over IPv6
>   Prefer single path over multipath
>   Prefer longer prefix lengths over shorter
>   Optimize for resource utilization
>
> These are portable across different switches. They're in terms a user understands. It's up to the device driver, which truly understands the device constraints, to translate the user's eviction policy choices into something that makes sense to that device.

This sounds tempting... You plan to throw in some patches, or should I take care of that?
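To make the proposed eviction ordering concrete, here is a minimal sketch of a priority-based order with one of the listed options ("prefer longer prefix lengths over shorter") as a tie-breaker. Nothing here is switchdev code: struct route_entry, evict_order() and sort_for_eviction() are hypothetical, and the sketch assumes that "lower priority" means a numerically higher RTA_PRIORITY metric, as with ordinary Linux route metrics.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified view of an offloaded route. */
struct route_entry {
	uint32_t metric;     /* assumed to mirror RTA_PRIORITY; higher = less preferred */
	uint8_t prefix_len;  /* prefix length in bits */
};

/* qsort() comparator: entries that should be evicted first sort first. */
static int evict_order(const void *a, const void *b)
{
	const struct route_entry *x = a;
	const struct route_entry *y = b;

	/* Assumption: a higher metric is a lower priority, so it goes first. */
	if (x->metric != y->metric)
		return x->metric > y->metric ? -1 : 1;

	/* Tie-break: keep longer prefixes, so shorter prefixes go first. */
	return (int)x->prefix_len - (int)y->prefix_len;
}

/* Sort the table so that index 0 is the first eviction candidate. */
static void sort_for_eviction(struct route_entry *routes, size_t count)
{
	qsort(routes, count, sizeof(*routes), evict_order);
}
```

A driver following this idea would walk the sorted array from the front when it runs out of hardware resources. The other policy options in the list would simply be different tie-break comparators.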
Re: [PATCH RFC 2/3] dsa: Add support for multiple cpu ports.
Andrew Lunn and...@lunn.ch writes:

> Some boards have two CPU interfaces connected to the switch, e.g. WiFi access points, with 1 port labeled WAN, 4 ports labeled lan1-lan4, and two ports connected to the SoC.
>
> This patch extends DSA to allow both CPU ports to be used. The cpu node in the DSA tree can now have a phandle to the host interface it connects to. Each user port can have a phandle to a cpu port which should be used for traffic between the port and the CPU. Thus simple load sharing over the two CPU ports can be achieved.
>
> Signed-off-by: Andrew Lunn and...@lunn.ch
> ---
>  Documentation/devicetree/bindings/net/dsa/dsa.txt | 66 -
>  drivers/net/dsa/mv88e6xxx.c | 8 +-
>  include/net/dsa.h | 28 +-
>  net/dsa/dsa.c | 109 ++
>  net/dsa/dsa_priv.h | 6 ++
>  net/dsa/slave.c | 10 +-
>  net/dsa/tag_brcm.c | 2 +-
>  net/dsa/tag_dsa.c | 2 +-
>  net/dsa/tag_edsa.c | 2 +-
>  net/dsa/tag_trailer.c | 2 +-
>  10 files changed, 206 insertions(+), 29 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
> index f0b4cd72411d..34f7f18026e5 100644
> --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
> +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
> @@ -58,13 +58,24 @@ Optionnal property:
>  Documentation/devicetree/bindings/net/ethernet.txt for details.
>
> +- ethernet : Optional for cpu ports. A phandle to an ethernet
> +  device which will be used by this CPU port for
> +  passing packets to/from the host. If not present,
> +  the port will use the dsa,ethernet property
> +  defined above.
> +
> +- cpu: Option for non cpu/dsa ports. A phandle to a
> +  cpu port, which will be used for passing packets
> +  from this port to the host. If not present, the first
> +  cpu port will be used.
> +

I'm in deep water here, but this scheme sounds a little too static to me if I understand your proposal correctly.

Why would you want to create a static mapping of CPU ports to external ports for any given device? To me, that's part of the switch VLAN configuration.

My experience with these devices is limited to running OpenWRT on a WRT1900AC, having a Marvell 88E6172 switch. And using the OpenWRT switch API of course. There I've found it very useful to be able to mix and match the two CPU ports as I like with the external ports.

How you want the CPU ports used depends not so much on device properties as on your network configuration, IMHO. How many and which links do you have? What bandwidth are they? Trunks or not? Etc. You cannot describe these answers as device properties, because they aren't.

You can currently configure this as you like in OpenWRT using their usual swconfig tool. The CPU ports are added or removed from VLANs like any other port on the switch, and that feels very natural for me as an end user. The only distinction necessary to know is your 'ethernet' property above: Which host device is this switch port connected to.

So I wonder: Do you plan to put all of the switch config into DT? Where does that stop? How about trunking between external ports and CPU ports? Will every VLAN in the trunk have to go into DT too?

Bjørn
[PATCH v2] can: mcp251x: not correct register address
v2: fix of corrupted patch

This patch corrects addresses of acceptance filters. These registers are not in use, but values should be correct. Tested with MCP2515 and am3352 and also checked datasheets for MCP2515 and MCP2510.

Signed-off-by: Tomas Krcka tomas.kr...@nkgroup.cz
---
 drivers/net/can/spi/mcp251x.c | 9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index bf63fee..c1a95a3 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -190,10 +190,11 @@
 #define RXBEID0_OFF 4
 #define RXBDLC_OFF 5
 #define RXBDAT_OFF 6
-#define RXFSIDH(n) ((n) * 4)
-#define RXFSIDL(n) ((n) * 4 + 1)
-#define RXFEID8(n) ((n) * 4 + 2)
-#define RXFEID0(n) ((n) * 4 + 3)
+#define RXFSID(n) ((n < 3) ? 0 : 4)
+#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
+#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
+#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
+#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))
 #define RXMSIDH(n) ((n) * 4 + 0x20)
 #define RXMSIDL(n) ((n) * 4 + 0x21)
 #define RXMEID8(n) ((n) * 4 + 0x22)
--
1.7.5.4
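As a quick sanity check of the corrected macros, the sketch below compares them against the RXFnSIDH addresses in the MCP2515 register map, where filters 0 to 2 sit at 0x00, 0x04 and 0x08 and filters 3 to 5 at 0x10, 0x14 and 0x18. The macros are copied from the patch; the table rxf_sidh_expected and the helper rxf_map_matches() are illustrative additions and not part of the driver.

```c
#include <stdint.h>

/* Macros as proposed in the patch. */
#define RXFSID(n) ((n < 3) ? 0 : 4)
#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))

/* RXFnSIDH addresses for n = 0..5 (illustrative table, not in the driver). */
static const uint8_t rxf_sidh_expected[6] = {
	0x00, 0x04, 0x08, 0x10, 0x14, 0x18
};

/* Return 1 if all four macros match the expected layout for every filter. */
static int rxf_map_matches(void)
{
	int n;

	for (n = 0; n < 6; n++) {
		int base = rxf_sidh_expected[n];

		/* SIDH, SIDL, EID8 and EID0 are four consecutive registers. */
		if (RXFSIDH(n) != base || RXFSIDL(n) != base + 1 ||
		    RXFEID8(n) != base + 2 || RXFEID0(n) != base + 3)
			return 0;
	}
	return 1;
}
```

With the old macros, filter 3 would have resolved to 0x0c, which falls in the gap between the two filter groups; the extra offset of 4 for n >= 3 skips that gap.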
[PATCH 42/98] include/uapi/linux/if_tunnel.h: include linux/if.h, linux/ip.h and linux/in6.h
Fixes userspace compilation errors like:

error: field ‘iph’ has incomplete type
error: field ‘prefix’ has incomplete type

Signed-off-by: Mikko Rapeli mikko.rap...@iki.fi
---
 include/uapi/linux/if_tunnel.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index bd3cc11..2a36080 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -2,6 +2,9 @@
 #define _UAPI_IF_TUNNEL_H_
 #include <linux/types.h>
+#include <linux/if.h>
+#include <linux/ip.h>
+#include <linux/in6.h>
 #include <asm/byteorder.h>
--
2.1.4
[PATCH 41/98] include/uapi/linux/if_pppox.h: include linux/if.h
Fixes userspace compilation error:

error: ‘IFNAMSIZ’ undeclared here (not in a function)

Signed-off-by: Mikko Rapeli mikko.rap...@iki.fi
---
 include/uapi/linux/if_pppox.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h
index e128769..473c3c4 100644
--- a/include/uapi/linux/if_pppox.h
+++ b/include/uapi/linux/if_pppox.h
@@ -21,6 +21,7 @@
 #include <asm/byteorder.h>
 #include <linux/socket.h>
+#include <linux/if.h>
 #include <linux/if_ether.h>
 #include <linux/if_pppol2tp.h>
--
2.1.4
[PATCH 48/98] include/uapi/linux/if_pppox.h: include linux/in.h and linux/in6.h
Fixes userspace compilation errors:

error: field ‘addr’ has incomplete type
 struct sockaddr_in addr; /* IP address and port to send to */
error: field ‘addr’ has incomplete type
 struct sockaddr_in6 addr; /* IP address and port to send to */

Signed-off-by: Mikko Rapeli mikko.rap...@iki.fi
---
 include/uapi/linux/if_pppox.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h
index 473c3c4..d37bbb1 100644
--- a/include/uapi/linux/if_pppox.h
+++ b/include/uapi/linux/if_pppox.h
@@ -24,6 +24,8 @@
 #include <linux/if.h>
 #include <linux/if_ether.h>
 #include <linux/if_pppol2tp.h>
+#include <linux/in.h>
+#include <linux/in6.h>
 /* For user-space programs to pick up these definitions
  * which they wouldn't get otherwise without defining __KERNEL__
--
2.1.4
[PATCH 47/98] include/uapi/linux/if_pppol2tp.h: include linux/in.h and linux/in6.h
Fixes userspace compilation errors like:

error: field ‘addr’ has incomplete type
 struct sockaddr_in addr; /* IP address and port to send to */
 ^
error: field ‘addr’ has incomplete type
 struct sockaddr_in6 addr; /* IP address and port to send to */

Signed-off-by: Mikko Rapeli mikko.rap...@iki.fi
---
 include/uapi/linux/if_pppol2tp.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_pppol2tp.h b/include/uapi/linux/if_pppol2tp.h
index 163e8ad..4bd1f55 100644
--- a/include/uapi/linux/if_pppol2tp.h
+++ b/include/uapi/linux/if_pppol2tp.h
@@ -16,7 +16,8 @@
 #define _UAPI__LINUX_IF_PPPOL2TP_H
 #include <linux/types.h>
-
+#include <linux/in.h>
+#include <linux/in6.h>
 /* Structure used to connect() the socket to a particular tunnel UDP
  * socket over IPv4.
--
2.1.4
[PATCH 84/98] include/uapi/linux/atm_zatm.h: include linux/time.h
Fixes userspace compile error:

error: field ‘real’ has incomplete type
 struct timeval real; /* real (wall-clock) time */

Signed-off-by: Mikko Rapeli mikko.rap...@iki.fi
---
 include/uapi/linux/atm_zatm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/atm_zatm.h b/include/uapi/linux/atm_zatm.h
index 10f0fa2..adbaa6c 100644
--- a/include/uapi/linux/atm_zatm.h
+++ b/include/uapi/linux/atm_zatm.h
@@ -14,6 +14,7 @@
 #include <linux/atmapi.h>
 #include <linux/atmioc.h>
+#include <linux/time.h>
 #define ZATM_GETPOOL _IOW('a',ATMIOC_SARPRV+1,struct atmif_sioc) /* get pool statistics */
--
2.1.4
[PATCH net] udp: fix behavior of wrong checksums
From: Eric Dumazet eduma...@google.com

We have two problems in UDP stack related to bogus checksums:

1) We return -EAGAIN to application even if receive queue is not empty. This breaks applications using edge trigger epoll()

2) Under UDP flood, we can loop forever without yielding to other processes, potentially hanging the host, especially on non SMP.

This patch is an attempt to make things better.

We might in the future add extra support for rt applications wanting to better control time spent doing a recv() in a hostile environment. For example we could validate checksums before queuing packets in socket receive queue.

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Willem de Bruijn will...@google.com
---
 net/ipv4/udp.c | 6 ++----
 net/ipv6/udp.c | 6 ++----
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d10b7e0112eb..1c92ea67baef 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1345,10 +1345,8 @@ csum_copy_err:
 	}
 	unlock_sock_fast(sk, slow);
-	if (noblock)
-		return -EAGAIN;
-
-	/* starting over for a new packet */
+	/* starting over for a new packet, but check if we need to yield */
+	cond_resched();
 	msg->msg_flags &= ~MSG_TRUNC;
 	goto try_again;
 }
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index c2ec41617a35..e51fc3eee6db 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -525,10 +525,8 @@ csum_copy_err:
 	}
 	unlock_sock_fast(sk, slow);
-	if (noblock)
-		return -EAGAIN;
-
-	/* starting over for a new packet */
+	/* starting over for a new packet, but check if we need to yield */
+	cond_resched();
 	msg->msg_flags &= ~MSG_TRUNC;
 	goto try_again;
 }
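To see why problem 1) matters for edge-triggered epoll(), consider the usual userspace drain loop, sketched below. After each readiness notification the application reads until recv() fails with EAGAIN and then assumes the queue is empty. If the kernel returned EAGAIN for a datagram with a bad checksum while valid datagrams were still queued, this loop would stop early and the remaining data might not be picked up until a new datagram arrives. The function name drain_datagrams() is hypothetical and the buffer size is an arbitrary choice.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Read datagrams from fd until the receive queue reports empty.
 * Returns the number of datagrams consumed, or -1 on a real error.
 */
static int drain_datagrams(int fd)
{
	char buf[2048];
	int count = 0;

	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

		if (n >= 0) {
			/* One datagram consumed (zero length is valid). */
			count++;
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return count;	/* treated as "queue is empty" */
		return -1;
	}
}
```

With the patch applied, the kernel skips the bad datagram and tries the next one itself, so EAGAIN again means what this loop assumes it means.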