[PATCH net v2] enic: fix issues in enic_poll
In enic_poll, we clean the tx and rx queues. When low latency busy socket
polling is happening, enic_poll will only clean the tx queue. After cleaning
tx, it should return the total budget for re-poll.

There is a small window between vnic_intr_unmask() and enic_poll_unlock_napi().
In this window, if an irq occurs and napi is scheduled on a different cpu, it
tries to acquire enic_poll_lock_napi() and fails. Unlock napi_poll before
unmasking the interrupt.

v2:
Do not change tx work done behaviour. Consider only rx work done for completing
napi.

Signed-off-by: Govindarajulu Varadarajan _gov...@gmx.com
---
 drivers/net/ethernet/cisco/enic/enic_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index da2004e..918a8e4 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -1170,7 +1170,7 @@ static int enic_poll(struct napi_struct *napi, int budget)
 			       wq_work_done,
 			       0 /* dont unmask intr */,
 			       0 /* dont reset intr timer */);
-		return rq_work_done;
+		return budget;
 	}
 
 	if (budget > 0)
@@ -1191,6 +1191,7 @@ static int enic_poll(struct napi_struct *napi, int budget)
 			0 /* don't reset intr timer */);
 
 	err = vnic_rq_fill(&enic->rq[0], enic_rq_alloc_buf);
+	enic_poll_unlock_napi(&enic->rq[cq_rq], napi);
 
 	/* Buffer allocation failed. Stay in polling
 	 * mode so we can try to fill the ring again.
@@ -1208,7 +1209,6 @@ static int enic_poll(struct napi_struct *napi, int budget)
 		napi_complete(napi);
 		vnic_intr_unmask(&enic->intr[intr]);
 	}
-	enic_poll_unlock_napi(&enic->rq[cq_rq], napi);
 
 	return rq_work_done;
 }
-- 
2.4.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
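The ordering problem described in this patch can be illustrated with a small user-space model. This is only a sketch: the struct, the enum and every toy_* name are hypothetical stand-ins, not enic symbols, and the "irq on another cpu" is simulated by a plain function call placed where the window would be.

```c
#include <stdbool.h>

/* Toy stand-in for per-queue driver state; not a real enic structure. */
struct toy_queue {
	bool poll_locked;	/* models the state taken by enic_poll_lock_napi() */
	bool intr_masked;	/* models the interrupt mask */
};

enum toy_irq_result {
	TOY_NO_IRQ,		/* interrupt still masked, nothing fires */
	TOY_POLL_BLOCKED,	/* irq fired, but the poll lock was still held */
	TOY_POLL_RAN,		/* irq fired and the new poll got the lock */
};

/* Models an irq arriving on another cpu and scheduling a new poll. */
static enum toy_irq_result toy_irq_on_other_cpu(struct toy_queue *q)
{
	if (q->intr_masked)
		return TOY_NO_IRQ;
	if (q->poll_locked)
		return TOY_POLL_BLOCKED;
	q->poll_locked = true;	/* the second poll takes the lock */
	return TOY_POLL_RAN;
}

/* Old ordering: unmask, (window), unlock. */
static enum toy_irq_result toy_finish_old(struct toy_queue *q)
{
	enum toy_irq_result seen;

	q->intr_masked = false;
	seen = toy_irq_on_other_cpu(q);	/* irq lands inside the window */
	q->poll_locked = false;
	return seen;
}

/* Patched ordering: unlock, unmask, (window). */
static enum toy_irq_result toy_finish_new(struct toy_queue *q)
{
	q->poll_locked = false;
	q->intr_masked = false;
	return toy_irq_on_other_cpu(q);
}
```

With the old ordering the simulated irq finds the lock held; with the patched ordering the lock is already free by the time the interrupt can fire.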
Re: [PATCH 0/2] Error message clean-ups for Renesas R-Car CAN driver
On 06/20/2015 02:49 AM, Sergei Shtylyov wrote:
> Hello.
>
>    Here's the set of 2 patches against Marc Kleine-Budde's 'linux-can.git' repo
> plus 3 fix patches just posted; they are small error message cleanups for the
> Renesas R-Car CAN driver.

Applied both series to can/master.

Thanks,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-     |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN
On Wed, 2015-07-01 at 12:56 +0200, Eric Dumazet wrote:
> On Tue, 2015-06-30 at 20:25 +0300, Nicolae Rosia wrote:
> > diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> []
> > @@ -2554,9 +2554,9 @@ static void at91ether_rx(struct net_device *dev)
> >  	while (lp->rx_ring[lp->rx_tail].addr & MACB_BIT(RX_USED)) {
> >  		p_recv = lp->rx_buffers + lp->rx_tail * AT91ETHER_MAX_RBUFF_SZ;
> >  		pktlen = MACB_BF(RX_FRMLEN, lp->rx_ring[lp->rx_tail].ctrl);
> > -		skb = netdev_alloc_skb(dev, pktlen + 2);
> > +		skb = netdev_alloc_skb(dev, pktlen + NET_IP_ALIGN);
> >  		if (skb) {
> > -			skb_reserve(skb, 2);
> > +			skb_reserve(skb, NET_IP_ALIGN);
> >  			memcpy(skb_put(skb, pktlen), p_recv, pktlen);
> >  
> >  			skb->protocol = eth_type_trans(skb, dev);
>
> Then please use netdev_alloc_skb_ip_align(), so that you get rid of
> skb_reserve()

It seems there are ~50 of these in the kernel tree that could be converted.
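For readers wondering why the reserved headroom is 2 bytes in the first place, the arithmetic can be sketched in user space. This is an illustration only: the TOY_* constants and helpers are made up here, and the value 2 is the kernel's default NET_IP_ALIGN (some architectures define it as 0).

```c
#include <stddef.h>

#define TOY_ETH_HLEN      14	/* length of an Ethernet header in bytes */
#define TOY_NET_IP_ALIGN   2	/* assumed default value of NET_IP_ALIGN */

/* Offset of the IP header from the buffer start for a given headroom. */
static size_t toy_ip_header_offset(size_t headroom)
{
	return headroom + TOY_ETH_HLEN;
}

/* Non-zero when the IP header would start on a 4-byte boundary. */
static int toy_ip_header_aligned(size_t headroom)
{
	return toy_ip_header_offset(headroom) % 4 == 0;
}
```

With 2 bytes of headroom the 14-byte Ethernet header ends at offset 16, so the IP header that follows is 4-byte aligned; with no headroom it would start at offset 14. As far as I know, netdev_alloc_skb_ip_align() simply folds the allocation and the reserve into one call.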
Re: macb: zynq: why is SG disabled?
On 01/07/2015 17:14, Nicolae Rosia wrote:
> Hello,
>
> After reading the GEM part of Zynq7000 Technical Reference Manual [0],
> I think that SG should be supported.
> Is there a reason why SG is disabled in macb for Zynq?
>
> Best regards,
> Nicolae Rosia

Hi Nicolae,

When the scatter-gather patch was introduced, the feature was enabled
only on tested boards to avoid regressions on other boards.

So SG is enabled on sama5d4x and sama5d2x SoCs.
SG is disabled on purpose on sama5d3x.

For Zynq, I think the feature is still disabled just because it has
never been tested.

Best regards,

Cyrille
Re: mlx4: failed to allocate default counter port 1
On Wed, 1 Jul 2015, Sebastian Ott wrote:
> On Wed, 1 Jul 2015, Or Gerlitz wrote:
>> On 6/30/2015 5:17 PM, Sebastian Ott wrote:
>>> On Tue, 30 Jun 2015, Or Gerlitz wrote:
>>>> On 6/30/2015 4:24 PM, Sebastian Ott wrote:
>>>>>> Do you run the VF on the same system/kernel as the PF, or the VF is
>>>>>> probed to VM which runs the latest kernel and the PF runs older kernel
>>>>>> (which?)
>>>>> The latter case. The PF is driven by a much older Kernel running OFED
>>>>> 2.3.2.0.0.1
>>>> Can you try running the inbox PF driver that comes with the PF kernel
>>>> (what kernel is that?) I'd like to see we're OK there.
>>> Frankly, I don't know. Plus I also don't know how to build an ofed kernel.
>>
>> I didn't want you to build that package, but rather the other way around,
>> namely see what happens if uninstalling this package and running with the
>> mlx4 inbox PF driver from the kernel provided from your distro of choice or
>> an upstream kernel installed by you.
>>
>> Anyway, I hope the below patch would provide a quick band-aid and let you
>> continue running upstream VFs over that PF config, let me know (I will be
>> OOO till Thu-Sun). Once we see how this behaves, will take it from there.
>
> Thanks for the patch. Unfortunately, that didn't work:

OK, using this patch it worked:

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 12fbfcb..29c2a01 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2273,6 +2273,11 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
 		} else if (err == -ENOENT) {
 			err = 0;
 			continue;
+		} else if (mlx4_is_slave(dev) && err == -EINVAL) {
+			priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
+			mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
+				  MLX4_SINK_COUNTER_INDEX(dev));
+			err = 0;
 		} else {
 			mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
 				 __func__, port + 1, err);
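The only difference between the two versions of the patch in this thread is the added "err = 0;". Why that matters can be sketched with a user-space model of the loop. The assumptions are loud ones: every toy_* name is invented, the allocator always fails with -EINVAL to mimic the old PF driver, and the function is assumed to return the last value of err after the loop, which is what the "aborting" and "error -22" messages reported in the thread suggest.

```c
#include <errno.h>

#define TOY_PORTS         2
#define TOY_SINK_COUNTER  255	/* the fallback index seen in the reported log */

/* Hypothetical allocator: an old PF driver rejects every request. */
static int toy_alloc_counter(int port, int *idx)
{
	(void)port;
	(void)idx;
	return -EINVAL;
}

/*
 * Toy version of the allocation loop. clear_err selects between the first
 * patch (fall back, but leave err set) and the second one (fall back and
 * reset err to 0).
 */
static int toy_allocate_default_counters(int is_slave, int clear_err,
					 int def_counter[TOY_PORTS])
{
	int err = 0;

	for (int port = 0; port < TOY_PORTS; port++) {
		int idx = 0;

		err = toy_alloc_counter(port, &idx);
		if (!err) {
			def_counter[port] = idx;
		} else if (is_slave && err == -EINVAL) {
			def_counter[port] = TOY_SINK_COUNTER;
			if (clear_err)
				err = 0;
		} else {
			return err;
		}
	}
	return err;	/* a stale -EINVAL makes the caller abort the probe */
}
```

In this model the first variant assigns the sink counter to both ports and still returns -EINVAL (-22), while the second returns 0.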
Re: mlx4: failed to allocate default counter port 1
On 6/30/2015 5:17 PM, Sebastian Ott wrote:
> On Tue, 30 Jun 2015, Or Gerlitz wrote:
>> On 6/30/2015 4:24 PM, Sebastian Ott wrote:
>>>> Do you run the VF on the same system/kernel as the PF, or the VF is
>>>> probed to VM which runs the latest kernel and the PF runs older kernel
>>>> (which?)
>>> The latter case. The PF is driven by a much older Kernel running OFED
>>> 2.3.2.0.0.1
>> Can you try running the inbox PF driver that comes with the PF kernel
>> (what kernel is that?) I'd like to see we're OK there.
> Frankly, I don't know. Plus I also don't know how to build an ofed kernel.

I didn't want you to build that package, but rather the other way around,
namely see what happens if uninstalling this package and running with the
mlx4 inbox PF driver from the kernel provided from your distro of choice or
an upstream kernel installed by you.

Anyway, I hope the below patch would provide a quick band-aid and let you
continue running upstream VFs over that PF config, let me know (I will be
OOO till Thu-Sun). Once we see how this behaves, will take it from there.

Or.

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 12fbfcb..a66cc6e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2273,6 +2273,10 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
 		} else if (err == -ENOENT) {
 			err = 0;
 			continue;
+		} else if (mlx4_is_slave(dev) && err == -EINVAL) {
+			priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
+			mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
+				  MLX4_SINK_COUNTER_INDEX(dev));
 		} else {
 			mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
 				 __func__, port + 1, err);
Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN
On 07/01/2015 04:44 PM, Eric Dumazet wrote:
> I really doubt this adapter can process millions of packets per second ?

I was suggesting this since I was taking into consideration the comment
from skbuff.c, "we can save several CPU cycles by avoiding having to
disable and re-enable IRQs."
Is there a downside to this?

> I would rather enable GRO, it would be more useful.

I had no idea what GRO is, so I have read about it [0] and looked at a
couple of drivers which use it.
They all seem to replace netif_receive_skb with napi_gro_receive and
when there are no more packets in napi_pool they call napi_gro_flush.
Is it that simple?

Regards

[0] https://lwn.net/Articles/358910/
Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN
On 07/01/2015 01:56 PM, Eric Dumazet wrote:
> Then please use netdev_alloc_skb_ip_align(), so that you get rid of
> skb_reserve()

Thank you for the suggestion. I can do that.
A related question, should I also replace netdev_alloc with
napi_alloc_skb in places where I have a napi struct?
Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN
On Wed, 2015-07-01 at 16:29 +0300, Nicolae Rosia wrote:
> On 07/01/2015 01:56 PM, Eric Dumazet wrote:
> > Then please use netdev_alloc_skb_ip_align(), so that you get rid of
> > skb_reserve()
> Thank you for the suggestion. I can do that.
> A related question, should I also replace netdev_alloc with
> napi_alloc_skb in places where I have a napi struct?

I really doubt this adapter can process millions of packets per second ?

I would rather enable GRO, it would be more useful.
Re: mlx4: failed to allocate default counter port 1
On Wed, 1 Jul 2015, Or Gerlitz wrote:
> On 6/30/2015 5:17 PM, Sebastian Ott wrote:
>> On Tue, 30 Jun 2015, Or Gerlitz wrote:
>>> On 6/30/2015 4:24 PM, Sebastian Ott wrote:
>>>>> Do you run the VF on the same system/kernel as the PF, or the VF is
>>>>> probed to VM which runs the latest kernel and the PF runs older kernel
>>>>> (which?)
>>>> The latter case. The PF is driven by a much older Kernel running OFED
>>>> 2.3.2.0.0.1
>>> Can you try running the inbox PF driver that comes with the PF kernel
>>> (what kernel is that?) I'd like to see we're OK there.
>> Frankly, I don't know. Plus I also don't know how to build an ofed kernel.
>
> I didn't want you to build that package, but rather the other way around,
> namely see what happens if uninstalling this package and running with the
> mlx4 inbox PF driver from the kernel provided from your distro of choice or
> an upstream kernel installed by you.
>
> Anyway, I hope the below patch would provide a quick band-aid and let you
> continue running upstream VFs over that PF config, let me know (I will be
> OOO till Thu-Sun). Once we see how this behaves, will take it from there.

Thanks for the patch. Unfortunately, that didn't work:

[ 170.531076] mlx4_core :00:00.0: NOP command IRQ test passed
[ 170.531291] mlx4_core :00:00.0: can't allocate counter from old PF driver, using index 255
[ 170.531294] mlx4_core :00:00.0: mlx4_allocate_default_counters: default counter index 255 for port 1
[ 170.531531] mlx4_core :00:00.0: can't allocate counter from old PF driver, using index 255
[ 170.531534] mlx4_core :00:00.0: mlx4_allocate_default_counters: default counter index 255 for port 2
[ 170.531535] mlx4_core :00:00.0: Failed to allocate default counters, aborting
[ 170.587306] mlx4_core: probe of :00:00.0 failed with error -22

Regards,
Sebastian

> Or.
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
> index 12fbfcb..a66cc6e 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
> @@ -2273,6 +2273,10 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
>  		} else if (err == -ENOENT) {
>  			err = 0;
>  			continue;
> +		} else if (mlx4_is_slave(dev) && err == -EINVAL) {
> +			priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
> +			mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
> +				  MLX4_SINK_COUNTER_INDEX(dev));
>  		} else {
>  			mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
>  				 __func__, port + 1, err);
[PATCH] rionet: Don't try to corrupt skbuff assigning data pointer directly
It's not allowed to assign the data pointer of an skbuff directly. Doing so
makes no sense if the assigned pointer is the very same as the existing one,
and it breaks all the pointer arithmetic in all other cases. We cannot do
better than to compare the two and report BUG() in case of a mismatch.

Signed-off-by: Alexander Sverdlin alexander.sverd...@nokia.com
---
We came across this problem while developing new code for Octeon2 RAPIDIO.
For the last 10 years since the original commit of the code this assignment
did nothing, as the pointers were always the same. But a bug in the new code
uncovered this one. So better do BUG() immediately here; this prevents
longer debugging of the skbuff corruption that would follow.

 drivers/net/rionet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/rionet.c b/drivers/net/rionet.c
index dac7a0d..34c27b8 100644
--- a/drivers/net/rionet.c
+++ b/drivers/net/rionet.c
@@ -104,7 +104,8 @@ static int rionet_rx_clean(struct net_device *ndev)
 		if (!(data = rio_get_inb_message(rnet->mport, RIONET_MAILBOX)))
 			break;
 
-		rnet->rx_skb[i]->data = data;
+		if (rnet->rx_skb[i]->data != data)
+			BUG();
 		skb_put(rnet->rx_skb[i], RIO_MAX_MSG_SIZE);
 		rnet->rx_skb[i]->protocol =
 		    eth_type_trans(rnet->rx_skb[i], ndev);
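The claim that a direct assignment "breaks all the pointer arithmetic" can be shown with a deliberately simplified user-space model of an skb. None of this is the real struct sk_buff: the toy_* names are invented, and the model keeps only the head, data and tail pointers plus the length.

```c
#include <stddef.h>

/* Heavily simplified stand-in for struct sk_buff. */
struct toy_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the payload */
	unsigned char *tail;	/* end of the payload */
	size_t len;		/* payload length in bytes */
};

static void toy_skb_init(struct toy_skb *skb, unsigned char *buf)
{
	skb->head = skb->data = skb->tail = buf;
	skb->len = 0;
}

/* Simplified skb_put(): grow the payload by n bytes at the tail. */
static unsigned char *toy_skb_put(struct toy_skb *skb, size_t n)
{
	unsigned char *old_tail = skb->tail;

	skb->tail += n;
	skb->len += n;
	return old_tail;
}

/* The invariant the helpers maintain: tail - data == len. */
static int toy_skb_consistent(const struct toy_skb *skb)
{
	return (size_t)(skb->tail - skb->data) == skb->len;
}
```

Overwriting data with the value it already holds is harmless but pointless; overwriting it with anything else leaves tail and len describing a different region than data does, which is the kind of corruption the BUG() check is meant to surface early.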
[PULL] virtio/vhost: cross endian support
The following changes since commit 8a7b19d8b542b87bccc3eaaf81dcc90a5ca48aea:

  include/uapi/linux/virtio_balloon.h: include linux/virtio_types.h (2015-06-01 15:46:54 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 59a5b0f7bf74f88da6670bcbf924d8cc1e75b1ee:

  virtio-pci: alloc only resources actually used. (2015-06-24 08:15:09 +0200)

virtio/vhost: cross endian support

I have just queued some more bugfix patches today but none fix
regressions and none are related to these ones, so it looks
like a good time for a merge for -rc1.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

Gerd Hoffmann (1):
      virtio-pci: alloc only resources actually used.

Greg Kurz (8):
      virtio: introduce virtio_is_little_endian() helper
      tun: add tun_is_little_endian() helper
      macvtap: introduce macvtap_is_little_endian() helper
      vringh: introduce vringh_is_little_endian() helper
      vhost: introduce vhost_is_little_endian() helper
      virtio: add explicit big-endian support to memory accessors
      vhost: cross-endian support for legacy devices
      macvtap/tun: cross-endian support for little-endian hosts

 drivers/vhost/vhost.h              | 25 ---
 drivers/virtio/virtio_pci_common.h |  2 +
 include/linux/virtio_byteorder.h   | 24 ++-
 include/linux/virtio_config.h      | 18 +---
 include/linux/vringh.h             | 18 +---
 include/uapi/linux/if_tun.h        |  6 +++
 include/uapi/linux/vhost.h         | 14 +++
 drivers/net/macvtap.c              | 65 -
 drivers/net/tun.c                  | 67 +-
 drivers/vhost/vhost.c              | 85 +-
 drivers/virtio/virtio_pci_common.c |  7
 drivers/virtio/virtio_pci_legacy.c | 13 +-
 drivers/virtio/virtio_pci_modern.c | 24 ---
 drivers/net/Kconfig                | 14 +++
 drivers/vhost/Kconfig              | 15 +++
 15 files changed, 350 insertions(+), 47 deletions(-)
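As a rough illustration of what "cross endian support" means for the memory accessors listed above, here is a user-space sketch of a 16-bit accessor that swaps bytes only when the device's byte order differs from the host's. It is loosely modeled on the idea behind the *_is_little_endian() helpers; the toy_* functions are invented for this note and are not the kernel API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static uint16_t toy_swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Detect the host byte order at run time. */
static bool toy_host_is_little_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 1;
}

/* Load two raw bytes exactly as they sit in shared memory. */
static uint16_t toy_load16(const unsigned char bytes[2])
{
	uint16_t raw;

	memcpy(&raw, bytes, sizeof(raw));
	return raw;
}

/*
 * Convert a raw 16-bit field to host order. device_is_le plays the role
 * of the per-device answer a *_is_little_endian() helper would give.
 */
static uint16_t toy_virtio16_to_cpu(bool device_is_le, uint16_t raw)
{
	return device_is_le == toy_host_is_little_endian() ? raw
							   : toy_swab16(raw);
}
```

The point of passing the device's byte order explicitly is that a legacy device follows the guest's endianness, which a host of the opposite endianness cannot assume.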
Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN
On 30/06/2015 19:25, Nicolae Rosia wrote:
> Signed-off-by: Nicolae Rosia nicolae.ro...@certsign.ro

Acked-by: Nicolas Ferre nicolas.fe...@atmel.com

Thanks, bye.

> ---
>  drivers/net/ethernet/cadence/macb.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> index caeb395..dbb5160 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -2554,9 +2554,9 @@ static void at91ether_rx(struct net_device *dev)
>  	while (lp->rx_ring[lp->rx_tail].addr & MACB_BIT(RX_USED)) {
>  		p_recv = lp->rx_buffers + lp->rx_tail * AT91ETHER_MAX_RBUFF_SZ;
>  		pktlen = MACB_BF(RX_FRMLEN, lp->rx_ring[lp->rx_tail].ctrl);
> -		skb = netdev_alloc_skb(dev, pktlen + 2);
> +		skb = netdev_alloc_skb(dev, pktlen + NET_IP_ALIGN);
>  		if (skb) {
> -			skb_reserve(skb, 2);
> +			skb_reserve(skb, NET_IP_ALIGN);
>  			memcpy(skb_put(skb, pktlen), p_recv, pktlen);
>  
>  			skb->protocol = eth_type_trans(skb, dev);

-- 
Nicolas Ferre
4.1 regression in resizable hashtable tests
This is 4.1 on sparc64 - one of my boxes that happens to have most runtime
tests left on from some debugging effort. In 4.0 it was fine, 4.1 gives this
in dmesg:

[ 31.898697] Running resizable hashtable tests...
[ 31.898915] Adding 2048 keys
[ 31.952911] Traversal complete: counted=17, nelems=2048, entries=2048
[ 31.953004] Test failed: Total count mismatch ^^^
[ 32.022676] Traversal complete: counted=17, nelems=2048, entries=2048
[ 32.022788] Test failed: Total count mismatch ^^^
[ 32.022828] Deleting 2048 keys

Full dmesg:

[0.00] PROMLIB: Sun IEEE Boot Prom 'OBP 3.31.0 2001/07/25 20:36'
[0.00] PROMLIB: Root node compatible:
[0.00] Linux version 4.1.0 (mroos@u5) (gcc version 4.9.2 (Debian 4.9.2-20) ) #18 Wed Jul 1 02:33:02 EEST 2015
[0.00] bootconsole [earlyprom0] enabled
[0.00] ARCH: SUN4U
[0.00] Ethernet address: 08:00:20:f8:c7:72
[0.00] MM: PAGE_OFFSET is 0xf800 (max_phys_bits == 40)
[0.00] MM: VMALLOC [0x0001 -- 0x0600]
[0.00] MM: VMEMMAP [0x0600 -- 0x0c00]
[0.00] Kernel: Using 6 locked TLB entries for main kernel image.
[0.00] Remapping the kernel... done.
[0.00] kmemleak: Kernel memory leak detector disabled
[0.00] OF stdout device is: /pci@1f,0/pci@1,1/ebus@1/se@14,40:a
[0.00] PROM: Built device tree with 70282 bytes of memory.
[0.00] Top of RAM: 0x1ff3e000, Total RAM: 0x1ff2e000
[0.00] Memory hole size: 0MB
[0.00] Allocated 16384 bytes for kernel page tables.
[0.00] Zone ranges:
[0.00] Normal [mem 0x-0x1ff3dfff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00] node 0: [mem 0x-0x1fefdfff]
[0.00] node 0: [mem 0x1ff0-0x1ff2bfff]
[0.00] node 0: [mem 0x1ff3a000-0x1ff3dfff]
[0.00] Initmem setup node 0 [mem 0x-0x1ff3dfff]
[0.00] On node 0 totalpages: 65431
[0.00] Normal zone: 512 pages used for memmap
[0.00] Normal zone: 0 pages reserved
[0.00] Normal zone: 65431 pages, LIFO batch:15
[0.00] Booting Linux...
[0.00] CPU CAPS: [flush,stbar,swap,muldiv,v9,mul32,div32,v8plus]
[0.00] CPU CAPS: [vis]
[0.00] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[0.00] pcpu-alloc: [0] 0
[0.00] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 64919
[0.00] Kernel command line: root=/dev/sda1 ro
[0.00] PID hash table entries: 2048 (order: 1, 16384 bytes)
[0.00] Dentry cache hash table entries: 65536 (order: 6, 524288 bytes)
[0.00] Inode-cache hash table entries: 32768 (order: 5, 262144 bytes)
[0.00] Sorting __ex_table...
[0.00] Memory: 491632K/523448K available (5216K kernel code, 509K rwdata, 1656K rodata, 520K init, 14578K bss, 31816K reserved, 0K cma-reserved)
[0.00] Running RCU self tests
[0.00] Testing tracer nop: PASSED
[0.00] NR_IRQS:2048 nr_irqs:2048 1
[ 25.662551] clocksource tick: mask: 0x max_cycles: 0x5306eb473f, max_idle_ns: 440795213232 ns
[ 25.765203] clocksource: mult[2c71c72] shift[24]
[ 25.804735] clockevent: mult[5c28f5c3] shift[32]
[ 25.846767] Console: colour dummy device 80x25
[ 25.882817] console [tty0] enabled
[ 25.907574] bootconsole [earlyprom0] disabled
[ 25.944044] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 25.944112] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 25.944151] ... MAX_LOCK_DEPTH: 48
[ 25.944190] ... MAX_LOCKDEP_KEYS: 8191
[ 25.944229] ... CLASSHASH_SIZE: 4096
[ 25.944268] ... MAX_LOCKDEP_ENTRIES: 32768
[ 25.944308] ... MAX_LOCKDEP_CHAINS: 65536
[ 25.944349] ... CHAINHASH_SIZE: 32768
[ 25.944390] memory used by lock dependency info: 8159 kB
[ 25.944437] per task-struct memory footprint: 1920 bytes
[ 25.944483]
[ 25.944516] | Locking API testsuite:
[ 25.944550]
[ 25.944615] | spin |wlock |rlock |mutex | wsem | rsem |
[ 25.944678] --
[ 25.944780] A-A deadlock: ok | ok | ok | ok | ok | ok |
[ 26.011037] A-B-B-A deadlock: ok | ok | ok | ok | ok | ok |
[ 26.077813] A-B-B-C-C-A deadlock: ok | ok | ok | ok | ok | ok |
[ 26.144938] A-B-C-A-B-C deadlock: ok | ok | ok | ok | ok | ok |
[ 26.212077] A-B-B-C-C-D-D-A deadlock: ok | ok | ok | ok | ok | ok |
[ 26.279658] A-B-C-D-B-D-D-A deadlock:
Re: [RFC] COLO Proxy Module
On 06/30/2015 05:19 PM, Patrick McHardy wrote:
> On 30.06, Li Zhijian wrote:
>> ping...
>>
>> and I have another question:
>> can I add a new nf_ct_ext_id simply without touching the existing kernel
>> code?
>
> No, the kernel needs to know the highest extension ID in order to
> allocate space for the offsets.

So if we want to add a new extension, we should post the module too.
Is that right?

Thanks
Wen Congyang

>> In order to support COLO-Proxy, I need an extra nf_ct_ext_id called
>> NF_CT_EXT_COLO to store some COLO-related information. This information
>> can help COLO-Proxy to buffer and compare packets for each connection.
>>
>> Thanks
>> Li Zhijian
>
> Cheers,
> Patrick
Re: [PATCH] rtnetlink: Actually use the policy for the IFLA_VF_INFO
Hi Jason,

On 07/01/2015 12:52 AM, Jason Gunthorpe wrote:
> It turns out the policy was defined but never actually checked,
> so lets check it.
>
> Fixes: ebc08a6f47ee ("rtnetlink: Add VF config code to rtnetlink")

I would argue that the actual commit would be ...

Fixes: c02db8c6290b ("rtnetlink: make SR-IOV VF interface symmetric")

... since in ebc08a6f47ee, these members were part of ifla_policy[]
which has been validated (if we ignore the fact that it was NLA_BINARY).

So, commit c02db8c6290b moved it into a nested attribute (IFLA_VF_INFO)
where we indeed don't do further validation.

Imho, we should pass the parsed attribute table from nla_parse_nested()
down into do_setvfinfo(), something like the below; I can give it a
test run on my ixgbe.

Cheers,
Daniel

 net/core/rtnetlink.c | 184 ++-
 1 file changed, 94 insertions(+), 90 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 01ced4a..7d63551 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1328,10 +1328,6 @@ static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
 	[IFLA_INFO_SLAVE_DATA]	= { .type = NLA_NESTED },
 };
 
-static const struct nla_policy ifla_vfinfo_policy[IFLA_VF_INFO_MAX+1] = {
-	[IFLA_VF_INFO]		= { .type = NLA_NESTED },
-};
-
 static const struct nla_policy ifla_vf_policy[IFLA_VF_MAX+1] = {
 	[IFLA_VF_MAC]		= { .len = sizeof(struct ifla_vf_mac) },
 	[IFLA_VF_VLAN]		= { .len = sizeof(struct ifla_vf_vlan) },
@@ -1488,96 +1484,98 @@ static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[])
 	return 0;
 }
 
-static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
+static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
 {
-	int rem, err = -EINVAL;
-	struct nlattr *vf;
 	const struct net_device_ops *ops = dev->netdev_ops;
+	int err = -EINVAL;
 
-	nla_for_each_nested(vf, attr, rem) {
-		switch (nla_type(vf)) {
-		case IFLA_VF_MAC: {
-			struct ifla_vf_mac *ivm;
-			ivm = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_mac)
-				err = ops->ndo_set_vf_mac(dev, ivm->vf,
-							  ivm->mac);
-			break;
-		}
-		case IFLA_VF_VLAN: {
-			struct ifla_vf_vlan *ivv;
-			ivv = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_vlan)
-				err = ops->ndo_set_vf_vlan(dev, ivv->vf,
-							   ivv->vlan,
-							   ivv->qos);
-			break;
-		}
-		case IFLA_VF_TX_RATE: {
-			struct ifla_vf_tx_rate *ivt;
-			struct ifla_vf_info ivf;
-			ivt = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_get_vf_config)
-				err = ops->ndo_get_vf_config(dev, ivt->vf,
-							     &ivf);
-			if (err)
-				break;
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_rate)
-				err = ops->ndo_set_vf_rate(dev, ivt->vf,
-							   ivf.min_tx_rate,
-							   ivt->rate);
-			break;
-		}
-		case IFLA_VF_RATE: {
-			struct ifla_vf_rate *ivt;
-			ivt = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_rate)
-				err = ops->ndo_set_vf_rate(dev, ivt->vf,
-							   ivt->min_tx_rate,
-							   ivt->max_tx_rate);
-			break;
-		}
-		case IFLA_VF_SPOOFCHK: {
-			struct ifla_vf_spoofchk *ivs;
-			ivs = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_spoofchk)
-				err = ops->ndo_set_vf_spoofchk(dev, ivs->vf,
-							       ivs->setting);
-			break;
-		}
-		case IFLA_VF_LINK_STATE: {
-			struct ifla_vf_link_state *ivl;
-			ivl = nla_data(vf);
-			err = -EOPNOTSUPP;
-			if (ops->ndo_set_vf_link_state)
-				err =
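The underlying issue, a length policy that exists but is never consulted before the payload is cast to a struct, can be sketched in user space. Everything below is a toy: the attribute and policy structs are not the real struct nlattr or struct nla_policy, and toy_parse() only mimics the idea of filling a table indexed by attribute type while rejecting short payloads.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy attribute: type, payload length, payload. Not struct nlattr. */
struct toy_attr {
	int type;
	size_t len;
	const void *data;
};

struct toy_policy {
	size_t min_len;		/* minimum acceptable payload length */
};

enum { TOY_VF_MAC = 1, TOY_VF_VLAN, TOY_VF_MAX = TOY_VF_VLAN };

struct toy_vf_mac  { uint32_t vf; uint8_t mac[32]; };
struct toy_vf_vlan { uint32_t vf, vlan, qos; };

static const struct toy_policy toy_vf_policy[TOY_VF_MAX + 1] = {
	[TOY_VF_MAC]  = { .min_len = sizeof(struct toy_vf_mac) },
	[TOY_VF_VLAN] = { .min_len = sizeof(struct toy_vf_vlan) },
};

/*
 * Fill tb[] (indexed by attribute type) from attrs[], checking each
 * payload against the policy first. Returns 0 on success, -1 if any
 * known attribute is too short to hold its struct.
 */
static int toy_parse(const struct toy_attr *attrs, size_t n,
		     const struct toy_attr *tb[TOY_VF_MAX + 1])
{
	for (int i = 0; i <= TOY_VF_MAX; i++)
		tb[i] = NULL;

	for (size_t i = 0; i < n; i++) {
		int type = attrs[i].type;

		if (type <= 0 || type > TOY_VF_MAX)
			continue;	/* unknown types are ignored */
		if (attrs[i].len < toy_vf_policy[type].min_len)
			return -1;	/* too short: casting would overread */
		tb[type] = &attrs[i];
	}
	return 0;
}
```

A setter that receives the already validated table, as proposed for do_setvfinfo() above, can then cast each payload without re-checking its length.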
Re: [PATCH] ax88179_178a: add reset functionality in reset_resume
Hello David,

I have configured my email client, and the patch below no longer has the
indentation issue of tabs being converted into spaces.
I hope it can be accepted.

Thanks,
Vivek

------- Original Message -------
Sender : Vivek Kumar Bhagat vivek.bha...@samsung.com Chief Engineer/SRI-Delhi-System S/W 1 Team/Samsung Electronics
Date : Jul 01, 2015 09:44 (GMT+05:30)
Title : [PATCH] ax88179_178a: add reset functionality in reset_resume

Without reset functionality in reset_resume, an iperf connection cannot be
established after suspend/resume, although ping works at the same time.
The iperf connection fails with a checksum error in tcpdump. Calling the
reset function inside reset_resume solves this bug.
We have verified it on the ASIX based ST Lab and Cadyce dongles.

Signed-off-by: Vivek Kumar Bhagat
Signed-off-by: Praveen Kumar
---
 drivers/net/usb/ax88179_178a.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index e6338c1..00928c0 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -1630,6 +1630,18 @@ static int ax88179_stop(struct usbnet *dev)
 	return 0;
 }
 
+static int ax88179_reset_resume(struct usb_interface *intf)
+{
+	struct usbnet *dev = usb_get_intfdata(intf);
+	int ret;
+
+	ret = ax88179_reset(dev);
+	if (ret < 0)
+		return ret;
+
+	return ax88179_resume(intf);
+}
+
 static const struct driver_info ax88179_info = {
 	.description = "ASIX AX88179 USB 3.0 Gigabit Ethernet",
 	.bind = ax88179_bind,
@@ -1744,7 +1756,7 @@ static struct usb_driver ax88179_178a_driver = {
 	.probe =	usbnet_probe,
 	.suspend =	ax88179_suspend,
 	.resume =	ax88179_resume,
-	.reset_resume =	ax88179_resume,
+	.reset_resume =	ax88179_reset_resume,
 	.disconnect =	usbnet_disconnect,
 	.supports_autosuspend = 1,
 	.disable_hub_initiated_lpm = 1,
-- 
1.7.9.5
Re: [RFC] virtio_net: Adding tx_timeout function.
On Wed, Jun 24, 2015 at 10:31:09PM -0300, Julio Faracco wrote:
> 2015-06-24 3:10 GMT-03:00 Michael S. Tsirkin m...@redhat.com:
> > On Tue, Jun 23, 2015 at 10:44:29PM -0300, Julio Faracco wrote:
> > > virtio_net paravirtualized driver does not have a tx_timeout() function
> > > to guarantee that the driver will recover properly after receiving a
> > > timeout during a transmission of a packet. This patch add this feature
> > > and throw a timeout exception after 5 HZ. Considering some tests, this
> > > is the best time to use here.
> > >
> > > Signed-off-by: Julio Faracco jcfara...@gmail.com
> > > Cc: Jason Wang jasow...@redhat.com
> >
> > Looks like a bunch of locks and flushes are missing in this patch.
> > IMHO that's just too painful with current hardware.
> > IMO the right thing to do here is to add ability to
> > reset specific queues to hardware.
>
> I agree, Michael. This model is the default one resetting the device
> due to transmission timeout.
> To have a better performance, only some queues must be reset.

It's not a question of performance. You would need to write
a bunch of code anyway. Why not do it in the hypervisor
so guest can simply write into a register and reset a ring?

BTW now that I think about it, this requires Jason's patches
that introduce the tx interrupt, otherwise packet will
timeout simply because no packets are sent.

> > > ---
> > >  drivers/net/virtio_net.c | 69 +-
> > >  1 file changed, 68 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 63c7810..75ac45c 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -135,6 +135,9 @@ struct virtnet_info {
> > >  	/* Work struct for config space updates */
> > >  	struct work_struct config_work;
> > >
> > > +	/* Work struct for resetting the virtio-net driver. */
> > > +	struct work_struct reset_task;
> > > +
> > >  	/* Does the affinity hint is set for virtqueues? */
> > >  	bool affinity_hint_set;
> > >
> > > @@ -1394,6 +1397,18 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> > >  	return 0;
> > >  }
> > >
> > > +static void virtnet_tx_timeout(struct net_device *dev)
> > > +{
> > > +	struct virtnet_info *vi = netdev_priv(dev);
> > > +
> > > +	dev_warn(&dev->dev, "TX Timeout exception with latency: %ld\n",
> > > +		 jiffies - dev_trans_start(dev));
> > > +
> > > +	schedule_work(&vi->reset_task);
> >
> > What if after this triggers user does something to the device
> > (e.g. attempts to remove it)? Or if a packet is transmitted or used?
> > At some point, this work must be canceled.
>
> Yes, you are right. Specially, when the driver is being removed.
>
> > > +}
> > > +
> > > +static void virtnet_reset_task(struct work_struct *work);
> > > +
> > >  static const struct net_device_ops virtnet_netdev = {
> > >  	.ndo_open            = virtnet_open,
> > >  	.ndo_stop            = virtnet_close,
> > > @@ -1405,6 +1420,7 @@ static const struct net_device_ops virtnet_netdev = {
> > >  	.ndo_get_stats64     = virtnet_stats,
> > >  	.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
> > >  	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
> > > +	.ndo_tx_timeout      = virtnet_tx_timeout,
> > >  #ifdef CONFIG_NET_POLL_CONTROLLER
> > >  	.ndo_poll_controller = virtnet_netpoll,
> > >  #endif
> > > @@ -1750,6 +1766,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >  	dev->netdev_ops = &virtnet_netdev;
> > >  	dev->features = NETIF_F_HIGHDMA;
> > > +	dev->watchdog_timeo = 5 * HZ;
> > >
> > >  	dev->ethtool_ops = &virtnet_ethtool_ops;
> > >  	SET_NETDEV_DEV(dev, &vdev->dev);
> > > @@ -1811,6 +1828,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >  	}
> > >
> > >  	INIT_WORK(&vi->config_work, virtnet_config_changed_work);
> > > +	INIT_WORK(&vi->reset_task, virtnet_reset_task);
> > >
> > >  	/* If we can receive ANY GSO packets, we must allocate large ones. */
> > >  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > @@ -1891,7 +1909,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >  		netif_carrier_on(dev);
> > >  	}
> > >
> > > -	pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
> > > +	pr_debug("virtio_net: registered device %s with %d RX and TX vq's\n",
> > >  		 dev->name, max_queue_pairs);
> > >
> > >  	return 0;
> > > @@ -2001,6 +2019,55 @@ static int virtnet_restore(struct virtio_device *vdev)
> > >  }
> > >  #endif
> > >
> > > +static void virtnet_reset_task(struct work_struct *work)
> > > +{
> > > +	struct virtnet_info *vi =
> > > +		container_of(work, struct virtnet_info, reset_task);
> > > +	struct net_device *dev = vi->dev;
> > > +	struct virtio_device *vdev = vi->vdev;
> > > +	int err, i;
> > > +
> > > +	flush_work(&vi->config_work);
> > > +
> > > +	netif_device_detach(vi->dev);
> > > +	cancel_delayed_work_sync(&vi->refill);
> > > +
> > > +	if (netif_running(vi->dev)) {
> > > +		for (i = 0; i < vi->max_queue_pairs; i++) {
> > > +			napi_disable(&vi->rq[i].napi);
> > > +
Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN
On Tue, 2015-06-30 at 20:25 +0300, Nicolae Rosia wrote: Signed-off-by: Nicolae Rosia nicolae.ro...@certsign.ro --- drivers/net/ethernet/cadence/macb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c index caeb395..dbb5160 100644 --- a/drivers/net/ethernet/cadence/macb.c +++ b/drivers/net/ethernet/cadence/macb.c @@ -2554,9 +2554,9 @@ static void at91ether_rx(struct net_device *dev) while (lp-rx_ring[lp-rx_tail].addr MACB_BIT(RX_USED)) { p_recv = lp-rx_buffers + lp-rx_tail * AT91ETHER_MAX_RBUFF_SZ; pktlen = MACB_BF(RX_FRMLEN, lp-rx_ring[lp-rx_tail].ctrl); - skb = netdev_alloc_skb(dev, pktlen + 2); + skb = netdev_alloc_skb(dev, pktlen + NET_IP_ALIGN); if (skb) { - skb_reserve(skb, 2); + skb_reserve(skb, NET_IP_ALIGN); memcpy(skb_put(skb, pktlen), p_recv, pktlen); skb-protocol = eth_type_trans(skb, dev); Then please use netdev_alloc_skb_ip_align(), so that you get rid of skb_reserve() -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
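For readers who have not met the constant before, here is a minimal userspace sketch of the arithmetic behind the two bytes of headroom. It is not driver code: ETH_HLEN_SKETCH and NET_IP_ALIGN_SKETCH are local stand-ins for the kernel's ETH_HLEN (14) and the default NET_IP_ALIGN (2), and both helper names are made up for illustration. It also assumes the buffer itself starts on a 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the kernel constants: an Ethernet header is
 * 14 bytes long and the default NET_IP_ALIGN is 2. */
#define ETH_HLEN_SKETCH      14
#define NET_IP_ALIGN_SKETCH   2

/* With the default headroom the IP header starts at offset 16. */
static_assert((NET_IP_ALIGN_SKETCH + ETH_HLEN_SKETCH) % 4 == 0,
              "IP header must land on a 4-byte boundary");

/* Hypothetical helper: offset of the IP header from the start of the
 * buffer, given how many bytes of headroom were reserved. */
static size_t ip_header_offset(size_t reserved)
{
    return reserved + ETH_HLEN_SKETCH;
}

/* Hypothetical helper: returns 1 when the IP header would start on a
 * 4-byte boundary, 0 otherwise. */
static int ip_header_is_aligned(size_t reserved)
{
    return ip_header_offset(reserved) % 4 == 0;
}
```

Without any headroom the IP header would start at offset 14, which is only 2-byte aligned; that is the case the reserve is meant to avoid.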
macb: zynq: why is SG disabled?
Hello, After reading the GEM part of Zynq7000 Technical Reference Manual [0], I think that SG should be supported. Is there a reason why SG is disabled in macb for Zynq? Best regards, Nicolae Rosia
Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN
On Wed, Jul 1, 2015 at 6:33 PM, Eric Dumazet eric.duma...@gmail.com wrote: [...] This only matters in terms of a few nanoseconds per packet, so for 10Gb+ NIC anyway. Absolute noise for most NIC. I'll give it a try and benchmark. I achieved a huge speedup by moving TX into napi [0], but my hardware doesn't support multiple TX queues and I can't test that situation. Yes, but the main question is: do you have the hardware to test your changes? Yes, I have a Xilinx ZC706 board with a Zynq7 XC7Z045 processor [1] [0] https://patchwork.ozlabs.org/patch/488949/ [1] http://www.xilinx.com/products/boards-and-kits/ek-z7-zc706-g.html
Re: [PATCH v4.2-rc1] printk: make extended printk support conditional on netconsole
On Mon 2015-06-29 19:31:40, Tejun Heo wrote: 6fe29354befe ("printk: implement support for extended console drivers") implemented extended printk support for extended netconsole. The code added was minuscule but it added a static 8k buffer unconditionally, unnecessarily bloating the kernel for cases where extended netconsole is not used. This patch introduces CONFIG_PRINTK_CON_EXTENDED which is selected by CONFIG_NETCONSOLE. If the config option is not set, extended printk support is compiled out along with the static buffer. Verified 8k reduction in vmlinux bss when !CONFIG_NETCONSOLE. Signed-off-by: Tejun Heo t...@kernel.org Reported-and-suggested-by: Geert Uytterhoeven ge...@linux-m68k.org --- Linus, Andrew. This removes an unnecessary 8k bss bloat introduced during v4.2-rc1 merge window on certain configs. The original patch was routed through -mm. How should this be routed? Thanks. drivers/net/Kconfig|1 + init/Kconfig |3 +++ kernel/printk/printk.c | 33 + 3 files changed, 33 insertions(+), 4 deletions(-) [...] --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c [...] @@ -2561,9 +2584,11 @@ void register_console(struct console *ne console_drivers->next = newcon; } - if (newcon->flags & CON_EXTENDED) - if (!nr_ext_console_drivers++) + if (newcon->flags & CON_EXTENDED) { + if (!nr_ext_console_drivers) pr_info("printk: continuation disabled due to ext consoles, expect more fragments in /dev/kmsg\n"); + inc_nr_ext_console_drivers(); We should also handle the situation when CON_EXTENDED is set and CONFIG_PRINTK_CON_EXTENDED is not set by mistake. Otherwise, we will not increment nr_ext_console_drivers here => ext_text will not be filled in console_unlock() => call_console_drivers() will print nothing for the CON_EXTENDED console. At least, I would print an error here. Something like: 
#ifndef CONFIG_PRINTK_CON_EXTENDED pr_err("The registered extended console will print nothing because the kernel is not compiled with PRINTK_CON_EXTENDED\n"); #endif I wonder if there is a good identification of the console that can be printed. Otherwise, it looks fine to me. Best Regards, Petr
Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN
On Wed, 2015-07-01 at 17:29 +0300, Nicolae Rosia wrote: On 07/01/2015 04:44 PM, Eric Dumazet wrote: I really doubt this adapter can process millions of packets per second? I was suggesting this since I was taking into consideration the comment from skbuff.c, "we can save several CPU cycles by avoiding having to disable and re-enable IRQs". Is there a downside to this? This only matters in terms of a few nanoseconds per packet, so for 10Gb+ NIC anyway. Absolute noise for most NIC. I would rather enable GRO, it would be more useful. I had no idea what GRO is, so I have read about it [0] and looked at a couple of drivers which use it. They all seem to replace netif_receive_skb with napi_gro_receive and when there are no more packets in napi_pool they call napi_gro_flush. Is it that simple? Yes, but the main question is: do you have the hardware to test your changes?
Re: [PULL] virtio/vhost: cross endian support
On Wed, Jul 1, 2015 at 12:02 PM, Linus Torvalds torva...@linux-foundation.org wrote: Doing an unconditional byte swap is faster and simpler than the crazy conditionals. Unconditional endianness not only makes for simpler and faster code, it also ends up being easier to debug and add things like type annotations for sparse. Linus
Re: [PULL] virtio/vhost: cross endian support
On Wed, Jul 1, 2015 at 2:31 AM, Michael S. Tsirkin m...@redhat.com wrote: virtio/vhost: cross endian support Ugh. Does this really have to be dynamic? Can't virtio do the sane thing, and just use a _fixed_ endianness? Doing an unconditional byte swap is faster and simpler than the crazy conditionals. That's true regardless of endianness, but gets to be even more so if the fixed endianness is little-endian, since BE is not-so-slowly fading from the world. Linus
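As an illustration of the fixed-endianness point, here is a small userspace sketch, not virtio code. The helper names load_le16 and store_le16 are made up for this example. The field is always little-endian in memory, so the accessors contain no runtime branch on the byte order of either side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a 16-bit field that is always stored
 * little-endian, whatever the byte order of the host. */
static uint16_t load_le16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: encode a 16-bit value as little-endian bytes. */
static void store_le16(unsigned char *p, uint16_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)(v >> 8);
}
```

A dynamic scheme would have to consult a per-device flag on every access before choosing the conversion; with a fixed format the same two functions serve every host.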
[PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk
If we early-demux bind a TCP_TIMEWAIT socket to an skb and then orphan it (as we need to do in the ipvs forwarding case), sock_wfree and sock_rfree are going to reach into the inet_timewait_sock and mess with fields that don't exist. Signed-off-by: Alex Gartrell agartr...@fb.com --- net/core/sock.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/core/sock.c b/net/core/sock.c index 1e1fe9a..b37328f 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1620,6 +1620,9 @@ void sock_wfree(struct sk_buff *skb) struct sock *sk = skb->sk; unsigned int len = skb->truesize; + if (sk->sk_state == TCP_TIME_WAIT) + return; + if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE)) { /* * Keep a reference on sk_wmem_alloc, this will be released @@ -1665,6 +1668,9 @@ void sock_rfree(struct sk_buff *skb) struct sock *sk = skb->sk; unsigned int len = skb->truesize; + if (sk->sk_state == TCP_TIME_WAIT) + return; + atomic_sub(len, &sk->sk_rmem_alloc); sk_mem_uncharge(sk, len); } -- Alex Gartrell agartr...@fb.com
Re: mlx4: failed to allocate default counter port 1
On Wed, Jul 1, 2015 at 5:18 PM, Sebastian Ott seb...@linux.vnet.ibm.com wrote: OK, using this patch it worked: yep, I forgot to recap err to zero. By "it worked" you mean the VF is live and kicking, and all functionality you had before the 4.2 merge window is back again?
Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk
From: Alex Gartrell agartr...@fb.com Date: Wed, 1 Jul 2015 13:13:09 -0700 If we early-demux bind a TCP_TIMEWAIT socket to an skb and then orphan it (as we need to do in the ipvs forwarding case), sock_wfree and sock_rfree are going to reach into the inet_timewait_sock and mess with fields that don't exist. Signed-off-by: Alex Gartrell agartr...@fb.com If we're forwarding, we should not find a local socket, period. We should only match sockets for locally destined packets. So I'd say that the state in which you say this can occur is illegal.
[PATCH V2] cdc_ncm: Add support for moving NDP to end of NCM frame
NCM specs are not actually mandating a specific position in the frame for the NDP (Network Datagram Pointer). However, some Huawei devices will ignore our aggregates if it is not placed after the datagrams it points to. Add support for doing just this, in a per-device configurable way. While at it, update NCM subdrivers, disabling this functionality in all of them, except in huawei_cdc_ncm where it is enabled instead. We aren't making any distinction between different Huawei NCM devices, based on what the vendor driver does. Standard NCM devices are left unaffected: if they are compliant, they should be always usable, still stay on the safe side. This change has been tested and working with a Huawei E3131 device (which works regardless of NDP position) and an E3372 device (which mandates NDP to be after indexed datagrams). V1-V2: - corrected wrong NDP acronym definition - fixed possible NULL pointer dereference - patch cleanup - rewrote description and commit subject to be more clear Signed-off-by: Enrico Mioso mrkiko...@gmail.com --- drivers/net/usb/cdc_mbim.c | 2 +- drivers/net/usb/cdc_ncm.c| 50 drivers/net/usb/huawei_cdc_ncm.c | 7 -- include/linux/usb/cdc_ncm.h | 7 +- 4 files changed, 57 insertions(+), 9 deletions(-) diff --git a/drivers/net/usb/cdc_mbim.c b/drivers/net/usb/cdc_mbim.c index e4b7a47..efc18e0 100644 --- a/drivers/net/usb/cdc_mbim.c +++ b/drivers/net/usb/cdc_mbim.c @@ -158,7 +158,7 @@ static int cdc_mbim_bind(struct usbnet *dev, struct usb_interface *intf) if (!cdc_ncm_comm_intf_is_mbim(intf-cur_altsetting)) goto err; - ret = cdc_ncm_bind_common(dev, intf, data_altsetting); + ret = cdc_ncm_bind_common(dev, intf, data_altsetting, 0); if (ret) goto err; diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index 8067b8f..4a27673 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -684,10 +684,12 @@ static void cdc_ncm_free(struct cdc_ncm_ctx *ctx) ctx-tx_curr_skb = NULL; } + kfree(ctx-delayed_ndp16); + kfree(ctx); } 
-int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_altsetting) +int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_altsetting, int drvflags) { const struct usb_cdc_union_desc *union_desc = NULL; struct cdc_ncm_ctx *ctx; @@ -855,6 +857,17 @@ advance: /* finish setting up the device specific data */ cdc_ncm_setup(dev); + /* Device-specific flags */ + ctx-drvflags = drvflags; + + /* Allocate the delayed NDP if needed. */ + if (ctx-drvflags CDC_NCM_FLAG_NDP_TO_END) { + ctx-delayed_ndp16 = kzalloc(ctx-max_ndp_size, GFP_KERNEL); + if (!ctx-delayed_ndp16) + goto error2; + dev_info(intf-dev, NDP will be placed at end of frame for this device.); + } + /* override ethtool_ops */ dev-net-ethtool_ops = cdc_ncm_ethtool_ops; @@ -954,8 +967,11 @@ static int cdc_ncm_bind(struct usbnet *dev, struct usb_interface *intf) if (cdc_ncm_select_altsetting(intf) != CDC_NCM_COMM_ALTSETTING_NCM) return -ENODEV; - /* The NCM data altsetting is fixed */ - ret = cdc_ncm_bind_common(dev, intf, CDC_NCM_DATA_ALTSETTING_NCM); + /* The NCM data altsetting is fixed, so we hard-coded it. +* Additionally, generic NCM devices are assumed to accept arbitrarily +* placed NDP. +*/ + ret = cdc_ncm_bind_common(dev, intf, CDC_NCM_DATA_ALTSETTING_NCM, 0); /* * We should get an event when network connection is connected or @@ -986,6 +1002,14 @@ static struct usb_cdc_ncm_ndp16 *cdc_ncm_ndp(struct cdc_ncm_ctx *ctx, struct sk_ struct usb_cdc_ncm_nth16 *nth16 = (void *)skb-data; size_t ndpoffset = le16_to_cpu(nth16-wNdpIndex); + /* If NDP should be moved to the end of the NCM package, we can't follow the + * NTH16 header as we would normally do. NDP isn't written to the SKB yet, and + * the wNdpIndex field in the header is actually not consistent with reality. It will be later. 
+ */ + if (ctx-drvflags CDC_NCM_FLAG_NDP_TO_END) + if (ctx-delayed_ndp16-dwSignature == sign) + return ctx-delayed_ndp16; + /* follow the chain of NDPs, looking for a match */ while (ndpoffset) { ndp16 = (struct usb_cdc_ncm_ndp16 *)(skb-data + ndpoffset); @@ -995,7 +1019,8 @@ static struct usb_cdc_ncm_ndp16 *cdc_ncm_ndp(struct cdc_ncm_ctx *ctx, struct sk_ } /* align new NDP */ - cdc_ncm_align_tail(skb, ctx-tx_ndp_modulus, 0, ctx-tx_max); + if (!(ctx-drvflags CDC_NCM_FLAG_NDP_TO_END)) + cdc_ncm_align_tail(skb, ctx-tx_ndp_modulus, 0, ctx-tx_max); /* verify that there
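To make the two frame layouts discussed in this patch concrete, here is a rough userspace sketch of where the NDP offset (the value stored in wNdpIndex) ends up in each case. It is not the driver's code. The header sizes follow the NCM 16-bit structures (a 12-byte NTH16, an 8-byte NDP16 header, 4 bytes per datagram pointer entry), while the helper names and the example modulus are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NTH16_LEN       12  /* NCM 16-bit transfer header */
#define NDP16_HDR_LEN    8  /* signature, length, next-NDP index */
#define NDP16_ENTRY_LEN  4  /* one (index, length) pair per datagram */

/* Round an offset up to the next multiple of modulus. */
static size_t align_up(size_t off, size_t modulus)
{
    assert(modulus > 0);
    return (off + modulus - 1) / modulus * modulus;
}

/* Size of an NDP16 describing n datagrams, including the terminating
 * zero entry. */
static size_t ndp16_len(size_t n)
{
    return NDP16_HDR_LEN + (n + 1) * NDP16_ENTRY_LEN;
}

/* Usual layout: the NDP directly follows the transfer header. */
static size_t ndp_index_front(size_t modulus)
{
    return align_up(NTH16_LEN, modulus);
}

/* NDP-to-end layout: the NDP is written after the last datagram, so
 * its offset is only known once all datagrams have been queued. */
static size_t ndp_index_end(size_t datagrams_end, size_t modulus)
{
    return align_up(datagrams_end, modulus);
}
```

This is why the patch keeps the NDP in a separate buffer until the frame is finalized: in the second layout the header field cannot be filled in correctly while datagrams are still being added.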
[PATCH 1/2 net-next] tcp: reduce cwnd if retransmit is lost in CA_Loss
If the retransmission in CA_Loss is lost again, we should not continue to slow start or raise cwnd in congestion avoidance mode. Instead we should enter fast recovery and use PRR to reduce cwnd, following the principle in RFC5681: ... or the loss of a retransmission, should be taken as two indications of congestion and, therefore, cwnd (and ssthresh) MUST be lowered twice in this case. This is especially important to reduce loss when the CA_Loss state was caused by a traffic policer dropping the entire inflight. The CA_Loss state has a problem where a loss of L packets causes the sender to send a burst of L packets. So a policer that's dropping most packets in a given RTT can cause a huge retransmit storm. By contrast, PRR includes logic to bound the number of outbound packets that result from a given ACK. So switching to CA_Recovery on lost retransmits in CA_Loss avoids this retransmit storm problem when in CA_Loss. Signed-off-by: Yuchung Cheng ych...@google.com Signed-off-by: Nandita Dukkipati nandi...@google.com Signed-off-by: Neal Cardwell ncardw...@google.com --- net/ipv4/tcp_input.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 684f095..923e0e5 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -109,6 +109,7 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2; #define FLAG_SYN_ACKED 0x10 /* This ACK acknowledged SYN. */ #define FLAG_DATA_SACKED 0x20 /* New SACK. */ #define FLAG_ECE 0x40 /* ECE in this ACK */ +#define FLAG_LOST_RETRANS 0x80 /* This ACK marks some retransmission lost */ #define FLAG_SLOWPATH 0x100 /* Do not skip RFC checks for window update.*/ #define FLAG_ORIG_SACK_ACKED 0x200 /* Never retransmitted data are (s)acked */ #define FLAG_SND_UNA_ADVANCED 0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */ @@ -1037,7 +1038,7 @@ static bool tcp_is_sackblock_valid(struct tcp_sock *tp, bool is_dsack, * highest SACK block). 
Also calculate the lowest snd_nxt among the remaining * retransmitted skbs to avoid some costly processing per ACKs. */ -static void tcp_mark_lost_retrans(struct sock *sk) +static void tcp_mark_lost_retrans(struct sock *sk, int *flag) { const struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); @@ -1078,7 +1079,7 @@ static void tcp_mark_lost_retrans(struct sock *sk) if (after(received_upto, ack_seq)) { TCP_SKB_CB(skb)-sacked = ~TCPCB_SACKED_RETRANS; tp-retrans_out -= tcp_skb_pcount(skb); - + *flag |= FLAG_LOST_RETRANS; tcp_skb_mark_lost_uncond_verify(tp, skb); NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPLOSTRETRANSMIT); } else { @@ -1818,7 +1819,7 @@ advance_sp: ((inet_csk(sk)-icsk_ca_state != TCP_CA_Loss) || tp-undo_marker)) tcp_update_reordering(sk, tp-fackets_out - state-reord, 0); - tcp_mark_lost_retrans(sk); + tcp_mark_lost_retrans(sk, state-flag); tcp_verify_left_out(tp); out: @@ -2676,7 +2677,7 @@ static void tcp_enter_recovery(struct sock *sk, bool ece_ack) tp-prior_ssthresh = 0; tcp_init_undo(tp); - if (inet_csk(sk)-icsk_ca_state TCP_CA_CWR) { + if (!tcp_in_cwnd_reduction(sk)) { if (!ece_ack) tp-prior_ssthresh = tcp_current_ssthresh(sk); tcp_init_cwnd_reduction(sk); @@ -2852,9 +2853,10 @@ static void tcp_fastretrans_alert(struct sock *sk, const int acked, break; case TCP_CA_Loss: tcp_process_loss(sk, flag, is_dupack); - if (icsk-icsk_ca_state != TCP_CA_Open) + if (icsk-icsk_ca_state != TCP_CA_Open + !(flag FLAG_LOST_RETRANS)) return; - /* Fall through to processing in Open state. */ + /* Change state if cwnd is undone or retransmits are lost */ default: if (tcp_is_reno(tp)) { if (flag FLAG_SND_UNA_ADVANCED) -- 2.4.3.573.g4eafbef -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2 net-next] tcp: PRR uses CRB mode by default and SS mode conditionally
PRR slow start is often too aggressive especially when drops are caused by traffic policers. The policers mainly use token bucket to enforce the rate so sending (twice) faster than the delivery rate causes excessive drops. This patch changes PRR to the conservative reduction bound (CRB) mode in RFC 6937 by default. CRB follows the packet conservation rule to send at most the delivery rate by default. But if many packets are lost and the pipe is empty, CRB may take N round trips to repair N losses. We conditionally turn on slow start mode if all of the following conditions are met, to speed up the recovery: 1) on the second round or later in recovery 2) retransmission sent in the previous round is delivered on this ACK 3) no retransmission is marked lost on this ACK By using packet conservation by default, this change reduces lost retransmits significantly on networks that deploy traffic policers, up to 20% reduction of overall loss rate. Signed-off-by: Yuchung Cheng ych...@google.com Signed-off-by: Nandita Dukkipati nandi...@google.com Signed-off-by: Neal Cardwell ncardw...@google.com --- net/ipv4/tcp_input.c | 29 +++-- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 923e0e5..ad1482d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2476,15 +2476,14 @@ static bool tcp_try_undo_loss(struct sock *sk, bool frto_undo) return false; } -/* The cwnd reduction in CWR and Recovery use the PRR algorithm - * https://datatracker.ietf.org/doc/draft-ietf-tcpm-proportional-rate-reduction/ +/* The cwnd reduction in CWR and Recovery uses the PRR algorithm in RFC 6937. * It computes the number of packets to send (sndcnt) based on packets newly * delivered: * 1) If the packets in flight is larger than ssthresh, PRR spreads the * cwnd reductions across a full RTT. 
- * 2) If packets in flight is lower than ssthresh (such as due to excess - * losses and/or application stalls), do not perform any further cwnd - * reductions, but instead slow start up to ssthresh. + * 2) Otherwise PRR uses packet conservation to send as much as delivered. + * But when the retransmits are acked without further losses, PRR + * slow starts cwnd up to ssthresh to speed up the recovery. */ static void tcp_init_cwnd_reduction(struct sock *sk) { @@ -2501,7 +2500,7 @@ static void tcp_init_cwnd_reduction(struct sock *sk) } static void tcp_cwnd_reduction(struct sock *sk, const int prior_unsacked, - int fast_rexmit) + int fast_rexmit, int flag) { struct tcp_sock *tp = tcp_sk(sk); int sndcnt = 0; @@ -2510,16 +2509,18 @@ static void tcp_cwnd_reduction(struct sock *sk, const int prior_unsacked, (tp-packets_out - tp-sacked_out); tp-prr_delivered += newly_acked_sacked; - if (tcp_packets_in_flight(tp) tp-snd_ssthresh) { + if (delta 0) { u64 dividend = (u64)tp-snd_ssthresh * tp-prr_delivered + tp-prior_cwnd - 1; sndcnt = div_u64(dividend, tp-prior_cwnd) - tp-prr_out; - } else { + } else if ((flag FLAG_RETRANS_DATA_ACKED) + !(flag FLAG_LOST_RETRANS)) { sndcnt = min_t(int, delta, max_t(int, tp-prr_delivered - tp-prr_out, newly_acked_sacked) + 1); + } else { + sndcnt = min(delta, newly_acked_sacked); } - sndcnt = max(sndcnt, (fast_rexmit ? 1 : 0)); tp-snd_cwnd = tcp_packets_in_flight(tp) + sndcnt; } @@ -2580,7 +2581,7 @@ static void tcp_try_to_open(struct sock *sk, int flag, const int prior_unsacked) if (inet_csk(sk)-icsk_ca_state != TCP_CA_CWR) { tcp_try_keep_open(sk); } else { - tcp_cwnd_reduction(sk, prior_unsacked, 0); + tcp_cwnd_reduction(sk, prior_unsacked, 0, flag); } } @@ -2737,7 +2738,7 @@ static void tcp_process_loss(struct sock *sk, int flag, bool is_dupack) /* Undo during fast recovery after partial ACK. 
*/ static bool tcp_try_undo_partial(struct sock *sk, const int acked, -const int prior_unsacked) +const int prior_unsacked, int flag) { struct tcp_sock *tp = tcp_sk(sk); @@ -2753,7 +2754,7 @@ static bool tcp_try_undo_partial(struct sock *sk, const int acked, * mark more packets lost or retransmit more. */ if (tp-retrans_out) { - tcp_cwnd_reduction(sk, prior_unsacked, 0); + tcp_cwnd_reduction(sk, prior_unsacked, 0, flag); return true; } @@ -2840,7 +2841,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const int acked, if
[PATCH 0/2 net-next] tcp: reducing lost retransmits in recovery
This patch series reduces lost retransmits in recovery, in particular when dealing with traffic policers. The main problem is that slow start in recovery under policing can cause massive loss and retransmit storms: any excess sending rate turns into drops. The solution is to avoid doing slow start when a lost retransmit is detected and use packet conservation instead. On networks with traffic policers the patches have lowered the TCP loss rates by ~20% from Google servers without latency regressions. Yuchung Cheng (2): tcp: reduce cwnd if retransmit is lost in CA_Loss tcp: PRR uses CRB mode by default and SS mode conditionally net/ipv4/tcp_input.c | 43 +++ 1 file changed, 23 insertions(+), 20 deletions(-) -- 2.4.3.573.g4eafbef
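The per-ACK send quota that patch 2/2 describes can be sketched in userspace roughly as follows. This is a reading of the quoted hunk, not the kernel function. struct prr_state and the SK_FLAG_* values are hypothetical stand-ins for fields of struct tcp_sock and the flags in tcp_input.c. The declaration of delta is not visible in the quoted diff, so the sketch assumes it is ssthresh minus the packets in flight.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch only; the real definitions
 * live in net/ipv4/tcp_input.c. */
#define SK_FLAG_RETRANS_DATA_ACKED 0x1
#define SK_FLAG_LOST_RETRANS       0x2

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Hypothetical stand-in for the PRR state kept in struct tcp_sock. */
struct prr_state {
    int ssthresh;       /* target window after the reduction */
    int prior_cwnd;     /* cwnd when the reduction started */
    int prr_delivered;  /* packets delivered since the reduction started */
    int prr_out;        /* packets sent since the reduction started */
};

/* Returns how many packets may be sent in response to one ACK. */
static int prr_sndcnt(struct prr_state *s, int in_flight,
                      int newly_acked_sacked, int fast_rexmit, int flag)
{
    /* Assumption: delta is the room left below ssthresh. */
    int delta = s->ssthresh - in_flight;
    int sndcnt;

    assert(s->prior_cwnd > 0);
    s->prr_delivered += newly_acked_sacked;

    if (delta < 0) {
        /* Proportional part: spread the reduction over the round,
         * ceil(ssthresh * delivered / prior_cwnd) - already sent. */
        long long dividend = (long long)s->ssthresh * s->prr_delivered
                             + s->prior_cwnd - 1;
        sndcnt = (int)(dividend / s->prior_cwnd) - s->prr_out;
    } else if ((flag & SK_FLAG_RETRANS_DATA_ACKED) &&
               !(flag & SK_FLAG_LOST_RETRANS)) {
        /* Slow start mode: retransmits are being delivered and none
         * was lost on this ACK. */
        sndcnt = min_int(delta,
                         max_int(s->prr_delivered - s->prr_out,
                                 newly_acked_sacked) + 1);
    } else {
        /* Conservative reduction bound: send at most what was just
         * delivered. */
        sndcnt = min_int(delta, newly_acked_sacked);
    }

    /* A fast retransmit is always allowed to send one packet. */
    return max_int(sndcnt, fast_rexmit ? 1 : 0);
}
```

With ssthresh 10, 6 packets in flight and 2 packets newly delivered, the conservative branch allows 2 packets, while the slow start branch allows 3 once retransmitted data is acked and nothing is marked lost.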
Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk
On Wed, Jul 1, 2015 at 11:14 PM, David Miller da...@davemloft.net wrote: From: Alex Gartrell agartr...@fb.com Date: Wed, 1 Jul 2015 13:13:09 -0700 If we early-demux bind a TCP_TIMEWAIT socket to an skb and then orphan it (as we need to do in the ipvs forwarding case), sock_wfree and sock_rfree are going to reach into the inet_timewait_sock and mess with fields that don't exist. Signed-off-by: Alex Gartrell agartr...@fb.com If we're forwarding, we should not find a local socket, period. We should only match sockets for locally destined packets. So I'd say that the state in which you say this can occur is illegal. Right, this patch is totally buggy. A socket cannot change state to TCP_TIMEWAIT. A new object is allocated and the old one is removed from ehash, then freed (rcu rules being applied). Also sock_wfree() has nothing to do with early demux. It is for output path skbs only.
Re: [PATCH] ax88179_178a: add reset functionality in reset_resume
From: Vivek Kumar Bhagat vivek.bha...@samsung.com Date: Wed, 01 Jul 2015 11:33:58 + (GMT) I configured my email client and below patch does not have indentation issue of converting tabs into spaces. I hope it should be accepted. Patchwork is not recognizing your postings as a patch submission, therefore it's not ending up in my queue of patches to handle.
Re: Wiring up direct socket calls on x86_32 Linux?
Andy Lutomirski l...@amacapital.net writes:
Hi all- sys_socketcall sucks. If nothing else, it's impossible to filter with seccomp. Should we wire up the real socket calls so that user code can (very slowly) start migrating? I think the list is:
- socket
- bind
- connect
- listen
- accept4
- getsockname
- getpeername
- socketpair
- send
- sendto
- sendmsg
- recv
- recvfrom
- recvmsg
- shutdown
- setsockopt
I guess you might want to follow the patch Raji sent today [1]. Her patch doesn't have all the syscalls you mentioned here, but has others too. She will work to get a generic implementation for these functions.
[1] http://patchwork.sourceware.org/patch/7438/
-- Tulio Magno
Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN
On Wed, 2015-07-01 at 18:53 +0300, Nicolae Rosia wrote: On Wed, Jul 1, 2015 at 6:33 PM, Eric Dumazet eric.duma...@gmail.com wrote: [...] This only matters in terms of a few nanoseconds per packet, so for 10Gb+ NIC anyway. Absolute noise for most NIC. I'll give it a try and benchmark. I achieved a huge speedup by moving TX into napi [0], but my hardware doesn't support multiple TX queues and I can't test that situation. Yes, but the main question is: do you have the hardware to test your changes? Yes, I have a Xilinx ZC706 board with a Zynq7 XC7Z045 processor [1] [0] https://patchwork.ozlabs.org/patch/488949/ [1] http://www.xilinx.com/products/boards-and-kits/ek-z7-zc706-g.html OK, then enabling GRO should be quite easy, given the driver already has most of it. diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c index caeb39561567237261ac0d50befebad666cfbeb3..24a93c769caa5430ca61efe002b458fef7281e99 100644 --- a/drivers/net/ethernet/cadence/macb.c +++ b/drivers/net/ethernet/cadence/macb.c @@ -815,7 +815,7 @@ static int gem_rx(struct macb *bp, int budget) skb->data, 32, true); #endif - netif_receive_skb(skb); + napi_gro_receive(&bp->napi, skb); } gem_rx_refill(bp); @@ -896,7 +896,7 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag, bp->stats.rx_bytes += skb->len; netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n", skb->len, skb->csum); - netif_receive_skb(skb); + napi_gro_receive(&bp->napi, skb); return 0; }
Re: [PATCH net] enic: fix issues in enic_poll
From: Govindarajulu Varadarajan _gov...@gmx.com Date: Wed, 1 Jul 2015 11:23:54 +0530 Current code checks only rx work done to complete napi. It completely ignores tx work done. If we have only tx packets to clean and no rq work, we always napi complete instead of re-poll. Change this behavior to re-poll until tx work_done + rx work_done is not 0. The existing TX behavior is correct, please do not change it. You should never count TX work against the NAPI poll budget, it is only for RX work.
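The budget rule stated in the review above can be illustrated with a small userspace simulation. It is only a sketch of the accounting, not the enic driver or the NAPI API. struct sketch_dev and sketch_poll are hypothetical names, and napi_completed merely records the point where a real driver would complete NAPI and re-enable its interrupt.

```c
#include <assert.h>

/* Hypothetical stand-in for a device with pending completions. */
struct sketch_dev {
    int rx_pending;      /* received packets waiting to be processed */
    int tx_pending;      /* transmitted packets waiting to be cleaned */
    int napi_completed;  /* set when polling stops for this device */
};

/* Poll routine in the shape the review asks for: TX cleanup happens,
 * but only RX work is reported against the budget. */
static int sketch_poll(struct sketch_dev *d, int budget)
{
    int rx_done;

    assert(budget > 0);

    /* Clean all completed TX descriptors; this is not counted. */
    d->tx_pending = 0;

    /* Process at most budget received packets. */
    rx_done = d->rx_pending < budget ? d->rx_pending : budget;
    d->rx_pending -= rx_done;

    /* Less than a full budget of RX work means the RX ring is
     * drained, so polling can stop. */
    d->napi_completed = rx_done < budget;

    return rx_done;
}
```

With only TX work pending, the routine returns 0 and completes, which is the existing behavior the review asks to keep.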
iproute 4.1.1
I am putting out iproute2 4.1.1 (i.e. stable) on Monday. It will include fixes for MPLS and TIPC that are already in latest (master). Are there any other fixes that people think are urgent enough to backport?
[PATCH] ipv6: Make MLD packets to only be processed locally.
Before commit daad151263cf334 ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input()."), MLD packets were only processed locally. After the change, a copy of each MLD packet goes through ip6_mr_input, causing an MRT6MSG_NOCACHE message to be generated to user space. Make MLD packets only be processed locally. Fixes: daad151263cf334 ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().") Signed-off-by: Hermin Anggawijaya hermin.anggawij...@alliedtelesis.co.nz --- diff --git a/linux/net/ipv6/ip6_input.c.orig b/linux/net/ipv6/ip6_input.c index f2e464e..57990c9 100644 --- a/linux/net/ipv6/ip6_input.c.orig +++ b/linux/net/ipv6/ip6_input.c @@ -331,10 +331,10 @@ int ip6_mc_input(struct sk_buff *skb) if (offset < 0) goto out; - if (!ipv6_is_mld(skb, nexthdr, offset)) - goto out; + if (ipv6_is_mld(skb, nexthdr, offset)) + deliver = true; - deliver = true; + goto out; } /* unknown RA - process it normally */ }
Issue with active-backup mode bond and bridge
I find that the kernel does not seem to handle the combination of the bonding and bridge modules well. I have a physical host with two NICs that are bonded together (active-backup mode). Each NIC is connected to a separate L2 switch, and the two L2 switches are connected to an L3 switch. If the host only has the bond device, when I manually take the active slave down, bonding issues one or more gratuitous ARPs on the newly active slave. One gratuitous ARP is issued for the bonding master interface, provided that the interface has at least one IP address configured. However, if there is a bridge named br0 and the bond device joins the bridge br0, the IP address of the bond moves to the br0 device. First, I bring both NICs up. But this time, when I again take the active slave down, I can't capture the gratuitous ARP on the bond device with tcpdump. This can break connectivity to the host: with no ARP packet sent out of the host, the L3 switch may keep sending packets from outside to the old L2 switch, which is connected to the NIC that is now the backup. These packets get no response. I read the kernel code. When changing the active slave to the specified one, in the bond_change_active_slave function, bond sends the NETDEV_NOTIFY_PEERS event: netdev_bonding_change(bond->dev, NETDEV_BONDING_FAILOVER); if (should_notify_peers) netdev_bonding_change(bond->dev, NETDEV_NOTIFY_PEERS); And in the inetdev_event function, if the event is NETDEV_NOTIFY_PEERS, it calls inetdev_send_gratuitous_arp to send a gratuitous ARP. case NETDEV_NOTIFY_PEERS: /* Send gratuitous ARP to notify of link change */ inetdev_send_gratuitous_arp(dev, in_dev); break; But when the bond is in the bridge, the code won't change the dev to the bridge device, and there is no IP address on the bond device, so there is no gratuitous ARP. My question is, why does the latest kernel (4.1) still not consider this condition?
Re: [PATCH 1/1] net namespace: dynamically configure new net namespace inherit net config
Hi, David

This patch has been applied in our Linux tree for a long time. It should work well. Could you let me know your advice about this patch?

Thanks a lot.
Zhu Yanjun

On 06/26/2015 05:37 PM, Zhu Yanjun wrote:
> The new net namespace can inherit from the original net config, or from
> the current net config. As such, a config option is needed to decide
> where the new namespace inherits from.
>
> Signed-off-by: Zhu Yanjun yanjun@windriver.com
> ---
>  init/Kconfig       |  9 +
>  net/ipv4/devinet.c | 13 +
>  2 files changed, 22 insertions(+)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index dc24dec..fab8c41 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1212,6 +1212,15 @@ config NET_NS
>  	  Allow user space to create what appear to be multiple instances
>  	  of the network stack.
>  
> +config NET_NS_INHERIT_ORIGINAL
> +	bool "New network namespace inherits from original net config"
> +	depends on NET_NS
> +	default n
> +	help
> +	  Allow new network namespace inherit from original net config.
> +	  If no, the new network namespace inherits from the current net
> +	  config including the modified net config.
> +
>  endif # NAMESPACES
>  
>  config SCHED_AUTOGROUP
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index 419d23c..cf635e4 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -2271,6 +2271,7 @@ static __net_init int devinet_init_net(struct net *net)
>  #endif
>  
>  	err = -ENOMEM;
> +#ifndef CONFIG_NET_NS_INHERIT_ORIGINAL
>  	all = &ipv4_devconf;
>  	dflt = &ipv4_devconf_dflt;
>  
> @@ -2282,6 +2283,15 @@ static __net_init int devinet_init_net(struct net *net)
>  		dflt = kmemdup(dflt, sizeof(ipv4_devconf_dflt), GFP_KERNEL);
>  		if (!dflt)
>  			goto err_alloc_dflt;
> +#else
> +	all = kmemdup(&ipv4_devconf, sizeof(ipv4_devconf), GFP_KERNEL);
> +	if (!all)
> +		goto err_alloc_all;
> +
> +	dflt = kmemdup(&ipv4_devconf_dflt, sizeof(ipv4_devconf_dflt), GFP_KERNEL);
> +	if (!dflt)
> +		goto err_alloc_dflt;
> +#endif
>  
>  #ifdef CONFIG_SYSCTL
>  		tbl = kmemdup(tbl, sizeof(ctl_forward_entry), GFP_KERNEL);
> @@ -2292,7 +2302,10 @@ static __net_init int devinet_init_net(struct net *net)
>  		tbl[0].extra1 = all;
>  		tbl[0].extra2 = net;
>  #endif
> +
> +#ifndef CONFIG_NET_NS_INHERIT_ORIGINAL
>  	}
> +#endif
>  
>  #ifdef CONFIG_SYSCTL
>  	err = __devinet_sysctl_register(net, "all", all);
RE: macb: zynq: why is SG disabled?
Hi Nicolae and Cyrille,

The SG feature was not tested for Zynq using the macb driver; we tested it using the emacps driver in the Xilinx tree (that driver was deprecated recently). We will test and enable this feature in the macb driver for Zynq.

Regards,
Punnaiah

-----Original Message-----
From: Cyrille Pitchen [mailto:cyrille.pitc...@atmel.com]
Sent: Wednesday, July 01, 2015 10:34 PM
To: Nicolae Rosia; Michal Simek; Punnaiah Choudary Kalluri; netdev@vger.kernel.org; Nicolas Ferre; linux-arm-ker...@lists.infradead.org
Subject: Re: macb: zynq: why is SG disabled?

On 01/07/2015 17:14, Nicolae Rosia wrote:
> Hello,
>
> After reading the GEM part of Zynq7000 Technical Reference Manual [0], I think that SG should be supported. Is there a reason why SG is disabled in macb for Zynq?
>
> Best regards,
> Nicolae Rosia

Hi Nicolae,

When the scatter-gather patch was introduced, the feature was enabled only on tested boards to avoid regressions on other boards. So SG is enabled on sama5d4x and sama5d2x SoCs. SG is disabled on purpose on sama5d3x. For Zynq, I think the feature is still disabled just because it has never been tested.

Best regards,
Cyrille
Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN
On Wed, 2015-07-01 at 10:06 -0700, Joe Perches wrote:
> On Wed, 2015-07-01 at 12:56 +0200, Eric Dumazet wrote:
> > On Tue, 2015-06-30 at 20:25 +0300, Nicolae Rosia wrote:
> > > diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> []
> > > @@ -2554,9 +2554,9 @@ static void at91ether_rx(struct net_device *dev)
> > >  	while (lp->rx_ring[lp->rx_tail].addr & MACB_BIT(RX_USED)) {
> > >  		p_recv = lp->rx_buffers + lp->rx_tail * AT91ETHER_MAX_RBUFF_SZ;
> > >  		pktlen = MACB_BF(RX_FRMLEN, lp->rx_ring[lp->rx_tail].ctrl);
> > > -		skb = netdev_alloc_skb(dev, pktlen + 2);
> > > +		skb = netdev_alloc_skb(dev, pktlen + NET_IP_ALIGN);
> > >  		if (skb) {
> > > -			skb_reserve(skb, 2);
> > > +			skb_reserve(skb, NET_IP_ALIGN);
> > >  			memcpy(skb_put(skb, pktlen), p_recv, pktlen);
> > >  			skb->protocol = eth_type_trans(skb, dev);
> >
> > Then please use netdev_alloc_skb_ip_align(), so that you get rid of
> > skb_reserve()
>
> It seems there are ~50 of these in the kernel tree that could be
> converted.

Make sure the 2 is really NET_IP_ALIGN.

Some hardware needs 2, even if NET_IP_ALIGN is 0 (on x86 arches for example).

I would rather not touch this without testing the change on real hardware.
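The arithmetic behind the literal 2 can be sketched as follows. This is a user-space illustration only: `MODEL_ETH_HLEN` and the two helper names are hypothetical, and the 14-byte Ethernet header length is the one fact taken from outside the thread.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ETH_HLEN 14  /* Ethernet header length in bytes */

/* Offset of the IP header from the start of the buffer once
 * `reserve` bytes of headroom have been skipped.
 */
static size_t ip_header_offset(size_t reserve)
{
	return reserve + MODEL_ETH_HLEN;
}

/* Non-zero when the IP header lands on a 4-byte boundary. */
static int ip_header_is_aligned(size_t reserve)
{
	size_t off = ip_header_offset(reserve);

	assert(off >= MODEL_ETH_HLEN);  /* guards against wrap-around */
	return off % 4 == 0;
}
```

Reserving 2 bytes puts the IP header at offset 16, while reserving 0 (the value of NET_IP_ALIGN on x86, per the message above) leaves it at offset 14. That is why swapping the literal for the macro is a behaviour change on such architectures and needs a check against what the hardware requires.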
Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk
On Thu, 2015-07-02 at 01:26 +0200, Eric Dumazet wrote:
> On Thu, Jul 2, 2015 at 1:18 AM, Alex Gartrell alexgartr...@gmail.com wrote:
> > On Wednesday, July 1, 2015, Eric Dumazet eduma...@google.com wrote:
> > > On Wed, Jul 1, 2015 at 11:14 PM, David Miller da...@davemloft.net wrote:
> > > > From: Alex Gartrell agartr...@fb.com
> > > > Date: Wed, 1 Jul 2015 13:13:09 -0700
> > > >
> > > > > If we early-demux bind a TCP_TIMEWAIT socket to an skb and then orphan it
> > > > > (as we need to do in the ipvs forwarding case), sock_wfree and sock_rfree
> > > > > are going to reach into the inet_timewait_sock and mess with fields that
> > > > > don't exist.
> > > > >
> > > > > Signed-off-by: Alex Gartrell agartr...@fb.com
> > > >
> > > > If we're forwarding, we should not find a local socket, period.
> > >
> > > A socket cannot change state to TCP_TIMEWAIT. A new object is allocated
> > > and old one is removed from ehash, then freed (rcu rules being applied)
> > >
> > > Also sock_wfree() has nothing to do with early demux. It is for output
> > > path skbs only.
> >
> > Alright I kind of cheated and didn't include full context here. The problem
> > is that within ipvs we are getting packets that have been early demuxed and
> > associated with time wait sockets which we then wish to forward immediately
> > (ip_vs_xmit.c). Under normal circumstances it would never be associated
> > with any sk at all, but it is because of early demux, so we want to drop the
> > relationship by calling skb_orphan. This invokes the destructor which lands
> > us there. So that is how we reach this illegal treating a twsk like an sk
> > state.
> >
> > If there is a better way to drop the association than skb_orphan I will use
> > it.
>
> I think you are mistaken Alex.
>
> socket early demux cannot possibly set skb->destructor to sock_rfree()
>
> If skb->destructor is set by early demux, it correctly points to sock_edemux()
>
> And this one correctly handles all socket variants.

If ipvs is the problem, could you try instead following patch ?

Shoot in the dark, as you do not give a lot of details :(

diff --git a/include/net/sock.h b/include/net/sock.h
index 05a8c1aea251..f77fe9acc7a4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1932,6 +1932,14 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
 	}
 }
 
+/* This helper checks if a socket is a full socket,
+ * ie _not_ a timewait or request socket.
+ */
+static inline bool sk_fullsock(const struct sock *sk)
+{
+	return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
+}
+
 /*
  *	Queue a received datagram if it will fit. Stream and sequenced
  *	protocols can't normally use this as they need to fit buffers in
@@ -1944,6 +1952,9 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
 static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
 {
 	skb_orphan(skb);
+	if (unlikely(!sk_fullsock(sk)))
+		return;
+
 	skb->sk = sk;
 	skb->destructor = sock_wfree;
 	skb_set_hash_from_sk(skb, sk);
@@ -2204,14 +2215,6 @@ static inline struct sock *skb_steal_sock(struct sk_buff *skb)
 	return NULL;
 }
 
-/* This helper checks if a socket is a full socket,
- * ie _not_ a timewait or request socket.
- */
-static inline bool sk_fullsock(const struct sock *sk)
-{
-	return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
-}
-
 void sock_enable_timestamp(struct sock *sk, int flag);
 int sock_get_timestamp(struct sock *, struct timeval __user *);
 int sock_get_timestampns(struct sock *, struct timespec __user *);
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 5d2b806a862e..ff05ec5a9016 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1161,9 +1161,10 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
 	if (unlikely(skb->sk != NULL && hooknum == NF_INET_LOCAL_OUT &&
 		     af == AF_INET)) {
 		struct sock *sk = skb->sk;
-		struct inet_sock *inet = inet_sk(skb->sk);
 
-		if (inet && sk->sk_family == PF_INET && inet->nodefrag)
+		if (sk_fullsock(sk) &&
+		    sk->sk_family == PF_INET &&
+		    inet_sk(sk)->nodefrag)
 			return NF_ACCEPT;
 	}
 
@@ -1640,9 +1641,10 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	if (unlikely(skb->sk != NULL && hooknum == NF_INET_LOCAL_OUT &&
 		     af == AF_INET)) {
 		struct sock *sk = skb->sk;
-		struct inet_sock *inet = inet_sk(skb->sk);
 
-		if (inet && sk->sk_family == PF_INET && inet->nodefrag)
+		if (sk_fullsock(sk) &&
+		    sk->sk_family == PF_INET &&
+		    inet_sk(sk)->nodefrag)
 			return NF_ACCEPT;
 	}
 
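The bit test in sk_fullsock() can be tried out in user space. The sketch below is a model, not kernel code: the `M_`-prefixed names and `model_sk_fullsock()` are hypothetical, and the numeric state values are assumed to match the kernel's TCP state enum of that era (TCP_TIME_WAIT = 6, TCP_NEW_SYN_RECV = 12).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed state numbers, mirroring the kernel's TCP state enum. */
enum {
	M_TCP_ESTABLISHED  = 1,
	M_TCP_TIME_WAIT    = 6,
	M_TCP_LISTEN       = 10,
	M_TCP_NEW_SYN_RECV = 12,
};

#define M_TCPF(state) (1 << (state))

/* True for a full socket, false for a timewait or request socket:
 * the state's bit must survive masking out the two mini-socket states.
 */
static bool model_sk_fullsock(int sk_state)
{
	assert(sk_state >= 0 && sk_state < 31);  /* keep the shift defined */
	return (1 << sk_state) &
	       ~(M_TCPF(M_TCP_TIME_WAIT) | M_TCPF(M_TCP_NEW_SYN_RECV));
}
```

With such a guard in skb_set_owner_w(), a timewait or request socket never gets sock_wfree installed as its destructor, which is the point of the proposed patch.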
[PATCH] add stealth mode
Add an option to disable any reply not related to a listening socket, like RST/ACK for TCP and ICMP Dest-Unreach for UDP. Also disables ICMP replies to echo request and timestamp. The stealth mode can be enabled selectively for a single interface.
---
 include/linux/inetdevice.h | 1 +
 include/linux/ipv6.h       | 1 +
 include/uapi/linux/ip.h    | 1 +
 net/ipv4/devinet.c         | 1 +
 net/ipv4/icmp.c            | 6 ++
 net/ipv4/tcp_ipv4.c        | 3 ++-
 net/ipv4/udp.c             | 4 +++-
 net/ipv6/addrconf.c        | 7 +++
 net/ipv6/icmp.c            | 3 ++-
 net/ipv6/tcp_ipv6.c        | 2 +-
 net/ipv6/udp.c             | 3 ++-
 11 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a4328ce..a64c01e 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -128,6 +128,7 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
 #define IN_DEV_ARP_ANNOUNCE(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_STEALTH(in_dev)		IN_DEV_MAXCONF((in_dev), STEALTH)
 
 struct in_ifaddr {
 	struct hlist_node	hash;
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 82806c6..49494ec 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -53,6 +53,7 @@ struct ipv6_devconf {
 	__s32		ndisc_notify;
 	__s32		suppress_frag_ndisc;
 	__s32		accept_ra_mtu;
+	__s32		stealth;
 	struct ipv6_stable_secret {
 		bool initialized;
 		struct in6_addr secret;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 08f894d..4acbf99 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -165,6 +165,7 @@ enum
 	IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
 	IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
 	IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+	IPV4_DEVCONF_STEALTH,
 	__IPV4_DEVCONF_MAX
 };
 
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 7498716..6b9930a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2178,6 +2178,7 @@ static struct devinet_sysctl_table {
 					      "promote_secondaries"),
 		DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
 					      "route_localnet"),
+		DEVINET_SYSCTL_RW_ENTRY(STEALTH, "stealth"),
 	},
 };
 
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index f5203fb..2f1b31f 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -882,6 +882,9 @@ static bool icmp_echo(struct sk_buff *skb)
 {
 	struct net *net;
 
+	if(IN_DEV_STEALTH(skb->dev->ip_ptr))
+		return true;
+
 	net = dev_net(skb_dst(skb)->dev);
 	if (!net->ipv4.sysctl_icmp_echo_ignore_all) {
 		struct icmp_bxm icmp_param;
@@ -915,6 +918,9 @@ static bool icmp_timestamp(struct sk_buff *skb)
 	if (skb->len < 4)
 		goto out_err;
 
+	if(IN_DEV_STEALTH(skb->dev->ip_ptr))
+		return true;
+
 	/*
 	 *	Fill in the current time as ms since midnight UT:
 	 */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d7d4c2b..c887d6e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -77,6 +77,7 @@
 #include <net/busy_poll.h>
 
 #include <linux/inet.h>
+#include <linux/inetdevice.h>
 #include <linux/ipv6.h>
 #include <linux/stddef.h>
 #include <linux/proc_fs.h>
@@ -1652,7 +1653,7 @@ csum_error:
 		TCP_INC_STATS_BH(net, TCP_MIB_CSUMERRORS);
 bad_packet:
 		TCP_INC_STATS_BH(net, TCP_MIB_INERRS);
-	} else {
+	} else if(!IN_DEV_STEALTH(skb->dev->ip_ptr)) {
 		tcp_v4_send_reset(NULL, skb);
 	}
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604..b3b0dee 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -96,6 +96,7 @@
 #include <linux/timer.h>
 #include <linux/mm.h>
 #include <linux/inet.h>
+#include <linux/inetdevice.h>
 #include <linux/netdevice.h>
 #include <linux/slab.h>
 #include <net/tcp_states.h>
@@ -1823,7 +1824,8 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		goto csum_error;
 
 	UDP_INC_STATS_BH(net, UDP_MIB_NOPORTS, proto == IPPROTO_UDPLITE);
-	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
+	if(!IN_DEV_STEALTH(skb->dev->ip_ptr))
+		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
 
 	/*
 	 * Hmm.  We got an UDP packet to a port to which we
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 21c2c81..b9e44e2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5585,6 +5585,13 @@ static struct addrconf_sysctl_table
 			.proc_handler	= addrconf_sysctl_stable_secret,
 		},
 		{
+			.procname	= "stealth",
+			.data		= &ipv6_devconf.stealth,
+			.maxlen		= sizeof(int),
+			.mode		= 0644,
+			.proc_handler	= proc_dointvec,
+		},
+		{
 			/* sentinel */
 		}
 	},
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 713d743..94b08ac 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -723,7 +723,8 @@ static int icmpv6_rcv(struct sk_buff *skb)
 
 	switch (type) {
 	case ICMPV6_ECHO_REQUEST:
-		icmpv6_echo_reply(skb);
+		if(!idev->cnf.stealth)
+			icmpv6_echo_reply(skb);
 		break;
 
 	case ICMPV6_ECHO_REPLY:
diff --git a/net/ipv6/tcp_ipv6.c
Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN
On Thu, 2015-07-02 at 00:13 +0200, Eric Dumazet wrote:
> On Wed, 2015-07-01 at 10:06 -0700, Joe Perches wrote:
> > On Wed, 2015-07-01 at 12:56 +0200, Eric Dumazet wrote:
> > > On Tue, 2015-06-30 at 20:25 +0300, Nicolae Rosia wrote:
> > > > diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> > []
> > > > @@ -2554,9 +2554,9 @@ static void at91ether_rx(struct net_device *dev)
> > > >  	while (lp->rx_ring[lp->rx_tail].addr & MACB_BIT(RX_USED)) {
> > > >  		p_recv = lp->rx_buffers + lp->rx_tail * AT91ETHER_MAX_RBUFF_SZ;
> > > >  		pktlen = MACB_BF(RX_FRMLEN, lp->rx_ring[lp->rx_tail].ctrl);
> > > > -		skb = netdev_alloc_skb(dev, pktlen + 2);
> > > > +		skb = netdev_alloc_skb(dev, pktlen + NET_IP_ALIGN);
> > > >  		if (skb) {
> > > > -			skb_reserve(skb, 2);
> > > > +			skb_reserve(skb, NET_IP_ALIGN);
> > > >  			memcpy(skb_put(skb, pktlen), p_recv, pktlen);
> > > >  			skb->protocol = eth_type_trans(skb, dev);
> > >
> > > Then please use netdev_alloc_skb_ip_align(), so that you get rid of
> > > skb_reserve()
> >
> > It seems there are ~50 of these in the kernel tree that could be
> > converted.
>
> Make sure the 2 is really NET_IP_ALIGN
>
> Some hardwares need 2, even if NET_IP_ALIGN is 0 (on x86 arches for
> example)
>
> I would rather not touch this without testing the change on real
> hardware.

Nor would I, really. Almost all of those are in fairly old hardware drivers. I just wanted to point out that more exist.
Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk
On Thu, Jul 2, 2015 at 1:18 AM, Alex Gartrell alexgartr...@gmail.com wrote:
> On Wednesday, July 1, 2015, Eric Dumazet eduma...@google.com wrote:
> > On Wed, Jul 1, 2015 at 11:14 PM, David Miller da...@davemloft.net wrote:
> > > From: Alex Gartrell agartr...@fb.com
> > > Date: Wed, 1 Jul 2015 13:13:09 -0700
> > >
> > > > If we early-demux bind a TCP_TIMEWAIT socket to an skb and then orphan it
> > > > (as we need to do in the ipvs forwarding case), sock_wfree and sock_rfree
> > > > are going to reach into the inet_timewait_sock and mess with fields that
> > > > don't exist.
> > > >
> > > > Signed-off-by: Alex Gartrell agartr...@fb.com
> > >
> > > If we're forwarding, we should not find a local socket, period.
> >
> > A socket cannot change state to TCP_TIMEWAIT. A new object is allocated
> > and old one is removed from ehash, then freed (rcu rules being applied)
> >
> > Also sock_wfree() has nothing to do with early demux. It is for output
> > path skbs only.
>
> Alright I kind of cheated and didn't include full context here. The problem
> is that within ipvs we are getting packets that have been early demuxed and
> associated with time wait sockets which we then wish to forward immediately
> (ip_vs_xmit.c). Under normal circumstances it would never be associated
> with any sk at all, but it is because of early demux, so we want to drop the
> relationship by calling skb_orphan. This invokes the destructor which lands
> us there. So that is how we reach this illegal treating a twsk like an sk
> state.
>
> If there is a better way to drop the association than skb_orphan I will use
> it.

I think you are mistaken Alex.

socket early demux cannot possibly set skb->destructor to sock_rfree()

If skb->destructor is set by early demux, it correctly points to sock_edemux()

And this one correctly handles all socket variants.

/* All sockets share common refcount, but have different destructors */
void sock_gen_put(struct sock *sk)
{
	if (!atomic_dec_and_test(&sk->sk_refcnt))
		return;

	if (sk->sk_state == TCP_TIME_WAIT)
		inet_twsk_free(inet_twsk(sk));
	else if (sk->sk_state == TCP_NEW_SYN_RECV)
		reqsk_free(inet_reqsk(sk));
	else
		sk_free(sk);
}
EXPORT_SYMBOL_GPL(sock_gen_put);

void sock_edemux(struct sk_buff *skb)
{
	sock_gen_put(skb->sk);
}
EXPORT_SYMBOL(sock_edemux);
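The dispatch in sock_gen_put() can be modelled without any kernel headers. This is a sketch under assumptions: `struct model_sock`, `enum free_path` and `model_sock_gen_put()` are hypothetical names, a plain int replaces the atomic refcount, and the state numbers (6 for timewait, 12 for a request socket) are assumed to match the kernel's enum.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed state numbers, mirroring the kernel's TCP state enum. */
enum { M_TIME_WAIT = 6, M_NEW_SYN_RECV = 12 };

/* Which release routine the model would call. */
enum free_path { FREE_NONE, FREE_TWSK, FREE_REQSK, FREE_SK };

/* Hypothetical stand-in for the common part of all socket variants. */
struct model_sock {
	int refcnt;   /* plain int here; the kernel uses an atomic */
	int state;
};

/* Drop one reference; on the last one, pick the free routine by state. */
static enum free_path model_sock_gen_put(struct model_sock *sk)
{
	assert(sk != NULL && sk->refcnt > 0);
	if (--sk->refcnt != 0)
		return FREE_NONE;
	if (sk->state == M_TIME_WAIT)
		return FREE_TWSK;     /* inet_twsk_free() in the kernel */
	if (sk->state == M_NEW_SYN_RECV)
		return FREE_REQSK;    /* reqsk_free() in the kernel */
	return FREE_SK;               /* sk_free() in the kernel */
}
```

The model shows why sock_edemux() is safe for every variant: the shared refcount is dropped first, and only the final put consults the state to choose the matching free routine.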
Re: [PATCH net-next v1 1/1] drivers: net: xgene: Fix the compilation error error: implicit declaration of function 'acpi_evaluate_integer' in APM X-Gene ethernet driver.
Hi,

Any comments on this patch?

On Wed, Jun 24, 2015 at 1:51 PM, Suman Tripathi stripa...@apm.com wrote:
> This patch guards the acpi_evaluate_integer function, as it fails the
> build for non-ACPI configs.
>
> Signed-off-by: Iyappan Subramanian isubraman...@apm.com
> Signed-off-by: Suman Tripathi stripa...@apm.com
> Reported-by: kbuild test robot fengguang...@intel.com
> ---
>  drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> index 4e83d4c..70b9ef6 100644
> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> @@ -871,6 +871,7 @@ static const struct net_device_ops xgene_ndev_ops = {
>  	.ndo_set_mac_address = xgene_enet_set_mac_address,
>  };
>  
> +#ifdef CONFIG_ACPI
>  static int xgene_get_port_id_acpi(struct device *dev,
>  				  struct xgene_enet_pdata *pdata)
>  {
> @@ -886,6 +887,7 @@ static int xgene_get_port_id_acpi(struct device *dev,
>  
>  	return 0;
>  }
> +#endif
>  
>  static int xgene_get_port_id_dt(struct device *dev, struct xgene_enet_pdata *pdata)
>  {
> --
> 1.8.2.1

--
Thanks,
with regards,
Suman Tripathi
Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk
On Wed, Jul 1, 2015 at 4:26 PM, Eric Dumazet eduma...@google.com wrote:
> I think you are mistaken Alex.

Indeed, I was! Should be unsurprising.

> socket early demux cannot possibly set skb->destructor to sock_rfree()

Yeah, I will admit to adding the code to sock_rfree reflexively, out of paranoia.

> If skb->destructor is set by early demux, it correctly points to sock_edemux()
>
> And this one correctly handles all socket variants.

Yes, the problem appears to be in ip_vs_prepare_tunneled_skb (ip_vs_xmit.c:859 in 4.0):

	if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
		new_skb = skb_realloc_headroom(skb, max_headroom);
		if (!new_skb)
			goto error;
		if (skb->sk)
			skb_set_owner_w(new_skb, skb->sk);
		consume_skb(skb);
		skb = new_skb;
	}

skb_set_owner_w sets sock_wfree. I'll figure out how to ensure that we're using an appropriate destructor here.

Appreciate the patience!
--
Alex Gartrell agartr...@fb.com