[PATCH net v2] enic: fix issues in enic_poll

2015-07-01 Thread Govindarajulu Varadarajan
In enic_poll, we clean both the tx and rx queues. While low latency busy socket
polling is happening, enic_poll only cleans the tx queue. After cleaning tx, it
should return the total budget so that it gets polled again.

There is a small window between vnic_intr_unmask() and enic_poll_unlock_napi().
If an irq occurs in this window and napi is scheduled on a different cpu, it
tries to acquire enic_poll_lock_napi() and fails. Unlock napi_poll before
unmasking the interrupt.

v2:
Do not change tx work done behaviour. Consider only rx work done for completing
napi.

Signed-off-by: Govindarajulu Varadarajan _gov...@gmx.com
---
 drivers/net/ethernet/cisco/enic/enic_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index da2004e..918a8e4 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -1170,7 +1170,7 @@ static int enic_poll(struct napi_struct *napi, int budget)
 			wq_work_done,
 			0 /* dont unmask intr */,
 			0 /* dont reset intr timer */);
-		return rq_work_done;
+		return budget;
 	}
 
 	if (budget > 0)
@@ -1191,6 +1191,7 @@ static int enic_poll(struct napi_struct *napi, int budget)
 			0 /* don't reset intr timer */);
 
 	err = vnic_rq_fill(&enic->rq[0], enic_rq_alloc_buf);
+	enic_poll_unlock_napi(&enic->rq[cq_rq], napi);
 
 	/* Buffer allocation failed. Stay in polling
 	 * mode so we can try to fill the ring again.
@@ -1208,7 +1209,6 @@ static int enic_poll(struct napi_struct *napi, int budget)
 		napi_complete(napi);
 		vnic_intr_unmask(&enic->intr[intr]);
 	}
-	enic_poll_unlock_napi(&enic->rq[cq_rq], napi);
 
 	return rq_work_done;
 }
-- 
2.4.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
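
The two changes in the enic patch above can be illustrated with a small
userspace model. This is only a sketch: rq_model, model_lock_napi(),
model_unlock_napi() and model_poll() are hypothetical names invented for
illustration, not enic driver code. It assumes the usual NAPI contract, where
returning the full budget asks for another poll and returning less completes
polling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the enic driver. */
enum poll_owner { OWNER_NONE, OWNER_NAPI, OWNER_BUSY_POLL };

struct rq_model {
	enum poll_owner owner;	/* current holder of the per-queue poll lock */
	bool intr_masked;	/* true while the queue interrupt is masked */
};

static bool model_lock_napi(struct rq_model *rq)
{
	if (rq->owner != OWNER_NONE)
		return false;	/* busy polling (or another poller) owns the queue */
	rq->owner = OWNER_NAPI;
	return true;
}

static void model_unlock_napi(struct rq_model *rq)
{
	assert(rq->owner == OWNER_NAPI);
	rq->owner = OWNER_NONE;
}

/* Returns the value the poll routine would hand back to the NAPI core. */
static int model_poll(struct rq_model *rq, int budget, int rx_pending)
{
	int rx_done;

	/* Lock not available: only tx was cleaned, so request a re-poll. */
	if (!model_lock_napi(rq))
		return budget;

	rx_done = rx_pending < budget ? rx_pending : budget;

	/* Release the lock first, so a poll scheduled right after the
	 * unmask below cannot find the queue still locked. */
	model_unlock_napi(rq);

	if (rx_done < budget)
		rq->intr_masked = false;	/* models napi_complete() + unmask */

	return rx_done;
}
```

With the lock held by busy polling, model_poll() returns the whole budget and
leaves the interrupt masked; otherwise the lock is always free again by the
time the interrupt is unmasked.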


Re: [PATCH 0/2] Error message clean-ups for Renesas R-Car CAN driver

2015-07-01 Thread Marc Kleine-Budde
On 06/20/2015 02:49 AM, Sergei Shtylyov wrote:
 Hello.
 
Here's the set of 2 patches against Marc Kleine-Budde's 'linux-can.git'
 repo plus 3 fix patches just posted; they are small error message cleanups for
 the Renesas R-Car CAN driver.

Applied both series to can/master.

Thanks,
Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |





Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN

2015-07-01 Thread Joe Perches
On Wed, 2015-07-01 at 12:56 +0200, Eric Dumazet wrote:
 On Tue, 2015-06-30 at 20:25 +0300, Nicolae Rosia wrote:
  diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
[]
  @@ -2554,9 +2554,9 @@ static void at91ether_rx(struct net_device *dev)
  while (lp->rx_ring[lp->rx_tail].addr & MACB_BIT(RX_USED)) {
  p_recv = lp->rx_buffers + lp->rx_tail * AT91ETHER_MAX_RBUFF_SZ;
  pktlen = MACB_BF(RX_FRMLEN, lp->rx_ring[lp->rx_tail].ctrl);
  -   skb = netdev_alloc_skb(dev, pktlen + 2);
  +   skb = netdev_alloc_skb(dev, pktlen + NET_IP_ALIGN);
  if (skb) {
  -   skb_reserve(skb, 2);
  +   skb_reserve(skb, NET_IP_ALIGN);
  memcpy(skb_put(skb, pktlen), p_recv, pktlen);
   
  skb->protocol = eth_type_trans(skb, dev);
 
 Then please use netdev_alloc_skb_ip_align(), so that you get rid of
 skb_reserve()

It seems there are ~50 of these in the kernel tree
that could be converted.
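
For readers wondering why the literal being replaced is 2: a small arithmetic
sketch, assuming the common default of NET_IP_ALIGN == 2 (some architectures
define it differently) and a plain 14-byte Ethernet header. The MODEL_* macros
and model_* helpers are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ETH_HLEN     14	/* 6 dst + 6 src + 2 ethertype bytes */
#define MODEL_NET_IP_ALIGN  2	/* assumed default value of NET_IP_ALIGN */

/* Offset of the IP header from the start of the buffer when `reserve`
 * bytes of headroom are skipped before the Ethernet header. */
static size_t model_ip_header_offset(size_t reserve)
{
	return reserve + MODEL_ETH_HLEN;
}

/* True when the IP header lands on a 4-byte boundary. */
static bool model_ip_header_aligned(size_t reserve)
{
	return model_ip_header_offset(reserve) % 4 == 0;
}
```

Reserving 2 bytes puts the IP header at offset 16, which is 4-byte aligned;
with no reserve it would sit at offset 14. netdev_alloc_skb_ip_align() folds
the extra allocation and the reserve into one call, which is why the explicit
skb_reserve() can go away.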




Re: macb: zynq: why is SG disabled?

2015-07-01 Thread Cyrille Pitchen
Le 01/07/2015 17:14, Nicolae Rosia a écrit :
 Hello,
 
 After reading the GEM part of Zynq7000 Technical Reference Manual [0], I 
 think that SG should be supported.
 Is there a reason why SG is disabled in macb for Zynq?
 
 Best regards,
 Nicolae Rosia

Hi Nicolae,

when the scatter-gather patch was introduced, the feature was enabled only on
tested boards to avoid regressions on other boards.
So SG is enabled on sama5d4x and sama5d2x SoCs. SG is disabled on purpose on
sama5d3x.

For Zynq, I think the feature is still disabled just because it has never been
tested.

Best regards,

Cyrille


Re: mlx4: failed to allocate default counter port 1

2015-07-01 Thread Sebastian Ott
On Wed, 1 Jul 2015, Sebastian Ott wrote:

 On Wed, 1 Jul 2015, Or Gerlitz wrote:
  On 6/30/2015 5:17 PM, Sebastian Ott wrote:
   On Tue, 30 Jun 2015, Or Gerlitz wrote:
On 6/30/2015 4:24 PM, Sebastian Ott wrote:
   Do you run the VF on the same system/kernel as the PF, or is the VF
   probed to a VM which runs the latest kernel while the PF runs an older
   kernel (which?)
 The latter case. The PF is driven by a much older Kernel running OFED
 2.3.2.0.0.1

Can you try running the inbox PF driver that comes with the PF kernel
(what
kernel is that?) I'd like to see we're OK there.
   Frankly, I don't know. Plus I also don't know how to build an ofed kernel.
  
  
  I didn't want you to build that package, but rather the other way around,
  namely see what happens when uninstalling this package and running with the
  mlx4 inbox PF driver from the kernel provided by your distro of choice or
  an upstream kernel installed by you. Anyway, I hope the patch below provides
  a quick band-aid and lets you continue running upstream VFs over that PF
  config, let me know (I will be OOO till Thu-Sun). Once we see how this
  behaves, we will take it from there.
 
 Thanks for the patch. Unfortunately, that didn't work:
 

OK, using this patch it worked:

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 12fbfcb..29c2a01 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2273,6 +2273,11 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
 		} else if (err == -ENOENT) {
 			err = 0;
 			continue;
+		} else if (mlx4_is_slave(dev) && err == -EINVAL) {
+			priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
+			mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
+				  MLX4_SINK_COUNTER_INDEX(dev));
+			err = 0;
 		} else {
 			mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
 				 __func__, port + 1, err);
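
The only difference between this patch and the earlier one in the thread is
the added "err = 0". A userspace sketch of why that matters, assuming the
function hands back the last value of err once the port loop finishes (an
assumption, though the "-22" probe failure reported in this thread is
consistent with it). All model_* names and MODEL_* macros are hypothetical;
255 is the sink index seen in the posted log.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_PORTS      2
#define MODEL_SINK_INDEX 255	/* sink counter index seen in the posted log */

/* Stub for a PF that rejects every counter allocation from the VF. */
static int model_alloc_counter(int *idx)
{
	(void)idx;
	return -EINVAL;
}

/* clear_err == false models the first patch, true models the second. */
static int model_allocate_default_counters(bool is_slave, bool clear_err,
					   int def_counter[MODEL_PORTS])
{
	int err = 0;

	for (int port = 0; port < MODEL_PORTS; port++) {
		int idx = 0;

		err = model_alloc_counter(&idx);
		if (!err) {
			def_counter[port] = idx;
		} else if (err == -ENOENT) {
			err = 0;
			continue;
		} else if (is_slave && err == -EINVAL) {
			/* Fall back to the sink counter on an old PF. */
			def_counter[port] = MODEL_SINK_INDEX;
			if (clear_err)
				err = 0;
		} else {
			return err;
		}
	}

	/* Without clear_err, the stale -EINVAL leaks out here. */
	return err;
}
```

In this model the fallback index is stored for both ports in either variant,
but only the variant that clears err reports success to the caller.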



Re: mlx4: failed to allocate default counter port 1

2015-07-01 Thread Or Gerlitz

On 6/30/2015 5:17 PM, Sebastian Ott wrote:

On Tue, 30 Jun 2015, Or Gerlitz wrote:

On 6/30/2015 4:24 PM, Sebastian Ott wrote:

Do you run the VF on the same system/kernel as the PF, or is the VF
probed to a VM which runs the latest kernel while the PF runs an older
kernel (which?)

The latter case. The PF is driven by a much older Kernel running OFED
2.3.2.0.0.1


Can you try running the inbox PF driver that comes with the PF kernel (what
kernel is that?) I'd like to see we're OK there.

Frankly, I don't know. Plus I also don't know how to build an ofed kernel.



I didn't want you to build that package, but rather the other way around,
namely see what happens when uninstalling this package and running with the
mlx4 inbox PF driver from the kernel provided by your distro of choice or an
upstream kernel installed by you. Anyway, I hope the patch below provides a
quick band-aid and lets you continue running upstream VFs over that PF config,
let me know (I will be OOO till Thu-Sun). Once we see how this behaves, we
will take it from there.

Or.


diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 12fbfcb..a66cc6e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2273,6 +2273,10 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
 		} else if (err == -ENOENT) {
 			err = 0;
 			continue;
+		} else if (mlx4_is_slave(dev) && err == -EINVAL) {
+			priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
+			mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
+				  MLX4_SINK_COUNTER_INDEX(dev));
 		} else {
 			mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
 				 __func__, port + 1, err);



Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN

2015-07-01 Thread Nicolae Rosia

On 07/01/2015 04:44 PM, Eric Dumazet wrote:

I really doubt this adapter can process millions of packets per second ?
I was suggesting this since I was taking into consideration the comment 
from skbuff.c, we can save several CPU cycles by avoiding having to 
disable and re-enable IRQs.

Is there a downside to this?



I would rather enable GRO, it would be more useful.
I had no idea what GRO is, so I have read about it [0] and looked at a 
couple of drivers which use it. They all seem to
replace netif_receive_skb with napi_gro_receive and when there are no 
more packets in napi_pool they call napi_gro_flush.

Is it that simple?

Regards

[0] https://lwn.net/Articles/358910/


Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN

2015-07-01 Thread Nicolae Rosia

On 07/01/2015 01:56 PM, Eric Dumazet wrote:

Then please use netdev_alloc_skb_ip_align(), so that you get rid of
skb_reserve()

Thank you for the suggestion.
I can do that.
A related question, should I also replace netdev_alloc with 
napi_alloc_skb in places where I have a napi struct?



Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN

2015-07-01 Thread Eric Dumazet
On Wed, 2015-07-01 at 16:29 +0300, Nicolae Rosia wrote:
 On 07/01/2015 01:56 PM, Eric Dumazet wrote:
  Then please use netdev_alloc_skb_ip_align(), so that you get rid of
  skb_reserve()
 Thank you for the suggestion.
 I can do that.
 A related question, should I also replace netdev_alloc with 
 napi_alloc_skb in places where I have a napi struct?

I really doubt this adapter can process millions of packets per second ?

I would rather enable GRO, it would be more useful.




Re: mlx4: failed to allocate default counter port 1

2015-07-01 Thread Sebastian Ott
On Wed, 1 Jul 2015, Or Gerlitz wrote:
 On 6/30/2015 5:17 PM, Sebastian Ott wrote:
  On Tue, 30 Jun 2015, Or Gerlitz wrote:
   On 6/30/2015 4:24 PM, Sebastian Ott wrote:
  Do you run the VF on the same system/kernel as the PF, or is the VF
  probed to a VM which runs the latest kernel while the PF runs an older
  kernel (which?)
The latter case. The PF is driven by a much older Kernel running OFED
2.3.2.0.0.1
   
   Can you try running the inbox PF driver that comes with the PF kernel
   (what
   kernel is that?) I'd like to see we're OK there.
  Frankly, I don't know. Plus I also don't know how to build an ofed kernel.
 
 
 I didn't want you to build that package, but rather the other way around,
 namely see what happens when uninstalling this package and running with the
 mlx4 inbox PF driver from the kernel provided by your distro of choice or
 an upstream kernel installed by you. Anyway, I hope the patch below provides
 a quick band-aid and lets you continue running upstream VFs over that PF
 config, let me know (I will be OOO till Thu-Sun). Once we see how this
 behaves, we will take it from there.

Thanks for the patch. Unfortunately, that didn't work:

[  170.531076] mlx4_core :00:00.0: NOP command IRQ test passed
[  170.531291] mlx4_core :00:00.0: can't allocate counter from old PF driver, using index 255
[  170.531294] mlx4_core :00:00.0: mlx4_allocate_default_counters: default counter index 255 for port 1
[  170.531531] mlx4_core :00:00.0: can't allocate counter from old PF driver, using index 255
[  170.531534] mlx4_core :00:00.0: mlx4_allocate_default_counters: default counter index 255 for port 2
[  170.531535] mlx4_core :00:00.0: Failed to allocate default counters, aborting
[  170.587306] mlx4_core: probe of :00:00.0 failed with error -22

Regards,
Sebastian

 
 Or.
 
 
 diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
 index 12fbfcb..a66cc6e 100644
 --- a/drivers/net/ethernet/mellanox/mlx4/main.c
 +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
 @@ -2273,6 +2273,10 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
 } else if (err == -ENOENT) {
 err = 0;
 continue;
 +   } else if (mlx4_is_slave(dev) && err == -EINVAL) {
 +   priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
 +   mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
 + MLX4_SINK_COUNTER_INDEX(dev));
 } else {
 mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
  __func__, port + 1, err);
 
 
 



[PATCH] rionet: Don't try to corrupt skbuff assigning data pointer directly

2015-07-01 Thread Alexander Sverdlin
Assigning the data pointer of an skbuff directly is not allowed: it is pointless
if the assigned pointer is the same as the existing one, and it breaks all the
pointer arithmetic in every other case. We cannot do better than to compare
them and report BUG() in case of a mismatch.

Signed-off-by: Alexander Sverdlin alexander.sverd...@nokia.com
---

We came across this problem while developing new code for Octeon2 RAPIDIO. For
the last 10 years, since the original commit of the code, this assignment did
nothing because the pointers were always the same. A bug in the new code
exposed this one, however. So it is better to BUG() immediately here, which
avoids longer debugging of the skbuff corruption that would follow.

 drivers/net/rionet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/rionet.c b/drivers/net/rionet.c
index dac7a0d..34c27b8 100644
--- a/drivers/net/rionet.c
+++ b/drivers/net/rionet.c
@@ -104,7 +104,8 @@ static int rionet_rx_clean(struct net_device *ndev)
 		if (!(data = rio_get_inb_message(rnet->mport, RIONET_MAILBOX)))
 			break;
 
-		rnet->rx_skb[i]->data = data;
+		if (rnet->rx_skb[i]->data != data)
+			BUG();
 		skb_put(rnet->rx_skb[i], RIO_MAX_MSG_SIZE);
 		rnet->rx_skb[i]->protocol =
 		    eth_type_trans(rnet->rx_skb[i], ndev);
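
Why a bare assignment to the data pointer corrupts things can be shown with a
heavily simplified userspace model. This is a sketch, not the real sk_buff
layout: struct model_skb and the model_skb_*() helpers are hypothetical, and
the only property modelled is that, for a linear buffer, data + len is
expected to equal tail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a linear sk_buff. */
struct model_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the valid payload */
	unsigned char *tail;	/* end of the valid payload */
	size_t len;		/* payload length */
};

static void model_skb_init(struct model_skb *skb, unsigned char *buf)
{
	skb->head = skb->data = skb->tail = buf;
	skb->len = 0;
}

/* Extends the payload by n bytes, in the spirit of skb_put(). */
static unsigned char *model_skb_put(struct model_skb *skb, size_t n)
{
	unsigned char *old_tail = skb->tail;

	skb->tail += n;
	skb->len += n;
	return old_tail;
}

/* The bookkeeping is consistent only while data + len == tail. */
static bool model_skb_consistent(const struct model_skb *skb)
{
	return skb->data + skb->len == skb->tail;
}
```

If data is overwritten with a different pointer before model_skb_put(), tail
and len still advance from the old position and the invariant no longer
holds; if the pointer is the same, the assignment changes nothing.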


[PULL] virtio/vhost: cross endian support

2015-07-01 Thread Michael S. Tsirkin
The following changes since commit 8a7b19d8b542b87bccc3eaaf81dcc90a5ca48aea:

  include/uapi/linux/virtio_balloon.h: include linux/virtio_types.h (2015-06-01 15:46:54 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 59a5b0f7bf74f88da6670bcbf924d8cc1e75b1ee:

  virtio-pci: alloc only resources actually used. (2015-06-24 08:15:09 +0200)


virtio/vhost: cross endian support

I have just queued some more bugfix patches today but none fix regressions and
none are related to these ones, so it looks like a good time for a merge for
-rc1.

Signed-off-by: Michael S. Tsirkin m...@redhat.com


Gerd Hoffmann (1):
  virtio-pci: alloc only resources actually used.

Greg Kurz (8):
  virtio: introduce virtio_is_little_endian() helper
  tun: add tun_is_little_endian() helper
  macvtap: introduce macvtap_is_little_endian() helper
  vringh: introduce vringh_is_little_endian() helper
  vhost: introduce vhost_is_little_endian() helper
  virtio: add explicit big-endian support to memory accessors
  vhost: cross-endian support for legacy devices
  macvtap/tun: cross-endian support for little-endian hosts

 drivers/vhost/vhost.h  | 25 ---
 drivers/virtio/virtio_pci_common.h |  2 +
 include/linux/virtio_byteorder.h   | 24 ++-
 include/linux/virtio_config.h  | 18 +---
 include/linux/vringh.h | 18 +---
 include/uapi/linux/if_tun.h|  6 +++
 include/uapi/linux/vhost.h | 14 +++
 drivers/net/macvtap.c  | 65 -
 drivers/net/tun.c  | 67 +-
 drivers/vhost/vhost.c  | 85 +-
 drivers/virtio/virtio_pci_common.c |  7 
 drivers/virtio/virtio_pci_legacy.c | 13 +-
 drivers/virtio/virtio_pci_modern.c | 24 ---
 drivers/net/Kconfig| 14 +++
 drivers/vhost/Kconfig  | 15 +++
 15 files changed, 350 insertions(+), 47 deletions(-)
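
As a rough illustration of what the *_is_little_endian() helpers listed above
are for: the accessors take a flag saying whether the device side is
little-endian and decode fields accordingly. The function below is a
hypothetical userspace model working on raw bytes, not the kernel accessor
from virtio_byteorder.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode a 16-bit field from its in-memory bytes, given the byte order
 * used by the other side. Hypothetical model, independent of host order. */
static uint16_t model_virtio16_to_cpu(bool little_endian, const uint8_t raw[2])
{
	if (little_endian)
		return (uint16_t)(raw[0] | (raw[1] << 8));
	return (uint16_t)((raw[0] << 8) | raw[1]);
}
```

The same two bytes decode to different values depending on the flag, which is
why a legacy device whose byte order differs from the host needs the explicit
big-endian path added by this series.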


Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN

2015-07-01 Thread Nicolas Ferre
Le 30/06/2015 19:25, Nicolae Rosia a écrit :
 
 Signed-off-by: Nicolae Rosia nicolae.ro...@certsign.ro

Acked-by: Nicolas Ferre nicolas.fe...@atmel.com

Thanks, bye.

 ---
  drivers/net/ethernet/cadence/macb.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
 index caeb395..dbb5160 100644
 --- a/drivers/net/ethernet/cadence/macb.c
 +++ b/drivers/net/ethernet/cadence/macb.c
 @@ -2554,9 +2554,9 @@ static void at91ether_rx(struct net_device *dev)
   while (lp->rx_ring[lp->rx_tail].addr & MACB_BIT(RX_USED)) {
   p_recv = lp->rx_buffers + lp->rx_tail * AT91ETHER_MAX_RBUFF_SZ;
   pktlen = MACB_BF(RX_FRMLEN, lp->rx_ring[lp->rx_tail].ctrl);
 - skb = netdev_alloc_skb(dev, pktlen + 2);
 + skb = netdev_alloc_skb(dev, pktlen + NET_IP_ALIGN);
   if (skb) {
 - skb_reserve(skb, 2);
 + skb_reserve(skb, NET_IP_ALIGN);
   memcpy(skb_put(skb, pktlen), p_recv, pktlen);
  
   skb->protocol = eth_type_trans(skb, dev);
 


-- 
Nicolas Ferre


4.1 regression in resizable hashtable tests

2015-07-01 Thread Meelis Roos
This is 4.1 on sparc64, on one of my boxes that happens to have most runtime
tests left enabled from some debugging effort. 4.0 was fine; 4.1 gives this
in dmesg:

[   31.898697] Running resizable hashtable tests...
[   31.898915]   Adding 2048 keys
[   31.952911]   Traversal complete: counted=17, nelems=2048, entries=2048
[   31.953004] Test failed: Total count mismatch ^^^
[   32.022676]   Traversal complete: counted=17, nelems=2048, entries=2048
[   32.022788] Test failed: Total count mismatch ^^^
[   32.022828]   Deleting 2048 keys


Full dmesg:
[0.00] PROMLIB: Sun IEEE Boot Prom 'OBP 3.31.0 2001/07/25 20:36'
[0.00] PROMLIB: Root node compatible: 
[0.00] Linux version 4.1.0 (mroos@u5) (gcc version 4.9.2 (Debian 4.9.2-20) ) #18 Wed Jul 1 02:33:02 EEST 2015
[0.00] bootconsole [earlyprom0] enabled
[0.00] ARCH: SUN4U
[0.00] Ethernet address: 08:00:20:f8:c7:72
[0.00] MM: PAGE_OFFSET is 0xf800 (max_phys_bits == 40)
[0.00] MM: VMALLOC [0x0001 -- 0x0600]
[0.00] MM: VMEMMAP [0x0600 -- 0x0c00]
[0.00] Kernel: Using 6 locked TLB entries for main kernel image.
[0.00] Remapping the kernel... done.
[0.00] kmemleak: Kernel memory leak detector disabled
[0.00] OF stdout device is: /pci@1f,0/pci@1,1/ebus@1/se@14,40:a
[0.00] PROM: Built device tree with 70282 bytes of memory.
[0.00] Top of RAM: 0x1ff3e000, Total RAM: 0x1ff2e000
[0.00] Memory hole size: 0MB
[0.00] Allocated 16384 bytes for kernel page tables.
[0.00] Zone ranges:
[0.00]   Normal   [mem 0x-0x1ff3dfff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x-0x1fefdfff]
[0.00]   node   0: [mem 0x1ff0-0x1ff2bfff]
[0.00]   node   0: [mem 0x1ff3a000-0x1ff3dfff]
[0.00] Initmem setup node 0 [mem 0x-0x1ff3dfff]
[0.00] On node 0 totalpages: 65431
[0.00]   Normal zone: 512 pages used for memmap
[0.00]   Normal zone: 0 pages reserved
[0.00]   Normal zone: 65431 pages, LIFO batch:15
[0.00] Booting Linux...
[0.00] CPU CAPS: [flush,stbar,swap,muldiv,v9,mul32,div32,v8plus]
[0.00] CPU CAPS: [vis]
[0.00] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[0.00] pcpu-alloc: [0] 0 
[0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 64919
[0.00] Kernel command line: root=/dev/sda1 ro
[0.00] PID hash table entries: 2048 (order: 1, 16384 bytes)
[0.00] Dentry cache hash table entries: 65536 (order: 6, 524288 bytes)
[0.00] Inode-cache hash table entries: 32768 (order: 5, 262144 bytes)
[0.00] Sorting __ex_table...
[0.00] Memory: 491632K/523448K available (5216K kernel code, 509K rwdata, 1656K rodata, 520K init, 14578K bss, 31816K reserved, 0K cma-reserved)
[0.00] Running RCU self tests
[0.00] Testing tracer nop: PASSED
[0.00] NR_IRQS:2048 nr_irqs:2048 1
[   25.662551] clocksource tick: mask: 0x max_cycles: 0x5306eb473f, max_idle_ns: 440795213232 ns
[   25.765203] clocksource: mult[2c71c72] shift[24]
[   25.804735] clockevent: mult[5c28f5c3] shift[32]
[   25.846767] Console: colour dummy device 80x25
[   25.882817] console [tty0] enabled
[   25.907574] bootconsole [earlyprom0] disabled
[   25.944044] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[   25.944112] ... MAX_LOCKDEP_SUBCLASSES:  8
[   25.944151] ... MAX_LOCK_DEPTH:  48
[   25.944190] ... MAX_LOCKDEP_KEYS:8191
[   25.944229] ... CLASSHASH_SIZE:  4096
[   25.944268] ... MAX_LOCKDEP_ENTRIES: 32768
[   25.944308] ... MAX_LOCKDEP_CHAINS:  65536
[   25.944349] ... CHAINHASH_SIZE:  32768
[   25.944390]  memory used by lock dependency info: 8159 kB
[   25.944437]  per task-struct memory footprint: 1920 bytes
[   25.944483] 
[   25.944516] | Locking API testsuite:
[   25.944550] 
[   25.944615]  | spin |wlock |rlock |mutex | wsem | rsem |
[   25.944678]   --
[   25.944780]  A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   26.011037]  A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   26.077813]  A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   26.144938]  A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   26.212077]  A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   26.279658]  A-B-C-D-B-D-D-A deadlock:  

Re: [RFC] COLO Proxy Module

2015-07-01 Thread Wen Congyang
On 06/30/2015 05:19 PM, Patrick McHardy wrote:
 On 30.06, Li Zhijian wrote:
 ping...

 and I have another question:
 can I simply add a new nf_ct_ext_id without touching the existing kernel
 code?
 
 No, the kernel needs to know the highest extension ID in order to
 allocate space for the offsets.

So if we want to add a new extension, we should post the module too. Is
it right?

Thanks
Wen Congyang

 
 In order to support COLO-Proxy, I need an extra nf_ct_ext_id called
 NF_CT_EXT_COLO to store some COLO-related information. This information can
 help COLO-Proxy to buffer and compare packets for each connection.

 Thanks
 Li Zhijian
 
 Cheers,
 Patrick
 --
 To unsubscribe from this list: send the line unsubscribe netfilter-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 .
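
Patrick's point about the highest extension ID can be pictured with a small
model: the per-connection extension area starts with one offset slot per
known ID, so the number of IDs is baked into the layout at build time. The
names below (model_ext_id, model_ct_ext, model_ext_find) are hypothetical and
only loosely follow the conntrack extension design.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ID list; MODEL_EXT_NUM plays the role of the highest ID. */
enum model_ext_id { MODEL_EXT_HELPER, MODEL_EXT_NAT, MODEL_EXT_NUM };

struct model_ct_ext {
	uint8_t offset[MODEL_EXT_NUM];	/* one slot per known ID, 0 = absent */
	uint8_t len;			/* bytes of data[] in use */
	unsigned char data[32];		/* extension payloads */
};

/* Returns the payload for `id`, or NULL when that extension is absent. */
static void *model_ext_find(struct model_ct_ext *ext, enum model_ext_id id)
{
	if (id >= MODEL_EXT_NUM || ext->offset[id] == 0)
		return NULL;
	return (unsigned char *)ext + ext->offset[id];
}
```

Adding a new ID grows the offset array for every user of the structure, which
is why, under this model, an out-of-tree module cannot introduce one without
a change to the shared enum.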
 



Re: [PATCH] rtnetlink: Actually use the policy for the IFLA_VF_INFO

2015-07-01 Thread Daniel Borkmann

Hi Jason,

On 07/01/2015 12:52 AM, Jason Gunthorpe wrote:

It turns out the policy was defined but never actually checked,
so lets check it.

Fixes: ebc08a6f47ee (rtnetlink: Add VF config code to rtnetlink)


I would argue that the actual commit would be ...

Fixes: c02db8c6290b (rtnetlink: make SR-IOV VF interface symmetric)

... since in ebc08a6f47ee, these members were part of ifla_policy[]
which has been validated (if we ignore the fact that it was NLA_BINARY).

So, commit c02db8c6290b moved it into a nested attribute (IFLA_VF_INFO)
where we indeed don't do further validation. Imho, we should pass the
parsed attribute table from nla_parse_nested() down into do_setvfinfo(),
something like the below; I can give it a test run on my ixgbe.

Cheers,
Daniel

 net/core/rtnetlink.c | 184 ++-
 1 file changed, 94 insertions(+), 90 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 01ced4a..7d63551 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1328,10 +1328,6 @@ static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
[IFLA_INFO_SLAVE_DATA]  = { .type = NLA_NESTED },
 };

-static const struct nla_policy ifla_vfinfo_policy[IFLA_VF_INFO_MAX+1] = {
-   [IFLA_VF_INFO]  = { .type = NLA_NESTED },
-};
-
 static const struct nla_policy ifla_vf_policy[IFLA_VF_MAX+1] = {
[IFLA_VF_MAC]   = { .len = sizeof(struct ifla_vf_mac) },
[IFLA_VF_VLAN]  = { .len = sizeof(struct ifla_vf_vlan) },
@@ -1488,96 +1484,98 @@ static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[])
return 0;
 }

-static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
+static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
 {
-   int rem, err = -EINVAL;
-   struct nlattr *vf;
const struct net_device_ops *ops = dev->netdev_ops;
+   int err = -EINVAL;

-   nla_for_each_nested(vf, attr, rem) {
-   switch (nla_type(vf)) {
-   case IFLA_VF_MAC: {
-   struct ifla_vf_mac *ivm;
-   ivm = nla_data(vf);
-   err = -EOPNOTSUPP;
-   if (ops->ndo_set_vf_mac)
-   err = ops->ndo_set_vf_mac(dev, ivm->vf,
- ivm->mac);
-   break;
-   }
-   case IFLA_VF_VLAN: {
-   struct ifla_vf_vlan *ivv;
-   ivv = nla_data(vf);
-   err = -EOPNOTSUPP;
-   if (ops->ndo_set_vf_vlan)
-   err = ops->ndo_set_vf_vlan(dev, ivv->vf,
-  ivv->vlan,
-  ivv->qos);
-   break;
-   }
-   case IFLA_VF_TX_RATE: {
-   struct ifla_vf_tx_rate *ivt;
-   struct ifla_vf_info ivf;
-   ivt = nla_data(vf);
-   err = -EOPNOTSUPP;
-   if (ops->ndo_get_vf_config)
-   err = ops->ndo_get_vf_config(dev, ivt->vf,
-&ivf);
-   if (err)
-   break;
-   err = -EOPNOTSUPP;
-   if (ops->ndo_set_vf_rate)
-   err = ops->ndo_set_vf_rate(dev, ivt->vf,
-  ivf.min_tx_rate,
-  ivt->rate);
-   break;
-   }
-   case IFLA_VF_RATE: {
-   struct ifla_vf_rate *ivt;
-   ivt = nla_data(vf);
-   err = -EOPNOTSUPP;
-   if (ops->ndo_set_vf_rate)
-   err = ops->ndo_set_vf_rate(dev, ivt->vf,
-  ivt->min_tx_rate,
-  ivt->max_tx_rate);
-   break;
-   }
-   case IFLA_VF_SPOOFCHK: {
-   struct ifla_vf_spoofchk *ivs;
-   ivs = nla_data(vf);
-   err = -EOPNOTSUPP;
-   if (ops->ndo_set_vf_spoofchk)
-   err = ops->ndo_set_vf_spoofchk(dev, ivs->vf,
-  ivs->setting);
-   break;
-   }
-   case IFLA_VF_LINK_STATE: {
-   struct ifla_vf_link_state *ivl;
-   ivl = nla_data(vf);
-   err = -EOPNOTSUPP;
-   if (ops->ndo_set_vf_link_state)
-   err = 
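
The idea Daniel describes, validating each nested attribute against a policy
and then working from the parsed table instead of re-walking the raw
attributes, can be sketched in userspace as follows. Everything here is a
hypothetical model: the model_* names, the struct layouts and the -EINVAL
return value are assumptions for illustration, not the netlink API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum { MODEL_VF_UNSPEC, MODEL_VF_MAC, MODEL_VF_VLAN, MODEL_VF_COUNT };
#define MODEL_VF_MAX (MODEL_VF_COUNT - 1)

/* Assumed payload layouts, used only to derive minimum lengths. */
struct model_vf_mac  { uint32_t vf; uint8_t mac[32]; };
struct model_vf_vlan { uint32_t vf, vlan, qos; };

struct model_attr {
	int type;	/* attribute type as sent by userspace */
	size_t len;	/* payload length as sent by userspace */
};

/* Policy: minimum payload length per attribute type. */
static const size_t model_vf_policy_len[MODEL_VF_MAX + 1] = {
	[MODEL_VF_MAC]  = sizeof(struct model_vf_mac),
	[MODEL_VF_VLAN] = sizeof(struct model_vf_vlan),
};

/* Fills tb[] with validated attributes; rejects payloads that are too short. */
static int model_parse_nested(const struct model_attr *attrs, size_t n,
			      const struct model_attr *tb[MODEL_VF_MAX + 1])
{
	for (int i = 0; i <= MODEL_VF_MAX; i++)
		tb[i] = NULL;

	for (size_t i = 0; i < n; i++) {
		int type = attrs[i].type;

		if (type <= 0 || type > MODEL_VF_MAX)
			continue;	/* unknown types are skipped in this model */
		if (attrs[i].len < model_vf_policy_len[type])
			return -EINVAL;	/* too short for the declared struct */
		tb[type] = &attrs[i];
	}
	return 0;
}
```

A setter that only reads tb[MODEL_VF_MAC] after a successful parse never sees
a payload shorter than the struct it casts to, which is the check that was
skipped while the policy went unused.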

Re: [PATCH] ax88179_178a: add reset functionality in reset_resume

2015-07-01 Thread Vivek Kumar Bhagat
Hello David,

I have configured my email client, and the patch below no longer has the
indentation issue of tabs being converted into spaces. I hope it can be accepted.

Thanks,
Vivek

--- Original Message ---
Sender : Vivek Kumar Bhagat vivek.bha...@samsung.com Chief Engineer/SRI-Delhi-System S/W 1 Team/Samsung Electronics
Date : Jul 01, 2015 09:44 (GMT+05:30)
Title : [PATCH] ax88179_178a: add reset functionality in reset_resume

Without reset functionality in reset_resume, an iperf connection cannot be
established after suspend/resume, although ping works at the same time.
The iperf connection fails with checksum errors reported by tcpdump.

Calling the reset function inside reset_resume fixes the above bug.
We have verified it on ASIX based ST Lab and Cadyce dongles.

Signed-off-by: Vivek Kumar Bhagat 
Signed-off-by: Praveen Kumar 

---
drivers/net/usb/ax88179_178a.c |   14 +-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index e6338c1..00928c0 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -1630,6 +1630,18 @@ static int ax88179_stop(struct usbnet *dev)
return 0;
}

+static int ax88179_reset_resume(struct usb_interface *intf)
+{
+ struct usbnet *dev = usb_get_intfdata(intf);
+ int ret;
+
+ ret = ax88179_reset(dev);
+ if (ret < 0)
+ return ret;
+
+ return ax88179_resume(intf);
+}
+
static const struct driver_info ax88179_info = {
.description = "ASIX AX88179 USB 3.0 Gigabit Ethernet",
.bind = ax88179_bind,
@@ -1744,7 +1756,7 @@ static struct usb_driver ax88179_178a_driver = {
.probe = usbnet_probe,
.suspend = ax88179_suspend,
.resume = ax88179_resume,
- .reset_resume = ax88179_resume,
+ .reset_resume = ax88179_reset_resume,
.disconnect = usbnet_disconnect,
.supports_autosuspend = 1,
.disable_hub_initiated_lpm = 1,
-- 
1.7.9.5
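
The control flow added by the patch above is small enough to model in
userspace. The sketch below uses hypothetical names (model_dev, model_reset,
model_resume, model_reset_resume) and counters in place of real hardware
access; it only shows the ordering and the early return on a failed reset.

```c
#include <errno.h>

/* Hypothetical device state: counters stand in for hardware operations. */
struct model_dev {
	int fail_reset;	/* non-zero makes the reset step fail */
	int resets;	/* number of successful resets */
	int resumes;	/* number of resume calls */
};

static int model_reset(struct model_dev *dev)
{
	if (dev->fail_reset)
		return -EIO;
	dev->resets++;
	return 0;
}

static int model_resume(struct model_dev *dev)
{
	dev->resumes++;
	return 0;
}

/* Reset first, then fall through to the normal resume path. */
static int model_reset_resume(struct model_dev *dev)
{
	int ret = model_reset(dev);

	if (ret < 0)
		return ret;	/* do not resume a device that failed to reset */
	return model_resume(dev);
}
```

A plain resume leaves the reset counter untouched, which mirrors the original
situation where reset_resume pointed straight at the resume handler.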

Re: [RFC] virtio_net: Adding tx_timeout function.

2015-07-01 Thread Michael S. Tsirkin
On Wed, Jun 24, 2015 at 10:31:09PM -0300, Julio Faracco wrote:
 2015-06-24 3:10 GMT-03:00 Michael S. Tsirkin m...@redhat.com:
 
  On Tue, Jun 23, 2015 at 10:44:29PM -0300, Julio Faracco wrote:
   The virtio_net paravirtualized driver does not have a tx_timeout() function to
   guarantee that the driver will recover properly after receiving a timeout
   during the transmission of a packet. This patch adds this feature and raises a
   timeout exception after 5 * HZ. Based on some tests, this is the best
   value to use here.
  
   Signed-off-by: Julio Faracco jcfara...@gmail.com
   Cc: Jason Wang jasow...@redhat.com
 
  Looks like a bunch of locks and flushes are missing in this patch.  IMHO
  that's just too painful with current hardware.  IMO the right thing to
  do here is to add ability to reset specific queues to hardware.
 
 
 I agree, Michael. This model is the default one resetting the device
 due to transmission timeout.
 To have a better performance, only some queues must be reset.

It's not a question of performance. You would need to write
a bunch of code anyway. Why not do it in the hypervisor
so guest can simply write into a register and reset
a ring?


BTW now that I think about it, this requires Jason's
patches that introduce the tx interrupt; otherwise a
packet will time out simply because no packets are being sent.
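
For context, the check that leads to a tx_timeout callback can be modelled
roughly as below. This is a sketch under assumptions: the watchdog is taken to
fire only for a stopped queue whose last transmit is older than the timeout,
MODEL_HZ is an arbitrary tick rate, and all model_* names are hypothetical.

```c
#include <stdbool.h>

#define MODEL_HZ 250UL	/* hypothetical tick rate, for illustration only */

/* Wrap-safe "a is later than b" on a free-running tick counter. */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True when the watchdog would invoke the driver's timeout handler. */
static bool model_tx_timed_out(unsigned long now, unsigned long trans_start,
			       unsigned long timeo, bool queue_stopped)
{
	return queue_stopped && model_time_after(now, trans_start + timeo);
}
```

With a timeout of 5 * MODEL_HZ, a queue that stays stopped with no transmit
progress past that many ticks is reported as timed out, even across a counter
wrap; a queue that is not stopped never is.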


   ---
    drivers/net/virtio_net.c |   69 +-
    1 file changed, 68 insertions(+), 1 deletion(-)
  
   diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
   index 63c7810..75ac45c 100644
   --- a/drivers/net/virtio_net.c
   +++ b/drivers/net/virtio_net.c
   @@ -135,6 +135,9 @@ struct virtnet_info {
 /* Work struct for config space updates */
 struct work_struct config_work;
  
   + /* Work struct for resetting the virtio-net driver. */
   + struct work_struct reset_task;
   +
 /* Does the affinity hint is set for virtqueues? */
 bool affinity_hint_set;
  
   @@ -1394,6 +1397,18 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
 return 0;
}
  
   +static void virtnet_tx_timeout(struct net_device *dev)
   +{
   + struct virtnet_info *vi = netdev_priv(dev);
   +
   + dev_warn(&dev->dev, "TX Timeout exception with latency: %ld\n",
   +  jiffies - dev_trans_start(dev));
   +
   + schedule_work(&vi->reset_task);
 
  What if after this triggers user does something
  to the device (e.g. attempts to remove it)?
  Or if a packet is transmitted or used?
 
 At some point, this work must be canceled.
 Yes, you are right. Specially, when the driver is being removed.
 
   +}
   +
   +static void virtnet_reset_task(struct work_struct *work);
   +
static const struct net_device_ops virtnet_netdev = {
 .ndo_open= virtnet_open,
 .ndo_stop= virtnet_close,
   @@ -1405,6 +1420,7 @@ static const struct net_device_ops virtnet_netdev = 
   {
 .ndo_get_stats64 = virtnet_stats,
 .ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
 .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
   + .ndo_tx_timeout  = virtnet_tx_timeout,
#ifdef CONFIG_NET_POLL_CONTROLLER
 .ndo_poll_controller = virtnet_netpoll,
#endif
   @@ -1750,6 +1766,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 dev->netdev_ops = &virtnet_netdev;
 dev->features = NETIF_F_HIGHDMA;
  
   + dev->watchdog_timeo = 5 * HZ;
 dev->ethtool_ops = &virtnet_ethtool_ops;
 SET_NETDEV_DEV(dev, &vdev->dev);
  
   @@ -1811,6 +1828,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 }
  
 INIT_WORK(&vi->config_work, virtnet_config_changed_work);
   + INIT_WORK(&vi->reset_task, virtnet_reset_task);
  
 /* If we can receive ANY GSO packets, we must allocate large ones. 
   */
 if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
   @@ -1891,7 +1909,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 netif_carrier_on(dev);
 }
  
   - pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
   + pr_debug("virtio_net: registered device %s with %d RX and TX vq's\n",
  dev->name, max_queue_pairs);
  
 return 0;
   @@ -2001,6 +2019,55 @@ static int virtnet_restore(struct virtio_device 
   *vdev)
}
#endif
  
   +static void virtnet_reset_task(struct work_struct *work)
   +{
   + struct virtnet_info *vi =
   + container_of(work, struct virtnet_info, reset_task);
   + struct net_device *dev = vi->dev;
   + struct virtio_device *vdev = vi->vdev;
   + int err, i;
   +
   + flush_work(&vi->config_work);
   +
   + netif_device_detach(vi->dev);
   + cancel_delayed_work_sync(&vi->refill);
   +
   + if (netif_running(vi->dev)) {
   + for (i = 0; i < vi->max_queue_pairs; i++) {
   + napi_disable(&vi->rq[i].napi);
   +   

Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN

2015-07-01 Thread Eric Dumazet
On Tue, 2015-06-30 at 20:25 +0300, Nicolae Rosia wrote:
 Signed-off-by: Nicolae Rosia nicolae.ro...@certsign.ro
 ---
  drivers/net/ethernet/cadence/macb.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/net/ethernet/cadence/macb.c 
 b/drivers/net/ethernet/cadence/macb.c
 index caeb395..dbb5160 100644
 --- a/drivers/net/ethernet/cadence/macb.c
 +++ b/drivers/net/ethernet/cadence/macb.c
 @@ -2554,9 +2554,9 @@ static void at91ether_rx(struct net_device *dev)
   while (lp->rx_ring[lp->rx_tail].addr & MACB_BIT(RX_USED)) {
   p_recv = lp->rx_buffers + lp->rx_tail * AT91ETHER_MAX_RBUFF_SZ;
   pktlen = MACB_BF(RX_FRMLEN, lp->rx_ring[lp->rx_tail].ctrl);
 - skb = netdev_alloc_skb(dev, pktlen + 2);
 + skb = netdev_alloc_skb(dev, pktlen + NET_IP_ALIGN);
   if (skb) {
 - skb_reserve(skb, 2);
 + skb_reserve(skb, NET_IP_ALIGN);
   memcpy(skb_put(skb, pktlen), p_recv, pktlen);
  
   skb->protocol = eth_type_trans(skb, dev);

Then please use netdev_alloc_skb_ip_align(), so that you get rid of
skb_reserve()
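
For context, a minimal userspace sketch (not kernel code) of what the suggested helper bundles together: allocate the packet length plus NET_IP_ALIGN bytes, then reserve that pad so the IP header behind a 14-byte Ethernet header lands on a 4-byte boundary. The toy_ names are hypothetical, and NET_IP_ALIGN is assumed to be 2 here (the generic default; some architectures define it as 0).

```c
#include <stddef.h>
#include <stdlib.h>

#define ETH_HLEN     14 /* Ethernet header length in bytes */
#define NET_IP_ALIGN 2  /* assumed value; architecture dependent in the kernel */

/* Toy stand-in for an skb: head is the allocation, data the packet start. */
struct toy_buf {
	unsigned char *head;
	unsigned char *data;
};

/* Old pattern: allocate with explicit padding, then reserve it by hand. */
static int toy_alloc_then_reserve(struct toy_buf *b, size_t pktlen)
{
	b->head = malloc(pktlen + NET_IP_ALIGN);
	if (!b->head)
		return -1;
	b->data = b->head + NET_IP_ALIGN; /* the skb_reserve() step */
	return 0;
}

/* Suggested pattern: one helper hides both the padding and the reserve. */
static int toy_alloc_ip_align(struct toy_buf *b, size_t pktlen)
{
	return toy_alloc_then_reserve(b, pktlen);
}

/* Offset of the IP header from the start of the allocation. */
static size_t toy_ip_header_offset(const struct toy_buf *b)
{
	return (size_t)(b->data - b->head) + ETH_HLEN;
}
```

With a 2-byte pad the IP header starts at offset 16, which is why the caller no longer needs its own skb_reserve() once the helper is used.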





macb: zynq: why is SG disabled?

2015-07-01 Thread Nicolae Rosia

Hello,

After reading the GEM part of Zynq7000 Technical Reference Manual [0], I 
think that SG should be supported.

Is there a reason why SG is disabled in macb for Zynq?

Best regards,
Nicolae Rosia


Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN

2015-07-01 Thread Nicolae Rosia
On Wed, Jul 1, 2015 at 6:33 PM, Eric Dumazet eric.duma...@gmail.com wrote:
[...]
 This only matters in terms of few nano seconds per packet, so for 10Gb+
 NIC anyway. Absolute noise for most NIC.

I'll give it a try and benchmark.
I achieved a huge speedup by moving TX into napi [0], but my hardware doesn't
support multiple TX queues and I can't test that situation.

 Yes, but main question is : Do you have the hardware to test your
 changes ?
Yes, I have a Xilinx ZC706 board with a Zynq7 XC7Z045 processor [1]

[0] https://patchwork.ozlabs.org/patch/488949/
[1] http://www.xilinx.com/products/boards-and-kits/ek-z7-zc706-g.html


Re: [PATCH v4.2-rc1] printk: make extended printk support conditional on netconsole

2015-07-01 Thread Petr Mladek
On Mon 2015-06-29 19:31:40, Tejun Heo wrote:
 6fe29354befe (printk: implement support for extended console
 drivers) implemented extended printk support for extended netconsole.
 The code added was miniscule but it added static 8k buffer
 unconditionally unnecessarily bloating the kernel for cases where
 extended netconsole is not used.
 
 This patch introduces CONFIG_PRINTK_CON_EXTENDED which is selected by
 CONFIG_NETCONSOLE.  If the config option is not set, extended printk
 support is compiled out along with the static buffer.
 
 Verified 8k reduction in vmlinux bss when !CONFIG_NETCONSOLE.
 
 Signed-off-by: Tejun Heo t...@kernel.org
 Reported-and-suggested-by: Geert Uytterhoeven ge...@linux-m68k.org
 ---
 Linus, Andrew.
 
 This removes an unnecessary 8k bss bloat introduced during v4.2-rc1
 merge window on certain configs.  The original patch was routed
 through -mm.  How should this be routed?
 
 Thanks.
 
  drivers/net/Kconfig|1 +
  init/Kconfig   |3 +++
  kernel/printk/printk.c |   33 +
  3 files changed, 33 insertions(+), 4 deletions(-)

[...]

 --- a/kernel/printk/printk.c
 +++ b/kernel/printk/printk.c
[...]
 @@ -2561,9 +2584,11 @@ void register_console(struct console *ne
   console_drivers->next = newcon;
   }
  
 - if (newcon->flags & CON_EXTENDED)
 - if (!nr_ext_console_drivers++)
 + if (newcon->flags & CON_EXTENDED) {
 + if (!nr_ext_console_drivers)
   pr_info("printk: continuation disabled due to ext consoles, expect more fragments in /dev/kmsg\n");
 + inc_nr_ext_console_drivers();

We should handle also the situation when CON_EXTENDED is set
and CONFIG_PRINTK_CON_EXTENDED is not set by mistake. Otherwise,
we will not increment nr_ext_console_drivers here => ext_text will
not be filled in console_unlock() => call_console_drivers()
will print nothing for the CON_EXTENDED console.

At least, I would print an error here. Something like.

#ifndef CONFIG_PRINTK_CON_EXTENDED
pr_err("The registered extended console will print nothing because the kernel is not compiled with PRINTK_CON_EXTENDED\n");
#endif

I wonder if there is a good identification of the console that can be printed.

Otherwise, it looks fine to me.

Best Regards,
Petr


Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN

2015-07-01 Thread Eric Dumazet
On Wed, 2015-07-01 at 17:29 +0300, Nicolae Rosia wrote:
 On 07/01/2015 04:44 PM, Eric Dumazet wrote:
  I really doubt this adapter can process millions of packets per second ?
 I was suggesting this since I was taking into consideration the comment 
 from skbuff.c, we can save several CPU cycles by avoiding having to 
 disable and re-enable IRQs.
 Is there a downside to this?

This only matters in terms of a few nanoseconds per packet, so for 10Gb+
NICs anyway. Absolute noise for most NICs.

 
 
  I would rather enable GRO, it would be more useful.
 I had no idea what GRO is, so I have read about it [0] and looked at a 
 couple of drivers which use it. They all seem to
 replace netif_receive_skb with napi_gro_receive and when there are no 
 more packets in napi_pool they call napi_gro_flush.
 Is it that simple?

Yes, but the main question is: do you have the hardware to test your
changes?




Re: [PULL] virtio/vhost: cross endian support

2015-07-01 Thread Linus Torvalds
On Wed, Jul 1, 2015 at 12:02 PM, Linus Torvalds
torva...@linux-foundation.org wrote:

 Doing a unconditional byte swap is faster and simpler than the crazy
 conditionals.

Unconditional endianness not only makes for simpler and faster code,
it also ends up being easier to debug and add things like type
annotations for sparse.

Linus


Re: [PULL] virtio/vhost: cross endian support

2015-07-01 Thread Linus Torvalds
On Wed, Jul 1, 2015 at 2:31 AM, Michael S. Tsirkin m...@redhat.com wrote:
 virtio/vhost: cross endian support

Ugh. Does this really have to be dynamic?

Can't virtio do the sane thing, and just use a _fixed_ endianness?

Doing an unconditional byte swap is faster and simpler than the crazy
conditionals. That's true regardless of endianness, but gets to be
even more so if the fixed endianness is little-endian, since BE is
not-so-slowly fading from the world.
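
To illustrate the point, a small sketch contrasting the two approaches on a 16-bit field. This is not the virtio API; the toy_ helper names are hypothetical. With a fixed little-endian layout, every access assembles the value the same way. With a dynamic layout, every access first has to consult a per-device flag.

```c
#include <stdint.h>

/* Fixed little-endian layout: always assemble the value the same way. */
static uint16_t toy_le16_load(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Dynamic layout: every access has to consult a per-device flag first. */
static uint16_t toy_dyn16_load(const unsigned char *p, int is_little_endian)
{
	if (is_little_endian)
		return (uint16_t)(p[0] | (p[1] << 8));
	return (uint16_t)((p[0] << 8) | p[1]);
}
```

The fixed variant has no branch and a single meaning for the bytes, which is also what makes it easy to annotate for a static checker.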

   Linus


[PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-01 Thread Alex Gartrell
If we early-demux bind a TCP_TIMEWAIT socket to an skb and then orphan it
(as we need to do in the ipvs forwarding case), sock_wfree and sock_rfree
are going to reach into the inet_timewait_sock and mess with fields that
don't exist.

Signed-off-by: Alex Gartrell agartr...@fb.com
---
 net/core/sock.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/core/sock.c b/net/core/sock.c
index 1e1fe9a..b37328f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1620,6 +1620,9 @@ void sock_wfree(struct sk_buff *skb)
struct sock *sk = skb->sk;
unsigned int len = skb->truesize;
 
+   if (sk->sk_state == TCP_TIME_WAIT)
+   return;
+
if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE)) {
/*
 * Keep a reference on sk_wmem_alloc, this will be released
@@ -1665,6 +1668,9 @@ void sock_rfree(struct sk_buff *skb)
struct sock *sk = skb->sk;
unsigned int len = skb->truesize;
 
+   if (sk->sk_state == TCP_TIME_WAIT)
+   return;
+
atomic_sub(len, &sk->sk_rmem_alloc);
sk_mem_uncharge(sk, len);
 }
-- 
Alex Gartrell agartr...@fb.com



Re: mlx4: failed to allocate default counter port 1

2015-07-01 Thread Or Gerlitz
On Wed, Jul 1, 2015 at 5:18 PM, Sebastian Ott seb...@linux.vnet.ibm.com wrote:
 OK, using this patch it worked:

yep, I forgot to recap err to zero.

By "it worked" you mean the VF is live and kicking, all functionality
you had before the 4.2 merge window is back again?


Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-01 Thread David Miller
From: Alex Gartrell agartr...@fb.com
Date: Wed, 1 Jul 2015 13:13:09 -0700

 If we early-demux bind a TCP_TIMEWAIT socket to an skb and then orphan it
 (as we need to do in the ipvs forwarding case), sock_wfree and sock_rfree
 are going to reach into the inet_timewait_sock and mess with fields that
 don't exist.
 
 Signed-off-by: Alex Gartrell agartr...@fb.com

If we're forwarding, we should not find a local socket, period.

We should only match sockets for locally destined packets.

So I'd say that the state in which you say this can occur is illegal.


[PATCH V2] cdc_ncm: Add support for moving NDP to end of NCM frame

2015-07-01 Thread Enrico Mioso
NCM specs are not actually mandating a specific position in the frame for
the NDP (Network Datagram Pointer). However, some Huawei devices will
ignore our aggregates if it is not placed after the datagrams it points
to. Add support for doing just this, in a per-device configurable way.
While at it, update NCM subdrivers, disabling this functionality in all of
them, except in huawei_cdc_ncm where it is enabled instead.
We aren't making any distinction between different Huawei NCM devices,
based on what the vendor driver does. Standard NCM devices are left
unaffected: if they are compliant, they should always be usable, but we
stay on the safe side anyway.

This change has been tested and working with a Huawei E3131 device (which
works regardless of NDP position) and an E3372 device (which mandates NDP
to be after indexed datagrams).

V1->V2:
- corrected wrong NDP acronym definition
- fixed possible NULL pointer dereference
- patch cleanup
- rewrote description and commit subject to be more clear

Signed-off-by: Enrico Mioso mrkiko...@gmail.com
---
 drivers/net/usb/cdc_mbim.c   |  2 +-
 drivers/net/usb/cdc_ncm.c| 50 
 drivers/net/usb/huawei_cdc_ncm.c |  7 --
 include/linux/usb/cdc_ncm.h  |  7 +-
 4 files changed, 57 insertions(+), 9 deletions(-)

diff --git a/drivers/net/usb/cdc_mbim.c b/drivers/net/usb/cdc_mbim.c
index e4b7a47..efc18e0 100644
--- a/drivers/net/usb/cdc_mbim.c
+++ b/drivers/net/usb/cdc_mbim.c
@@ -158,7 +158,7 @@ static int cdc_mbim_bind(struct usbnet *dev, struct 
usb_interface *intf)
if (!cdc_ncm_comm_intf_is_mbim(intf->cur_altsetting))
goto err;
 
-   ret = cdc_ncm_bind_common(dev, intf, data_altsetting);
+   ret = cdc_ncm_bind_common(dev, intf, data_altsetting, 0);
if (ret)
goto err;
 
diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 8067b8f..4a27673 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -684,10 +684,12 @@ static void cdc_ncm_free(struct cdc_ncm_ctx *ctx)
ctx->tx_curr_skb = NULL;
}
 
+   kfree(ctx->delayed_ndp16);
+
kfree(ctx);
 }
 
-int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 
data_altsetting)
+int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 
data_altsetting, int drvflags)
 {
const struct usb_cdc_union_desc *union_desc = NULL;
struct cdc_ncm_ctx *ctx;
@@ -855,6 +857,17 @@ advance:
/* finish setting up the device specific data */
cdc_ncm_setup(dev);
 
+   /* Device-specific flags */
+   ctx->drvflags = drvflags;
+
+   /* Allocate the delayed NDP if needed. */
+   if (ctx->drvflags & CDC_NCM_FLAG_NDP_TO_END) {
+   ctx->delayed_ndp16 = kzalloc(ctx->max_ndp_size, GFP_KERNEL);
+   if (!ctx->delayed_ndp16)
+   goto error2;
+   dev_info(&intf->dev, "NDP will be placed at end of frame for this device.");
+   }
+
/* override ethtool_ops */
dev->net->ethtool_ops = &cdc_ncm_ethtool_ops;
 
@@ -954,8 +967,11 @@ static int cdc_ncm_bind(struct usbnet *dev, struct 
usb_interface *intf)
if (cdc_ncm_select_altsetting(intf) != CDC_NCM_COMM_ALTSETTING_NCM)
return -ENODEV;
 
-   /* The NCM data altsetting is fixed */
-   ret = cdc_ncm_bind_common(dev, intf, CDC_NCM_DATA_ALTSETTING_NCM);
+   /* The NCM data altsetting is fixed, so we hard-coded it.
+* Additionally, generic NCM devices are assumed to accept arbitrarily
+* placed NDP.
+*/
+   ret = cdc_ncm_bind_common(dev, intf, CDC_NCM_DATA_ALTSETTING_NCM, 0);
 
/*
 * We should get an event when network connection is connected or
@@ -986,6 +1002,14 @@ static struct usb_cdc_ncm_ndp16 *cdc_ncm_ndp(struct 
cdc_ncm_ctx *ctx, struct sk_
struct usb_cdc_ncm_nth16 *nth16 = (void *)skb->data;
size_t ndpoffset = le16_to_cpu(nth16->wNdpIndex);
 
+   /* If NDP should be moved to the end of the NCM package, we can't follow the
+   * NTH16 header as we would normally do. NDP isn't written to the SKB yet, and
+   * the wNdpIndex field in the header is actually not consistent with reality. It will be later.
+   */
+   if (ctx->drvflags & CDC_NCM_FLAG_NDP_TO_END)
+   if (ctx->delayed_ndp16->dwSignature == sign)
+   return ctx->delayed_ndp16;
+
/* follow the chain of NDPs, looking for a match */
while (ndpoffset) {
ndp16 = (struct usb_cdc_ncm_ndp16 *)(skb->data + ndpoffset);
@@ -995,7 +1019,8 @@ static struct usb_cdc_ncm_ndp16 *cdc_ncm_ndp(struct 
cdc_ncm_ctx *ctx, struct sk_
}
 
/* align new NDP */
-   cdc_ncm_align_tail(skb, ctx->tx_ndp_modulus, 0, ctx->tx_max);
+   if (!(ctx->drvflags & CDC_NCM_FLAG_NDP_TO_END))
+   cdc_ncm_align_tail(skb, ctx->tx_ndp_modulus, 0, ctx->tx_max);
 
/* verify that there 

[PATCH 1/2 net-next] tcp: reduce cwnd if retransmit is lost in CA_Loss

2015-07-01 Thread Yuchung Cheng
If the retransmission in CA_Loss is lost again, we should not
continue to slow start or raise cwnd in congestion avoidance mode.
Instead we should enter fast recovery and use PRR to reduce cwnd,
following the principle in RFC5681:

"... or the loss of a retransmission, should be taken as two
 indications of congestion and, therefore, cwnd (and ssthresh) MUST
 be lowered twice in this case."

This is especially important to reduce loss when the CA_Loss
state was caused by a traffic policer dropping the entire inflight.
The CA_Loss state has a problem where a loss of L packets causes the
sender to send a burst of L packets. So a policer that's dropping
most packets in a given RTT can cause a huge retransmit storm. By
contrast, PRR includes logic to bound the number of outbound packets
that result from a given ACK. So switching to CA_Recovery on lost
retransmits in CA_Loss avoids this retransmit storm problem when
in CA_Loss.
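
The state decision described above can be sketched as a small predicate. This is a simplified model of the check the diff below changes in tcp_fastretrans_alert(), not the kernel code itself: the toy_ names are hypothetical, and only the FLAG_LOST_RETRANS value is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; only FLAG_LOST_RETRANS's value comes from the patch. */
enum toy_ca_state { TOY_CA_OPEN, TOY_CA_LOSS };
#define FLAG_LOST_RETRANS 0x80

/*
 * After loss processing, keep going (and possibly enter fast recovery)
 * if the state went back to Open or this ACK marked a retransmit lost.
 */
static bool toy_loss_continues(enum toy_ca_state state, int flag)
{
	return state == TOY_CA_OPEN || (flag & FLAG_LOST_RETRANS) != 0;
}
```

Before the change only the first condition applied, so a lost retransmit while still in CA_Loss returned early and never reached the recovery path.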

Signed-off-by: Yuchung Cheng ych...@google.com
Signed-off-by: Nandita Dukkipati nandi...@google.com
Signed-off-by: Neal Cardwell ncardw...@google.com
---
 net/ipv4/tcp_input.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 684f095..923e0e5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -109,6 +109,7 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
 #define FLAG_SYN_ACKED 0x10 /* This ACK acknowledged SYN.  
*/
 #define FLAG_DATA_SACKED   0x20 /* New SACK.   
*/
 #define FLAG_ECE   0x40 /* ECE in this ACK 
*/
+#define FLAG_LOST_RETRANS  0x80 /* This ACK marks some retransmission lost 
*/
 #define FLAG_SLOWPATH  0x100 /* Do not skip RFC checks for window 
update.*/
 #define FLAG_ORIG_SACK_ACKED   0x200 /* Never retransmitted data are (s)acked  
*/
 #define FLAG_SND_UNA_ADVANCED  0x400 /* Snd_una was changed (!= 
FLAG_DATA_ACKED) */
@@ -1037,7 +1038,7 @@ static bool tcp_is_sackblock_valid(struct tcp_sock *tp, 
bool is_dsack,
  * highest SACK block). Also calculate the lowest snd_nxt among the remaining
  * retransmitted skbs to avoid some costly processing per ACKs.
  */
-static void tcp_mark_lost_retrans(struct sock *sk)
+static void tcp_mark_lost_retrans(struct sock *sk, int *flag)
 {
const struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
@@ -1078,7 +1079,7 @@ static void tcp_mark_lost_retrans(struct sock *sk)
if (after(received_upto, ack_seq)) {
TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
tp->retrans_out -= tcp_skb_pcount(skb);
-
+   *flag |= FLAG_LOST_RETRANS;
tcp_skb_mark_lost_uncond_verify(tp, skb);
NET_INC_STATS_BH(sock_net(sk), 
LINUX_MIB_TCPLOSTRETRANSMIT);
} else {
@@ -1818,7 +1819,7 @@ advance_sp:
((inet_csk(sk)->icsk_ca_state != TCP_CA_Loss) || tp->undo_marker))
tcp_update_reordering(sk, tp->fackets_out - state->reord, 0);
 
-   tcp_mark_lost_retrans(sk);
+   tcp_mark_lost_retrans(sk, &state->flag);
tcp_verify_left_out(tp);
 out:
 
@@ -2676,7 +2677,7 @@ static void tcp_enter_recovery(struct sock *sk, bool 
ece_ack)
tp->prior_ssthresh = 0;
tcp_init_undo(tp);
 
-   if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR) {
+   if (!tcp_in_cwnd_reduction(sk)) {
if (!ece_ack)
tp->prior_ssthresh = tcp_current_ssthresh(sk);
tcp_init_cwnd_reduction(sk);
@@ -2852,9 +2853,10 @@ static void tcp_fastretrans_alert(struct sock *sk, const 
int acked,
break;
case TCP_CA_Loss:
tcp_process_loss(sk, flag, is_dupack);
-   if (icsk->icsk_ca_state != TCP_CA_Open)
+   if (icsk->icsk_ca_state != TCP_CA_Open &&
+   !(flag & FLAG_LOST_RETRANS))
return;
-   /* Fall through to processing in Open state. */
+   /* Change state if cwnd is undone or retransmits are lost */
default:
if (tcp_is_reno(tp)) {
if (flag & FLAG_SND_UNA_ADVANCED)
-- 
2.4.3.573.g4eafbef



[PATCH 2/2 net-next] tcp: PRR uses CRB mode by default and SS mode conditionally

2015-07-01 Thread Yuchung Cheng
PRR slow start is often too aggressive especially when drops are
caused by traffic policers. The policers mainly use token bucket
to enforce the rate so sending (twice) faster than the delivery
rate causes excessive drops.

This patch changes PRR to the conservative reduction bound
(CRB) mode in RFC 6937 by default. CRB follows the packet
conservation rule to send at most the delivery rate by default.

But if many packets are lost and the pipe is empty, CRB may take N
round trips to repair N losses. We conditionally turn on slow start
mode if all these conditions are met to speed up the recovery:

  1) on the second round or later in recovery
  2) retransmission sent in the previous round is delivered on this ACK
  3) no retransmission is marked lost on this ACK

By using packet conservation by default, this change reduces the loss
retransmits significantly on networks that deploy traffic policers,
up to 20% reduction of overall loss rate.
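
The resulting send-count rule can be sketched as a pure function. This is a simplified userspace model of the three branches in the diff below, not the kernel implementation: the toy_ names and the flattened parameter list are hypothetical, and the two booleans stand in for FLAG_RETRANS_DATA_ACKED and FLAG_LOST_RETRANS.

```c
#include <stdbool.h>
#include <stdint.h>

static int toy_min(int a, int b) { return a < b ? a : b; }
static int toy_max(int a, int b) { return a > b ? a : b; }

/*
 * Packets to send on this ACK. prr_delivered is assumed to already
 * include newly_acked; prior_cwnd must be non-zero.
 */
static int toy_prr_sndcnt(int ssthresh, int prior_cwnd, int in_flight,
			  int prr_delivered, int prr_out, int newly_acked,
			  bool retrans_acked, bool lost_retrans,
			  bool fast_rexmit)
{
	int delta = ssthresh - in_flight;
	int sndcnt;

	if (delta < 0) {
		/* Above ssthresh: spread the reduction across the RTT. */
		uint64_t dividend = (uint64_t)ssthresh * prr_delivered +
				    prior_cwnd - 1;
		sndcnt = (int)(dividend / prior_cwnd) - prr_out;
	} else if (retrans_acked && !lost_retrans) {
		/* Retransmits delivered, none lost: slow start to ssthresh. */
		sndcnt = toy_min(delta,
				 toy_max(prr_delivered - prr_out, newly_acked) + 1);
	} else {
		/* Default: conservative reduction bound (packet conservation). */
		sndcnt = toy_min(delta, newly_acked);
	}
	return toy_max(sndcnt, fast_rexmit ? 1 : 0);
}
```

For example, with ssthresh 10, 6 packets in flight and 2 packets newly delivered, the default branch allows 2 packets, while the slow start branch (with prr_delivered 5 and prr_out 2) allows 4.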

Signed-off-by: Yuchung Cheng ych...@google.com
Signed-off-by: Nandita Dukkipati nandi...@google.com
Signed-off-by: Neal Cardwell ncardw...@google.com
---
 net/ipv4/tcp_input.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 923e0e5..ad1482d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2476,15 +2476,14 @@ static bool tcp_try_undo_loss(struct sock *sk, bool 
frto_undo)
return false;
 }
 
-/* The cwnd reduction in CWR and Recovery use the PRR algorithm
- * 
https://datatracker.ietf.org/doc/draft-ietf-tcpm-proportional-rate-reduction/
+/* The cwnd reduction in CWR and Recovery uses the PRR algorithm in RFC 6937.
  * It computes the number of packets to send (sndcnt) based on packets newly
  * delivered:
  *   1) If the packets in flight is larger than ssthresh, PRR spreads the
  * cwnd reductions across a full RTT.
- *   2) If packets in flight is lower than ssthresh (such as due to excess
- * losses and/or application stalls), do not perform any further cwnd
- * reductions, but instead slow start up to ssthresh.
+ *   2) Otherwise PRR uses packet conservation to send as much as delivered.
+ *  But when the retransmits are acked without further losses, PRR
+ *  slow starts cwnd up to ssthresh to speed up the recovery.
  */
 static void tcp_init_cwnd_reduction(struct sock *sk)
 {
@@ -2501,7 +2500,7 @@ static void tcp_init_cwnd_reduction(struct sock *sk)
 }
 
 static void tcp_cwnd_reduction(struct sock *sk, const int prior_unsacked,
-  int fast_rexmit)
+  int fast_rexmit, int flag)
 {
struct tcp_sock *tp = tcp_sk(sk);
int sndcnt = 0;
@@ -2510,16 +2509,18 @@ static void tcp_cwnd_reduction(struct sock *sk, const 
int prior_unsacked,
 (tp->packets_out - tp->sacked_out);
 
tp->prr_delivered += newly_acked_sacked;
-   if (tcp_packets_in_flight(tp) > tp->snd_ssthresh) {
+   if (delta < 0) {
u64 dividend = (u64)tp->snd_ssthresh * tp->prr_delivered +
   tp->prior_cwnd - 1;
sndcnt = div_u64(dividend, tp->prior_cwnd) - tp->prr_out;
-   } else {
+   } else if ((flag & FLAG_RETRANS_DATA_ACKED) &&
+  !(flag & FLAG_LOST_RETRANS)) {
sndcnt = min_t(int, delta,
   max_t(int, tp->prr_delivered - tp->prr_out,
 newly_acked_sacked) + 1);
+   } else {
+   sndcnt = min(delta, newly_acked_sacked);
}
-
sndcnt = max(sndcnt, (fast_rexmit ? 1 : 0));
tp->snd_cwnd = tcp_packets_in_flight(tp) + sndcnt;
 }
@@ -2580,7 +2581,7 @@ static void tcp_try_to_open(struct sock *sk, int flag, 
const int prior_unsacked)
if (inet_csk(sk)->icsk_ca_state != TCP_CA_CWR) {
tcp_try_keep_open(sk);
} else {
-   tcp_cwnd_reduction(sk, prior_unsacked, 0);
+   tcp_cwnd_reduction(sk, prior_unsacked, 0, flag);
}
 }
 
@@ -2737,7 +2738,7 @@ static void tcp_process_loss(struct sock *sk, int flag, 
bool is_dupack)
 
 /* Undo during fast recovery after partial ACK. */
 static bool tcp_try_undo_partial(struct sock *sk, const int acked,
-const int prior_unsacked)
+const int prior_unsacked, int flag)
 {
struct tcp_sock *tp = tcp_sk(sk);
 
@@ -2753,7 +2754,7 @@ static bool tcp_try_undo_partial(struct sock *sk, const 
int acked,
 * mark more packets lost or retransmit more.
 */
if (tp->retrans_out) {
-   tcp_cwnd_reduction(sk, prior_unsacked, 0);
+   tcp_cwnd_reduction(sk, prior_unsacked, 0, flag);
return true;
}
 
@@ -2840,7 +2841,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const 
int acked,
if 

[PATCH 0/2 net-next] tcp: reducing lost retransmits in recovery

2015-07-01 Thread Yuchung Cheng
This patch series reduces lost retransmits in recovery, in particular
when dealing with traffic policers. The main problem is that
slow start in recovery under policing can cause massive loss and
retransmit storms: any excess sending rate turns into drops. The
solution is to avoid doing slow start when a lost retransmit is
detected and use packet conservation instead.

On networks with traffic policers the patches have lowered the
TCP loss rates by ~20% from Google servers without latency regressions.

Yuchung Cheng (2):
  tcp: reduce cwnd if retransmit is lost in CA_Loss
  tcp: PRR uses CRB mode by default and SS mode conditionally

 net/ipv4/tcp_input.c | 43 +++
 1 file changed, 23 insertions(+), 20 deletions(-)

-- 
2.4.3.573.g4eafbef



Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-01 Thread Eric Dumazet
On Wed, Jul 1, 2015 at 11:14 PM, David Miller da...@davemloft.net wrote:
 From: Alex Gartrell agartr...@fb.com
 Date: Wed, 1 Jul 2015 13:13:09 -0700

 If we early-demux bind a TCP_TIMEWAIT socket to an skb and then orphan it
 (as we need to do in the ipvs forwarding case), sock_wfree and sock_rfree
 are going to reach into the inet_timewait_sock and mess with fields that
 don't exist.

 Signed-off-by: Alex Gartrell agartr...@fb.com

 If we're forwarding, we should not find a local socket, period.

 We should only match sockets for locally destined packets.

 So I'd say that the state in which you say this can occur is illegal.

Right, this patch is totally buggy.

A socket cannot change state to TCP_TIMEWAIT.

A new object is allocated and old one is removed from ehash, then
freed (rcu rules being applied)

Also sock_wfree() has nothing to do with early demux. It is for output
path skbs only.


Re: [PATCH] ax88179_178a: add reset functionality in reset_resume

2015-07-01 Thread David Miller
From: Vivek Kumar Bhagat vivek.bha...@samsung.com
Date: Wed, 01 Jul 2015 11:33:58 + (GMT)

 I configured my email client and below patch does not have indentation
 issue of converting tabs into spaces. I hope it should be accepted.

Patchwork is not recognizing your postings as a patch submission,
therefore it's not ending up in my queue of patches to handle.


Re: Wiring up direct socket calls on x86_32 Linux?

2015-07-01 Thread Tulio Magno Quites Machado Filho
Andy Lutomirski l...@amacapital.net writes:

 Hi all-

 sys_socketcall sucks.  If nothing else, it's impossible to filter with
 seccomp.  Should we wire up the real socket calls so that user code
 can (very slowly) start migrating?

 I think the list is:
  - socket
  - bind
  - connect
  - listen
  - accept4
  - getsockname
  - getpeername
  - socketpair
  - send
  - sendto
  - sendmsg
  - recv
  - recvfrom
  - recvmsg
  - shutdown
  - setsockopt

I guess you might want to follow the patch Raji sent today [1].

Her patch doesn't have all the syscalls you mentioned here, but has others too.
She will work to get a generic implementation for these functions.

[1] http://patchwork.sourceware.org/patch/7438/

-- 
Tulio Magno



Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN

2015-07-01 Thread Eric Dumazet
On Wed, 2015-07-01 at 18:53 +0300, Nicolae Rosia wrote:
 On Wed, Jul 1, 2015 at 6:33 PM, Eric Dumazet eric.duma...@gmail.com wrote:
 [...]
  This only matters in terms of few nano seconds per packet, so for 10Gb+
  NIC anyway. Absolute noise for most NIC.
 
 I'll give it a try and benchmark.
 I achieved a huge speedup by moving TX into napi [0], but my hardware doesn't
 support multiple TX queues and I can't test that situation.
 
  Yes, but main question is : Do you have the hardware to test your
  changes ?
 Yes, I have a Xilinx ZC706 board with a Zynq7 XC7Z045 processor [1]
 
 [0] https://patchwork.ozlabs.org/patch/488949/
 [1] http://www.xilinx.com/products/boards-and-kits/ek-z7-zc706-g.html

OK, then enabling GRO should be quite easy, given driver already has
most of it.

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index caeb39561567237261ac0d50befebad666cfbeb3..24a93c769caa5430ca61efe002b458fef7281e99 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -815,7 +815,7 @@ static int gem_rx(struct macb *bp, int budget)
   skb->data, 32, true);
 #endif
 
-   netif_receive_skb(skb);
+   napi_gro_receive(&bp->napi, skb);
}
 
gem_rx_refill(bp);
@@ -896,7 +896,7 @@ static int macb_rx_frame(struct macb *bp, unsigned int 
first_frag,
bp->stats.rx_bytes += skb->len;
netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
   skb->len, skb->csum);
-   netif_receive_skb(skb);
+   napi_gro_receive(&bp->napi, skb);
 
return 0;
 }




Re: [PATCH net] enic: fix issues in enic_poll

2015-07-01 Thread David Miller
From: Govindarajulu Varadarajan _gov...@gmx.com
Date: Wed,  1 Jul 2015 11:23:54 +0530

 Current code checks only rx work done to complete napi. It completely ignores
 tx work done. If we have only tx packets to clean and no rq work, we always
 napi complete instead of re-poll. Change this behavior to re-poll until
 tx work_done + rx work_done is not 0.

The existing TX behavior is correct, please do not change it.

You should never count TX work against the NAPI poll budget, it is
only for RX work.
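
The accounting rule above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the model_* names, the queue structure and the tx_limit parameter are hypothetical and are not taken from the enic driver or from the NAPI core.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a NAPI-style poll routine; none of
 * these names come from the enic driver. */
struct model_queue {
	int pending;	/* completions waiting to be cleaned */
};

struct poll_result {
	int work_done;	/* value handed back to the NAPI core */
	bool completed;	/* true when the model would call napi_complete() */
};

/* Clean up to 'limit' entries and report how many were cleaned. */
static int model_clean(struct model_queue *q, int limit)
{
	int n = q->pending < limit ? q->pending : limit;

	q->pending -= n;
	return n;
}

static struct poll_result model_poll(struct model_queue *tx,
				     struct model_queue *rx,
				     int budget, int tx_limit)
{
	struct poll_result res;

	/* TX completions are reclaimed but never charged to the budget. */
	(void)model_clean(tx, tx_limit);

	/* Only RX work is counted against the budget. */
	res.work_done = model_clean(rx, budget);

	/* Polling finishes only when RX used less than the full budget. */
	res.completed = res.work_done < budget;
	return res;
}
```

In this model a poll with 100 pending TX completions and 3 pending RX packets reports 3 units of work and completes, no matter how much TX cleaning was done.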


iproute 4.1.1

2015-07-01 Thread Stephen Hemminger
I am putting out iproute2 4.1.1 (i.e. stable) on Monday. It will include fixes for
MPLS and TIPC that are already in the latest master. Are there any other fixes that
people think are urgent enough to backport?


[PATCH] ipv6: Make MLD packets to only be processed locally.

2015-07-01 Thread Angga

Before commit daad151263cf334 ("ipv6: Make ipv6_is_mld() inline and use it
from ip6_mc_input()."), MLD packets were only processed locally. After the
change, a copy of each MLD packet goes through ip6_mr_input(), causing an
MRT6MSG_NOCACHE message to be generated to user space.

Make MLD packets be processed only locally again.

Fixes: daad151263cf334 ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().")

Signed-off-by: Hermin Anggawijaya hermin.anggawij...@alliedtelesis.co.nz
---

diff --git a/linux/net/ipv6/ip6_input.c.orig b/linux/net/ipv6/ip6_input.c
index f2e464e..57990c9 100644
--- a/linux/net/ipv6/ip6_input.c.orig
+++ b/linux/net/ipv6/ip6_input.c
@@ -331,10 +331,10 @@ int ip6_mc_input(struct sk_buff *skb)
 if (offset < 0)
 goto out;

-   if (!ipv6_is_mld(skb, nexthdr, offset))
-   goto out;
+   if (ipv6_is_mld(skb, nexthdr, offset))
+   deliver = true;

-   deliver = true;
+   goto out;
 }
 /* unknown RA - process it normally */
 }


Issue with active-backup mode bond and bridge

2015-07-01 Thread pengyi Peng(Yi)
It seems the kernel does not handle the combination of the bonding and bridge
modules well. I have a physical host with two NICs that are bonded together
(active-backup mode). Each NIC is connected to a separate L2 switch, and the
two L2 switches are connected to an L3 switch.

If the host only has the bond device, when I manually bring the active slave
down, bonding issues one or more gratuitous ARPs on the newly active slave.
One gratuitous ARP is issued for the bonding master interface, provided that
the interface has at least one IP address configured.

However, if there is a bridge named br0 and the bond device joins br0, the IP
address of the bond moves to the br0 device. First, I bring both NICs up. This
time, when I bring the active slave down again, I cannot capture any
gratuitous ARP on the bond device with tcpdump. This can break connectivity to
the host: because no ARP packet is sent out of the host, the L3 switch may keep
sending packets from outside to the old L2 switch, which is connected to the
NIC that is now the backup. These packets never get a response.

I read the kernel code.
When changing the active slave to the specified one, the
bond_change_active_slave() function sends the NETDEV_NOTIFY_PEERS event:
netdev_bonding_change(bond->dev,
NETDEV_BONDING_FAILOVER);
if (should_notify_peers)
netdev_bonding_change(bond->dev,
  NETDEV_NOTIFY_PEERS);

  
And in inetdev_event function, if event is NETDEV_NOTIFY_PEERS, it will call 
inetdev_send_gratuitous_arp to send gratuitous ARP.
case NETDEV_NOTIFY_PEERS:
/* Send gratuitous ARP to notify of link change */
inetdev_send_gratuitous_arp(dev, in_dev);
break;

But when the bond is in the bridge, the code does not redirect the event to the
bridge device, and the bond device has no IP address, so no gratuitous ARP is
sent.
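
The behaviour described above can be modelled in a few lines. This is a sketch of the report only, not kernel code: struct model_netdev and model_notify_peers() are hypothetical names, and the model assumes that the handler looks only at the device that raised the event.

```c
#include <stddef.h>

/* Hypothetical model of a network device; not a kernel structure. */
struct model_netdev {
	const char *name;
	size_t num_ipv4_addrs;	/* IPv4 addresses configured on this device */
};

/* Model of the NETDEV_NOTIFY_PEERS handling described above: the handler
 * looks only at the device that raised the event and would send one
 * gratuitous ARP per IPv4 address found there. Returns that count. */
static size_t model_notify_peers(const struct model_netdev *event_dev)
{
	return event_dev->num_ipv4_addrs;
}
```

With the address on bond0 the model reports one gratuitous ARP. Once the address has moved to br0, the event is still raised for bond0, which now has no address, so the model reports none.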

My question is: why does the latest kernel (4.1) still not handle this
condition?




Re: [PATCH 1/1] net namespace: dynamically configure new net namespace inherit net config

2015-07-01 Thread yzhu1

Hi, David

This patch has been applied in our Linux tree for a long time, and it should work well.

Could you let me know your advice about this patch?

Thanks a lot.
Zhu Yanjun

On 06/26/2015 05:37 PM, Zhu Yanjun wrote:

The new net namespace can inherit from the original net config, or
the current net config. As such, a config option is needed to decide where
the new namespace inherits from.

Signed-off-by: Zhu Yanjun yanjun@windriver.com
---
  init/Kconfig   |  9 +
  net/ipv4/devinet.c | 13 +
  2 files changed, 22 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index dc24dec..fab8c41 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1212,6 +1212,15 @@ config NET_NS
  Allow user space to create what appear to be multiple instances
  of the network stack.
  
+config NET_NS_INHERIT_ORIGINAL

+   bool "New network namespace inherits from original net config"
+   depends on NET_NS
+   default n
+   help
+ Allow a new network namespace to inherit from the original net config.
+ If no, the new network namespace inherits from the current net
+ config, including any modified settings.
+
  endif # NAMESPACES
  
  config SCHED_AUTOGROUP

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 419d23c..cf635e4 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2271,6 +2271,7 @@ static __net_init int devinet_init_net(struct net *net)
  #endif
  
  	err = -ENOMEM;

+#ifndef CONFIG_NET_NS_INHERIT_ORIGINAL
all = &ipv4_devconf;
dflt = &ipv4_devconf_dflt;
  
@@ -2282,6 +2283,15 @@ static __net_init int devinet_init_net(struct net *net)

dflt = kmemdup(dflt, sizeof(ipv4_devconf_dflt), GFP_KERNEL);
if (!dflt)
goto err_alloc_dflt;
+#else
+   all = kmemdup(&ipv4_devconf, sizeof(ipv4_devconf), GFP_KERNEL);
+   if (!all)
+   goto err_alloc_all;
+
+   dflt = kmemdup(&ipv4_devconf_dflt, sizeof(ipv4_devconf_dflt), GFP_KERNEL);
+   if (!dflt)
+   goto err_alloc_dflt;
+#endif
  
  #ifdef CONFIG_SYSCTL

tbl = kmemdup(tbl, sizeof(ctl_forward_entry), GFP_KERNEL);
@@ -2292,7 +2302,10 @@ static __net_init int devinet_init_net(struct net *net)
tbl[0].extra1 = all;
tbl[0].extra2 = net;
  #endif
+
+#ifndef CONFIG_NET_NS_INHERIT_ORIGINAL
}
+#endif
  
  #ifdef CONFIG_SYSCTL

err = __devinet_sysctl_register(net, "all", all);




RE: macb: zynq: why is SG disabled?

2015-07-01 Thread Punnaiah Choudary Kalluri
Hi Nicolae and Cyrille,

The SG feature was not tested on Zynq with the macb driver, but it was tested
with the emacps driver in the Xilinx tree (that driver was deprecated recently).

We will test and enable this feature in the macb driver for Zynq.

Regards,
Punnaiah

 -----Original Message-----
 From: Cyrille Pitchen [mailto:cyrille.pitc...@atmel.com]
 Sent: Wednesday, July 01, 2015 10:34 PM
 To: Nicolae Rosia; Michal Simek; Punnaiah Choudary Kalluri;
 netdev@vger.kernel.org; Nicolas Ferre; linux-arm-ker...@lists.infradead.org
 Subject: Re: macb: zynq: why is SG disabled?
 
 On 01/07/2015 17:14, Nicolae Rosia wrote:
  Hello,
 
  After reading the GEM part of Zynq7000 Technical Reference Manual [0], I
 think that SG should be supported.
  Is there a reason why SG is disabled in macb for Zynq?
 
  Best regards,
  Nicolae Rosia
 
 Hi Nicolae,
 
 when the scatter-gather patch was introduced, the feature was enabled only
 on tested boards to avoid regressions on other boards.
 So SG is enabled on sama5d4x and sama5d2x SoCs. SG is disabled on purpose
 on sama5d3x.
 
 For Zynq, I think the feature is still disabled just because it has never been
 tested.
 
 Best regards,
 
 Cyrille


Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN

2015-07-01 Thread Eric Dumazet
On Wed, 2015-07-01 at 10:06 -0700, Joe Perches wrote:
 On Wed, 2015-07-01 at 12:56 +0200, Eric Dumazet wrote:
  On Tue, 2015-06-30 at 20:25 +0300, Nicolae Rosia wrote:
   diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
 []
   @@ -2554,9 +2554,9 @@ static void at91ether_rx(struct net_device *dev)
 while (lp->rx_ring[lp->rx_tail].addr & MACB_BIT(RX_USED)) {
 p_recv = lp->rx_buffers + lp->rx_tail * AT91ETHER_MAX_RBUFF_SZ;
 pktlen = MACB_BF(RX_FRMLEN, lp->rx_ring[lp->rx_tail].ctrl);
   - skb = netdev_alloc_skb(dev, pktlen + 2);
   + skb = netdev_alloc_skb(dev, pktlen + NET_IP_ALIGN);
 if (skb) {
   - skb_reserve(skb, 2);
   + skb_reserve(skb, NET_IP_ALIGN);
 memcpy(skb_put(skb, pktlen), p_recv, pktlen);

 skb->protocol = eth_type_trans(skb, dev);
  
  Then please use netdev_alloc_skb_ip_align(), so that you get rid of
  skb_reserve()
 
 It seems there are ~50 of these in the kernel tree
 that could be converted.
 

Make sure the 2 is really NET_IP_ALIGN

Some hardware needs 2, even if NET_IP_ALIGN is 0 (on x86 arches, for
example)

I would rather not touch this without testing the change on real
hardware.
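
The arithmetic behind that warning can be sketched as follows. The Ethernet header is 14 bytes long, so reserving 2 bytes in front of it places the IP header at offset 16. The model_* helpers below are hypothetical illustrations and not kernel functions.

```c
#include <stddef.h>

#define MODEL_ETH_HLEN 14	/* Ethernet header: 6 + 6 + 2 bytes */

/* Offset of the IP header from the start of the buffer when 'reserve'
 * bytes of padding precede the Ethernet header. */
static size_t model_ip_offset(size_t reserve)
{
	return reserve + MODEL_ETH_HLEN;
}

/* Nonzero when the IP header would start on a 4-byte boundary. */
static int model_ip_aligned(size_t reserve)
{
	return model_ip_offset(reserve) % 4 == 0;
}
```

With a reserve of 2 the IP header starts at offset 16 and is 4-byte aligned. With a reserve of 0, which is what NET_IP_ALIGN evaluates to on x86, it starts at offset 14 and is not. A device that depends on the 2-byte pad would therefore see different behaviour after the substitution.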







Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-01 Thread Eric Dumazet
On Thu, 2015-07-02 at 01:26 +0200, Eric Dumazet wrote:
 On Thu, Jul 2, 2015 at 1:18 AM, Alex Gartrell alexgartr...@gmail.com wrote:
  On Wednesday, July 1, 2015, Eric Dumazet eduma...@google.com wrote:
 
  On Wed, Jul 1, 2015 at 11:14 PM, David Miller da...@davemloft.net wrote:
   From: Alex Gartrell agartr...@fb.com
   Date: Wed, 1 Jul 2015 13:13:09 -0700
  
   If we early-demux bind a TCP_TIMEWAIT socket to an skb and then orphan
   it
   (as we need to do in the ipvs forwarding case), sock_wfree and
   sock_rfree
   are going to reach into the inet_timewait_sock and mess with fields
   that
   don't exist.
  
   Signed-off-by: Alex Gartrell agartr...@fb.com
  
   If we're forwarding, we should not find a local socket, period.
 
  A socket cannot change state to TCP_TIMEWAIT.
 
  A new object is allocated and old one is removed from ehash, then
  freed (rcu rules being applied)
 
  Also sock_wfree() has nothing to do with early demux. It is for output
  path skbs only.
 
 
  Alright I kind of cheated and didn't include full context here. The problem
  is that within ipvs we are getting  packets that have been early demuxed and
  associated with time wait sockets which we then wish to forward immediately
  (ip_vs_xmit.c).  Under normal circumstances it would never be associated
  with any sk at all, but it is because of early demux, so we want to drop the
  relationship by calling skb_orphan.  This invokes the destructor which lands
  us there.
 
  So that is how we reach this illegal treating a twsk like an sk state.
 
  If there is a better way to drop the association than skb_orphan I will use
  it.
 
 I think you are mistaken Alex.
 
 socket early demux cannot possibly set skb->destructor to sock_rfree()
 
 If skb->destructor is set by early demux, it correctly points to sock_edemux()
 
 And this one correctly handles all socket variants.


If ipvs is the problem, could you try instead following patch ?

Shoot in the dark, as you do not give a lot of details :(

diff --git a/include/net/sock.h b/include/net/sock.h
index 05a8c1aea251..f77fe9acc7a4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1932,6 +1932,14 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
}
 }
 
+/* This helper checks if a socket is a full socket,
+ * ie _not_ a timewait or request socket.
+ */
+static inline bool sk_fullsock(const struct sock *sk)
+{
+   return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
+}
+
 /*
  * Queue a received datagram if it will fit. Stream and sequenced
  * protocols can't normally use this as they need to fit buffers in
@@ -1944,6 +1952,9 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
 static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
 {
skb_orphan(skb);
+   if (unlikely(!sk_fullsock(sk)))
+   return;
+
skb->sk = sk;
skb->destructor = sock_wfree;
skb_set_hash_from_sk(skb, sk);
@@ -2204,14 +2215,6 @@ static inline struct sock *skb_steal_sock(struct sk_buff *skb)
return NULL;
 }
 
-/* This helper checks if a socket is a full socket,
- * ie _not_ a timewait or request socket.
- */
-static inline bool sk_fullsock(const struct sock *sk)
-{
-   return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
-}
-
 void sock_enable_timestamp(struct sock *sk, int flag);
 int sock_get_timestamp(struct sock *, struct timeval __user *);
 int sock_get_timestampns(struct sock *, struct timespec __user *);
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 5d2b806a862e..ff05ec5a9016 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1161,9 +1161,10 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
if (unlikely(skb->sk != NULL && hooknum == NF_INET_LOCAL_OUT &&
 af == AF_INET)) {
struct sock *sk = skb->sk;
-   struct inet_sock *inet = inet_sk(skb->sk);
 
-   if (inet && sk->sk_family == PF_INET && inet->nodefrag)
+   if (sk_fullsock(sk) &&
+   sk->sk_family == PF_INET &&
+   inet_sk(sk)->nodefrag)
return NF_ACCEPT;
}
 
@@ -1640,9 +1641,10 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
if (unlikely(skb->sk != NULL && hooknum == NF_INET_LOCAL_OUT &&
 af == AF_INET)) {
struct sock *sk = skb->sk;
-   struct inet_sock *inet = inet_sk(skb->sk);
 
-   if (inet && sk->sk_family == PF_INET && inet->nodefrag)
+   if (sk_fullsock(sk) &&
+   sk->sk_family == PF_INET &&
+   inet_sk(sk)->nodefrag)
return NF_ACCEPT;
}
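
The bitmask test used by sk_fullsock() in the patch above can be tried out in userspace. The sketch below is a model only: the MODEL_* names are hypothetical, and the state numbers are reproduced from memory of include/net/tcp_states.h, so they should be treated as an assumption.

```c
#include <stdbool.h>

/* TCP state numbers assumed to match include/net/tcp_states.h; only the
 * ones this sketch needs are reproduced. */
enum {
	MODEL_TCP_ESTABLISHED = 1,
	MODEL_TCP_TIME_WAIT = 6,
	MODEL_TCP_NEW_SYN_RECV = 12,
};

#define MODEL_TCPF_TIME_WAIT	(1 << MODEL_TCP_TIME_WAIT)
#define MODEL_TCPF_NEW_SYN_RECV	(1 << MODEL_TCP_NEW_SYN_RECV)

/* True for every state except the two "mini socket" states: shift one bit
 * into the position of the state, then mask out the excluded states. */
static bool model_fullsock(int state)
{
	return (1 << state) & ~(MODEL_TCPF_TIME_WAIT | MODEL_TCPF_NEW_SYN_RECV);
}
```

An established socket passes the test, while a timewait or request socket does not, which is the condition the patch uses to return early from skb_set_owner_w().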
 

 



[PATCH] add stealth mode

2015-07-01 Thread Matteo Croce
Add option to disable any reply not related to a listening socket,
like RST/ACK for TCP and ICMP Dest-Unreach for UDP.
Also disables ICMP replies to echo request and timestamp.
The stealth mode can be enabled selectively for a single interface.
---
 include/linux/inetdevice.h | 1 +
 include/linux/ipv6.h   | 1 +
 include/uapi/linux/ip.h| 1 +
 net/ipv4/devinet.c | 1 +
 net/ipv4/icmp.c| 6 ++
 net/ipv4/tcp_ipv4.c| 3 ++-
 net/ipv4/udp.c | 4 +++-
 net/ipv6/addrconf.c| 7 +++
 net/ipv6/icmp.c| 3 ++-
 net/ipv6/tcp_ipv6.c| 2 +-
 net/ipv6/udp.c | 3 ++-
 11 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a4328ce..a64c01e 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -128,6 +128,7 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
 #define IN_DEV_ARP_ANNOUNCE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev) IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_STEALTH(in_dev) IN_DEV_MAXCONF((in_dev), STEALTH)

 struct in_ifaddr {
  struct hlist_node hash;
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 82806c6..49494ec 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -53,6 +53,7 @@ struct ipv6_devconf {
  __s32   ndisc_notify;
  __s32 suppress_frag_ndisc;
  __s32 accept_ra_mtu;
+ __s32 stealth;
  struct ipv6_stable_secret {
  bool initialized;
  struct in6_addr secret;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 08f894d..4acbf99 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -165,6 +165,7 @@ enum
  IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
  IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
  IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+ IPV4_DEVCONF_STEALTH,
  __IPV4_DEVCONF_MAX
 };

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 7498716..6b9930a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2178,6 +2178,7 @@ static struct devinet_sysctl_table {
   promote_secondaries),
  DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
   route_localnet),
+ DEVINET_SYSCTL_RW_ENTRY(STEALTH, stealth),
  },
 };

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index f5203fb..2f1b31f 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -882,6 +882,9 @@ static bool icmp_echo(struct sk_buff *skb)
 {
  struct net *net;

+ if(IN_DEV_STEALTH(skb->dev->ip_ptr))
+ return true;
+
  net = dev_net(skb_dst(skb)->dev);
  if (!net->ipv4.sysctl_icmp_echo_ignore_all) {
  struct icmp_bxm icmp_param;
@@ -915,6 +918,9 @@ static bool icmp_timestamp(struct sk_buff *skb)
  if (skb->len < 4)
  goto out_err;

+ if(IN_DEV_STEALTH(skb->dev->ip_ptr))
+ return true;
+
  /*
  * Fill in the current time as ms since midnight UT:
  */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d7d4c2b..c887d6e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -77,6 +77,7 @@
 #include <net/busy_poll.h>

 #include <linux/inet.h>
+#include <linux/inetdevice.h>
 #include <linux/ipv6.h>
 #include <linux/stddef.h>
 #include <linux/proc_fs.h>
@@ -1652,7 +1653,7 @@ csum_error:
  TCP_INC_STATS_BH(net, TCP_MIB_CSUMERRORS);
 bad_packet:
  TCP_INC_STATS_BH(net, TCP_MIB_INERRS);
- } else {
+ } else if(!IN_DEV_STEALTH(skb->dev->ip_ptr)) {
  tcp_v4_send_reset(NULL, skb);
  }

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604..b3b0dee 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -96,6 +96,7 @@
 #include <linux/timer.h>
 #include <linux/mm.h>
 #include <linux/inet.h>
+#include <linux/inetdevice.h>
 #include <linux/netdevice.h>
 #include <linux/slab.h>
 #include <net/tcp_states.h>
@@ -1823,7 +1824,8 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
  goto csum_error;

  UDP_INC_STATS_BH(net, UDP_MIB_NOPORTS, proto == IPPROTO_UDPLITE);
- icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
+ if(!IN_DEV_STEALTH(skb->dev->ip_ptr))
+ icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);

  /*
  * Hmm.  We got an UDP packet to a port to which we
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 21c2c81..b9e44e2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5585,6 +5585,13 @@ static struct addrconf_sysctl_table
  .proc_handler = addrconf_sysctl_stable_secret,
  },
  {
+ .procname = "stealth",
+ .data = &ipv6_devconf.stealth,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
  /* sentinel */
  }
  },
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 713d743..94b08ac 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -723,7 +723,8 @@ static int icmpv6_rcv(struct sk_buff *skb)

  switch (type) {
  case ICMPV6_ECHO_REQUEST:
- icmpv6_echo_reply(skb);
+ if(!idev->cnf.stealth)
+ icmpv6_echo_reply(skb);
  break;

  case ICMPV6_ECHO_REPLY:
diff --git a/net/ipv6/tcp_ipv6.c 

Re: [PATCH net-next] net: macb: replace literal constant with NET_IP_ALIGN

2015-07-01 Thread Joe Perches
On Thu, 2015-07-02 at 00:13 +0200, Eric Dumazet wrote:
 On Wed, 2015-07-01 at 10:06 -0700, Joe Perches wrote:
  On Wed, 2015-07-01 at 12:56 +0200, Eric Dumazet wrote:
   On Tue, 2015-06-30 at 20:25 +0300, Nicolae Rosia wrote:
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
  []
@@ -2554,9 +2554,9 @@ static void at91ether_rx(struct net_device *dev)
while (lp->rx_ring[lp->rx_tail].addr & MACB_BIT(RX_USED)) {
p_recv = lp->rx_buffers + lp->rx_tail * AT91ETHER_MAX_RBUFF_SZ;
pktlen = MACB_BF(RX_FRMLEN, lp->rx_ring[lp->rx_tail].ctrl);
-   skb = netdev_alloc_skb(dev, pktlen + 2);
+   skb = netdev_alloc_skb(dev, pktlen + NET_IP_ALIGN);
if (skb) {
-   skb_reserve(skb, 2);
+   skb_reserve(skb, NET_IP_ALIGN);
memcpy(skb_put(skb, pktlen), p_recv, pktlen);
 
skb->protocol = eth_type_trans(skb, dev);
   
   Then please use netdev_alloc_skb_ip_align(), so that you get rid of
   skb_reserve()
  
  It seems there are ~50 of these in the kernel tree
  that could be converted.
  
 
 Make sure the 2 is really NET_IP_ALIGN
 
 Some hardware needs 2, even if NET_IP_ALIGN is 0 (on x86 arches, for
 example)
 
 I would rather not touch this without testing the change on real
 hardware.

Nor I really.

Almost all of those are in fairly old hardware drivers.

I just wanted to point out that more exist.




Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-01 Thread Eric Dumazet
On Thu, Jul 2, 2015 at 1:18 AM, Alex Gartrell alexgartr...@gmail.com wrote:
 On Wednesday, July 1, 2015, Eric Dumazet eduma...@google.com wrote:

 On Wed, Jul 1, 2015 at 11:14 PM, David Miller da...@davemloft.net wrote:
  From: Alex Gartrell agartr...@fb.com
  Date: Wed, 1 Jul 2015 13:13:09 -0700
 
  If we early-demux bind a TCP_TIMEWAIT socket to an skb and then orphan
  it
  (as we need to do in the ipvs forwarding case), sock_wfree and
  sock_rfree
  are going to reach into the inet_timewait_sock and mess with fields
  that
  don't exist.
 
  Signed-off-by: Alex Gartrell agartr...@fb.com
 
  If we're forwarding, we should not find a local socket, period.

 A socket cannot change state to TCP_TIMEWAIT.

 A new object is allocated and old one is removed from ehash, then
 freed (rcu rules being applied)

 Also sock_wfree() has nothing to do with early demux. It is for output
 path skbs only.


 Alright I kind of cheated and didn't include full context here. The problem
 is that within ipvs we are getting  packets that have been early demuxed and
 associated with time wait sockets which we then wish to forward immediately
 (ip_vs_xmit.c).  Under normal circumstances it would never be associated
 with any sk at all, but it is because of early demux, so we want to drop the
 relationship by calling skb_orphan.  This invokes the destructor which lands
 us there.

 So that is how we reach this illegal treating a twsk like an sk state.

 If there is a better way to drop the association than skb_orphan I will use
 it.

I think you are mistaken Alex.

socket early demux cannot possibly set skb->destructor to sock_rfree()

If skb->destructor is set by early demux, it correctly points to sock_edemux()

And this one correctly handles all socket variants.

/* All sockets share common refcount, but have different destructors */
void sock_gen_put(struct sock *sk)
{
if (!atomic_dec_and_test(&sk->sk_refcnt))
return;

if (sk->sk_state == TCP_TIME_WAIT)
inet_twsk_free(inet_twsk(sk));
else if (sk->sk_state == TCP_NEW_SYN_RECV)
reqsk_free(inet_reqsk(sk));
else
sk_free(sk);
}
EXPORT_SYMBOL_GPL(sock_gen_put);

void sock_edemux(struct sk_buff *skb)
{
sock_gen_put(skb->sk);
}
EXPORT_SYMBOL(sock_edemux);


Re: [PATCH net-next v1 1/1] drivers: net: xgene: Fix the compilation error error: implicit declaration of function 'acpi_evaluate_integer' in APM X-Gene ethernet driver.

2015-07-01 Thread Suman Tripathi
Hi,

Any comments on this patch?

On Wed, Jun 24, 2015 at 1:51 PM, Suman Tripathi stripa...@apm.com wrote:
 This patch guards the acpi_evaluate_integer function, as it fails
 the build for non-ACPI configs.

 Signed-off-by: Iyappan Subramanian isubraman...@apm.com
 Signed-off-by: Suman Tripathi stripa...@apm.com
 Reported-by: kbuild test robot fengguang...@intel.com
 ---
  drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 2 ++
  1 file changed, 2 insertions(+)

 diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
 index 4e83d4c..70b9ef6 100644
 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
 +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
 @@ -871,6 +871,7 @@ static const struct net_device_ops xgene_ndev_ops = {
 .ndo_set_mac_address = xgene_enet_set_mac_address,
  };

 +#ifdef CONFIG_ACPI
  static int xgene_get_port_id_acpi(struct device *dev,
   struct xgene_enet_pdata *pdata)
  {
 @@ -886,6 +887,7 @@ static int xgene_get_port_id_acpi(struct device *dev,

 return 0;
  }
 +#endif

  static int xgene_get_port_id_dt(struct device *dev, struct xgene_enet_pdata *pdata)
  {
 --
 1.8.2.1




-- 
Thanks,
with regards,
Suman Tripathi


Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-01 Thread Alex Gartrell
On Wed, Jul 1, 2015 at 4:26 PM, Eric Dumazet eduma...@google.com wrote:
 I think you are mistaken Alex.

Indeed, I was!  Should be unsurprising.


 socket early demux cannot possibly set skb->destructor to sock_rfree()

Yeah, I will admit to adding the code to sock_rfree reflexively, out of paranoia.

 If skb->destructor is set by early demux, it correctly points to sock_edemux()

 And this one correctly handles all socket variants.

Yes, the problem appears to be in ip_vs_prepare_tunneled_skb
(ip_vs_xmit.c:859 in 4.0)

if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
new_skb = skb_realloc_headroom(skb, max_headroom);
if (!new_skb)
goto error;
if (skb->sk)
skb_set_owner_w(new_skb, skb->sk);
consume_skb(skb);
skb = new_skb;
}

skb_set_owner_w sets sock_wfree.

I'll figure out how to ensure that we're using an appropriate destructor here.

Appreciate the patience!

-- 
Alex Gartrell agartr...@fb.com