Re: [PATCH v4] netlink: Fix autobind race condition that leads to zero port ID

2015-09-21 Thread David Miller
From: Herbert Xu 
Date: Mon, 21 Sep 2015 14:06:36 +0800

> On Sun, Sep 20, 2015 at 10:55:21PM -0700, David Miller wrote:
>> From: Herbert Xu 
>> Date: Fri, 18 Sep 2015 19:16:50 +0800
>> 
>> > The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
>> > Reset portid after netlink_insert failure") introduced a race
>> > condition where if two threads try to autobind the same socket
>> > one of them may end up with a zero port ID.  This led to kernel
>> > deadlocks that were observed by multiple people.
>> > 
>> > This patch reverts that commit and instead fixes it by introducing
>> > a separate rhash_portid variable so that the real portid is only set
>> > after the socket has been successfully hashed.
>> > 
>> > Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
>> > Reported-by: Tejun Heo 
>> > Reported-by: Linus Torvalds 
>> > Signed-off-by: Herbert Xu 
>> 
>> Applied and queued up for -stable, thanks Herbert.
> 
> Sorry Dave, but there are still races with v4 as Tejun pointed
> out.  I'm still working on it and I could post them as incremental
> patches if that's the easiest.

Oops, sorry about that.

Yeah at this point incremental patches work the best.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] netlink: Fix autobind race condition that leads to zero port ID

2015-09-21 Thread Herbert Xu
On Sun, Sep 20, 2015 at 10:55:21PM -0700, David Miller wrote:
> From: Herbert Xu 
> Date: Fri, 18 Sep 2015 19:16:50 +0800
> 
> > The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
> > Reset portid after netlink_insert failure") introduced a race
> > condition where if two threads try to autobind the same socket
> > one of them may end up with a zero port ID.  This led to kernel
> > deadlocks that were observed by multiple people.
> > 
> > This patch reverts that commit and instead fixes it by introducing
> > a separate rhash_portid variable so that the real portid is only set
> > after the socket has been successfully hashed.
> > 
> > Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
> > Reported-by: Tejun Heo 
> > Reported-by: Linus Torvalds 
> > Signed-off-by: Herbert Xu 
> 
> Applied and queued up for -stable, thanks Herbert.

Sorry Dave, but there are still races with v4 as Tejun pointed
out.  I'm still working on it and I could post them as incremental
patches if that's the easiest.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH net] net: Handle negative checksum offset in skb-checksum-help

2015-09-21 Thread Pravin B Shelar
A VXLAN device can receive an skb with CHECKSUM_PARTIAL set. But the checksum
offset could be in the outer header, which is pulled on receive. This results
in a negative checksum offset for the skb. Such an skb can trigger the BUG_ON()
assertion in skb_checksum_help(). The patch fixes the bug by checking for a
negative offset in skb_checksum_help().
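
As a rough userspace illustration of the check described above (this is not
the kernel code; struct fake_skb and the fake_* helpers are hypothetical
stand-ins), the sketch below computes the checksum start offset the same way
skb_checksum_start_offset() does, as csum_start minus headroom, and rejects a
negative result before it can reach a range check. If a signed negative offset
is compared directly against an unsigned head length, it is converted to a
large unsigned value, which is consistent with the assertion firing in the
trace below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few sk_buff fields involved here. */
struct fake_skb {
	int csum_start;        /* checksum start, measured from the buffer head */
	int headroom;          /* current data pointer, measured from the buffer head */
	unsigned int headlen;  /* bytes available in the linear area */
};

/* Same arithmetic as skb_checksum_start_offset(): csum_start - headroom. */
static int fake_csum_start_offset(const struct fake_skb *skb)
{
	return skb->csum_start - skb->headroom;
}

/* True only if the offset can safely be handed to a checksum routine. */
static bool fake_csum_offset_usable(const struct fake_skb *skb)
{
	int offset = fake_csum_start_offset(skb);

	assert(skb->headroom >= 0);
	if (offset < 0)		/* checksum start lies in a header already pulled */
		return false;
	return (unsigned int)offset < skb->headlen;
}
```

With csum_start = 50 and headroom = 100 (outer header already pulled), the
offset is negative and the helper reports it as unusable instead of letting
it reach the range check.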

The following is the kernel panic message from an old kernel hitting the bug.

[ cut here ]
kernel BUG at net/core/dev.c:1906!
RIP: 0010:[] skb_checksum_help+0x144/0x150
Call Trace:

[] queue_userspace_packet+0x408/0x470 [openvswitch]
[] ovs_dp_upcall+0x5d/0x60 [openvswitch]
[] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
[] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
[] ovs_vport_receive+0x2a/0x30 [openvswitch]
[] vxlan_rcv+0x53/0x60 [openvswitch]
[] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
[] udp_queue_rcv_skb+0x2dc/0x3b0
[] __udp4_lib_rcv+0x1cf/0x6c0
[] udp_rcv+0x1a/0x20
[] ip_local_deliver_finish+0xdd/0x280
[] ip_local_deliver+0x88/0x90
[] ip_rcv_finish+0x10d/0x370
[] ip_rcv+0x235/0x300
[] __netif_receive_skb+0x55d/0x620
[] netif_receive_skb+0x80/0x90
[] virtnet_poll+0x555/0x6f0
[] net_rx_action+0x134/0x290
[] __do_softirq+0xa8/0x210
[] call_softirq+0x1c/0x30
[] do_softirq+0x65/0xa0
[] irq_exit+0x8e/0xb0
[] do_IRQ+0x63/0xe0
[] common_interrupt+0x6e/0x6e

Reported-by: Anupam Chanda 
Signed-off-by: Pravin B Shelar 
---
 net/core/dev.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index ee0d628..008f1ae 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2408,6 +2408,9 @@ int skb_checksum_help(struct sk_buff *skb)
 		skb_warn_bad_offload(skb);
 		return -EINVAL;
 	}
+	offset = skb_checksum_start_offset(skb);
+	if (offset < 0)
+		goto out_set_summed;
 
 	/* Before computing a checksum, we should make sure no frag could
 	 * be modified by an external entity : checksum could be wrong.
@@ -2418,7 +2421,6 @@ int skb_checksum_help(struct sk_buff *skb)
 			goto out;
 	}
 
-	offset = skb_checksum_start_offset(skb);
 	BUG_ON(offset >= skb_headlen(skb));
 	csum = skb_checksum(skb, offset, skb->len - offset, 0);
 
-- 
1.7.1



Re: [PATCH net] openvswitch: Zero flows on allocation.

2015-09-21 Thread David Miller
From: Jesse Gross 
Date: Fri, 18 Sep 2015 19:06:14 -0700

> @@ -80,7 +80,7 @@ struct sw_flow *ovs_flow_alloc(void)
>   struct flow_stats *stats;
>   int node;
>  
> - flow = kmem_cache_alloc(flow_cache, GFP_KERNEL);
> + flow = kmem_cache_alloc(flow_cache, GFP_KERNEL | __GFP_ZERO);
>   if (!flow)

Like Eric, I prefer that you use kmem_cache_zalloc() to fix
this.

Thanks.


Re: [patch net-next RFC 0/6] switchdev: introduce transaction infra and for pre-commit split

2015-09-21 Thread Scott Feldman
On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
> Jiri Pirko (6):
>   switchdev: rename "trans" to "trans_ph".
>   switchdev: introduce transaction infrastructure for attr_set and
> obj_add
>   rocker: switch to local transaction phase enum
>   switchdev: move transaction phase enum under transaction structure
>   rocker: use switchdev transaction queue for allocated memory
>   switchdev: split commit and prepare phase into two callbacks

Patches compile, but first test bombs.  Cut-and-paste of dump at end
of this email.

I'm not sure I'm liking this patchset because it looks like a way for
switchdev drivers to easily opt-out of the prepare-commit transaction
model by simply not implementing the *_pre op.  I would rather drivers
explicitly handle the PREPARE phase in code, even if that means
skipping it gracefully (in code) with a comment (in code) explaining
why it does not matter for this device/operation.  That's what DSA had
done, mostly because it was a retro-fit.

Also, the patchset removes the ABORT callback in case of a rollback
due to a failed PREPARE.  We can't make the assumption that it's just
a memory list to destroy on ABORT.  The driver, on PREPARE, may have
reserved device space or staged an operation on the device which we'll
need to undo on ABORT.
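
To make the concern concrete, here is a toy userspace model of a
prepare/commit/abort sequence. It is only a sketch: struct fake_dev,
enum fake_phase and the fake_* functions are invented for illustration and
are not the switchdev API. The point it shows is that PREPARE may stage
state on the device itself (here, a reservation counter), so ABORT has real
work to do beyond freeing a memory list.

```c
#include <assert.h>
#include <stddef.h>

enum fake_phase { FAKE_PREPARE, FAKE_COMMIT, FAKE_ABORT };

/* Hypothetical device with a fixed number of table slots. */
struct fake_dev {
	int reserved;   /* slots staged during PREPARE */
	int committed;  /* slots actually in use */
	int capacity;
};

/* One operation, explicitly handling every phase. */
static int fake_obj_add(struct fake_dev *dev, enum fake_phase phase)
{
	switch (phase) {
	case FAKE_PREPARE:
		if (dev->committed + dev->reserved >= dev->capacity)
			return -1;	/* cannot guarantee the later commit */
		dev->reserved++;	/* stage the operation on the device */
		return 0;
	case FAKE_COMMIT:
		assert(dev->reserved > 0);	/* COMMIT must follow PREPARE */
		dev->reserved--;
		dev->committed++;
		return 0;
	case FAKE_ABORT:
		if (dev->reserved > 0)
			dev->reserved--;	/* undo what PREPARE staged */
		return 0;
	}
	return -1;
}

/* Caller side: prepare on every device, then commit all or abort all. */
static int fake_add_all(struct fake_dev *devs, size_t n)
{
	size_t i, prepared = 0;

	for (i = 0; i < n; i++) {
		if (fake_obj_add(&devs[i], FAKE_PREPARE))
			break;
		prepared++;
	}
	if (prepared < n) {
		for (i = 0; i < prepared; i++)
			fake_obj_add(&devs[i], FAKE_ABORT);
		return -1;
	}
	for (i = 0; i < n; i++)
		fake_obj_add(&devs[i], FAKE_COMMIT);
	return 0;
}
```

If the second device is full, the first device's reservation is released by
the ABORT pass and nothing is committed anywhere.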

So we need ABORT back, and we need PREPARE to not be optional, so
what's left is list enqueue/dequeue helpers, which I'm not seeing much
value in up-leveling as the driver can do list_add/del itself.

Am I missing something?  I didn't see a motivation statement for the
RFC so I'm not sure where you wanted to take this.

-scott



[1.998791] BUG: unable to handle kernel NULL pointer dereference
at 0008
[2.05] IP: []
rocker_port_kzalloc.isra.50+0x5/0x10 [rocker]
[2.05] PGD 0
[2.05] Oops:  [#1] SMP
[2.05] Modules linked in: floppy(+) ata_piix(+) libata
rocker(+) virtio_pci(+) virtio_ring virtio scsi_mod
[2.05] CPU: 0 PID: 91 Comm: modprobe Not tainted 4.2.0+ #3
[2.05] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[2.05] task: 88000f5e0800 ti: 88000f5f8000 task.ti:
88000f5f8000
[2.05] RIP: 0010:[]  []
rocker_port_kzalloc.isra.50+0x5/0x10 [rocker]
[2.05] RSP: 0018:88000f5fba20  EFLAGS: 00010246
[2.05] RAX: 88000f17c050 RBX: 88000b40 RCX: 0020
[2.05] RDX:  RSI:  RDI: 
[2.05] RBP:  R08:  R09: a0058680
[2.05] R10: 0001 R11: 9319 R12: 
[2.05] R13: 88000f5b7000 R14: 88000f5b7840 R15: 
[2.05] FS:  7fd132237700() GS:88000fc0()
knlGS:
[2.05] CS:  0010 DS:  ES:  CR0: 8005003b
[2.05] CR2: 0008 CR3: 0f5d5000 CR4: 06f0
[2.05] Stack:
[2.05]  a005a32a 0001 00ff88000f5b7000
8175725a
[2.05]  a0057750  a0058680
0020001a
[2.05]  88000b40  88000ea06000
88000f5b7000
[2.05] Call Trace:
[2.05]  [] ? rocker_cmd_exec+0x5a/0x1f0 [rocker]
[2.05]  [] ?
rocker_cmd_get_port_stats_prep+0x90/0x90 [rocker]
[2.05]  [] ?
rocker_cmd_get_port_settings_phys_name_proc+0xb0/0xb0 [rocker]
[2.05]  [] ? rocker_probe+0xb6a/0xe59 [rocker]
[2.05]  [] ? local_pci_probe+0x48/0xa0
[2.05]  [] ? pci_device_probe+0x112/0x130
[2.05]  [] ? driver_probe_device+0x196/0x2a0
[2.05]  [] ? driver_probe_device+0x2a0/0x2a0
[2.05]  [] ? __driver_attach+0x8d/0x90
[2.05]  [] ? driver_probe_device+0x2a0/0x2a0
[2.05]  [] ? bus_for_each_dev+0x4c/0x80
[2.05]  [] ? bus_add_driver+0x119/0x220
[2.05]  [] ? 0xa0065000
[2.05]  [] ? driver_register+0x5a/0xe0
[2.05]  [] ? 0xa0065000
[2.05]  [] ? rocker_module_init+0x33/0x1000 [rocker]
[2.05]  [] ? do_one_initcall+0x90/0x1e0
[2.05]  [] ? do_init_module+0x50/0x1d8
[2.05]  [] ? load_module+0x1c1c/0x2240
[2.05]  [] ? show_initstate+0x50/0x50
[2.05]  [] ? SyS_init_module+0x104/0x130
[2.05]  [] ? entry_SYSCALL_64_fastpath+0x16/0x75
[2.05] Code: 48 89 c1 48 c7 c2 10 dc 15 81 48 89 c6 e8 e4 0d
4a e1 48 8b 3b e8 1c 1b 4a e1 eb b6 66 2e 0f 1f 84 00 00 00 00 00 48
89 d1 89 f2 <8b> 77 08 e9 73 ff ff ff 0f 1f 00 48 83 ec 68 4c 89 7c 24
60 41
[2.05] RIP  []
rocker_port_kzalloc.isra.50+0x5/0x10 [rocker]


Re: [PATCH net 1/2] iptunnel: make rx/tx bytes counters consistent

2015-09-21 Thread Nicolas Dichtel

On 21/09/2015 07:37, David Miller wrote:

From: David Miller 
Date: Sun, 20 Sep 2015 22:35:13 -0700 (PDT)


After the patch:
$ ping -c1 192.168.0.121 ; ip -s l ls dev gre1
PING 192.168.0.121 (192.168.0.121) 56(84) bytes of data.
64 bytes from 192.168.0.121: icmp_req=1 ttl=64 time=2.95 ms

--- 192.168.0.121 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.955/2.955/2.955/0.000 ms


BTW, when you include PING output in a commit message like this
it really makes things difficult.

"---" denotes end of the commit as far as tools like "git am"
are concerned.
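
For illustration only, the sketch below approximates the kind of line test
that makes the ping statistics header look like the end of a commit message.
It is not git's code: fake_is_patch_break() is a hypothetical helper and the
rules are a simplified assumption (a "---" followed only by whitespace, or
"--- " followed by something that could be a file name).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified check for a line that ends the commit message. */
static bool fake_is_patch_break(const char *line)
{
	size_t len, i;

	assert(line != NULL);
	len = strlen(line);
	if (strncmp(line, "---", 3) != 0)
		return false;
	/* "--- <name>" looks like the first line of a patch without headers. */
	if (len > 4 && line[3] == ' ' && !isspace((unsigned char)line[4]))
		return true;
	/* "---" followed only by whitespace is the usual manual separator. */
	for (i = 3; i < len; i++)
		if (!isspace((unsigned char)line[i]))
			return false;
	return true;
}
```

Under these assumed rules, "--- 192.168.0.121 ping statistics ---" is
classified as a break, which is why indenting or rewording such output in a
commit message avoids the problem.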

I happened to notice this time and fix up the commit messages by hand,
but it'd be much better if it weren't left up to chance like that.

Oh ok. I will take care of that next time.

Thank you for fixing it.


Regards,
Nicolas


Re: [PATCH net-next v2 3/7] bridge: define some min/max/default ageing time constants

2015-09-21 Thread Jiri Pirko
Sun, Sep 20, 2015 at 05:48:25PM CEST, sfel...@gmail.com wrote:
>From: Scott Feldman 
>
>Signed-off-by: Scott Feldman 
Acked-by: Jiri Pirko 



Re: [PATCH net-next v2 4/7] rocker: adding port ageing_time for ageing out FDB entries

2015-09-21 Thread Jiri Pirko
Sun, Sep 20, 2015 at 05:48:26PM CEST, sfel...@gmail.com wrote:
>From: Scott Feldman 
>
>Follow-up patchset will allow user to change ageing_time, but for now
>just hard-code it to a fixed value (the same value used as the default
>for the bridge driver).
>
>Signed-off-by: Scott Feldman 
Acked-by: Jiri Pirko 


[PATCH] ip: find correct route for socket which is not bound (v2)

2015-09-21 Thread Wengang Wang
This is v2. Compared with v1, the change is:
 * for a loopback outbound device, it continues to skip the cached route;
   for others, it goes through the cached route.

For multicast, we should also find a valid route (and thus get a meaningful
pmtu) for packets on a socket which is not bound to a device
(sk_bound_dev_if being 0).

From the socket(7) man page:

   SO_BINDTODEVICE
          Bind this socket to a particular device like “eth0”, as
          specified in the passed interface name.  If the name is an
          empty string or the option length is zero, the socket
          device binding is removed.  The passed option is a
          variable-length null-terminated interface name string with
          the maximum size of IFNAMSIZ.  If a socket is bound to an
          interface, only packets received from that particular
          interface are processed by the socket.  Note that this works
          only for some socket types, particularly AF_INET sockets.
          It is not supported for packet sockets (use normal bind(2)
          there).

The man page doesn't say that packets on an unbound socket won't be routed.

The problem I hit is that all multicast packets are dropped by the kernel (on
the sender host). The lower layer is IPoIB with an MTU of 7000, and I was
sending 4096-byte multicast packets. Inside IPoIB the first send is dropped
because it exceeds the internal packet size limit mcast_mtu, which is 2044.
So IPoIB calls ip_rt_update_pmtu (indirectly) to set the path MTU. A
correct route is configured for the multicast, so setting the pmtu
succeeded and the next multicast packet (to the same target) is expected
to succeed (it would be fragmented according to the pmtu just set).
But the second and later multicast packets got dropped too. The reason
is that the neighbor lookup (fib_lookup) is skipped because the socket
is not bound to a device (sk_bound_dev_if being 0). With the patch
proposed here applied, it works fine.
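
As a back-of-the-envelope check of the numbers above, the sketch below
estimates how many IPv4 fragments a datagram needs once the path MTU is
known. The helpers are hypothetical, and the sketch assumes a 20-byte IPv4
header without options and, for the example figures, a UDP datagram (8-byte
UDP header on top of the 4096-byte payload); neither assumption is stated in
the report itself.

```c
#include <assert.h>

#define FAKE_IPV4_HDR_LEN 20u	/* assumption: no IP options */

/* Largest payload one fragment can carry; it must be a multiple of 8. */
static unsigned int fake_frag_payload(unsigned int pmtu)
{
	assert(pmtu >= FAKE_IPV4_HDR_LEN + 8u);
	return (pmtu - FAKE_IPV4_HDR_LEN) & ~7u;
}

/* Number of fragments needed for ip_payload_len bytes of IP payload. */
static unsigned int fake_frag_count(unsigned int ip_payload_len,
				    unsigned int pmtu)
{
	unsigned int per_frag = fake_frag_payload(pmtu);

	return (ip_payload_len + per_frag - 1) / per_frag;
}
```

With a pmtu of 2044 each fragment carries up to 2024 bytes, so a 4096-byte
UDP payload (4104 bytes of IP payload) would be split into three fragments,
provided the route carrying the updated pmtu is actually found.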

Signed-off-by: Wengang Wang 
---
 net/ipv4/route.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5f4a556..c0534c2 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2097,7 +2097,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 			 */
 
 			fl4->flowi4_oif = dev_out->ifindex;
-			goto make_route;
+			if (dev_out->flags & IFF_LOOPBACK)
+				goto make_route;
+			else
+				goto lookup;
 		}
 
 		if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) {
@@ -2153,6 +2156,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		goto make_route;
 	}
 
+lookup:
 	if (fib_lookup(net, fl4, &res, 0)) {
 		res.fi = NULL;
 		res.table = NULL;
-- 
2.1.0



pull-request: can-next 2015-09-17

2015-09-21 Thread Marc Kleine-Budde
Hello David,

this is a pull request of 8 patches for net-next/master.

All 8 patches are by me and clean up the flexcan driver.

Marc

---

The following changes since commit a1ef48e1e8843e2f6be631b8cf1c21b24579b9d6:

  Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue (2015-09-20 22:26:58 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git tags/linux-can-next-for-4.4-20150921

for you to fetch changes up to 6fa7da249269a6146ce456c43098901c81c8afdf:

  can: flexcan: enable interrupts atomically at the end of flexcan_chip_start() (2015-09-21 08:38:23 +0200)


linux-can-next-for-4.4-20150921


Marc Kleine-Budde (8):
  can: flexcan: cleanup coding style and fix typos
  can: headers: make header files self contained
  can: flexcan: remove unused header files
  can: flexcan: flexcan_chip_start(): cleanup writing of reg_mcr
  can: flexcan: rename feature into quirks
  can: flexcan: use pointer to struct regs instead of void pointer for mmio address space
  can: flexcan: give member of flexcan_priv holding mailboxes a sensible name
  can: flexcan: enable interrupts atomically at the end of flexcan_chip_start()

 drivers/net/can/flexcan.c | 197 ++
 include/linux/can/dev.h   |   3 +-
 include/linux/can/led.h   |   1 +
 3 files changed, 95 insertions(+), 106 deletions(-)

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |





Re: [patch net-next RFC 0/6] switchdev: introduce transaction infra and for pre-commit split

2015-09-21 Thread Jiri Pirko
Mon, Sep 21, 2015 at 09:23:24AM CEST, sfel...@gmail.com wrote:
>On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
>> Jiri Pirko (6):
>>   switchdev: rename "trans" to "trans_ph".
>>   switchdev: introduce transaction infrastructure for attr_set and
>> obj_add
>>   rocker: switch to local transaction phase enum
>>   switchdev: move transaction phase enum under transaction structure
>>   rocker: use switchdev transaction queue for allocated memory
>>   switchdev: split commit and prepare phase into two callbacks
>
>Patches compile, but first test bombs.  Cut-and-paste of dump at end
>of this email.

Told you :)


>
>I'm not sure I'm liking this patchset because it looks like a way for
>switchdev drivers to easily opt-out of the prepare-commit transaction
>model by simply not implementing the *_pre op.  I would rather drivers
>explicitly handle the PREPARE phase in code, even if that means
>skipping it gracefully (in code) with a comment (in code) explaining
>why it does not matter for this device/operation.  That's what DSA had
>done, mostly because it was a retro-fit.

Each driver should handle this internally. If it does not need the prepare
phase, it simply does not implement it. That is the same for all callbacks,
ndos, netdev notifiers, etc. It is much cleaner and nicer to have these as
separate callbacks. Implementing multiple callbacks in one is just ugly,
sorry.


>
>Also, the patchset removes the ABORT callback in case of a rollback
>due to a failed PREPARE.  We can't make the assumption that it's just
>a memory list to destroy on ABORT.  The driver, on PREPARE, may have
>reserved device space or staged an operation on the device which we'll
>need to undo on ABORT.

Yep, just register an item with a custom destructor; there you can do
whatever you need. Also, I believe that is much nicer compared to the
current code.


>
>So we need ABORT back, and we need PREPARE to not be optional, so
>what's left is list enqueue/dequeue helpers, which I'm not seeing much
>value in up-leveling as the driver can do list_add/del itself.

Why would every driver do it itself, over and over, when there can be a
clean infrastructure to do that, including the abort phase, without the
driver needing to be involved?


>
>Am I missing something?  I didn't see a motivation statement for the
>RFC so I'm not sure where you wanted to take this.

I want to make the current code much nicer, easier to read and to implement
in other drivers. Look at rocker.c and how often there is == PREPARE there.
It's nearly impossible to follow the code, sorry.

My next patchset is to un-mess rocker.c (that freaking ofdpa stuff is
everywhere)


>
>-scott
>
>
>
>[1.998791] BUG: unable to handle kernel NULL pointer dereference
>at 0008
>[2.05] IP: []
>rocker_port_kzalloc.isra.50+0x5/0x10 [rocker]
>[2.05] PGD 0
>[2.05] Oops:  [#1] SMP
>[2.05] Modules linked in: floppy(+) ata_piix(+) libata
>rocker(+) virtio_pci(+) virtio_ring virtio scsi_mod
>[2.05] CPU: 0 PID: 91 Comm: modprobe Not tainted 4.2.0+ #3
>[2.05] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>[2.05] task: 88000f5e0800 ti: 88000f5f8000 task.ti:
>88000f5f8000
>[2.05] RIP: 0010:[]  []
>rocker_port_kzalloc.isra.50+0x5/0x10 [rocker]
>[2.05] RSP: 0018:88000f5fba20  EFLAGS: 00010246
>[2.05] RAX: 88000f17c050 RBX: 88000b40 RCX: 
>0020
>[2.05] RDX:  RSI:  RDI: 
>
>[2.05] RBP:  R08:  R09: 
>a0058680
>[2.05] R10: 0001 R11: 9319 R12: 
>
>[2.05] R13: 88000f5b7000 R14: 88000f5b7840 R15: 
>
>[2.05] FS:  7fd132237700() GS:88000fc0()
>knlGS:
>[2.05] CS:  0010 DS:  ES:  CR0: 8005003b
>[2.05] CR2: 0008 CR3: 0f5d5000 CR4: 
>06f0
>[2.05] Stack:
>[2.05]  a005a32a 0001 00ff88000f5b7000
>8175725a
>[2.05]  a0057750  a0058680
>0020001a
>[2.05]  88000b40  88000ea06000
>88000f5b7000
>[2.05] Call Trace:
>[2.05]  [] ? rocker_cmd_exec+0x5a/0x1f0 [rocker]
>[2.05]  [] ?
>rocker_cmd_get_port_stats_prep+0x90/0x90 [rocker]
>[2.05]  [] ?
>rocker_cmd_get_port_settings_phys_name_proc+0xb0/0xb0 [rocker]
>[2.05]  [] ? rocker_probe+0xb6a/0xe59 [rocker]
>[2.05]  [] ? local_pci_probe+0x48/0xa0
>[2.05]  [] ? pci_device_probe+0x112/0x130
>[2.05]  [] ? driver_probe_device+0x196/0x2a0
>[2.05]  [] ? driver_probe_device+0x2a0/0x2a0
>[2.05]  [] ? __driver_attach+0x8d/0x90
>[2.05]  [] ? driver_probe_device+0x2a0/0x2a0
>[2.05]  [] ? 

[PATCH] lib: fix data race in rhashtable_rehash_one

2015-09-21 Thread Dmitry Vyukov
rhashtable_rehash_one() uses plain writes to update entry->next,
while it is being concurrently accessed by readers.
Unfortunately, the compiler is within its rights to (for example) use
byte-at-a-time writes to update the pointer, which would fatally confuse
concurrent readers.

Use WRITE_ONCE to update entry->next in rhashtable_rehash_one().
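
For readers unfamiliar with the macro, here is a minimal userspace sketch of
the idea, assuming a GCC- or Clang-compatible compiler (it relies on the
__typeof__ extension). FAKE_WRITE_ONCE, FAKE_READ_ONCE and struct fake_node
are simplified, hypothetical stand-ins and not the kernel's definitions. The
volatile access asks the compiler to emit a single store or load of the
pointer rather than splitting, repeating or eliding it; it provides no
ordering guarantees by itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified analogues for scalar and pointer-sized objects only. */
#define FAKE_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define FAKE_READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))

struct fake_node {
	struct fake_node *next;
	int key;
};

/* Relink an entry in front of head with one volatile pointer store. */
static void fake_relink(struct fake_node *entry, struct fake_node *head)
{
	assert(entry != NULL);
	FAKE_WRITE_ONCE(entry->next, head);
}

/* Reader side: load each next pointer exactly once per step. */
static int fake_chain_len(struct fake_node *first)
{
	struct fake_node *p;
	int n = 0;

	for (p = first; p != NULL; p = FAKE_READ_ONCE(p->next))
		n++;
	return n;
}
```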

The data race was found with KernelThreadSanitizer (KTSAN).

Signed-off-by: Dmitry Vyukov 
---
KTSAN report for the record:

ThreadSanitizer: data-race in netlink_lookup

Atomic read at 0x880480443bd0 of size 8 by thread 2747 on CPU 11:
 [< inline >] rhashtable_lookup_fast include/linux/rhashtable.h:543
 [< inline >] __netlink_lookup net/netlink/af_netlink.c:1026
 [] netlink_lookup+0x134/0x1c0 net/netlink/af_netlink.c:1046
 [< inline >] netlink_getsockbyportid net/netlink/af_netlink.c:1616
 [] netlink_unicast+0x111/0x300 net/netlink/af_netlink.c:1812
 [] netlink_sendmsg+0x4c9/0x5f0 net/netlink/af_netlink.c:2443
 [< inline >] sock_sendmsg_nosec net/socket.c:610
 [] sock_sendmsg+0x83/0x90 net/socket.c:620
 [] ___sys_sendmsg+0x3cf/0x3e0 net/socket.c:1952
 [] __sys_sendmsg+0x4c/0xb0 net/socket.c:1986
 [< inline >] SYSC_sendmsg net/socket.c:1997
 [] SyS_sendmsg+0x30/0x50 net/socket.c:1993
 [] entry_SYSCALL_64_fastpath+0x31/0x95
arch/x86/entry/entry_64.S:188

Previous write at 0x880480443bd0 of size 8 by thread 213 on CPU 4:
 [< inline >] rhashtable_rehash_one lib/rhashtable.c:193
 [< inline >] rhashtable_rehash_chain lib/rhashtable.c:213
 [< inline >] rhashtable_rehash_table lib/rhashtable.c:257
 [] rht_deferred_worker+0x3b0/0x6d0 lib/rhashtable.c:373
 [] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [] kthread+0x150/0x170 kernel/kthread.c:209
 [] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529

Mutexes locked by thread 213:
Mutex 217217 is locked here:
 [] mutex_lock+0x57/0x70 kernel/locking/mutex.c:108
 [] rht_deferred_worker+0x45/0x6d0 lib/rhashtable.c:363
 [] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [] kthread+0x150/0x170 kernel/kthread.c:209
 [] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529

Mutex 431216 is locked here:
 [< inline >] __raw_spin_lock_bh include/linux/spinlock_api_smp.h:149
 [] _raw_spin_lock_bh+0x65/0x80 kernel/locking/spinlock.c:175
 [< inline >] spin_lock_bh include/linux/spinlock.h:317
 [< inline >] rhashtable_rehash_chain lib/rhashtable.c:212
 [< inline >] rhashtable_rehash_table lib/rhashtable.c:257
 [] rht_deferred_worker+0x1e6/0x6d0 lib/rhashtable.c:373
 [] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [] kthread+0x150/0x170 kernel/kthread.c:209
 [] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529

Mutex 432766 is locked here:
 [< inline >] __raw_spin_lock include/linux/spinlock_api_smp.h:158
 [] _raw_spin_lock+0x50/0x70 kernel/locking/spinlock.c:151
 [< inline >] rhashtable_rehash_one lib/rhashtable.c:186
 [< inline >] rhashtable_rehash_chain lib/rhashtable.c:213
 [< inline >] rhashtable_rehash_table lib/rhashtable.c:257
 [] rht_deferred_worker+0x36b/0x6d0 lib/rhashtable.c:373
 [] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [] kthread+0x150/0x170 kernel/kthread.c:209
 [] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529
---
 lib/rhashtable.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index cc0c697..978624d 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -188,9 +188,12 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
 				      new_tbl, new_hash);
 
 	if (rht_is_a_nulls(head))
-		INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
-	else
-		RCU_INIT_POINTER(entry->next, head);
+		head = (struct rhash_head *)rht_marker(ht, new_hash);
+	/* We don't insert any new nodes that were not previously accessible
+	 * to readers, so we don't need to use rcu_assign_pointer().
+	 * But entry is being concurrently accessed by readers, so we need to
+	 * use at least WRITE_ONCE. */
+	WRITE_ONCE(entry->next, head);
 
 	rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
 	spin_unlock(new_bucket_lock);
-- 
2.6.0.rc0.131.gf624c3d



Re: [PATCH 0/6] net: Fix module autoload for OF platform drivers

2015-09-21 Thread David Miller
From: Luis de Bethencourt 
Date: Fri, 18 Sep 2015 17:53:32 +0200

> Hi,
> 
> These patches add the missing MODULE_DEVICE_TABLE() for OF to export
> the information so modules have the correct aliases built-in and
> autoloading works correctly.
> 
> A longer explanation by Javier Canillas can be found here:
> https://lkml.org/lkml/2015/7/30/519

ks8851 already has this issue fixed, thus rest of series
applied, thanks.


Re: [PATCH net-next] tcp: send loss probe after 1s if no RTT available

2015-09-21 Thread David Miller
From: Yuchung Cheng 
Date: Fri, 18 Sep 2015 11:40:33 -0700

> This patch makes TLP use a 1 sec timer by default when RTT is
> not available due to SYN/ACK retransmission or SYN cookies.
> 
> Prior to this change, the lack of RTT prevents TLP, so the first
> data packets sent can only be recovered by fast recovery or RTO.
> If fast recovery fails to trigger, the RTO is 3 seconds when
> SYN/ACK is retransmitted. With this patch we can trigger fast
> recovery in 1 sec instead.
> 
> Note that we need to check Fast Open more properly. A Fast Open
> connection could be (accepted then) closed before it receives
> the final ACK of 3WHS so the state is FIN_WAIT_1. Without the
> new check, TLP will retransmit FIN instead of SYN/ACK.
> 
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Nandita Dukkipati 
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Eric Dumazet 

Applied.


Re: [PATCH net-next] tcp: usec resolution SYN/ACK RTT

2015-09-21 Thread David Miller
From: Yuchung Cheng 
Date: Fri, 18 Sep 2015 11:36:14 -0700

> Currently SYN/ACK RTT is measured in jiffies. For LAN the SYN/ACK
> RTT is often measured as 0ms or sometimes 1ms, which would affect
> RTT estimation and min RTT sampling used by some congestion control.
> 
> This patch improves SYN/ACK RTT to be usec resolution if platform
> supports it. While the timestamping of SYN/ACK is done in request
> sock, the RTT measurement is carefully arranged to avoid storing
> another u64 timestamp in tcp_sock.
> 
> For regular handshake w/o SYNACK retransmission, the RTT is sampled
> right after the child socket is created and right before the request
> sock is released (tcp_check_req() in tcp_minisocks.c)
> 
> For Fast Open the child socket is already created when SYN/ACK was
> sent, the RTT is sampled in tcp_rcv_state_process() after processing
> the final ACK and right before the request socket is released.
> 
> If the SYN/ACK was retransmitted or SYN-cookie was used, we rely
> on TCP timestamps to measure the RTT. The sample is taken at the
> same place in tcp_rcv_state_process() after the timestamp values
> are validated in tcp_validate_incoming(). Note that we do not store
> TS echo value in request_sock for SYN-cookies, because the value
> is already stored in tp->rx_opt used by tcp_ack_update_rtt().
> 
> One side benefit is that the RTT measurement now happens before
> initializing congestion control (of the passive side). Therefore
> the congestion control can use the SYN/ACK RTT.
> 
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Eric Dumazet 

Applied.


[RFC net-next 3/4] net: VRF device: Initial IPv6 support

2015-09-21 Thread David Ahern
Starting point for IPv6 support in the VRF device.

Signed-off-by: David Ahern 
---
 drivers/net/vrf.c | 226 +-
 1 file changed, 225 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 9e3afe011396..ea08a280ae2e 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -54,6 +55,7 @@ struct slave_queue {
 struct net_vrf {
struct slave_queue  queue;
struct rtable   *rth;
+   struct rt6_info *rt6;
u32 tb_id;
 };
 
@@ -101,12 +103,49 @@ static struct dst_ops vrf_dst_ops = {
 	.default_advmss	= vrf_default_advmss,
 };
 
+/* neighbor handling is done with actual device; do not want
+ * to flip skb->dev for those ndisc packets. This really fails
+ * for multiple next protocols (e.g., NEXTHDR_HOP). But it is
+ * a start.
+ */
+static bool check_ipv6_frame(const struct sk_buff *skb)
+{
+	const struct ipv6hdr *ipv6h = (struct ipv6hdr *)skb->data;
+	size_t hlen = sizeof(*ipv6h);
+	bool rc = true;
+
+	if (skb->len < hlen)
+		return false;
+
+	if (ipv6h->nexthdr == NEXTHDR_ICMP) {
+		const struct icmp6hdr *icmph;
+
+		if (skb->len < hlen + sizeof(*icmph))
+			goto out;
+
+		icmph = (struct icmp6hdr *)(skb->data + sizeof(*ipv6h));
+		switch (icmph->icmp6_type) {
+		case NDISC_ROUTER_SOLICITATION:
+		case NDISC_ROUTER_ADVERTISEMENT:
+		case NDISC_NEIGHBOUR_SOLICITATION:
+		case NDISC_NEIGHBOUR_ADVERTISEMENT:
+		case NDISC_REDIRECT:
+			rc = false;
+			break;
+		}
+	}
+
+out:
+	return rc;
+}
+
 static bool is_ip_rx_frame(struct sk_buff *skb)
 {
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
-	case htons(ETH_P_IPV6):
 		return true;
+	case htons(ETH_P_IPV6):
+		return check_ipv6_frame(skb);
 	}
 	return false;
 }
@@ -169,6 +208,37 @@ static struct rtnl_link_stats64 *vrf_get_stats64(struct 
net_device *dev,
 static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
   struct net_device *dev)
 {
+   const struct ipv6hdr *iph = ipv6_hdr(skb);
+   struct net *net = dev_net(skb->dev);
+   struct flowi6 fl6 = {
+   /* needed to match OIF rule */
+   .flowi6_oif = dev->ifindex,
+   .flowi6_iif = LOOPBACK_IFINDEX,
+   .daddr = iph->daddr,
+   .saddr = iph->saddr,
+   .flowlabel = ip6_flowinfo(iph),
+   .flowi6_mark = skb->mark,
+   .flowi6_proto = iph->nexthdr,
+   .flowi6_flags = FLOWI_FLAG_L3MDEV_SRC,
+   };
+   int ret = NET_XMIT_DROP;
+   struct dst_entry *dst;
+
+   dst = ip6_route_output(net, NULL, &fl6);
+   if (dst == (struct dst_entry *)net->ipv6.ip6_null_entry)
+   goto err;
+
+   skb_dst_drop(skb);
+   skb_dst_set(skb, dst);
+
+   ret = ip6_local_out(skb);
+   if (unlikely(net_xmit_eval(ret)))
+   dev->stats.tx_errors++;
+   else
+   ret = NET_XMIT_SUCCESS;
+
+   return ret;
+err:
vrf_tx_error(dev, skb);
return NET_XMIT_DROP;
 }
@@ -265,6 +335,122 @@ static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct 
net_device *dev)
return ret;
 }
 
+static struct dst_entry *vrf_ip6_check(struct dst_entry *dst, u32 cookie)
+{
+   return dst;
+}
+
+static int vrf_ip6_local_out(struct sk_buff *skb)
+{
+   return ip6_local_out(skb);
+}
+
+static struct dst_ops vrf_dst_ops6 = {
+   .family = AF_INET6,
+   .local_out  = vrf_ip6_local_out,
+   .check  = vrf_ip6_check,
+   .mtu= vrf_v4_mtu,
+   .destroy= vrf_dst_destroy,
+   .default_advmss = vrf_default_advmss,
+};
+
+static int vrf_input6(struct sk_buff *skb)
+{
+   pr_err("vrf_input6: WTF!?!?\n");
+   kfree_skb(skb);
+   return 0;
+}
+
+/* modelled after ip6_finish_output2 */
+static int vrf_finish_output6(struct net *net, struct sock *sk,
+ struct sk_buff *skb)
+{
+   struct dst_entry *dst = skb_dst(skb);
+   struct net_device *dev = dst->dev;
+   struct neighbour *neigh;
+   struct in6_addr *nexthop;
+   int ret;
+
+   skb->protocol = htons(ETH_P_IPV6);
+   skb->dev = dev;
+
+   rcu_read_lock_bh();
+   nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
+   neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop);
+   if (unlikely(!neigh))
+   neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
+   if (!IS_ERR(neigh)) {
+   ret = 

[RFC net-next 2/4] net: Remove use of IFF_SLAVE with L3 devices

2015-09-21 Thread David Ahern
Use of IFF_SLAVE flag causes problems with IPv6. addrconf_notify does
not respond to netdev events for devices with IFF_SLAVE set. This breaks
DAD, neighbor discovery and spirals to non-working death for IPv6.

L3 master devices will have IFF_MASTER and IFF_L3MDEV set.
L3 slave devices will only have IFF_L3MDEV set.

Signed-off-by: David Ahern 
---
 drivers/net/vrf.c | 2 --
 include/linux/netdevice.h | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index bf48c8b448fc..9e3afe011396 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -430,7 +430,6 @@ static int do_vrf_add_slave(struct net_device *dev, struct 
net_device *port_dev)
if (ret < 0)
goto out_unregister;
 
-   port_dev->flags |= IFF_SLAVE;
port_dev->priv_flags |= IFF_L3MDEV;
__vrf_insert_slave(queue, slave);
cycle_netdev(port_dev);
@@ -460,7 +459,6 @@ static int do_vrf_del_slave(struct net_device *dev, struct 
net_device *port_dev)
struct slave *slave;
 
netdev_upper_dev_unlink(port_dev, dev);
-   port_dev->flags &= ~IFF_SLAVE;
port_dev->priv_flags &= ~IFF_L3MDEV;
 
netdev_rx_handler_unregister(port_dev);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ae95d922a569..5ae287d1e3fe 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3831,7 +3831,7 @@ static inline bool netif_is_l3_master(const struct 
net_device *dev)
 
 static inline bool netif_is_l3_slave(const struct net_device *dev)
 {
-   return dev->flags & IFF_SLAVE && dev->priv_flags & IFF_L3MDEV;
+   return !(dev->flags & IFF_MASTER) && dev->priv_flags & IFF_L3MDEV;
 }
 
 static inline bool netif_is_bridge_master(const struct net_device *dev)
-- 
1.9.1
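
The flag convention described in this patch (masters carry IFF_MASTER plus
IFF_L3MDEV, slaves carry only IFF_L3MDEV) can be illustrated outside the
kernel. What follows is a minimal userspace sketch, not kernel code: the
struct sk_netdev type and the SK_IFF_* bit values are hypothetical stand-ins
invented for this example, and only the predicate logic mirrors the
netdevice.h hunk above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real flags live in kernel headers and
 * have different values.
 */
#define SK_IFF_MASTER  0x1u  /* models IFF_MASTER in dev->flags */
#define SK_IFF_L3MDEV  0x2u  /* models IFF_L3MDEV in dev->priv_flags */

struct sk_netdev {
	unsigned int flags;
	unsigned int priv_flags;
};

/* L3 master: both IFF_MASTER and IFF_L3MDEV are set. */
static bool sk_is_l3_master(const struct sk_netdev *dev)
{
	return (dev->flags & SK_IFF_MASTER) &&
	       (dev->priv_flags & SK_IFF_L3MDEV);
}

/* L3 slave after this patch: IFF_L3MDEV set and IFF_MASTER clear.
 * IFF_SLAVE is no longer consulted.
 */
static bool sk_is_l3_slave(const struct sk_netdev *dev)
{
	return !(dev->flags & SK_IFF_MASTER) &&
	       (dev->priv_flags & SK_IFF_L3MDEV);
}
```

With this model a device that has neither bit set is neither master nor
slave, and a master is never reported as a slave.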



[RFC net-next 0/4] net: VRF support in IPv6 stack

2015-09-21 Thread David Ahern
Initial support for VRFs in IPv6 stack. Patches apply on top of the L3
Master Device patches sent on Friday:
http://www.spinics.net/lists/netdev/msg343533.html

All patches can be found here
github.com/dsahern/linux.git vrf/ipv6-l3mdev-rfc1

David Ahern (4):
  l3mdev: ipv6 support
  net: Remove use of IFF_SLAVE with L3 devices
  net: VRF device: Initial IPv6 support
  net: ipv6: Initial support for VRFs

 drivers/net/vrf.c | 228 +-
 include/linux/netdevice.h |   2 +-
 include/net/l3mdev.h  |  43 +
 net/ipv6/addrconf.c   |   4 +-
 net/ipv6/datagram.c   |   4 +
 net/ipv6/icmp.c   |   6 +-
 net/ipv6/ip6_fib.c|   1 +
 net/ipv6/ip6_output.c |   6 +-
 net/ipv6/ndisc.c  |   9 +-
 net/ipv6/route.c  |  17 +++-
 10 files changed, 308 insertions(+), 12 deletions(-)

-- 
1.9.1



[RFC net-next 4/4] net: ipv6: Initial support for VRFs

2015-09-21 Thread David Ahern
Add basic support for VRFs to the IPv6 stack. This is a good starting point.
Ping to and from a VRF works. Basic TCP and UDP clients and servers all
work fine with VRFs.

Signed-off-by: David Ahern 
---
 net/ipv6/addrconf.c   |  4 +++-
 net/ipv6/datagram.c   |  4 
 net/ipv6/icmp.c   |  6 +-
 net/ipv6/ip6_fib.c|  1 +
 net/ipv6/ip6_output.c |  6 --
 net/ipv6/ndisc.c  |  9 +++--
 net/ipv6/route.c  | 17 +++--
 7 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 75d3dde32c69..f4677a9c01ac 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -81,6 +81,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2179,8 +2180,9 @@ static struct rt6_info *addrconf_get_prefix_route(const 
struct in6_addr *pfx,
struct fib6_node *fn;
struct rt6_info *rt = NULL;
struct fib6_table *table;
+   u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_PREFIX;
 
-   table = fib6_get_table(dev_net(dev), RT6_TABLE_PREFIX);
+   table = fib6_get_table(dev_net(dev), tb_id);
if (!table)
return NULL;
 
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 9aadd57808a5..11980ee57507 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -142,6 +142,10 @@ static int __ip6_datagram_connect(struct sock *sk, struct 
sockaddr *uaddr, int a
err = -EINVAL;
goto out;
}
+   } else if (sk->sk_bound_dev_if &&
+  netif_index_is_l3_master(sock_net(sk),
+   sk->sk_bound_dev_if)) {
+   fl6.flowi6_flags |= FLOWI_FLAG_L3MDEV_SRC;
}
 
sk->sk_v6_daddr = *daddr;
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 6c2b2132c8d3..efb1c00f2270 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -68,6 +68,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -496,6 +497,9 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 
code, __u32 info)
else if (!fl6.flowi6_oif)
fl6.flowi6_oif = np->ucast_oif;
 
+   if (!fl6.flowi6_oif)
+   fl6.flowi6_oif = l3mdev_master_ifindex(skb->dev);
+
dst = icmpv6_route_lookup(net, skb, sk, &fl6);
if (IS_ERR(dst))
goto out;
@@ -575,7 +579,7 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
fl6.daddr = ipv6_hdr(skb)->saddr;
if (saddr)
fl6.saddr = *saddr;
-   fl6.flowi6_oif = skb->dev->ifindex;
+   fl6.flowi6_oif = l3mdev_fib_oif(skb->dev);
fl6.fl6_icmp_type = ICMPV6_ECHO_REPLY;
fl6.flowi6_mark = mark;
security_skb_classify_flow(skb, flowi6_to_flowi(&fl6));
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 418d9823692b..318cf5a34ca5 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -259,6 +259,7 @@ struct fib6_table *fib6_get_table(struct net *net, u32 id)
 
return NULL;
 }
+EXPORT_SYMBOL_GPL(fib6_get_table);
 
 static void __net_init fib6_tables_init(struct net *net)
 {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 291a07be5dfb..bbd752cef5c2 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -55,6 +55,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int ip6_finish_output2(struct sock *sk, struct sk_buff *skb)
 {
@@ -874,7 +875,8 @@ static struct dst_entry *ip6_sk_dst_check(struct sock *sk,
 #ifdef CONFIG_IPV6_SUBTREES
ip6_rt_check(&rt->rt6i_src, &fl6->saddr, np->saddr_cache) ||
 #endif
-   (fl6->flowi6_oif && fl6->flowi6_oif != dst->dev->ifindex)) {
+  (!(fl6->flowi6_flags & FLOWI_FLAG_L3MDEV_SRC) &&
+ (fl6->flowi6_oif && fl6->flowi6_oif != dst->dev->ifindex))) {
dst_release(dst);
dst = NULL;
}
@@ -1026,7 +1028,7 @@ struct dst_entry *ip6_dst_lookup_flow(struct sock *sk, 
struct flowi6 *fl6,
if (final_dst)
fl6->daddr = *final_dst;
if (!fl6->flowi6_oif)
-   fl6->flowi6_oif = dst->dev->ifindex;
+   fl6->flowi6_oif = l3mdev_fib_oif(dst->dev);
 
return xfrm_lookup_route(sock_net(sk), dst, flowi6_to_flowi(fl6), sk, 
0);
 }
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index dde5a1e5875a..278627b01283 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -67,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -147,6 +148,7 @@ struct neigh_table nd_tbl = {
.gc_thresh2 =512,
.gc_thresh3 =   1024,
 };
+EXPORT_SYMBOL_GPL(nd_tbl);
 
 static void ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data)
 {
@@ -441,8 +443,9 @@ static void ndisc_send_skb(struct sk_buff *skb,
 
if (!dst) {
struct flowi6 fl6;
+   int oif = l3mdev_fib_oif(skb->dev);
 
-   icmpv6_flow_init(sk, &fl6, type, saddr, daddr, 

[RFC net-next 1/4] l3mdev: ipv6 support

2015-09-21 Thread David Ahern
Add lookup of cached IPv6 route to l3mdev operations.

Signed-off-by: David Ahern 
---
 include/net/l3mdev.h | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 8befd629f8ac..2ee593662ef4 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -19,12 +19,16 @@
  * @l3mdev_fib_table: Get FIB table id to use for lookups
  *
  * @l3dev_get_rtable: Get cached IPv4 rtable (dst_entry) for device
+ *
+ * @l3dev_rt6_dst:Get cached IPv6 rt6_info (dst_entry) for device
  */
 
 struct l3mdev_ops {
u32 (*l3mdev_fib_table)(const struct net_device *dev);
struct rtable * (*l3mdev_get_rtable)(const struct net_device *dev,
 const struct flowi4 *fl4);
+   struct dst_entry * (*l3mdev_rt6_dst)(const struct net_device *dev,
+const struct flowi6 *fl6);
 };
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
@@ -84,6 +88,33 @@ static inline struct rtable *l3mdev_get_rtable(const struct 
net_device *dev,
return NULL;
 }
 
+/* netif_is_l3_master already checked by caller */
+static inline struct dst_entry *l3mdev_rt6_dst(const struct net_device *dev,
+  const struct flowi6 *fl6)
+{
+   if (dev->l3mdev_ops->l3mdev_rt6_dst)
+   return dev->l3mdev_ops->l3mdev_rt6_dst(dev, fl6);
+
+   return NULL;
+}
+
+static inline
+struct dst_entry *l3mdev_rt6_dst_by_oif(struct net *net,
+   const struct flowi6 *fl6)
+{
+   struct dst_entry *dst = NULL;
+   struct net_device *dev;
+
+   dev = dev_get_by_index(net, fl6->flowi6_oif);
+   if (dev) {
+   if (netif_is_l3_master(dev))
+   dst = l3mdev_rt6_dst(dev, fl6);
+   dev_put(dev);
+   }
+
+   return dst;
+}
+
 static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
 {
struct net_device *dev;
@@ -141,6 +172,18 @@ static inline struct rtable *l3mdev_get_rtable(const 
struct net_device *dev,
 {
return NULL;
 }
+static inline
+struct dst_entry *l3mdev_rt6_dst(const struct net_device *dev,
+const struct flowi6 *fl6)
+{
+   return NULL;
+}
+static inline
+struct dst_entry *l3mdev_rt6_dst_by_oif(struct net *net,
+   const struct flowi6 *fl6)
+{
+   return NULL;
+}
 
 static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
 {
-- 
1.9.1



Re: [PATCH net] tcp/dccp: fix timewait races in timer handling

2015-09-21 Thread David Miller
From: Eric Dumazet 
Date: Sat, 19 Sep 2015 09:08:34 -0700

> From: Eric Dumazet 
> 
> When creating a timewait socket, we need to arm the timer before
> allowing other cpus to find it. The signal allowing cpus to find
> the socket is setting tw_refcnt to non zero value.
> 
> As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to
> call inet_twsk_schedule() first.
> 
> This also means we need to remove tw_refcnt changes from
> inet_twsk_schedule() and let the caller handle it.
> 
> Note that because we use mod_timer_pinned(), we have the guarantee
> the timer won't expire before we set tw_refcnt as we run in BH context.
> 
> To make things more readable I introduced inet_twsk_reschedule() helper.
> 
> When rearming the timer, we can use mod_timer_pending() to make sure
> we do not rearm a canceled timer.
> 
> Note: This bug can possibly trigger if packets of a flow can hit
> multiple cpus. This does not normally happen, unless flow steering
> is broken somehow. This explains why this bug was spotted ~5 months
> after its introduction.
> 
> A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(),
> but will be provided in a separate patch for proper tracking.
> 
> Fixes: 789f558cfb36 ("tcp/dccp: get rid of central timewait timer")
> Signed-off-by: Eric Dumazet 
> Reported-by: Ying Cai 

Applied and queued up for -stable.
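
The ordering rule in the quoted patch (arm the timer first, then make the
socket findable by setting a non-zero refcount) is an instance of
"initialize, then publish". The following is only a userspace analogy using
C11 atomics, not the kernel change itself: struct sk_tw and the sk_tw_*
helpers are hypothetical, and the kernel uses its own atomics and BH context
rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a timewait socket; not the kernel's struct. */
struct sk_tw {
	bool timer_armed;   /* stands in for the armed timewait timer */
	atomic_int refcnt;  /* zero: invisible to lookups; non-zero: usable */
};

static void sk_tw_init(struct sk_tw *tw)
{
	tw->timer_armed = false;
	atomic_init(&tw->refcnt, 0);
}

/* Arm the timer first, then publish by making refcnt non-zero. The
 * release store orders the timer setup before the refcnt update.
 */
static void sk_tw_arm_then_publish(struct sk_tw *tw)
{
	tw->timer_armed = true;
	atomic_store_explicit(&tw->refcnt, 1, memory_order_release);
}

/* A lookup accepts only objects whose refcnt is non-zero. The acquire
 * load pairs with the release store above, so any accepted object is
 * seen with its timer already armed.
 */
static bool sk_tw_lookup(struct sk_tw *tw)
{
	if (atomic_load_explicit(&tw->refcnt, memory_order_acquire) == 0)
		return false;
	assert(tw->timer_armed);
	return true;
}
```

Reversing the two statements in sk_tw_arm_then_publish() would allow a
concurrent lookup to see a usable object with no timer armed, which is the
class of race the patch description is about.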


Re: [PATCH net] inet: fix races in reqsk_queue_hash_req()

2015-09-21 Thread David Miller
From: Eric Dumazet 
Date: Sat, 19 Sep 2015 09:48:04 -0700

> From: Eric Dumazet 
> 
> Before allowing lockless LISTEN processing, we need to make
> sure to arm the SYN_RECV timer before the req socket is visible
> in hash tables.
> 
> Also, req->rsk_hash should be written before we set rsk_refcnt
> to a non zero value.
> 
> Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
> Signed-off-by: Eric Dumazet 
> Cc: Ying Cai 

Applied and queued up for -stable.


Re: [PATCH net-next v2 5/7] rocker: add FDB cleanup timer

2015-09-21 Thread David Miller
From: sfel...@gmail.com
Date: Sun, 20 Sep 2015 08:48:27 -0700

> From: Scott Feldman 
> 
> Add a timer to each rocker switch to do FDB entry cleanup by ageing out
> expired entries.  The timer scheduling algo is copied from the bridge
> driver, for the most part, to keep the firing of the timer to a minimum.
> 
> Signed-off-by: Scott Feldman 
> Acked-by: Jiri Pirko 

You need to del_timer_sync() or similar on this timer in
rocker_remove().


Re: linux-next: build failure after merge of the bluetooth tree

2015-09-21 Thread Stephen Rothwell
Hi Gustavo,

On Mon, 14 Sep 2015 10:22:34 +1000 Stephen Rothwell wrote:
>
> On Mon, 14 Sep 2015 10:14:28 +1000 Stephen Rothwell wrote:
> >
> > I applied the patches that Andrew has had in his post merge series
> > (but I think you were sent a rolled up version):
> 
> Actually it was sent by Alexander to Marcel:
> 
> From: Alexander Aring 
> To: mar...@holtmann.org
> Cc: Andrew Morton ,
>   Stephen Rothwell ,
>   Alexander Aring ,
>   Stefan Schmidt 
> Subject: [PATCH bluetooth-next] drivers/net/ieee802154/at86rf230.c: 
> seq_printf() now returns NULL
> Date: Fri, 11 Sep 2015 11:23:30 +0200
> Message-Id: <1441963410-24844-1-git-send-email-alex.ar...@gmail.com>
> X-Mailer: git-send-email 2.5.1
> 
> From: Andrew Morton 
> 
> I will shortly be sending
> http://ozlabs.org/~akpm/mmots/broken-out/fs-seq_file-convert-int-seq_vprint-seq_printf-etc-returns-to-void.patch
> to Linus.  This will cause the linux-next version of
> drivers/net/ieee802154/at86rf230.c to break at compilation time.
> 
> Below is the fix.  I suggest you apply this immediately.
> 
> Otherwise I'll try to remember to send this in after Alexander's
> 890acf8330cac is merged.  But there will be a window during which the
> build fails, and we'll get emails...
> 
> From: Stephen Rothwell 
> Subject: drivers/net/ieee802154/at86rf230.c: seq_printf() now returns NULL

OK, this is now a problem for the net-next tree since the bluetooth
tree was merged there :-(

Can someone please apply this patch?

Hi Dave,

An x86_64 allmodconfig build after merging the net-next tree breaks in
linux-next due to the patch below not being applied to the bluetooth
tree.  I have been applying the equivalent to the bluetooth tree merge
in linux-next for a while now.

[Patch repeated for Dave - this is from an email from Andrew via
Alexander to Marcel which I forwarded to Gustavo]

From: Stephen Rothwell 
Subject: drivers/net/ieee802154/at86rf230.c: seq_printf() now returns NULL

Signed-off-by: Stephen Rothwell 
Cc: Alexander Aring 
Cc: Stefan Schmidt 
Cc: Marcel Holtmann 
Signed-off-by: Andrew Morton 
Signed-off-by: Alexander Aring 
---
 drivers/net/ieee802154/at86rf230.c | 35 ++-
 1 file changed, 10 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ieee802154/at86rf230.c 
b/drivers/net/ieee802154/at86rf230.c
index b8b0628..9756e64 100644
--- a/drivers/net/ieee802154/at86rf230.c
+++ b/drivers/net/ieee802154/at86rf230.c
@@ -1645,32 +1645,17 @@ static struct dentry *at86rf230_debugfs_root;
 static int at86rf230_stats_show(struct seq_file *file, void *offset)
 {
struct at86rf230_local *lp = file->private;
-   int ret;
-
-   ret = seq_printf(file, "SUCCESS:\t\t%8llu\n", lp->trac.success);
-   if (ret < 0)
-   return ret;
-
-   ret = seq_printf(file, "SUCCESS_DATA_PENDING:\t%8llu\n",
-lp->trac.success_data_pending);
-   if (ret < 0)
-   return ret;
-
-   ret = seq_printf(file, "SUCCESS_WAIT_FOR_ACK:\t%8llu\n",
-lp->trac.success_wait_for_ack);
-   if (ret < 0)
-   return ret;
-
-   ret = seq_printf(file, "CHANNEL_ACCESS_FAILURE:\t%8llu\n",
-lp->trac.channel_access_failure);
-   if (ret < 0)
-   return ret;
 
-   ret = seq_printf(file, "NO_ACK:\t\t\t%8llu\n", lp->trac.no_ack);
-   if (ret < 0)
-   return ret;
-
-   return seq_printf(file, "INVALID:\t\t%8llu\n", lp->trac.invalid);
+   seq_printf(file, "SUCCESS:\t\t%8llu\n", lp->trac.success);
+   seq_printf(file, "SUCCESS_DATA_PENDING:\t%8llu\n",
+  lp->trac.success_data_pending);
+   seq_printf(file, "SUCCESS_WAIT_FOR_ACK:\t%8llu\n",
+  lp->trac.success_wait_for_ack);
+   seq_printf(file, "CHANNEL_ACCESS_FAILURE:\t%8llu\n",
+  lp->trac.channel_access_failure);
+   seq_printf(file, "NO_ACK:\t\t\t%8llu\n", lp->trac.no_ack);
+   seq_printf(file, "INVALID:\t\t%8llu\n", lp->trac.invalid);
+   return 0;
 }
 
 static int at86rf230_stats_open(struct inode *inode, struct file *file)
-- 
2.5.1

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au


Re: 4.1.0, kernel panic, pppoe_release

2015-09-21 Thread Denys Fedoryshchenko

Hi,
Sorry for the late reply; I was not able to push a new kernel to the pppoe
servers without permission (they are production servers), and just got the OK.

I am testing the patch on another pppoe server with 9k users; after ~3 days
it seems fine. Today I will also test it on the server that was experiencing
crashes within 1 day.

On 2015-09-10 18:56, Guillaume Nault wrote:
> On Fri, Jul 17, 2015 at 09:16:14PM +0300, Denys Fedoryshchenko wrote:
>> Probably my knowledge of kernel is not sufficient, but i will try few
>> approaches.
>> One of them to add to pppoe_unbind_sock_work:
>>
>> pppox_unbind_sock(sk);
>> +/* Signal the death of the socket. */
>> +sk->sk_state = PPPOX_DEAD;
>
> I don't believe this will fix anything. pppox_unbind_sock() already
> sets sk->sk_state when necessary.
>
>> I will wait first, to make sure this patch was causing kernel panic (it
>> needs 24h testing cycle), then i will try this fix.
>
> I suspect the problem goes with actions performed on the underlying
> interface (MAC address, MTU or link state update). This triggers
> pppoe_flush_dev(), which cleans up the device without announcing it
> in sk->sk_state.
>
> Can you please try the following patch?
>
> ---
> diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
> index 3837ae3..2ed7506 100644
> --- a/drivers/net/ppp/pppoe.c
> +++ b/drivers/net/ppp/pppoe.c
> @@ -313,7 +313,6 @@ static void pppoe_flush_dev(struct net_device *dev)
> if (po->pppoe_dev == dev &&
>     sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
> pppox_unbind_sock(sk);
> -   sk->sk_state = PPPOX_ZOMBIE;
> sk->sk_state_change(sk);
> po->pppoe_dev = NULL;
> dev_put(dev);



Re: linux-next: build failure after merge of the bluetooth tree

2015-09-21 Thread David Miller
From: Stephen Rothwell 
Date: Tue, 22 Sep 2015 11:20:15 +1000

> From: Stephen Rothwell 
> Subject: drivers/net/ieee802154/at86rf230.c: seq_printf() now returns NULL
> 
> Signed-off-by: Stephen Rothwell 
> Cc: Alexander Aring 
> Cc: Stefan Schmidt 
> Cc: Marcel Holtmann 
> Signed-off-by: Andrew Morton 
> Signed-off-by: Alexander Aring 

Applied, thanks.


Re: [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter

2015-09-21 Thread Mark Rustad
On 9/21/15 9:14 PM, Jarod Wilson wrote:
> Just switching to adapter->io_addr everywhere seems to not work as 
> noted above. :\ Note that I'm also chasing this from the other end 
> with the author of the pci patches that seem to have triggered this, 
> so the real bug might be over in pci-land, but hardening against 
> explosions in igb still seems like a worthwhile effort here.

My understanding is that there can be problems if too many writes to a
removed device happen. That is why ixgbe avoids doing that by testing
for removal in some places. The io_addr does get used in the transmit
path simply to avoid adding a test to that hot path. That approach
seems to be working well for ixgbe.





Re: [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter

2015-09-21 Thread Jarod Wilson

Alexander Duyck wrote:

On 09/21/2015 10:11 AM, Jarod Wilson wrote:

Some pci changes upcoming in 4.3 seem to cause additional disconnects,
which can happen at inopportune times for igb, leading to issues such as
this, where the disconnect happened just before igb_configure_tx_ring():

[ 414.440115] igb :15:00.0: enabling device ( -> 0002)
[ 414.474934] pps pps0: new PPS source ptp1
[ 414.474937] igb :15:00.0: added PHC on eth0
[ 414.474938] igb :15:00.0: Intel(R) Gigabit Ethernet Network
Connection
[ 414.474940] igb :15:00.0: eth0: (PCIe:2.5Gb/s:Width x1)
e8:ea:6a:00:1b:2a
[ 414.475072] igb :15:00.0: eth0: PBA No: 000200-000
[ 414.475073] igb :15:00.0: Using MSI-X interrupts. 4 rx queue(s),
4 tx queue(s)
[ 414.478453] igb :15:00.0 enp21s0: renamed from eth0
[ 414.497747] IPv6: ADDRCONF(NETDEV_UP): enp21s0: link is not ready
[ 414.536745] igb :15:00.0 enp21s0: PCIe link lost, device now
detached
[ 414.854808] BUG: unable to handle kernel paging request at
3818
[ 414.854827] IP: []
igb_configure_tx_ring+0x14c/0x250 [igb]
[ 414.854846] PGD 0
[ 414.854849] Oops: 0002 [#1] SMP
[ 414.854856] Modules linked in: firewire_ohci firewire_core crc_itu_t
igb dca ctr ccm arc4 iwlmvm mac80211 fuse xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute
bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
iptable_security iptable_raw iptable_filter bnep dm_mirror
dm_region_hash dm_log dm_mod snd_hda_codec_hdmi coretemp
x86_pkg_temp_thermal intel_powerclamp kvm_intel iTCO_wdt ppdev kvm
iTCO_vendor_support hp_wmi sparse_keymap crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel
[ 414.855073] drbg ansi_cprng snd_hda_codec_realtek
snd_hda_codec_generic aesni_intel aes_x86_64 lrw gf128mul glue_helper
ablk_helper cryptd snd_hda_intel snd_hda_codec microcode snd_hda_core
snd_hwdep snd_seq snd_seq_device snd_pcm iwlwifi uvcvideo btusb
cfg80211 videobuf2_vmalloc videobuf2_memops btrtl btbcm videobuf2_core
btintel bluetooth v4l2_common snd_timer videodev snd parport_pc
rtsx_pci_ms joydev pcspkr input_leds i2c_i801 media sg memstick rfkill
soundcore lpc_ich 8250_fintek parport mei_me hp_accel ie31200_edac
shpchp lis3lv02d mei edac_core input_polldev hp_wireless tpm_infineon
sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs
libcrc32c sr_mod sd_mod cdrom rtsx_pci_sdmmc mmc_core crc32c_intel
serio_raw rtsx_pci nouveau mxm_wmi ahci hwmon libahci e1000e
drm_kms_helper
[ 414.855309] ptp xhci_pci pps_core ttm xhci_hcd wmi video ipv6 autofs4
[ 414.855331] CPU: 2 PID: 875 Comm: NetworkManager Not tainted
4.2.0-5.el7_UNSUPPORTED.x86_64 #1
[ 414.855348] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS
M70 Ver. 01.07 02/26/2015
[ 414.855365] task: 880484698c00 ti: 88005859c000 task.ti:
88005859c000
[ 414.855380] RIP: 0010:[] []
igb_configure_tx_ring+0x14c/0x250 [igb]
[ 414.855401] RSP: 0018:88005859f608 EFLAGS: 00010246
[ 414.855410] RAX: 3818 RBX:  RCX:
3818
[ 414.855424] RDX:  RSI: 0008 RDI:
002a9fe6
[ 414.855437] RBP: 88005859f638 R08: 03030300 R09:
ffe7
[ 414.855451] R10: 81fa91b4 R11: 07e3 R12:

[ 414.855464] R13: 880471c98840 R14: 8804670a1180 R15:
000483cce000
[ 414.855478] FS: 7f389c6fb8c0() GS:88049dc8()
knlGS:
[ 414.855493] CS: 0010 DS:  ES:  CR0: 80050033
[ 414.855504] CR2: 3818 CR3: 0004875da000 CR4:
001406e0
[ 414.855518] Stack:
[ 414.855520] 88005859f638 880471c98840 880471c98df8
0001
[ 414.855538] 880471c98848 0001 88005859f698
a0b99cb0
[ 414.85] 88005859f678 59ab02179a7fe4d0 f3ce6b27ad46225f
f5454218094e72d1
[ 414.855572] Call Trace:
[ 414.855577] [] igb_configure+0x240/0x400 [igb]
[ 414.855590] [] __igb_open+0xc2/0x560 [igb]
[ 414.855602] [] ? notifier_call_chain+0x4d/0x80
[ 414.855614] [] igb_open+0x10/0x20 [igb]
[ 414.855625] [] __dev_open+0xb1/0x130
[ 414.855636] [] __dev_change_flags+0xa1/0x160
[ 414.855647] [] dev_change_flags+0x29/0x60
[ 414.855658] [] do_setlink+0x5d3/0xaa0
[ 414.855679] [] ? nla_parse+0xa3/0x100
[ 414.855689] [] rtnl_newlink+0x4f0/0x880
[ 414.855700] [] ? rtnl_newlink+0xf3/0x880
[ 414.855721] [] ? netlink_unicast+0x1ae/0x220
[ 414.855734] [] ? security_capable+0x48/0x60
[ 414.855746] [] ? ns_capable+0x2d/0x60
[ 414.855756] [] rtnetlink_rcv_msg+0x95/0x240
[ 414.855768] [] ? sock_has_perm+0x70/0x90
[ 414.855779] [] ? rtnetlink_rcv+0x40/0x40
[ 414.855789] [] netlink_rcv_skb+0xaf/0xc0
[ 414.855800] [] rtnetlink_rcv+0x2c/0x40
[ 

Re: [PATCH net] net: Handle negative checksum offset in skb-checksum-help

2015-09-21 Thread Pravin Shelar
On Mon, Sep 21, 2015 at 8:21 PM, Eric Dumazet  wrote:
> On Mon, 2015-09-21 at 19:49 -0700, Pravin Shelar wrote:
>> On Mon, Sep 21, 2015 at 7:14 PM, Eric Dumazet  wrote:
>> > On Mon, 2015-09-21 at 18:04 -0700, Pravin Shelar wrote:
>> >> On Mon, Sep 21, 2015 at 5:14 PM, David Miller  wrote:
>> >> > From: Pravin B Shelar 
>> >> > Date: Sun, 20 Sep 2015 23:53:17 -0700
>> >> >
>> >> >> VXLAN device can receive skb with checksum partial. But the checksum
>> >> >> offset could be in outer header which is pulled on receive.
>> >> >
>> >> > Such a scenario is a bug.
>> >> >
>> >> > Anything that pulls off a header should use a utility function such
>> >> > as skb_pull_rcsum() or skb_postpull_rcsum() to make sure this gets
>> >> > fixed up properly.
>> >>
>> >> skb_postpull_rcsum() does not change checksum-offset. vxlan receive
>> >> already calls this function.
>> >
>> > Then the bug is here.
>> >
>> > Otherwise we might have to 'fix' other places.
>> >
>> I posted a patch to fix skb_postpull_rcsum() to handle this case. But
>> that was not accepted.
>> https://patchwork.ozlabs.org/patch/512625/
>>
>> And specific solution for skb_checksum_help() was suggested.
>>
>> http://marc.info/?l=linux-netdev&m=144108078931774&w=2
>
> If we pull a header where the csum is, then for sure CHECKSUM_PARTIAL
> becomes buggy and void.
>
> Tom was not advocating doing an operation (skb_postpull_rcsum()) leaving
> skb in a wrong state.
>
> We should fix callers that are pulling header in such a way.
>

So same as first patch but set skb checksum flag to CHECKSUM_NONE?
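
For readers following the thread, the "negative checksum offset" in the
subject can be pictured with a small userspace model. This is a sketch under
assumptions, not kernel code: struct sk_buf and the sk_* helpers are
hypothetical, and the model only captures the idea that the checksum start
is recorded relative to the buffer head, so pulling headers past it leaves
the start offset (relative to the current data pointer) negative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a packet buffer. */
struct sk_buf {
	int headroom;    /* models the distance from head to data */
	int csum_start;  /* checksum start, as an offset from head */
};

/* Models pulling len bytes of header: data moves forward. */
static void sk_pull(struct sk_buf *b, int len)
{
	b->headroom += len;
}

/* Checksum start relative to the current data pointer. */
static int sk_csum_start_offset(const struct sk_buf *b)
{
	return b->csum_start - b->headroom;
}

/* A partial checksum only makes sense if its start is still inside
 * the data that remains after pulling.
 */
static bool sk_partial_csum_usable(const struct sk_buf *b)
{
	return sk_csum_start_offset(b) >= 0;
}
```

For example, with 64 bytes of headroom and a checksum start at offset 98
from head, the start offset is 34; pulling 50 bytes of outer headers moves
it to -16, at which point the partial checksum state no longer describes the
remaining packet. Which layer should detect or repair that is exactly what
the thread is debating.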


Re: pull request: bluetooth-next 2015-09-18

2015-09-21 Thread David Miller
From: Johan Hedberg 
Date: Fri, 18 Sep 2015 13:54:55 +0300

> Here's the first bluetooth-next pull request for the 4.4 kernel:
> 
>  - ieee802154 cleanups & fixes
>  - debugfs support for the at86rf230 driver
>  - Support for quirky (seemingly counterfeit) CSR Bluetooth controllers
>  - Power management and device config improvements for Intel controllers
>  - Fix for devices with incorrect advertising data length
>  - Fix for closing HCI user channel socket
> 
> Please let me know if there are any issues pulling. Thanks.

Pulled, thanks Johan.


Re: [PATCH v2 1/3] net: irda: pxaficp_ir: use sched_clock() for time management

2015-09-21 Thread David Miller
From: Robert Jarzmik 
Date: Fri, 18 Sep 2015 18:36:56 +0200

> Which brings me to wonder which is the more correct :
>  (a) replace to reproduce the same calculation
>  Previously mtt was compared to a difference of 76ns steps (as 307ns / 4 =
>  76ns):
>  while ((sched_clock() - si->last_clk) * 76 < mtt)
> 
>  (b) change the calculation assuming mtt is in microseconds :
>  while ((sched_clock() - si->last_clk) * 1000 < mtt)
> 
> I have no IRDA protocol knowledge so unless someone points me to the correct
> calculation I'll try my luck with (b).

"a" would be "safer" and less likely to break anything, although as
you say "b" might be more correct.


[PATCH v3 2/2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag

2015-09-21 Thread Aaron Conole
AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag
is set.

This is referenced in kernel bugzilla #12323 @
https://bugzilla.kernel.org/show_bug.cgi?id=12323

As described both in the BZ and lkml thread @
http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an
AF_UNIX socket only reads a single skb, where the desired effect is
to return as much skb data as has been queued, or as much as fits in
the recv buffer (whichever comes first).

The modified MSG_PEEK path will now move to the next skb in the queue
and jump to the again: label, rather than following the natural loop
structure. This requires duplicating some of the loop head actions.

This was tested using the Python socketpair code attached to
the bugzilla issue.

Signed-off-by: Aaron Conole 
---
 net/unix/af_unix.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 03ee4d3..f8ef53f 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2179,9 +2179,24 @@ unlock:
if (UNIXCB(skb).fp)
scm.fp = scm_fp_dup(UNIXCB(skb).fp);
 
-   sk_peek_offset_fwd(sk, chunk);
+   if (skip) {
+   sk_peek_offset_fwd(sk, chunk);
+   skip -= chunk;
+   }
 
-   break;
+   if (UNIXCB(skb).fp)
+   break;
+
+   /* XXX - this is ugly; a better approach would be
+* rewriting this function
+*/
+   last = skb;
+   last_len = skb->len;
+   unix_state_lock(sk);
+   skb = skb_peek_next(skb, &sk->sk_receive_queue);
+   if (skb)
+   goto again;
+   goto unlock;
}
} while (size);
 
-- 
1.9.1



[PATCH v3 0/2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag

2015-09-21 Thread Aaron Conole
This patch set implements a bugfix for kernel.org bugzilla #12323, allowing
MSG_PEEK to return all queued data on the unix domain socket, not just the
data contained in a single SKB. 

This is the v3 version of this patch, which includes a suggested modification
by Eric Dumazet to convert the unix_sk() conversion macro to a static inline
function. These patches are independent and can be applied separately.
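
As a rough sketch of why an inline function is safer than a cast macro, the
toy types below stand in for the kernel structures. The names unix_sk_macro()
and unix_sk_inline() and the peer_id field are invented for this example and
are not taken from the patch.

```c
#include <stddef.h>

/* Toy stand-ins; the real kernel structures are far larger. */
struct sock {
	int sk_state;
};

struct unix_sock {
	struct sock sk;	/* must stay the first member for the cast to hold */
	int peer_id;	/* hypothetical field for the demo */
};

/* Macro form: casts anything, so a wrong pointer type compiles silently. */
#define unix_sk_macro(ptr) ((struct unix_sock *)(ptr))

/* Inline form: the parameter type lets the compiler check the argument. */
static inline struct unix_sock *unix_sk_inline(const struct sock *sk)
{
	return (struct unix_sock *)sk;
}
```

Passing, say, an int pointer to unix_sk_macro() is accepted without
complaint, while the same call to unix_sk_inline() draws an
incompatible-pointer diagnostic.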

This set was tested over a 24-hour period, using a loop that continually
executed the Python code attached to the bugzilla issue. It was instrumented
with a pr_err_once() ([   13.798683] unix: went there at least one time).

Aaron Conole (2):
  [net] af_unix: Convert the unix_sk macro to an inline function for
type safety
  [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK
flag

 include/net/af_unix.h |  6 +-
 net/unix/af_unix.c| 19 +--
 2 files changed, 22 insertions(+), 3 deletions(-)

-- 
1.9.1



Re: [patch net-next RFC 0/6] switchdev: introduce transaction infra and pre-commit split

2015-09-21 Thread Vivien Didelot
Hi Jiri,

On Sep. Monday 21 (39) 08:25 PM, Jiri Pirko wrote:
> Mon, Sep 21, 2015 at 07:13:58PM CEST, vivien.dide...@savoirfairelinux.com 
> wrote:
> >Hi Jiri, Scott,
> >
> >On Sep. Monday 21 (39) 10:09 AM, Jiri Pirko wrote:
> >> Mon, Sep 21, 2015 at 09:23:24AM CEST, sfel...@gmail.com wrote:
> >> >On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
> >> >> Jiri Pirko (6):
> >> >>   switchdev: rename "trans" to "trans_ph".
> >> >>   switchdev: introduce transaction infrastructure for attr_set and
> >> >> obj_add
> >> >>   rocker: switch to local transaction phase enum
> >> >>   switchdev: move transaction phase enum under transaction structure
> >> >>   rocker: use switchdev transaction queue for allocated memory
> >> >>   switchdev: split commit and prepare phase into two callbacks
> >> >
> >> >Patches compile, but first test bombs.  Cut-and-paste of dump at end
> >> >of this email.
> >> 
> >> Told you :)
> >> 
> >> 
> >> >
> >> >I'm not sure I'm liking this patchset because it looks like a way for
> >> >switchdev drivers to easily opt-out of the prepare-commit transaction
> >> >model by simply not implementing the *_pre op.  I would rather drivers
> >> >explicitly handle the PREPARE phase in code, even if that means
> >> >skipping it gracefully (in code) with a comment (in code) explaining
> >> >why it does not matter for this device/operation.  That's what DSA had
> >> >done, mostly because it was a retro-fit.
> >> 
> >> Each driver should handle this inside it. If it does not need prepare
> >> state, it simply does not implement it. That is the same for all cb,
> >> ndos, netdev notifiers, etc. It is much cleaner and nicer to have these as
> >> separate callbacks. Implementing multiple callbacks in one is just ugly,
> >> sorry.
> >
> >This is true, (in DSA) we don't have to implement the prepare phase if
> >we fully support the feature in hardware.
> >
> >To give a real example, Marvell switch drivers currently implement all
> >add/del/dump calls for VLAN FDB (where VID 0 means the port itself). No
> >prepare phase needed.
> >
> >Now, I have local patches to enable strict 802.1Q mode in these switches
> >(all the logic is based on the hardware VLAN table). But it does not use
> >per-port FDB, so fdb_add with VID 0 doesn't make sense anymore. That's
> >why we need to push the feature checking down to the drivers in DSA.
> >
> >I have another pending patch to add .port_fdb_pre_add, where mv88e6xxx
> >code will return -EOPNOTSUPP if the given VID is 0.
> >
> >Another example: mv88e6xxx support tagged VLANs, so no hardware check
> >needed. But the Broadcom Starfighter 2 only supports port-based VLANs
> >(which is today wrongly implemented through "bridge_join/leave"). By
> >implementing .port_vlan_pre_add (another pending patch for DSA), the
> >driver will be able to return -EOPNOTSUPP if !BRIDGE_VLAN_INFO_PVID.
> >
> >Also, having logic in switchdev drivers to check SWITCHDEV_TRANS_NONE
> >and SWITCHDEV_TRANS_ABORT is not really nice. Having switchdev handle
> >the abort phase (calling each destructor) and getting rid of the
> >SWITCHDEV_TRANS_* flags sounds better to me.
> 
> Agree, if pre/commit is going to be in one function, we should have
> only prepare/commit enums. It can be carried around as a single bool
> value in switchdev_trans structure. Will include that in my transaction
> patchset.
> 
> 
> >
> >> >Also, the patchset removes the ABORT callback in case of a rollback
> >> >due to a failed PREPARE.  We can't make the assumption that it's just
> >> >a memory list to destroy on ABORT.  The driver, on PREPARE, may have
> >> >reserved device space or staged an operation on the device which we'll
> >> >need to undo on ABORT.
> >> 
> >> Yep, just register an item with a custom destructor; there you can do
> >> whatever. Also, I believe that is much nicer compared to the current code.
> >> 
> >> 
> >> >
> >> >So we need ABORT back, and we need PREPARE to not be optional, so
> >> >what's left is list enqueue/dequeue helpers, which I'm not seeing much
> >> >value in up-leveling as the driver can do list_add/del itself.
> >> 
> >> Why would every driver do it itself, over and over, when there can be a
> >> clean infrastructure to do that? Including the abort phase, without the
> >> driver needing to be involved.
> >
> >Maybe the term ".destructor" is too strongly tied to deallocation,
> >but you can indeed do whatever you need in this function.
> 
> It is a destructor. Don't know about a better name, suggestions?

Nope, I'm personally fine with this term.
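
To make the "item with custom destructor" idea concrete, here is a small
user-space sketch. All names (demo_trans, demo_trans_enqueue,
demo_trans_abort, demo_release) are hypothetical and are not the switchdev
API. It only shows the shape of the mechanism: during prepare, each reserved
resource is queued together with a function that undoes it, and on abort the
core walks the queue.

```c
#include <stdlib.h>

/* All names here are hypothetical; this is not the switchdev API. */
struct demo_item {
	void *data;
	void (*destructor)(void *data);
	struct demo_item *next;
};

struct demo_trans {
	struct demo_item *head;
};

/* Prepare phase: record a resource together with the way to undo it. */
int demo_trans_enqueue(struct demo_trans *t, void *data,
		       void (*destructor)(void *data))
{
	struct demo_item *item = malloc(sizeof(*item));

	if (!item)
		return -1;
	item->data = data;
	item->destructor = destructor;
	item->next = t->head;
	t->head = item;
	return 0;
}

/* Abort: the core walks the queue and undoes everything, so the driver
 * needs no separate abort callback.
 */
void demo_trans_abort(struct demo_trans *t)
{
	while (t->head) {
		struct demo_item *item = t->head;

		t->head = item->next;
		item->destructor(item->data);
		free(item);
	}
}

/* Example destructor: counts how many reservations were released. */
void demo_release(void *data)
{
	++*(int *)data;
}
```

The destructor is free to do more than release memory, for example undoing
a staged hardware operation, which is the point made above.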

> >
> >> >Am I missing something?  I didn't see a motivation statement for the
> >> >RFC so I'm not sure where you wanted to take this.
> >> 
> >> I want to make current code much nicer, easier to read and implement in
> >> other drivers. Look at rocker.c and how often there is == PREPARE there.
> >> It's nearly impossible to follow the code, sorry.
> >> 
> >> My next patchset is to un-mess rocker.c (that freaking ofdpa stuff is
> >> everywhere)
> >

[PATCH net-next] Driver: Vmxnet3: Extend register dump support

2015-09-21 Thread Shrikrishna Khare
Signed-off-by: Shrikrishna Khare 
Signed-off-by: Bhavesh Davda 
Acked-by: Srividya Murali 
---
 drivers/net/vmxnet3/vmxnet3_ethtool.c | 118 ++
 drivers/net/vmxnet3/vmxnet3_int.h |   4 +-
 2 files changed, 92 insertions(+), 30 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c 
b/drivers/net/vmxnet3/vmxnet3_ethtool.c
index c1d0e7a..a681569 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
@@ -183,16 +183,22 @@ vmxnet3_get_sset_count(struct net_device *netdev, int 
sset)
 }
 
 
-/* Should be multiple of 4 */
-#define NUM_TX_REGS 8
-#define NUM_RX_REGS 12
-
+/* This is a version 2 of the vmxnet3 ethtool_regs which goes hand in hand with
+ * the version 2 of the vmxnet3 support for ethtool(8) --register-dump.
+ * Therefore, if any registers are added, removed or modified, then a version
+ * bump and a corresponding change in the vmxnet3 support for ethtool(8)
+ * --register-dump would be required.
+ */
 static int
 vmxnet3_get_regs_len(struct net_device *netdev)
 {
struct vmxnet3_adapter *adapter = netdev_priv(netdev);
-   return (adapter->num_tx_queues * NUM_TX_REGS * sizeof(u32) +
-   adapter->num_rx_queues * NUM_RX_REGS * sizeof(u32));
+
+   return ((9 /* BAR1 registers */ +
+   (1 + adapter->intr.num_intrs) +
+   (1 + adapter->num_tx_queues * 17 /* Tx queue registers */) +
+   (1 + adapter->num_rx_queues * 23 /* Rx queue registers */)) *
+   sizeof(u32));
 }
 
 
@@ -342,6 +348,12 @@ vmxnet3_get_ethtool_stats(struct net_device *netdev,
 }
 
 
+/* This is a version 2 of the vmxnet3 ethtool_regs which goes hand in hand with
+ * the version 2 of the vmxnet3 support for ethtool(8) --register-dump.
+ * Therefore, if any registers are added, removed or modified, then a version
+ * bump and a corresponding change in the vmxnet3 support for ethtool(8)
+ * --register-dump would be required.
+ */
 static void
 vmxnet3_get_regs(struct net_device *netdev, struct ethtool_regs *regs, void *p)
 {
@@ -351,40 +363,90 @@ vmxnet3_get_regs(struct net_device *netdev, struct 
ethtool_regs *regs, void *p)
 
memset(p, 0, vmxnet3_get_regs_len(netdev));
 
-   regs->version = 1;
+   regs->version = 2;
 
/* Update vmxnet3_get_regs_len if we want to dump more registers */
 
-   /* make each ring use multiple of 16 bytes */
-   for (i = 0; i < adapter->num_tx_queues; i++) {
-   buf[j++] = adapter->tx_queue[i].tx_ring.next2fill;
-   buf[j++] = adapter->tx_queue[i].tx_ring.next2comp;
-   buf[j++] = adapter->tx_queue[i].tx_ring.gen;
-   buf[j++] = 0;
+   buf[j++] = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_VRRS);
+   buf[j++] = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_UVRS);
+   buf[j++] = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_DSAL);
+   buf[j++] = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_DSAH);
+   buf[j++] = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD);
+   buf[j++] = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_MACL);
+   buf[j++] = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_MACH);
+   buf[j++] = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_ICR);
+   buf[j++] = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_ECR);
+
+   buf[j++] = adapter->intr.num_intrs;
+   for (i = 0; i < adapter->intr.num_intrs; i++) {
+   buf[j++] = VMXNET3_READ_BAR0_REG(adapter, VMXNET3_REG_IMR
+                + i * VMXNET3_REG_ALIGN);
+   }
 
-   buf[j++] = adapter->tx_queue[i].comp_ring.next2proc;
-   buf[j++] = adapter->tx_queue[i].comp_ring.gen;
-   buf[j++] = adapter->tx_queue[i].stopped;
-   buf[j++] = 0;
+   buf[j++] = adapter->num_tx_queues;
+   for (i = 0; i < adapter->num_tx_queues; i++) {
+   struct vmxnet3_tx_queue *tq = &adapter->tx_queue[i];
+
+   buf[j++] = VMXNET3_READ_BAR0_REG(adapter, VMXNET3_REG_TXPROD +
+i * VMXNET3_REG_ALIGN);
+
+   buf[j++] = VMXNET3_GET_ADDR_LO(tq->tx_ring.basePA);
+   buf[j++] = VMXNET3_GET_ADDR_HI(tq->tx_ring.basePA);
+   buf[j++] = tq->tx_ring.size;
+   buf[j++] = tq->tx_ring.next2fill;
+   buf[j++] = tq->tx_ring.next2comp;
+   buf[j++] = tq->tx_ring.gen;
+
+   buf[j++] = VMXNET3_GET_ADDR_LO(tq->data_ring.basePA);
+   buf[j++] = VMXNET3_GET_ADDR_HI(tq->data_ring.basePA);
+   buf[j++] = tq->data_ring.size;
+   /* transmit data ring buffer size */
+   buf[j++] = VMXNET3_HDR_COPY_SIZE;
+
+   buf[j++] = VMXNET3_GET_ADDR_LO(tq->comp_ring.basePA);
+   buf[j++] = VMXNET3_GET_ADDR_HI(tq->comp_ring.basePA);
+   buf[j++] = tq->comp_ring.size;
+ 

[net-next PATCH 1/4] drivers: net: cpsw: davinci_emac: move reading mac id to common file

2015-09-21 Thread Mugunthan V N
Move MAC address reading from the Ethernet drivers into a common
file for easier maintenance and code reuse.
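
As a stand-alone illustration of the register layout handled by
davinci_emac_3517_get_macid() in this patch, the sketch below unpacks a
6-byte MAC address from two 32-bit values, three bytes from each. The
function name macid_unpack() is made up for the example, and the register
reads are replaced by plain arguments.

```c
#include <stdint.h>

/* Unpack a 6-byte MAC from two 32-bit register values, three bytes each,
 * in the same byte order as davinci_emac_3517_get_macid().
 */
void macid_unpack(uint32_t macid_msb, uint32_t macid_lsb, uint8_t mac_addr[6])
{
	mac_addr[0] = (macid_msb >> 16) & 0xff;
	mac_addr[1] = (macid_msb >> 8) & 0xff;
	mac_addr[2] = macid_msb & 0xff;
	mac_addr[3] = (macid_lsb >> 16) & 0xff;
	mac_addr[4] = (macid_lsb >> 8) & 0xff;
	mac_addr[5] = macid_lsb & 0xff;
}
```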

Signed-off-by: Mugunthan V N 
---
 drivers/net/ethernet/ti/cpsw-common.c  | 58 --
 drivers/net/ethernet/ti/cpsw.c | 11 +++
 drivers/net/ethernet/ti/cpsw.h |  3 +-
 drivers/net/ethernet/ti/davinci_emac.c | 44 ++
 4 files changed, 57 insertions(+), 59 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw-common.c 
b/drivers/net/ethernet/ti/cpsw-common.c
index f595094..c70417c 100644
--- a/drivers/net/ethernet/ti/cpsw-common.c
+++ b/drivers/net/ethernet/ti/cpsw-common.c
@@ -19,11 +19,38 @@
 
 #include "cpsw.h"
 
-#define AM33XX_CTRL_MAC_LO_REG(offset, id) ((offset) + 0x8 * (id))
-#define AM33XX_CTRL_MAC_HI_REG(offset, id) ((offset) + 0x8 * (id) + 0x4)
+#define CTRL_MAC_LO_REG(offset, id) ((offset) + 0x8 * (id))
+#define CTRL_MAC_HI_REG(offset, id) ((offset) + 0x8 * (id) + 0x4)
 
-int cpsw_am33xx_cm_get_macid(struct device *dev, u16 offset, int slave,
-u8 *mac_addr)
+static int davinci_emac_3517_get_macid(struct device *dev, u16 offset,
+  int slave, u8 *mac_addr)
+{
+   u32 macid_lsb;
+   u32 macid_msb;
+   struct regmap *syscon;
+
+   syscon = syscon_regmap_lookup_by_phandle(dev->of_node, "syscon");
+   if (IS_ERR(syscon)) {
+   if (PTR_ERR(syscon) == -ENODEV)
+   return 0;
+   return PTR_ERR(syscon);
+   }
+
+   regmap_read(syscon, CTRL_MAC_LO_REG(offset, slave), &macid_lsb);
+   regmap_read(syscon, CTRL_MAC_HI_REG(offset, slave), &macid_msb);
+
+   mac_addr[0] = (macid_msb >> 16) & 0xff;
+   mac_addr[1] = (macid_msb >> 8)  & 0xff;
+   mac_addr[2] = macid_msb & 0xff;
+   mac_addr[3] = (macid_lsb >> 16) & 0xff;
+   mac_addr[4] = (macid_lsb >> 8)  & 0xff;
+   mac_addr[5] = macid_lsb & 0xff;
+
+   return 0;
+}
+
+static int cpsw_am33xx_cm_get_macid(struct device *dev, u16 offset, int slave,
+   u8 *mac_addr)
 {
u32 macid_lo;
u32 macid_hi;
@@ -36,10 +63,8 @@ int cpsw_am33xx_cm_get_macid(struct device *dev, u16 offset, 
int slave,
return PTR_ERR(syscon);
}
 
-   regmap_read(syscon, AM33XX_CTRL_MAC_LO_REG(offset, slave),
-   &macid_lo);
-   regmap_read(syscon, AM33XX_CTRL_MAC_HI_REG(offset, slave),
-   &macid_hi);
+   regmap_read(syscon, CTRL_MAC_LO_REG(offset, slave), &macid_lo);
+   regmap_read(syscon, CTRL_MAC_HI_REG(offset, slave), &macid_hi);
 
mac_addr[5] = (macid_lo >> 8) & 0xff;
mac_addr[4] = macid_lo & 0xff;
@@ -50,6 +75,21 @@ int cpsw_am33xx_cm_get_macid(struct device *dev, u16 offset, 
int slave,
 
return 0;
 }
-EXPORT_SYMBOL_GPL(cpsw_am33xx_cm_get_macid);
+
+int ti_cm_get_macid(struct device *dev, int slave, u8 *mac_addr)
+{
+   if (of_machine_is_compatible("ti,am33xx"))
+   return cpsw_am33xx_cm_get_macid(dev, 0x630, slave, mac_addr);
+
+   if (of_device_is_compatible(dev->of_node, "ti,am3517-emac"))
+   return davinci_emac_3517_get_macid(dev, 0x110, slave, mac_addr);
+
+   if (of_device_is_compatible(dev->of_node, "ti,dm816-emac"))
+   return cpsw_am33xx_cm_get_macid(dev, 0x30, slave, mac_addr);
+
+   dev_err(dev, "incompatible machine/device type for reading mac 
address\n");
+   return -ENOENT;
+}
+EXPORT_SYMBOL_GPL(ti_cm_get_macid);
 
 MODULE_LICENSE("GPL");
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index c670317..75584cc 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2058,13 +2058,10 @@ no_phy_slave:
if (mac_addr) {
memcpy(slave_data->mac_addr, mac_addr, ETH_ALEN);
} else {
-   if (of_machine_is_compatible("ti,am33xx")) {
-   ret = cpsw_am33xx_cm_get_macid(&pdev->dev,
-   0x630, i,
-   slave_data->mac_addr);
-   if (ret)
-   return ret;
-   }
+   ret = ti_cm_get_macid(&pdev->dev, i,
+ slave_data->mac_addr);
+   if (ret)
+   return ret;
}
if (data->dual_emac) {
if (of_property_read_u32(slave_node, 
"dual_emac_res_vlan",
diff --git a/drivers/net/ethernet/ti/cpsw.h b/drivers/net/ethernet/ti/cpsw.h
index ca90efa..442a703 100644
--- a/drivers/net/ethernet/ti/cpsw.h
+++ b/drivers/net/ethernet/ti/cpsw.h
@@ -41,7 +41,6 @@ struct cpsw_platform_data {
 };
 
 void cpsw_phy_sel(struct device *dev, phy_interface_t phy_mode, int slave);
-int 

[net-next PATCH 2/4] drivers: net: cpsw-common: add support for reading mac address for dra7 and am437x platforms

2015-09-21 Thread Mugunthan V N
Add support for reading the MAC address using the syscon driver on
dra7 and am437x platforms.

Signed-off-by: Mugunthan V N 
---
 drivers/net/ethernet/ti/cpsw-common.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw-common.c 
b/drivers/net/ethernet/ti/cpsw-common.c
index c70417c..c08be62 100644
--- a/drivers/net/ethernet/ti/cpsw-common.c
+++ b/drivers/net/ethernet/ti/cpsw-common.c
@@ -87,6 +87,12 @@ int ti_cm_get_macid(struct device *dev, int slave, u8 
*mac_addr)
if (of_device_is_compatible(dev->of_node, "ti,dm816-emac"))
return cpsw_am33xx_cm_get_macid(dev, 0x30, slave, mac_addr);
 
+   if (of_machine_is_compatible("ti,am4372"))
+   return cpsw_am33xx_cm_get_macid(dev, 0x630, slave, mac_addr);
+
+   if (of_machine_is_compatible("ti,dra7"))
+   return davinci_emac_3517_get_macid(dev, 0x514, slave, mac_addr);
+
dev_err(dev, "incompatible machine/device type for reading mac 
address\n");
return -ENOENT;
 }
-- 
2.6.0.rc2.10.gf4d9753



Re: [PATCH] lib: fix data race in rhashtable_rehash_one

2015-09-21 Thread Eric Dumazet
On Mon, 2015-09-21 at 10:08 +0200, Dmitry Vyukov wrote:
> rhashtable_rehash_one() uses plain writes to update entry->next,
> while it is being concurrently accessed by readers.
> Unfortunately, the compiler is within its rights to (for example) use
> byte-at-a-time writes to update the pointer, which would fatally confuse
> concurrent readers.
> 
> Use WRITE_ONCE to update entry->next in rhashtable_rehash_one().
> 
> The data race was found with KernelThreadSanitizer (KTSAN).
> 
> Signed-off-by: Dmitry Vyukov 
> ---
> KTSAN report for the record:
> 
> ThreadSanitizer: data-race in netlink_lookup
> 
> Atomic read at 0x880480443bd0 of size 8 by thread 2747 on CPU 11:
>  [< inline >] rhashtable_lookup_fast include/linux/rhashtable.h:543
>  [< inline >] __netlink_lookup net/netlink/af_netlink.c:1026
>  [] netlink_lookup+0x134/0x1c0 net/netlink/af_netlink.c:1046
>  [< inline >] netlink_getsockbyportid net/netlink/af_netlink.c:1616
>  [] netlink_unicast+0x111/0x300 
> net/netlink/af_netlink.c:1812
>  [] netlink_sendmsg+0x4c9/0x5f0 
> net/netlink/af_netlink.c:2443
>  [< inline >] sock_sendmsg_nosec net/socket.c:610
>  [] sock_sendmsg+0x83/0x90 net/socket.c:620
>  [] ___sys_sendmsg+0x3cf/0x3e0 net/socket.c:1952
>  [] __sys_sendmsg+0x4c/0xb0 net/socket.c:1986
>  [< inline >] SYSC_sendmsg net/socket.c:1997
>  [] SyS_sendmsg+0x30/0x50 net/socket.c:1993
>  [] entry_SYSCALL_64_fastpath+0x31/0x95
> arch/x86/entry/entry_64.S:188
> 
> Previous write at 0x880480443bd0 of size 8 by thread 213 on CPU 4:
>  [< inline >] rhashtable_rehash_one lib/rhashtable.c:193
>  [< inline >] rhashtable_rehash_chain lib/rhashtable.c:213
>  [< inline >] rhashtable_rehash_table lib/rhashtable.c:257
>  [] rht_deferred_worker+0x3b0/0x6d0 lib/rhashtable.c:373
>  [] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>  [] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>  [] kthread+0x150/0x170 kernel/kthread.c:209
>  [] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529
> 
> Mutexes locked by thread 213:
> Mutex 217217 is locked here:
>  [] mutex_lock+0x57/0x70 kernel/locking/mutex.c:108
>  [] rht_deferred_worker+0x45/0x6d0 lib/rhashtable.c:363
>  [] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>  [] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>  [] kthread+0x150/0x170 kernel/kthread.c:209
>  [] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529
> 
> Mutex 431216 is locked here:
>  [< inline >] __raw_spin_lock_bh include/linux/spinlock_api_smp.h:149
>  [] _raw_spin_lock_bh+0x65/0x80 
> kernel/locking/spinlock.c:175
>  [< inline >] spin_lock_bh include/linux/spinlock.h:317
>  [< inline >] rhashtable_rehash_chain lib/rhashtable.c:212
>  [< inline >] rhashtable_rehash_table lib/rhashtable.c:257
>  [] rht_deferred_worker+0x1e6/0x6d0 lib/rhashtable.c:373
>  [] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>  [] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>  [] kthread+0x150/0x170 kernel/kthread.c:209
>  [] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529
> 
> Mutex 432766 is locked here:
>  [< inline >] __raw_spin_lock include/linux/spinlock_api_smp.h:158
>  [] _raw_spin_lock+0x50/0x70 kernel/locking/spinlock.c:151
>  [< inline >] rhashtable_rehash_one lib/rhashtable.c:186
>  [< inline >] rhashtable_rehash_chain lib/rhashtable.c:213
>  [< inline >] rhashtable_rehash_table lib/rhashtable.c:257
>  [] rht_deferred_worker+0x36b/0x6d0 lib/rhashtable.c:373
>  [] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>  [] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>  [] kthread+0x150/0x170 kernel/kthread.c:209
>  [] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529
> ---
>  lib/rhashtable.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index cc0c697..978624d 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -188,9 +188,12 @@ static int rhashtable_rehash_one(struct rhashtable *ht, 
> unsigned int old_hash)
> new_tbl, new_hash);
>  
>   if (rht_is_a_nulls(head))
> - INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
> - else
> - RCU_INIT_POINTER(entry->next, head);
> + head = (struct rhash_head *)rht_marker(ht, new_hash);
> + /* We don't insert any new nodes that were not previously accessible
> +  * to readers, so we don't need to use rcu_assign_pointer().
> +  * But entry is being concurrently accessed by readers, so we need to
> +  * use at least WRITE_ONCE. */

This is bogus.

1) Linux is certainly not working if some arch or compiler is not doing
single word writes. WRITE_ONCE() would not help at all to enforce this.

2) If the new node is not yet visible, we don't care if we write
entry->next using any kind of operation.

So the WRITE_ONCE() is not needed at all.
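
For readers unfamiliar with the macro under discussion, here is a simplified
user-space approximation. The DEMO_ names and the demo_node type are
hypothetical, the kernel's real definition is more involved, and the sketch
relies on the GCC/Clang __typeof__ extension. It shows only what the macro
does, not whether it is needed here.

```c
/* Simplified approximation; the kernel's macro is more involved.
 * Relies on the GCC/Clang __typeof__ extension.
 */
#define DEMO_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define DEMO_READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))

struct demo_node {
	struct demo_node *next;
};

/* Update the successor through a volatile access, which tells the
 * compiler not to omit, repeat or fuse this store.
 */
void demo_set_next(struct demo_node *entry, struct demo_node *head)
{
	DEMO_WRITE_ONCE(entry->next, head);
}
```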



> + 

[PATCH 22/38] orinoco: fix checking for default value

2015-09-21 Thread Andrzej Hajda
Thresholds use -1 to indicate that the default value should be used.
Since thresholds are unsigned, checking their sign makes no sense.
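
A small sketch of the underlying C behaviour, assuming the threshold fields
are 32-bit unsigned. The helper name uses_default() is made up for this
example.

```c
#include <stdbool.h>
#include <stdint.h>

/* An unsigned value can never be below zero, so a "< 0" test is dead code.
 * Storing -1 wraps to the largest value, which an equality test can match.
 */
bool uses_default(uint32_t threshold)
{
	return threshold == (uint32_t)-1;
}
```

With this, uses_default(UINT32_MAX) is true, while an ordinary value such
as 2346 is not treated as the default.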

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2038576

Signed-off-by: Andrzej Hajda 
---
 drivers/net/wireless/orinoco/cfg.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/orinoco/cfg.c 
b/drivers/net/wireless/orinoco/cfg.c
index a9e94b6..0f6ea31 100644
--- a/drivers/net/wireless/orinoco/cfg.c
+++ b/drivers/net/wireless/orinoco/cfg.c
@@ -220,7 +220,7 @@ static int orinoco_set_wiphy_params(struct wiphy *wiphy, 
u32 changed)
if (changed & WIPHY_PARAM_FRAG_THRESHOLD) {
/* Set fragmentation */
if (priv->has_mwo) {
-   if (wiphy->frag_threshold < 0)
+   if (wiphy->frag_threshold == -1)
frag_value = 0;
else {
printk(KERN_WARNING "%s: Fixed fragmentation "
@@ -230,7 +230,7 @@ static int orinoco_set_wiphy_params(struct wiphy *wiphy, 
u32 changed)
frag_value = 1;
}
} else {
-   if (wiphy->frag_threshold < 0)
+   if (wiphy->frag_threshold == -1)
frag_value = 2346;
else if ((wiphy->frag_threshold < 257) ||
 (wiphy->frag_threshold > 2347))
@@ -252,7 +252,7 @@ static int orinoco_set_wiphy_params(struct wiphy *wiphy, 
u32 changed)
 * the upper limit.
 */
 
-   if (wiphy->rts_threshold < 0)
+   if (wiphy->rts_threshold == -1)
rts_value = 2347;
else if (wiphy->rts_threshold > 2347)
err = -EINVAL;
-- 
1.9.1



[PATCH v3 1/2] airo: fix IW_AUTH_ALG_OPEN_SYSTEM

2015-09-21 Thread Ondrej Zary
IW_AUTH_ALG_OPEN_SYSTEM is ambiguous in set_auth for WEP as
wpa_supplicant uses it for both no encryption and WEP open system.
Cache the last mode set (only of these two) and use it here.

This allows wpa_supplicant to work with unencrypted APs.

Signed-off-by: Ondrej Zary 
---
 drivers/net/wireless/airo.c |   59 +--
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/drivers/net/wireless/airo.c b/drivers/net/wireless/airo.c
index d0c97c2..67001a8 100644
--- a/drivers/net/wireless/airo.c
+++ b/drivers/net/wireless/airo.c
@@ -1237,6 +1237,7 @@ struct airo_info {
 
int wep_capable;
int max_wep_idx;
+   int last_auth;
 
/* WPA-related stuff */
unsigned int bssListFirst;
@@ -3786,6 +3787,16 @@ badrx:
}
 }
 
+static inline void set_auth_type(struct airo_info *local, int auth_type)
+{
+   local->config.authType = auth_type;
+   /* Cache the last auth type used (of AUTH_OPEN and AUTH_ENCRYPT).
+* Used by airo_set_auth()
+*/
+   if (auth_type == AUTH_OPEN || auth_type == AUTH_ENCRYPT)
+   local->last_auth = auth_type;
+}
+
 static u16 setup_card(struct airo_info *ai, u8 *mac, int lock)
 {
Cmd cmd;
@@ -3862,7 +3873,7 @@ static u16 setup_card(struct airo_info *ai, u8 *mac, int 
lock)
"level scale");
}
ai->config.opmode = adhoc ? MODE_STA_IBSS : MODE_STA_ESS;
-   ai->config.authType = AUTH_OPEN;
+   set_auth_type(ai, AUTH_OPEN);
ai->config.modulation = MOD_CCK;
 
if (le16_to_cpu(cap_rid.len) >= sizeof(cap_rid) &&
@@ -4880,13 +4891,13 @@ static void proc_config_on_close(struct inode *inode, 
struct file *file)
line += 5;
switch( line[0] ) {
case 's':
-   ai->config.authType = AUTH_SHAREDKEY;
+   set_auth_type(ai, AUTH_SHAREDKEY);
break;
case 'e':
-   ai->config.authType = AUTH_ENCRYPT;
+   set_auth_type(ai, AUTH_ENCRYPT);
break;
default:
-   ai->config.authType = AUTH_OPEN;
+   set_auth_type(ai, AUTH_OPEN);
break;
}
set_bit (FLAG_COMMIT, &ai->flags);
@@ -6368,9 +6379,8 @@ static int airo_set_encode(struct net_device *dev,
 * should be enabled (user may turn it off later)
 * This is also how "iwconfig ethX key on" works */
if((index == current_index) && (key.len > 0) &&
-  (local->config.authType == AUTH_OPEN)) {
-   local->config.authType = AUTH_ENCRYPT;
-   }
+  (local->config.authType == AUTH_OPEN))
+   set_auth_type(local, AUTH_ENCRYPT);
} else {
/* Do we want to just set the transmit key index ? */
int index = (dwrq->flags & IW_ENCODE_INDEX) - 1;
@@ -6389,12 +6399,12 @@ static int airo_set_encode(struct net_device *dev,
}
}
/* Read the flags */
-   if(dwrq->flags & IW_ENCODE_DISABLED)
-   local->config.authType = AUTH_OPEN; // disable encryption
+   if (dwrq->flags & IW_ENCODE_DISABLED)
+   set_auth_type(local, AUTH_OPEN);/* disable encryption */
if(dwrq->flags & IW_ENCODE_RESTRICTED)
-   local->config.authType = AUTH_SHAREDKEY;// Only Both
-   if(dwrq->flags & IW_ENCODE_OPEN)
-   local->config.authType = AUTH_ENCRYPT;  // Only Wep
+   set_auth_type(local, AUTH_SHAREDKEY);   /* Only Both */
+   if (dwrq->flags & IW_ENCODE_OPEN)
+   set_auth_type(local, AUTH_ENCRYPT); /* Only Wep */
/* Commit the changes to flags if needed */
if (local->config.authType != currentAuthType)
set_bit (FLAG_COMMIT, &local->flags);
@@ -6549,12 +6559,12 @@ static int airo_set_encodeext(struct net_device *dev,
}
 
/* Read the flags */
-   if(encoding->flags & IW_ENCODE_DISABLED)
-   local->config.authType = AUTH_OPEN; // disable encryption
+   if (encoding->flags & IW_ENCODE_DISABLED)
+   set_auth_type(local, AUTH_OPEN);/* disable encryption */
if(encoding->flags & IW_ENCODE_RESTRICTED)
-   local->config.authType = AUTH_SHAREDKEY;// Only Both
-   if(encoding->flags & IW_ENCODE_OPEN)
-   local->config.authType = AUTH_ENCRYPT;  // Only Wep
+   set_auth_type(local, AUTH_SHAREDKEY);   /* Only Both */
+

[PATCH 27/38] usbnet: remove invalid check

2015-09-21 Thread Andrzej Hajda
skb->len is always non-negative.

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2038576

Signed-off-by: Andrzej Hajda 
---
 drivers/net/usb/lan78xx.c  | 5 -
 drivers/net/usb/smsc75xx.c | 5 -
 drivers/net/usb/smsc95xx.c | 5 -
 3 files changed, 15 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index a39518f..e0556dc 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2522,11 +2522,6 @@ static int lan78xx_rx(struct lan78xx_net *dev, struct 
sk_buff *skb)
skb_pull(skb, align_count);
}
 
-   if (unlikely(skb->len < 0)) {
-   netdev_warn(dev->net, "invalid rx length<0 %d", skb->len);
-   return 0;
-   }
-
return 1;
 }
 
diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c
index d9e7892..30033db 100644
--- a/drivers/net/usb/smsc75xx.c
+++ b/drivers/net/usb/smsc75xx.c
@@ -2185,11 +2185,6 @@ static int smsc75xx_rx_fixup(struct usbnet *dev, struct 
sk_buff *skb)
skb_pull(skb, align_count);
}
 
-   if (unlikely(skb->len < 0)) {
-   netdev_warn(dev->net, "invalid rx length<0 %d\n", skb->len);
-   return 0;
-   }
-
return 1;
 }
 
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 26423ad..66b3ab9 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1815,11 +1815,6 @@ static int smsc95xx_rx_fixup(struct usbnet *dev, struct 
sk_buff *skb)
skb_pull(skb, align_count);
}
 
-   if (unlikely(skb->len < 0)) {
-   netdev_warn(dev->net, "invalid rx length<0 %d\n", skb->len);
-   return 0;
-   }
-
return 1;
 }
 
-- 
1.9.1



[PATCH 18/38] net/ibm/emac: fix type of phy_mode

2015-09-21 Thread Andrzej Hajda
phy_mode can be negative.
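
To illustrate what goes wrong, the sketch below stores a negative
errno-style return value in an unsigned and in a signed field. All names
(demo_get_phy_mode, demo_unsigned, demo_signed) and the values 3 and -22 are
hypothetical and only stand in for the driver's lookup.

```c
#include <stdint.h>

/* Hypothetical lookup that reports failure with a negative value. */
int demo_get_phy_mode(int known)
{
	return known ? 3 : -22;	/* -22 stands in for an errno-style error */
}

/* Before: the error wraps to a huge positive number and goes unnoticed. */
struct demo_unsigned {
	uint32_t phy_mode;
};

/* After: the sign survives, so a "< 0" check can catch the error. */
struct demo_signed {
	int phy_mode;
};
```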

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2038576

Signed-off-by: Andrzej Hajda 
---
 drivers/net/ethernet/ibm/emac/core.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/emac/core.h 
b/drivers/net/ethernet/ibm/emac/core.h
index 28df374..f379e47 100644
--- a/drivers/net/ethernet/ibm/emac/core.h
+++ b/drivers/net/ethernet/ibm/emac/core.h
@@ -181,7 +181,7 @@ struct emac_instance {
struct mal_commac   commac;
 
/* PHY infos */
-   u32 phy_mode;
+   int phy_mode;
u32 phy_map;
u32 phy_address;
u32 phy_feat_exc;
-- 
1.9.1



[PATCH v3 2/2] airo: Implement netif_carrier_on/off

2015-09-21 Thread Ondrej Zary
Add calls to netif_carrier_on and netif_carrier_off

Signed-off-by: Ondrej Zary 
---
 drivers/net/wireless/airo.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/wireless/airo.c b/drivers/net/wireless/airo.c
index 67001a8..8ae838d 100644
--- a/drivers/net/wireless/airo.c
+++ b/drivers/net/wireless/airo.c
@@ -3267,6 +3267,7 @@ static void airo_handle_link(struct airo_info *ai)
wake_up_interruptible(&ai->thr_wait);
} else
airo_send_event(ai->dev);
+   netif_carrier_on(ai->dev);
} else if (!scan_forceloss) {
if (auto_wep && !ai->expires) {
ai->expires = RUN_AT(3*HZ);
@@ -3277,6 +3278,9 @@ static void airo_handle_link(struct airo_info *ai)
eth_zero_addr(wrqu.ap_addr.sa_data);
wrqu.ap_addr.sa_family = ARPHRD_ETHER;
wireless_send_event(ai->dev, SIOCGIWAP, &wrqu, NULL);
+   netif_carrier_off(ai->dev);
+   } else {
+   netif_carrier_off(ai->dev);
}
 }
 
@@ -3613,6 +3617,7 @@ static void disable_MAC( struct airo_info *ai, int lock ) 
{
return;
 
if (test_bit(FLAG_ENABLED, &ai->flags)) {
+   netif_carrier_off(ai->dev);
memset(&cmd, 0, sizeof(cmd));
cmd.cmd = MAC_DISABLE; // disable in case already enabled
issuecommand(ai, &cmd, &rsp);
-- 
Ondrej Zary



[net-next PATCH 3/4] arm: dts: dra7: add syscon phandle to cpsw node

2015-09-21 Thread Mugunthan V N
There are 2 MACIDs stored in the control module of the dra7.
These are read by the cpsw driver if no valid MACID was found
in the devicetree.

Signed-off-by: Mugunthan V N 
---
 arch/arm/boot/dts/dra7.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
index 5d65db9..76c739d 100644
--- a/arch/arm/boot/dts/dra7.dtsi
+++ b/arch/arm/boot/dts/dra7.dtsi
@@ -1447,6 +1447,7 @@
 ,
 ;
ranges;
+   syscon = <_conf>;
status = "disabled";
 
davinci_mdio: mdio@48485000 {
-- 
2.6.0.rc2.10.gf4d9753



[net-next PATCH 4/4] arm: dts: am4372: add syscon phandle to cpsw node

2015-09-21 Thread Mugunthan V N
There are 2 MACIDs stored in the control module of the am4372.
These are read by the cpsw driver if no valid MACID was found
in the devicetree.

Signed-off-by: Mugunthan V N 
---
 arch/arm/boot/dts/am4372.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/am4372.dtsi b/arch/arm/boot/dts/am4372.dtsi
index 0447c04a..d83ff9c 100644
--- a/arch/arm/boot/dts/am4372.dtsi
+++ b/arch/arm/boot/dts/am4372.dtsi
@@ -591,6 +591,7 @@
cpts_clock_mult = <0x8000>;
cpts_clock_shift = <29>;
ranges;
+   syscon = <_conf>;
 
davinci_mdio: mdio@4a101000 {
compatible = "ti,am4372-mdio","ti,davinci_mdio";
-- 
2.6.0.rc2.10.gf4d9753



[net-next PATCH 0/4] Add support for reading macid when DT macid not found

2015-09-21 Thread Mugunthan V N
Did a boot test on dra7-evm [1] and am437x-gp-evm [2].
Pushed a branch [3] for others to test the patch.

[1]: http://pastebin.ubuntu.com/12513420/
[2]: http://pastebin.ubuntu.com/12513428/
[3]: git://git.ti.com/~mugunthanvnm/ti-linux-kernel/linux.git cpsw-macid-read-support

Mugunthan V N (4):
  drivers: net: cpsw: davinci_emac: move reading mac id to common file
  drivers: net: cpsw-common: add support for reading mac address for
dra7 and am437x platforms
  arm: dts: dra7: add syscon phandle to cpsw node
  arm: dts: am4372: add syscon phandle to cpsw node

 arch/arm/boot/dts/am4372.dtsi  |  1 +
 arch/arm/boot/dts/dra7.dtsi|  1 +
 drivers/net/ethernet/ti/cpsw-common.c  | 64 +-
 drivers/net/ethernet/ti/cpsw.c | 11 +++---
 drivers/net/ethernet/ti/cpsw.h |  3 +-
 drivers/net/ethernet/ti/davinci_emac.c | 44 ++-
 6 files changed, 65 insertions(+), 59 deletions(-)

-- 
2.6.0.rc2.10.gf4d9753



RE: bnx2x - occasional high packet loss (on LAN)

2015-09-21 Thread Ariel Elior
> -Original Message-
> From: Nikola Ciprich [mailto:nikola.cipr...@linuxbox.cz]
> Sent: Monday, September 21, 2015 1:32 PM
> To: Ariel Elior 
> Cc: netdev ; n...@linuxbox.cz
> Subject: Re: bnx2x - occasional high packet loss (on LAN)
> 
> Hello Ariel,
> 
> after a few days of torturing the NICs with flood pings, the card
> seems to have given up with lots of errors..
> 
> I've uploaded new kernel log here:
> 
> http://nik.lbox.cz/download/dmesg.txt
> 
> Will this help?
> 
> I still have it in this hung state now, in case I could provide
> more info for diagnostics..
> 
> however please note this can be a different problem than the one I was
> reporting originally, since I only had high packet loss, while now the whole
> card seems to be blocked..  but maybe it is just a worse case of the same
> problem?

Hi Nikola,
Seems like the link below is the same file you shared before - I don't see any 
errors there.

Two things you can collect for me to help debug this issue:

1.
Output of ethtool -d eth1 after problem occurs (redirect to a file)

2.
dmesg after enabling link related debug messages. Use
modprobe bnx2x debug=0x4
Or
ethtool -s eth1 msglvl 0x4
to enable these prints.

This holds for both of your problems (unless it is the same issue).

Thanks,
Ariel


netlink: Replace rhash_portid with bound

2015-09-21 Thread Herbert Xu
On Sun, Sep 20, 2015 at 11:11:04PM -0700, David Miller wrote:
>
> Yeah at this point incremental patches work the best.

OK here is the patch:

---8<---
The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink:
Fix autobind race condition that leads to zero port ID") created
some new races that can occur due to inconsistencies between the
two port IDs.

Tejun is right that a barrier is unavoidable.  Therefore I am
reverting to the original patch that used a boolean to indicate
that a user netlink socket has been bound.

Barriers have been added where necessary to ensure that a valid
portid is used.

Fixes: 1f770c0a09da ("netlink: Fix autobind race condition that leads to zero port ID")
Reported-by: Tejun Heo 
Reported-by: Linus Torvalds 
Signed-off-by: Herbert Xu 

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 303efb7..f5362aae 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -24,6 +24,7 @@
 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -1015,7 +1016,7 @@ static inline int netlink_compare(struct 
rhashtable_compare_arg *arg,
const struct netlink_compare_arg *x = arg->key;
const struct netlink_sock *nlk = ptr;
 
-   return nlk->rhash_portid != x->portid ||
+   return nlk->portid != x->portid ||
   !net_eq(sock_net(>sk), read_pnet(>pnet));
 }
 
@@ -1041,7 +1042,7 @@ static int __netlink_insert(struct netlink_table *table, 
struct sock *sk)
 {
struct netlink_compare_arg arg;
 
-   netlink_compare_arg_init(, sock_net(sk), nlk_sk(sk)->rhash_portid);
+   netlink_compare_arg_init(, sock_net(sk), nlk_sk(sk)->portid);
return rhashtable_lookup_insert_key(>hash, ,
_sk(sk)->node,
netlink_rhashtable_params);
@@ -1095,7 +1096,7 @@ static int netlink_insert(struct sock *sk, u32 portid)
lock_sock(sk);
 
err = -EBUSY;
-   if (nlk_sk(sk)->portid)
+   if (nlk_sk(sk)->bound)
goto err;
 
err = -ENOMEM;
@@ -1103,7 +1104,7 @@ static int netlink_insert(struct sock *sk, u32 portid)
unlikely(atomic_read(>hash.nelems) >= UINT_MAX))
goto err;
 
-   nlk_sk(sk)->rhash_portid = portid;
+   nlk_sk(sk)->portid = portid;
sock_hold(sk);
 
err = __netlink_insert(table, sk);
@@ -1119,7 +1120,11 @@ static int netlink_insert(struct sock *sk, u32 portid)
goto err;
}
 
-   nlk_sk(sk)->portid = portid;
+   /* rhashtable_insert carries an implicit write memory barrier
+* so we don't need an smp_wmb here in order to ensure that
+* portid is set before bound.
+*/
+   nlk_sk(sk)->bound = portid;
 
 err:
release_sock(sk);
@@ -1521,9 +1526,11 @@ static int netlink_bind(struct socket *sock, struct 
sockaddr *addr,
return err;
}
 
-   if (nlk->portid)
+   /* Ensure nlk->portid is up-to-date. */
+   if (smp_load_acquire(>bound)) {
if (nladdr->nl_pid != nlk->portid)
return -EINVAL;
+   }
 
if (nlk->netlink_bind && groups) {
int group;
@@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct 
sockaddr *addr,
}
}
 
-   if (!nlk->portid) {
+   if (!nlk->bound) {
err = nladdr->nl_pid ?
netlink_insert(sk, nladdr->nl_pid) :
netlink_autobind(sock);
@@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct 
sockaddr *addr,
!netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
return -EPERM;
 
-   if (!nlk->portid)
+   if (!nlk->bound)
err = netlink_autobind(sock);
 
if (err == 0) {
@@ -2428,7 +2435,8 @@ static int netlink_sendmsg(struct socket *sock, struct 
msghdr *msg, size_t len)
dst_group = nlk->dst_group;
}
 
-   if (!nlk->portid) {
+   /* Ensure nlk->portid is up-to-date. */
+   if (!smp_load_acquire(>bound)) {
err = netlink_autobind(sock);
if (err)
goto out;
@@ -3257,7 +3265,7 @@ static inline u32 netlink_hash(const void *data, u32 len, 
u32 seed)
const struct netlink_sock *nlk = data;
struct netlink_compare_arg arg;
 
-   netlink_compare_arg_init(, sock_net(>sk), nlk->rhash_portid);
+   netlink_compare_arg_init(, sock_net(>sk), nlk->portid);
return jhash2((u32 *), netlink_compare_arg_len / sizeof(u32), seed);
 }
 
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index c96dfa3..e6aae40 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -25,7 +25,6 @@ struct netlink_ring {
 struct netlink_sock {
/* struct sock has to be the first member of 

[PATCH 08/38] openvswitch: fix handling result of ipv6_skip_exthdr

2015-09-21 Thread Andrzej Hajda
The function can return a negative value.

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2038576

Signed-off-by: Andrzej Hajda 
---
 net/openvswitch/conntrack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 002a755..fde3391 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -253,7 +253,7 @@ static int ovs_ct_helper(struct sk_buff *skb, u16 proto)
const struct nf_conntrack_helper *helper;
const struct nf_conn_help *help;
enum ip_conntrack_info ctinfo;
-   unsigned int protoff;
+   int protoff;
struct nf_conn *ct;
 
ct = nf_ct_get(skb, );
-- 
1.9.1
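
The effect of keeping a possibly negative return value in an unsigned
variable can be seen in a small userspace sketch.  demo_find_offset() is
a hypothetical stand-in that returns -1 on error; it is not
ipv6_skip_exthdr() itself, and the function names are illustrative only.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in: returns an offset, or -1 on error. */
static int demo_find_offset(bool ok)
{
    return ok ? 40 : -1;
}

/* Pattern before the fix: the error value wraps to UINT_MAX and is missed. */
static bool demo_error_detected_unsigned(bool ok)
{
    unsigned int off = (unsigned int)demo_find_offset(ok);

    assert(ok || off == UINT_MAX);
    return off < 0;    /* an unsigned value is never below zero */
}

/* Pattern after the fix: the negative value survives and is caught. */
static bool demo_error_detected_signed(bool ok)
{
    int off = demo_find_offset(ok);

    return off < 0;
}
```

With the unsigned variable the error path is unreachable, which is what
changing the declaration to int addresses.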



Re: bnx2x - occasional high packet loss (on LAN)

2015-09-21 Thread Nikola Ciprich
Hello Ariel,

after a few days of torturing the NICs with flood pings, the card
seems to have given up with lots of errors..

I've uploaded new kernel log here:

http://nik.lbox.cz/download/dmesg.txt

Will this help?

I still have it in this hung state now, in case I could provide
more info for diagnostics..

however please note this can be a different problem than the one I was
reporting originally, since I only had high packet loss, while now the whole
card seems to be blocked..  but maybe it is just a worse case of the same
problem?

BR

nik





On Wed, Sep 16, 2015 at 10:18:34AM +0200, Nikola Ciprich wrote:
> On Wed, Sep 16, 2015 at 08:15:41AM +, Ariel Elior wrote:
> > Hi Nikola,
> > Please provide dmesg output from your system.
> > Thanks,
> > Ariel
> 
> Hello Ariel,
> 
> here it is:
> 
> http://nik.lbox.cz/download/dmesg.txt
> 
> BR
> 
> nik
> 
> 
> > 
> 
> 
> -- 
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
> 
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -



-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-




Re: bnx2x - occasional high packet loss (on LAN)

2015-09-21 Thread Nikola Ciprich
> Hi Nikola,
> Seems like the link below is the same file you shared before - I don't see 
> any errors there.

ouch, the file was correct, but the permissions were wrong..
so maybe you were getting older file from some proxy?

anyways, I've copied file so you can get it from
new location:

http://nik.lbox.cz/download/dmesg2.txt



> 
> Two things you can collect for me to help debug this issue:
> 
> 1.
> Output of ethtool -d eth1 after problem occurs (redirect to a file)

here it is:
http://nik.lbox.cz/download/ethtool-eth4.txt
http://nik.lbox.cz/download/ethtool-eth5.txt

> 2.
> dmesg after enabling link related debug messages. Use
> modprobe bnx2x debug=0x4
> Or
> ethtool -s eth1 msglvl 0x4

I have set this for both interfaces, but don't see anything new
in dmesg..


> to enable these prints.
> 
> This holds for both of your problems (unless it is the same issue).
> 
> Thanks,
> Ariel
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-




Re: [PATCH 1/2 v2] airo: fix IW_AUTH_ALG_OPEN_SYSTEM

2015-09-21 Thread Kalle Valo
Ondrej Zary  writes:

> IW_AUTH_ALG_OPEN_SYSTEM is ambiguous in set_auth for WEP as
> wpa_supplicant uses it for both no encryption and WEP open system.
> Cache the last mode set (only of these two) and use it here.
>
> This allows wpa_supplicant to work with unencrypted APs.
>
> Signed-off-by: Ondrej Zary 
> ---
>  drivers/net/wireless/airo.c |   33 -
>  1 file changed, 24 insertions(+), 9 deletions(-)

You should CC linux-wireless mailing list, otherwise patchwork won't see
it and I will miss your patch:

https://patchwork.kernel.org/project/linux-wireless/list/

-- 
Kalle Valo


[PATCH 21/38] mwifiex: fix comparison expression

2015-09-21 Thread Andrzej Hajda
To avoid underflows, signed variables should be used in the expression.

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2038576

Signed-off-by: Andrzej Hajda 
---
 drivers/net/wireless/mwifiex/11n_rxreorder.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/mwifiex/11n_rxreorder.c b/drivers/net/wireless/mwifiex/11n_rxreorder.c
index 2906cd5..b3970a8 100644
--- a/drivers/net/wireless/mwifiex/11n_rxreorder.c
+++ b/drivers/net/wireless/mwifiex/11n_rxreorder.c
@@ -615,10 +615,10 @@ int mwifiex_11n_rx_reorder_pkt(struct mwifiex_private *priv,
((end_win > start_win) && ((seq_num > end_win) ||
   (seq_num < start_win {
end_win = seq_num;
-   if (((seq_num - win_size) + 1) >= 0)
+   if (((end_win - win_size) + 1) >= 0)
start_win = (end_win - win_size) + 1;
else
-   start_win = (MAX_TID_VALUE - (win_size - seq_num)) + 1;
+   start_win = (MAX_TID_VALUE - (win_size - end_win)) + 1;
mwifiex_11n_dispatch_pkt_until_start_win(priv, tbl, start_win);
}
 
-- 
1.9.1



[PATCH 23/38] rndis_wlan: fix checking for default value

2015-09-21 Thread Andrzej Hajda
Thresholds use -1 to indicate that the default value should be used.
Since thresholds are unsigned, sign checking makes no sense.

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2038576

Signed-off-by: Andrzej Hajda 
---
 drivers/net/wireless/rndis_wlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/rndis_wlan.c b/drivers/net/wireless/rndis_wlan.c
index 71a825c..a13d1f2 100644
--- a/drivers/net/wireless/rndis_wlan.c
+++ b/drivers/net/wireless/rndis_wlan.c
@@ -1236,7 +1236,7 @@ static int set_rts_threshold(struct usbnet *usbdev, u32 rts_threshold)
 
netdev_dbg(usbdev->net, "%s(): %i\n", __func__, rts_threshold);
 
-   if (rts_threshold < 0 || rts_threshold > 2347)
+   if (rts_threshold == -1 || rts_threshold > 2347)
rts_threshold = 2347;
 
tmp = cpu_to_le32(rts_threshold);
-- 
1.9.1
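
The check changed by the patch above can be tried out in userspace with
the sketch below.  demo_clamp_rts() and DEMO_RTS_MAX are hypothetical
names; only the limit 2347 and the shape of the condition come from the
patch.  The cast on -1 is written out here, while the patch leaves the
conversion to the compiler.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RTS_MAX 2347u   /* limit taken from the patch above */

/* Mirrors the fixed condition on an unsigned 32-bit threshold. */
static uint32_t demo_clamp_rts(uint32_t rts_threshold)
{
    /* -1 converts to UINT32_MAX, so "use the default" is still detected. */
    assert((uint32_t)-1 == UINT32_MAX);

    if (rts_threshold == (uint32_t)-1 || rts_threshold > DEMO_RTS_MAX)
        rts_threshold = DEMO_RTS_MAX;

    return rts_threshold;
}
```

In this sketch the first clause does not change the result, because the
converted value also exceeds 2347; it mainly documents that -1 is the
"use default" marker.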



Re: [PATCH RFC 0/3] kcm: Kernel Connection Multiplexor (KCM)

2015-09-21 Thread Tom Herbert
On Mon, Sep 21, 2015 at 2:26 PM, Sowmini Varadhan
 wrote:
> On (09/21/15 10:33), Tom Herbert wrote:
>> >
>> > Some things that were not clear to me from the patch-set:
>> >
>> > The doc states that we re-assemble packets the "stated length" -
>> > but how will the receiver know the "stated length"?
>>
>> BPF program returns the length of the next message. In my testing so
>> far I've been using HTTP/2 which defines a frame format with first 3
>> bytes being header length field . The BPF program (using LLVM/Clang--
>> thanks Alexei!) is just:
>
> Maybe I don't see something about the mux/demux here (I have to
> take a closer look at reserve_psock/unreserve_psock), but
> will every tcp segment have a 3 byte length in the payload?
>
No, there is no provision in TCP that application layer headers align
with TCP segments or that message boundaries are respected with TCP
segments. What we need to do, which you're probably doing for RDS, is
do message delineation on the stream as a sequence of:

1) Read protocol header to determine message length (BPF used here)
2) Read data up to the length of the message
3) Deliver message
4) Goto #1 (i.e. process next message in the stream).
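
The four steps above can be sketched in plain userspace C as follows.
This is only an illustration under assumed framing: a 3-byte big-endian
length prefix followed directly by the payload, which is simpler than a
real HTTP/2 frame header.  All names (demo_delineate, demo_count,
DEMO_HDR_LEN) are hypothetical, and the length is parsed inline instead
of by a BPF program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HDR_LEN 3u   /* assumed framing: 3-byte big-endian payload length */

typedef void (*demo_deliver_fn)(const uint8_t *payload, size_t len, void *ctx);

/* Example sink: counts delivered messages and payload bytes. */
struct demo_stats {
    size_t messages;
    size_t bytes;
};

static void demo_count(const uint8_t *payload, size_t len, void *ctx)
{
    struct demo_stats *st = ctx;

    (void)payload;
    st->messages++;
    st->bytes += len;
}

/*
 * Deliver every complete message found in buf[0..len) and return the
 * number of bytes consumed.  The caller keeps the remainder and calls
 * again once more stream data has arrived.
 */
static size_t demo_delineate(const uint8_t *buf, size_t len,
                             demo_deliver_fn deliver, void *ctx)
{
    size_t off = 0;

    assert(buf != NULL && deliver != NULL);

    for (;;) {
        size_t msg_len;

        /* 1) The whole header is needed to learn the message length. */
        if (len - off < DEMO_HDR_LEN)
            break;
        msg_len = ((size_t)buf[off] << 16) |
                  ((size_t)buf[off + 1] << 8) |
                  (size_t)buf[off + 2];

        /* 2) Wait until the full payload is present. */
        if (len - off - DEMO_HDR_LEN < msg_len)
            break;

        /* 3) Deliver one complete message. */
        deliver(buf + off + DEMO_HDR_LEN, msg_len, ctx);

        /* 4) Continue with the next message in the stream. */
        off += DEMO_HDR_LEN + msg_len;
    }

    return off;
}
```

Message boundaries here are independent of how the bytes were split into
TCP segments: a partial header or partial payload simply stays in the
buffer until the rest arrives.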

> Not every TCP segment in the RDS-TCP case will have a RDS header,
> thus the comments before rds_send_xmit(), thus applying the bpf filter
> to a TCP segment holding some "from-the-middle" piece of the RDS dgram
> may not be possible
>
>> > the notes say one can "accept()" over a kcm socket- but "accept()"
>> > is itself a connection-oriented concept- one does not accept() on
> :
>> The accept method is overloaded on KCM sockets to do the socket
>> cloning operation. This is unrelated to TCP semantics, connection
>> management is performed on TCP sockets (i.e. before being attached to
>> a KCM multiplexor).
>
> If possible,it might be better to use some other
> glibc-func/name/syscall/sockopt/whatever
> for this, rather than overloading accept().. feels like that would
> keep the semantics cleaner, and probably less likely to trip
> up on accept code in the kernel..
>
I'll take a look at alternatives, but I sort of think this is okay since
the semantics of accept are defined per protocol (in this case the
"protocol" is KCM).

Thanks,
Tom

> --Sowmini


Re: [PATCH net-next V2 0/5] s390: qeth and iucv patches

2015-09-21 Thread David Miller
From: Ursula Braun 
Date: Fri, 18 Sep 2015 16:06:47 +0200

> here is version 2 of some s390 related qeth patches for net-next. The patch by
> Thomas Richter adds a new feature to the qeth layer2 code; the remaining
> patches are minor improvements.
> Version 2 of patch 4 uses the desired indentation in function declarations
> and definitions spanning multiple lines in almost all cases. Thomas ran into a
> conflict with the maximum number of columns once. Thus you will still see one
> function definition using an earlier column before the opening parenthesis.

Series applied to net-next, thanks.


Re: [PATCH net-next] sunvnet:Invoke SET_NETDEV_DEV() to set up the vdev in vnet_new()

2015-09-21 Thread David Miller
From: Sowmini Varadhan 
Date: Fri, 18 Sep 2015 17:47:55 -0400

> `ls /sys/devices/channel-devices/vnet-port-0-0/net' is missing without
> this change, and applications like NetworkManager are looking in
> sysfs for the information.
> 
> Signed-off-by: Sowmini Varadhan 

This is a bug, so applied to 'net', thanks!


Re: [PATCH 0/2] net: phy: Fix module autoload for OF platform drivers

2015-09-21 Thread David Miller
From: Luis de Bethencourt 
Date: Fri, 18 Sep 2015 18:15:53 +0200

> These patches add the missing MODULE_DEVICE_TABLE() for OF to export
> the information so modules have the correct aliases built-in and
> autoloading works correctly.
> 
> A longer explanation by Javier Canillas can be found here:
> https://lkml.org/lkml/2015/7/30/519
> 
> Sorry if these two patches should've been part of the ethernet series:
> https://lkml.org/lkml/2015/9/18/567

Series applied, thanks.


Re: [RFC net-next 4/4] net: ipv6: Initial support for VRFs

2015-09-21 Thread David Ahern

On 9/21/15 6:08 PM, Tom Herbert wrote:

diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 9aadd57808a5..11980ee57507 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -142,6 +142,10 @@ static int __ip6_datagram_connect(struct sock *sk, struct 
sockaddr *uaddr, int a
 err = -EINVAL;
 goto out;
 }
+   } else if (sk->sk_bound_dev_if &&
+  netif_index_is_l3_master(sock_net(sk),


I suppose I have the same issues with this that were put in the IPv4
code path. Core IPv6 code should not care about any specific network
interfaces other than maybe loopback. Generalizing VRF to be l3m
doesn't really address this point. Have you looked at abstracting more
of this into the ndo functions (i.e. for source address selection) or
routing lookup?


Socket binding to an interface makes the socket layer care somewhat 
about references to a device. For this case and the ipv4 version the 
flag needs to be set here because of what the connect function means for 
datagram sockets. Once you go down a layer (to L3/routing) there is no 
proper place to add this flag to the lookups.



Re: [PATCH net] net: Handle negative checksum offset in skb-checksum-help

2015-09-21 Thread Pravin Shelar
On Mon, Sep 21, 2015 at 5:14 PM, David Miller  wrote:
> From: Pravin B Shelar 
> Date: Sun, 20 Sep 2015 23:53:17 -0700
>
>> VXLAN device can receive skb with checksum partial. But the checksum
>> offset could be in outer header which is pulled on receive.
>
> Such a scenario is a bug.
>
> Anything that pulls off a header should use a utility function such
> as skb_pull_rcsum() or skb_postpull_rcsum() to make sure this gets
> fixed up properly.

skb_postpull_rcsum() does not change checksum-offset. vxlan receive
already calls this function.


Re: [PATCH net] openvswitch: Zero flows on allocation.

2015-09-21 Thread Jesse Gross
On Sun, Sep 20, 2015 at 11:24 PM, David Miller  wrote:
> From: Jesse Gross 
> Date: Fri, 18 Sep 2015 19:06:14 -0700
>
>> @@ -80,7 +80,7 @@ struct sw_flow *ovs_flow_alloc(void)
>>   struct flow_stats *stats;
>>   int node;
>>
>> - flow = kmem_cache_alloc(flow_cache, GFP_KERNEL);
>> + flow = kmem_cache_alloc(flow_cache, GFP_KERNEL | __GFP_ZERO);
>>   if (!flow)
>
> Like Eric, I prefer that you use kmem_cache_zalloc() to fix
> this.

Sure, I'll make both changes and send out a new patch in a bit.


Re: [PATCH v3] geneve: ensure ECN info is handled properly in all tx/rx paths

2015-09-21 Thread Jesse Gross
On Mon, Sep 21, 2015 at 7:29 AM, John W. Linville
 wrote:
> Partially due to a pre-existing "thinko", the new metadata-based tx/rx
> paths were handling ECN propagation differently than the traditional
> tx/rx paths.  This patch removes the "thinko" (involving multiple
> ip_hdr assignments) on the rx path and corrects the ECN handling on
> both the rx and tx paths.
>
> Signed-off-by: John W. Linville 

Reviewed-by: Jesse Gross 


Re: DSA: phy polling

2015-09-21 Thread Andrew Lunn
On Mon, Sep 14, 2015 at 11:42:54AM +0100, Russell King - ARM Linux wrote:
> Andrew,
> 
> I think you're the current maintainer of the Marvell DSA code, as being
> the most recent author of changes to it. :)

Hi Russell

Sorry for the slow reply, i've been on vacation.

Humm, i suppose i might be the de facto maintainer for Marvell parts,
but i've no NDA with Marvell, so no access to the data sheets etc.
 
> I've noticed in my testing that the Marvell DSA code seems to poll the
> internal phy link status in mv88e6xxx_poll_link(), and set the network
> device carrier status according to the results.

Peter Korsgaard's comment might be correct: the switch needs to know
what the PHY has negotiated. Hence the use of the PPU. There are also
comments in the code that the PPU is needed for indirect access to the
PHY.

So we probably need to keep the PPU, but disable it from changing the
network stack's idea of the link state, etc.

I will add this to my TODO list to play with it.

  Andrew


Re: [iproute2 net-next] ip route: Add RTM_F_LOOKUP_TABLE flag and show table id

2015-09-21 Thread David Miller
From: David Ahern 
Date: Mon, 21 Sep 2015 16:23:13 -0600

> With the new flag AND a kernel that supports it, ip will only show the
> table id IF it is not main:
> 
> root@vm-wheezy2:~# ./ip route get 10.0.0.20
> 10.0.0.20 dev eth0  src 10.0.0.2
> cache
> 
> root@vm-wheezy2:~# ./ip route get 10.2.1.254
> 10.2.1.254 dev eth1  table 10  src 10.2.1.2
> cache
> 
> That's my point. I have not changed existing users.

Ok, thanks for the clarification.  I misread the conditional
statements around the area your patch touches, sorry.


Re: [PATCH] usbnet: New driver for QinHeng CH9200 devices

2015-09-21 Thread David Miller
From: Matthew Garrett 
Date: Sun, 20 Sep 2015 02:25:38 -0700

> From: Matthew Garrett 
> 
> There's a bunch of cheap USB 10/100 devices based on QinHeng chipsets. The
> vendor driver supports the CH9100 and CH9200 devices, but the majority of
> the code is of the if (ch9100) {} else {} form, with the most significant
> difference being that CH9200 provides a real MII interface but CH9100 fakes
> one with a bunch of global variables and magic commands. I don't have a
> CH9100, so it's probably better if someone who does provides an independent
> driver for it. In any case, this is a lightly cleaned up version of the
> vendor driver with all the CH9100 code dropped.
> 
> Signed-off-by: Matthew Garrett 

Applied, thanks.


Re: [PATCH 05/15] RDS: increase size of hash-table to 8K

2015-09-21 Thread David Miller
From: Santosh Shilimkar 
Date: Sat, 19 Sep 2015 19:04:42 -0400

> Even with per bucket locking scheme, in a massive parallel
> system with active rds sockets which could be in excess of multiple
> of 10K, rds_bind_lookup() workload is significant because of smaller
> hashtable size.
> 
> With some tests, it was found that we get modest but still nice
> reduction in rds_bind_lookup with bigger bucket.
> 
>   Hashtable   Baseline(1k)   Delta
>   2048:       8.28%          -2.45%
>   4096:       8.28%          -4.60%
>   8192:       8.28%          -6.46%
>   16384:      8.28%          -6.75%
> 
> Based on the data, we set 8K as the bind hash-table size.
> 
> Signed-off-by: Santosh Shilimkar 
> Signed-off-by: Santosh Shilimkar 

Like others I would strongly prefer that you use a dynamically sized
hash table.

Eating 8k just because a module just happened to get loaded is really
not appropriate.

And there are many other places that use such a scheme, one example is
the AF_NETLINK socket hash table.


[PATCH net-next] tcp: factorize sk_txhash init

2015-09-21 Thread Eric Dumazet
From: Eric Dumazet 

Neal suggested moving the sk_txhash init into tcp_create_openreq_child(),
which is called from both IPv4 and IPv6.

This opportunity was missed in commit 58d607d3e52f ("tcp: provide
skb->hash to synack packets").

Signed-off-by: Eric Dumazet 
Signed-off-by: Neal Cardwell 
---
 net/ipv4/tcp_ipv4.c  |1 -
 net/ipv4/tcp_minisocks.c |1 +
 net/ipv6/tcp_ipv6.c  |2 --
 3 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d671d742a239..7e2646542312 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1276,7 +1276,6 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct 
sk_buff *skb,
newinet->mc_index = inet_iif(skb);
newinet->mc_ttl   = ip_hdr(skb)->ttl;
newinet->rcv_tos  = ip_hdr(skb)->tos;
-   newsk->sk_txhash  = tcp_rsk(req)->txhash;
inet_csk(newsk)->icsk_ext_hdr_len = 0;
if (inet_opt)
inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 6d8795b066ac..22ee9ef9db5e 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -471,6 +471,7 @@ struct sock *tcp_create_openreq_child(struct sock *sk, 
struct request_sock *req,
tcp_enable_early_retrans(newtp);
newtp->tlp_high_seq = 0;
newtp->lsndtime = treq->snt_synack;
+   newsk->sk_txhash = treq->txhash;
newtp->last_oow_ack_time = 0;
newtp->total_retrans = req->num_retrans;
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index f9c0e2640671..a004e0b0b3e9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1090,8 +1090,6 @@ static struct sock *tcp_v6_syn_recv_sock(struct sock *sk, 
struct sk_buff *skb,
newsk->sk_v6_rcv_saddr = ireq->ir_v6_loc_addr;
newsk->sk_bound_dev_if = ireq->ir_iif;
 
-   newsk->sk_txhash = tcp_rsk(req)->txhash;
-
/* Now IPv6 options...
 
   First: no IPv4 options.




Re: [PATCH 05/15] RDS: increase size of hash-table to 8K

2015-09-21 Thread santosh shilimkar

On 9/21/2015 4:05 PM, David Miller wrote:

> From: Santosh Shilimkar
> Date: Sat, 19 Sep 2015 19:04:42 -0400
>
>> Even with per bucket locking scheme, in a massive parallel
>> system with active rds sockets which could be in excess of multiple
>> of 10K, rds_bind_lookup() workload is significant because of smaller
>> hashtable size.
>>
>> With some tests, it was found that we get modest but still nice
>> reduction in rds_bind_lookup with bigger bucket.
>>
>>   Hashtable   Baseline(1k)   Delta
>>   2048:       8.28%          -2.45%
>>   4096:       8.28%          -4.60%
>>   8192:       8.28%          -6.46%
>>   16384:      8.28%          -6.75%
>>
>> Based on the data, we set 8K as the bind hash-table size.
>>
>> Signed-off-by: Santosh Shilimkar
>> Signed-off-by: Santosh Shilimkar
>
> Like others I would strongly prefer that you use a dynamically sized
> hash table.
>
> Eating 8k just because a module just happened to get loaded is really
> not appropriate.
>
> And there are many other places that use such a scheme, one example is
> the AF_NETLINK socket hash table.


OK. Thanks for AF_NETLINK pointer. I will look it up.

Regards,
Santosh
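
For illustration only, the "dynamically sized hash table" idea from the review could be sketched in userspace C roughly as below. This is not the RDS code and not the kernel's resizable hash table; all names (struct htable, ht_insert(), ht_grow(), ...) and the growth policy are assumptions made for the example, and there is no locking or RCU.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical chained hash table that starts small and doubles on demand. */
struct node {
	uint32_t key;
	struct node *next;
};

struct htable {
	struct node **buckets;
	size_t nbuckets;	/* always a power of two */
	size_t count;		/* number of stored keys */
};

static size_t bucket_of(uint32_t key, size_t nbuckets)
{
	/* Multiplicative hash; the constant is an arbitrary odd number. */
	return (size_t)((key * 2654435761u) >> 7) & (nbuckets - 1);
}

/* Start with four buckets so an idle table costs almost nothing. */
static int ht_init(struct htable *ht)
{
	ht->nbuckets = 4;
	ht->count = 0;
	ht->buckets = calloc(ht->nbuckets, sizeof(*ht->buckets));
	return ht->buckets ? 0 : -1;
}

/* Double the bucket array and relink every node into its new chain. */
static int ht_grow(struct htable *ht)
{
	size_t n = ht->nbuckets * 2;
	struct node **nb = calloc(n, sizeof(*nb));

	if (!nb)
		return -1;
	for (size_t i = 0; i < ht->nbuckets; i++) {
		struct node *p = ht->buckets[i];

		while (p) {
			struct node *next = p->next;
			size_t b = bucket_of(p->key, n);

			p->next = nb[b];
			nb[b] = p;
			p = next;
		}
	}
	free(ht->buckets);
	ht->buckets = nb;
	ht->nbuckets = n;
	return 0;
}

/* Insert a key; grow first if there are already as many keys as buckets. */
static int ht_insert(struct htable *ht, uint32_t key)
{
	struct node *p;
	size_t b;

	if (ht->count >= ht->nbuckets && ht_grow(ht) < 0)
		return -1;
	p = malloc(sizeof(*p));
	if (!p)
		return -1;
	b = bucket_of(key, ht->nbuckets);
	p->key = key;
	p->next = ht->buckets[b];
	ht->buckets[b] = p;
	ht->count++;
	return 0;
}

/* Return 1 if the key is present, 0 otherwise. */
static int ht_contains(const struct htable *ht, uint32_t key)
{
	const struct node *p = ht->buckets[bucket_of(key, ht->nbuckets)];

	for (; p; p = p->next)
		if (p->key == key)
			return 1;
	return 0;
}
```

With this policy the memory cost follows the number of bound sockets instead of being paid up front when the module loads.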


Re: pull-request: can-next 2015-09-17

2015-09-21 Thread David Miller
From: Marc Kleine-Budde 
Date: Mon, 21 Sep 2015 09:20:21 +0200

> this is a pull request of 8 patches for net-next/master.
> 
> All 8 patches are by me and cleanup the flexcan driver.

Pulled, thanks Marc.


Re: [PATCH net-next] net: bcmgenet: Remove duplicate test for tx_coalesce_usecs_high

2015-09-21 Thread David Miller
From: Florian Fainelli 
Date: Fri, 18 Sep 2015 14:16:53 -0700

> We were checking twice for ec->tx_coalesce_usecs_high, remove the
> duplicate test.
> 
> Reported-by: Julia Lawall 
> Reported-by: kbuild-...@01.org
> Fixes: 2f9130709d2c19 ("net: bcmgenet: Implement TX coalescing control knobs")
> Signed-off-by: Florian Fainelli 

Applied, thanks.


Re: [PATCH v2] geneve: remove vlan-related feature assignment

2015-09-21 Thread David Miller
From: "John W. Linville" 
Date: Fri, 18 Sep 2015 16:20:32 -0400

> The code handling vlan tag insertion was dropped in commit 371bd1061d29
> ("geneve: Consolidate Geneve functionality in single module.").  Now we
> need to drop the related vlan feature bits in the netdev structure.
> 
> Signed-off-by: John W. Linville 

Applied, thanks.


Re: [PATCH] geneve: use network byte order for destination port config parameter

2015-09-21 Thread David Miller
From: "John W. Linville" 
Date: Fri, 18 Sep 2015 15:59:10 -0400

> This is primarily for consistency with vxlan and other tunnels which
> use network byte order for similar parameters.
> 
> Signed-off-by: John W. Linville 

This doesn't apply to any of my trees.  Can you respin it against
'net'?  Thanks.


Re: [RFC net-next 4/4] net: ipv6: Initial support for VRFs

2015-09-21 Thread Tom Herbert
On Mon, Sep 21, 2015 at 4:32 PM, David Ahern  wrote:
> Add basic support for VRFs to IPv6 stack. This is a good start point.
> ping to and from a VRF works. Basic tcp and udp clients and server all
> work fine with VRFs.
>
> Signed-off-by: David Ahern 
> ---
>  net/ipv6/addrconf.c   |  4 +++-
>  net/ipv6/datagram.c   |  4 
>  net/ipv6/icmp.c   |  6 +-
>  net/ipv6/ip6_fib.c|  1 +
>  net/ipv6/ip6_output.c |  6 --
>  net/ipv6/ndisc.c  |  9 +++--
>  net/ipv6/route.c  | 17 +++--
>  7 files changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 75d3dde32c69..f4677a9c01ac 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -81,6 +81,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2179,8 +2180,9 @@ static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
> struct fib6_node *fn;
> struct rt6_info *rt = NULL;
> struct fib6_table *table;
> +   u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_PREFIX;
>
> -   table = fib6_get_table(dev_net(dev), RT6_TABLE_PREFIX);
> +   table = fib6_get_table(dev_net(dev), tb_id);
> if (!table)
> return NULL;
>
> diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
> index 9aadd57808a5..11980ee57507 100644
> --- a/net/ipv6/datagram.c
> +++ b/net/ipv6/datagram.c
> @@ -142,6 +142,10 @@ static int __ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int a
> err = -EINVAL;
> goto out;
> }
> +   } else if (sk->sk_bound_dev_if &&
> +  netif_index_is_l3_master(sock_net(sk),

I suppose I have the same issues with this that were put in the IPv4
code path. Core IPv6 code should not care about any specific network
interfaces other than maybe loopback. Generalizing VRF to be l3m
doesn't really address this point. Have you looked at abstracting more
of this into the ndo functions (i.e. for source address selection) or
routing lookup?

Tom


> +   sk->sk_bound_dev_if)) {
> +   fl6.flowi6_flags |= FLOWI_FLAG_L3MDEV_SRC;
> }
>
> sk->sk_v6_daddr = *daddr;
> diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
> index 6c2b2132c8d3..efb1c00f2270 100644
> --- a/net/ipv6/icmp.c
> +++ b/net/ipv6/icmp.c
> @@ -68,6 +68,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>
> @@ -496,6 +497,9 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
> else if (!fl6.flowi6_oif)
> fl6.flowi6_oif = np->ucast_oif;
>
> +   if (!fl6.flowi6_oif)
> +   fl6.flowi6_oif = l3mdev_master_ifindex(skb->dev);
> +
> dst = icmpv6_route_lookup(net, skb, sk, &fl6);
> if (IS_ERR(dst))
> goto out;
> @@ -575,7 +579,7 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
> fl6.daddr = ipv6_hdr(skb)->saddr;
> if (saddr)
> fl6.saddr = *saddr;
> -   fl6.flowi6_oif = skb->dev->ifindex;
> +   fl6.flowi6_oif = l3mdev_fib_oif(skb->dev);
> fl6.fl6_icmp_type = ICMPV6_ECHO_REPLY;
> fl6.flowi6_mark = mark;
> security_skb_classify_flow(skb, flowi6_to_flowi(&fl6));
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 418d9823692b..318cf5a34ca5 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -259,6 +259,7 @@ struct fib6_table *fib6_get_table(struct net *net, u32 id)
>
> return NULL;
>  }
> +EXPORT_SYMBOL_GPL(fib6_get_table);
>
>  static void __net_init fib6_tables_init(struct net *net)
>  {
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 291a07be5dfb..bbd752cef5c2 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -55,6 +55,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  static int ip6_finish_output2(struct sock *sk, struct sk_buff *skb)
>  {
> @@ -874,7 +875,8 @@ static struct dst_entry *ip6_sk_dst_check(struct sock *sk,
>  #ifdef CONFIG_IPV6_SUBTREES
> ip6_rt_check(&rt->rt6i_src, &fl6->saddr, np->saddr_cache) ||
>  #endif
> -   (fl6->flowi6_oif && fl6->flowi6_oif != dst->dev->ifindex)) {
> +  (!(fl6->flowi6_flags & FLOWI_FLAG_L3MDEV_SRC) &&
> + (fl6->flowi6_oif && fl6->flowi6_oif != dst->dev->ifindex))) {
> dst_release(dst);
> dst = NULL;
> }
> @@ -1026,7 +1028,7 @@ struct dst_entry *ip6_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
> if (final_dst)
> fl6->daddr = *final_dst;
> if (!fl6->flowi6_oif)
> -   fl6->flowi6_oif = dst->dev->ifindex;
> +   fl6->flowi6_oif = l3mdev_fib_oif(dst->dev);
>
> return xfrm_lookup_route(sock_net(sk), dst, flowi6_to_flowi(fl6), sk, 0);
>  }
> diff --git 

Re: [PATCH net] net: Handle negative checksum offset in skb-checksum-help

2015-09-21 Thread Eric Dumazet
On Mon, 2015-09-21 at 18:04 -0700, Pravin Shelar wrote:
> On Mon, Sep 21, 2015 at 5:14 PM, David Miller  wrote:
> > From: Pravin B Shelar 
> > Date: Sun, 20 Sep 2015 23:53:17 -0700
> >
> >> VXLAN device can receive skb with checksum partial. But the checksum
> >> offset could be in outer header which is pulled on receive.
> >
> > Such a scenario is a bug.
> >
> > Anything that pulls off a header should use a utility function such
> > as skb_pull_rcsum() or skb_postpull_rcsum() to make sure this gets
> > fixed up properly.
> 
> skb_postpull_rcsum() does not change checksum-offset. vxlan receive
> already calls this function.

Then the bug is here.

Otherwise we might have to 'fix' other places.
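
To make the disagreement above concrete, here is a toy userspace model of why pulling an outer header can leave the checksum start pointing before the data. The struct and helper names are hypothetical simplifications, not the sk_buff API; the only assumption borrowed from the kernel is that the checksum start is recorded relative to the start of the buffer, so its offset from the current data is "csum_start minus headroom".

```c
#include <stddef.h>

/* Toy packet buffer; a simplified, hypothetical stand-in for an skb. */
struct pkt {
	size_t headroom;	/* bytes between buffer start and current data */
	size_t len;		/* bytes from current data to end of packet */
	size_t csum_start;	/* where checksumming begins, from buffer start */
};

/* Strip n bytes of header from the front of the packet. */
static void pkt_pull(struct pkt *p, size_t n)
{
	p->headroom += n;
	p->len -= n;
}

/* Checksum start relative to the current data pointer. */
static long csum_start_offset(const struct pkt *p)
{
	return (long)p->csum_start - (long)p->headroom;
}

/* A partial-checksum request is only usable if it points into the data. */
static int csum_offset_valid(const struct pkt *p)
{
	long off = csum_start_offset(p);

	return off >= 0 && (size_t)off < p->len;
}
```

In this model, a pull that only adjusts the running checksum value and never looks at csum_start leaves a negative offset behind, which is the case the patch subject refers to.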




Re: [PATCH RFC 0/3] kcm: Kernel Connection Multiplexor (KCM)

2015-09-21 Thread Sowmini Varadhan
On (09/21/15 15:36), Tom Herbert wrote:
> segments. What we need to do, which you're probably doing for RDS, is
> do message delineation on the stream as a sequence of:
> 
> 1) Read protocol header to determine message length (BPF used here)

right, that's what rds does- first reads the sizeof(rds_header),
and from that, figures out payload len, to stitch each rds dgram 
together from intermediate tcp segments..

> 2) Read data up to the length of the message
> 3) Deliver message
> 4) Goto #1 (i.e. process next message in the stream).

Thanks for the rest of the responses.

--Sowmini
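
The four delineation steps quoted above can be sketched in plain C as follows. This is only an illustration: the fixed 4-byte big-endian length header, the delineate() function and the callback type are assumptions for the example, whereas KCM uses a BPF program for step 1 and RDS reads its own rds_header.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN 4	/* assumed header: 32-bit big-endian payload length */

/* Callback invoked once per complete message. */
typedef void (*deliver_fn)(const uint8_t *payload, size_t len, void *ctx);

/* Example callback state: counts messages and sums payload bytes. */
struct stats {
	size_t msgs;
	size_t bytes;
};

static void count_msg(const uint8_t *payload, size_t len, void *ctx)
{
	struct stats *s = ctx;

	(void)payload;
	s->msgs++;
	s->bytes += len;
}

/*
 * Walk a byte stream and deliver every complete message in it.
 * Returns the number of bytes consumed; a trailing partial message is
 * left for the caller to retry once more data has arrived.
 */
static size_t delineate(const uint8_t *buf, size_t avail,
			deliver_fn deliver, void *ctx)
{
	size_t off = 0;

	for (;;) {
		size_t len;

		/* 1) Read the protocol header to learn the message length. */
		if (avail - off < HDR_LEN)
			break;
		len = ((size_t)buf[off] << 24) |
		      ((size_t)buf[off + 1] << 16) |
		      ((size_t)buf[off + 2] << 8) |
		      (size_t)buf[off + 3];

		/* 2) Wait until the whole payload is available. */
		if (avail - off - HDR_LEN < len)
			break;

		/* 3) Deliver the message. */
		deliver(buf + off + HDR_LEN, len, ctx);

		/* 4) Move on to the next message in the stream. */
		off += HDR_LEN + len;
	}
	return off;
}
```

A caller would keep the unconsumed tail of the buffer and call delineate() again after the next TCP segment has been appended.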



Re: 197b:0250 JMicron JMC250 Gigabit ethernet doesn't work

2015-09-21 Thread Guo-Fu Tseng
Dear druchaty:

Would it get link if you set the speed to 100M?

If it does, then it is a HW-PHY bug, and there is nothing we can do in the
driver except set the speed to 100M.


On Sat, 19 Sep 2015 21:01:28 +0300,  wrote
> [1.] One line summary of the problem: 197b:0250 JMicron JMC250 Gigabit
> ethernet doesn't work
> 
> [2.] Full description of the problem/report:
> Laptop ASUS X52JU can't connect to the router ASUS RT-AC68U via
> ethernet. NetworkManager shows that cable is unplugged. Router has
> Gigabit ethernet ports and laptop doesn't see them. I found that it's
> a bug of the JMC kernel module.
> 
> [4.] Kernel version (from /proc/version):
> Linux version 4.3.0-040300rc1-generic (kernel@gomeisa) (gcc version
> 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) ) #201509160642 SMP Wed Sep 16
> 10:44:16 UTC 2015
> 
> [7.] Environment
> Description: Ubuntu 14.04.3 LTS
> Release: 14.04
> 
> [7.1.] Software
> Linux nick-notebook 4.3.0-040300rc1-generic #201509160642 SMP Wed Sep
> 16 10:44:16 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> 
> Gnu C                  4.8
> Gnu make               3.81
> binutils               2.24
> util-linux             2.20.1
> mount                  support
> module-init-tools      15
> e2fsprogs              1.42.9
> pcmciautils            018
> PPP                    2.4.5
> Linux C Library        2.19
> Dynamic linker (ldd)   2.19
> Procps                 3.3.9
> Net-tools              1.60
> Kbd                    1.15.5
> Sh-utils               8.21
> wireless-tools         30
> Modules Loaded nls_iso8859_1 nls_utf8 isofs uas usb_storage
> drbg ansi_cprng ctr ccm rfcomm bnep arc4 ath9k intel_powerclamp
> coretemp ath9k_common ath9k_hw uvcvideo amdkfd videobuf2_vmalloc
> videobuf2_memops amd_iommu_v2 ath radeon videobuf2_core mac80211
> v4l2_common kvm_intel videodev hid_logitech_hidpp media ttm
> drm_kms_helper snd_hda_codec_conexant snd_hda_codec_generic
> snd_hda_codec_hdmi snd_hda_intel snd_hda_codec drm kvm cfg80211
> snd_hda_core snd_seq_midi snd_seq_midi_event snd_rawmidi snd_hwdep
> i2c_algo_bit btusb fb_sys_fops syscopyarea snd_pcm sysfillrect
> sysimgblt snd_seq jmb38x_ms lpc_ich joydev btrtl snd_seq_device
> memstick input_leds btbcm btintel serio_raw bluetooth snd_timer shpchp
> mei_me snd soundcore mei asus_laptop sparse_keymap input_polldev
> parport_pc video ppdev mac_hid lp parport hid_logitech_dj usbhid hid
> psmouse ahci libahci jme mii sdhci_pci sdhci fjes
> 
> [7.2.] Processor information (from /proc/cpuinfo):
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 37
> model name : Intel(R) Core(TM) i3 CPU   M 380  @ 2.53GHz
> stepping : 5
> microcode : 0x2
> cpu MHz : 1066.000
> cache size : 3072 KB
> physical id : 0
> siblings : 4
> core id : 0
> cpu cores : 2
> apicid : 0
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 11
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3
> cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm arat dtherm
> tpr_shadow vnmi flexpriority ept vpid
> bugs :
> bogomips : 5053.72
> clflush size : 64
> cache_alignment : 64
> address sizes : 36 bits physical, 48 bits virtual
> power management:
> 
> processor : 1
> vendor_id : GenuineIntel
> cpu family : 6
> model : 37
> model name : Intel(R) Core(TM) i3 CPU   M 380  @ 2.53GHz
> stepping : 5
> microcode : 0x2
> cpu MHz : 1066.000
> cache size : 3072 KB
> physical id : 0
> siblings : 4
> core id : 2
> cpu cores : 2
> apicid : 4
> initial apicid : 4
> fpu : yes
> fpu_exception : yes
> cpuid level : 11
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3
> cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm arat dtherm
> tpr_shadow vnmi flexpriority ept vpid
> bugs :
> bogomips : 5053.72
> clflush size : 64
> cache_alignment : 64
> address sizes : 36 bits physical, 48 bits virtual
> power management:
> 
> processor : 2
> vendor_id : GenuineIntel
> cpu family : 6
> model : 37
> model name : Intel(R) Core(TM) i3 CPU   M 380  @ 2.53GHz
> stepping : 5
> microcode : 0x2
> cpu MHz : 933.000
> cache size : 3072 KB
> physical id : 0
> siblings : 4
> core id : 0
> cpu cores : 2
> apicid : 1
> initial apicid : 1
> fpu : yes
> fpu_exception : yes
> cpuid level : 11
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3
> cx16 xtpr pdcm 

Re: [PATCH] lib: fix data race in rhashtable_rehash_one

2015-09-21 Thread Eric Dumazet
On Tue, 2015-09-22 at 00:25 +0200, Thomas Graf wrote:
> On 09/21/15 at 07:51am, Eric Dumazet wrote:
> > The important part here is that we rehash an item, so we need to make
> > sure to maintain consistent ->next field, and need to prevent compiler
> > from using ->next as a temporary variable.
> > 
> > ptr->next = 1UL | ((base + offset) << 1);
> > 
> > Is dangerous because compiler could issue :
> > 
> > ptr->next = (base + offset);
> > 
> > ptr->next <<= 1;
> > 
> > ptr->next += 1UL;
> > 
> > Frankly, all this looks like an oversight in this code.
> > 
> > Not sure why the NULLS value is even recomputed.
> 
> The hash of the chain is part of the NULLS value. Since the
> entry might have been moved to a different chain, the NULLS
> value must be recalculated to contain the proper hash.
> 
> However, nobody is using the hash today as far as I can
> see so we could as well just remove it and use the base
> value only for the nulls marker.

What I said is :

In @head you already have the correct nulls value, from hash table.

You do not need to recompute this value, and/or test if hash table chain
is empty.

If hash bucket is empty, it contains the appropriate NULLS value.

If you are paranoiac add this debugging check :

if (rht_is_a_nulls(head))
BUG_ON(head != (struct rhash_head *)rht_marker(ht, new_hash));


Therefore, simply fix the bug and unnecessary code with :

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index cc0c69710dcf..a54ff8949f91 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -187,10 +187,7 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
head = rht_dereference_bucket(new_tbl->buckets[new_hash],
  new_tbl, new_hash);
 
-   if (rht_is_a_nulls(head))
-   INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
-   else
-   RCU_INIT_POINTER(entry->next, head);
+   RCU_INIT_POINTER(entry->next, head);
 
rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
spin_unlock(new_bucket_lock);
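
As a userspace illustration of the store-tearing concern discussed in this thread: compute the nulls marker in a local value first, then publish it with one store. The struct and helper names below are hypothetical, and the volatile-cast store is only a stand-in, similar in spirit to the kernel's WRITE_ONCE(); this is not the rhashtable code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a hash chain entry; not the kernel struct. */
struct entry {
	uintptr_t next;
};

/* Build a nulls marker: low bit set, payload shifted up by one. */
static uintptr_t make_nulls(uintptr_t base, uintptr_t offset)
{
	return (uintptr_t)1 | ((base + offset) << 1);
}

/* A nulls marker is recognised by its low bit. */
static int is_a_nulls(uintptr_t v)
{
	return (int)(v & 1);
}

/* Recover the payload stored in a nulls marker. */
static uintptr_t nulls_value(uintptr_t v)
{
	return v >> 1;
}

/*
 * Publish the finished value with a single store.  The volatile access
 * asks the compiler not to use e->next as scratch space for the
 * intermediate results, which is the hazard described in the mail.
 */
static void set_next_once(struct entry *e, uintptr_t val)
{
	*(volatile uintptr_t *)&e->next = val;
}
```

A concurrent reader of e->next then sees either the old value or the complete marker, never "base + offset" without the shift and the low bit.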




Re: [PATCH net] net: Handle negative checksum offset in skb-checksum-help

2015-09-21 Thread David Miller
From: Pravin B Shelar 
Date: Sun, 20 Sep 2015 23:53:17 -0700

> VXLAN device can receive skb with checksum partial. But the checksum
> offset could be in outer header which is pulled on receive.

Such a scenario is a bug.

Anything that pulls off a header should use a utility function such
as skb_pull_rcsum() or skb_postpull_rcsum() to make sure this gets
fixed up properly.


Re: [net-next PATCH 0/4] Add support for reading macid when DT macid not found

2015-09-21 Thread David Miller
From: Mugunthan V N 
Date: Mon, 21 Sep 2015 15:56:49 +0530

> Did a boot test on dra7-evm [1] and am437x-gp-evm [2].
> Pushed a branch [3] for others to test the patch.
> 
> [1]: http://pastebin.ubuntu.com/12513420/
> [2]: http://pastebin.ubuntu.com/12513428/
> [3]: git://git.ti.com/~mugunthanvnm/ti-linux-kernel/linux.git 
> cpsw-macid-read-support

Series applied, thanks.


Re: [PATCH] net: dsa: actually force the speed on the CPU port

2015-09-21 Thread Andrew Lunn
On Mon, Sep 21, 2015 at 09:42:59PM +0100, Russell King wrote:
> Commit 54d792f257c6 ("net: dsa: Centralise global and port setup
> code into mv88e6xxx.") merged in the 4.2 merge window broke the link
> speed forcing for the CPU port of Marvell DSA switches.  The original
> code was:
> 
> /* MAC Forcing register: don't force link, speed, duplex
>  * or flow control state to any particular values on physical
>  * ports, but force the CPU port and all DSA ports to 1000 Mb/s
>  * full duplex.
>  */
> if (dsa_is_cpu_port(ds, p) || ds->dsa_port_mask & (1 << p))
> REG_WRITE(addr, 0x01, 0x003e);
> else
> REG_WRITE(addr, 0x01, 0x0003);
> 
> but the new code does a read-modify-write:
> 
> reg = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
> if (dsa_is_cpu_port(ds, port) ||
> ds->dsa_port_mask & (1 << port)) {
> reg |= PORT_PCS_CTRL_FORCE_LINK |
> PORT_PCS_CTRL_LINK_UP |
> PORT_PCS_CTRL_DUPLEX_FULL |
> PORT_PCS_CTRL_FORCE_DUPLEX;
> if (mv88e6xxx_6065_family(ds))
> reg |= PORT_PCS_CTRL_100;
> else
> reg |= PORT_PCS_CTRL_1000;
> 
> The link speed in the PCS control register is a two bit field.  Forcing
> the link speed in this way doesn't ensure that the bit field is set to
> the correct value - on the hardware I have here, the speed bitfield
> remains set to 0x03, resulting in the speed not being forced to gigabit.
> 
> We must clear both bits before forcing the link speed.
> 
> Fixes: 54d792f257c6 ("net: dsa: Centralise global and port setup code into mv88e6xxx.")
> Signed-off-by: Russell King 

Acked-by: Andrew Lunn 

Thanks for fixing this Russell

   Andrew
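
For readers following along, the read-modify-write problem described in the commit message can be reproduced with a few lines of C. The bit values and function names below are assumptions chosen for the example, not the mv88e6xxx register definitions.

```c
#include <stdint.h>

/* Illustrative bit layout; assumed values, not the driver's header. */
#define PCS_SPEED_MASK	0x0003u	/* two-bit speed field */
#define PCS_SPEED_100	0x0001u
#define PCS_SPEED_1000	0x0002u
#define PCS_FORCE_LINK	0x0010u
#define PCS_LINK_UP	0x0020u

/* Buggy variant: ORs the new speed into whatever was already there. */
static uint16_t force_speed_or_only(uint16_t reg, uint16_t speed)
{
	return (uint16_t)(reg | PCS_FORCE_LINK | PCS_LINK_UP | speed);
}

/* Fixed variant: clear the whole two-bit field first, then set it. */
static uint16_t force_speed_masked(uint16_t reg, uint16_t speed)
{
	reg &= (uint16_t)~PCS_SPEED_MASK;
	return (uint16_t)(reg | PCS_FORCE_LINK | PCS_LINK_UP | speed);
}
```

If the register reads back with the speed field at 0x03, the OR-only variant leaves it at 0x03, while the masked variant ends up with the requested speed.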


Re: [PATCH net] net: Handle negative checksum offset in skb-checksum-help

2015-09-21 Thread Pravin Shelar
On Mon, Sep 21, 2015 at 7:14 PM, Eric Dumazet  wrote:
> On Mon, 2015-09-21 at 18:04 -0700, Pravin Shelar wrote:
>> On Mon, Sep 21, 2015 at 5:14 PM, David Miller  wrote:
>> > From: Pravin B Shelar 
>> > Date: Sun, 20 Sep 2015 23:53:17 -0700
>> >
>> >> VXLAN device can receive skb with checksum partial. But the checksum
>> >> offset could be in outer header which is pulled on receive.
>> >
>> > Such a scenario is a bug.
>> >
>> > Anything that pulls off a header should use a utility function such
>> > as skb_pull_rcsum() or skb_postpull_rcsum() to make sure this gets
>> > fixed up properly.
>>
>> skb_postpull_rcsum() does not change checksum-offset. vxlan receive
>> already calls this function.
>
> Then the bug is here.
>
> Otherwise we might have to 'fix' other places.
>
I posted a patch to fix skb_postpull_rcsum() to handle this case. But
that was not accepted.
https://patchwork.ozlabs.org/patch/512625/

And specific solution for skb_checksum_help() was suggested.

http://marc.info/?l=linux-netdev&m=144108078931774&w=2


Re: [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter

2015-09-21 Thread Alexander Duyck

On 09/21/2015 10:11 AM, Jarod Wilson wrote:

Some pci changes upcoming in 4.3 seem to cause additional disconnects,
which can happen at unfortuitous times for igb, leading to issues such as
this, where the disconnect happened just before igb_configure_tx_ring():

[  414.440115] igb :15:00.0: enabling device ( -> 0002)
[  414.474934] pps pps0: new PPS source ptp1
[  414.474937] igb :15:00.0: added PHC on eth0
[  414.474938] igb :15:00.0: Intel(R) Gigabit Ethernet Network Connection
[  414.474940] igb :15:00.0: eth0: (PCIe:2.5Gb/s:Width x1) e8:ea:6a:00:1b:2a
[  414.475072] igb :15:00.0: eth0: PBA No: 000200-000
[  414.475073] igb :15:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx 
queue(s)
[  414.478453] igb :15:00.0 enp21s0: renamed from eth0
[  414.497747] IPv6: ADDRCONF(NETDEV_UP): enp21s0: link is not ready
[  414.536745] igb :15:00.0 enp21s0: PCIe link lost, device now detached
[  414.854808] BUG: unable to handle kernel paging request at 3818
[  414.854827] IP: [] igb_configure_tx_ring+0x14c/0x250 [igb]
[  414.854846] PGD 0
[  414.854849] Oops: 0002 [#1] SMP
[  414.854856] Modules linked in: firewire_ohci firewire_core crc_itu_t igb dca 
ctr ccm arc4 iwlmvm mac80211 fuse xt_CHECKSUM ipt_MASQUERADE 
nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT 
nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc 
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter 
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter bnep 
dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_hdmi coretemp 
x86_pkg_temp_thermal intel_powerclamp kvm_intel iTCO_wdt ppdev kvm 
iTCO_vendor_support hp_wmi sparse_keymap crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel
[  414.855073]  drbg ansi_cprng snd_hda_codec_realtek snd_hda_codec_generic 
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd 
snd_hda_intel snd_hda_codec microcode snd_hda_core snd_hwdep snd_seq 
snd_seq_device snd_pcm iwlwifi uvcvideo btusb cfg80211 videobuf2_vmalloc 
videobuf2_memops btrtl btbcm videobuf2_core btintel bluetooth v4l2_common 
snd_timer videodev snd parport_pc rtsx_pci_ms joydev pcspkr input_leds i2c_i801 
media sg memstick rfkill soundcore lpc_ich 8250_fintek parport mei_me hp_accel 
ie31200_edac shpchp lis3lv02d mei edac_core input_polldev hp_wireless 
tpm_infineon sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables 
xfs libcrc32c sr_mod sd_mod cdrom rtsx_pci_sdmmc mmc_core crc32c_intel 
serio_raw rtsx_pci nouveau mxm_wmi ahci hwmon libahci e1000e drm_kms_helper
[  414.855309]  ptp xhci_pci pps_core ttm xhci_hcd wmi video ipv6 autofs4
[  414.855331] CPU: 2 PID: 875 Comm: NetworkManager Not tainted 
4.2.0-5.el7_UNSUPPORTED.x86_64 #1
[  414.855348] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70 
Ver. 01.07 02/26/2015
[  414.855365] task: 880484698c00 ti: 88005859c000 task.ti: 
88005859c000
[  414.855380] RIP: 0010:[]  [] 
igb_configure_tx_ring+0x14c/0x250 [igb]
[  414.855401] RSP: 0018:88005859f608  EFLAGS: 00010246
[  414.855410] RAX: 3818 RBX:  RCX: 3818
[  414.855424] RDX:  RSI: 0008 RDI: 002a9fe6
[  414.855437] RBP: 88005859f638 R08: 03030300 R09: ffe7
[  414.855451] R10: 81fa91b4 R11: 07e3 R12: 
[  414.855464] R13: 880471c98840 R14: 8804670a1180 R15: 000483cce000
[  414.855478] FS:  7f389c6fb8c0() GS:88049dc8() 
knlGS:
[  414.855493] CS:  0010 DS:  ES:  CR0: 80050033
[  414.855504] CR2: 3818 CR3: 0004875da000 CR4: 001406e0
[  414.855518] Stack:
[  414.855520]  88005859f638 880471c98840 880471c98df8 
0001
[  414.855538]  880471c98848 0001 88005859f698 
a0b99cb0
[  414.85]  88005859f678 59ab02179a7fe4d0 f3ce6b27ad46225f 
f5454218094e72d1
[  414.855572] Call Trace:
[  414.855577]  [] igb_configure+0x240/0x400 [igb]
[  414.855590]  [] __igb_open+0xc2/0x560 [igb]
[  414.855602]  [] ? notifier_call_chain+0x4d/0x80
[  414.855614]  [] igb_open+0x10/0x20 [igb]
[  414.855625]  [] __dev_open+0xb1/0x130
[  414.855636]  [] __dev_change_flags+0xa1/0x160
[  414.855647]  [] dev_change_flags+0x29/0x60
[  414.855658]  [] do_setlink+0x5d3/0xaa0
[  414.855679]  [] ? nla_parse+0xa3/0x100
[  414.855689]  [] rtnl_newlink+0x4f0/0x880
[  414.855700]  [] ? rtnl_newlink+0xf3/0x880
[  414.855721]  [] ? netlink_unicast+0x1ae/0x220
[  414.855734]  [] ? security_capable+0x48/0x60
[  414.855746]  [] ? ns_capable+0x2d/0x60
[  414.855756]  [] rtnetlink_rcv_msg+0x95/0x240
[  414.855768]  [] ? sock_has_perm+0x70/0x90
[  414.855779]  [] ? rtnetlink_rcv+0x40/0x40
[ 

Re: [iproute2 net-next] ip route: Add RTM_F_LOOKUP_TABLE flag and show table id

2015-09-21 Thread David Miller
From: David Ahern 
Date: Mon, 21 Sep 2015 15:28:53 -0600

> On 9/21/15 3:19 PM, Stephen Hemminger wrote:
>>> @@ -1638,6 +1638,8 @@ static int iproute_get(int argc, char **argv)
>>> if (req.r.rtm_family == AF_UNSPEC)
>>> req.r.rtm_family = AF_INET;
>>>
>>> +   req.r.rtm_flags |= RTM_F_LOOKUP_TABLE;
>>> +
>>> if (rtnl_talk(&rth, &req.n, &req.n, sizeof(req)) < 0)
>>> exit(2);
>>>
>>
>> How will this work (or not) on older kernels?
>>
> 
> It works just fine. First test used the wrong VM and was puzzled to
> not see the table id in the output. Then I realized the older kernel
> did not recognize the RTM_F_LOOKUP_TABLE; silently ignores the
> flag. With a kernel that does recognize it I get the table id in the
> output when it is not main.

I think if it always gave MAIN in older kernels, iproute should continue
to do so.

You can't just remove the table ID output just because you disagree with
the semantics given by old kernels.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [iproute2 net-next] ip route: Add RTM_F_LOOKUP_TABLE flag and show table id

2015-09-21 Thread David Miller
From: David Ahern 
Date: Mon, 21 Sep 2015 16:03:00 -0600

> On 9/21/15 3:58 PM, David Miller wrote:
>> I think if it always gave MAIN in older kernels, iproute should
>> continue
>> to do so.
>>
>> You can't just remove the table ID output just because you disagree
>> with
>> the semantics given by old kernels.
>>
> 
> Current semantics are maintained. Kernel was hardcoded to return main;
> iproute2 was hardcoded to not show main.

Since iproute2 always showed MAIN, it should continue to do so when
run on older kernels.

And again this is regardless of whether you disagree with those
semantics or not.


Re: [iproute2 net-next] ip route: Add RTM_F_LOOKUP_TABLE flag and show table id

2015-09-21 Thread David Ahern

On 9/21/15 3:58 PM, David Miller wrote:
> I think if it always gave MAIN in older kernels, iproute should continue
> to do so.
>
> You can't just remove the table ID output just because you disagree with
> the semantics given by old kernels.

Current semantics are maintained. Kernel was hardcoded to return main; 
iproute2 was hardcoded to not show main.




Re: [PATCH 1/7] phy: fix of_mdio_find_bus() device refcount leak

2015-09-21 Thread David Miller
From: Russell King - ARM Linux 
Date: Mon, 21 Sep 2015 20:32:07 +0100

> In the case of the mdio mux code, I'm dropping the reference when
> either (a) we've encountered an error during initialisation and
> we're cleaning up, or (b) when the mdio mux code is being torn down
> after the mdiomux bus has been unregistered and freed.  In both
> cases, we're done with the mdio bus that was returned from
> of_mdio_find_bus().
> 
> In case (a), the devres code will release the kmalloc'd memory when
> mdio_mux_gpio_probe() or mdio_mux_mmioreg_probe() propagates the error
> out of their probe() function.
> 
> I'm not sure why you think anything is wrong here - maybe it's the odd
> code structure to the success path at the bottom of mdio_mux_init()?

Ok I may have misread your change.  I'll restudy it when you respin
the series with the commit message fixed and the DSA change added.

Thanks.


Re: [iproute2 net-next] ip route: Add RTM_F_LOOKUP_TABLE flag and show table id

2015-09-21 Thread David Ahern

On 9/21/15 4:03 PM, David Miller wrote:
> From: David Ahern 
> Date: Mon, 21 Sep 2015 16:03:00 -0600
>
>> On 9/21/15 3:58 PM, David Miller wrote:
>>> I think if it always gave MAIN in older kernels, iproute should
>>> continue
>>> to do so.
>>>
>>> You can't just remove the table ID output just because you disagree
>>> with
>>> the semantics given by old kernels.
>>
>> Current semantics are maintained. Kernel was hardcoded to return main;
>> iproute2 was hardcoded to not show main.
>
> Since iproute2 always showed MAIN, it should continue to do so when
> run on older kernels.
>
> And again this is regardless of whether you disagree with those
> semantics or not.


Dave:

ip does *not* show the table id or string today:

root@vm-wheezy2:~# ip route get 10.2.1.254
10.2.1.254 dev eth1 src 10.2.1.2
cache


With the new flag AND a kernel that supports it, ip will only show the 
table id IF it is not main:


root@vm-wheezy2:~# ./ip route get 10.0.0.20
10.0.0.20 dev eth0  src 10.0.0.2
cache

root@vm-wheezy2:~# ./ip route get 10.2.1.254
10.2.1.254 dev eth1  table 10  src 10.2.1.2
cache

That's my point. I have not changed existing users.
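
The display rule described above can be written down as a small C sketch. It is not the iproute2 source; should_show_table() and format_route() are hypothetical names, and treating table 0 as "unknown" is an assumption. The value 254 is RT_TABLE_MAIN from linux/rtnetlink.h.

```c
#include <stddef.h>
#include <stdio.h>

#define TABLE_MAIN 254	/* RT_TABLE_MAIN in linux/rtnetlink.h */

/* Show the table id only when it is known and differs from main. */
static int should_show_table(unsigned int table)
{
	return table != 0 && table != TABLE_MAIN;
}

/*
 * Format one "ip route get" style line into buf.
 * Returns the value of snprintf().
 */
static int format_route(char *buf, size_t size, const char *dst,
			const char *dev, unsigned int table, const char *src)
{
	if (should_show_table(table))
		return snprintf(buf, size, "%s dev %s  table %u  src %s",
				dst, dev, table, src);
	return snprintf(buf, size, "%s dev %s  src %s", dst, dev, src);
}
```

A kernel that ignores the flag and always reports main therefore produces the same output as before.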



Re: [PATCH] lib: fix data race in rhashtable_rehash_one

2015-09-21 Thread Thomas Graf
On 09/21/15 at 07:51am, Eric Dumazet wrote:
> The important part here is that we rehash an item, so we need to make
> sure to maintain consistent ->next field, and need to prevent compiler
> from using ->next as a temporary variable.
> 
> ptr->next = 1UL | ((base + offset) << 1);
> 
> Is dangerous because compiler could issue :
> 
> ptr->next = (base + offset);
> 
> ptr->next <<= 1;
> 
> ptr->next += 1UL;
> 
> Frankly, all this looks like an oversight in this code.
> 
> Not sure why the NULLS value is even recomputed.

The hash of the chain is part of the NULLS value. Since the
entry might have been moved to a different chain, the NULLS
value must be recalculated to contain the proper hash.

However, nobody is using the hash today as far as I can
see so we could as well just remove it and use the base
value only for the nulls marker.


[PATCH net v2] openvswitch: Zero flows on allocation.

2015-09-21 Thread Jesse Gross
When support for megaflows was introduced, OVS needed to start
installing flows with a mask applied to them. Since masking is an
expensive operation, OVS also had an optimization that would only
take the parts of the flow keys that were covered by a non-zero
mask. The values stored in the remaining pieces should not matter
because they are masked out.

While this works fine for the purposes of matching (which must always
look at the mask), serialization to netlink can be problematic. Since
the flow and the mask are serialized separately, the uninitialized
portions of the flow can be encoded with whatever values happen to be
present.

In terms of functionality, this has little effect since these fields
will be masked out by definition. However, it leaks kernel memory to
userspace, which is a potential security vulnerability. It is also
possible that other code paths could look at the masked key and get
uninitialized data, although this does not currently appear to be an
issue in practice.

This removes the mask optimization for flows that are being installed.
This was always intended to be the case as the mask optimizations were
really targeting per-packet flow operations.
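
The difference between the two modes can be sketched in userspace C. This is not the OVS code: struct flow_key, struct flow_mask and flow_mask_key() are simplified, hypothetical stand-ins that index by word instead of by byte offset. The point is only that the partial variant leaves whatever was in 'dst' outside the mask range, while the full variant writes every word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KEY_WORDS 8

struct flow_key {
	long w[KEY_WORDS];
};

struct flow_mask {
	size_t start;		/* first word covered by the mask */
	size_t end;		/* one past the last covered word */
	struct flow_key key;	/* mask bits, zero outside [start, end) */
};

/*
 * full == false: only words inside the mask range are written, the
 *                rest of 'dst' keeps its previous (stale) contents.
 * full == true:  every word is written; words outside the range end
 *                up zero because the mask is zero there.
 */
static void flow_mask_key(struct flow_key *dst, const struct flow_key *src,
			  bool full, const struct flow_mask *mask)
{
	size_t first = full ? 0 : mask->start;
	size_t last = full ? KEY_WORDS : mask->end;
	size_t i;

	for (i = first; i < last; i++)
		dst->w[i] = src->w[i] & mask->key.w[i];
}
```

A key produced by the full variant can be copied out verbatim without exposing stale memory, at the cost of touching the whole structure once per installed flow.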

Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
Signed-off-by: Jesse Gross 
---
v2: This uses a different strategy from v1 since I realized that there
are other code paths that could theoretically hit the same problem.
Depending on the situation, this might also be faster too since it zeros
less memory.
---
 net/openvswitch/datapath.c   |  4 ++--
 net/openvswitch/flow_table.c | 23 ---
 net/openvswitch/flow_table.h |  2 +-
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 6fbd2de..b816ff8 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -952,7 +952,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
if (error)
goto err_kfree_flow;
 
-   ovs_flow_mask_key(_flow->key, , );
+   ovs_flow_mask_key(_flow->key, , true, );
 
/* Extract flow identifier. */
error = ovs_nla_get_identifier(_flow->id, a[OVS_FLOW_ATTR_UFID],
@@ -1080,7 +1080,7 @@ static struct sw_flow_actions *get_flow_actions(struct 
net *net,
struct sw_flow_key masked_key;
int error;
 
-   ovs_flow_mask_key(_key, key, mask);
+   ovs_flow_mask_key(_key, key, true, mask);
error = ovs_nla_copy_actions(net, a, _key, , log);
if (error) {
OVS_NLERR(log,
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index d22d8e9..f2ea83b 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -57,20 +57,21 @@ static u16 range_n_bytes(const struct sw_flow_key_range 
*range)
 }
 
 void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
-  const struct sw_flow_mask *mask)
+  bool full, const struct sw_flow_mask *mask)
 {
-   const long *m = (const long *)((const u8 *)&mask->key +
-   mask->range.start);
-   const long *s = (const long *)((const u8 *)src +
-   mask->range.start);
-   long *d = (long *)((u8 *)dst + mask->range.start);
+   int start = full ? 0 : mask->range.start;
+   int len = full ? sizeof *dst : range_n_bytes(&mask->range);
+   const long *m = (const long *)((const u8 *)&mask->key + start);
+   const long *s = (const long *)((const u8 *)src + start);
+   long *d = (long *)((u8 *)dst + start);
int i;
 
-   /* The memory outside of the 'mask->range' are not set since
-* further operations on 'dst' only uses contents within
-* 'mask->range'.
+   /* If 'full' is true then all of 'dst' is fully initialized. Otherwise,
+* if 'full' is false the memory outside of the 'mask->range' is left
+* uninitialized. This can be used as an optimization when further
+* operations on 'dst' only use contents within 'mask->range'.
 */
-   for (i = 0; i < range_n_bytes(&mask->range); i += sizeof(long))
+   for (i = 0; i < len; i += sizeof(long))
*d++ = *s++ & *m++;
 }
 
@@ -475,7 +476,7 @@ static struct sw_flow *masked_flow_lookup(struct 
table_instance *ti,
u32 hash;
struct sw_flow_key masked_key;
 
-   ovs_flow_mask_key(&masked_key, unmasked, mask);
+   ovs_flow_mask_key(&masked_key, unmasked, false, mask);
hash = flow_hash(&masked_key, &mask->range);
head = find_bucket(ti, hash);
hlist_for_each_entry_rcu(flow, head, flow_table.node[ti->node_ver]) {
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index 616eda1..2dd9900 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -86,5 +86,5 @@ struct sw_flow *ovs_flow_tbl_lookup_ufid(struct flow_table *,
 bool ovs_flow_cmp(const struct sw_flow *, const 

Re: [PATCH net] net: Handle negative checksum offset in skb-checksum-help

2015-09-21 Thread Eric Dumazet
On Mon, 2015-09-21 at 19:49 -0700, Pravin Shelar wrote:
> On Mon, Sep 21, 2015 at 7:14 PM, Eric Dumazet  wrote:
> > On Mon, 2015-09-21 at 18:04 -0700, Pravin Shelar wrote:
> >> On Mon, Sep 21, 2015 at 5:14 PM, David Miller  wrote:
> >> > From: Pravin B Shelar 
> >> > Date: Sun, 20 Sep 2015 23:53:17 -0700
> >> >
> >> >> VXLAN device can receive skb with checksum partial. But the checksum
> >> >> offset could be in outer header which is pulled on receive.
> >> >
> >> > Such a scenario is a bug.
> >> >
> >> > Anything that pulls off a header should use a utility function such
> >> > as skb_pull_rcsum() or skb_postpull_rcsum() to make sure this gets
> >> > fixed up properly.
> >>
> >> skb_postpull_rcsum() does not change checksum-offset. vxlan receive
> >> already calls this function.
> >
> > Then the bug is here.
> >
> > Otherwise we might have to 'fix' other places.
> >
> I posted a patch to fix skb_postpull_rcsum() to handle this case. But
> that was not accepted.
> https://patchwork.ozlabs.org/patch/512625/
> 
> And specific solution for skb_checksum_help() was suggested.
> 
> http://marc.info/?l=linux-netdev&m=144108078931774&w=2

If we pull a header where the csum is, then for sure CHECKSUM_PARTIAL
becomes buggy and void.

Tom was not advocating doing an operation (skb_postpull_rcsum()) leaving
skb in a wrong state.

We should fix callers that are pulling header in such a way.
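
A small userspace model shows why the state becomes invalid. struct fake_skb and the helpers here are hypothetical and stripped down. The one detail taken from the kernel is that csum_start is an offset from the buffer head, so the checksum start offset relative to the current data pointer is csum_start minus the headroom, as skb_checksum_start_offset() computes it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down model of the sk_buff fields involved. */
struct fake_skb {
	unsigned char *head;	/* start of the buffer */
	unsigned char *data;	/* start of the current header */
	size_t csum_start;	/* offset from head where checksumming starts */
};

static long headroom(const struct fake_skb *skb)
{
	return (long)(skb->data - skb->head);
}

/* Same arithmetic as skb_checksum_start_offset(): relative to data. */
static long checksum_start_offset(const struct fake_skb *skb)
{
	return (long)skb->csum_start - headroom(skb);
}

/* Pull 'len' bytes of header, as a receive path would. */
static void pull(struct fake_skb *skb, size_t len)
{
	skb->data += len;
}

/* A partial checksum is only meaningful while the offset is >= 0. */
static int partial_csum_still_valid(const struct fake_skb *skb)
{
	return checksum_start_offset(skb) >= 0;
}
```

Once the pull moves data past csum_start the offset goes negative, so a caller that pulls that far has to resolve or drop the partial-checksum state itself rather than leave it for skb_checksum_help() to trip over.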




[PATCH v2] netlink: Replace rhash_portid with bound

2015-09-21 Thread Herbert Xu
On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote:
>
> store_release and load_acquire are different from the usual memory
> barriers and can't be paired this way.  You have to pair store_release
> and load_acquire.  Besides, it isn't a particularly good idea to

OK I've decided to drop the acquire/release helpers as they don't
help us at all and simply pessimise the code by using full memory
barriers (on some architectures) where only a write or read barrier
is needed.

> depend on memory barriers embedded in other data structures like the
> above.  Here, especially, rhashtable_insert() would have write barrier
> *before* the entry is hashed not necessarily *after*, which means that
> in the above case, a socket which appears to have set bound to a
> reader might not be visible when the reader tries to look up the socket
> on the hashtable.

But you are right we do need an explicit write barrier here to
ensure that the hashing is visible.

> There's no reason to be overly smart here.  This isn't a crazy hot
> path, write barriers tend to be very cheap, store_release more so.
> Please just do smp_store_release() and note what it's paired with.

It's not about being overly smart.  It's about actually understanding
what's going on with the code.  I've seen too many instances of
people simply sprinkling synchronisation primitives around without
any knowledge of what is happening underneath, which is just a recipe
for creating hard-to-debug races.

> > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct 
> > sockaddr *addr,
> > }
> > }
> >  
> > -   if (!nlk->portid) {
> > +   if (!nlk->bound) {
> 
> I don't think you can skip load_acquire here just because this is the
> second deref of the variable.  That doesn't change anything.  Race
> condition could still happen between the first and second tests and
> skipping the second would lead to the same kind of bug.

The reason this one is OK is because we do not use nlk->portid or
try to get nlk from the hash table before we return to user-space.

However, there is a real bug here that none of these acquire/release
helpers discovered.  The two bound tests here used to be a single
one.  Now that they are separate it is entirely possible for another
thread to come in the middle and bind the socket.  So we need to
repeat the portid check in order to maintain consistency.

> > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, 
> > struct sockaddr *addr,
> > !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
> > return -EPERM;
> >  
> > -   if (!nlk->portid)
> > +   if (!nlk->bound)
> 
> Don't we need load_acquire here too?  Is this path holding a lock
> which makes that unnecessary?

Ditto.

---8<---
The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink:
Fix autobind race condition that leads to zero port ID") created
some new races that can occur due to inconsistencies between the
two port IDs.

Tejun is right that a barrier is unavoidable.  Therefore I am
reverting to the original patch that used a boolean to indicate
that a user netlink socket has been bound.

Barriers have been added where necessary to ensure that a valid
portid and the hashed socket are visible.

I have also changed netlink_insert to only return EBUSY if the
socket is bound to a portid different to the requested one.  This
combined with only reading nlk->bound once in netlink_bind fixes
a race where two threads that bind the socket at the same time
with different port IDs may both succeed.
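
The ordering this relies on can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel code: struct fake_sock and both helpers are hypothetical, the release and acquire fences stand in for the kernel's smp_wmb() and smp_rmb(), the hashing step is only a comment, and writers are assumed to be serialised by a per-socket lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant netlink_sock fields. */
struct fake_sock {
	uint32_t portid;
	atomic_bool bound;
};

/* Writer side; the caller is assumed to hold the socket lock. */
static int fake_insert(struct fake_sock *sk, uint32_t portid)
{
	/* Already bound: succeed only if it is the same port ID. */
	if (atomic_load_explicit(&sk->bound, memory_order_relaxed))
		return sk->portid == portid ? 0 : -EBUSY;

	sk->portid = portid;
	/* ... the socket would be inserted into the hash table here ... */

	/* Publish portid and the hashing before the bound flag. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&sk->bound, true, memory_order_relaxed);
	return 0;
}

/* Reader side: portid may only be trusted after bound was seen set. */
static bool fake_get_portid(struct fake_sock *sk, uint32_t *portid)
{
	if (!atomic_load_explicit(&sk->bound, memory_order_relaxed))
		return false;

	/* Pairs with the release fence in fake_insert(). */
	atomic_thread_fence(memory_order_acquire);
	*portid = sk->portid;
	return true;
}
```

A reader that sees bound set is therefore guaranteed to see the final portid, and a second bind attempt with a different port ID fails instead of silently succeeding.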

Fixes: 1f770c0a09da ("netlink: Fix autobind race condition that leads to zero 
port ID")
Reported-by: Tejun Heo 
Reported-by: Linus Torvalds 
Signed-off-by: Herbert Xu 

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 303efb7..2c15fae 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1015,7 +1015,7 @@ static inline int netlink_compare(struct 
rhashtable_compare_arg *arg,
const struct netlink_compare_arg *x = arg->key;
const struct netlink_sock *nlk = ptr;
 
-   return nlk->rhash_portid != x->portid ||
+   return nlk->portid != x->portid ||
   !net_eq(sock_net(&nlk->sk), read_pnet(&x->pnet));
 }
 
@@ -1041,7 +1041,7 @@ static int __netlink_insert(struct netlink_table *table, 
struct sock *sk)
 {
struct netlink_compare_arg arg;
 
-   netlink_compare_arg_init(, sock_net(sk), nlk_sk(sk)->rhash_portid);
+   netlink_compare_arg_init(, sock_net(sk), nlk_sk(sk)->portid);
return rhashtable_lookup_insert_key(>hash, ,
_sk(sk)->node,
netlink_rhashtable_params);
@@ -1094,8 +1094,8 @@ static int netlink_insert(struct sock *sk, u32 portid)
 
lock_sock(sk);
 
-   err = -EBUSY;
-   if (nlk_sk(sk)->portid)
+   err = nlk_sk(sk)->portid == portid ? 0 : -EBUSY;
+

Re: [RFC PATCH 2/3] net: macb: Add support for 1588 for Zynq Ultrascale+ MPSoC

2015-09-21 Thread Harini Katakam
Hi Richard,

On Tue, Sep 22, 2015 at 12:09 AM, Richard Cochran
 wrote:
> On Mon, Sep 21, 2015 at 11:19:32PM +0530, Harini Katakam wrote:
>> Ping
>
> 1) trim your replies
>
> 2) put the PTP maintainer on PTP patches for review
>

I'm sorry I missed that. Will do so in the future.

Regards,
Harini


Re: [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter

2015-09-21 Thread Alexander Duyck

On 09/21/2015 09:14 PM, Jarod Wilson wrote:

Alexander Duyck wrote:

On 09/21/2015 10:11 AM, Jarod Wilson wrote:

Some pci changes upcoming in 4.3 seem to cause additional disconnects,
which can happen at unfortuitous times for igb, leading to issues such as
this, where the disconnect happened just before igb_configure_tx_ring():

[ 414.440115] igb :15:00.0: enabling device ( -> 0002)
[ 414.474934] pps pps0: new PPS source ptp1
[ 414.474937] igb :15:00.0: added PHC on eth0
[ 414.474938] igb :15:00.0: Intel(R) Gigabit Ethernet Network
Connection
[ 414.474940] igb :15:00.0: eth0: (PCIe:2.5Gb/s:Width x1)
e8:ea:6a:00:1b:2a
[ 414.475072] igb :15:00.0: eth0: PBA No: 000200-000
[ 414.475073] igb :15:00.0: Using MSI-X interrupts. 4 rx queue(s),
4 tx queue(s)
[ 414.478453] igb :15:00.0 enp21s0: renamed from eth0
[ 414.497747] IPv6: ADDRCONF(NETDEV_UP): enp21s0: link is not ready
[ 414.536745] igb :15:00.0 enp21s0: PCIe link lost, device now
detached
[ 414.854808] BUG: unable to handle kernel paging request at
3818
[ 414.854827] IP: []
igb_configure_tx_ring+0x14c/0x250 [igb]
[ 414.854846] PGD 0
[ 414.854849] Oops: 0002 [#1] SMP
[ 414.854856] Modules linked in: firewire_ohci firewire_core crc_itu_t
igb dca ctr ccm arc4 iwlmvm mac80211 fuse xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute
bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
iptable_security iptable_raw iptable_filter bnep dm_mirror
dm_region_hash dm_log dm_mod snd_hda_codec_hdmi coretemp
x86_pkg_temp_thermal intel_powerclamp kvm_intel iTCO_wdt ppdev kvm
iTCO_vendor_support hp_wmi sparse_keymap crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel
[ 414.855073] drbg ansi_cprng snd_hda_codec_realtek
snd_hda_codec_generic aesni_intel aes_x86_64 lrw gf128mul glue_helper
ablk_helper cryptd snd_hda_intel snd_hda_codec microcode snd_hda_core
snd_hwdep snd_seq snd_seq_device snd_pcm iwlwifi uvcvideo btusb
cfg80211 videobuf2_vmalloc videobuf2_memops btrtl btbcm videobuf2_core
btintel bluetooth v4l2_common snd_timer videodev snd parport_pc
rtsx_pci_ms joydev pcspkr input_leds i2c_i801 media sg memstick rfkill
soundcore lpc_ich 8250_fintek parport mei_me hp_accel ie31200_edac
shpchp lis3lv02d mei edac_core input_polldev hp_wireless tpm_infineon
sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs
libcrc32c sr_mod sd_mod cdrom rtsx_pci_sdmmc mmc_core crc32c_intel
serio_raw rtsx_pci nouveau mxm_wmi ahci hwmon libahci e1000e
drm_kms_helper
[ 414.855309] ptp xhci_pci pps_core ttm xhci_hcd wmi video ipv6 autofs4
[ 414.855331] CPU: 2 PID: 875 Comm: NetworkManager Not tainted
4.2.0-5.el7_UNSUPPORTED.x86_64 #1
[ 414.855348] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS
M70 Ver. 01.07 02/26/2015
[ 414.855365] task: 880484698c00 ti: 88005859c000 task.ti:
88005859c000
[ 414.855380] RIP: 0010:[] []
igb_configure_tx_ring+0x14c/0x250 [igb]
[ 414.855401] RSP: 0018:88005859f608 EFLAGS: 00010246
[ 414.855410] RAX: 3818 RBX:  RCX:
3818
[ 414.855424] RDX:  RSI: 0008 RDI:
002a9fe6
[ 414.855437] RBP: 88005859f638 R08: 03030300 R09:
ffe7
[ 414.855451] R10: 81fa91b4 R11: 07e3 R12:

[ 414.855464] R13: 880471c98840 R14: 8804670a1180 R15:
000483cce000
[ 414.855478] FS: 7f389c6fb8c0() GS:88049dc8()
knlGS:
[ 414.855493] CS: 0010 DS:  ES:  CR0: 80050033
[ 414.855504] CR2: 3818 CR3: 0004875da000 CR4:
001406e0
[ 414.855518] Stack:
[ 414.855520] 88005859f638 880471c98840 880471c98df8
0001
[ 414.855538] 880471c98848 0001 88005859f698
a0b99cb0
[ 414.85] 88005859f678 59ab02179a7fe4d0 f3ce6b27ad46225f
f5454218094e72d1
[ 414.855572] Call Trace:
[ 414.855577] [] igb_configure+0x240/0x400 [igb]
[ 414.855590] [] __igb_open+0xc2/0x560 [igb]
[ 414.855602] [] ? notifier_call_chain+0x4d/0x80
[ 414.855614] [] igb_open+0x10/0x20 [igb]
[ 414.855625] [] __dev_open+0xb1/0x130
[ 414.855636] [] __dev_change_flags+0xa1/0x160
[ 414.855647] [] dev_change_flags+0x29/0x60
[ 414.855658] [] do_setlink+0x5d3/0xaa0
[ 414.855679] [] ? nla_parse+0xa3/0x100
[ 414.855689] [] rtnl_newlink+0x4f0/0x880
[ 414.855700] [] ? rtnl_newlink+0xf3/0x880
[ 414.855721] [] ? netlink_unicast+0x1ae/0x220
[ 414.855734] [] ? security_capable+0x48/0x60
[ 414.855746] [] ? ns_capable+0x2d/0x60
[ 414.855756] [] rtnetlink_rcv_msg+0x95/0x240
[ 414.855768] [] ? sock_has_perm+0x70/0x90
[ 414.855779] [] ? rtnetlink_rcv+0x40/0x40
[ 414.855789] [] 

[PATCH 13/17] net: gianfar: remove misuse of IRQF_NO_SUSPEND flag

2015-09-21 Thread Sudeep Holla
The device is set as wakeup capable using proper wakeup API but the
driver misuses IRQF_NO_SUSPEND to set the interrupt as wakeup source
which is incorrect.

This patch removes the use of IRQF_NO_SUSPEND flags replacing it with
enable_irq_wake instead.

Cc: "David S. Miller" 
Cc: Claudiu Manoil 
Cc: Kevin Hao 
Cc: netdev@vger.kernel.org
Signed-off-by: Sudeep Holla 
---
 drivers/net/ethernet/freescale/gianfar.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c 
b/drivers/net/ethernet/freescale/gianfar.c
index 4b69d061d90f..803ed4c93503 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -1970,8 +1970,7 @@ static int register_grp_irqs(struct gfar_priv_grp *grp)
/* Install our interrupt handlers for Error,
 * Transmit, and Receive
 */
-   err = request_irq(gfar_irq(grp, ER)->irq, gfar_error,
- IRQF_NO_SUSPEND,
+   err = request_irq(gfar_irq(grp, ER)->irq, gfar_error, 0,
  gfar_irq(grp, ER)->name, grp);
if (err < 0) {
netif_err(priv, intr, dev, "Can't get IRQ %d\n",
@@ -1979,6 +1978,8 @@ static int register_grp_irqs(struct gfar_priv_grp *grp)
 
goto err_irq_fail;
}
+   enable_irq_wake(gfar_irq(grp, ER)->irq);
+
err = request_irq(gfar_irq(grp, TX)->irq, gfar_transmit, 0,
  gfar_irq(grp, TX)->name, grp);
if (err < 0) {
@@ -1994,14 +1995,14 @@ static int register_grp_irqs(struct gfar_priv_grp *grp)
goto rx_irq_fail;
}
} else {
-   err = request_irq(gfar_irq(grp, TX)->irq, gfar_interrupt,
- IRQF_NO_SUSPEND,
+   err = request_irq(gfar_irq(grp, TX)->irq, gfar_interrupt, 0,
  gfar_irq(grp, TX)->name, grp);
if (err < 0) {
netif_err(priv, intr, dev, "Can't get IRQ %d\n",
  gfar_irq(grp, TX)->irq);
goto err_irq_fail;
}
+   enable_irq_wake(gfar_irq(grp, TX)->irq);
}
 
return 0;
-- 
1.9.1



Re: [PATCH 00/15] RDS: connection scalability and performance improvements

2015-09-21 Thread santosh shilimkar

On 9/20/2015 1:37 AM, Sagi Grimberg wrote:

On 9/20/2015 2:04 AM, Santosh Shilimkar wrote:

This series addresses RDS connection bottlenecks on massive workloads and
improves the RDMA performance almost by 3X. RDS TCP also gets a small gain
of about 12%.

RDS is being used in massive systems with high scalability where several
hundred thousand end points and tens of thousands of local processes
are operating in tens of thousand sockets. Being RC (reliable connection),
socket bind and release happens very often and any inefficiencies in
bind hash look ups hurts the overall system performance. The RDS bind
hash-table uses a global spin-lock, which is the biggest bottleneck. To make
matters worse, it uses rcu inside the global lock for hash buckets.
This is being addressed by simply using a per-bucket rw lock, which makes
the locking simple and very efficient. The hash table size is also scaled up
accordingly.

For RDS RDMA improvement, the completion handling is revamped so that we
can do batch completions. Both send and receive completion handlers are
split logically to achieve the same. RDS 8K messages being one of the
key usecase, mr pool is adapted to have the 8K mrs along with default 1M
mrs. And while doing this, few fixes and couple of bottlenecks seen with
rds_sendmsg() are addressed.


Hi Santosh,

I think you can get a more effective code review if you CC the
Linux-rdma mailing list.


I will do that from next time. Thanks Sagi !!

Regards,
Santosh


[PATCH 2/7] 8139cp: Do not re-enable RX interrupts in cp_tx_timeout()

2015-09-21 Thread David Woodhouse
From: David Woodhouse 

If an RX interrupt was already received but NAPI has not yet run when
the RX timeout happens, we end up in cp_tx_timeout() with RX interrupts
already disabled. Blindly re-enabling them will cause an IRQ storm.

This is somewhat less painful than it was a few minutes ago before I
fixed the return value from cp_interrupt(), but still suboptimal.

Unconditionally leave RX interrupts disabled after the reset, and
schedule NAPI to check the receive ring and re-enable them.

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index f1054ad..d12fc50 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -1269,9 +1269,10 @@ static void cp_tx_timeout(struct net_device *dev)
rc = cp_init_rings(cp);
cp_start_hw(cp);
__cp_set_rx_mode(dev);
-   cp_enable_irq(cp);
+   cpw16_f(IntrMask, cp_norx_intr_mask);
 
netif_wake_queue(dev);
+   napi_schedule_irqoff(>napi);
 
spin_unlock_irqrestore(>lock, flags);
 }
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation




[PATCH 1/7] 8139cp: Improve accuracy of cp_interrupt() return, to survive IRQ storms

2015-09-21 Thread David Woodhouse
From: David Woodhouse 

The TX timeout handling has been observed to trigger RX IRQ storms. And
since cp_interrupt() just keeps saying that it handled the interrupt,
the machine then dies. Fix the return value from cp_interrupt(), and
the offending IRQ gets disabled and the machine survives.
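
The bookkeeping this implies can be modelled in plain C. The sketch is hypothetical: the status bit values are arbitrary rather than the real register layout, and model_interrupt() only mirrors the idea that the handler should claim the interrupt when it actually did some work.

```c
#include <assert.h>

/* Illustrative status bits; values are arbitrary, not the chip's layout. */
enum {
	ST_RX      = 1 << 0,
	ST_TX      = 1 << 1,
	ST_LINKCHG = 1 << 2,
	ST_PCIERR  = 1 << 3,
};

/*
 * Hypothetical model of the handler's return value.
 * napi_free: NAPI could be scheduled (false if a poll is already pending).
 * tx_done:   number of descriptors the TX cleanup actually reclaimed.
 * Returns 1 (think IRQ_HANDLED) or 0 (think IRQ_NONE).
 */
static int model_interrupt(unsigned int status, int napi_free, int tx_done)
{
	int handled = 0;

	if (!status)
		return 0;

	/* RX only counts if we really handed the work to NAPI. */
	if ((status & ST_RX) && napi_free)
		handled = 1;

	/* TX only counts if at least one descriptor was completed. */
	if ((status & ST_TX) && tx_done > 0)
		handled = 1;

	/* Link changes and bus errors are always acted upon. */
	if (status & (ST_LINKCHG | ST_PCIERR))
		handled = 1;

	return handled;
}
```

During a storm the RX bit stays set while NAPI is already pending, so the model returns 0 every time, which is what lets the core notice the unhandled interrupts and disable the line.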

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index ba3dab7..f1054ad 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -371,7 +371,7 @@ struct cp_private {
 
 
 static void __cp_set_rx_mode (struct net_device *dev);
-static void cp_tx (struct cp_private *cp);
+static int cp_tx (struct cp_private *cp);
 static void cp_clean_rings (struct cp_private *cp);
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void cp_poll_controller(struct net_device *dev);
@@ -587,8 +587,6 @@ static irqreturn_t cp_interrupt (int irq, void 
*dev_instance)
if (!status || (status == 0x))
goto out_unlock;
 
-   handled = 1;
-
netif_dbg(cp, intr, dev, "intr, status %04x cmd %02x cpcmd %04x\n",
  status, cpr8(Cmd), cpr16(CpCmd));
 
@@ -596,25 +594,30 @@ static irqreturn_t cp_interrupt (int irq, void 
*dev_instance)
 
/* close possible race's with dev_close */
if (unlikely(!netif_running(dev))) {
+   handled = 1;
cpw16(IntrMask, 0);
goto out_unlock;
}
 
-   if (status & (RxOK | RxErr | RxEmpty | RxFIFOOvr))
+   if (status & (RxOK | RxErr | RxEmpty | RxFIFOOvr)) {
if (napi_schedule_prep(>napi)) {
+   handled = 1;
cpw16_f(IntrMask, cp_norx_intr_mask);
__napi_schedule(>napi);
}
-
+   }
if (status & (TxOK | TxErr | TxEmpty | SWInt))
-   cp_tx(cp);
-   if (status & LinkChg)
-   mii_check_media(>mii_if, netif_msg_link(cp), false);
+   handled |= cp_tx(cp);
 
+   if (status & LinkChg) {
+   handled = 1;
+   mii_check_media(>mii_if, netif_msg_link(cp), false);
+   }
 
if (status & PciErr) {
u16 pci_status;
 
+   handled = 1;
pci_read_config_word(cp->pdev, PCI_STATUS, _status);
pci_write_config_word(cp->pdev, PCI_STATUS, pci_status);
netdev_err(dev, "PCI bus error, status=%04x, PCI status=%04x\n",
@@ -645,11 +648,12 @@ static void cp_poll_controller(struct net_device *dev)
 }
 #endif
 
-static void cp_tx (struct cp_private *cp)
+static int cp_tx (struct cp_private *cp)
 {
unsigned tx_head = cp->tx_head;
unsigned tx_tail = cp->tx_tail;
unsigned bytes_compl = 0, pkts_compl = 0;
+   int handled = 0;
 
while (tx_tail != tx_head) {
struct cp_desc *txd = cp->tx_ring + tx_tail;
@@ -661,6 +665,7 @@ static void cp_tx (struct cp_private *cp)
if (status & DescOwn)
break;
 
+   handled = 1;
skb = cp->tx_skb[tx_tail];
BUG_ON(!skb);
 
@@ -704,6 +709,8 @@ static void cp_tx (struct cp_private *cp)
netdev_completed_queue(cp->dev, pkts_compl, bytes_compl);
if (TX_BUFFS_AVAIL(cp) > (MAX_SKB_FRAGS + 1))
netif_wake_queue(cp->dev);
+
+   return handled;
 }
 
 static inline u32 cp_tx_vlan_tag(struct sk_buff *skb)
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation





Re: [PATCH 2/2] 8139cp: Call __cp_set_rx_mode() from cp_tx_timeout()

2015-09-21 Thread David Woodhouse
On Sun, 2015-09-20 at 22:24 -0700, David Miller wrote:
> From: David Woodhouse 
> Date: Fri, 18 Sep 2015 00:21:54 +0100
> 
> > Unless we reset the RX config, on real hardware I don't seem to
> receive
> > any packets after a TX timeout.
> > 
> > Signed-off-by: David Woodhouse 
> 
> Applied.

Thanks. I'll send another batch, including the original patches 3/2 and
4/3 from this series, in reply to this message.

After which, I think we might be able to turn on TX checksumming by
default and I also have a way to implement early detection of the TX
stall I've been seeing.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation





Re: [PATCH] lib: fix data race in rhashtable_rehash_one

2015-09-21 Thread Dmitry Vyukov
On Mon, Sep 21, 2015 at 4:51 PM, Eric Dumazet  wrote:
> On Mon, 2015-09-21 at 06:31 -0700, Eric Dumazet wrote:
>> On Mon, 2015-09-21 at 10:08 +0200, Dmitry Vyukov wrote:
>> > rhashtable_rehash_one() uses plain writes to update entry->next,
>> > while it is being concurrently accessed by readers.
>> > Unfortunately, the compiler is within its rights to (for example) use
>> > byte-at-a-time writes to update the pointer, which would fatally confuse
>> > concurrent readers.
>> >
>> This is bogus.
>>
>> 1) Linux is certainly not working if some arch or compiler is not doing
>> single word writes. WRITE_ONCE() would not help at all to enforce this.
>>
>> 2) If  new node is not yet visible, we don't care if we write
>> entry->next using any kind of operation.
>>
>> So the WRITE_ONCE() is not needed at all.
>>
>>
>>
>> > +   WRITE_ONCE(entry->next, head);
>>
>>
>> The rcu_assign_pointer() immediately following is enough in this case.
>>
>> We have hundred of similar cases in the kernel.
>>
>>
>
> The changelog and comment are totally confusing.
>
> Please remove the bogus parts in them, and/or rephrase.
>
> The important part here is that we rehash an item, so we need to make
> sure to maintain consistent ->next field, and need to prevent compiler
> from using ->next as a temporary variable.
>
> ptr->next = 1UL | ((base + offset) << 1);
>
> Is dangerous because compiler could issue :
>
> ptr->next = (base + offset);
>
> ptr->next <<= 1;
>
> ptr->next += 1UL;
>
> Frankly, all this looks like an oversight in this code.
>
> Not sure why the NULLS value is even recomputed.

I have not looked in detail yet, but the NULLS recomputation uses
new_hash, which obviously wasn't available when the value was
previously computed. Don't know yet whether it is important or not.



>
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index cc0c69710dcf..0a29f07ba45a 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -187,10 +187,7 @@ static int rhashtable_rehash_one(struct rhashtable *ht, 
> unsigned int old_hash)
> head = rht_dereference_bucket(new_tbl->buckets[new_hash],
>   new_tbl, new_hash);
>
> -   if (rht_is_a_nulls(head))
> -   INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
> -   else
> -   RCU_INIT_POINTER(entry->next, head);
> +   RCU_INIT_POINTER(entry->next, head);
>
> rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
> spin_unlock(new_bucket_lock);
>
>



-- 
Dmitry Vyukov, Software Engineer, dvyu...@google.com
Google Germany GmbH, Dienerstraße 12, 80331, München

