Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-23 Thread Jiri Pirko
Fri, Jan 22, 2016 at 09:59:12PM CET, jay.vosbu...@canonical.com wrote:
>Jarod Wilson  wrote:
>
>>The network core tries to keep track of dropped packets, but some packets
>>you wouldn't really call dropped, so much as intentionally ignored, under
>>certain circumstances. One such case is that of bonding and team device
>>slaves that are currently inactive. Their respective rx_handler functions
>>return RX_HANDLER_EXACT (the only places in the kernel that return that),
>>which ends up tracking into the network core's __netif_receive_skb_core()
>>function's drop path, with no pt_prev set. On a noisy network, this can
>>result in a very rapidly incrementing rx_dropped counter, not only on the
>>inactive slave(s), but also on the master device, such as the following:
>[...]
>>In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
>>active-backup bond0, and you can see that all three have high drop counts,
>>with the master bond0 showing a tally of all three.
>>
>>I know that this was previously discussed some here:
>>
>>http://www.spinics.net/lists/netdev/msg226341.html
>>
>>It seems additional counters never came to fruition, but honestly, for
>>this particular case, I'm not even sure they're warranted, I'd be inclined
>>to say just silently drop these packets without incrementing a counter. At
>>least, that's probably what would make someone who has complained loudly
>>about this issue happy, as they have monitoring tools that are squaking
>>loudly at any increments to rx_dropped.

In this case, it is delivered with exact delivery according to per-dev
registered callback. We just have to avoid it gets to bond. So this case
is not "to drop", but rather "to block skb to don't get where it does
not belong".

>
>   I don't think the kernel should silently drop packets; there
>should be a counter somewhere.  If a packet is being thrown away
>deliberately, it should not just vanish into the screaming void of
>space.  Someday someone will try and track down where that packet is
>being dropped.
>
>   I've had that same conversation with customers who insist on
>accounting for every packet drop (from the "any drop is an error"
>mindset), so I understand the issue.
>
>   Thinking about the prior discussion, the rx_drop_inactive is
>still a good idea, but I'd actually today get good use from a
>"rx_drop_unforwardable" (or an equivalent but shorter name) counter that
>counts every time a packet is dropped due to is_skb_forwardable()
>returning false.  __dev_forward_skb does this (and hits rx_dropped), as
>does the bridge (and does not count it).
>
>   -J
>
>>CC: "David S. Miller" 
>>CC: Eric Dumazet 
>>CC: Jiri Pirko 
>>CC: Daniel Borkmann 
>>CC: Tom Herbert 
>>CC: Jay Vosburgh 
>>CC: Veaceslav Falico 
>>CC: Andy Gospodarek 
>>CC: netdev@vger.kernel.org
>>Signed-off-by: Jarod Wilson 
>>---
>> net/core/dev.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>>diff --git a/net/core/dev.c b/net/core/dev.c
>>index 8cba3d8..1354c7b 100644
>>--- a/net/core/dev.c
>>+++ b/net/core/dev.c
>>@@ -4153,8 +4153,11 @@ ncls:
>>  else
>>  ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>>  } else {
>>+ if (deliver_exact)
>>+ goto inactive; /* bond or team inactive slave */
>> drop:
>>  atomic_long_inc(>dev->rx_dropped);
>>+inactive:
>>  kfree_skb(skb);
>>  /* Jamal, now you will not able to escape explaining
>>   * me how you were going to use this. :-)
>>-- 
>>1.8.3.1
>>
>
>---
>   -Jay Vosburgh, jay.vosbu...@canonical.com


[PATCH] net: dsa: fix mv88e6xxx switches

2016-01-23 Thread Russell King
Since commit 76e398a62712 ("net: dsa: use switchdev obj for VLAN add/del
ops"), the Marvell 88E6xxx switch has been unable to pass traffic
between ports - any received traffic is discarded by the switch.
Taking a port out of bridge mode and configuring a vlan on it also the
port to start passing traffic.

With the debugfs files re-instated to allow debug of this issue by
comparing the register settings between the working and non-working
case, the reason becomes clear:

 GLOBAL GLOBAL2 SERDES   0123456
- 7:  707f2001 2222202
+ 7:  707f2001 1111101

Register 7 for the ports is the default vlan tag register, and in the
non-working setup, it has been set to 2, despite vlan 2 not being
configured.  This causes the switch to drop all packets coming in to
these ports.  The working setup has the default vlan tag register set
to 1, which is the default vlan when none is configured.

Inspection of the code reveals why.  The code prior to this commit
was:

-   for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
...
-   if (!err && vlan->flags & BRIDGE_VLAN_INFO_PVID)
-   err = ds->drv->port_pvid_set(ds, p->port, vid);

but the new code is:

+   for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
...
+   }
...
+   if (pvid)
+   err = _mv88e6xxx_port_pvid_set(ds, port, vid);

This causes the new code to always set the default vlan to one higher
than the old code.

Fix this.

Fixes: 76e398a62712 ("net: dsa: use switchdev obj for VLAN add/del ops")
Cc: 
Signed-off-by: Russell King 
---
As this is a regression on 4.4 compared to 4.3, I've added the stable
tag.

 drivers/net/dsa/mv88e6xxx.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index b06dba05594a..bf622675397d 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1515,11 +1515,12 @@ int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int 
port,
err = _mv88e6xxx_port_vlan_add(ds, port, vid, untagged);
if (err)
goto unlock;
+
+   /* no PVID with ranges, otherwise it's a bug */
+   if (pvid)
+   err = _mv88e6xxx_port_pvid_set(ds, port, vid);
}
 
-   /* no PVID with ranges, otherwise it's a bug */
-   if (pvid)
-   err = _mv88e6xxx_port_pvid_set(ds, port, vid);
 unlock:
mutex_unlock(>smi_mutex);
 
-- 
2.1.0



[PATCH] ipv4+ipv6: Make INET*_ESP select CRYPTO_ECHAINIV

2016-01-23 Thread Thomas Egerer
The ESP algorithms using CBC mode require echainiv. Hence INET*_ESP have
to select CRYPTO_ECHAINIV in order to work properly. This solves the
issues caused by a misconfiguration as described in [1].
The original approach, patching crypto/Kconfig was turned down by
Herbert Xu [2].

[1] https://lists.strongswan.org/pipermail/users/2015-December/009074.html
[2] http://marc.info/?l=linux-crypto-vger=145224655809562=2

Signed-off-by: Thomas Egerer 
---
 net/ipv4/Kconfig | 1 +
 net/ipv6/Kconfig | 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index c229205..7758247 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -353,6 +353,7 @@ config INET_ESP
select CRYPTO_CBC
select CRYPTO_SHA1
select CRYPTO_DES
+   select CRYPTO_ECHAINIV
---help---
  Support for IPsec ESP.
 
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index bb7dabe..40c8975 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -69,6 +69,7 @@ config INET6_ESP
select CRYPTO_CBC
select CRYPTO_SHA1
select CRYPTO_DES


Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-23 Thread Jiri Pirko
Fri, Jan 22, 2016 at 08:11:22PM CET, ja...@redhat.com wrote:
>The network core tries to keep track of dropped packets, but some packets
>you wouldn't really call dropped, so much as intentionally ignored, under
>certain circumstances. One such case is that of bonding and team device
>slaves that are currently inactive. Their respective rx_handler functions
>return RX_HANDLER_EXACT (the only places in the kernel that return that),
>which ends up tracking into the network core's __netif_receive_skb_core()
>function's drop path, with no pt_prev set. On a noisy network, this can
>result in a very rapidly incrementing rx_dropped counter, not only on the
>inactive slave(s), but also on the master device, such as the following:
>
>Inter-|   Receive|  Transmit
> face |bytespackets errs drop fifo frame compressed multicast|bytes
> packets errs drop fifo colls carrier compressed
>  p7p1: 14783346  1404300 1404280 0  0  2040  680  
>  8000 0   0  0
>  p7p2: 14805198  140648000 0  0  20340
>0000 0   0  0
> bond0: 53365248  5327980 4211600 0  0115151 2040  
> 24000 0   0  0
>lo:5420  54000 0  0 0 5420 
>  54000 0   0  0
>  p5p1: 19292195  1961970 1403680 0  0 56564  680  
>  8000 0   0  0
>  p5p2: 19289707  1961710 1403640 0  0 56547  680  
>  8000 0   0  0
>   em3: 20996626  158214000 0  0   3830
>0000 0   0  0
>   em2: 14065122  138462000 0  0   3100
>0000 0   0  0
>   em1: 14063162  138440000 0  0   3080
>0000 0   0  0
>   em4: 21050830  158729000 0  0   38571662
>  469000 0   0  0
>   ib0:   0   0000 0  0 00 
>   0000 0   0  0
>
>In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
>active-backup bond0, and you can see that all three have high drop counts,
>with the master bond0 showing a tally of all three.
>
>I know that this was previously discussed some here:
>
>http://www.spinics.net/lists/netdev/msg226341.html
>
>It seems additional counters never came to fruition, but honestly, for
>this particular case, I'm not even sure they're warranted, I'd be inclined
>to say just silently drop these packets without incrementing a counter. At
>least, that's probably what would make someone who has complained loudly
>about this issue happy, as they have monitoring tools that are squaking
>loudly at any increments to rx_dropped.
>
>CC: "David S. Miller" 
>CC: Eric Dumazet 
>CC: Jiri Pirko 
>CC: Daniel Borkmann 
>CC: Tom Herbert 
>CC: Jay Vosburgh 
>CC: Veaceslav Falico 
>CC: Andy Gospodarek 
>CC: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson 

Acked-by: Jiri Pirko 

I think this should be considered as a bug and go to -net.


Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-23 Thread Eric Dumazet
On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote:

> ---
>  net/core/dev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8cba3d8..1354c7b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4153,8 +4153,11 @@ ncls:
>   else
>   ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>   } else {
> + if (deliver_exact)
> + goto inactive; /* bond or team inactive slave */
>  drop:
>   atomic_long_inc(>dev->rx_dropped);
> +inactive:
>   kfree_skb(skb);
>   /* Jamal, now you will not able to escape explaining
>* me how you were going to use this. :-)

Note that if you still have a kfree_skb() instead of consume_skb(),
some tools will still give you a wrong signal (packet dropped ...).

But then maybe the signal is telling some truth.

We receive a packet, and decide to drop it because no one was willing to
handle it.

Maybe someone wants to know a particular slave receives 10,000 such
frames per second and hurts performance with useless work.

We should at least increment some counter and maybe dump it with
"ethtool -S" or something.





Re: struct pid memory leak

2016-01-23 Thread Willy Tarreau
Hi Eric, Dmitry,

On Fri, Jan 22, 2016 at 08:50:01AM -0800, Eric Dumazet wrote:
> CC netdev, as it looks some af_unix issue ...
> 
> On Fri, 2016-01-22 at 16:08 +0100, Dmitry Vyukov wrote:
> > Hello,
> > 
> > The following program causes struct pid memory leak:
> > 
> > // autogenerated by syzkaller (http://github.com/google/syzkaller)
(...)
> > unreferenced object 0x8800324af200 (size 112):
> >   comm "syz-executor", pid 18413, jiffies 4295500287 (age 14.321s)
> >   hex dump (first 32 bytes):
> > 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> >   backtrace:
> > [] kmemleak_alloc+0x63/0xa0 mm/kmemleak.c:916
> > [< inline >] kmemleak_alloc_recursive 
> > include/linux/kmemleak.h:47
(...)
> > On commit 30f05309bde49295e02e45c7e615f73aa4e0ccc2 (Jan 20).

I can't reproduce this with the indicated commit. I'm unsure how/what
I'm supposed to see. Is a certain config needed ? I've enabled kmemleak
in my .config but there are too few information here to go further
unfortunately.

Regards,
Willy


Re: [PATCH] net: dsa: fix mv88e6xxx switches

2016-01-23 Thread Andrew Lunn
On Sat, Jan 23, 2016 at 07:48:57PM +, Russell King - ARM Linux wrote:
> On Sat, Jan 23, 2016 at 08:37:05PM +0100, Andrew Lunn wrote:
> > I'm testing on a 6172. But 6172 and 6176 are both in the same family
> > 6352, and share the same driver.
> 
> Hmm, can't be that then.
> 
> > So you initially have lan1 in an bridge. I don't.

Running tcpdump on the device i'm trying to ping, there are ARP
requests and replies. But the replies are never received by the
target, arp -a shows .

Looking at the stats counters in debugfs, the packets are counted in
in_unicast, but also sw_in_filtered.

Port 0 is lan0 and port 5 is the cpu port.

root@370-rd:/sys/kernel/debug/dsa0# cat regs
GLOBAL GLOBAL2 SERDES   0123456  
 0:  c874   01940  1d0f 1d0f 1d0f 1d0f 100f  e076 
 1:   fa0   0 149 33333 c03e3 
 2:   fa0 141 0000000 
 3: 0 ea1  1721 1721 1721 1721 1721 1721 1721 
 4:  6000 258 1e0   433  433  433  433  433 373f  433 
 5:  3000  ff   0 0000000 
 6:   fa01f0f   47f   7e   7d   7c   7b   7a   79 
 7:  3331707f2001 0  fa1  fa2  fa3  fa40  fa6 
 8:   3037800   0  2c80 2c80 2c80 2c80 2c80 2c80 2c80 
 9: 01600   0 1111111 
 a:   148   0   0 0000000 
 b:  20001000   0 1248   10 2000   40 
 c:   f0f  7f   0 0000000 
 d: 0 5f1   3 0000000 
 e: 0   0   0 0000000 
 f: 0 f00c000  dada dada dada dada dada dada dada 
10: 0   04005 0000000 
11: 0   08000 0000000 
12:     0   012000080 
13:     0   0 00000   220 
14:   400   0 0000000 
15:     0   0 0000000 
16:     0   1  700f 7002 7004 7008   33   330 
17:     0   0 0000000 
18:  fa4115f6   0  3210 3210 3210 3210 3210 3210 3210 
19: 0   0   0  7654 7654 7654 7654 7654 7654 7654 
1a:  5550   0  42 0000000 
1b:   1faf869   0  8000 8000 8000 8000 8000 8000 8000 
1c: 0   0   0 0000000 
1d:  5ce0   0   0 0000000 
1e: 0   0   0 0000000 
1f: 0   0   0 0000000 

Andrew


Re: [Xen-devel] [PATCH 1/3] xen-netback: fix license ident used in MODULE_LICENSE

2016-01-23 Thread Wei Liu
On Fri, Jan 22, 2016 at 08:25:21PM +, One Thousand Gnomes wrote:
> > The fact what include/linux/license.h:license_is_gpl_compatible includes
> > "Dual MIT/GPL" as an option seems to suggest that it is enough of a thing
> > to be validly used as the contents of a MODULE_LICENSE() thing.
> 
> Yes. The MIT licence most definitely exists, and people know what it
> means.
> 
> Also nobody should be changing the licence on anything unless they have
> the written permission of all rights holders on record, so it's best to
> leave it be 8)
> 

I knew from the beginning anything related to license will be fun. :-)

In this particular case, I don't think I need to get confirmation from
all rights holder because they've agreed to the licenses listed in
the comment. I'm merely fixing a bug in code.

I understand people have different opinion on how this should be
interpreted. And I'm not a lawyer. Let's just leave it be now and divert
our energy to more useful things in life.

Wei.

> Alan


Re: [PATCH] net: dsa: fix mv88e6xxx switches

2016-01-23 Thread Russell King - ARM Linux
On Sat, Jan 23, 2016 at 08:37:05PM +0100, Andrew Lunn wrote:
> I'm testing on a 6172. But 6172 and 6176 are both in the same family
> 6352, and share the same driver.

Hmm, can't be that then.

> So you initially have lan1 in an bridge. I don't.

Okay, I've disabled br0 in the debian network/interfaces and rebooted.
This means eth0 (connected to the bridge) is initially down, along with
all the lan interfaces.

root@clearfog:~# brctl show br0
bridge name bridge id   STP enabled interfaces
br0 can't get info No such device
root@clearfog:~# ip link set eth0 up
root@clearfog:~# ip link set lan5 up
root@clearfog:~# ip addr add 192.168.254.3/24 dev lan5
root@clearfog:~# ping 192.168.254.254
PING 192.168.254.254 (192.168.254.254) 56(84) bytes of data.
64 bytes from 192.168.254.254: icmp_seq=1 ttl=64 time=1.15 ms
64 bytes from 192.168.254.254: icmp_seq=2 ttl=64 time=0.686 ms
^C
--- 192.168.254.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.686/0.921/1.157/0.237 ms
root@clearfog:~# ip addr del 192.168.254.3/24 dev lan5
root@clearfog:~# brctl addbr br0
root@clearfog:~# brctl addif br0 lan5
root@clearfog:~# ip link set br0 up
root@clearfog:~# ip addr add 192.168.254.3/24 dev br0
root@clearfog:~# ping 192.168.254.254
PING 192.168.254.254 (192.168.254.254) 56(84) bytes of data.
64 bytes from 192.168.254.254: icmp_seq=1 ttl=64 time=1.18 ms
^C
--- 192.168.254.254 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.189/1.189/1.189/0.000 ms

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-23 Thread Andy Gospodarek
On Fri, Jan 22, 2016 at 02:11:22PM -0500, Jarod Wilson wrote:
> The network core tries to keep track of dropped packets, but some packets
> you wouldn't really call dropped, so much as intentionally ignored, under
> certain circumstances. One such case is that of bonding and team device
> slaves that are currently inactive. Their respective rx_handler functions
> return RX_HANDLER_EXACT (the only places in the kernel that return that),
> which ends up tracking into the network core's __netif_receive_skb_core()
> function's drop path, with no pt_prev set. On a noisy network, this can
> result in a very rapidly incrementing rx_dropped counter, not only on the
> inactive slave(s), but also on the master device, such as the following:
> 
> Inter-|   Receive|  Transmit
>  face |bytespackets errs drop fifo frame compressed multicast|bytes
> packets errs drop fifo colls carrier compressed
>   p7p1: 14783346  1404300 1404280 0  0  2040  680 
>   8000 0   0  0
>   p7p2: 14805198  140648000 0  0  20340   
> 0000 0   0  0
>  bond0: 53365248  5327980 4211600 0  0115151 2040 
>  24000 0   0  0
> lo:5420  54000 0  0 0 5420
>   54000 0   0  0
>   p5p1: 19292195  1961970 1403680 0  0 56564  680 
>   8000 0   0  0
>   p5p2: 19289707  1961710 1403640 0  0 56547  680 
>   8000 0   0  0
>em3: 20996626  158214000 0  0   3830   
> 0000 0   0  0
>em2: 14065122  138462000 0  0   3100   
> 0000 0   0  0
>em1: 14063162  138440000 0  0   3080   
> 0000 0   0  0
>em4: 21050830  158729000 0  0   38571662   
>   469000 0   0  0
>ib0:   0   0000 0  0 00
>0000 0   0  0
> 
> In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
> active-backup bond0, and you can see that all three have high drop counts,
> with the master bond0 showing a tally of all three.
> 
> I know that this was previously discussed some here:
> 
> http://www.spinics.net/lists/netdev/msg226341.html
> 
> It seems additional counters never came to fruition, but honestly, for
> this particular case, I'm not even sure they're warranted, I'd be inclined
> to say just silently drop these packets without incrementing a counter. At
> least, that's probably what would make someone who has complained loudly
> about this issue happy, as they have monitoring tools that are squaking
> loudly at any increments to rx_dropped.

I completely agree.

> CC: "David S. Miller" 
> CC: Eric Dumazet 
> CC: Jiri Pirko 
> CC: Daniel Borkmann 
> CC: Tom Herbert 
> CC: Jay Vosburgh 
> CC: Veaceslav Falico 
> CC: Andy Gospodarek 
> CC: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson 

Acked-by: Andy Gospodarek 

> ---
>  net/core/dev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8cba3d8..1354c7b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4153,8 +4153,11 @@ ncls:
>   else
>   ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>   } else {
> + if (deliver_exact)
> + goto inactive; /* bond or team inactive slave */
>  drop:
>   atomic_long_inc(>dev->rx_dropped);
> +inactive:
>   kfree_skb(skb);
>   /* Jamal, now you will not able to escape explaining
>* me how you were going to use this. :-)
> -- 
> 1.8.3.1
> 


Re: net: GPF in netlink_getsockbyportid

2016-01-23 Thread Daniel Borkmann

On 01/23/2016 08:25 PM, Florian Westphal wrote:

Dmitry Vyukov  wrote:

[ CC nf-devel, not sure if its nfnetlink fault or NETLINK_MMAP ]


The following program causes GPF in netlink_getsockbyportid:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include 
#include 
#include 
#include 
#include 

int main()
{
   syscall(SYS_mmap, 0x2000ul, 0xe65000ul, 0x3ul, 0x32ul,
  0xul, 0x0ul);
   int fd = syscall(SYS_socket, 0x10ul, 0x803ul, 0xcul, 0, 0, 0);
   *(uint32_t*)0x20e64000 = (uint32_t)0x28;
   *(uint32_t*)0x20e64004 = (uint32_t)0x10;
   *(uint64_t*)0x20e64008 = (uint64_t)0x0;
   *(uint64_t*)0x20e64010 = (uint64_t)0x3;
   *(uint64_t*)0x20e64018 = (uint64_t)0xfff;
   *(uint16_t*)0x20e64020 = (uint16_t)0x5;
   syscall(SYS_write, fd, 0x20e64000ul, 0x28ul, 0, 0, 0);
   return 0;
}


CONFIG_NETLINK_MMAP and nfnetlink batching strike in unison :-/

root cause is in nfnetlink_rcv_batch():

296 replay:
297 status = 0;
298
299 skb = netlink_skb_clone(oskb, GFP_KERNEL);

The clone op doesn't copy oskb->sk, so we oops in
__netlink_alloc_skb -> netlink_getsockbyportid() when nfnetlink_rcv_batch
tries to send netlink ack.


If indeed oskb is the mmap'ed netlink skb, then it's not even allowed
to call into skb_clone() as it would access skb shared info data that
can be controlled by the user space mmap buffer, iirc, we had that in
the past with nlmon where skb_clone() was accidentally used.


Re: struct pid memory leak

2016-01-23 Thread Willy Tarreau
On Sat, Jan 23, 2016 at 07:14:33PM +0100, Dmitry Vyukov wrote:
> I've attached my .config.
> Also run this program in a parallel loop. I think it's leaking not
> every time, probably some race is involved.

Thank you. Just in order to confirm, am I supposed to see the
messages you quoted in dmesg ?

Thanks,
Willy



Re: [PATCH] net: dsa: fix mv88e6xxx switches

2016-01-23 Thread Andrew Lunn
On Sat, Jan 23, 2016 at 07:06:22PM +, Russell King - ARM Linux wrote:
> On Sat, Jan 23, 2016 at 07:15:26PM +0100, Andrew Lunn wrote:
> > Thanks for digging into this.
> 
> I hope you saw v2, which is functionally identical.

No, not yet. I will go look for it.
 
> > I think this is a step towards a solution, but does not solve all the
> > problems.
> > 
> > e.g. I have a switch interface lan0 with the IP address
> > 192.168.10.2. I can ping this address from another host. I then take
> > the IP address off the interface, create a br0 device, add lan0 to the
> > bridge, and put 192.168.10.2 onto the bridge. I should be able to then
> > ping the address. But it does not work.
> 
> That works for me.  Maybe it's differences in the switch device?  I
> seem to remember you said your switch was an older generation than
> mine (88E6176).

I'm testing on a 6172. But 6172 and 6176 are both in the same family
6352, and share the same driver.

> root@clearfog:~# brctl show br0
> bridge name bridge id   STP enabled interfaces
> br0 8000.005043020202   no  lan1
> lan2
> lan3
> lan4
> lan5
> lan6

So you initially have lan1 in an bridge. I don't.

root@370-rd:~# ip link set lan0 up  
   
root@370-rd:~# ip addr add 192.168.10.2/24 dev lan0 
   
root@370-rd:~# ping 192.168.10.1
PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data.  
64 bytes from 192.168.10.1: icmp_seq=1 ttl=64 time=1.69 ms  
64 bytes from 192.168.10.1: icmp_seq=2 ttl=64 time=0.953 ms 
64 bytes from 192.168.10.1: icmp_seq=3 ttl=64 time=0.926 ms

root@370-rd:~# ip addr del 192.168.10.2/24 dev lan0
root@370-rd:~# brctl addbr br0
root@370-rd:~# brctl addif br0 lan0
root@370-rd:~# ip link set br0 up
root@370-rd:~# ip addr add 192.168.10.2/24 dev br0  
root@370-rd:~# ping 192.168.10.1
PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data.  
^C  
--- 192.168.10.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2001ms

  Andrew


Re: struct pid memory leak

2016-01-23 Thread Eric Dumazet
On Sat, Jan 23, 2016 at 10:46 AM, Dmitry Vyukov  wrote:
> On Sat, Jan 23, 2016 at 7:40 PM, Willy Tarreau  wrote:
>> On Sat, Jan 23, 2016 at 07:14:33PM +0100, Dmitry Vyukov wrote:
>>> I've attached my .config.
>>> Also run this program in a parallel loop. I think it's leaking not
>>> every time, probably some race is involved.
>>
>> Thank you. Just in order to confirm, am I supposed to see the
>> messages you quoted in dmesg ?
>
>
> I think the simplest way to confirm that you can reproduce it locally
> is to check /proc/slabinfo. When I run this program in a parallel
> loop, number of objects in pid cache was constantly growing:
>
> # cat /proc/slabinfo | grep pid
> pid  297532576   284 : tunables00
>   0 : slabdata 19 19  0
> ...
> pid  412532576   284 : tunables00
>   0 : slabdata 19 19  0
> ...
> pid 1107   1176576   284 : tunables00
>   0 : slabdata 42 42  0
> ...
> pid 1545   1652576   284 : tunables00
>   0 : slabdata 59 59  0
>
>
> If you want to use kmemleak, then you need to run this program in a
> parallel loop for some time, then stop it and then:
>
> $ echo scan > /sys/kernel/debug/kmemleak
> $ cat /sys/kernel/debug/kmemleak
>
> If kmemleak has detected any leaks, cat will show them. I noticed that
> kmemleak can delay leaks with significant delay, so usually I do scan
> at least 5 times.

Note that kmemleak is not needed.

Just run a normal kernel (eventually using slab_nomerge=1 boot cmd to
make sure 'pid' slab is not shared)

It seems that bug is rather old, as linux-4.0 has it.


net: GPF in netlink_getsockbyportid

2016-01-23 Thread Dmitry Vyukov
Hello,

The following program causes GPF in netlink_getsockbyportid:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include 
#include 
#include 
#include 
#include 

int main()
{
  syscall(SYS_mmap, 0x2000ul, 0xe65000ul, 0x3ul, 0x32ul,
 0xul, 0x0ul);
  int fd = syscall(SYS_socket, 0x10ul, 0x803ul, 0xcul, 0, 0, 0);
  *(uint32_t*)0x20e64000 = (uint32_t)0x28;
  *(uint32_t*)0x20e64004 = (uint32_t)0x10;
  *(uint64_t*)0x20e64008 = (uint64_t)0x0;
  *(uint64_t*)0x20e64010 = (uint64_t)0x3;
  *(uint64_t*)0x20e64018 = (uint64_t)0xfff;
  *(uint16_t*)0x20e64020 = (uint16_t)0x5;
  syscall(SYS_write, fd, 0x20e64000ul, 0x28ul, 0, 0, 0);
  return 0;
}


kasan: GPF could be caused by NULL-ptr deref or user memory
accessgeneral protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN
Modules linked in:
CPU: 1 PID: 7519 Comm: syz-executor Not tainted 4.4.0+ #276
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800625e17c0 ti: 880062b5 task.ti: 880062b5
RIP: 0010:[]  []
netlink_getsockbyportid+0x30/0x1b0
RSP: 0018:880062b57818  EFLAGS: 00010206
RAX: dc00 RBX: 0002 RCX: 0002
RDX: 0048 RSI: 0002 RDI: 0240
RBP: 880062b57838 R08: 024000c0 R09: 
R10:  R11:  R12: 
R13: 024000c0 R14:  R15: 0002
FS:  7fe4328d7700() GS:88003ed0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7fff7c3ecd80 CR3: 3b0b3000 CR4: 06e0
Stack:
 880062b578d0  003c 024000c0
 880062b578b8 8536b300 00e0 8175fe26
 8800625e1fe0 7379736275732d6b 302d 8764d868
Call Trace:
 [] __netlink_alloc_skb+0x30/0x790
net/netlink/af_netlink.c:1890
 [< inline >] netlink_alloc_skb include/linux/netlink.h:79
 [] netlink_ack+0x153/0x520 net/netlink/af_netlink.c:2968
 [< inline >] nfnetlink_rcv_batch net/netfilter/nfnetlink.c:321
 [] nfnetlink_rcv+0xbad/0x10a0 net/netfilter/nfnetlink.c:477
 [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1834
 [] netlink_unicast+0x47a/0x700 net/netlink/af_netlink.c:1860
 [] netlink_sendmsg+0x1086/0x1760
net/netlink/af_netlink.c:2511
 [< inline >] sock_sendmsg_nosec net/socket.c:611
 [] sock_sendmsg+0xca/0x110 net/socket.c:621
 [] sock_write_iter+0x216/0x3a0 net/socket.c:820
 [< inline >] new_sync_write fs/read_write.c:517
 [] __vfs_write+0x302/0x480 fs/read_write.c:530
 [] vfs_write+0x167/0x4a0 fs/read_write.c:577
 [< inline >] SYSC_write fs/read_write.c:624
 [] SyS_write+0x111/0x220 fs/read_write.c:616
 [] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
Code: 55 41 54 53 49 89 fc 89 f3 48 83 ec 08 e8 39 b7 20 fc 49 8d bc
24 40 02 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f>
b6 04 02 84 c0 74 08 3c 03 0f 8e 30 01 00 00 49 8d 7c 24 30
RIP  [] netlink_getsockbyportid+0x30/0x1b0
net/netlink/af_netlink.c:1658
 RSP 
---[ end trace f4ac9332ef80a14f ]---

On commit 30f05309bde49295e02e45c7e615f73aa4e0ccc2.


Re: net: GPF in netlink_getsockbyportid

2016-01-23 Thread Florian Westphal
Dmitry Vyukov  wrote:

[ CC nf-devel, not sure if its nfnetlink fault or NETLINK_MMAP ]

> The following program causes GPF in netlink_getsockbyportid:
> 
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> int main()
> {
>   syscall(SYS_mmap, 0x2000ul, 0xe65000ul, 0x3ul, 0x32ul,
>  0xul, 0x0ul);
>   int fd = syscall(SYS_socket, 0x10ul, 0x803ul, 0xcul, 0, 0, 0);
>   *(uint32_t*)0x20e64000 = (uint32_t)0x28;
>   *(uint32_t*)0x20e64004 = (uint32_t)0x10;
>   *(uint64_t*)0x20e64008 = (uint64_t)0x0;
>   *(uint64_t*)0x20e64010 = (uint64_t)0x3;
>   *(uint64_t*)0x20e64018 = (uint64_t)0xfff;
>   *(uint16_t*)0x20e64020 = (uint16_t)0x5;
>   syscall(SYS_write, fd, 0x20e64000ul, 0x28ul, 0, 0, 0);
>   return 0;
> }

CONFIG_NETLINK_MMAP and nfnetlink batching strike in unison :-/

root cause is in nfnetlink_rcv_batch():

296 replay:
297 status = 0;
298 
299 skb = netlink_skb_clone(oskb, GFP_KERNEL);

The clone op doesn't copy oskb->sk, so we oops in
__netlink_alloc_skb -> netlink_getsockbyportid() when nfnetlink_rcv_batch
tries to send netlink ack.


Re: struct pid memory leak

2016-01-23 Thread Dmitry Vyukov
On Sat, Jan 23, 2016 at 7:40 PM, Willy Tarreau  wrote:
> On Sat, Jan 23, 2016 at 07:14:33PM +0100, Dmitry Vyukov wrote:
>> I've attached my .config.
>> Also run this program in a parallel loop. I think it's leaking not
>> every time, probably some race is involved.
>
> Thank you. Just in order to confirm, am I supposed to see the
> messages you quoted in dmesg ?


I think the simplest way to confirm that you can reproduce it locally
is to check /proc/slabinfo. When I run this program in a parallel
loop, number of objects in pid cache was constantly growing:

# cat /proc/slabinfo | grep pid
pid  297532576   284 : tunables00
  0 : slabdata 19 19  0
...
pid  412532576   284 : tunables00
  0 : slabdata 19 19  0
...
pid 1107   1176576   284 : tunables00
  0 : slabdata 42 42  0
...
pid 1545   1652576   284 : tunables00
  0 : slabdata 59 59  0


If you want to use kmemleak, then you need to run this program in a
parallel loop for some time, then stop it and then:

$ echo scan > /sys/kernel/debug/kmemleak
$ cat /sys/kernel/debug/kmemleak

If kmemleak has detected any leaks, cat will show them. I noticed that
kmemleak can delay leaks with significant delay, so usually I do scan
at least 5 times.


Re: [PATCH] net: dsa: fix mv88e6xxx switches

2016-01-23 Thread Andrew Lunn
On Sat, Jan 23, 2016 at 10:51:16AM +, Russell King wrote:
> Since commit 76e398a62712 ("net: dsa: use switchdev obj for VLAN add/del
> ops"), the Marvell 88E6xxx switch has been unable to pass traffic
> between ports - any received traffic is discarded by the switch.
> Taking a port out of bridge mode and configuring a vlan on it also the
> port to start passing traffic.
> 
> With the debugfs files re-instated to allow debug of this issue by
> comparing the register settings between the working and non-working
> case, the reason becomes clear:
> 
>  GLOBAL GLOBAL2 SERDES   0123456
> - 7:  707f2001 2222202
> + 7:  707f2001 1111101
> 
> Register 7 for the ports is the default vlan tag register, and in the
> non-working setup, it has been set to 2, despite vlan 2 not being
> configured.  This causes the switch to drop all packets coming in to
> these ports.  The working setup has the default vlan tag register set
> to 1, which is the default vlan when none is configured.
> 
> Inspection of the code reveals why.  The code prior to this commit
> was:
> 
> - for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
> ...
> - if (!err && vlan->flags & BRIDGE_VLAN_INFO_PVID)
> - err = ds->drv->port_pvid_set(ds, p->port, vid);
> 
> but the new code is:
> 
> + for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
> ...
> + }
> ...
> + if (pvid)
> + err = _mv88e6xxx_port_pvid_set(ds, port, vid);
> 
> This causes the new code to always set the default vlan to one higher
> than the old code.
> 
> Fix this.
> 
> Fixes: 76e398a62712 ("net: dsa: use switchdev obj for VLAN add/del ops")
> Cc: 
> Signed-off-by: Russell King 

Hi Russell

Thanks for digging into this.

I think this is a step towards a solution, but does not solve all the
problems.

e.g. I have a switch interface lan0 with the IP address
192.168.10.2. I can ping this address from another host. I then take
the IP address off the interface, create a br0 device, add lan0 to the
bridge, and put 192.168.10.2 onto the bridge. I should be able to then
ping the address. But it does not work.

 Andrew


Re: [PATCH] net: dsa: fix mv88e6xxx switches

2016-01-23 Thread Russell King - ARM Linux
On Sat, Jan 23, 2016 at 07:15:26PM +0100, Andrew Lunn wrote:
> Thanks for digging into this.

I hope you saw v2, which is functionally identical.

> I think this is a step towards a solution, but does not solve all the
> problems.
> 
> e.g. I have a switch interface lan0 with the IP address
> 192.168.10.2. I can ping this address from another host. I then take
> the IP address off the interface, create a br0 device, add lan0 to the
> bridge, and put 192.168.10.2 onto the bridge. I should be able to then
> ping the address. But it does not work.

That works for me.  Maybe it's differences in the switch device?  I
seem to remember you said your switch was an older generation than
mine (88E6176).

root@clearfog:~# brctl show br0
bridge name bridge id   STP enabled interfaces
br0 8000.005043020202   no  lan1
lan2
lan3
lan4
lan5
lan6
root@clearfog:~# ip addr show|grep 192.168.254
root@clearfog:~# ip addr add 192.168.254.3/24 dev br0
root@clearfog:~# ping 192.168.254.2
PING 192.168.254.2 (192.168.254.2) 56(84) bytes of data.
64 bytes from 192.168.254.2: icmp_seq=1 ttl=255 time=0.648 ms
^C
--- 192.168.254.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.648/0.648/0.648/0.000 ms
root@clearfog:~# ip addr del 192.168.254.3/24 dev br0
root@clearfog:~# brctl delif br0 lan1
root@clearfog:~# ip addr add 192.168.254.3/24 dev lan1
root@clearfog:~# ping 192.168.254.2
PING 192.168.254.2 (192.168.254.2) 56(84) bytes of data.
64 bytes from 192.168.254.2: icmp_seq=1 ttl=255 time=0.331 ms
^C
--- 192.168.254.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.331/0.331/0.331/0.000 ms
root@clearfog:~# ip addr del 192.168.254.3/24 dev lan1
root@clearfog:~# brctl addif br0 lan1
root@clearfog:~# ip addr add 192.168.254.3/24 dev br0
root@clearfog:~# ping 192.168.254.2
PING 192.168.254.2 (192.168.254.2) 56(84) bytes of data.
64 bytes from 192.168.254.2: icmp_seq=1 ttl=255 time=0.630 ms
^C
--- 192.168.254.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.630/0.630/0.630/0.000 ms

I have no other patches for DSA other than the v2 I posted on this
kernel.

I'd suggest adding back the debugfs register dumping so you can get
some diagnostics, and work out what's different in the configuration
between a working kernel and a non-working kernel:

https://github.com/vivien/linux/commit/b23c6568011c84b80a8f5632ac819a5dbdca2dc1.patch

However, what I do notice is that trying to use vlans is a complete
fail - the switch strips _all_ vlan tags off the outgoing packets
no matter how I setup a vlan, whether it's on br0, or on a switch
port:

Packet as observed on a host connected to the switch port with no
vlan capability:
 0x:     0050 4302 0202 0806 0001
 0x0010:  0800 0604 0001 0050 4302 0202 c0a8 fe03 ...

Packet observed on Armada 388's eth interface connected to the 88E6176
switch:
 0x:     0050 4302 0202 dada 
 0x0010:  6030 0800 0806 0001 0800 0604 0001 0050 ...

The packet should appear on the wire with the 8100 0800 vlan protocol
ID + tag, but it appears with the vlan protocol and tag stripped.  So
there's yet more issues here.

Incoming packets from the interface appear to be dropped as far as I
can tell - the response to those ARP packets doesn't appear, though
I'm not convinced that I've been able to get the only vlan-capable
device on that side of the network to be setup correctly for vlans -
but obviously it's a total loss if vlan transmission isn't working.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH] net: dsa: fix mv88e6xxx switches

2016-01-23 Thread Andrew Lunn
> This shows port 0 is on vlan 0, but it should default to vlan 1 when
> no vlans are configured.  The patch below should at least allow some
> diagnosis of what's being requested, and when.
> 
> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> index a43354ed0607..8a9cf67eb16d 100644
> --- a/drivers/net/dsa/mv88e6xxx.c
> +++ b/drivers/net/dsa/mv88e6xxx.c
> @@ -1511,6 +1511,9 @@ int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int 
> port,
>   u16 vid;
>   int err = 0;
>  
> + printk("%s: port %d vid %u-%u flags %x\n",
> + __func__, port, vlan->vid_begin, vlan->vid_end, vlan->flags);
> +

Hi Russell

Never called.

So i guess we have a kernel configuration difference.

I don't have CONFIG_BRIDGE_VLAN_FILTERING.

But DSA should not rely on this option for correct operation.

  Andrew


Re: [PATCH] net: dsa: fix mv88e6xxx switches

2016-01-23 Thread Andrew Lunn
On Sat, Jan 23, 2016 at 11:12:32PM +0100, Andrew Lunn wrote:
> > This shows port 0 is on vlan 0, but it should default to vlan 1 when
> > no vlans are configured.  The patch below should at least allow some
> > diagnosis of what's being requested, and when.
> > 
> > diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> > index a43354ed0607..8a9cf67eb16d 100644
> > --- a/drivers/net/dsa/mv88e6xxx.c
> > +++ b/drivers/net/dsa/mv88e6xxx.c
> > @@ -1511,6 +1511,9 @@ int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, 
> > int port,
> > u16 vid;
> > int err = 0;
> >  
> > +   printk("%s: port %d vid %u-%u flags %x\n",
> > +   __func__, port, vlan->vid_begin, vlan->vid_end, vlan->flags);
> > +
> 
> Hi Russell
> 
> Never called.
> 
> So i guess we have a kernel configuration difference.
> 
> I don't have CONFIG_BRIDGE_VLAN_FILTERING.

Yep, confirmed. When i enable this option, and VLAN_8021Q which it
depends on, i then have a working bridge.

Andrew


Re: [PATCH] net: dsa: fix mv88e6xxx switches

2016-01-23 Thread Russell King - ARM Linux
On Sat, Jan 23, 2016 at 11:23:26PM +0100, Andrew Lunn wrote:
> On Sat, Jan 23, 2016 at 11:12:32PM +0100, Andrew Lunn wrote:
> > Hi Russell
> > 
> > Never called.
> > 
> > So i guess we have a kernel configuration difference.
> > 
> > I don't have CONFIG_BRIDGE_VLAN_FILTERING.
> 
> Yep, confirmed. When i enable this option, and VLAN_8021Q which it
> depends on, i then have a working bridge.

It sounds like Vivien urgently needs to do some extra work to fix
the patches that created all this broken-ness, or we need to ask
davem to revert back to a known good working state.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: struct pid memory leak

2016-01-23 Thread Willy Tarreau
On Sat, Jan 23, 2016 at 07:46:45PM +0100, Dmitry Vyukov wrote:
> On Sat, Jan 23, 2016 at 7:40 PM, Willy Tarreau  wrote:
> > On Sat, Jan 23, 2016 at 07:14:33PM +0100, Dmitry Vyukov wrote:
> >> I've attached my .config.
> >> Also run this program in a parallel loop. I think it's leaking not
> >> every time, probably some race is involved.
> >
> > Thank you. Just in order to confirm, am I supposed to see the
> > messages you quoted in dmesg ?
> 
> 
> I think the simplest way to confirm that you can reproduce it locally
> is to check /proc/slabinfo. When I run this program in a parallel
> loop, number of objects in pid cache was constantly growing:
> 
> # cat /proc/slabinfo | grep pid
> pid  297532576   284 : tunables00
>   0 : slabdata 19 19  0
> ...
> pid  412532576   284 : tunables00
>   0 : slabdata 19 19  0
> ...
> pid 1107   1176576   284 : tunables00
>   0 : slabdata 42 42  0
> ...
> pid 1545   1652576   284 : tunables00
>   0 : slabdata 59 59  0

OK got it and indeed I can see it grow. In fact, the active column grows and
once it reaches the num objects, this one grows in turn, which makes sense.

All I can say now is that it doesn't need to run over multiple processes
to leak, though that makes it easier. SMP is not needed either.

> If you want to use kmemleak, then you need to run this program in a
> parallel loop for some time, then stop it and then:
> 
> $ echo scan > /sys/kernel/debug/kmemleak
> $ cat /sys/kernel/debug/kmemleak
> 
> If kmemleak has detected any leaks, cat will show them. I noticed that
> kmemleak can delay leaks with significant delay, so usually I do scan
> at least 5 times.

Thank you for these information.

I've tested on an older (3.14) kernel and I can see the effect there as well.
I don't have "pid" in slabinfo, but launching 1000 processes at a time uses
a few tens to hundreds kB of RAM on each round. 3.10 doesn't seem affected,
I'm seeing the memory grow to a fixed point if I increase the number of
parallel processes but then even after a few tens of thousands of processes,
the reported used memory doesn't seem to increase (remember no "pid" entry
here).

kmemleak indeed reports me something on 3.14 which seems to match your
trace as I'm seeing bash as the process (instead of syz-executor in your
case) and alloc_pid() calls kmem_cache_alloc() :

Unreferenced object 0x88003facd000 (size 128):
  comm "bash", pid 1822, jiffies 4294951223 (age 15.280s)
  hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[] kmem_cache_alloc+0x92/0xe0
[] alloc_pid+0x24/0x4a0
[] cpumask_any_but+0x23/0x40
[] copy_process.part.66+0x1068/0x16e0
[] n_tty_write+0x37b/0x4f0
[] tty_write+0x1c1/0x2a0
[] do_fork+0xe0/0x340
[] __set_task_blocked+0x30/0x80
[] __set_current_blocked+0x38/0x60
[] stub_clone+0x69/0x90
[] system_call_fastpath+0x16/0x1b
[] 0x

It doesn't report this on 3.10.

Unfortunately I feel totally incompetent on the subject :-/

Willy



Re: struct pid memory leak

2016-01-23 Thread Willy Tarreau
On Sat, Jan 23, 2016 at 06:50:11PM -0800, Eric Dumazet wrote:
> On Sat, Jan 23, 2016 at 6:38 PM, Willy Tarreau  wrote:
> > On Sun, Jan 24, 2016 at 03:11:45AM +0100, Willy Tarreau wrote:
> >> It doesn't report this on 3.10.
> >
> > To be more precise, kmemleak reports the issue on 3.13 and not on 3.12.
> > I'm not sure if it's reliable enough to run a bisect though.
> >
> > Willy
> >
> 
> I have the leak on linux-3.11.
> 
> I believe even linux-3.3 gets the leak, although I had to wait about
> one hour to be confident the leak was there.

OK so I'm stopping my bisect. It's possible it's affected with some option
which changed along the various "make oldconfig" at each step. Thanks for
letting me know.

Willy



[PATCH] 82xx: FCC: Fixing a bug causing to FCC port lock-up (second try)

2016-01-23 Thread Martin Roth
This is an additional patch to the one already submitted recently.
The previous patch was not complete, and the FCC port lock-up scenario
has been reproduced in lab.
I had an opportunity to check the current patch in lab and the FCC
port lock no longer freezes, while the previous patch still locks-up the
FCC port.
The current patch fixes a pointer arithmetic bug (second bug in the same
line), which leads FCC port lock-up during underrun/collision handling.
Within the tx_startup() function in mac-fcc.c, the address of last BD is
not calculated correctly. As a result of wrong calculation of the last BD
address, the next transmitted BD may be set to an area out of the transmit
BD ring. This actually causes to port lock-up and it is not recoverable.

Signed-off-by: Martin Roth 
---
 drivers/net/ethernet/freescale/fs_enet/mac-fcc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c 
b/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c
index 52e0091..1ba359f 100644
--- a/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c
+++ b/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c
@@ -552,7 +552,7 @@ static void tx_restart(struct net_device *dev)
cbd_t __iomem *prev_bd;
cbd_t __iomem *last_tx_bd;
 
-   last_tx_bd = fep->tx_bd_base + ((fpi->tx_ring - 1) * sizeof(cbd_t));
+   last_tx_bd = fep->tx_bd_base + (fpi->tx_ring - 1);
 
/* get the current bd held in TBPTR  and scan back from this point */
recheck_bd = curr_tbptr = (cbd_t __iomem *)
-- 
1.7.9.5



Re: net: GPF in netlink_getsockbyportid

2016-01-23 Thread Florian Westphal
Daniel Borkmann  wrote:
> On 01/23/2016 08:25 PM, Florian Westphal wrote:
> >Dmitry Vyukov  wrote:
> >
> >[ CC nf-devel, not sure if its nfnetlink fault or NETLINK_MMAP ]
> >
> >>The following program causes GPF in netlink_getsockbyportid:
[..]

> >CONFIG_NETLINK_MMAP and nfnetlink batching strike in unison :-/
> >
> >root cause is in nfnetlink_rcv_batch():
> >
> >296 replay:
> >297 status = 0;
> >298
> >299 skb = netlink_skb_clone(oskb, GFP_KERNEL);
> >
> >The clone op doesn't copy oskb->sk, so we oops in
> >__netlink_alloc_skb -> netlink_getsockbyportid() when nfnetlink_rcv_batch
> >tries to send netlink ack.
> 
> If indeed oskb is the mmap'ed netlink skb, then it's not even allowed
> to call into skb_clone()

Right, but in this case there is no mmap'd netlink sk involved -- we
crash when we try to look up dst netlink socket to see if there is an
mmap'd ring attached.

[ and that code isn't there with CONFIG_NETLINK_MMAP=n ].


Re: struct pid memory leak

2016-01-23 Thread Willy Tarreau
On Sun, Jan 24, 2016 at 03:11:45AM +0100, Willy Tarreau wrote:
> It doesn't report this on 3.10.

To be more precise, kmemleak reports the issue on 3.13 and not on 3.12.
I'm not sure if it's reliable enough to run a bisect though.

Willy



Re: [PATCH] net: dsa: fix mv88e6xxx switches

2016-01-23 Thread Russell King - ARM Linux
On Sat, Jan 23, 2016 at 09:16:47PM +0100, Andrew Lunn wrote:
> On Sat, Jan 23, 2016 at 07:48:57PM +, Russell King - ARM Linux wrote:
> > On Sat, Jan 23, 2016 at 08:37:05PM +0100, Andrew Lunn wrote:
> > > I'm testing on a 6172. But 6172 and 6176 are both in the same family
> > > 6352, and share the same driver.
> > 
> > Hmm, can't be that then.
> > 
> > > So you initially have lan1 in an bridge. I don't.
> 
> Running tcpdump on the device i'm trying to ping, there are ARP
> requests and replies. But the replies are never received by the
> target, arp -a shows .
> 
> Looking at the stats counters in debugfs, the packets are counted in
> in_unicast, but also sw_in_filtered.
> 
> Port 0 is lan0 and port 5 is the cpu port.
> 
> root@370-rd:/sys/kernel/debug/dsa0# cat regs
> GLOBAL GLOBAL2 SERDES   0123456  
>  0:  c874   01940  1d0f 1d0f 1d0f 1d0f 100f  e076 
>  1:   fa0   0 149 33333 c03e3 
>  2:   fa0 141 0000000 
>  3: 0 ea1  1721 1721 1721 1721 1721 1721 1721 
>  4:  6000 258 1e0   433  433  433  433  433 373f  433 
>  5:  3000  ff   0 0000000 
>  6:   fa01f0f   47f   7e   7d   7c   7b   7a   79 
>  7:  3331707f2001 0  fa1  fa2  fa3  fa40  fa6 

This shows port 0 is on vlan 0, but it should default to vlan 1 when
no vlans are configured.  The patch below should at least allow some
diagnosis of what's being requested, and when.

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index a43354ed0607..8a9cf67eb16d 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1511,6 +1511,9 @@ int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int 
port,
u16 vid;
int err = 0;
 
+   printk("%s: port %d vid %u-%u flags %x\n",
+   __func__, port, vlan->vid_begin, vlan->vid_end, vlan->flags);
+
mutex_lock(>smi_mutex);
 
for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {


-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH/RFC v3 net-next] ravb: Add dma queue interrupt support

2016-01-23 Thread Sergei Shtylyov

Hello.

On 01/17/2016 01:55 PM, Yoshihiro Kaneko wrote:


From: Kazuya Mizuguchi 

This patch supports the following interrupts.

- One interrupt for multiple (descriptor, error, management)
- One interrupt for emac
- Four interrupts for dma queue (best effort rx/tx, network control rx/tx)

This patch improve efficiency of the interrupt handler by adding the
interrupt handler corresponding to each interrupt source described
above. Additionally, it reduces the number of times of the access to
EthernetAVB IF.


   Not sure I got the last sentence right... Did you mean that we save on the 
register reads?
   Yet another reason could be that we don't want to depend on the boot 
loader's whim any more...



Signed-off-by: Kazuya Mizuguchi 
Signed-off-by: Yoshihiro Kaneko 
---

This patch is based on the master branch of David Miller's next networking
tree.

v3 [Yoshihiro Kaneko]
* compile tested only


   Doesn't sound very encouraging... couldn't you manage to re-test the patch 
before Dave merges it?


[...]

diff --git a/drivers/net/ethernet/renesas/ravb.h 
b/drivers/net/ethernet/renesas/ravb.h
index 9fbe92a..385f06b 100644
--- a/drivers/net/ethernet/renesas/ravb.h
+++ b/drivers/net/ethernet/renesas/ravb.h

[...]

@@ -556,6 +566,18 @@ enum ISS_BIT {
ISS_DPS15   = 0x8000,
  };

+/* CIE
+ * R-Car Gen3 only
+ */


   I'd be more happy with one line comment like:

/* CIE (R-Car Gen3 only) */

[...]

@@ -592,6 +614,212 @@ enum GIS_BIT {
GIS_PTMF= 0x0004,
  };

+/* GIE
+ * R-Car Gen3 only
+ */
+enum GIE_BIT {

[...]

+   GIE_ALL = 0x03ff,


   GIE_ALL isn't used, no need to define it.


+};
+
+/* GID
+ * R-Car Gen3 only
+ */
+enum GID_BIT {

[...]

+   GID_ALL = 0x03ff,


   Not used any more.


+};
+
+/* RIE0
+ * R-Car Gen3 only
+ */
+enum RIE0_BIT {

[...]

+   RIE0_ALL= 0x0003,


   Likewise.


+};
+
+/* RID0
+ * R-Car Gen3 only
+ */
+enum RID0_BIT {

[...]

+   RID0_ALL= 0x0003,


   Likewise.


+};
+
+/* RIE2
+ * R-Car Gen3 only
+ */
+enum RIE2_BIT {

[...]

+   RIE2_ALL= 0x8003,


   Likewise.


+};
+
+/* RID2
+ * R-Car Gen3 only
+ */
+enum RID2_BIT {



+   RID2_ALL= 0x8003,


Likewise.


+};
+
+/* TIE
+ * R-Car Gen3 only
+ */
+enum TIE_BIT {

[...]

+   TIE_ALL = 0x000f0f0f,


   Likewise.


+};
+
+/* TID
+ * R-Car Gen3 only
+ */
+enum TID_BIT {

[...]

+   TID_ALL = 0x000f0f0f,


   Likewise.

[...]

diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index ac43ed9..4c4912e0 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -42,6 +42,16 @@
 NETIF_MSG_RX_ERR | \
 NETIF_MSG_TX_ERR)

+static const char *ravb_rx_irqs[NUM_RX_QUEUE] = {
+   "ch0", /* RAVB_BE */
+   "ch1", /* RAVB_NC */
+};
+
+static const char *ravb_tx_irqs[NUM_TX_QUEUE] = {
+   "ch18", /* RAVB_BE */
+   "ch19", /* RAVB_NC */
+};
+


   Do what you wish but I don't like these...

[...]

@@ -693,7 +722,7 @@ static void ravb_error_interrupt(struct net_device *ndev)
if (ris2 & RIS2_QFF0)
priv->stats[RAVB_BE].rx_over_errors++;

-   /* Receive Descriptor Empty int */
+   /* Receive Descriptor Empty int */


   It's not even a "drove-by" change, please don't do this here, submit a 
separate patch (or I can do it).


[...]

@@ -773,6 +829,55 @@ static irqreturn_t ravb_interrupt(int irq, void *dev_id)
return result;
  }

+static irqreturn_t ravb_dmaq_interrupt(int irq, void *dev_id, int ravb_queue)
+{
+   struct net_device *ndev = dev_id;
+   struct ravb_private *priv = netdev_priv(ndev);
+   irqreturn_t result = IRQ_NONE;
+   u32 ris0, ric0, tis, tic;
+   int q = ravb_queue;
+
+   spin_lock(>lock);
+
+   ris0 = ravb_read(ndev, RIS0);
+   ric0 = ravb_read(ndev, RIC0);
+   tis  = ravb_read(ndev, TIS);
+   tic  = ravb_read(ndev, TIC);
+
+   /* Timestamp updated */
+   if (tis & TIS_TFUF) {
+   ravb_write(ndev, TID_TFUD, TID);
+   ravb_get_tx_tstamp(ndev);
+   result = IRQ_HANDLED;
+   }
+
+   /* Best effort queue RX/TX */
+   if (((ris0 & ric0) & BIT(q)) ||
+   ((tis  & tic)  & BIT(q))) {
+   if (napi_schedule_prep(>napi[q])) {
+   /* Mask RX and TX interrupts */
+   ravb_write(ndev, BIT(q), RID0);
+   ravb_write(ndev, BIT(q), TID);
+   __napi_schedule(>napi[q]);
+   }


   There was an *else* branch originally (napi_schedule_prep()'s failure is a 
serious issue deserving an explanation), not sure why you dropped it here...


[...]

@@ -1215,29 +1325,63 @@ static const struct 

Re: struct pid memory leak

2016-01-23 Thread Eric Dumazet
On Sat, Jan 23, 2016 at 6:38 PM, Willy Tarreau  wrote:
> On Sun, Jan 24, 2016 at 03:11:45AM +0100, Willy Tarreau wrote:
>> It doesn't report this on 3.10.
>
> To be more precise, kmemleak reports the issue on 3.13 and not on 3.12.
> I'm not sure if it's reliable enough to run a bisect though.
>
> Willy
>

I have the leak on linux-3.11.

I believe even linux-3.3 gets the leak, although I had to wait about
one hour to be confident the leak was there.


[PATCH] net: fec: make driver endian-safe

2016-01-23 Thread Johannes Berg
The driver treats the device descriptors as CPU-endian, which is
probably right on both ARM (little-endian) and PowerPC (big-endian)
but wrong on big-endian ARM.

Add the correct annotations and byteswaps.

This gets it working on i.MX6 hummingboard booted in big-endian mode.

Signed-off-by: Johannes Berg 
---
 drivers/net/ethernet/freescale/Makefile   |   2 +
 drivers/net/ethernet/freescale/fec.h  |  39 ++---
 drivers/net/ethernet/freescale/fec_main.c | 130 --
 3 files changed, 99 insertions(+), 72 deletions(-)

diff --git a/drivers/net/ethernet/freescale/Makefile 
b/drivers/net/ethernet/freescale/Makefile
index 71debd1c18c9..64ddc0bd2735 100644
--- a/drivers/net/ethernet/freescale/Makefile
+++ b/drivers/net/ethernet/freescale/Makefile
@@ -4,6 +4,8 @@
 
 obj-$(CONFIG_FEC) += fec.o
 fec-objs :=fec_main.o fec_ptp.o
+CFLAGS_fec_main.o := -D__CHECK_ENDIAN__
+
 obj-$(CONFIG_FEC_MPC52xx) += fec_mpc52xx.o
 ifeq ($(CONFIG_FEC_MPC52xx_MDIO),y)
obj-$(CONFIG_FEC_MPC52xx) += fec_mpc52xx_phy.o
diff --git a/drivers/net/ethernet/freescale/fec.h 
b/drivers/net/ethernet/freescale/fec.h
index 99d33e2d35e6..801dcc7fa6e6 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -191,27 +191,44 @@
 /*
  * Define the buffer descriptor structure.
  */
+/* buffer endianness appears to be a mess ... ARM is usually LE but can be BE 
*/
+#if defined(CONFIG_ARM) && defined(CONFIG_CPU_BIG_ENDIAN)
+#define fec32_to_cpu le32_to_cpu
+#define fec16_to_cpu le16_to_cpu
+#define cpu_to_fec32 cpu_to_le32
+#define cpu_to_fec16 cpu_to_le16
+#define __fec32 __le32
+#define __fec16 __le16
+#else
+#define fec32_to_cpu be32_to_cpu
+#define fec16_to_cpu be16_to_cpu
+#define cpu_to_fec32 cpu_to_be32
+#define cpu_to_fec16 cpu_to_be16
+#define __fec32 __be32
+#define __fec16 __be16
+#endif
+
 #if defined(CONFIG_ARCH_MXC) || defined(CONFIG_SOC_IMX28)
 struct bufdesc {
-   unsigned short cbd_datlen;  /* Data length */
-   unsigned short cbd_sc;  /* Control and status info */
-   unsigned long cbd_bufaddr;  /* Buffer address */
+   __fec16 cbd_datlen; /* Data length */
+   __fec16 cbd_sc; /* Control and status info */
+   __fec32 cbd_bufaddr;/* Buffer address */
 };
 #else
 struct bufdesc {
-   unsigned short  cbd_sc; /* Control and status info */
-   unsigned short  cbd_datlen; /* Data length */
-   unsigned long   cbd_bufaddr;/* Buffer address */
+   __fec16 cbd_sc; /* Control and status info */
+   __fec16 cbd_datlen; /* Data length */
+   __fec32 cbd_bufaddr;/* Buffer address */
 };
 #endif
 
 struct bufdesc_ex {
struct bufdesc desc;
-   unsigned long cbd_esc;
-   unsigned long cbd_prot;
-   unsigned long cbd_bdu;
-   unsigned long ts;
-   unsigned short res0[4];
+   __fec32 cbd_esc;
+   __fec32 cbd_prot;
+   __fec32 cbd_bdu;
+   __fec32 ts;
+   __fec16 res0[4];
 };
 
 /*
diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index d2328fc5da57..8e81c4de1f41 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -331,11 +331,13 @@ static void fec_dump(struct net_device *ndev)
bdp = txq->tx_bd_base;
 
do {
-   pr_info("%3u %c%c 0x%04x 0x%08lx %4u %p\n",
+   pr_info("%3u %c%c 0x%04x 0x%08x %4u %p\n",
index,
bdp == txq->cur_tx ? 'S' : ' ',
bdp == txq->dirty_tx ? 'H' : ' ',
-   bdp->cbd_sc, bdp->cbd_bufaddr, bdp->cbd_datlen,
+   fec16_to_cpu(bdp->cbd_sc),
+   fec32_to_cpu(bdp->cbd_bufaddr),
+   fec16_to_cpu(bdp->cbd_datlen),
txq->tx_skbuff[index]);
bdp = fec_enet_get_nextdesc(bdp, fep, 0);
index++;
@@ -388,7 +390,7 @@ fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq,
bdp = fec_enet_get_nextdesc(bdp, fep, queue);
ebdp = (struct bufdesc_ex *)bdp;
 
-   status = bdp->cbd_sc;
+   status = fec16_to_cpu(bdp->cbd_sc);
status &= ~BD_ENET_TX_STATS;
status |= (BD_ENET_TX_TC | BD_ENET_TX_READY);
frag_len = skb_shinfo(skb)->frags[frag].size;
@@ -410,7 +412,7 @@ fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq,
if (skb->ip_summed == CHECKSUM_PARTIAL)
estatus |= BD_ENET_TX_PINS | BD_ENET_TX_IINS;
ebdp->cbd_bdu = 0;
-   ebdp->cbd_esc = estatus;
+   ebdp->cbd_esc = cpu_to_fec32(estatus);
}
 
bufaddr = page_address(this_frag->page.p) + 
this_frag->page_offset;
@@ -434,9 +436,9 

Re: [PATCH] net: dsa: fix mv88e6xxx switches

2016-01-23 Thread Vivien Didelot
Hi Russell,

Russell King - ARM Linux  writes:

> On Sat, Jan 23, 2016 at 11:23:26PM +0100, Andrew Lunn wrote:
>> On Sat, Jan 23, 2016 at 11:12:32PM +0100, Andrew Lunn wrote:
>> > Hi Russell
>> > 
>> > Never called.
>> > 
>> > So i guess we have a kernel configuration difference.
>> > 
>> > I don't have CONFIG_BRIDGE_VLAN_FILTERING.
>> 
>> Yep, confirmed. When i enable this option, and VLAN_8021Q which it
>> depends on, i then have a working bridge.
>
> It sounds like Vivien urgently needs to do some extra work to fix
> the patches that created all this broken-ness, or we need to ask
> davem to revert back to a known good working state.

Indeed hardware bridging currently relies on 802.1Q and VLAN filtering
enabled. I will work on this asap.

Thanks,
-v


Re: net: fec: make driver endian-safe

2016-01-23 Thread Johannes Berg
On Sat, 2016-01-23 at 23:09 +0100, Johannes Berg wrote:
> 
> +/* buffer endianness appears to be a mess ... ARM is usually LE but
> can be BE */
> +#if defined(CONFIG_ARM) && defined(CONFIG_CPU_BIG_ENDIAN)

Just realized that this is, of course, completely wrong. If the ARM CPU
is little endian, the device is of course *still* little endian, it
doesn't magically change itself with the CONFIG_CPU_BIG_ENDIAN
setting... :)

Ergo, the #if part should be taken, rather than the #else part, to
still define the descriptors as __le and just have no byteswapping
done.

Also, as Arnd had pointed out last night but I wasn't coherent enough
to fully understand, it's exceedingly likely that

>  #if defined(CONFIG_ARCH_MXC) || defined(CONFIG_SOC_IMX28)

this needs to be changed and that the device is simply "generated" as
little-endian for ARM SoCs. He also pointed out that there could be
multiple CONFIG_SOC or CONFIG_ARCH definitions (if I understood
correctly), so this would be wrong anyway.

It's tempting to combine both ifdefs and change them to just CONFIG_ARM
but I have no way of verifying that on anything other than i.MX6 (on
the hummingboard), nor do I even know which SoCs ship with this block.

johannes