Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
Fri, Jan 22, 2016 at 09:59:12PM CET, jay.vosbu...@canonical.com wrote: >Jarod Wilsonwrote: > >>The network core tries to keep track of dropped packets, but some packets >>you wouldn't really call dropped, so much as intentionally ignored, under >>certain circumstances. One such case is that of bonding and team device >>slaves that are currently inactive. Their respective rx_handler functions >>return RX_HANDLER_EXACT (the only places in the kernel that return that), >>which ends up tracking into the network core's __netif_receive_skb_core() >>function's drop path, with no pt_prev set. On a noisy network, this can >>result in a very rapidly incrementing rx_dropped counter, not only on the >>inactive slave(s), but also on the master device, such as the following: >[...] >>In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an >>active-backup bond0, and you can see that all three have high drop counts, >>with the master bond0 showing a tally of all three. >> >>I know that this was previously discussed some here: >> >>http://www.spinics.net/lists/netdev/msg226341.html >> >>It seems additional counters never came to fruition, but honestly, for >>this particular case, I'm not even sure they're warranted, I'd be inclined >>to say just silently drop these packets without incrementing a counter. At >>least, that's probably what would make someone who has complained loudly >>about this issue happy, as they have monitoring tools that are squaking >>loudly at any increments to rx_dropped. In this case, it is delivered with exact delivery according to per-dev registered callback. We just have to avoid it gets to bond. So this case is not "to drop", but rather "to block skb to don't get where it does not belong". > > I don't think the kernel should silently drop packets; there >should be a counter somewhere. If a packet is being thrown away >deliberately, it should not just vanish into the screaming void of >space. Someday someone will try and track down where that packet is >being dropped. > > I've had that same conversation with customers who insist on >accounting for every packet drop (from the "any drop is an error" >mindset), so I understand the issue. > > Thinking about the prior discussion, the rx_drop_inactive is >still a good idea, but I'd actually today get good use from a >"rx_drop_unforwardable" (or an equivalent but shorter name) counter that >counts every time a packet is dropped due to is_skb_forwardable() >returning false. __dev_forward_skb does this (and hits rx_dropped), as >does the bridge (and does not count it). > > -J > >>CC: "David S. Miller" >>CC: Eric Dumazet >>CC: Jiri Pirko >>CC: Daniel Borkmann >>CC: Tom Herbert >>CC: Jay Vosburgh >>CC: Veaceslav Falico >>CC: Andy Gospodarek >>CC: netdev@vger.kernel.org >>Signed-off-by: Jarod Wilson >>--- >> net/core/dev.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >>diff --git a/net/core/dev.c b/net/core/dev.c >>index 8cba3d8..1354c7b 100644 >>--- a/net/core/dev.c >>+++ b/net/core/dev.c >>@@ -4153,8 +4153,11 @@ ncls: >> else >> ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); >> } else { >>+ if (deliver_exact) >>+ goto inactive; /* bond or team inactive slave */ >> drop: >> atomic_long_inc(>dev->rx_dropped); >>+inactive: >> kfree_skb(skb); >> /* Jamal, now you will not able to escape explaining >> * me how you were going to use this. :-) >>-- >>1.8.3.1 >> > >--- > -Jay Vosburgh, jay.vosbu...@canonical.com
[PATCH] net: dsa: fix mv88e6xxx switches
Since commit 76e398a62712 ("net: dsa: use switchdev obj for VLAN add/del ops"), the Marvell 88E6xxx switch has been unable to pass traffic between ports - any received traffic is discarded by the switch. Taking a port out of bridge mode and configuring a vlan on it also the port to start passing traffic. With the debugfs files re-instated to allow debug of this issue by comparing the register settings between the working and non-working case, the reason becomes clear: GLOBAL GLOBAL2 SERDES 0123456 - 7: 707f2001 2222202 + 7: 707f2001 1111101 Register 7 for the ports is the default vlan tag register, and in the non-working setup, it has been set to 2, despite vlan 2 not being configured. This causes the switch to drop all packets coming in to these ports. The working setup has the default vlan tag register set to 1, which is the default vlan when none is configured. Inspection of the code reveals why. The code prior to this commit was: - for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) { ... - if (!err && vlan->flags & BRIDGE_VLAN_INFO_PVID) - err = ds->drv->port_pvid_set(ds, p->port, vid); but the new code is: + for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) { ... + } ... + if (pvid) + err = _mv88e6xxx_port_pvid_set(ds, port, vid); This causes the new code to always set the default vlan to one higher than the old code. Fix this. Fixes: 76e398a62712 ("net: dsa: use switchdev obj for VLAN add/del ops") Cc:Signed-off-by: Russell King --- As this is a regression on 4.4 compared to 4.3, I've added the stable tag. drivers/net/dsa/mv88e6xxx.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c index b06dba05594a..bf622675397d 100644 --- a/drivers/net/dsa/mv88e6xxx.c +++ b/drivers/net/dsa/mv88e6xxx.c @@ -1515,11 +1515,12 @@ int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port, err = _mv88e6xxx_port_vlan_add(ds, port, vid, untagged); if (err) goto unlock; + + /* no PVID with ranges, otherwise it's a bug */ + if (pvid) + err = _mv88e6xxx_port_pvid_set(ds, port, vid); } - /* no PVID with ranges, otherwise it's a bug */ - if (pvid) - err = _mv88e6xxx_port_pvid_set(ds, port, vid); unlock: mutex_unlock(>smi_mutex); -- 2.1.0
[PATCH] ipv4+ipv6: Make INET*_ESP select CRYPTO_ECHAINIV
The ESP algorithms using CBC mode require echainiv. Hence INET*_ESP have to select CRYPTO_ECHAINIV in order to work properly. This solves the issues caused by a misconfiguration as described in [1]. The original approach, patching crypto/Kconfig was turned down by Herbert Xu [2]. [1] https://lists.strongswan.org/pipermail/users/2015-December/009074.html [2] http://marc.info/?l=linux-crypto-vger=145224655809562=2 Signed-off-by: Thomas Egerer--- net/ipv4/Kconfig | 1 + net/ipv6/Kconfig | 1 + 2 files changed, 2 insertions(+) diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig index c229205..7758247 100644 --- a/net/ipv4/Kconfig +++ b/net/ipv4/Kconfig @@ -353,6 +353,7 @@ config INET_ESP select CRYPTO_CBC select CRYPTO_SHA1 select CRYPTO_DES + select CRYPTO_ECHAINIV ---help--- Support for IPsec ESP. diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig index bb7dabe..40c8975 100644 --- a/net/ipv6/Kconfig +++ b/net/ipv6/Kconfig @@ -69,6 +69,7 @@ config INET6_ESP select CRYPTO_CBC select CRYPTO_SHA1 select CRYPTO_DES
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
Fri, Jan 22, 2016 at 08:11:22PM CET, ja...@redhat.com wrote: >The network core tries to keep track of dropped packets, but some packets >you wouldn't really call dropped, so much as intentionally ignored, under >certain circumstances. One such case is that of bonding and team device >slaves that are currently inactive. Their respective rx_handler functions >return RX_HANDLER_EXACT (the only places in the kernel that return that), >which ends up tracking into the network core's __netif_receive_skb_core() >function's drop path, with no pt_prev set. On a noisy network, this can >result in a very rapidly incrementing rx_dropped counter, not only on the >inactive slave(s), but also on the master device, such as the following: > >Inter-| Receive| Transmit > face |bytespackets errs drop fifo frame compressed multicast|bytes > packets errs drop fifo colls carrier compressed > p7p1: 14783346 1404300 1404280 0 0 2040 680 > 8000 0 0 0 > p7p2: 14805198 140648000 0 0 20340 >0000 0 0 0 > bond0: 53365248 5327980 4211600 0 0115151 2040 > 24000 0 0 0 >lo:5420 54000 0 0 0 5420 > 54000 0 0 0 > p5p1: 19292195 1961970 1403680 0 0 56564 680 > 8000 0 0 0 > p5p2: 19289707 1961710 1403640 0 0 56547 680 > 8000 0 0 0 > em3: 20996626 158214000 0 0 3830 >0000 0 0 0 > em2: 14065122 138462000 0 0 3100 >0000 0 0 0 > em1: 14063162 138440000 0 0 3080 >0000 0 0 0 > em4: 21050830 158729000 0 0 38571662 > 469000 0 0 0 > ib0: 0 0000 0 0 00 > 0000 0 0 0 > >In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an >active-backup bond0, and you can see that all three have high drop counts, >with the master bond0 showing a tally of all three. > >I know that this was previously discussed some here: > >http://www.spinics.net/lists/netdev/msg226341.html > >It seems additional counters never came to fruition, but honestly, for >this particular case, I'm not even sure they're warranted, I'd be inclined >to say just silently drop these packets without incrementing a counter. At >least, that's probably what would make someone who has complained loudly >about this issue happy, as they have monitoring tools that are squaking >loudly at any increments to rx_dropped. > >CC: "David S. Miller">CC: Eric Dumazet >CC: Jiri Pirko >CC: Daniel Borkmann >CC: Tom Herbert >CC: Jay Vosburgh >CC: Veaceslav Falico >CC: Andy Gospodarek >CC: netdev@vger.kernel.org >Signed-off-by: Jarod Wilson Acked-by: Jiri Pirko I think this should be considered as a bug and go to -net.
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote: > --- > net/core/dev.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 8cba3d8..1354c7b 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4153,8 +4153,11 @@ ncls: > else > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); > } else { > + if (deliver_exact) > + goto inactive; /* bond or team inactive slave */ > drop: > atomic_long_inc(>dev->rx_dropped); > +inactive: > kfree_skb(skb); > /* Jamal, now you will not able to escape explaining >* me how you were going to use this. :-) Note that if you still have a kfree_skb() instead of consume_skb(), some tools will still give you a wrong signal (packet dropped ...). But then maybe the signal is telling some truth. We receive a packet, and decide to drop it because no one was willing to handle it. Maybe someone wants to know a particular slave receives 10,000 such frames per second and hurts performance with useless work. We should at least increment some counter and maybe dump it with "ethtool -S" or something.
Re: struct pid memory leak
Hi Eric, Dmitry, On Fri, Jan 22, 2016 at 08:50:01AM -0800, Eric Dumazet wrote: > CC netdev, as it looks some af_unix issue ... > > On Fri, 2016-01-22 at 16:08 +0100, Dmitry Vyukov wrote: > > Hello, > > > > The following program causes struct pid memory leak: > > > > // autogenerated by syzkaller (http://github.com/google/syzkaller) (...) > > unreferenced object 0x8800324af200 (size 112): > > comm "syz-executor", pid 18413, jiffies 4295500287 (age 14.321s) > > hex dump (first 32 bytes): > > 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > backtrace: > > [] kmemleak_alloc+0x63/0xa0 mm/kmemleak.c:916 > > [< inline >] kmemleak_alloc_recursive > > include/linux/kmemleak.h:47 (...) > > On commit 30f05309bde49295e02e45c7e615f73aa4e0ccc2 (Jan 20). I can't reproduce this with the indicated commit. I'm unsure how/what I'm supposed to see. Is a certain config needed ? I've enabled kmemleak in my .config but there are too few information here to go further unfortunately. Regards, Willy
Re: [PATCH] net: dsa: fix mv88e6xxx switches
On Sat, Jan 23, 2016 at 07:48:57PM +, Russell King - ARM Linux wrote: > On Sat, Jan 23, 2016 at 08:37:05PM +0100, Andrew Lunn wrote: > > I'm testing on a 6172. But 6172 and 6176 are both in the same family > > 6352, and share the same driver. > > Hmm, can't be that then. > > > So you initially have lan1 in an bridge. I don't. Running tcpdump on the device i'm trying to ping, there are ARP requests and replies. But the replies are never received by the target, arp -a shows . Looking at the stats counters in debugfs, the packets are counted in in_unicast, but also sw_in_filtered. Port 0 is lan0 and port 5 is the cpu port. root@370-rd:/sys/kernel/debug/dsa0# cat regs GLOBAL GLOBAL2 SERDES 0123456 0: c874 01940 1d0f 1d0f 1d0f 1d0f 100f e076 1: fa0 0 149 33333 c03e3 2: fa0 141 0000000 3: 0 ea1 1721 1721 1721 1721 1721 1721 1721 4: 6000 258 1e0 433 433 433 433 433 373f 433 5: 3000 ff 0 0000000 6: fa01f0f 47f 7e 7d 7c 7b 7a 79 7: 3331707f2001 0 fa1 fa2 fa3 fa40 fa6 8: 3037800 0 2c80 2c80 2c80 2c80 2c80 2c80 2c80 9: 01600 0 1111111 a: 148 0 0 0000000 b: 20001000 0 1248 10 2000 40 c: f0f 7f 0 0000000 d: 0 5f1 3 0000000 e: 0 0 0 0000000 f: 0 f00c000 dada dada dada dada dada dada dada 10: 0 04005 0000000 11: 0 08000 0000000 12: 0 012000080 13: 0 0 00000 220 14: 400 0 0000000 15: 0 0 0000000 16: 0 1 700f 7002 7004 7008 33 330 17: 0 0 0000000 18: fa4115f6 0 3210 3210 3210 3210 3210 3210 3210 19: 0 0 0 7654 7654 7654 7654 7654 7654 7654 1a: 5550 0 42 0000000 1b: 1faf869 0 8000 8000 8000 8000 8000 8000 8000 1c: 0 0 0 0000000 1d: 5ce0 0 0 0000000 1e: 0 0 0 0000000 1f: 0 0 0 0000000 Andrew
Re: [Xen-devel] [PATCH 1/3] xen-netback: fix license ident used in MODULE_LICENSE
On Fri, Jan 22, 2016 at 08:25:21PM +, One Thousand Gnomes wrote: > > The fact what include/linux/license.h:license_is_gpl_compatible includes > > "Dual MIT/GPL" as an option seems to suggest that it is enough of a thing > > to be validly used as the contents of a MODULE_LICENSE() thing. > > Yes. The MIT licence most definitely exists, and people know what it > means. > > Also nobody should be changing the licence on anything unless they have > the written permission of all rights holders on record, so it's best to > leave it be 8) > I knew from the beginning anything related to license will be fun. :-) In this particular case, I don't think I need to get confirmation from all rights holder because they've agreed to the licenses listed in the comment. I'm merely fixing a bug in code. I understand people have different opinion on how this should be interpreted. And I'm not a lawyer. Let's just leave it be now and divert our energy to more useful things in life. Wei. > Alan
Re: [PATCH] net: dsa: fix mv88e6xxx switches
On Sat, Jan 23, 2016 at 08:37:05PM +0100, Andrew Lunn wrote: > I'm testing on a 6172. But 6172 and 6176 are both in the same family > 6352, and share the same driver. Hmm, can't be that then. > So you initially have lan1 in an bridge. I don't. Okay, I've disabled br0 in the debian network/interfaces and rebooted. This means eth0 (connected to the bridge) is initially down, along with all the lan interfaces. root@clearfog:~# brctl show br0 bridge name bridge id STP enabled interfaces br0 can't get info No such device root@clearfog:~# ip link set eth0 up root@clearfog:~# ip link set lan5 up root@clearfog:~# ip addr add 192.168.254.3/24 dev lan5 root@clearfog:~# ping 192.168.254.254 PING 192.168.254.254 (192.168.254.254) 56(84) bytes of data. 64 bytes from 192.168.254.254: icmp_seq=1 ttl=64 time=1.15 ms 64 bytes from 192.168.254.254: icmp_seq=2 ttl=64 time=0.686 ms ^C --- 192.168.254.254 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1001ms rtt min/avg/max/mdev = 0.686/0.921/1.157/0.237 ms root@clearfog:~# ip addr del 192.168.254.3/24 dev lan5 root@clearfog:~# brctl addbr br0 root@clearfog:~# brctl addif br0 lan5 root@clearfog:~# ip link set br0 up root@clearfog:~# ip addr add 192.168.254.3/24 dev br0 root@clearfog:~# ping 192.168.254.254 PING 192.168.254.254 (192.168.254.254) 56(84) bytes of data. 64 bytes from 192.168.254.254: icmp_seq=1 ttl=64 time=1.18 ms ^C --- 192.168.254.254 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 1.189/1.189/1.189/0.000 ms -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
On Fri, Jan 22, 2016 at 02:11:22PM -0500, Jarod Wilson wrote: > The network core tries to keep track of dropped packets, but some packets > you wouldn't really call dropped, so much as intentionally ignored, under > certain circumstances. One such case is that of bonding and team device > slaves that are currently inactive. Their respective rx_handler functions > return RX_HANDLER_EXACT (the only places in the kernel that return that), > which ends up tracking into the network core's __netif_receive_skb_core() > function's drop path, with no pt_prev set. On a noisy network, this can > result in a very rapidly incrementing rx_dropped counter, not only on the > inactive slave(s), but also on the master device, such as the following: > > Inter-| Receive| Transmit > face |bytespackets errs drop fifo frame compressed multicast|bytes > packets errs drop fifo colls carrier compressed > p7p1: 14783346 1404300 1404280 0 0 2040 680 > 8000 0 0 0 > p7p2: 14805198 140648000 0 0 20340 > 0000 0 0 0 > bond0: 53365248 5327980 4211600 0 0115151 2040 > 24000 0 0 0 > lo:5420 54000 0 0 0 5420 > 54000 0 0 0 > p5p1: 19292195 1961970 1403680 0 0 56564 680 > 8000 0 0 0 > p5p2: 19289707 1961710 1403640 0 0 56547 680 > 8000 0 0 0 >em3: 20996626 158214000 0 0 3830 > 0000 0 0 0 >em2: 14065122 138462000 0 0 3100 > 0000 0 0 0 >em1: 14063162 138440000 0 0 3080 > 0000 0 0 0 >em4: 21050830 158729000 0 0 38571662 > 469000 0 0 0 >ib0: 0 0000 0 0 00 >0000 0 0 0 > > In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an > active-backup bond0, and you can see that all three have high drop counts, > with the master bond0 showing a tally of all three. > > I know that this was previously discussed some here: > > http://www.spinics.net/lists/netdev/msg226341.html > > It seems additional counters never came to fruition, but honestly, for > this particular case, I'm not even sure they're warranted, I'd be inclined > to say just silently drop these packets without incrementing a counter. At > least, that's probably what would make someone who has complained loudly > about this issue happy, as they have monitoring tools that are squaking > loudly at any increments to rx_dropped. I completely agree. > CC: "David S. Miller"> CC: Eric Dumazet > CC: Jiri Pirko > CC: Daniel Borkmann > CC: Tom Herbert > CC: Jay Vosburgh > CC: Veaceslav Falico > CC: Andy Gospodarek > CC: netdev@vger.kernel.org > Signed-off-by: Jarod Wilson Acked-by: Andy Gospodarek > --- > net/core/dev.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 8cba3d8..1354c7b 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4153,8 +4153,11 @@ ncls: > else > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); > } else { > + if (deliver_exact) > + goto inactive; /* bond or team inactive slave */ > drop: > atomic_long_inc(>dev->rx_dropped); > +inactive: > kfree_skb(skb); > /* Jamal, now you will not able to escape explaining >* me how you were going to use this. :-) > -- > 1.8.3.1 >
Re: net: GPF in netlink_getsockbyportid
On 01/23/2016 08:25 PM, Florian Westphal wrote: Dmitry Vyukovwrote: [ CC nf-devel, not sure if its nfnetlink fault or NETLINK_MMAP ] The following program causes GPF in netlink_getsockbyportid: // autogenerated by syzkaller (http://github.com/google/syzkaller) #include #include #include #include #include int main() { syscall(SYS_mmap, 0x2000ul, 0xe65000ul, 0x3ul, 0x32ul, 0xul, 0x0ul); int fd = syscall(SYS_socket, 0x10ul, 0x803ul, 0xcul, 0, 0, 0); *(uint32_t*)0x20e64000 = (uint32_t)0x28; *(uint32_t*)0x20e64004 = (uint32_t)0x10; *(uint64_t*)0x20e64008 = (uint64_t)0x0; *(uint64_t*)0x20e64010 = (uint64_t)0x3; *(uint64_t*)0x20e64018 = (uint64_t)0xfff; *(uint16_t*)0x20e64020 = (uint16_t)0x5; syscall(SYS_write, fd, 0x20e64000ul, 0x28ul, 0, 0, 0); return 0; } CONFIG_NETLINK_MMAP and nfnetlink batching strike in unison :-/ root cause is in nfnetlink_rcv_batch(): 296 replay: 297 status = 0; 298 299 skb = netlink_skb_clone(oskb, GFP_KERNEL); The clone op doesn't copy oskb->sk, so we oops in __netlink_alloc_skb -> netlink_getsockbyportid() when nfnetlink_rcv_batch tries to send netlink ack. If indeed oskb is the mmap'ed netlink skb, then it's not even allowed to call into skb_clone() as it would access skb shared info data that can be controlled by the user space mmap buffer, iirc, we had that in the past with nlmon where skb_clone() was accidentally used.
Re: struct pid memory leak
On Sat, Jan 23, 2016 at 07:14:33PM +0100, Dmitry Vyukov wrote: > I've attached my .config. > Also run this program in a parallel loop. I think it's leaking not > every time, probably some race is involved. Thank you. Just in order to confirm, am I supposed to see the messages you quoted in dmesg ? Thanks, Willy
Re: [PATCH] net: dsa: fix mv88e6xxx switches
On Sat, Jan 23, 2016 at 07:06:22PM +, Russell King - ARM Linux wrote: > On Sat, Jan 23, 2016 at 07:15:26PM +0100, Andrew Lunn wrote: > > Thanks for digging into this. > > I hope you saw v2, which is functionally identical. No, not yet. I will go look for it. > > I think this is a step towards a solution, but does not solve all the > > problems. > > > > e.g. I have a switch interface lan0 with the IP address > > 192.168.10.2. I can ping this address from another host. I then take > > the IP address off the interface, create a br0 device, add lan0 to the > > bridge, and put 192.168.10.2 onto the bridge. I should be able to then > > ping the address. But it does not work. > > That works for me. Maybe it's differences in the switch device? I > seem to remember you said your switch was an older generation than > mine (88E6176). I'm testing on a 6172. But 6172 and 6176 are both in the same family 6352, and share the same driver. > root@clearfog:~# brctl show br0 > bridge name bridge id STP enabled interfaces > br0 8000.005043020202 no lan1 > lan2 > lan3 > lan4 > lan5 > lan6 So you initially have lan1 in an bridge. I don't. root@370-rd:~# ip link set lan0 up root@370-rd:~# ip addr add 192.168.10.2/24 dev lan0 root@370-rd:~# ping 192.168.10.1 PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data. 64 bytes from 192.168.10.1: icmp_seq=1 ttl=64 time=1.69 ms 64 bytes from 192.168.10.1: icmp_seq=2 ttl=64 time=0.953 ms 64 bytes from 192.168.10.1: icmp_seq=3 ttl=64 time=0.926 ms root@370-rd:~# ip addr del 192.168.10.2/24 dev lan0 root@370-rd:~# brctl addbr br0 root@370-rd:~# brctl addif br0 lan0 root@370-rd:~# ip link set br0 up root@370-rd:~# ip addr add 192.168.10.2/24 dev br0 root@370-rd:~# ping 192.168.10.1 PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data. ^C --- 192.168.10.1 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 2001ms Andrew
Re: struct pid memory leak
On Sat, Jan 23, 2016 at 10:46 AM, Dmitry Vyukovwrote: > On Sat, Jan 23, 2016 at 7:40 PM, Willy Tarreau wrote: >> On Sat, Jan 23, 2016 at 07:14:33PM +0100, Dmitry Vyukov wrote: >>> I've attached my .config. >>> Also run this program in a parallel loop. I think it's leaking not >>> every time, probably some race is involved. >> >> Thank you. Just in order to confirm, am I supposed to see the >> messages you quoted in dmesg ? > > > I think the simplest way to confirm that you can reproduce it locally > is to check /proc/slabinfo. When I run this program in a parallel > loop, number of objects in pid cache was constantly growing: > > # cat /proc/slabinfo | grep pid > pid 297532576 284 : tunables00 > 0 : slabdata 19 19 0 > ... > pid 412532576 284 : tunables00 > 0 : slabdata 19 19 0 > ... > pid 1107 1176576 284 : tunables00 > 0 : slabdata 42 42 0 > ... > pid 1545 1652576 284 : tunables00 > 0 : slabdata 59 59 0 > > > If you want to use kmemleak, then you need to run this program in a > parallel loop for some time, then stop it and then: > > $ echo scan > /sys/kernel/debug/kmemleak > $ cat /sys/kernel/debug/kmemleak > > If kmemleak has detected any leaks, cat will show them. I noticed that > kmemleak can delay leaks with significant delay, so usually I do scan > at least 5 times. Note that kmemleak is not needed. Just run a normal kernel (eventually using slab_nomerge=1 boot cmd to make sure 'pid' slab is not shared) It seems that bug is rather old, as linux-4.0 has it.
net: GPF in netlink_getsockbyportid
Hello, The following program causes GPF in netlink_getsockbyportid: // autogenerated by syzkaller (http://github.com/google/syzkaller) #include #include #include #include #include int main() { syscall(SYS_mmap, 0x2000ul, 0xe65000ul, 0x3ul, 0x32ul, 0xul, 0x0ul); int fd = syscall(SYS_socket, 0x10ul, 0x803ul, 0xcul, 0, 0, 0); *(uint32_t*)0x20e64000 = (uint32_t)0x28; *(uint32_t*)0x20e64004 = (uint32_t)0x10; *(uint64_t*)0x20e64008 = (uint64_t)0x0; *(uint64_t*)0x20e64010 = (uint64_t)0x3; *(uint64_t*)0x20e64018 = (uint64_t)0xfff; *(uint16_t*)0x20e64020 = (uint16_t)0x5; syscall(SYS_write, fd, 0x20e64000ul, 0x28ul, 0, 0, 0); return 0; } kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: [#1] SMP DEBUG_PAGEALLOC KASAN Modules linked in: CPU: 1 PID: 7519 Comm: syz-executor Not tainted 4.4.0+ #276 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: 8800625e17c0 ti: 880062b5 task.ti: 880062b5 RIP: 0010:[] [] netlink_getsockbyportid+0x30/0x1b0 RSP: 0018:880062b57818 EFLAGS: 00010206 RAX: dc00 RBX: 0002 RCX: 0002 RDX: 0048 RSI: 0002 RDI: 0240 RBP: 880062b57838 R08: 024000c0 R09: R10: R11: R12: R13: 024000c0 R14: R15: 0002 FS: 7fe4328d7700() GS:88003ed0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 7fff7c3ecd80 CR3: 3b0b3000 CR4: 06e0 Stack: 880062b578d0 003c 024000c0 880062b578b8 8536b300 00e0 8175fe26 8800625e1fe0 7379736275732d6b 302d 8764d868 Call Trace: [] __netlink_alloc_skb+0x30/0x790 net/netlink/af_netlink.c:1890 [< inline >] netlink_alloc_skb include/linux/netlink.h:79 [] netlink_ack+0x153/0x520 net/netlink/af_netlink.c:2968 [< inline >] nfnetlink_rcv_batch net/netfilter/nfnetlink.c:321 [] nfnetlink_rcv+0xbad/0x10a0 net/netfilter/nfnetlink.c:477 [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1834 [] netlink_unicast+0x47a/0x700 net/netlink/af_netlink.c:1860 [] netlink_sendmsg+0x1086/0x1760 net/netlink/af_netlink.c:2511 [< inline >] sock_sendmsg_nosec net/socket.c:611 [] sock_sendmsg+0xca/0x110 net/socket.c:621 [] sock_write_iter+0x216/0x3a0 net/socket.c:820 [< inline >] new_sync_write fs/read_write.c:517 [] __vfs_write+0x302/0x480 fs/read_write.c:530 [] vfs_write+0x167/0x4a0 fs/read_write.c:577 [< inline >] SYSC_write fs/read_write.c:624 [] SyS_write+0x111/0x220 fs/read_write.c:616 [] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185 Code: 55 41 54 53 49 89 fc 89 f3 48 83 ec 08 e8 39 b7 20 fc 49 8d bc 24 40 02 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 30 01 00 00 49 8d 7c 24 30 RIP [] netlink_getsockbyportid+0x30/0x1b0 net/netlink/af_netlink.c:1658 RSP ---[ end trace f4ac9332ef80a14f ]--- On commit 30f05309bde49295e02e45c7e615f73aa4e0ccc2.
Re: net: GPF in netlink_getsockbyportid
Dmitry Vyukovwrote: [ CC nf-devel, not sure if its nfnetlink fault or NETLINK_MMAP ] > The following program causes GPF in netlink_getsockbyportid: > > // autogenerated by syzkaller (http://github.com/google/syzkaller) > #include > #include > #include > #include > #include > > int main() > { > syscall(SYS_mmap, 0x2000ul, 0xe65000ul, 0x3ul, 0x32ul, > 0xul, 0x0ul); > int fd = syscall(SYS_socket, 0x10ul, 0x803ul, 0xcul, 0, 0, 0); > *(uint32_t*)0x20e64000 = (uint32_t)0x28; > *(uint32_t*)0x20e64004 = (uint32_t)0x10; > *(uint64_t*)0x20e64008 = (uint64_t)0x0; > *(uint64_t*)0x20e64010 = (uint64_t)0x3; > *(uint64_t*)0x20e64018 = (uint64_t)0xfff; > *(uint16_t*)0x20e64020 = (uint16_t)0x5; > syscall(SYS_write, fd, 0x20e64000ul, 0x28ul, 0, 0, 0); > return 0; > } CONFIG_NETLINK_MMAP and nfnetlink batching strike in unison :-/ root cause is in nfnetlink_rcv_batch(): 296 replay: 297 status = 0; 298 299 skb = netlink_skb_clone(oskb, GFP_KERNEL); The clone op doesn't copy oskb->sk, so we oops in __netlink_alloc_skb -> netlink_getsockbyportid() when nfnetlink_rcv_batch tries to send netlink ack.
Re: struct pid memory leak
On Sat, Jan 23, 2016 at 7:40 PM, Willy Tarreauwrote: > On Sat, Jan 23, 2016 at 07:14:33PM +0100, Dmitry Vyukov wrote: >> I've attached my .config. >> Also run this program in a parallel loop. I think it's leaking not >> every time, probably some race is involved. > > Thank you. Just in order to confirm, am I supposed to see the > messages you quoted in dmesg ? I think the simplest way to confirm that you can reproduce it locally is to check /proc/slabinfo. When I run this program in a parallel loop, number of objects in pid cache was constantly growing: # cat /proc/slabinfo | grep pid pid 297532576 284 : tunables00 0 : slabdata 19 19 0 ... pid 412532576 284 : tunables00 0 : slabdata 19 19 0 ... pid 1107 1176576 284 : tunables00 0 : slabdata 42 42 0 ... pid 1545 1652576 284 : tunables00 0 : slabdata 59 59 0 If you want to use kmemleak, then you need to run this program in a parallel loop for some time, then stop it and then: $ echo scan > /sys/kernel/debug/kmemleak $ cat /sys/kernel/debug/kmemleak If kmemleak has detected any leaks, cat will show them. I noticed that kmemleak can delay leaks with significant delay, so usually I do scan at least 5 times.
Re: [PATCH] net: dsa: fix mv88e6xxx switches
On Sat, Jan 23, 2016 at 10:51:16AM +, Russell King wrote: > Since commit 76e398a62712 ("net: dsa: use switchdev obj for VLAN add/del > ops"), the Marvell 88E6xxx switch has been unable to pass traffic > between ports - any received traffic is discarded by the switch. > Taking a port out of bridge mode and configuring a vlan on it also the > port to start passing traffic. > > With the debugfs files re-instated to allow debug of this issue by > comparing the register settings between the working and non-working > case, the reason becomes clear: > > GLOBAL GLOBAL2 SERDES 0123456 > - 7: 707f2001 2222202 > + 7: 707f2001 1111101 > > Register 7 for the ports is the default vlan tag register, and in the > non-working setup, it has been set to 2, despite vlan 2 not being > configured. This causes the switch to drop all packets coming in to > these ports. The working setup has the default vlan tag register set > to 1, which is the default vlan when none is configured. > > Inspection of the code reveals why. The code prior to this commit > was: > > - for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) { > ... > - if (!err && vlan->flags & BRIDGE_VLAN_INFO_PVID) > - err = ds->drv->port_pvid_set(ds, p->port, vid); > > but the new code is: > > + for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) { > ... > + } > ... > + if (pvid) > + err = _mv88e6xxx_port_pvid_set(ds, port, vid); > > This causes the new code to always set the default vlan to one higher > than the old code. > > Fix this. > > Fixes: 76e398a62712 ("net: dsa: use switchdev obj for VLAN add/del ops") > Cc:> Signed-off-by: Russell King Hi Russell Thanks for digging into this. I think this is a step towards a solution, but does not solve all the problems. e.g. I have a switch interface lan0 with the IP address 192.168.10.2. I can ping this address from another host. I then take the IP address off the interface, create a br0 device, add lan0 to the bridge, and put 192.168.10.2 onto the bridge. I should be able to then ping the address. But it does not work. Andrew
Re: [PATCH] net: dsa: fix mv88e6xxx switches
On Sat, Jan 23, 2016 at 07:15:26PM +0100, Andrew Lunn wrote: > Thanks for digging into this. I hope you saw v2, which is functionally identical. > I think this is a step towards a solution, but does not solve all the > problems. > > e.g. I have a switch interface lan0 with the IP address > 192.168.10.2. I can ping this address from another host. I then take > the IP address off the interface, create a br0 device, add lan0 to the > bridge, and put 192.168.10.2 onto the bridge. I should be able to then > ping the address. But it does not work. That works for me. Maybe it's differences in the switch device? I seem to remember you said your switch was an older generation than mine (88E6176). root@clearfog:~# brctl show br0 bridge name bridge id STP enabled interfaces br0 8000.005043020202 no lan1 lan2 lan3 lan4 lan5 lan6 root@clearfog:~# ip addr show|grep 192.168.254 root@clearfog:~# ip addr add 192.168.254.3/24 dev br0 root@clearfog:~# ping 192.168.254.2 PING 192.168.254.2 (192.168.254.2) 56(84) bytes of data. 64 bytes from 192.168.254.2: icmp_seq=1 ttl=255 time=0.648 ms ^C --- 192.168.254.2 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.648/0.648/0.648/0.000 ms root@clearfog:~# ip addr del 192.168.254.3/24 dev br0 root@clearfog:~# brctl delif br0 lan1 root@clearfog:~# ip addr add 192.168.254.3/24 dev lan1 root@clearfog:~# ping 192.168.254.2 PING 192.168.254.2 (192.168.254.2) 56(84) bytes of data. 64 bytes from 192.168.254.2: icmp_seq=1 ttl=255 time=0.331 ms ^C --- 192.168.254.2 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.331/0.331/0.331/0.000 ms root@clearfog:~# ip addr del 192.168.254.3/24 dev lan1 root@clearfog:~# brctl addif br0 lan1 root@clearfog:~# ip addr add 192.168.254.3/24 dev br0 root@clearfog:~# ping 192.168.254.2 PING 192.168.254.2 (192.168.254.2) 56(84) bytes of data. 64 bytes from 192.168.254.2: icmp_seq=1 ttl=255 time=0.630 ms ^C --- 192.168.254.2 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.630/0.630/0.630/0.000 ms I have no other patches for DSA other than the v2 I posted on this kernel. I'd suggest adding back the debugfs register dumping so you can get some diagnostics, and work out what's different in the configuration between a working kernel and a non-working kernel: https://github.com/vivien/linux/commit/b23c6568011c84b80a8f5632ac819a5dbdca2dc1.patch However, what I do notice is that trying to use vlans is a complete fail - the switch strips _all_ vlan tags off the outgoing packets no matter how I setup a vlan, whether it's on br0, or on a switch port: Packet as observed on a host connected to the switch port with no vlan capability: 0x: 0050 4302 0202 0806 0001 0x0010: 0800 0604 0001 0050 4302 0202 c0a8 fe03 ... Packet observed on Armada 388's eth interface connected to the 88E6176 switch: 0x: 0050 4302 0202 dada 0x0010: 6030 0800 0806 0001 0800 0604 0001 0050 ... The packet should appear on the wire with the 8100 0800 vlan protocol ID + tag, but it appears with the vlan protocol and tag stripped. So there's yet more issues here. Incoming packets from the interface appear to be dropped as far as I can tell - the response to those ARP packets doesn't appear, though I'm not convinced that I've been able to get the only vlan-capable device on that side of the network to be setup correctly for vlans - but obviously it's a total loss if vlan transmission isn't working. -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Re: [PATCH] net: dsa: fix mv88e6xxx switches
> This shows port 0 is on vlan 0, but it should default to vlan 1 when > no vlans are configured. The patch below should at least allow some > diagnosis of what's being requested, and when. > > diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c > index a43354ed0607..8a9cf67eb16d 100644 > --- a/drivers/net/dsa/mv88e6xxx.c > +++ b/drivers/net/dsa/mv88e6xxx.c > @@ -1511,6 +1511,9 @@ int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int > port, > u16 vid; > int err = 0; > > + printk("%s: port %d vid %u-%u flags %x\n", > + __func__, port, vlan->vid_begin, vlan->vid_end, vlan->flags); > + Hi Russell Never called. So i guess we have a kernel configuration difference. I don't have CONFIG_BRIDGE_VLAN_FILTERING. But DSA should not rely on this option for correct operation. Andrew
Re: [PATCH] net: dsa: fix mv88e6xxx switches
On Sat, Jan 23, 2016 at 11:12:32PM +0100, Andrew Lunn wrote: > > This shows port 0 is on vlan 0, but it should default to vlan 1 when > > no vlans are configured. The patch below should at least allow some > > diagnosis of what's being requested, and when. > > > > diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c > > index a43354ed0607..8a9cf67eb16d 100644 > > --- a/drivers/net/dsa/mv88e6xxx.c > > +++ b/drivers/net/dsa/mv88e6xxx.c > > @@ -1511,6 +1511,9 @@ int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, > > int port, > > u16 vid; > > int err = 0; > > > > + printk("%s: port %d vid %u-%u flags %x\n", > > + __func__, port, vlan->vid_begin, vlan->vid_end, vlan->flags); > > + > > Hi Russell > > Never called. > > So i guess we have a kernel configuration difference. > > I don't have CONFIG_BRIDGE_VLAN_FILTERING. Yep, confirmed. When i enable this option, and VLAN_8021Q which it depends on, i then have a working bridge. Andrew
Re: [PATCH] net: dsa: fix mv88e6xxx switches
On Sat, Jan 23, 2016 at 11:23:26PM +0100, Andrew Lunn wrote: > On Sat, Jan 23, 2016 at 11:12:32PM +0100, Andrew Lunn wrote: > > Hi Russell > > > > Never called. > > > > So i guess we have a kernel configuration difference. > > > > I don't have CONFIG_BRIDGE_VLAN_FILTERING. > > Yep, confirmed. When i enable this option, and VLAN_8021Q which it > depends on, i then have a working bridge. It sounds like Vivien urgently needs to do some extra work to fix the patches that created all this broken-ness, or we need to ask davem to revert back to a known good working state. -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Re: struct pid memory leak
On Sat, Jan 23, 2016 at 07:46:45PM +0100, Dmitry Vyukov wrote: > On Sat, Jan 23, 2016 at 7:40 PM, Willy Tarreauwrote: > > On Sat, Jan 23, 2016 at 07:14:33PM +0100, Dmitry Vyukov wrote: > >> I've attached my .config. > >> Also run this program in a parallel loop. I think it's leaking not > >> every time, probably some race is involved. > > > > Thank you. Just in order to confirm, am I supposed to see the > > messages you quoted in dmesg ? > > > I think the simplest way to confirm that you can reproduce it locally > is to check /proc/slabinfo. When I run this program in a parallel > loop, number of objects in pid cache was constantly growing: > > # cat /proc/slabinfo | grep pid > pid 297532576 284 : tunables00 > 0 : slabdata 19 19 0 > ... > pid 412532576 284 : tunables00 > 0 : slabdata 19 19 0 > ... > pid 1107 1176576 284 : tunables00 > 0 : slabdata 42 42 0 > ... > pid 1545 1652576 284 : tunables00 > 0 : slabdata 59 59 0 OK got it and indeed I can see it grow. In fact, the active column grows and once it reaches the num objects, this one grows in turn, which makes sense. All I can say now is that it doesn't need to run over multiple processes to leak, though that makes it easier. SMP is not needed either. > If you want to use kmemleak, then you need to run this program in a > parallel loop for some time, then stop it and then: > > $ echo scan > /sys/kernel/debug/kmemleak > $ cat /sys/kernel/debug/kmemleak > > If kmemleak has detected any leaks, cat will show them. I noticed that > kmemleak can delay leaks with significant delay, so usually I do scan > at least 5 times. Thank you for these information. I've tested on an older (3.14) kernel and I can see the effect there as well. I don't have "pid" in slabinfo, but launching 1000 processes at a time uses a few tens to hundreds kB of RAM on each round. 3.10 doesn't seem affected, I'm seeing the memory grow to a fixed point if I increase the number of parallel processes but then even after a few tens of thousands of processes, the reported used memory doesn't seem to increase (remember no "pid" entry here). kmemleak indeed reports me something on 3.14 which seems to match your trace as I'm seeing bash as the process (instead of syz-executor in your case) and alloc_pid() calls kmem_cache_alloc() : Unreferenced object 0x88003facd000 (size 128): comm "bash", pid 1822, jiffies 4294951223 (age 15.280s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 backtrace: [] kmem_cache_alloc+0x92/0xe0 [] alloc_pid+0x24/0x4a0 [] cpumask_any_but+0x23/0x40 [] copy_process.part.66+0x1068/0x16e0 [] n_tty_write+0x37b/0x4f0 [] tty_write+0x1c1/0x2a0 [] do_fork+0xe0/0x340 [] __set_task_blocked+0x30/0x80 [] __set_current_blocked+0x38/0x60 [] stub_clone+0x69/0x90 [] system_call_fastpath+0x16/0x1b [] 0x It doesn't report this on 3.10. Unfortunately I feel totally incompetent on the subject :-/ Willy
Re: struct pid memory leak
On Sat, Jan 23, 2016 at 06:50:11PM -0800, Eric Dumazet wrote: > On Sat, Jan 23, 2016 at 6:38 PM, Willy Tarreauwrote: > > On Sun, Jan 24, 2016 at 03:11:45AM +0100, Willy Tarreau wrote: > >> It doesn't report this on 3.10. > > > > To be more precise, kmemleak reports the issue on 3.13 and not on 3.12. > > I'm not sure if it's reliable enough to run a bisect though. > > > > Willy > > > > I have the leak on linux-3.11. > > I believe even linux-3.3 gets the leak, although I had to wait about > one hour to be confident the leak was there. OK so I'm stopping my bisect. It's possible it's affected with some option which changed along the various "make oldconfig" at each step. Thanks for letting me know. Willy
[PATCH] 82xx: FCC: Fixing a bug causing to FCC port lock-up (second try)
This is an additional patch to the one already submitted recently. The previous patch was not complete, and the FCC port lock-up scenario has been reproduced in lab. I had an opportunity to check the current patch in lab and the FCC port lock no longer freezes, while the previous patch still locks-up the FCC port. The current patch fixes a pointer arithmetic bug (second bug in the same line), which leads FCC port lock-up during underrun/collision handling. Within the tx_startup() function in mac-fcc.c, the address of last BD is not calculated correctly. As a result of wrong calculation of the last BD address, the next transmitted BD may be set to an area out of the transmit BD ring. This actually causes to port lock-up and it is not recoverable. Signed-off-by: Martin Roth--- drivers/net/ethernet/freescale/fs_enet/mac-fcc.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c b/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c index 52e0091..1ba359f 100644 --- a/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c +++ b/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c @@ -552,7 +552,7 @@ static void tx_restart(struct net_device *dev) cbd_t __iomem *prev_bd; cbd_t __iomem *last_tx_bd; - last_tx_bd = fep->tx_bd_base + ((fpi->tx_ring - 1) * sizeof(cbd_t)); + last_tx_bd = fep->tx_bd_base + (fpi->tx_ring - 1); /* get the current bd held in TBPTR and scan back from this point */ recheck_bd = curr_tbptr = (cbd_t __iomem *) -- 1.7.9.5
Re: net: GPF in netlink_getsockbyportid
Daniel Borkmannwrote: > On 01/23/2016 08:25 PM, Florian Westphal wrote: > >Dmitry Vyukov wrote: > > > >[ CC nf-devel, not sure if its nfnetlink fault or NETLINK_MMAP ] > > > >>The following program causes GPF in netlink_getsockbyportid: [..] > >CONFIG_NETLINK_MMAP and nfnetlink batching strike in unison :-/ > > > >root cause is in nfnetlink_rcv_batch(): > > > >296 replay: > >297 status = 0; > >298 > >299 skb = netlink_skb_clone(oskb, GFP_KERNEL); > > > >The clone op doesn't copy oskb->sk, so we oops in > >__netlink_alloc_skb -> netlink_getsockbyportid() when nfnetlink_rcv_batch > >tries to send netlink ack. > > If indeed oskb is the mmap'ed netlink skb, then it's not even allowed > to call into skb_clone() Right, but in this case there is no mmap'd netlink sk involved -- we crash when we try to look up dst netlink socket to see if there is an mmap'd ring attached. [ and that code isn't there with CONFIG_NETLINK_MMAP=n ].
Re: struct pid memory leak
On Sun, Jan 24, 2016 at 03:11:45AM +0100, Willy Tarreau wrote: > It doesn't report this on 3.10. To be more precise, kmemleak reports the issue on 3.13 and not on 3.12. I'm not sure if it's reliable enough to run a bisect though. Willy
Re: [PATCH] net: dsa: fix mv88e6xxx switches
On Sat, Jan 23, 2016 at 09:16:47PM +0100, Andrew Lunn wrote: > On Sat, Jan 23, 2016 at 07:48:57PM +, Russell King - ARM Linux wrote: > > On Sat, Jan 23, 2016 at 08:37:05PM +0100, Andrew Lunn wrote: > > > I'm testing on a 6172. But 6172 and 6176 are both in the same family > > > 6352, and share the same driver. > > > > Hmm, can't be that then. > > > > > So you initially have lan1 in an bridge. I don't. > > Running tcpdump on the device i'm trying to ping, there are ARP > requests and replies. But the replies are never received by the > target, arp -a shows . > > Looking at the stats counters in debugfs, the packets are counted in > in_unicast, but also sw_in_filtered. > > Port 0 is lan0 and port 5 is the cpu port. > > root@370-rd:/sys/kernel/debug/dsa0# cat regs > GLOBAL GLOBAL2 SERDES 0123456 > 0: c874 01940 1d0f 1d0f 1d0f 1d0f 100f e076 > 1: fa0 0 149 33333 c03e3 > 2: fa0 141 0000000 > 3: 0 ea1 1721 1721 1721 1721 1721 1721 1721 > 4: 6000 258 1e0 433 433 433 433 433 373f 433 > 5: 3000 ff 0 0000000 > 6: fa01f0f 47f 7e 7d 7c 7b 7a 79 > 7: 3331707f2001 0 fa1 fa2 fa3 fa40 fa6 This shows port 0 is on vlan 0, but it should default to vlan 1 when no vlans are configured. The patch below should at least allow some diagnosis of what's being requested, and when. diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c index a43354ed0607..8a9cf67eb16d 100644 --- a/drivers/net/dsa/mv88e6xxx.c +++ b/drivers/net/dsa/mv88e6xxx.c @@ -1511,6 +1511,9 @@ int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port, u16 vid; int err = 0; + printk("%s: port %d vid %u-%u flags %x\n", + __func__, port, vlan->vid_begin, vlan->vid_end, vlan->flags); + mutex_lock(>smi_mutex); for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) { -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Re: [PATCH/RFC v3 net-next] ravb: Add dma queue interrupt support
Hello. On 01/17/2016 01:55 PM, Yoshihiro Kaneko wrote: From: Kazuya MizuguchiThis patch supports the following interrupts. - One interrupt for multiple (descriptor, error, management) - One interrupt for emac - Four interrupts for dma queue (best effort rx/tx, network control rx/tx) This patch improve efficiency of the interrupt handler by adding the interrupt handler corresponding to each interrupt source described above. Additionally, it reduces the number of times of the access to EthernetAVB IF. Not sure I got the last sentence right... Did you mean that we save on the register reads? Yet another reason could be that we don't want to depend on the boot loader's whim any more... Signed-off-by: Kazuya Mizuguchi Signed-off-by: Yoshihiro Kaneko --- This patch is based on the master branch of David Miller's next networking tree. v3 [Yoshihiro Kaneko] * compile tested only Doesn't sound very encouraging... couldn't you manage to re-test the patch before Dave merges it? [...] diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h index 9fbe92a..385f06b 100644 --- a/drivers/net/ethernet/renesas/ravb.h +++ b/drivers/net/ethernet/renesas/ravb.h [...] @@ -556,6 +566,18 @@ enum ISS_BIT { ISS_DPS15 = 0x8000, }; +/* CIE + * R-Car Gen3 only + */ I'd be more happy with one line comment like: /* CIE (R-Car Gen3 only) */ [...] @@ -592,6 +614,212 @@ enum GIS_BIT { GIS_PTMF= 0x0004, }; +/* GIE + * R-Car Gen3 only + */ +enum GIE_BIT { [...] + GIE_ALL = 0x03ff, GIE_ALL isn't used, no need to define it. +}; + +/* GID + * R-Car Gen3 only + */ +enum GID_BIT { [...] + GID_ALL = 0x03ff, Not used any more. +}; + +/* RIE0 + * R-Car Gen3 only + */ +enum RIE0_BIT { [...] + RIE0_ALL= 0x0003, Likewise. +}; + +/* RID0 + * R-Car Gen3 only + */ +enum RID0_BIT { [...] + RID0_ALL= 0x0003, Likewise. +}; + +/* RIE2 + * R-Car Gen3 only + */ +enum RIE2_BIT { [...] + RIE2_ALL= 0x8003, Likewise. +}; + +/* RID2 + * R-Car Gen3 only + */ +enum RID2_BIT { + RID2_ALL= 0x8003, Likewise. +}; + +/* TIE + * R-Car Gen3 only + */ +enum TIE_BIT { [...] + TIE_ALL = 0x000f0f0f, Likewise. +}; + +/* TID + * R-Car Gen3 only + */ +enum TID_BIT { [...] + TID_ALL = 0x000f0f0f, Likewise. [...] diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index ac43ed9..4c4912e0 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -42,6 +42,16 @@ NETIF_MSG_RX_ERR | \ NETIF_MSG_TX_ERR) +static const char *ravb_rx_irqs[NUM_RX_QUEUE] = { + "ch0", /* RAVB_BE */ + "ch1", /* RAVB_NC */ +}; + +static const char *ravb_tx_irqs[NUM_TX_QUEUE] = { + "ch18", /* RAVB_BE */ + "ch19", /* RAVB_NC */ +}; + Do what you wish but I don't like these... [...] @@ -693,7 +722,7 @@ static void ravb_error_interrupt(struct net_device *ndev) if (ris2 & RIS2_QFF0) priv->stats[RAVB_BE].rx_over_errors++; - /* Receive Descriptor Empty int */ + /* Receive Descriptor Empty int */ It's not even a "drove-by" change, please don't do this here, submit a separate patch (or I can do it). [...] @@ -773,6 +829,55 @@ static irqreturn_t ravb_interrupt(int irq, void *dev_id) return result; } +static irqreturn_t ravb_dmaq_interrupt(int irq, void *dev_id, int ravb_queue) +{ + struct net_device *ndev = dev_id; + struct ravb_private *priv = netdev_priv(ndev); + irqreturn_t result = IRQ_NONE; + u32 ris0, ric0, tis, tic; + int q = ravb_queue; + + spin_lock(>lock); + + ris0 = ravb_read(ndev, RIS0); + ric0 = ravb_read(ndev, RIC0); + tis = ravb_read(ndev, TIS); + tic = ravb_read(ndev, TIC); + + /* Timestamp updated */ + if (tis & TIS_TFUF) { + ravb_write(ndev, TID_TFUD, TID); + ravb_get_tx_tstamp(ndev); + result = IRQ_HANDLED; + } + + /* Best effort queue RX/TX */ + if (((ris0 & ric0) & BIT(q)) || + ((tis & tic) & BIT(q))) { + if (napi_schedule_prep(>napi[q])) { + /* Mask RX and TX interrupts */ + ravb_write(ndev, BIT(q), RID0); + ravb_write(ndev, BIT(q), TID); + __napi_schedule(>napi[q]); + } There was an *else* branch originally (napi_schedule_prep()'s failure is a serious issue deserving an explanation), not sure why you dropped it here... [...] @@ -1215,29 +1325,63 @@ static const struct
Re: struct pid memory leak
On Sat, Jan 23, 2016 at 6:38 PM, Willy Tarreauwrote: > On Sun, Jan 24, 2016 at 03:11:45AM +0100, Willy Tarreau wrote: >> It doesn't report this on 3.10. > > To be more precise, kmemleak reports the issue on 3.13 and not on 3.12. > I'm not sure if it's reliable enough to run a bisect though. > > Willy > I have the leak on linux-3.11. I believe even linux-3.3 gets the leak, although I had to wait about one hour to be confident the leak was there.
[PATCH] net: fec: make driver endian-safe
The driver treats the device descriptors as CPU-endian, which is probably right on both ARM (little-endian) and PowerPC (big-endian) but wrong on big-endian ARM. Add the correct annotations and byteswaps. This gets it working on i.MX6 hummingboard booted in big-endian mode. Signed-off-by: Johannes Berg--- drivers/net/ethernet/freescale/Makefile | 2 + drivers/net/ethernet/freescale/fec.h | 39 ++--- drivers/net/ethernet/freescale/fec_main.c | 130 -- 3 files changed, 99 insertions(+), 72 deletions(-) diff --git a/drivers/net/ethernet/freescale/Makefile b/drivers/net/ethernet/freescale/Makefile index 71debd1c18c9..64ddc0bd2735 100644 --- a/drivers/net/ethernet/freescale/Makefile +++ b/drivers/net/ethernet/freescale/Makefile @@ -4,6 +4,8 @@ obj-$(CONFIG_FEC) += fec.o fec-objs :=fec_main.o fec_ptp.o +CFLAGS_fec_main.o := -D__CHECK_ENDIAN__ + obj-$(CONFIG_FEC_MPC52xx) += fec_mpc52xx.o ifeq ($(CONFIG_FEC_MPC52xx_MDIO),y) obj-$(CONFIG_FEC_MPC52xx) += fec_mpc52xx_phy.o diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h index 99d33e2d35e6..801dcc7fa6e6 100644 --- a/drivers/net/ethernet/freescale/fec.h +++ b/drivers/net/ethernet/freescale/fec.h @@ -191,27 +191,44 @@ /* * Define the buffer descriptor structure. */ +/* buffer endianness appears to be a mess ... ARM is usually LE but can be BE */ +#if defined(CONFIG_ARM) && defined(CONFIG_CPU_BIG_ENDIAN) +#define fec32_to_cpu le32_to_cpu +#define fec16_to_cpu le16_to_cpu +#define cpu_to_fec32 cpu_to_le32 +#define cpu_to_fec16 cpu_to_le16 +#define __fec32 __le32 +#define __fec16 __le16 +#else +#define fec32_to_cpu be32_to_cpu +#define fec16_to_cpu be16_to_cpu +#define cpu_to_fec32 cpu_to_be32 +#define cpu_to_fec16 cpu_to_be16 +#define __fec32 __be32 +#define __fec16 __be16 +#endif + #if defined(CONFIG_ARCH_MXC) || defined(CONFIG_SOC_IMX28) struct bufdesc { - unsigned short cbd_datlen; /* Data length */ - unsigned short cbd_sc; /* Control and status info */ - unsigned long cbd_bufaddr; /* Buffer address */ + __fec16 cbd_datlen; /* Data length */ + __fec16 cbd_sc; /* Control and status info */ + __fec32 cbd_bufaddr;/* Buffer address */ }; #else struct bufdesc { - unsigned short cbd_sc; /* Control and status info */ - unsigned short cbd_datlen; /* Data length */ - unsigned long cbd_bufaddr;/* Buffer address */ + __fec16 cbd_sc; /* Control and status info */ + __fec16 cbd_datlen; /* Data length */ + __fec32 cbd_bufaddr;/* Buffer address */ }; #endif struct bufdesc_ex { struct bufdesc desc; - unsigned long cbd_esc; - unsigned long cbd_prot; - unsigned long cbd_bdu; - unsigned long ts; - unsigned short res0[4]; + __fec32 cbd_esc; + __fec32 cbd_prot; + __fec32 cbd_bdu; + __fec32 ts; + __fec16 res0[4]; }; /* diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index d2328fc5da57..8e81c4de1f41 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -331,11 +331,13 @@ static void fec_dump(struct net_device *ndev) bdp = txq->tx_bd_base; do { - pr_info("%3u %c%c 0x%04x 0x%08lx %4u %p\n", + pr_info("%3u %c%c 0x%04x 0x%08x %4u %p\n", index, bdp == txq->cur_tx ? 'S' : ' ', bdp == txq->dirty_tx ? 'H' : ' ', - bdp->cbd_sc, bdp->cbd_bufaddr, bdp->cbd_datlen, + fec16_to_cpu(bdp->cbd_sc), + fec32_to_cpu(bdp->cbd_bufaddr), + fec16_to_cpu(bdp->cbd_datlen), txq->tx_skbuff[index]); bdp = fec_enet_get_nextdesc(bdp, fep, 0); index++; @@ -388,7 +390,7 @@ fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq, bdp = fec_enet_get_nextdesc(bdp, fep, queue); ebdp = (struct bufdesc_ex *)bdp; - status = bdp->cbd_sc; + status = fec16_to_cpu(bdp->cbd_sc); status &= ~BD_ENET_TX_STATS; status |= (BD_ENET_TX_TC | BD_ENET_TX_READY); frag_len = skb_shinfo(skb)->frags[frag].size; @@ -410,7 +412,7 @@ fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq, if (skb->ip_summed == CHECKSUM_PARTIAL) estatus |= BD_ENET_TX_PINS | BD_ENET_TX_IINS; ebdp->cbd_bdu = 0; - ebdp->cbd_esc = estatus; + ebdp->cbd_esc = cpu_to_fec32(estatus); } bufaddr = page_address(this_frag->page.p) + this_frag->page_offset; @@ -434,9 +436,9
Re: [PATCH] net: dsa: fix mv88e6xxx switches
Hi Russell, Russell King - ARM Linuxwrites: > On Sat, Jan 23, 2016 at 11:23:26PM +0100, Andrew Lunn wrote: >> On Sat, Jan 23, 2016 at 11:12:32PM +0100, Andrew Lunn wrote: >> > Hi Russell >> > >> > Never called. >> > >> > So i guess we have a kernel configuration difference. >> > >> > I don't have CONFIG_BRIDGE_VLAN_FILTERING. >> >> Yep, confirmed. When i enable this option, and VLAN_8021Q which it >> depends on, i then have a working bridge. > > It sounds like Vivien urgently needs to do some extra work to fix > the patches that created all this broken-ness, or we need to ask > davem to revert back to a known good working state. Indeed hardware bridging currently relies on 802.1Q and VLAN filtering enabled. I will work on this asap. Thanks, -v
Re: net: fec: make driver endian-safe
On Sat, 2016-01-23 at 23:09 +0100, Johannes Berg wrote: > > +/* buffer endianness appears to be a mess ... ARM is usually LE but > can be BE */ > +#if defined(CONFIG_ARM) && defined(CONFIG_CPU_BIG_ENDIAN) Just realized that this is, of course, completely wrong. If the ARM CPU is little endian, the device is of course *still* little endian, it doesn't magically change itself with the CONFIG_CPU_BIG_ENDIAN setting... :) Ergo, the #if part should be taken, rather than the #else part, to still define the descriptors as __le and just have no byteswapping done. Also, as Arnd had pointed out last night but I wasn't coherent enough to fully understand, it's exceedingly likely that > #if defined(CONFIG_ARCH_MXC) || defined(CONFIG_SOC_IMX28) this needs to be changed and that the device is simply "generated" as little-endian for ARM SoCs. He also pointed out that there could be multiple CONFIG_SOC or CONFIG_ARCH definitions (if I understood correctly), so this would be wrong anyway. It's tempting to combine both ifdefs and change them to just CONFIG_ARM but I have no way of verifying that on anything other than i.MX6 (on the hummingboard), nor do I even know which SoCs ship with this block. johannes