Re: 4.19.4 nf_conntrack_count kernel panic
On 2018-11-26 21:46, Sami Farin wrote:

4.18.20 works OK, but unfortunately the 4.18 series is EOL. I have a Ryzen 1600X, 32 GB RAM, Fedora 28, gcc-8.2.1-5, nosmt=force, the igb module for an Intel I211, using XFS filesystems only. To reproduce, I only do this: connect to a VPN using a tunnel (e.g. tun0), start downloading a file with qbittorrent (allow a port for incoming TCP connections in qbittorrent and iptables) and wait a couple of minutes. I am also using the ipset and connlimit modules. I have reproduced this bug three times. With 4.18 I use fq+htb and with 4.19 I use CAKE for traffic control.

Only this message appears in the kernel log, with both 4.18.20 and 4.19.4:

[ 363.935074] TCP: request_sock_TCP: Possible SYN flooding on port 19044. Dropping request. Check SNMP counters.

RIP: 0010:rb_insert_color+0x64
Call Trace:
 nf_conntrack_count [nf_conncount]
 ip_set_test [ip_set]
 connlimit_mt [xt_connlimit]
 set_match_v4 [xt_set]
 ipt_do_table [ip_tables]
 ip_route_input_noref
 nf_hook_slow
 ip_local_deliver
 inet_add_protocol
 ip_rcv
 ip_rcv_finish_core
 __netif_receive_skb_one_core
 netif_receive_skb_internal
 tun_rx_batched
 tun_get_user
 __local_bh_enable_ip
 tun_get_user
 tun_chr_write_iter
 __vfs_write
 vfs_write
 ksys_write
 do_syscall_64
 trace_hardirqs_off_thunk
 entry_SYSCALL_64_after_hwframe
...
Kernel panic - not syncing: Fatal exception in interrupt

Check these patches: https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=73972&state=*
Relevant discussion: https://marc.info/?l=linux-netdev&m=154211826106430&w=2
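For context: the patchwork series linked above reworks nf_conncount so that its list and rb-tree manipulations happen under a lock, and a crash inside rb_insert_color() is the classic signature of two CPUs inserting into a shared rbtree at once. Below is a minimal sketch of the required locking pattern; the names (conn_tree, conn_lock, struct conn_node) are invented for illustration, and this is not the actual nf_conncount code:

/* Illustrative sketch only, NOT the nf_conncount code: a shared kernel
 * rbtree needs all inserts serialized by one lock, otherwise concurrent
 * rb_link_node()/rb_insert_color() calls corrupt parent/colour pointers
 * and crash exactly where this report does, in rb_insert_color().
 */
#include <linux/rbtree.h>
#include <linux/spinlock.h>
#include <linux/types.h>

static struct rb_root conn_tree = RB_ROOT;
static DEFINE_SPINLOCK(conn_lock);	/* protects conn_tree */

struct conn_node {
	struct rb_node node;
	u32 key;
};

static void conn_tree_insert(struct conn_node *new)
{
	struct rb_node **p, *parent = NULL;

	spin_lock_bh(&conn_lock);	/* _bh: we may run in softirq context */
	p = &conn_tree.rb_node;
	while (*p) {
		struct conn_node *c = rb_entry(*p, struct conn_node, node);

		parent = *p;
		p = new->key < c->key ? &(*p)->rb_left : &(*p)->rb_right;
	}
	rb_link_node(&new->node, parent, p);
	rb_insert_color(&new->node, &conn_tree);	/* rebalance under the lock */
	spin_unlock_bh(&conn_lock);
}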
4.15.13 kernel panic, ip_rcv_finish, nf_xfrm_me_harder warnings continue to fill dmesg
Apr 11 18:01:34 [99194.935520] general protection fault: [#1] SMP
Apr 11 18:01:34 [99194.935998] Modules linked in: pppoe pppox ppp_generic slhc ip_set_hash_net xt_nat xt_string xt_connmark xt_TCPMSS xt_mark xt_CT xt_set xt_tcpudp ip_set_bitmap_port ip_set nfnetlink iptable_raw iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables netconsole configfs 8021q garp mrp stp llc ixgbe dca ipv6
Apr 11 18:01:34 [99194.938313] CPU: 23 PID: 150 Comm: ksoftirqd/23 Tainted: G W 4.15.13-build-0135 #4
Apr 11 18:01:34 [99194.939258] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.04.0003.102320141138 10/23/2014
Apr 11 18:01:34 [99194.940189] RIP: 0010:ip_rcv_finish+0x2b5/0x2e5
Apr 11 18:01:34 [99194.940716] RSP: 0018:c9000cad7cf8 EFLAGS: 00010286
Apr 11 18:01:34 [99194.941214] RAX: 00e2476d RBX: 88178f944400 RCX: 88179a32a800
Apr 11 18:01:34 [99194.941741] RDX: 88178f944400 RSI: RDI: 88178f944400
Apr 11 18:01:34 [99194.942234] RBP: 882fd580d000 R08: 0001 R09: 882fd034ee00
Apr 11 18:01:34 [99194.942771] R10: c9000cad7b58 R11: e92316b9 R12: 88179a32a8d6
Apr 11 18:01:34 [99194.943286] R13: 882fd580d000 R14: 00ea0008 R15: 882fd580d078
Apr 11 18:01:34 [99194.943821] FS: () GS:88303fcc() knlGS:
Apr 11 18:01:34 [99194.944779] CS: 0010 DS: ES: CR0: 80050033
Apr 11 18:01:34 [99194.945287] CR2: 7f8bb37888f0 CR3: 00303e209003 CR4: 001606e0
Apr 11 18:01:34 [99194.945808] Call Trace:
Apr 11 18:01:34 [99194.946307] ip_rcv+0x2f2/0x325
Apr 11 18:01:34 [99194.946816] ? ip_local_deliver_finish+0x187/0x187
Apr 11 18:01:34 [99194.947331] __netif_receive_skb_core+0x81c/0x89c
Apr 11 18:01:34 [99194.947872] ? napi_complete_done+0xb4/0xba
Apr 11 18:01:34 [99194.948391] ? ixgbe_poll+0xf96/0x104d [ixgbe]
Apr 11 18:01:34 [99194.948931] ? process_backlog+0x8b/0x10d
Apr 11 18:01:34 [99194.949424] process_backlog+0x8b/0x10d
Apr 11 18:01:34 [99194.949953] net_rx_action+0x127/0x2b5
Apr 11 18:01:34 [99194.950445] __do_softirq+0xc1/0x1b1
Apr 11 18:01:34 [99194.950951] ? sort_range+0x17/0x17
Apr 11 18:01:34 [99194.951442] run_ksoftirqd+0x11/0x22
Apr 11 18:01:34 [99194.951972] smpboot_thread_fn+0x121/0x136
Apr 11 18:01:34 [99194.952489] kthread+0xfd/0x105
Apr 11 18:01:34 [99194.953018] ? kthread_create_on_node+0x3a/0x3a
Apr 11 18:01:34 [99194.953528] ret_from_fork+0x1f/0x30
Apr 11 18:01:34 [99194.954047] Code: 15 77 9e 99 00 83 7a 7c 00 75 37 83 b8 2c 01 00 00 00 75 2e 48 8b 43 58 48 89 df 5b 5d 48 83 e0 fe 41 5c 41 5d 41 5e 48 8b 40 50 e0 83 f8 ee 75 10 49 8b 84 24 90 01 00 00 65 48 ff 80 40 02
Apr 11 18:01:34 [99194.955449] RIP: ip_rcv_finish+0x2b5/0x2e5 RSP: c9000cad7cf8
Apr 11 18:01:34 [99194.956008] ---[ end trace 312b0bf537b4709a ]---
Apr 11 18:01:34 [99195.007900] Kernel panic - not syncing: Fatal exception in interrupt
Apr 11 18:01:34 [99195.008400] Kernel Offset: disabled
Apr 11 18:01:34 [99195.013950] Rebooting in 5 seconds..

--

I reported before about warnings in nf_xfrm_me_harder, but apparently nobody has had time to take a look, and it seems to plague 4.15.x and nearby kernel versions. Here is one such warning.
---
Apr 11 00:00:17 [34320.802349] dst_release: dst:b32dca17 refcnt:-2
Apr 11 00:00:19 [34323.018468] WARNING: CPU: 7 PID: 0 at ./include/net/dst.h:256 nf_xfrm_me_harder+0x62/0xfe [nf_nat]
Apr 11 00:00:19 [34323.019357] Modules linked in: pppoe pppox ppp_generic slhc ip_set_hash_net xt_nat xt_string xt_connmark xt_TCPMSS xt_mark xt_CT xt_set xt_tcpudp ip_set_bitmap_port ip_set nfnetlink iptable_raw iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables netconsole configfs 8021q garp mrp stp llc ixgbe dca ipv6
Apr 11 00:00:19 [34323.021503] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W 4.15.13-build-0135 #4
Apr 11 00:00:19 [34323.022380] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.04.0003.102320141138 10/23/2014
Apr 11 00:00:19 [34323.023261] RIP: 0010:nf_xfrm_me_harder+0x62/0xfe [nf_nat]
Apr 11 00:00:19 [34323.023737] RSP: 0018:88303fa43c90 EFLAGS: 00010246
Apr 11 00:00:19 [34323.024218] RAX: RBX: 8817b2c35200 RCX:
Apr 11 00:00:19 [34323.024703] RDX: 0002 RSI: 88178fab3700 RDI: 88303fa43cd0
Apr 11 00:00:19 [34323.025214] RBP: 822a6180 R08: 0005 R09: 0001
Apr 11 00:00:19 [34323.025717] R10: 00d6 R11: 8817c945bca0 R12: 0001
Apr 11 00:00:19 [34323.026214] R13: 88303fa43d60 R14: 00ce0008 R15: 8817b7477078
Apr 11 00:00:19 [34323.026736] FS: () GS:88303fa4() knlGS:
Apr 11 00:00:19 [34323.027680] CS
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-03-02 19:43, Guillaume Nault wrote:
On Thu, Mar 01, 2018 at 10:07:05PM +0200, Denys Fedoryshchenko wrote:
On 2018-03-01 22:01, Guillaume Nault wrote:
> diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
> index 255a5def56e9..2acf4b0eabd1 100644
> --- a/drivers/net/ppp/ppp_generic.c
> +++ b/drivers/net/ppp/ppp_generic.c
> @@ -3161,6 +3161,15 @@ ppp_connect_channel(struct channel *pch, int unit)
> 		goto outl;
>
> 	ppp_lock(ppp);
> +	spin_lock_bh(&pch->downl);
> +	if (!pch->chan) {
> +		/* Don't connect unregistered channels */
> +		ppp_unlock(ppp);
> +		spin_unlock_bh(&pch->downl);

This is obviously wrong. It should have been:

+		spin_unlock_bh(&pch->downl);
+		ppp_unlock(ppp);

Sorry, I shouldn't have hurried. This is fixed in the official version.

> +		ret = -ENOTCONN;
> +		goto outl;
> +	}
> +	spin_unlock_bh(&pch->downl);
> 	if (pch->file.hdrlen > ppp->file.hdrlen)
> 		ppp->file.hdrlen = pch->file.hdrlen;
> 	hdrlen = pch->file.hdrlen + 2;	/* for protocol bytes */

Ok, I will try to test that at night. Thanks a lot! For me the problem was also solved anyway by removing unit-cache; I just think it's nice to have the bug fixed :)

I think this bug has been there forever; indeed, it's good to have it fixed. Thanks a lot for your help (and patience!). FYI, if you see accel-ppp logs like "ioctl(PPPIOCCONNECT): Transport endpoint is not connected", then that means the patch prevented the scenario that was leading to the original crash.

Out of curiosity, did unit-cache really bring performance improvements on your workload?

On old kernels it definitely did; due to local specifics (electricity outages) I might have a few thousand interfaces deleted and created again in a short period of time. And back then, interface creation/deletion (especially when there are thousands of them) was very expensive.
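For reference, folding Guillaume's correction into the quoted hunk gives the following check in ppp_connect_channel(). This is reassembled from the two messages above, so treat it as a sketch of the fix rather than the official commit text:

	ppp_lock(ppp);
	spin_lock_bh(&pch->downl);
	if (!pch->chan) {
		/* Don't connect unregistered channels */
		spin_unlock_bh(&pch->downl);	/* drop locks in reverse */
		ppp_unlock(ppp);		/* order of acquisition */
		ret = -ENOTCONN;
		goto outl;
	}
	spin_unlock_bh(&pch->downl);
	if (pch->file.hdrlen > ppp->file.hdrlen)
		ppp->file.hdrlen = pch->file.hdrlen;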
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-03-01 22:01, Guillaume Nault wrote:
On Tue, Feb 27, 2018 at 07:56:27PM +0100, Guillaume Nault wrote:
On Tue, Feb 27, 2018 at 12:58:55PM +0200, Denys Fedoryshchenko wrote:
> On 2018-02-23 12:07, Guillaume Nault wrote:
> > On Fri, Feb 23, 2018 at 11:41:43AM +0200, Denys Fedoryshchenko wrote:
> > > On 2018-02-23 11:38, Guillaume Nault wrote:
> > > > On Thu, Feb 22, 2018 at 08:51:19PM +0200, Denys Fedoryshchenko wrote:
> > > > > I'm using accel-ppp, which has a unit-cache option, I guess for "reusing" ppp
> > > > > interfaces (because creating a lot of interfaces on a BRAS with 8k users is quite
> > > > > expensive).
> > > > > Maybe it is somehow related, and that scenario is causing this bug?
> > > > >
> > > > Indeed, it'd be interesting to know if unit-cache is part of the
> > > > equation (if it's workable for you to disable it).
> > > Already did that and am testing; unfortunately I had to disable KASAN and full
> > > refcount, as the performance hit is too heavy for me. I will try to enable KASAN
> > > alone tomorrow.
> > >
> > Don't hesitate to post the result even if you can't afford enabling
> > KASAN.
> Till now, 4 days and no reboots.
>
That unit-cache information was very useful. I can now reproduce the issue and work on a fix.

You can try the following patch. Sorry for the delay, I'm a bit out of time these days.

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 255a5def56e9..2acf4b0eabd1 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -3161,6 +3161,15 @@ ppp_connect_channel(struct channel *pch, int unit)
 		goto outl;
 
 	ppp_lock(ppp);
+	spin_lock_bh(&pch->downl);
+	if (!pch->chan) {
+		/* Don't connect unregistered channels */
+		ppp_unlock(ppp);
+		spin_unlock_bh(&pch->downl);
+		ret = -ENOTCONN;
+		goto outl;
+	}
+	spin_unlock_bh(&pch->downl);
 	if (pch->file.hdrlen > ppp->file.hdrlen)
 		ppp->file.hdrlen = pch->file.hdrlen;
 	hdrlen = pch->file.hdrlen + 2;	/* for protocol bytes */

Ok, I will try to test that at night. Thanks a lot! For me the problem was also solved anyway by removing unit-cache; I just think it's nice to have the bug fixed :)
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-23 12:07, Guillaume Nault wrote:
On Fri, Feb 23, 2018 at 11:41:43AM +0200, Denys Fedoryshchenko wrote:
On 2018-02-23 11:38, Guillaume Nault wrote:
> On Thu, Feb 22, 2018 at 08:51:19PM +0200, Denys Fedoryshchenko wrote:
> > I'm using accel-ppp, which has a unit-cache option, I guess for "reusing" ppp
> > interfaces (because creating a lot of interfaces on a BRAS with 8k users is quite
> > expensive).
> > Maybe it is somehow related, and that scenario is causing this bug?
> >
> Indeed, it'd be interesting to know if unit-cache is part of the
> equation (if it's workable for you to disable it).

Already did that and am testing; unfortunately I had to disable KASAN and full refcount, as the performance hit is too heavy for me. I will try to enable KASAN alone tomorrow.

Don't hesitate to post the result even if you can't afford enabling KASAN.

Till now, 4 days and no reboots.
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-23 12:07, Guillaume Nault wrote:
On Fri, Feb 23, 2018 at 11:41:43AM +0200, Denys Fedoryshchenko wrote:
On 2018-02-23 11:38, Guillaume Nault wrote:
> On Thu, Feb 22, 2018 at 08:51:19PM +0200, Denys Fedoryshchenko wrote:
> > I'm using accel-ppp, which has a unit-cache option, I guess for "reusing" ppp
> > interfaces (because creating a lot of interfaces on a BRAS with 8k users is quite
> > expensive).
> > Maybe it is somehow related, and that scenario is causing this bug?
> >
> Indeed, it'd be interesting to know if unit-cache is part of the
> equation (if it's workable for you to disable it).

Already did that and am testing; unfortunately I had to disable KASAN and full refcount, as the performance hit is too heavy for me. I will try to enable KASAN alone tomorrow.

Don't hesitate to post the result even if you can't afford enabling KASAN.

Very likely unit-cache is a major contributor to these reboots. After disabling it, almost 48h have passed with no reboots yet.
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-23 12:07, Guillaume Nault wrote:
On Fri, Feb 23, 2018 at 11:41:43AM +0200, Denys Fedoryshchenko wrote:
On 2018-02-23 11:38, Guillaume Nault wrote:
> On Thu, Feb 22, 2018 at 08:51:19PM +0200, Denys Fedoryshchenko wrote:
> > I'm using accel-ppp, which has a unit-cache option, I guess for "reusing" ppp
> > interfaces (because creating a lot of interfaces on a BRAS with 8k users is quite
> > expensive).
> > Maybe it is somehow related, and that scenario is causing this bug?
> >
> Indeed, it'd be interesting to know if unit-cache is part of the
> equation (if it's workable for you to disable it).

Already did that and am testing; unfortunately I had to disable KASAN and full refcount, as the performance hit is too heavy for me. I will try to enable KASAN alone tomorrow.

Don't hesitate to post the result even if you can't afford enabling KASAN.

For sure; I expect it to crash even with KASAN not enabled (I just won't have a clean message about the reason). Usually it happened for me within 6-10 hours after an upgrade, at night when load starts to increase; I prefer to wait at least 48h, even if there is no crash.
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-23 11:38, Guillaume Nault wrote:
On Thu, Feb 22, 2018 at 08:51:19PM +0200, Denys Fedoryshchenko wrote:
I'm using accel-ppp, which has a unit-cache option, I guess for "reusing" ppp interfaces (because creating a lot of interfaces on a BRAS with 8k users is quite expensive).
Maybe it is somehow related, and that scenario is causing this bug?

Indeed, it'd be interesting to know if unit-cache is part of the equation (if it's workable for you to disable it).

Already did that and am testing; unfortunately I had to disable KASAN and full refcount, as the performance hit is too heavy for me. I will try to enable KASAN alone tomorrow.
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-22 20:30, Guillaume Nault wrote: On Wed, Feb 21, 2018 at 12:04:30PM -0800, Cong Wang wrote: On Thu, Feb 15, 2018 at 11:31 AM, Guillaume Nault wrote: > On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote: >> On 2018-02-15 17:55, Guillaume Nault wrote: >> > On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote: >> > > Here we go: >> > > >> > > [24558.921549] >> > > == >> > > [24558.922167] BUG: KASAN: use-after-free in >> > > ppp_ioctl+0xa6a/0x1522 >> > > [ppp_generic] >> > > [24558.922776] Write of size 8 at addr 8803d35bf3f8 by task >> > > accel-pppd/12622 >> > > [24558.923113] >> > > [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G >> > > W >> > > 4.15.3-build-0134 #1 >> > > [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2, >> > > BIOS P80 >> > > 04/02/2015 >> > > [24558.924406] Call Trace: >> > > [24558.924753] dump_stack+0x46/0x59 >> > > [24558.925103] print_address_description+0x6b/0x23b >> > > [24558.925451] ? ppp_ioctl+0xa6a/0x1522 [ppp_generic] >> > > [24558.925797] kasan_report+0x21b/0x241 >> > > [24558.926136] ppp_ioctl+0xa6a/0x1522 [ppp_generic] >> > > [24558.926479] ? ppp_nl_newlink+0x1da/0x1da [ppp_generic] >> > > [24558.926829] ? sock_sendmsg+0x89/0x99 >> > > [24558.927176] ? __vfs_write+0xd9/0x4ad >> > > [24558.927523] ? kernel_read+0xed/0xed >> > > [24558.927872] ? SyS_getpeername+0x18c/0x18c >> > > [24558.928213] ? bit_waitqueue+0x2a/0x2a >> > > [24558.928561] ? wake_atomic_t_function+0x115/0x115 >> > > [24558.928898] vfs_ioctl+0x6e/0x81 >> > > [24558.929228] do_vfs_ioctl+0xa00/0xb10 >> > > [24558.929571] ? sigprocmask+0x1a6/0x1d0 >> > > [24558.929907] ? sigsuspend+0x13e/0x13e >> > > [24558.930239] ? ioctl_preallocate+0x14e/0x14e >> > > [24558.930568] ? SyS_rt_sigprocmask+0xf1/0x142 >> > > [24558.930904] ? sigprocmask+0x1d0/0x1d0 >> > > [24558.931252] SyS_ioctl+0x39/0x55 >> > > [24558.931595] ? do_vfs_ioctl+0xb10/0xb10 >> > > [24558.931942] do_syscall_64+0x1b1/0x31f >> > > [24558.932288] entry_SYSCALL_64_after_hwframe+0x21/0x86 >> > > [24558.932627] RIP: 0033:0x7f302849d8a7 >> > > [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206 >> > > ORIG_RAX: >> > > 0010 >> > > [24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX: >> > > 7f302849d8a7 >> > > [24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI: >> > > 3a67 >> > > [24558.934266] RBP: 7f3029a52b20 R08: R09: >> > > 55c8308d8e40 >> > > [24558.934607] R10: 0008 R11: 0206 R12: >> > > 7f3023f49358 >> > > [24558.934947] R13: 7ffe86e5723f R14: R15: >> > > 7f3029a53700 >> > > [24558.935288] >> > > [24558.935626] Allocated by task 12622: >> > > [24558.935972] ppp_register_net_channel+0x5f/0x5c6 >> > > [ppp_generic] >> > > [24558.936306] pppoe_connect+0xab7/0xc71 [pppoe] >> > > [24558.936640] SyS_connect+0x14b/0x1b7 >> > > [24558.936975] do_syscall_64+0x1b1/0x31f >> > > [24558.937319] entry_SYSCALL_64_after_hwframe+0x21/0x86 >> > > [24558.937655] >> > > [24558.937993] Freed by task 12622: >> > > [24558.938321] kfree+0xb0/0x11d >> > > [24558.938658] ppp_release+0x111/0x120 [ppp_generic] >> > > [24558.938994] __fput+0x2ba/0x51a >> > > [24558.939332] task_work_run+0x11c/0x13d >> > > [24558.939676] exit_to_usermode_loop+0x7c/0xaf >> > > [24558.940022] do_syscall_64+0x2ea/0x31f >> > > [24558.940368] entry_SYSCALL_64_after_hwframe+0x21/0x86 >> > > [24558.947099] >> > >> > Your first guess was right. It looks like we have an issue with >> > reference counting on the channels. Can you send me your ppp_generic.o? 
>> http://nuclearcat.com/ppp_generic.o
>> Compiled with gcc version 6.4.0 (Gentoo 6.4.0-r1 p1.3)
>>
> From what I can see, ppp_release() and ioctl(PPPIOCCONNECT) are called
> concurrently on the same ppp_file. Even if this pp
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-21 20:55, Guillaume Nault wrote:
On Wed, Feb 21, 2018 at 12:26:51PM +0200, Denys Fedoryshchenko wrote:
It seems even rebuilding a seemingly stable version is triggering crashes too (but different ones)

Different ones? The trace following your message looks very similar to your first KASAN report. Or are you referring to the lockup you posted on Sun, 18 Feb 2018? Also, which stable versions are you referring to?

The trace I sent in the previous email is the latest kernel, vanilla, just with more debug options and a few options disabled. One of the disabled ones, CONFIG_XFRM, was spitting some errors in nf_xfrm_me_harder (it is obviously a bug; I reported it). And I disabled namespaces, as they are often a source of trouble.

Today I will try to revert just:
- drivers, net, ppp: convert asyncppp.refcnt from atomic_t to refcount_t
- drivers, net, ppp: convert syncppp.refcnt from atomic_t to refcount_t
- drivers, net, ppp: convert ppp_file.refcnt from atomic_t to refcount_t

Because I suspect that previously, after reverting these patches, I got a different kernel panic (I didn't notice that at the time, and now it's too late to identify it among the other crashes); it seems it was not KASAN. I will report results after testing; unfortunately, I can't test more than once per day.

"Stable" for me was 4.14.2, but it looks like I am getting a different issue on that kernel now. I will paste it below.

Another observation: just an hour ago, on another server where I am testing 4.15 and 4.14.20 (4.14.20 at the moment, but with no debug options), I killed accel-pppd (PPPoE server software) with 8k sessions online and got some weird behaviour: the accel-pppd process got stuck, as did ifconfig and "ip link", and even kexec -e didn't work (it got stuck too) unless I did kexec -e -x (so it won't try to bring interfaces down on kexec). I will try to reproduce this bug as well, with debug enabled (lockdep and so on); I hope it is not related.

I'm interested in the ppp_generic.o file that produced the following trace. Just to be sure that the differences come from the new debugging options.
Also the kernel config: https://nuclearcat.com/bughunting/config.txt
And the ppp_generic.o: https://nuclearcat.com/bughunting/ppp_generic.o

This is on 4.14.2, which was seemingly stable before:

[50401.388670] NETDEV WATCHDOG: eth1 (ixgbe): transmit queue 1 timed out
[50401.389014] [ cut here ]
[50401.389340] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x15c/0x1b9
[50401.389925] Modules linked in: pppoe pppox ppp_generic slhc netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
[50401.391869] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.2-build-0134 #4
[50401.392191] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
[50401.392513] task: 880434d72640 task.stack: c90001914000
[50401.392836] RIP: 0010:dev_watchdog+0x15c/0x1b9
[50401.393155] RSP: 0018:8804364c3e90 EFLAGS: 00010286
[50401.393470] RAX: 0039 RBX: 88042f6e RCX:
[50401.393787] RDX: 0001 RSI: 0002 RDI: 828dbc64
[50401.394103] RBP: 8804364c3eb0 R08: 0001 R09:
[50401.394420] R10: 0002 R11: 8803fa075c00 R12: 0001
[50401.394739] R13: 0040 R14: 0003 R15: 81e05108
[50401.395064] FS: () GS:8804364c() knlGS:
[50401.395645] CS: 0010 DS: ES: CR0: 80050033
[50401.395970] CR2: 7fff25fc20a8 CR3: 01e09005 CR4: 001606e0
[50401.396294] Call Trace:
[50401.396613]
[50401.396934] ? qdisc_rcu_free+0x3f/0x3f
[50401.397255] call_timer_fn.isra.4+0x17/0x7b
[50401.397576] expire_timers+0x6f/0x7e
[50401.397899] run_timer_softirq+0x6d/0x8f
[50401.398219] ? ktime_get+0x3b/0x8c
[50401.398540] ? lapic_next_event+0x18/0x1c
[50401.398862] ? clockevents_program_event+0xa3/0xbb
[50401.399186] __do_softirq+0xbc/0x1ab
[50401.399510] irq_exit+0x4d/0x8e
[50401.399832] smp_apic_timer_interrupt+0x73/0x80
[50401.400157] apic_timer_interrupt+0x8d/0xa0
[50401.400480]
[50401.400801] RIP: 0010:mwait_idle+0x4e/0x61
[50401.401123] RSP: 0018:c90001917ec0 EFLAGS: 0246 ORIG_RAX: ff10
[50401.401714] RAX: RBX: 880434d72640 RCX:
[50401.402037] RDX: RSI: RDI:
[50401.402362] RBP: c90001917ec0 R08: R09: 0001
[50401.402685] R10: c90001917e58 R11:
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
It seems even rebuilding a seemingly stable version triggers crashes too (but different ones). Maybe it is a coincidence and a bug reproducer appeared in the network at the same time I decided to upgrade the kernel, as happened with xt_TCPMSS (and that bug existed for years). Deleted the quoting; I added more debug options (as much as the performance degradation allows me). This is vanilla again:

[14834.090421] ==
[14834.091157] BUG: KASAN: use-after-free in __list_add_valid+0x69/0xad
[14834.091521] Read of size 8 at addr 8803dbeb8660 by task accel-pppd/12636
[14834.091905]
[14834.092282] CPU: 0 PID: 12636 Comm: accel-pppd Not tainted 4.15.4-build-0134 #1
[14834.092930] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
[14834.093320] Call Trace:
[14834.093680] dump_stack+0xb3/0x13e
[14834.094050] ? _atomic_dec_and_lock+0x10f/0x10f
[14834.094434] print_address_description+0x69/0x236
[14834.094814] ? __list_add_valid+0x69/0xad
[14834.095197] kasan_report+0x219/0x23f
[14834.095570] __list_add_valid+0x69/0xad
[14834.095957] ppp_ioctl+0x1216/0x2201 [ppp_generic]
[14834.096348] ? ppp_write+0x1cc/0x1cc [ppp_generic]
[14834.096723] ? get_usage_char.isra.2+0x36/0x36
[14834.097094] ? packet_poll+0x362/0x362
[14834.097455] ? lock_downgrade+0x4d0/0x4d0
[14834.097811] ? rcu_irq_enter_disabled+0x8/0x8
[14834.098187] ? get_usage_char.isra.2+0x36/0x36
[14834.098561] ? __fget+0x3b8/0x3eb
[14834.098936] ? get_usage_char.isra.2+0x36/0x36
[14834.099309] ? __fget+0x3a0/0x3eb
[14834.099682] ? get_usage_char.isra.2+0x36/0x36
[14834.100069] ? __fget+0x3a0/0x3eb
[14834.100443] ? lock_downgrade+0x4d0/0x4d0
[14834.100814] ? rcu_irq_enter_disabled+0x8/0x8
[14834.101203] ? __fget+0x3b8/0x3eb
[14834.101581] ? expand_files+0x62f/0x62f
[14834.101945] ? kernel_read+0xed/0xed
[14834.102322] ? SyS_getpeername+0x28b/0x28b
[14834.102690] vfs_ioctl+0x6e/0x81
[14834.103049] do_vfs_ioctl+0xe2f/0xe62
[14834.103413] ? ioctl_preallocate+0x211/0x211
[14834.103778] ? __fget_light+0x28c/0x2ca
[14834.104150] ? iterate_fd+0x2a8/0x2a8
[14834.104526] ? SyS_rt_sigprocmask+0x12e/0x181
[14834.104876] ? sigprocmask+0x23f/0x23f
[14834.105231] ? SyS_write+0x148/0x173
[14834.105580] ? SyS_read+0x173/0x173
[14834.105943] SyS_ioctl+0x39/0x55
[14834.106316] ? do_vfs_ioctl+0xe62/0xe62
[14834.106694] do_syscall_64+0x262/0x594
[14834.107076] ? syscall_return_slowpath+0x351/0x351
[14834.107447] ? up_read+0x17/0x2c
[14834.107806] ? __do_page_fault+0x68a/0x763
[14834.108171] ? entry_SYSCALL_64_after_hwframe+0x36/0x9b
[14834.108550] ? trace_hardirqs_off_thunk+0x1a/0x1c
[14834.108937] entry_SYSCALL_64_after_hwframe+0x26/0x9b
[14834.109293] RIP: 0033:0x7fc9be3758a7
[14834.109652] RSP: 002b:7fc9bf92aaf8 EFLAGS: 0206 ORIG_RAX: 0010
[14834.110313] RAX: ffda RBX: 7fc9bdc5e1e3 RCX: 7fc9be3758a7
[14834.110707] RDX: 7fc9b7ad13e8 RSI: 4004743a RDI: 4b9f
[14834.111082] RBP: 7fc9bf92ab20 R08: R09: 55f07a27fe40
[14834.111471] R10: 0008 R11: 0206 R12: 7fc9b7ad12d8
[14834.111845] R13: 7ffd06346a6f R14: R15: 7fc9bf92b700
[14834.112231]
[14834.112589] Allocated by task 12636:
[14834.112962] ppp_register_net_channel+0xc4/0x610 [ppp_generic]
[14834.113331] pppoe_connect+0xe6d/0x1097 [pppoe]
[14834.113691] SyS_connect+0x19c/0x274
[14834.114054] do_syscall_64+0x262/0x594
[14834.114421] entry_SYSCALL_64_after_hwframe+0x26/0x9b
[14834.114792]
[14834.115139] Freed by task 12636:
[14834.115504] kfree+0xe2/0x154
[14834.115866] ppp_release+0x11b/0x12a [ppp_generic]
[14834.116240] __fput+0x342/0x5ba
[14834.116611] task_work_run+0x15d/0x198
[14834.116973] exit_to_usermode_loop+0xc7/0x153
[14834.117320] do_syscall_64+0x53d/0x594
[14834.117694] entry_SYSCALL_64_after_hwframe+0x26/0x9b
[14834.118067]
[14834.118426] The buggy address belongs to the object at 8803dbeb8480
[14834.119087] The buggy address is located 480 bytes inside of
[14834.119755] The buggy address belongs to the page:
[14834.120138] page:ea000f6fae00 count:1 mapcount:0 mapping: (null) index:0x8803dbebd580 compound_mapcount: 0
[14834.120817] flags: 0x17ffe0008100(slab|head)
[14834.121171] raw: 17ffe0008100 8803dbebd580 0001001c001b
[14834.121800] raw: ea000d718020 ea000d32d620 8803f080ee80
[14834.122415] page dumped because: kasan: bad access detected
[14834.122787]
[14834.123140] Memory state around the buggy address:
[14834.123503] 8803dbeb8500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[14834.124150] 8803dbeb8580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[14834.124806] >8803dbeb8600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[14834.125467] ^
[14834.125848] 8803dbeb8680: fb fb fb fb fb
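For anyone decoding the trace above: with CONFIG_DEBUG_LIST, __list_add_valid() inspects the neighbouring nodes' link pointers before every list insert, so KASAN flagging a read inside it means the list this ppp_file was being added to already lives in freed memory. The checks are roughly the following (paraphrased and simplified from lib/list_debug.c, with the message strings shortened):

#include <linux/bug.h>
#include <linux/list.h>

bool __list_add_valid(struct list_head *new, struct list_head *prev,
		      struct list_head *next)
{
	/* Each check reads a neighbour's pointer (the access KASAN
	 * flagged above) and warns if the list is already corrupted. */
	if (CHECK_DATA_CORRUPTION(next->prev != prev,
				  "list_add corruption: next->prev != prev\n") ||
	    CHECK_DATA_CORRUPTION(prev->next != next,
				  "list_add corruption: prev->next != next\n") ||
	    CHECK_DATA_CORRUPTION(new == prev || new == next,
				  "list_add double add\n"))
		return false;
	return true;
}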
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-16 20:48, Guillaume Nault wrote: On Fri, Feb 16, 2018 at 01:13:18PM +0200, Denys Fedoryshchenko wrote: On 2018-02-15 21:42, Guillaume Nault wrote: > On Thu, Feb 15, 2018 at 09:34:42PM +0200, Denys Fedoryshchenko wrote: > > On 2018-02-15 21:31, Guillaume Nault wrote: > > > On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote: > > > > On 2018-02-15 17:55, Guillaume Nault wrote: > > > > > On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote: > > > > > > Here we go: > > > > > > > > > > > > [24558.921549] > > > > > > == > > > > > > [24558.922167] BUG: KASAN: use-after-free in > > > > > > ppp_ioctl+0xa6a/0x1522 > > > > > > [ppp_generic] > > > > > > [24558.922776] Write of size 8 at addr 8803d35bf3f8 by task > > > > > > accel-pppd/12622 > > > > > > [24558.923113] > > > > > > [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G > > > > > > W > > > > > > 4.15.3-build-0134 #1 > > > > > > [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2, > > > > > > BIOS P80 > > > > > > 04/02/2015 > > > > > > [24558.924406] Call Trace: > > > > > > [24558.924753] dump_stack+0x46/0x59 > > > > > > [24558.925103] print_address_description+0x6b/0x23b > > > > > > [24558.925451] ? ppp_ioctl+0xa6a/0x1522 [ppp_generic] > > > > > > [24558.925797] kasan_report+0x21b/0x241 > > > > > > [24558.926136] ppp_ioctl+0xa6a/0x1522 [ppp_generic] > > > > > > [24558.926479] ? ppp_nl_newlink+0x1da/0x1da [ppp_generic] > > > > > > [24558.926829] ? sock_sendmsg+0x89/0x99 > > > > > > [24558.927176] ? __vfs_write+0xd9/0x4ad > > > > > > [24558.927523] ? kernel_read+0xed/0xed > > > > > > [24558.927872] ? SyS_getpeername+0x18c/0x18c > > > > > > [24558.928213] ? bit_waitqueue+0x2a/0x2a > > > > > > [24558.928561] ? wake_atomic_t_function+0x115/0x115 > > > > > > [24558.928898] vfs_ioctl+0x6e/0x81 > > > > > > [24558.929228] do_vfs_ioctl+0xa00/0xb10 > > > > > > [24558.929571] ? sigprocmask+0x1a6/0x1d0 > > > > > > [24558.929907] ? sigsuspend+0x13e/0x13e > > > > > > [24558.930239] ? ioctl_preallocate+0x14e/0x14e > > > > > > [24558.930568] ? SyS_rt_sigprocmask+0xf1/0x142 > > > > > > [24558.930904] ? sigprocmask+0x1d0/0x1d0 > > > > > > [24558.931252] SyS_ioctl+0x39/0x55 > > > > > > [24558.931595] ? do_vfs_ioctl+0xb10/0xb10 > > > > > > [24558.931942] do_syscall_64+0x1b1/0x31f > > > > > > [24558.932288] entry_SYSCALL_64_after_hwframe+0x21/0x86 > > > > > > [24558.932627] RIP: 0033:0x7f302849d8a7 > > > > > > [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206 > > > > > > ORIG_RAX: > > > > > > 0010 > > > > > > [24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX: > > > > > > 7f302849d8a7 > > > > > > [24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI: > > > > > > 3a67 > > > > > > [24558.934266] RBP: 7f3029a52b20 R08: R09: > > > > > > 55c8308d8e40 > > > > > > [24558.934607] R10: 0008 R11: 0206 R12: > > > > > > 7f3023f49358 > > > > > > [24558.934947] R13: 7ffe86e5723f R14: R15: > > > > > > 7f3029a53700 > > > > > > [24558.935288] > > > > > > [24558.935626] Allocated by task 12622: > > > > > > [24558.935972] ppp_register_net_channel+0x5f/0x5c6 > > > > > > [ppp_generic] > > > > > > [24558.936306] pppoe_connect+0xab7/0xc71 [pppoe] > > > > > > [24558.936640] SyS_connect+0x14b/0x1b7 > > > > > > [24558.936975] do_syscall_64+0x1b1/0x31f > > > > > > [24558.937319] entry_SYSCALL_64_after_hwframe+0x21/0x86 > > > > > > [24558.937655] > > > > > > [24558.937993] Freed by task 12622: > > > > > > [24558.938321] kfree+
a lot of WARNING, nf_xfrm_me_harder in 4.15.x
Is there any bug here, or is it just some sort of spam? I am troubleshooting a "hard to catch" bug in ppp/pppoe at the same time.
Workload: pppoe BRAS
I am going to try the last stable 4.14.x in 1-2 days as well, but I think I noticed this message appearing there too, under some conditions.

[ 49.784216] WARNING: CPU: 4 PID: 0 at ./include/net/dst.h:256 nf_xfrm_me_harder+0x12d/0x2d7 [nf_nat]
[ 49.784847] Modules linked in: pppoe pppox ppp_generic slhc netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
[ 49.786762] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.15.4-build-0134 #2
[ 49.787104] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
[ 49.787448] RIP: 0010:nf_xfrm_me_harder+0x12d/0x2d7 [nf_nat]
[ 49.787782] RSP: 0018:8803f23078e0 EFLAGS: 00010246
[ 49.788114] RAX: RBX: 8803d8acad00 RCX: 11007a875b00
[ 49.788463] RDX: 11007b159500 RSI: RDI: 8803d8acad48
[ 49.788818] RBP: 8803d43ada40 R08: ed007e460f27 R09: 8803f2307900
[ 49.789175] R10: ed007e460f26 R11: 0001 R12: 11007e460f1c
[ 49.789528] R13: 8803d43ada98 R14: 83e2b600 R15: 8803d8acad80
[ 49.789881] FS: () GS:8803f230() knlGS:
[ 49.790500] CS: 0010 DS: ES: CR0: 80050033
[ 49.790850] CR2: 7f758e3aa490 CR3: 000445a0d001 CR4: 001606e0
[ 49.791192] Call Trace:
[ 49.791517]
[ 49.791845] ? __nf_nat_decode_session+0x108/0x108 [nf_nat]
[ 49.792180] ? nf_nat_ipv4_fn+0x33d/0x4df [nf_nat_ipv4]
[ 49.792515] ? iptable_nat_ipv4_fn+0xc/0xc [iptable_nat]
[ 49.792849] nf_nat_ipv4_out+0x235/0x305 [nf_nat_ipv4]
[ 49.793183] ? iptable_nat_ipv4_local_fn+0xc/0xc [iptable_nat]
[ 49.793519] nf_hook_slow+0xb1/0x11b
[ 49.793850] ip_output+0x205/0x243
[ 49.794180] ? ip_mc_output+0x548/0x548
[ 49.794508] ? ip_fragment.constprop.5+0x197/0x197
[ 49.794841] ? iptable_filter_net_init+0x1a/0x1a [iptable_filter]
[ 49.795173] ? nf_hook_slow+0xb1/0x11b
[ 49.795504] ip_forward+0xe9c/0xecb
[ 49.795836] ? ip_forward_finish+0x110/0x110
[ 49.796166] ? ip_frag_mem+0x3d/0x3d
[ 49.796493] ? ip_rcv_finish+0xcf8/0xd91
[ 49.796830] ip_rcv+0x985/0xa12
[ 49.797178] ? ip_local_deliver+0x225/0x225
[ 49.797536] ? ip_local_deliver_finish+0x599/0x599
[ 49.797893] ? ip_local_deliver+0x225/0x225
[ 49.798254] __netif_receive_skb_core+0x10ce/0x1c76
[ 49.798613] ? netif_set_xps_queue+0xbdb/0xbdb
[ 49.798972] ? process_backlog+0x1c5/0x3c0
[ 49.799323] process_backlog+0x1c5/0x3c0
[ 49.799674] net_rx_action+0x3aa/0x840
[ 49.800026] ? napi_complete_done+0x22b/0x22b
[ 49.800378] ? __tick_nohz_idle_enter+0x42b/0x9b3
[ 49.800733] ? get_cpu_iowait_time_us+0x16f/0x16f
[ 49.801084] __do_softirq+0x17f/0x34a
[ 49.801411] ? flush_smp_call_function_queue+0x16a/0x229
[ 49.801750] irq_exit+0x8f/0xf9
[ 49.802080] call_function_single_interrupt+0x92/0xa0
[ 49.802420]
[ 49.802765] RIP: 0010:mwait_idle+0x99/0xac
[ 49.803106] RSP: 0018:8803f0317ef8 EFLAGS: 0246 ORIG_RAX: ff04
[ 49.803709] RAX: RBX: 8803f02e4240 RCX:
[ 49.804042] RDX: 11007e05c848 RSI: RDI:
[ 49.804372] RBP: 8803f02e4240 R08: 55574f086bb0 R09: 7f78bd996700
[ 49.804705] R10: 8803f0317dd0 R11: 0293 R12:
[ 49.805038] R13: dc00 R14: ed007e05c848 R15: 8803f02e4240
[ 49.805373] do_idle+0xe6/0x19a
[ 49.805700] cpu_startup_entry+0x18/0x1a
[ 49.806033] secondary_startup_64+0xa5/0xb0
[ 49.806359] Code: e0 07 83 c0 03 38 d0 7c 0c 84 d2 74 08 4c 89 ff e8 65 3f 26 e1 8b 83 80 00 00 00 85 c0 74 0c 8d 50 01 f0 41 0f b1 17 74 04 eb f0 <0f> ff 48 8d 7d 18 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1
[ 49.807242] ---[ end trace 2654a347942730c3 ]---
[ 49.807580] dst_release: dst:24366567 refcnt:-1
[ 164.894058] WARNING: CPU: 5 PID: 22617 at ./include/net/dst.h:256 nf_xfrm_me_harder+0x12d/0x2d7 [nf_nat]
[ 164.894686] Modules linked in: pppoe pppox ppp_generic slhc netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_nat nf_conntrack_ipv4 nf_def
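As far as the line number maps in 4.15, the warning at include/net/dst.h:256 is the sanity check in dst_hold(), which refuses to take a reference once __refcnt has already hit zero, and the accompanying "dst_release: ... refcnt:-1" lines show the counter then going negative: somewhere a hold/release pair is unbalanced. A minimal sketch of the invariant these splats are policing (illustrative only, not the nf_nat code; keep_route/drop_route are invented names):

#include <linux/skbuff.h>
#include <net/dst.h>

/* Every dst_hold() must be matched by exactly one dst_release().
 * The report above means this was violated: dst_hold() saw
 * __refcnt == 0, and a later dst_release() drove it negative. */
static void keep_route(struct sk_buff *skb, struct dst_entry **saved)
{
	struct dst_entry *dst = skb_dst(skb);

	dst_hold(dst);		/* take our own reference... */
	*saved = dst;
}

static void drop_route(struct dst_entry *saved)
{
	dst_release(saved);	/* ...and drop it exactly once */
}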
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-15 21:42, Guillaume Nault wrote: On Thu, Feb 15, 2018 at 09:34:42PM +0200, Denys Fedoryshchenko wrote: On 2018-02-15 21:31, Guillaume Nault wrote: > On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote: > > On 2018-02-15 17:55, Guillaume Nault wrote: > > > On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote: > > > > Here we go: > > > > > > > > [24558.921549] > > > > == > > > > [24558.922167] BUG: KASAN: use-after-free in > > > > ppp_ioctl+0xa6a/0x1522 > > > > [ppp_generic] > > > > [24558.922776] Write of size 8 at addr 8803d35bf3f8 by task > > > > accel-pppd/12622 > > > > [24558.923113] > > > > [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G > > > > W > > > > 4.15.3-build-0134 #1 > > > > [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2, > > > > BIOS P80 > > > > 04/02/2015 > > > > [24558.924406] Call Trace: > > > > [24558.924753] dump_stack+0x46/0x59 > > > > [24558.925103] print_address_description+0x6b/0x23b > > > > [24558.925451] ? ppp_ioctl+0xa6a/0x1522 [ppp_generic] > > > > [24558.925797] kasan_report+0x21b/0x241 > > > > [24558.926136] ppp_ioctl+0xa6a/0x1522 [ppp_generic] > > > > [24558.926479] ? ppp_nl_newlink+0x1da/0x1da [ppp_generic] > > > > [24558.926829] ? sock_sendmsg+0x89/0x99 > > > > [24558.927176] ? __vfs_write+0xd9/0x4ad > > > > [24558.927523] ? kernel_read+0xed/0xed > > > > [24558.927872] ? SyS_getpeername+0x18c/0x18c > > > > [24558.928213] ? bit_waitqueue+0x2a/0x2a > > > > [24558.928561] ? wake_atomic_t_function+0x115/0x115 > > > > [24558.928898] vfs_ioctl+0x6e/0x81 > > > > [24558.929228] do_vfs_ioctl+0xa00/0xb10 > > > > [24558.929571] ? sigprocmask+0x1a6/0x1d0 > > > > [24558.929907] ? sigsuspend+0x13e/0x13e > > > > [24558.930239] ? ioctl_preallocate+0x14e/0x14e > > > > [24558.930568] ? SyS_rt_sigprocmask+0xf1/0x142 > > > > [24558.930904] ? sigprocmask+0x1d0/0x1d0 > > > > [24558.931252] SyS_ioctl+0x39/0x55 > > > > [24558.931595] ? do_vfs_ioctl+0xb10/0xb10 > > > > [24558.931942] do_syscall_64+0x1b1/0x31f > > > > [24558.932288] entry_SYSCALL_64_after_hwframe+0x21/0x86 > > > > [24558.932627] RIP: 0033:0x7f302849d8a7 > > > > [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206 > > > > ORIG_RAX: > > > > 0010 > > > > [24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX: > > > > 7f302849d8a7 > > > > [24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI: > > > > 3a67 > > > > [24558.934266] RBP: 7f3029a52b20 R08: R09: > > > > 55c8308d8e40 > > > > [24558.934607] R10: 0008 R11: 0206 R12: > > > > 7f3023f49358 > > > > [24558.934947] R13: 7ffe86e5723f R14: R15: > > > > 7f3029a53700 > > > > [24558.935288] > > > > [24558.935626] Allocated by task 12622: > > > > [24558.935972] ppp_register_net_channel+0x5f/0x5c6 > > > > [ppp_generic] > > > > [24558.936306] pppoe_connect+0xab7/0xc71 [pppoe] > > > > [24558.936640] SyS_connect+0x14b/0x1b7 > > > > [24558.936975] do_syscall_64+0x1b1/0x31f > > > > [24558.937319] entry_SYSCALL_64_after_hwframe+0x21/0x86 > > > > [24558.937655] > > > > [24558.937993] Freed by task 12622: > > > > [24558.938321] kfree+0xb0/0x11d > > > > [24558.938658] ppp_release+0x111/0x120 [ppp_generic] > > > > [24558.938994] __fput+0x2ba/0x51a > > > > [24558.939332] task_work_run+0x11c/0x13d > > > > [24558.939676] exit_to_usermode_loop+0x7c/0xaf > > > > [24558.940022] do_syscall_64+0x2ea/0x31f > > > > [24558.940368] entry_SYSCALL_64_after_hwframe+0x21/0x86 > > > > [24558.947099] > > > > > > Your first guess was right. It looks like we have an issue with > > > reference counting on the channels. Can you send me your ppp_generic.o? 
> > http://nuclearcat.com/ppp_generic.o
> > Compiled with gcc version 6.4.0 (Gentoo 6.4.0-r1 p1.3)
> >
> From what I can see, ppp_release() and ioct
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-15 21:31, Guillaume Nault wrote: On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote: On 2018-02-15 17:55, Guillaume Nault wrote: > On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote: > > Here we go: > > > > [24558.921549] > > == > > [24558.922167] BUG: KASAN: use-after-free in > > ppp_ioctl+0xa6a/0x1522 > > [ppp_generic] > > [24558.922776] Write of size 8 at addr 8803d35bf3f8 by task > > accel-pppd/12622 > > [24558.923113] > > [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G > > W > > 4.15.3-build-0134 #1 > > [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2, > > BIOS P80 > > 04/02/2015 > > [24558.924406] Call Trace: > > [24558.924753] dump_stack+0x46/0x59 > > [24558.925103] print_address_description+0x6b/0x23b > > [24558.925451] ? ppp_ioctl+0xa6a/0x1522 [ppp_generic] > > [24558.925797] kasan_report+0x21b/0x241 > > [24558.926136] ppp_ioctl+0xa6a/0x1522 [ppp_generic] > > [24558.926479] ? ppp_nl_newlink+0x1da/0x1da [ppp_generic] > > [24558.926829] ? sock_sendmsg+0x89/0x99 > > [24558.927176] ? __vfs_write+0xd9/0x4ad > > [24558.927523] ? kernel_read+0xed/0xed > > [24558.927872] ? SyS_getpeername+0x18c/0x18c > > [24558.928213] ? bit_waitqueue+0x2a/0x2a > > [24558.928561] ? wake_atomic_t_function+0x115/0x115 > > [24558.928898] vfs_ioctl+0x6e/0x81 > > [24558.929228] do_vfs_ioctl+0xa00/0xb10 > > [24558.929571] ? sigprocmask+0x1a6/0x1d0 > > [24558.929907] ? sigsuspend+0x13e/0x13e > > [24558.930239] ? ioctl_preallocate+0x14e/0x14e > > [24558.930568] ? SyS_rt_sigprocmask+0xf1/0x142 > > [24558.930904] ? sigprocmask+0x1d0/0x1d0 > > [24558.931252] SyS_ioctl+0x39/0x55 > > [24558.931595] ? do_vfs_ioctl+0xb10/0xb10 > > [24558.931942] do_syscall_64+0x1b1/0x31f > > [24558.932288] entry_SYSCALL_64_after_hwframe+0x21/0x86 > > [24558.932627] RIP: 0033:0x7f302849d8a7 > > [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206 > > ORIG_RAX: > > 0010 > > [24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX: > > 7f302849d8a7 > > [24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI: > > 3a67 > > [24558.934266] RBP: 7f3029a52b20 R08: R09: > > 55c8308d8e40 > > [24558.934607] R10: 0008 R11: 0206 R12: > > 7f3023f49358 > > [24558.934947] R13: 7ffe86e5723f R14: R15: > > 7f3029a53700 > > [24558.935288] > > [24558.935626] Allocated by task 12622: > > [24558.935972] ppp_register_net_channel+0x5f/0x5c6 > > [ppp_generic] > > [24558.936306] pppoe_connect+0xab7/0xc71 [pppoe] > > [24558.936640] SyS_connect+0x14b/0x1b7 > > [24558.936975] do_syscall_64+0x1b1/0x31f > > [24558.937319] entry_SYSCALL_64_after_hwframe+0x21/0x86 > > [24558.937655] > > [24558.937993] Freed by task 12622: > > [24558.938321] kfree+0xb0/0x11d > > [24558.938658] ppp_release+0x111/0x120 [ppp_generic] > > [24558.938994] __fput+0x2ba/0x51a > > [24558.939332] task_work_run+0x11c/0x13d > > [24558.939676] exit_to_usermode_loop+0x7c/0xaf > > [24558.940022] do_syscall_64+0x2ea/0x31f > > [24558.940368] entry_SYSCALL_64_after_hwframe+0x21/0x86 > > [24558.947099] > > Your first guess was right. It looks like we have an issue with > reference counting on the channels. Can you send me your ppp_generic.o? http://nuclearcat.com/ppp_generic.o Compiled with gcc version 6.4.0 (Gentoo 6.4.0-r1 p1.3) From what I can see, ppp_release() and ioctl(PPPIOCCONNECT) are called concurrently on the same ppp_file. Even if this ppp_file was pointed at by two different file descriptors, I can't see how this could defeat the reference counting mechanism. I'm going to think more about it. Can you test with CONFIG_REFCOUNT_FULL? 
(and keep d780cd44e3ce ("drivers, net, ppp: convert ppp_file.refcnt from atomic_t to refcount_t")).

Ok, I will try that tonight. On the vanilla kernel, or with the patch mentioned in the previous email reverted?
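For context on what CONFIG_REFCOUNT_FULL buys here: with it, refcount_t operations saturate instead of wrapping and WARN on increment-from-zero and on underflow, so a channel refcount bug turns into a loud splat at the buggy call site instead of a silent use-after-free; without it, on most 4.15 configs, refcount_t degrades to plain unchecked atomics. A minimal kernel-style sketch of the semantics (struct my_channel is invented for illustration; this is not the ppp_generic code):

#include <linux/refcount.h>
#include <linux/slab.h>

struct my_channel {
	refcount_t refcnt;
};

static struct my_channel *chan_get(struct my_channel *ch)
{
	refcount_inc(&ch->refcnt);	/* WARNs if refcnt was already 0 */
	return ch;
}

static void chan_put(struct my_channel *ch)
{
	if (refcount_dec_and_test(&ch->refcnt))	/* WARNs on underflow */
		kfree(ch);
}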
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-15 17:55, Guillaume Nault wrote: On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote: Here we go: [24558.921549] == [24558.922167] BUG: KASAN: use-after-free in ppp_ioctl+0xa6a/0x1522 [ppp_generic] [24558.922776] Write of size 8 at addr 8803d35bf3f8 by task accel-pppd/12622 [24558.923113] [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G W 4.15.3-build-0134 #1 [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015 [24558.924406] Call Trace: [24558.924753] dump_stack+0x46/0x59 [24558.925103] print_address_description+0x6b/0x23b [24558.925451] ? ppp_ioctl+0xa6a/0x1522 [ppp_generic] [24558.925797] kasan_report+0x21b/0x241 [24558.926136] ppp_ioctl+0xa6a/0x1522 [ppp_generic] [24558.926479] ? ppp_nl_newlink+0x1da/0x1da [ppp_generic] [24558.926829] ? sock_sendmsg+0x89/0x99 [24558.927176] ? __vfs_write+0xd9/0x4ad [24558.927523] ? kernel_read+0xed/0xed [24558.927872] ? SyS_getpeername+0x18c/0x18c [24558.928213] ? bit_waitqueue+0x2a/0x2a [24558.928561] ? wake_atomic_t_function+0x115/0x115 [24558.928898] vfs_ioctl+0x6e/0x81 [24558.929228] do_vfs_ioctl+0xa00/0xb10 [24558.929571] ? sigprocmask+0x1a6/0x1d0 [24558.929907] ? sigsuspend+0x13e/0x13e [24558.930239] ? ioctl_preallocate+0x14e/0x14e [24558.930568] ? SyS_rt_sigprocmask+0xf1/0x142 [24558.930904] ? sigprocmask+0x1d0/0x1d0 [24558.931252] SyS_ioctl+0x39/0x55 [24558.931595] ? do_vfs_ioctl+0xb10/0xb10 [24558.931942] do_syscall_64+0x1b1/0x31f [24558.932288] entry_SYSCALL_64_after_hwframe+0x21/0x86 [24558.932627] RIP: 0033:0x7f302849d8a7 [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206 ORIG_RAX: 0010 [24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX: 7f302849d8a7 [24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI: 3a67 [24558.934266] RBP: 7f3029a52b20 R08: R09: 55c8308d8e40 [24558.934607] R10: 0008 R11: 0206 R12: 7f3023f49358 [24558.934947] R13: 7ffe86e5723f R14: R15: 7f3029a53700 [24558.935288] [24558.935626] Allocated by task 12622: [24558.935972] ppp_register_net_channel+0x5f/0x5c6 [ppp_generic] [24558.936306] pppoe_connect+0xab7/0xc71 [pppoe] [24558.936640] SyS_connect+0x14b/0x1b7 [24558.936975] do_syscall_64+0x1b1/0x31f [24558.937319] entry_SYSCALL_64_after_hwframe+0x21/0x86 [24558.937655] [24558.937993] Freed by task 12622: [24558.938321] kfree+0xb0/0x11d [24558.938658] ppp_release+0x111/0x120 [ppp_generic] [24558.938994] __fput+0x2ba/0x51a [24558.939332] task_work_run+0x11c/0x13d [24558.939676] exit_to_usermode_loop+0x7c/0xaf [24558.940022] do_syscall_64+0x2ea/0x31f [24558.940368] entry_SYSCALL_64_after_hwframe+0x21/0x86 [24558.947099] Your first guess was right. It looks like we have an issue with reference counting on the channels. Can you send me your ppp_generic.o? http://nuclearcat.com/ppp_generic.o Compiled with gcc version 6.4.0 (Gentoo 6.4.0-r1 p1.3)
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-14 19:25, Guillaume Nault wrote:
On Wed, Feb 14, 2018 at 06:49:19PM +0200, Denys Fedoryshchenko wrote:
On 2018-02-14 18:47, Guillaume Nault wrote:
> On Wed, Feb 14, 2018 at 06:29:34PM +0200, Denys Fedoryshchenko wrote:
> > On 2018-02-14 18:07, Guillaume Nault wrote:
> > > On Wed, Feb 14, 2018 at 03:17:23PM +0200, Denys Fedoryshchenko wrote:
> > > > Hi,
> > > >
> > > > Upgraded the kernel to 4.15.3; it still crashes after a while (several hours;
> > > > I cannot bisect, as it is a production server).
> > > >
> > > > dev ppp # gdb ppp_generic.o
> > > > GNU gdb (Gentoo 7.12.1 vanilla) 7.12.1
> > > > <>
> > > > Reading symbols from ppp_generic.o...done.
> > > > (gdb) list *ppp_push+0x73
> > > > 0x681 is in ppp_push (drivers/net/ppp/ppp_generic.c:1663).
> > > > 1658		list = list->next;
> > > > 1659		pch = list_entry(list, struct channel, clist);
> > > > 1660
> > > > 1661		spin_lock(&pch->downl);
> > > > 1662		if (pch->chan) {
> > > > 1663			if (pch->chan->ops->start_xmit(pch->chan, skb))
> > > > 1664				ppp->xmit_pending = NULL;
> > > > 1665		} else {
> > > > 1666			/* channel got unregistered */
> > > > 1667			kfree_skb(skb);
> > > >
> > > I expect a memory corruption. Do you have the possibility to run with
> > > KASAN by any chance?
> > I will try to enable it tonight. For now I reverted "drivers, net, ppp:
> > convert ppp_file.refcnt from atomic_t to refcount_t" for testing.
> >
> This commit looks good to me. Do you have doubts about it because it's
> new in 4.15? Does it mean that your last known-good kernel is 4.14?
I am just doing a "manual" bisect, checking all possibilities and picking
patches to revert randomly.

Must be a painful process. Are all of your networking modules required? With luck, you might be able to isolate a faulty module in fewer steps.

Yes, correct, my known-good is 4.14.2.

Good to know. Let me know if you can get a KASAN trace.

Here we go:

[24558.921549] ==
[24558.922167] BUG: KASAN: use-after-free in ppp_ioctl+0xa6a/0x1522 [ppp_generic]
[24558.922776] Write of size 8 at addr 8803d35bf3f8 by task accel-pppd/12622
[24558.923113]
[24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G W 4.15.3-build-0134 #1
[24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
[24558.924406] Call Trace:
[24558.924753] dump_stack+0x46/0x59
[24558.925103] print_address_description+0x6b/0x23b
[24558.925451] ? ppp_ioctl+0xa6a/0x1522 [ppp_generic]
[24558.925797] kasan_report+0x21b/0x241
[24558.926136] ppp_ioctl+0xa6a/0x1522 [ppp_generic]
[24558.926479] ? ppp_nl_newlink+0x1da/0x1da [ppp_generic]
[24558.926829] ? sock_sendmsg+0x89/0x99
[24558.927176] ? __vfs_write+0xd9/0x4ad
[24558.927523] ? kernel_read+0xed/0xed
[24558.927872] ? SyS_getpeername+0x18c/0x18c
[24558.928213] ? bit_waitqueue+0x2a/0x2a
[24558.928561] ? wake_atomic_t_function+0x115/0x115
[24558.928898] vfs_ioctl+0x6e/0x81
[24558.929228] do_vfs_ioctl+0xa00/0xb10
[24558.929571] ? sigprocmask+0x1a6/0x1d0
[24558.929907] ? sigsuspend+0x13e/0x13e
[24558.930239] ? ioctl_preallocate+0x14e/0x14e
[24558.930568] ? SyS_rt_sigprocmask+0xf1/0x142
[24558.930904] ? sigprocmask+0x1d0/0x1d0
[24558.931252] SyS_ioctl+0x39/0x55
[24558.931595] ? do_vfs_ioctl+0xb10/0xb10
[24558.931942] do_syscall_64+0x1b1/0x31f
[24558.932288] entry_SYSCALL_64_after_hwframe+0x21/0x86
[24558.932627] RIP: 0033:0x7f302849d8a7
[24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206 ORIG_RAX: 0010
[24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX: 7f302849d8a7
[24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI: 3a67
[24558.934266] RBP: 7f3029a52b20 R08: R09: 55c8308d8e40
[24558.934607] R10: 0008 R11: 0206 R12: 7f3023f49358
[24558.934947] R13: 7ffe86e5723f R14: R15: 7f3029a53700
[24558.935288]
[24558.935626] Allocated by task 12622:
[24558.935972] ppp_register_net_channel+0x5f/0x5c6 [ppp_generic]
[24558.936306] pppoe_connect+0xab7/0xc71 [pppoe]
[24558.936640] SyS_connect+0x14b/0x1b7
[24558.936975] do_syscall_64+0x1b1/0x31f
[24558.937319] entry_S
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-14 18:47, Guillaume Nault wrote:
On Wed, Feb 14, 2018 at 06:29:34PM +0200, Denys Fedoryshchenko wrote:
On 2018-02-14 18:07, Guillaume Nault wrote:
> On Wed, Feb 14, 2018 at 03:17:23PM +0200, Denys Fedoryshchenko wrote:
> > Hi,
> >
> > Upgraded the kernel to 4.15.3; it still crashes after a while (several hours;
> > I cannot bisect, as it is a production server).
> >
> > dev ppp # gdb ppp_generic.o
> > GNU gdb (Gentoo 7.12.1 vanilla) 7.12.1
> > <>
> > Reading symbols from ppp_generic.o...done.
> > (gdb) list *ppp_push+0x73
> > 0x681 is in ppp_push (drivers/net/ppp/ppp_generic.c:1663).
> > 1658		list = list->next;
> > 1659		pch = list_entry(list, struct channel, clist);
> > 1660
> > 1661		spin_lock(&pch->downl);
> > 1662		if (pch->chan) {
> > 1663			if (pch->chan->ops->start_xmit(pch->chan, skb))
> > 1664				ppp->xmit_pending = NULL;
> > 1665		} else {
> > 1666			/* channel got unregistered */
> > 1667			kfree_skb(skb);
> >
> I expect a memory corruption. Do you have the possibility to run with
> KASAN by any chance?
I will try to enable it tonight. For now I reverted "drivers, net, ppp: convert ppp_file.refcnt from atomic_t to refcount_t" for testing.

This commit looks good to me. Do you have doubts about it because it's new in 4.15? Does it mean that your last known-good kernel is 4.14?

I am just doing a "manual" bisect, checking all possibilities and picking patches to revert randomly.

Yes, correct, my known-good is 4.14.2.
Re: ppp/pppoe, still panic 4.15.3 in ppp_push
On 2018-02-14 18:07, Guillaume Nault wrote:
On Wed, Feb 14, 2018 at 03:17:23PM +0200, Denys Fedoryshchenko wrote:
Hi,

Upgraded the kernel to 4.15.3; it still crashes after a while (several hours; I cannot bisect, as it is a production server).

dev ppp # gdb ppp_generic.o
GNU gdb (Gentoo 7.12.1 vanilla) 7.12.1
<>
Reading symbols from ppp_generic.o...done.
(gdb) list *ppp_push+0x73
0x681 is in ppp_push (drivers/net/ppp/ppp_generic.c:1663).
1658		list = list->next;
1659		pch = list_entry(list, struct channel, clist);
1660
1661		spin_lock(&pch->downl);
1662		if (pch->chan) {
1663			if (pch->chan->ops->start_xmit(pch->chan, skb))
1664				ppp->xmit_pending = NULL;
1665		} else {
1666			/* channel got unregistered */
1667			kfree_skb(skb);

I expect a memory corruption. Do you have the possibility to run with KASAN by any chance?

I will try to enable it tonight. For now I reverted "drivers, net, ppp: convert ppp_file.refcnt from atomic_t to refcount_t" for testing.
ppp/pppoe, still panic 4.15.3 in ppp_push
Hi,

Upgraded the kernel to 4.15.3; it still crashes after a while (several hours; I cannot bisect, as it is a production server).

dev ppp # gdb ppp_generic.o
GNU gdb (Gentoo 7.12.1 vanilla) 7.12.1
<>
Reading symbols from ppp_generic.o...done.
(gdb) list *ppp_push+0x73
0x681 is in ppp_push (drivers/net/ppp/ppp_generic.c:1663).
1658		list = list->next;
1659		pch = list_entry(list, struct channel, clist);
1660
1661		spin_lock(&pch->downl);
1662		if (pch->chan) {
1663			if (pch->chan->ops->start_xmit(pch->chan, skb))
1664				ppp->xmit_pending = NULL;
1665		} else {
1666			/* channel got unregistered */
1667			kfree_skb(skb);

Feb 14 08:32:00 [17937.863304] general protection fault: [#1] SMP
Feb 14 08:32:00 [17937.863638] Modules linked in: pppoe pppox ppp_generic slhc netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
Feb 14 08:32:00 [17937.865619] CPU: 6 PID: 12543 Comm: accel-pppd Not tainted 4.15.3-build-0134 #4
Feb 14 08:32:00 [17937.866211] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
Feb 14 08:32:00 [17937.866542] RIP: 0010:ppp_push+0x73/0x4ec [ppp_generic]
Feb 14 08:32:00 [17937.866865] RSP: 0018:c90001fa7d50 EFLAGS: 00010282
Feb 14 08:32:00 [17937.867191] RAX: 0fd54d16ec03 RBX: 8803eeb207b8 RCX: 0101
Feb 14 08:32:00 [17937.867517] RDX: RSI: 8803f9fb5000 RDI: 8803eed1e443
Feb 14 08:32:00 [17937.867844] RBP: 8803f9fb5000 R08: 0001 R09:
Feb 14 08:32:00 [17937.868171] R10: 7f0a75fba758 R11: 0293 R12: 8021
Feb 14 08:32:00 [17937.868499] R13: 8804144c7880 R14: 8021 R15: 8804144c7800
Feb 14 08:32:00 [17937.868824] FS: 7f0a7ecd8700() GS:88043418() knlGS:
Feb 14 08:32:00 [17937.869408] CS: 0010 DS: ES: CR0: 80050033
Feb 14 08:32:00 [17937.869729] CR2: 7fa87a187978 CR3: 00042a6cd005 CR4: 001606e0
Feb 14 08:32:00 [17937.870053] Call Trace:
Feb 14 08:32:00 [17937.870375] ? __kmalloc_node_track_caller+0xb5/0xd6
Feb 14 08:32:00 [17937.870700] __ppp_xmit_process+0x35/0x4c6 [ppp_generic]
Feb 14 08:32:00 [17937.871025] ppp_xmit_process+0x35/0x88 [ppp_generic]
Feb 14 08:32:00 [17937.871350] ppp_write+0xb1/0xbb [ppp_generic]
Feb 14 08:32:00 [17937.871678] __vfs_write+0x1c/0x118
Feb 14 08:32:00 [17937.872003] ? SyS_epoll_ctl+0x399/0x871
Feb 14 08:32:00 [17937.872328] vfs_write+0xc6/0x169
Feb 14 08:32:00 [17937.872654] SyS_write+0x48/0x81
Feb 14 08:32:00 [17937.872980] do_syscall_64+0x5f/0xea
Feb 14 08:32:00 [17937.873310] entry_SYSCALL_64_after_hwframe+0x21/0x86
Feb 14 08:32:00 [17937.873638] RIP: 0033:0x7f0a7e4bfb2d
Feb 14 08:32:00 [17937.873963] RSP: 002b:7f0a7ecd7b00 EFLAGS: 0293 ORIG_RAX: 0001
Feb 14 08:32:00 [17937.874554] RAX: ffda RBX: 7f0a7d00b1e3 RCX: 7f0a7e4bfb2d
Feb 14 08:32:00 [17937.874881] RDX: 000c RSI: 7f0a74175c80 RDI: 3ef8
Feb 14 08:32:00 [17937.875207] RBP: 7f0a7ecd7b30 R08: R09: 55776e7a5e40
Feb 14 08:32:00 [17937.875536] R10: 7f0a75fba758 R11: 0293 R12: 7f0a7550dd18
Feb 14 08:32:00 [17937.875863] R13: 7ffd4c941eaf R14: R15: 7f0a7ecd8700
Feb 14 08:32:00 [17937.876190] Code: 94 00 00 00 49 89 ff 0f ba e0 0a 72 43 48 8b 5f 68 48 8d 7b e8 e8 88 4f 84 e1 48 8b 7b b8 48 85 ff 74 10 48 8b 47 08 48 8b 34 24 10 85 c0 75 0b eb 14 48 8b 3c 24 e8 d8 6c 76 e1 49 c7 87 c8
Feb 14 08:32:00 [17937.877071] RIP: ppp_push+0x73/0x4ec [ppp_generic] RSP: c90001fa7d50
Feb 14 08:32:00 [17937.877435] ---[ end trace 30a3cc6a49109783 ]---
Feb 14 08:32:00 [17937.878370] Kernel panic - not syncing: Fatal exception in interrupt
Feb 14 08:32:00 [17937.878715] Kernel Offset: disabled
Feb 14 08:32:00 [17937.879771] Rebooting in 5 seconds..
4.15.2 kernel panic, nat, ppp bug?
Hello, Got this, and then the server rebooted with a panic (second message). Workload: pppoe BRAS, lots of shapers, ppp interfaces. Please let me know if i need to provide more information. Feb 12 06:00:58 [13750.606169] WARNING: CPU: 6 PID: 0 at ./include/net/dst.h:256 nf_xfrm_me_harder+0x52/0xd9 [nf_nat] Feb 12 06:00:58 [13750.606747] Modules linked in: pppoe pppox ppp_generic slhc netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc ixgbe dca Feb 12 06:00:58 [13750.608695] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.15.2-build-0134 #5 Feb 12 06:00:58 [13750.609017] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015 Feb 12 06:00:58 [13750.609345] RIP: 0010:nf_xfrm_me_harder+0x52/0xd9 [nf_nat] Feb 12 06:00:58 [13750.609667] RSP: 0018:880434183c88 EFLAGS: 00010246 Feb 12 06:00:58 [13750.609985] RAX: RBX: 8803f997ce00 RCX: Feb 12 06:00:58 [13750.610306] RDX: 0001 RSI: 880406c09a00 RDI: 880434183cc8 Feb 12 06:00:58 [13750.610629] RBP: 822a81c0 R08: 0005 R09: 0001 Feb 12 06:00:58 [13750.610949] R10: 00ce R11: 88043154c320 R12: 0001 Feb 12 06:00:58 [13750.611274] R13: 880434183d50 R14: 00e20008 R15: 88042e320078 Feb 12 06:00:58 [13750.611599] FS: () GS:88043418() knlGS: Feb 12 06:00:58 [13750.612181] CS: 0010 DS: ES: CR0: 80050033 Feb 12 06:00:58 [13750.612500] CR2: 7f12eed3a140 CR3: 000446209003 CR4: 001606e0 Feb 12 06:00:58 [13750.612823] Call Trace: Feb 12 06:00:58 [13750.613138] Feb 12 06:00:58 [13750.613457] nf_nat_ipv4_out+0xa5/0xb9 [nf_nat_ipv4] Feb 12 06:00:58 [13750.613780] nf_hook_slow+0x31/0x93 Feb 12 06:00:58 [13750.614101] ip_output+0x93/0xaf Feb 12 06:00:58 [13750.614417] ? ip_fragment.constprop.5+0x6e/0x6e Feb 12 06:00:58 [13750.614739] ip_forward+0x36d/0x378 Feb 12 06:00:58 [13750.615057] ? ip_frag_mem+0x7/0x7 Feb 12 06:00:58 [13750.615376] ip_rcv+0x2f0/0x325 Feb 12 06:00:58 [13750.615698] ? ip_local_deliver_finish+0x1a8/0x1a8 Feb 12 06:00:58 [13750.616019] __netif_receive_skb_core+0x535/0x8b5 Feb 12 06:00:58 [13750.616340] ? kmem_cache_free_bulk+0x21b/0x233 Feb 12 06:00:58 [13750.616661] ? process_backlog+0x99/0x115 Feb 12 06:00:58 [13750.616981] process_backlog+0x99/0x115 Feb 12 06:00:58 [13750.617300] net_rx_action+0x11c/0x28a Feb 12 06:00:58 [13750.617620] __do_softirq+0xc8/0x1bf Feb 12 06:00:58 [13750.617941] irq_exit+0x49/0x88 Feb 12 06:00:58 [13750.618262] call_function_single_interrupt+0x92/0xa0 Feb 12 06:00:58 [13750.618587] Feb 12 06:00:58 [13750.618907] RIP: 0010:mwait_idle+0x4c/0x5e Feb 12 06:00:58 [13750.619227] RSP: 0018:c9000192bf08 EFLAGS: 0246 ORIG_RAX: ff04 Feb 12 06:00:58 [13750.619803] RAX: RBX: 88043296cc80 RCX: Feb 12 06:00:58 [13750.620120] RDX: RSI: RDI: Feb 12 06:00:58 [13750.620436] RBP: R08: 00525ccae333e271 R09: 00023738 Feb 12 06:00:58 [13750.620750] R10: c9000192be98 R11: 000236e0 R12: Feb 12 06:00:58 [13750.621065] R13: 0006 R14: 88043296cc80 R15: 88043296cc80 Feb 12 06:00:58 [13750.621394] ? 
rcu_eqs_enter_common.constprop.54+0x57/0x5f Feb 12 06:00:58 [13750.621714] do_idle+0xa8/0x130 Feb 12 06:00:58 [13750.622032] cpu_startup_entry+0x18/0x1a Feb 12 06:00:58 [13750.622349] secondary_startup_64+0xa5/0xb0 Feb 12 06:00:58 [13750.622667] Code: 48 83 e6 fe 48 83 7e 48 00 74 07 48 8b b6 80 01 00 00 8b 86 80 00 00 00 85 c0 74 0f 8d 50 01 f0 0f b1 96 80 00 00 00 74 04 eb ed <0f> ff 48 8b 4b 18 48 8d 54 24 08 45 31 c0 48 89 ef e8 44 91 8d Feb 12 06:00:58 [13750.623533] ---[ end trace 807c68f3da1711db ]--- Feb 12 06:00:58 [13750.623863] dst_release: dst:ad86ddff refcnt:-1 Feb 12 09:40:45 [26937.094365] WARNING: CPU: 5 PID: 0 at ./include/net/dst.h:256 nf_xfrm_me_harder+0x52/0xd9 [nf_nat] Feb 12 09:40:45 [26937.094958] Modules linked in: pppoe pppox ppp_generic slhc netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_
Re: e1000e hardware unit hangs
On 2018-01-24 20:31, Ben Greear wrote: On 01/24/2018 08:34 AM, Neftin, Sasha wrote: On 1/24/2018 18:11, Alexander Duyck wrote: On Tue, Jan 23, 2018 at 3:46 PM, Ben Greear wrote: Hello, Anyone have any more suggestions for making e1000e work better? This is from a 4.9.65+ kernel, with these additional e1000e patches applied: e1000e: Fix error path in link detection e1000e: Fix wrong comment related to link detection e1000e: Fix return value test e1000e: Separate signaling for link check/link up e1000e: Avoid receiver overrun interrupt bursts Most of these patches shouldn't address anything that would trigger Tx hangs. They are mostly related to just link detection. Test case is simply to run 3 tcp connections each trying to send 56Kbps of bi-directional data between a pair of e1000e interfaces :) No OOM related issues are seen on this kernel...similar test on 4.13 showed some OOM issues, but I have not debugged that yet... Really a question like this probably belongs on e1000-devel or intel-wired-lan so I have added those lists and the e1000e maintainer to the thread. It would be useful if you could provide more information about the device itself such as the ID and the kind of test you are running. Keep in mind the e1000e driver supports a pretty broad swath of devices so we need to narrow things down a bit. Please also re-check if your kernel includes: e1000e: fix buffer overrun while the I219 is processing DMA transactions e1000e: fix the use of magic numbers for buffer overrun issue Where do you take the fresh version of the kernel from? Hello, I tried adding those two patches, but I still see this splat shortly after starting my test. The kernel I am using is here: https://github.com/greearb/linux-ct-4.13 I've seen similar issues at least back to the 4.0 kernel, including stock kernels and my own kernels with additional patches.
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out, trans_start: 4295298499, wd-timeout: 5000 jiffies: 4295304192 tx-queues: 1 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: [ cut here ] Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: WARNING: CPU: 0 PID: 0 at /home/greearb/git/linux-4.13.dev.y/net/sched/sch_generic.c:322 dev_watchdog+0x228/0x250 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c cfg80211 macvlan wanlink(O) pktgen bnep bluetooth f...ss tpm_tis ip Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O4.13.16+ #22 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 09/17/2012 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: task: 81e104c0 task.stack: 81e0 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RIP: 0010:dev_watchdog+0x228/0x250 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RSP: 0018:88042fc03e50 EFLAGS: 00010282 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RAX: 0086 RBX: RCX: Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RDX: 88042fc15b40 RSI: 88042fc0dbf8 RDI: 88042fc0dbf8 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RBP: 88042fc03e98 R08: 0001 R09: 03c4 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: R10: R11: 03c4 R12: 1388 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: R13: 000100050dc3 R14: 88041767 R15: 000100052400 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: FS: () GS:88042fc0() knlGS: Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: CS: 0010 DS: ES: CR0: 80050033 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: CR2: 01d14000 CR3: 01e09000 CR4: 001406f0 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: Call Trace: Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: ? qdisc_rcu_free+0x40/0x40 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: call_timer_fn+0x30/0x160 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: ? qdisc_rcu_free+0x40/0x40 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: run_timer_softirq+0x1f0/0x450 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: ? lapic_next_deadline+0x21/0x30 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: ? clockevents_program_event+0x78/0xf0 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: __do_softirq+0xc1/0x2c0 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: irq_exit+0xb1/0xc0 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: smp_apic_timer_interrupt+0x38/0x50 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: apic_timer_interrupt+0x89/0x90 Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:
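For anyone trying to reproduce this, the traffic pattern described above can be roughly approximated with iperf3 (the reporter's exact tool is not named; --bidir needs iperf3 >= 3.7, and addresses are placeholders):
    iperf3 -s -D                                        # on one e1000e host
    iperf3 -c 192.168.1.10 -P 3 -b 56k --bidir -t 600   # three bidirectional ~56Kbps TCP streams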
Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
On 2017-10-07 15:09, SviMik wrote: Unfortunately, netconsole has managed to send a kernel panic trace only once, and it's not related to this bug. Looks like something crashes hard enough to make netconsole unusable. In some cases i had luck with pstore when netconsole failed me (especially with networking bugs); it stores panic messages more reliably, especially on recent platforms that have ERST and EFI. https://www.kernel.org/doc/Documentation/ABI/testing/pstore
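A minimal sketch of both capture paths (interface, addresses and MAC are placeholders):
    # netconsole: stream printk output to a remote UDP listener
    modprobe netconsole netconsole=@/eth0,6666@192.168.0.2/aa:bb:cc:dd:ee:ff
    # on the receiving host: nc -l -u -p 6666
    # pstore: panic records survive the reboot on ERST/EFI-capable systems
    mount -t pstore pstore /sys/fs/pstore
    ls /sys/fs/pstore    # dmesg-erst-* / dmesg-efi-* files appear after a crash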
Question about "prevent dst uses after free" and WARNING in nf_xfrm_me_harder / refcnt / 4.13.3
Hi, I'm running now 4.13.3, is this patch required for 4.13 as well? (it doesnt apply cleanly, as in 4.13 tcp_prequeue use skb_dst_force_safe, so i just renamed it there to skb_dst_force ) This is what i get on PPPoE BRAS on this kernel, patch applied (no idea if its related to patch, but just mentioning i applied it, as it's not vanilla 4.13.3) [ 7858.579600] [ cut here ] [ 7858.579818] WARNING: CPU: 2 PID: 0 at ./include/net/dst.h:254 nf_xfrm_me_harder+0x61/0xec [nf_nat] [ 7858.580160] Modules linked in: cls_fw act_police cls_u32 sch_ingress sch_htb pppoe pppox ppp_generic slhc netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 xt_set ts_bm xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc ixgbe dca [ 7858.581255] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.13.3-build-0133 #27 [ 7858.581456] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015 [ 7858.581659] task: 880434e6a700 task.stack: c90001904000 [ 7858.581862] RIP: 0010:nf_xfrm_me_harder+0x61/0xec [nf_nat] [ 7858.582061] RSP: 0018:880436483bc0 EFLAGS: 00010246 [ 7858.582259] RAX: RBX: 822df000 RCX: 8803ee9028ce [ 7858.582461] RDX: 0014 RSI: 88041cd82900 RDI: 880436483bf8 [ 7858.582661] RBP: 880436483c20 R08: 81e0b400 R09: b916 [ 7858.582865] R10: 8803ee9028e8 R11: R12: 880401e92100 [ 7858.583068] R13: 0001 R14: 822df000 R15: 88042e280078 [ 7858.583269] FS: () GS:88043648() knlGS: [ 7858.583608] CS: 0010 DS: ES: CR0: 80050033 [ 7858.583809] CR2: 7f9b2886fc9c CR3: 000429223000 CR4: 001406e0 [ 7858.584013] Call Trace: [ 7858.584209] [ 7858.584408] ? nf_nat_ipv4_fn+0x12e/0x189 [nf_nat_ipv4] [ 7858.584605] nf_nat_ipv4_out+0xb6/0xd3 [nf_nat_ipv4] [ 7858.584807] iptable_nat_ipv4_out+0x15/0x17 [iptable_nat] [ 7858.585010] nf_hook_slow+0x2a/0x9a [ 7858.585209] ip_output+0x96/0xb4 [ 7858.585410] ? ip_fragment.constprop.5+0x7c/0x7c [ 7858.585610] ip_forward_finish+0x5b/0x60 [ 7858.585811] ip_forward+0x36d/0x37a [ 7858.586010] ? ip_frag_mem+0x11/0x11 [ 7858.586207] ip_rcv_finish+0x2f9/0x304 [ 7858.586406] ip_rcv+0x32a/0x337 [ 7858.586604] ? ip_local_deliver_finish+0x1bb/0x1bb [ 7858.586808] __netif_receive_skb_core+0x4f0/0x847 [ 7858.587009] __netif_receive_skb+0x18/0x5a [ 7858.587208] ? __netif_receive_skb+0x18/0x5a [ 7858.587407] process_backlog+0xa4/0x127 [ 7858.587606] net_rx_action+0x11e/0x2d8 [ 7858.587811] ? 
sched_clock_cpu+0x15/0x9b [ 7858.588013] __do_softirq+0xe7/0x23a [ 7858.588210] irq_exit+0x52/0x93 [ 7858.588408] smp_call_function_single_interrupt+0x33/0x35 [ 7858.588610] call_function_single_interrupt+0x83/0x90 [ 7858.588811] RIP: 0010:mwait_idle+0x93/0x13c [ 7858.589007] RSP: 0018:c90001907eb0 EFLAGS: 0246 ORIG_RAX: ff04 [ 7858.589347] RAX: RBX: 880434e6a700 RCX: [ 7858.589548] RDX: RSI: RDI: [ 7858.589750] RBP: c90001907ec0 R08: R09: 0001 [ 7858.589952] R10: c90001907e58 R11: 024d R12: 0002 [ 7858.590149] R13: R14: 880434e6a700 R15: 880434e6a700 [ 7858.590347] [ 7858.590541] arch_cpu_idle+0xf/0x11 [ 7858.590738] default_idle_call+0x25/0x27 [ 7858.590938] do_idle+0xb8/0x150 [ 7858.591133] cpu_startup_entry+0x1f/0x21 [ 7858.591332] start_secondary+0xe8/0xeb [ 7858.591531] secondary_startup_64+0x9f/0x9f [ 7858.591729] Code: 83 7e 48 00 74 07 48 8b b6 80 01 00 00 8b 86 80 00 00 00 85 c0 74 14 8d 50 01 f0 0f b1 96 80 00 00 00 0f 94 c2 84 d2 75 04 eb e8 <0f> ff 49 8b 4c 24 18 48 8d 55 a0 45 31 c0 48 89 df e8 d9 de 95 [ 7858.592239] ---[ end trace c089174999ff4fc3 ]--- [ 7858.592448] dst_release: dst:88041cd82900 refcnt:-1 [ 8139.130003] igb :07:00.0 eth0: igb: eth0 NIC Link is Down [ 8139.130309] igb :07:00.0 eth0: Reset adapter [ 8164.431523] igb :07:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX [ 9149.190518] perf: interrupt took too long (3132 > 3128), lowering kernel.perf_event_max_sample_rate to 63000 [17205.528640] [ cut here ] [17205.528855] WARNING: CPU: 0 PID: 0 at ./include/net/dst.h:254 nf_xfrm_me_harder+0x61/0xec [nf_nat] [17205.529197] Modules linked in: cls_fw act_police cls_u32 sch_ingress sch_htb pppoe pppox ppp_generic slhc netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_REDIREC
Re: [PATCH] bgmac: Remove all offloading features, including GRO.
On 2017-09-16 03:18, Eric Dumazet wrote: On Fri, 2017-09-15 at 17:10 -0700, ros...@gmail.com wrote: Ok fair enough. Will only disable GRO in the driver. Well, do not even try. NETIF_F_SOFT_FEATURES is set by the core networking stack in register_netdevice(), ( commit 212b573f5552c60265da721ff9ce32e3462a2cdd ) Absolutely no driver disables GRO (except the ones playing with LRO). I also believe iperf is definitely an inconclusive test. Besides iperf there are lots of different workloads and configurations that might give different results.
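Since the core stack owns this flag, the usual place to turn GRO off is at runtime via ethtool, per interface rather than in the driver (eth0 is a placeholder):
    ethtool -k eth0 | grep generic-receive-offload   # show current state
    ethtool -K eth0 gro off                          # disable for testing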
Re: HTB going crazy over ~5Gbit/s (4.12.9, but problem present in older kernels as well)
On 2017-09-13 20:20, Eric Dumazet wrote: On Wed, 2017-09-13 at 20:12 +0300, Denys Fedoryshchenko wrote: For my case, as load increased now, i am hitting the same issue (i tried to play with quantum / bursts as well, didn't help): tc -s -d class show dev eth3.777 classid 1:111;sleep 5;tc -s -d class show dev eth3.777 classid 1:111 class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 20Gbit ceil 100Gbit linklayer ethernet burst 10b/1 mpu 0b cburst 10b/1 mpu 0b level 0 Sent 864151559 bytes 730566 pkt (dropped 15111, overlimits 0 requeues 0) backlog 73968000b 39934p requeues 0 lended: 499867 borrowed: 0 giants: 0 tokens: 608 ctokens: 121 You have drops (and ~40,000 packets in backlog) class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 20Gbit ceil 100Gbit linklayer ethernet burst 10b/1 mpu 0b cburst 10b/1 mpu 0b level 0 Sent 1469352160 bytes 1243649 pkt (dropped 42933, overlimits 0 requeues 0) backlog 82536047b 39963p requeues 0 lended: 810475 borrowed: 0 giants: 0 tokens: 612 ctokens: 122 (1469352160-864151559)/5*8 = 968320961.6 Less than 1Gbit, and it's being throttled It is not : "overlimits 0" means this class was not throttled. Overlimits never appear in HTB as far as i know; here is a simulation on this class, which has constant "at least" 1G traffic - i throttled it to 1Kbit to simulate forced drops: shapernew ~ # sh /etc/shaper.cfg;sleep 1;tc -s -d class show dev eth3.777 classid 1:111;tc qdisc del dev eth3.777 root class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 1Kbit ceil 1Kbit linklayer ethernet burst 31280b/1 mpu 0b cburst 31280b/1 mpu 0b level 0 Sent 134350019 bytes 117520 pkt (dropped 7819, overlimits 0 requeues 0) backlog 7902126b 4976p requeues 0 lended: 86694 borrowed: 0 giants: 0 tokens: -93750 ctokens: -93750
Re: HTB going crazy over ~5Gbit/s (4.12.9, but problem present in older kernels as well)
On 2017-09-13 19:55, Eric Dumazet wrote: On Wed, 2017-09-13 at 09:42 -0700, Eric Dumazet wrote: On Wed, 2017-09-13 at 19:27 +0300, Denys Fedoryshchenko wrote: > On 2017-09-13 19:16, Eric Dumazet wrote: > > On Wed, 2017-09-13 at 18:34 +0300, Denys Fedoryshchenko wrote: > >> Well, probably i am answering my own question, removing estimator from > >> classes seems drastically improve situation. > >> It seems estimator has some issues that cause shaper to behave > >> incorrectly (throttling traffic while it should not). > >> But i guess thats a bug? > >> As i was not able to predict such bottleneck by CPU load measurements. > > > > Well, there was a reason we disabled HTB class estimators by default ;) > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=64153ce0a7b61b2a5cacb01805cbf670142339e9 > > As long as disabling it solves my problem - i'm fine, hehe, but i guess > other people who might hit this problem should be aware of how to find the > reason. > They should not be disappointed in Linux :) Well, if they enable rate estimators while the kernel does not set them by default, they get what they want, at a cost. > Because i can't measure this bottleneck before it happens: mpstat shows all > cpus idle while at the same time traffic is throttled. Normally things were supposed to get much better in linux-4.10 ( https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=1c0d32fde5bdf1184bc274f864c09799278a1114 ) But I apparently added a scaling bug. I will try :
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 0385dece1f6fe5e26df1ce5f40956a79a2eebbf4..7c1ffd6f950172c1915d8e5fa2b5e3f77e4f4c78 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -83,10 +83,10 @@ static void est_timer(unsigned long arg)
 	u64 rate, brate;
 
 	est_fetch_counters(est, &b);
-	brate = (b.bytes - est->last_bytes) << (8 - est->ewma_log);
+	brate = (b.bytes - est->last_bytes) << (10 - est->ewma_log - est->intvl_log);
 	brate -= (est->avbps >> est->ewma_log);
 
-	rate = (u64)(b.packets - est->last_packets) << (8 - est->ewma_log);
+	rate = (u64)(b.packets - est->last_packets) << (10 - est->ewma_log - est->intvl_log);
 	rate -= (est->avpps >> est->ewma_log);
 
 	write_seqcount_begin(&est->seq);
Much better indeed # tc -s -d class sh dev eth0 classid 7002:11 ; sleep 10 ;tc -s -d class sh dev eth0 classid 7002:11 class htb 7002:11 parent 7002:1 prio 5 quantum 20 rate 5Gbit ceil 5Gbit linklayer ethernet burst 8b/1 mpu 0b cburst 8b/1 mpu 0b level 0 rate_handle 1 Sent 389085117074 bytes 256991500 pkt (dropped 0, overlimits 5926926 requeues 0) rate 4999Mbit 412762pps backlog 136260b 2p requeues 0 TCP pkts/rtx 256991584/0 bytes 389085252840/0 lended: 5961250 borrowed: 0 giants: 0 tokens: -1664 ctokens: -1664 class htb 7002:11 parent 7002:1 prio 5 quantum 20 rate 5Gbit ceil 5Gbit linklayer ethernet burst 8b/1 mpu 0b cburst 8b/1 mpu 0b level 0 rate_handle 1 Sent 395336315580 bytes 261120429 pkt (dropped 0, overlimits 6021776 requeues 0) rate 4999Mbit 412788pps backlog 68Kb 2p requeues 0 TCP pkts/rtx 261120469/0 bytes 395336384730/0 lended: 6056793 borrowed: 0 giants: 0 tokens: -1478 ctokens: -1478 echo "(395336315580-389085117074)/10*8" | bc 5000958800 For my case, as load increased now, i am hitting the same issue (i tried to play with quantum / bursts as well, didn't help): tc -s -d class show dev eth3.777 classid 1:111;sleep 5;tc -s -d class show dev eth3.777 classid 1:111 class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 20Gbit ceil 100Gbit 
linklayer ethernet burst 10b/1 mpu 0b cburst 10b/1 mpu 0b level 0 Sent 864151559 bytes 730566 pkt (dropped 15111, overlimits 0 requeues 0) backlog 73968000b 39934p requeues 0 lended: 499867 borrowed: 0 giants: 0 tokens: 608 ctokens: 121 class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 20Gbit ceil 100Gbit linklayer ethernet burst 10b/1 mpu 0b cburst 10b/1 mpu 0b level 0 Sent 1469352160 bytes 1243649 pkt (dropped 42933, overlimits 0 requeues 0) backlog 82536047b 39963p requeues 0 lended: 810475 borrowed: 0 giants: 0 tokens: 612 ctokens: 122 (1469352160-864151559)/5*8 = 968320961.6 Less than 1Gbit, and it's being throttled. Total bandwidth: class htb 1:1 root rate 100Gbit ceil 100Gbit linklayer ethernet burst 10b/1 mpu 0b cburst 10b/1 mpu 0b level 7 Sent 7839730635 bytes 8537393 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 lended: 0 borrowed: 0 giants: 0 tokens: 123 ctokens: 123 class htb 1:1 root rate 100Gbit ceil 100Gbit linklayer ethernet burst 10b/1 mpu 0b cburst 10b/1 mpu 0b level 7 Sent 11043190453 bytes 12008366 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 lended: 0 borrowed: 0 giants: 0 tokens: 124 ctokens: 124 694kpps 5.1Gbit
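The byte-counter sampling used throughout this thread, wrapped into a one-shot shell check (device/classid taken from the thread, the 5s interval is arbitrary) - this measures real per-class throughput independently of the in-kernel rate estimator:
    B1=$(tc -s class show dev eth3.777 classid 1:111 | awk '/Sent/ {print $2}')
    sleep 5
    B2=$(tc -s class show dev eth3.777 classid 1:111 | awk '/Sent/ {print $2}')
    echo "real rate: $(( (B2 - B1) * 8 / 5 )) bit/s"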
Re: HTB going crazy over ~5Gbit/s (4.12.9, but problem present in older kernels as well)
On 2017-09-13 19:16, Eric Dumazet wrote: On Wed, 2017-09-13 at 18:34 +0300, Denys Fedoryshchenko wrote: Well, probably i am answering my own question, removing estimator from classes seems drastically improve situation. It seems estimator has some issues that cause shaper to behave incorrectly (throttling traffic while it should not). But i guess thats a bug? As i was not able to predict such bottleneck by CPU load measurements. Well, there was a reason we disabled HTB class estimators by default ;) https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=64153ce0a7b61b2a5cacb01805cbf670142339e9 As long as disabling it solves my problem - i'm fine, hehe, but i guess other people who might hit this problem should be aware of how to find the reason. They should not be disappointed in Linux :) Because i can't measure this bottleneck before it happens: mpstat shows all cpus idle while at the same time traffic is throttled.
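For anyone checking whether their kernel creates these per-class estimators at all: the commit linked above made them opt-in through an HTB module parameter. A quick check, assuming the parameter name from that commit (htb_rate_est) and not verified on every kernel version:
    cat /sys/module/sch_htb/parameters/htb_rate_est   # 0 = per-class rate estimators disabled (the default)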
Re: HTB going crazy over ~5Gbit/s (4.12.9, but problem present in older kernels as well)
On 2017-09-13 18:51, Eric Dumazet wrote: On Wed, 2017-09-13 at 18:20 +0300, Denys Fedoryshchenko wrote: Hi, I noticed after increasing bandwidth over some amount HTB started to throttle classes it should not throttle. Also estimated rate in htb totally wrong, while byte counters is correct. Is there any overflow or something? Thanks Denys for the report, I will take a look at this, since I probably introduced some regression. It's definitely not something recent; this system was on an older kernel with uptime over 200 days and this bottleneck was present - i noticed it a long time ago. But i never tried to remove the estimators (increasing burst/cburst to insane values saved me for a while).
Re: HTB going crazy over ~5Gbit/s (4.12.9, but problem present in older kernels as well)
Well, probably i am answering my own question, removing estimator from classes seems drastically improve situation. It seems estimator has some issues that cause shaper to behave incorrectly (throttling traffic while it should not). But i guess thats a bug? As i was not able to predict such bottleneck by CPU load measurements. On 2017-09-13 18:20, Denys Fedoryshchenko wrote: Hi, I noticed after increasing bandwidth over some amount HTB started to throttle classes it should not throttle. Also estimated rate in htb totally wrong, while byte counters is correct. Is there any overflow or something? X520 card (but XL710 same) br1 8000.90e2ba86c38c no eth3.1777 eth3.777 br2 8000.90e2ba86c38d no eth3.360 eth3.361 Inbound traffic is coming over one vlan, leaving another vlan. Shaper is just bunch of classes and u32 filters, with few fw filters. qdisc is pie I put totally high values to not reach them, tried to change quantum/burst/cburst but... stats below. First, "root" class is 1:1 showing rate 18086Mbit, which is physically impossible. Class 1:111 showing 5355Mbit, while real traffic is ~1.5Gbit shaper /etc # tc -s -d class show dev eth3.777 classid 1:111;sleep 5;tc -s -d class show dev eth3.777 classid 1:111 class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 6487632263 bytes 5235525 pkt (dropped 0, overlimits 0 requeues 0) rate 5529Mbit 557534pps backlog 0b 0p requeues 0 lended: 2423323 borrowed: 0 giants: 0 tokens: 124 ctokens: -1 class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 7438601731 bytes 6003811 pkt (dropped 0, overlimits 0 requeues 0) rate 5631Mbit 568214pps backlog 36624b 8p requeues 0 lended: 2772486 borrowed: 0 giants: 0 tokens: 124 ctokens: -1 (7438601731-6487632263)/5*8 = 1.521.551.148 And most important some classes suffering, while they should not (not reaching limits) class htb 1:95 parent 1:1 leaf 95: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 13556762059 bytes 17474559 pkt (dropped 16017, overlimits 0 requeues 0) rate 2524Mbit 414197pps backlog 31969245b 34513p requeues 0 lended: 13995723 borrowed: 0 giants: 0 tokens: 111 ctokens: -2 Full classes stats: class htb 1:100 parent 1:1 leaf 100: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 116 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) rate 8bit 0pps backlog 0b 0p requeues 0 lended: 2 borrowed: 0 giants: 0 tokens: 124 ctokens: -1 class htb 1:120 parent 1:1 leaf 120: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 531230043 bytes 782130 pkt (dropped 0, overlimits 0 requeues 0) rate 132274Kbit 25240pps backlog 0b 0p requeues 0 lended: 540693 borrowed: 0 giants: 0 tokens: 109 ctokens: -2 class htb 1:50 parent 1:1 leaf 50: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 773472109 bytes 587335 pkt (dropped 0, overlimits 0 requeues 0) rate 215929Kbit 20503pps backlog 0b 0p requeues 0 lended: 216614 borrowed: 0 giants: 0 tokens: 91 ctokens: -4 class htb 1:70 parent 1:1 leaf 70: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 1574768 bytes 6194 pkt (dropped 0, overlimits 0 requeues 0) rate 406272bit 214pps backlog 0b 0p 
requeues 0 lended: 5758 borrowed: 0 giants: 0 tokens: 101 ctokens: -3 class htb 1:90 parent 1:1 leaf 90: prio 0 quantum 5 rate 1Kbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 3206 bytes 53 pkt (dropped 0, overlimits 0 requeues 0) rate 848bit 1pps backlog 0b 0p requeues 0 lended: 53 borrowed: 0 giants: 0 class htb 1:110 parent 1:1 leaf 110: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 17205952113 bytes 12926008 pkt (dropped 239, overlimits 0 requeues 0) rate 4433Mbit 416825pps backlog 5847785b 2394p requeues 0 lended: 7021696 borrowed: 0 giants: 0 tokens: 91 ctokens: -4 class htb 1:45 root leaf 45: prio 0 quantum 5 rate 80Mbit ceil 80Mbit linklayer ethernet burst 1b/1 mpu 0b cburst 1b/1 mpu 0b level 0 Sent 2586 bytes 45 pkt (dropped 0, overlimits 0 requeues 0) rate 456bit 1pps backlog 0b 0p requeues 0 lended: 45 borrowed: 0 giants: 0 tokens: 15540 ctokens: 15540 class htb 1:1 root rate 100Gbit ceil 100Gbit linklayer ethernet burst 0b/1 mpu 0b cburst 0b/1 mpu 0b level 7 Sent 7227721
HTB going crazy over ~5Gbit/s (4.12.9, but problem present in older kernels as well)
Hi, I noticed after increasing bandwidth over some amount HTB started to throttle classes it should not throttle. Also estimated rate in htb totally wrong, while byte counters is correct. Is there any overflow or something? X520 card (but XL710 same) br1 8000.90e2ba86c38c no eth3.1777 eth3.777 br2 8000.90e2ba86c38d no eth3.360 eth3.361 Inbound traffic is coming over one vlan, leaving another vlan. Shaper is just bunch of classes and u32 filters, with few fw filters. qdisc is pie I put totally high values to not reach them, tried to change quantum/burst/cburst but... stats below. First, "root" class is 1:1 showing rate 18086Mbit, which is physically impossible. Class 1:111 showing 5355Mbit, while real traffic is ~1.5Gbit shaper /etc # tc -s -d class show dev eth3.777 classid 1:111;sleep 5;tc -s -d class show dev eth3.777 classid 1:111 class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 6487632263 bytes 5235525 pkt (dropped 0, overlimits 0 requeues 0) rate 5529Mbit 557534pps backlog 0b 0p requeues 0 lended: 2423323 borrowed: 0 giants: 0 tokens: 124 ctokens: -1 class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 7438601731 bytes 6003811 pkt (dropped 0, overlimits 0 requeues 0) rate 5631Mbit 568214pps backlog 36624b 8p requeues 0 lended: 2772486 borrowed: 0 giants: 0 tokens: 124 ctokens: -1 (7438601731-6487632263)/5*8 = 1.521.551.148 And most important some classes suffering, while they should not (not reaching limits) class htb 1:95 parent 1:1 leaf 95: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 13556762059 bytes 17474559 pkt (dropped 16017, overlimits 0 requeues 0) rate 2524Mbit 414197pps backlog 31969245b 34513p requeues 0 lended: 13995723 borrowed: 0 giants: 0 tokens: 111 ctokens: -2 Full classes stats: class htb 1:100 parent 1:1 leaf 100: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 116 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) rate 8bit 0pps backlog 0b 0p requeues 0 lended: 2 borrowed: 0 giants: 0 tokens: 124 ctokens: -1 class htb 1:120 parent 1:1 leaf 120: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 531230043 bytes 782130 pkt (dropped 0, overlimits 0 requeues 0) rate 132274Kbit 25240pps backlog 0b 0p requeues 0 lended: 540693 borrowed: 0 giants: 0 tokens: 109 ctokens: -2 class htb 1:50 parent 1:1 leaf 50: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 773472109 bytes 587335 pkt (dropped 0, overlimits 0 requeues 0) rate 215929Kbit 20503pps backlog 0b 0p requeues 0 lended: 216614 borrowed: 0 giants: 0 tokens: 91 ctokens: -4 class htb 1:70 parent 1:1 leaf 70: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 1574768 bytes 6194 pkt (dropped 0, overlimits 0 requeues 0) rate 406272bit 214pps backlog 0b 0p requeues 0 lended: 5758 borrowed: 0 giants: 0 tokens: 101 ctokens: -3 class htb 1:90 parent 1:1 leaf 90: prio 0 quantum 5 rate 1Kbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 3206 bytes 53 pkt (dropped 0, overlimits 0 requeues 0) rate 848bit 1pps backlog 0b 0p requeues 0 lended: 53 borrowed: 0 giants: 0 class htb 1:110 parent 1:1 leaf 
110: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 17205952113 bytes 12926008 pkt (dropped 239, overlimits 0 requeues 0) rate 4433Mbit 416825pps backlog 5847785b 2394p requeues 0 lended: 7021696 borrowed: 0 giants: 0 tokens: 91 ctokens: -4 class htb 1:45 root leaf 45: prio 0 quantum 5 rate 80Mbit ceil 80Mbit linklayer ethernet burst 1b/1 mpu 0b cburst 1b/1 mpu 0b level 0 Sent 2586 bytes 45 pkt (dropped 0, overlimits 0 requeues 0) rate 456bit 1pps backlog 0b 0p requeues 0 lended: 45 borrowed: 0 giants: 0 tokens: 15540 ctokens: 15540 class htb 1:1 root rate 100Gbit ceil 100Gbit linklayer ethernet burst 0b/1 mpu 0b cburst 0b/1 mpu 0b level 7 Sent 72277215121 bytes 72693012 pkt (dropped 0, overlimits 0 requeues 0) rate 18086Mbit 2304729pps backlog 0b 0p requeues 0 lended: 0 borrowed: 0 giants: 0 tokens: -4 ctokens: -4 class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 10Gbit ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b level 0 Sent 21977384237 bytes 17697345
Re: ipset losing entries on its own
On 2017-09-06 13:08, Akshat Kakkar wrote: I am running ipset 6.32. The hash type is hash:ip. I am adding/deleting IP addresses to it dynamically using scripts. However, it has been observed that at times a few IPs (3-4 out of 4000) are not found in the set though they were added. Also, logs show there was no request for deletion of those IPs from the IPSet. Is it a bug? I think you should try to make a script that creates at least a reproducible scenario. And sure, post more info about your setup (kernel version, vanilla or distro).
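A reproduction-script sketch along those lines, using standard ipset commands (set name and address range are arbitrary):
    ipset create testset hash:ip
    for i in $(seq 1 4000); do ipset add testset 10.0.$((i / 256)).$((i % 256)); done
    # re-test membership and log any entry that silently disappeared
    for i in $(seq 1 4000); do
      ipset test testset 10.0.$((i / 256)).$((i % 256)) >/dev/null 2>&1 \
        || echo "missing: 10.0.$((i / 256)).$((i % 256))"
    done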
Re: nf_nat_pptp 4.12.3 kernel lockup/reboot
On 2017-08-25 08:21, Florian Westphal wrote: Denys Fedoryshchenko wrote: >>> I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling >>> approx 2gbps of pppoe users traffic) and noticed that after a while the server >>> reboots (i have set reboot on panic etc). >>> I can't run a serial console, and in pstore / netconsole there is nothing. >>> The best i got is some very short message about a softlockup in ipmi, but as >>> storage is very limited there - it is near useless. >>> >>> By preliminary testing (can't do it much, as it's production) - it seems the >>> following lines cause the issue; they worked in 4.11.8 and no longer in 4.12.3.
>>
>> Wild guess here, does this help?
>>
>> diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
>> --- a/net/netfilter/nf_conntrack_helper.c
>> +++ b/net/netfilter/nf_conntrack_helper.c
>> @@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
>>  	help = nf_ct_helper_ext_add(ct, helper, flags);
>>  	if (help == NULL)
>>  		return -ENOMEM;
>> +	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
>
> sigh, stupid typo, should be no ';' at the end above.
Sorry, are there any plans to push this to the 4.12 stable queue? No, sorry, this patch adds the extension for all connections that use a helper, but the nat extension is only used/required by the pptp helper (and masquerade). Thing is that this patch should not be needed; I will have to review pptp again, maybe i missed a case where the extension is not added. Do you happen to have an oops backtrace? That might speed this up a bit. There is nothing in netconsole, and also nothing in ERST pstore; i found the reason just by guessing. It's totally headless as well (no screen, no serial console). I can try to attach a USB serial adapter for a serial console, but i'm not sure it will help. If there is any other way to catch it - i can try, but as it's a production server i can't "crash it" more than once per day.
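One more capture option when netconsole and pstore both come up empty is kdump - a sketch, assuming kexec-tools is installed and memory was reserved with crashkernel= on the kernel command line (paths are placeholders):
    # boot the production kernel with: crashkernel=256M
    kexec -p /boot/vmlinuz-capture --initrd=/boot/initrd-capture --reuse-cmdline
    # on panic the capture kernel boots; the crashed kernel's log can then be
    # pulled out of /proc/vmcore (e.g. with makedumpfile or crash) before rebooting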
Re: nf_nat_pptp 4.12.3 kernel lockup/reboot
On 2017-07-24 19:20, Florian Westphal wrote: Florian Westphal wrote: Denys Fedoryshchenko wrote: > Hi, > > I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling > approx 2gbps of pppoe users traffic) and noticed that after a while the server > reboots (i have set reboot on panic etc). > I can't run a serial console, and in pstore / netconsole there is nothing. > The best i got is some very short message about a softlockup in ipmi, but as > storage is very limited there - it is near useless. > > By preliminary testing (can't do it much, as it's production) - it seems the > following lines cause the issue; they worked in 4.11.8 and no longer in 4.12.3. Wild guess here, does this help?
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 	help = nf_ct_helper_ext_add(ct, helper, flags);
 	if (help == NULL)
 		return -ENOMEM;
+	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
sigh, stupid typo, should be no ';' at the end above. Sorry, are there any plans to push this to the 4.12 stable queue?
Re: nf_nat_pptp 4.12.3 kernel lockup/reboot
On 2017-07-24 19:20, Florian Westphal wrote: Florian Westphal wrote: Denys Fedoryshchenko wrote: > Hi, > > I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling > approx 2gbps of pppoe users traffic) and noticed that after a while the server > reboots (i have set reboot on panic etc). > I can't run a serial console, and in pstore / netconsole there is nothing. > The best i got is some very short message about a softlockup in ipmi, but as > storage is very limited there - it is near useless. > > By preliminary testing (can't do it much, as it's production) - it seems the > following lines cause the issue; they worked in 4.11.8 and no longer in 4.12.3. Wild guess here, does this help?
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 	help = nf_ct_helper_ext_add(ct, helper, flags);
 	if (help == NULL)
 		return -ENOMEM;
+	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
sigh, stupid typo, should be no ';' at the end above. Tested-by: Denys Fedoryshchenko Tested, and no more hangs for 2 days - definitely an improvement. Any chance it will go to stable 4.12.x and newer kernels? Thank you very much!
Re: nf_nat_pptp 4.12.3 kernel lockup/reboot
On 2017-07-24 19:20, Florian Westphal wrote: Florian Westphal wrote: Denys Fedoryshchenko wrote: > Hi, > > I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling > approx 2gbps of pppoe users traffic) and noticed that after a while the server > reboots (i have set reboot on panic etc). > I can't run a serial console, and in pstore / netconsole there is nothing. > The best i got is some very short message about a softlockup in ipmi, but as > storage is very limited there - it is near useless. > > By preliminary testing (can't do it much, as it's production) - it seems the > following lines cause the issue; they worked in 4.11.8 and no longer in 4.12.3. Wild guess here, does this help?
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 	help = nf_ct_helper_ext_add(ct, helper, flags);
 	if (help == NULL)
 		return -ENOMEM;
+	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
sigh, stupid typo, should be no ';' at the end above. Tested; it looks like it's not hanging anymore (before, it was hanging within 10 minutes). Probably i will wait for a 24h testing cycle.
nf_nat_pptp 4.12.3 kernel lockup/reboot
Hi, I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling approx 2gbps of pppoe users traffic) and noticed that after a while the server reboots (i have set reboot on panic etc). I can't run a serial console, and in pstore / netconsole there is nothing. The best i got is some very short message about a softlockup in ipmi, but as storage is very limited there - it is near useless. By preliminary testing (can't do it much, as it's production) - it seems the following lines cause the issue; they worked in 4.11.8 and no longer in 4.12.3. iptables -t raw -A PREROUTING -p tcp -m tcp --dport 1723 -j CT --helper pptp iptables -t raw -A PREROUTING -p tcp -m tcp --sport 1723 -j CT --helper pptp (there are no solid examples for helpers; not sure the second line is necessary) I will try to do more tests tonight (lockdep debug etc), but maybe someone has an idea what might be wrong?
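On the rules themselves: a conntrack helper is attached to the connection, and replies belong to the same conntrack entry, so the --sport 1723 rule is most likely redundant. The commonly shown form is just the dport rule - a sketch, assuming the pptp helper modules are loaded:
    modprobe nf_nat_pptp    # pulls in nf_conntrack_pptp and the GRE protocol helpers
    iptables -t raw -A PREROUTING -p tcp -m tcp --dport 1723 -j CT --helper pptp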
Re: [PATCH net] netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
On 2017-04-08 23:24, Pablo Neira Ayuso wrote: On Mon, Apr 03, 2017 at 10:55:11AM -0700, Eric Dumazet wrote: From: Eric Dumazet Denys provided an awesome KASAN report pointing to an use after free in xt_TCPMSS I have provided three patches to fix this issue, either in xt_TCPMSS or in xt_tcpudp.c. It seems xt_TCPMSS patch has the smallest possible impact. Applied to nf.git, thanks! Any plans to queue it to the stable trees? It seems to have affected kernels for years.
Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8
On 2017-04-03 15:09, Eric Dumazet wrote: On Mon, 2017-04-03 at 11:10 +0300, Denys Fedoryshchenko wrote: I modified the patch a little as:
if (th->doff * 4 < sizeof(_tcph)) {
	par->hotdrop = true;
	WARN_ON_ONCE(!tcpinfo->option);
	return false;
}
And it did trigger the WARN once in the morning, and didn't hit KASAN. I will run for a while more, to see if it is ok, and then if stable, will try to enable SFQ again. Excellent news ! We will post an official fix today, thanks a lot for this detective work Denys. I am not sure it is finally fixed - maybe we need to test more? I'm doing extensive tests today with an identical configuration (i had to run fifo, because the customer cannot afford any more outages). I've added sfq now in a different way, and i will run the identical config in approx 3 hours.
Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8
On 2017-04-02 20:26, Eric Dumazet wrote: On Sun, 2017-04-02 at 10:14 -0700, Eric Dumazet wrote: Could that be that netfilter does not abort earlier if the TCP header is completely wrong ? Yes, I wonder if this patch would be better, unless we replicate the th->doff sanity check in all netfilter modules dissecting TCP frames.
diff --git a/net/netfilter/xt_tcpudp.c b/net/netfilter/xt_tcpudp.c
index ade024c90f4f129a7c384e9e1cbfdb8ffe73065f..8cb4eadd5ba1c20e74bc27ee52a0bc36a5b26725 100644
--- a/net/netfilter/xt_tcpudp.c
+++ b/net/netfilter/xt_tcpudp.c
@@ -103,11 +103,11 @@ static bool tcp_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	if (!NF_INVF(tcpinfo, XT_TCP_INV_FLAGS,
 		     (((unsigned char *)th)[13] & tcpinfo->flg_mask) == tcpinfo->flg_cmp))
 		return false;
+	if (th->doff * 4 < sizeof(_tcph)) {
+		par->hotdrop = true;
+		return false;
+	}
 	if (tcpinfo->option) {
-		if (th->doff * 4 < sizeof(_tcph)) {
-			par->hotdrop = true;
-			return false;
-		}
 		if (!tcp_find_option(tcpinfo->option, skb, par->thoff,
 				     th->doff*4 - sizeof(_tcph),
 				     tcpinfo->invflags & XT_TCP_INV_OPTION,
I modified the patch a little as:
if (th->doff * 4 < sizeof(_tcph)) {
	par->hotdrop = true;
	WARN_ON_ONCE(!tcpinfo->option);
	return false;
}
And it did trigger the WARN once in the morning, and didn't hit KASAN. I will run for a while more, to see if it is ok, and then if stable, will try to enable SFQ again.
Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8
On 2017-04-02 15:32, Eric Dumazet wrote: On Sun, 2017-04-02 at 15:25 +0300, Denys Fedoryshchenko wrote: > */ I will also add WARN_ON_ONCE(tcp_hdrlen >= 15 * 4) before it, out of curiosity, to see if this condition is triggered. Is it fine like that? Sure. It didn't trigger the WARN_ON, and with both patches here is one more KASAN. What i also noticed after this KASAN: many others start to trigger in TCPMSS, locking up the server with a flood. There is heavy netlink activity; it is a pppoe server with lots of shapers. I noticed sfq was left there by mistake; usually i remove it, because it may trigger a kernel panic too (and the reason is hard to trace). I will try pfifo instead, after 6 hours. Here is the full log with the others: https://nuclearcat.com/kasan.txt [ 2033.914478] == [ 2033.914855] BUG: KASAN: slab-out-of-bounds in tcpmss_tg4+0x6cc/0xee4 [xt_TCPMSS] at addr 8802bfe18140 [ 2033.915218] Read of size 1 by task swapper/1/0 [ 2033.915437] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.8-build-0136-debug #7 [ 2033.915787] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015 [ 2033.916010] Call Trace: [ 2033.916229] [ 2033.916449] dump_stack+0x99/0xd4 [ 2033.916662] ? _atomic_dec_and_lock+0x15d/0x15d [ 2033.916886] ? tcpmss_tg4+0x6cc/0xee4 [xt_TCPMSS] [ 2033.917110] kasan_object_err+0x21/0x81 [ 2033.917335] kasan_report+0x527/0x69d [ 2033.917557] ? tcpmss_tg4+0x6cc/0xee4 [xt_TCPMSS] [ 2033.917772] __asan_report_load1_noabort+0x19/0x1b [ 2033.917995] tcpmss_tg4+0x6cc/0xee4 [xt_TCPMSS] [ 2033.918222] ? tcpmss_tg4_check+0x287/0x287 [xt_TCPMSS] [ 2033.918451] ? udp_mt+0x45a/0x45a [xt_tcpudp] [ 2033.918669] ? __fib_validate_source+0x46b/0xcd1 [ 2033.918895] ipt_do_table+0x1432/0x1573 [ip_tables] [ 2033.919114] ? ip_tables_net_init+0x15/0x15 [ip_tables] [ 2033.919338] ? ip_route_input_slow+0xe9f/0x17e3 [ 2033.919562] ? rt_set_nexthop+0x9a7/0x9a7 [ 2033.919790] ? ip_tables_net_exit+0xe/0x15 [ip_tables] [ 2033.920008] ? tcf_action_exec+0x14a/0x18c [ 2033.920227] ? iptable_mangle_net_exit+0x92/0x92 [iptable_mangle] [ 2033.920451] ? iptable_filter_net_exit+0x92/0x92 [iptable_filter] [ 2033.920667] iptable_filter_hook+0xc0/0x1c8 [iptable_filter] [ 2033.920882] nf_hook_slow+0x7d/0x121 [ 2033.921105] ip_forward+0x1183/0x11c6 [ 2033.921321] ? ip_forward_finish+0x168/0x168 [ 2033.921542] ? ip_frag_mem+0x43/0x43 [ 2033.921755] ? iptable_nat_net_exit+0x92/0x92 [iptable_nat] [ 2033.921981] ? nf_nat_ipv4_in+0xf0/0x209 [nf_nat_ipv4] [ 2033.922199] ip_rcv_finish+0xf4c/0xf5b [ 2033.922420] ip_rcv+0xb41/0xb72 [ 2033.922635] ? ip_local_deliver+0x282/0x282 [ 2033.922847] ? ip_local_deliver_finish+0x6e6/0x6e6 [ 2033.923073] ? ip_local_deliver+0x282/0x282 [ 2033.923291] __netif_receive_skb_core+0x1b27/0x21bf [ 2033.923510] ? netdev_rx_handler_register+0x1a6/0x1a6 [ 2033.923736] ? kasan_slab_free+0x137/0x154 [ 2033.923954] ? save_stack_trace+0x1b/0x1d [ 2033.924170] ? kasan_slab_free+0xaa/0x154 [ 2033.924387] ? net_rx_action+0x6ad/0x6dc [ 2033.924611] ? __do_softirq+0x22b/0x5df [ 2033.924826] ? irq_exit+0x8a/0xfe [ 2033.925048] ? do_IRQ+0x13d/0x155 [ 2033.925269] ? common_interrupt+0x83/0x83 [ 2033.925483] ? mwait_idle+0x15a/0x30d [ 2033.925704] ? napi_gro_flush+0x1d0/0x1d0 [ 2033.925928] ? start_secondary+0x2cc/0x2d5 [ 2033.926142] ? start_cpu+0x14/0x14 [ 2033.926354] __netif_receive_skb+0x5e/0x191 [ 2033.926576] process_backlog+0x295/0x573 [ 2033.926799] ? __netif_receive_skb+0x191/0x191 [ 2033.927022] napi_poll+0x311/0x745 [ 2033.927245] ? napi_complete_done+0x3b4/0x3b4 [ 2033.927460] ? 
igb_msix_ring+0x2d/0x35 [ 2033.927679] net_rx_action+0x2e8/0x6dc [ 2033.927903] ? napi_poll+0x745/0x745 [ 2033.928133] ? sched_clock_cpu+0x1f/0x18c [ 2033.928360] ? rps_trigger_softirq+0x181/0x1e4 [ 2033.928592] ? __tick_nohz_idle_enter+0x465/0xa6d [ 2033.928817] ? rps_may_expire_flow+0x29b/0x29b [ 2033.929038] ? irq_work_run+0x2c/0x2e [ 2033.929253] __do_softirq+0x22b/0x5df [ 2033.929464] ? smp_call_function_single_async+0x17d/0x17d [ 2033.929680] irq_exit+0x8a/0xfe [ 2033.929905] smp_call_function_single_interrupt+0x8d/0x90 [ 2033.930136] call_function_single_interrupt+0x83/0x90 [ 2033.930365] RIP: 0010:mwait_idle+0x15a/0x30d [ 2033.930581] RSP: 0018:8802d1017e78 EFLAGS: 0246 ORIG_RAX: ff04 [ 2033.930934] RAX: RBX: 8802d1000c80 RCX: [ 2033.931160] RDX: 11005a200190 RSI: RDI: [ 2033.931383] RBP: 8802d1017e98 R08: ed00583c4fc1 R09: 0080 [ 2033.931596] R10: 8802d1017d80 R11: ed00583c4fc1 R12: 0001 [ 2033.931808] R13: R14: 8802d1000c80 R15: dc00 [ 2033.932031] [ 2033.932247] arch_cpu_idle+0xf/0x11 [ 2033.932472] default_idle_call+0x59/0x5c [ 2033.932686] do_idle+0x11c/0x217 [ 2033.932906] cpu_startup_entry+0x1
Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8
On 2017-04-02 15:19, Eric Dumazet wrote: On Sun, 2017-04-02 at 04:54 -0700, Eric Dumazet wrote: On Sun, 2017-04-02 at 13:45 +0200, Florian Westphal wrote: > Eric Dumazet wrote:
> > -	for (i = sizeof(struct tcphdr); i <= tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
> > +	for (i = sizeof(struct tcphdr); i < tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
> > 		if (opt[i] == TCPOPT_MSS && opt[i+1] == TCPOLEN_MSS) {
> > 			u_int16_t oldmss;
> maybe I am low on caffeine but this looks fine, for tcp header with > only tcpmss this boils down to "20 <= 24 - 4" so we access offsets 20-23 which seems ok. I am definitely low on caffeine ;) An issue in this function is that we might add the missing MSS option, without checking that TCP options are already full. But this should not cause a KASAN splat, only some malformed TCP packet (tcph->doff would wrap) Something like that maybe.
diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
index 27241a767f17b4b27d24095a31e5e9a2d3e29ce4..1465aaf0e3a15d69d105d0a50b0429b11b6439d3 100644
--- a/net/netfilter/xt_TCPMSS.c
+++ b/net/netfilter/xt_TCPMSS.c
@@ -151,7 +151,9 @@ tcpmss_mangle_packet(struct sk_buff *skb,
 	 */
 	if (len > tcp_hdrlen)
 		return 0;
-
+	/* tcph->doff is 4 bits wide, do not wrap its value to 0 */
+	if (tcp_hdrlen >= 15 * 4)
+		return 0;
 	/*
 	 * MSS Option not found ?! add it..
 	 */
I will also add WARN_ON_ONCE(tcp_hdrlen >= 15 * 4) before it, out of curiosity, to see if this condition is triggered. Is it fine like that?
Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8
On 2017-04-02 14:45, Florian Westphal wrote: Eric Dumazet wrote:
-	for (i = sizeof(struct tcphdr); i <= tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
+	for (i = sizeof(struct tcphdr); i < tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
		if (opt[i] == TCPOPT_MSS && opt[i+1] == TCPOLEN_MSS) {
			u_int16_t oldmss;
maybe I am low on caffeine but this looks fine, for tcp header with only tcpmss this boils down to "20 <= 24 - 4" so we access offsets 20-23 which seems ok. It seems some non-standard (or corrupted) packets are passing, because even on a ~1G server it might cause corruption once per several days; KASAN seems to need less time to trigger. I am not aware of how things work here, but:
[25181.875696] Memory state around the buggy address:
[25181.875919] 8802975fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.876275] 88029760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.876628] >880297600080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.876984] ^
[25181.877203] 880297600100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.877569] 880297600180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Why is all the data here zero? I guess it should be some packet data?
KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8
Repost; due to being sleepy i missed a few important points. I am searching for the reasons of crashes on multiple conntrack-enabled servers; usually they point to conntrack, but i suspect the use-after-free might be somewhere else, so i tried to enable KASAN. And it seems i got something after a few hours, and it looks related to all the crashes, because on all the servers that rebooted i had MSS adjustment (--clamp-mss-to-pmtu or --set-mss). Please let me know if any additional information is needed. [25181.855611] == [25181.855985] BUG: KASAN: use-after-free in tcpmss_tg4+0x682/0xe9c [xt_TCPMSS] at addr 8802976000ea [25181.856344] Read of size 1 by task swapper/1/0 [25181.856555] page:ea000a5d8000 count:0 mapcount:0 mapping: (null) index:0x0 [25181.856909] flags: 0x1000() [25181.857123] raw: 1000 [25181.857630] raw: ea000b0444a0 ea000a0b1f60 [25181.857996] page dumped because: kasan: bad access detected [25181.858214] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.8-build-0133-debug #3 [25181.858571] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015 [25181.858786] Call Trace: [25181.859000] [25181.859215] dump_stack+0x99/0xd4 [25181.859423] ? _atomic_dec_and_lock+0x15d/0x15d [25181.859644] ? __dump_page+0x447/0x4e3 [25181.859859] ? tcpmss_tg4+0x682/0xe9c [xt_TCPMSS] [25181.860080] kasan_report+0x577/0x69d [25181.860291] ? __ip_route_output_key_hash+0x14ce/0x1503 [25181.860512] ? tcpmss_tg4+0x682/0xe9c [xt_TCPMSS] [25181.860736] __asan_report_load1_noabort+0x19/0x1b [25181.860956] tcpmss_tg4+0x682/0xe9c [xt_TCPMSS] [25181.861180] ? tcpmss_tg4_check+0x287/0x287 [xt_TCPMSS] [25181.861407] ? udp_mt+0x45a/0x45a [xt_tcpudp] [25181.861634] ? __fib_validate_source+0x46b/0xcd1 [25181.861860] ipt_do_table+0x1432/0x1573 [ip_tables] [25181.862088] ? igb_msix_ring+0x2d/0x35 [25181.862318] ? ip_tables_net_init+0x15/0x15 [ip_tables] [25181.862537] ? ip_route_input_slow+0xe9f/0x17e3 [25181.862759] ? handle_irq_event_percpu+0x141/0x141 [25181.862985] ? rt_set_nexthop+0x9a7/0x9a7 [25181.863203] ? ip_tables_net_exit+0xe/0x15 [ip_tables] [25181.863419] ? tcf_action_exec+0xce/0x18c [25181.863628] ? iptable_mangle_net_exit+0x92/0x92 [iptable_mangle] [25181.863856] ? iptable_filter_net_exit+0x92/0x92 [iptable_filter] [25181.864084] iptable_filter_hook+0xc0/0x1c8 [iptable_filter] [25181.864311] nf_hook_slow+0x7d/0x121 [25181.864536] ip_forward+0x1183/0x11c6 [25181.864752] ? ip_forward_finish+0x168/0x168 [25181.864967] ? ip_frag_mem+0x43/0x43 [25181.865194] ? iptable_nat_net_exit+0x92/0x92 [iptable_nat] [25181.865423] ? nf_nat_ipv4_in+0xf0/0x209 [nf_nat_ipv4] [25181.865648] ip_rcv_finish+0xf4c/0xf5b [25181.865861] ip_rcv+0xb41/0xb72 [25181.866086] ? ip_local_deliver+0x282/0x282 [25181.866308] ? ip_local_deliver_finish+0x6e6/0x6e6 [25181.866524] ? ip_local_deliver+0x282/0x282 [25181.866752] __netif_receive_skb_core+0x1b27/0x21bf [25181.866971] ? netdev_rx_handler_register+0x1a6/0x1a6 [25181.867186] ? enqueue_hrtimer+0x232/0x240 [25181.867401] ? hrtimer_start_range_ns+0xd1c/0xd4b [25181.867630] ? __ppp_xmit_process+0x101f/0x104e [ppp_generic] [25181.867852] ? hrtimer_cancel+0x20/0x20 [25181.868081] ? ppp_push+0x1402/0x1402 [ppp_generic] [25181.868301] ? __pskb_pull_tail+0xb0f/0xb25 [25181.868523] ? ppp_xmit_process+0x47/0xaf [ppp_generic] [25181.868749] __netif_receive_skb+0x5e/0x191 [25181.868968] process_backlog+0x295/0x573 [25181.869180] ? __netif_receive_skb+0x191/0x191 [25181.869401] napi_poll+0x311/0x745 [25181.869611] ? napi_complete_done+0x3b4/0x3b4 [25181.869836] ? __qdisc_run+0x4ec/0xb7f [25181.870061] ? 
sch_direct_xmit+0x60b/0x60b [25181.870286] net_rx_action+0x2e8/0x6dc [25181.870512] ? napi_poll+0x745/0x745 [25181.870732] ? rps_trigger_softirq+0x181/0x1e4 [25181.870956] ? rps_may_expire_flow+0x29b/0x29b [25181.871184] ? irq_work_run+0x2c/0x2e [25181.871411] __do_softirq+0x22b/0x5df [25181.871629] ? smp_call_function_single_async+0x17d/0x17d [25181.871854] irq_exit+0x8a/0xfe [25181.872069] smp_call_function_single_interrupt+0x8d/0x90 [25181.872297] call_function_single_interrupt+0x83/0x90 [25181.872519] RIP: 0010:mwait_idle+0x15a/0x30d [25181.872733] RSP: 0018:8802d1017e78 EFLAGS: 0246 ORIG_RAX: ff04 [25181.873091] RAX: RBX: 8802d1000c80 RCX: [25181.873311] RDX: 11005a200190 RSI: RDI: [25181.873532] RBP: 8802d1017e98 R08: 003f R09: 7f75f7fff700 [25181.873751] R10: 8802d1017d80 R11: 8802c9b0 R12: 0001 [25181.873971] R13: R14: 8802d1000c80 R15: dc00 [25181.874182] [25181.874393] arch_cpu_idle+0xf/0x11 [25181.874602] default_idle_call+0x59/0x5c [25181.874818] do_idle+0x11c/0x217 [2
finally found nasty use-after-free bug? 4.10.8
I am searching for the reasons of crashes on multiple NAT servers, and tried to enable KASAN. It seems i got something, and it looks very possibly related to all the crashes, because on all those servers i have MSS adjustment. [25181.855611] == [25181.855985] BUG: KASAN: use-after-free in tcpmss_tg4+0x682/0xe9c [xt_TCPMSS] at addr 8802976000ea [25181.856344] Read of size 1 by task swapper/1/0 [25181.856555] page:ea000a5d8000 count:0 mapcount:0 mapping: (null) index:0x0 [25181.856909] flags: 0x1000() [25181.857123] raw: 1000 [25181.857630] raw: ea000b0444a0 ea000a0b1f60 [25181.857996] page dumped because: kasan: bad access detected [25181.858214] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.8-build-0133-debug #3 [25181.858571] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015 [25181.858786] Call Trace: [25181.859000] [25181.859215] dump_stack+0x99/0xd4 [25181.859423] ? _atomic_dec_and_lock+0x15d/0x15d [25181.859644] ? __dump_page+0x447/0x4e3 [25181.859859] ? tcpmss_tg4+0x682/0xe9c [xt_TCPMSS] [25181.860080] kasan_report+0x577/0x69d [25181.860291] ? __ip_route_output_key_hash+0x14ce/0x1503 [25181.860512] ? tcpmss_tg4+0x682/0xe9c [xt_TCPMSS] [25181.860736] __asan_report_load1_noabort+0x19/0x1b [25181.860956] tcpmss_tg4+0x682/0xe9c [xt_TCPMSS] [25181.861180] ? tcpmss_tg4_check+0x287/0x287 [xt_TCPMSS] [25181.861407] ? udp_mt+0x45a/0x45a [xt_tcpudp] [25181.861634] ? __fib_validate_source+0x46b/0xcd1 [25181.861860] ipt_do_table+0x1432/0x1573 [ip_tables] [25181.862088] ? igb_msix_ring+0x2d/0x35 [25181.862318] ? ip_tables_net_init+0x15/0x15 [ip_tables] [25181.862537] ? ip_route_input_slow+0xe9f/0x17e3 [25181.862759] ? handle_irq_event_percpu+0x141/0x141 [25181.862985] ? rt_set_nexthop+0x9a7/0x9a7 [25181.863203] ? ip_tables_net_exit+0xe/0x15 [ip_tables] [25181.863419] ? tcf_action_exec+0xce/0x18c [25181.863628] ? iptable_mangle_net_exit+0x92/0x92 [iptable_mangle] [25181.863856] ? iptable_filter_net_exit+0x92/0x92 [iptable_filter] [25181.864084] iptable_filter_hook+0xc0/0x1c8 [iptable_filter] [25181.864311] nf_hook_slow+0x7d/0x121 [25181.864536] ip_forward+0x1183/0x11c6 [25181.864752] ? ip_forward_finish+0x168/0x168 [25181.864967] ? ip_frag_mem+0x43/0x43 [25181.865194] ? iptable_nat_net_exit+0x92/0x92 [iptable_nat] [25181.865423] ? nf_nat_ipv4_in+0xf0/0x209 [nf_nat_ipv4] [25181.865648] ip_rcv_finish+0xf4c/0xf5b [25181.865861] ip_rcv+0xb41/0xb72 [25181.866086] ? ip_local_deliver+0x282/0x282 [25181.866308] ? ip_local_deliver_finish+0x6e6/0x6e6 [25181.866524] ? ip_local_deliver+0x282/0x282 [25181.866752] __netif_receive_skb_core+0x1b27/0x21bf [25181.866971] ? netdev_rx_handler_register+0x1a6/0x1a6 [25181.867186] ? enqueue_hrtimer+0x232/0x240 [25181.867401] ? hrtimer_start_range_ns+0xd1c/0xd4b [25181.867630] ? __ppp_xmit_process+0x101f/0x104e [ppp_generic] [25181.867852] ? hrtimer_cancel+0x20/0x20 [25181.868081] ? ppp_push+0x1402/0x1402 [ppp_generic] [25181.868301] ? __pskb_pull_tail+0xb0f/0xb25 [25181.868523] ? ppp_xmit_process+0x47/0xaf [ppp_generic] [25181.868749] __netif_receive_skb+0x5e/0x191 [25181.868968] process_backlog+0x295/0x573 [25181.869180] ? __netif_receive_skb+0x191/0x191 [25181.869401] napi_poll+0x311/0x745 [25181.869611] ? napi_complete_done+0x3b4/0x3b4 [25181.869836] ? __qdisc_run+0x4ec/0xb7f [25181.870061] ? sch_direct_xmit+0x60b/0x60b [25181.870286] net_rx_action+0x2e8/0x6dc [25181.870512] ? napi_poll+0x745/0x745 [25181.870732] ? rps_trigger_softirq+0x181/0x1e4 [25181.870956] ? rps_may_expire_flow+0x29b/0x29b [25181.871184] ? 
irq_work_run+0x2c/0x2e
[25181.871411] __do_softirq+0x22b/0x5df
[25181.871629] ? smp_call_function_single_async+0x17d/0x17d
[25181.871854] irq_exit+0x8a/0xfe
[25181.872069] smp_call_function_single_interrupt+0x8d/0x90
[25181.872297] call_function_single_interrupt+0x83/0x90
[25181.872519] RIP: 0010:mwait_idle+0x15a/0x30d
[25181.872733] RSP: 0018:8802d1017e78 EFLAGS: 0246 ORIG_RAX: ff04
[25181.873091] RAX: RBX: 8802d1000c80 RCX:
[25181.873311] RDX: 11005a200190 RSI: RDI:
[25181.873532] RBP: 8802d1017e98 R08: 003f R09: 7f75f7fff700
[25181.873751] R10: 8802d1017d80 R11: 8802c9b0 R12: 0001
[25181.873971] R13: R14: 8802d1000c80 R15: dc00
[25181.874182]
[25181.874393] arch_cpu_idle+0xf/0x11
[25181.874602] default_idle_call+0x59/0x5c
[25181.874818] do_idle+0x11c/0x217
[25181.875039] cpu_startup_entry+0x1f/0x21
[25181.875258] start_secondary+0x2cc/0x2d5
[25181.875481] start_cpu+0x14/0x14
[25181.875696] Memory state around the buggy address:
[25181.875919] 8802975fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.876275] fff
Re: probably serious conntrack/netfilter panic, 4.8.14, timers and intel turbo
I am not sure if it is the same issue, but panics still happen, though much less often. Same server, NAT. I will upgrade to the latest 4.10.x build, because for this one I no longer have the build files (symbols etc.).
[864288.511464] Modules linked in: nf_conntrack_netlink nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables netconsole configfs 8021q garp mrp stp llc bonding ixgbe dca
[864288.512740] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 4.10.1-build-0132 #2
[864288.513005] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
[864288.513454] task: 881038cb6000 task.stack: c9000c678000
[864288.513719] RIP: 0010:nf_nat_cleanup_conntrack+0xe2/0x1bc [nf_nat]
[864288.513980] RSP: 0018:88103fc43ba0 EFLAGS: 00010206
[864288.514237] RAX: 140504021ad8 RBX: 881004021ad8 RCX: 0100
[864288.514677] RDX: 140504021ad8 RSI: 88103279628c RDI: 88103279628c
[864288.515117] RBP: 88103fc43be0 R08: c9003b47b558 R09: 0004
[864288.515558] R10: 8820083d00ce R11: 881038480b00 R12: 881004021a40
[864288.515998] R13: R14: a00d406e R15: c90036e11000
[864288.516438] FS: () GS:88103fc4() knlGS:
[864288.516882] CS: 0010 DS: ES: CR0: 80050033
[864288.517142] CR2: 7fbfc303f978 CR3: 00202267c000 CR4: 001406e0
[864288.517580] Call Trace:
[864288.517831]
[864288.518090] __nf_ct_ext_destroy+0x3f/0x57 [nf_conntrack]
[864288.518352] nf_conntrack_free+0x25/0x55 [nf_conntrack]
[864288.518615] destroy_conntrack+0x80/0x8c [nf_conntrack]
[864288.518880] nf_conntrack_destroy+0x19/0x1b
[864288.519137] nf_ct_gc_expired+0x6e/0x71 [nf_conntrack]
[864288.519400] __nf_conntrack_find_get+0x89/0x2ab [nf_conntrack]
[864288.519663] nf_conntrack_in+0x1ec/0x877 [nf_conntrack]
[864288.519925] ipv4_conntrack_in+0x1c/0x1e [nf_conntrack_ipv4]
[864288.520185] nf_hook_slow+0x2a/0x9a
[864288.520439] ip_rcv+0x318/0x337
[864288.520692] ? ip_local_deliver_finish+0x1ba/0x1ba
[864288.520953] __netif_receive_skb_core+0x607/0x852
[864288.521213] ? kmem_cache_free_bulk+0x232/0x274
[864288.521471] __netif_receive_skb+0x18/0x5a
[864288.521727] process_backlog+0x90/0x113
[864288.521981] net_rx_action+0x114/0x2dc
[864288.522238] ?
sched_clock_cpu+0x15/0x94
[864288.522496] __do_softirq+0xe7/0x259
[864288.522753] irq_exit+0x52/0x93
[864288.523006] smp_call_function_single_interrupt+0x33/0x35
[864288.523267] call_function_single_interrupt+0x83/0x90
[864288.523531] RIP: 0010:mwait_idle+0x9e/0x125
[864288.523786] RSP: 0018:c9000c67beb0 EFLAGS: 0246 ORIG_RAX: ff04
[864288.524229] RAX: RBX: 881038cb6000 RCX:
[864288.524669] RDX: RSI: RDI:
[864288.525110] RBP: c9000c67bec0 R08: 0001 R09:
[864288.525551] R10: c9000c67be50 R11: R12: 0011
[864288.525991] R13: R14: 881038cb6000 R15: 881038cb6000
[864288.526429]
[864288.526682] arch_cpu_idle+0xf/0x11
[864288.526937] default_idle_call+0x25/0x27
[864288.527193] do_idle+0xb6/0x15d
[864288.527446] cpu_startup_entry+0x1f/0x21
[864288.527702] start_secondary+0xe8/0xeb
[864288.527961] start_cpu+0x14/0x14
[864288.528212] Code: 48 89 f7 48 89 75 c8 e8 6e e8 8f e1 8b 45 c4 48 8b 75 c8 48 83 c0 08 4d 8d 04 c7 49 8b 04 c7 a8 01 75 46 48 39 c3 74 1e 48 89 c2 <48> 8b 7a 08 48 85 ff 0f 84 b3 00 00 00 48 39 fb 0f 84 9e 00 00
[864288.528905] RIP: nf_nat_cleanup_conntrack+0xe2/0x1bc [nf_nat] RSP: 88103fc43ba0
[864288.529362] ---[ end trace e3c40a5e4bf43e26 ]---
[864288.567835] Kernel panic - not syncing: Fatal exception in interrupt
[864288.568122] Kernel Offset: disabled
[864288.587619] Rebooting in 5 seconds..
__nf_conntrack_find_get - NMI watchdog, 4.10.5
Hi, While applying/removing shapers on a few thousand ppp interfaces, the pppoe server rebooted with this message:
[51306.144984] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
[51306.145319] Modules linked in: sch_sfq cls_fw act_police cls_u32 sch_ingress sch_htb pppoe pppox ppp_generic slhc netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 xt_set ts_bm xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc
[51306.146381] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.5-build-0132 #2
[51306.146577] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
[51306.146775] task: 8200e4c0 task.stack: 8200
[51306.146976] RIP: 0010:__nf_conntrack_find_get+0x23/0x2ab [nf_conntrack]
[51306.147173] RSP: 0018:880436403c50 EFLAGS: 0203 ORIG_RAX: ff10
[51306.147505] RAX: 60fbc8a03408 RBX: 7fffc4020277d00a RCX: b9a11ba7
[51306.147703] RDX: 88042a40 RSI: 81c03e00 RDI: 820d2340
[51306.147900] RBP: 880436403c80 R08: 4f33ff15 R09: 820d2340
[51306.148098] R10: 88041032bd10 R11: R12: 880436403cd8
[51306.148295] R13: 820d2340 R14: 81c03e00 R15: 0005cd08
[51306.148492] FS: () GS:88043640() knlGS:
[51306.148824] CS: 0010 DS: ES: CR0: 80050033
[51306.149020] CR2: 7f455449f768 CR3: 00042bd64000 CR4: 001406f0
[51306.149217] Call Trace:
[51306.149410]
[51306.149605] nf_conntrack_in+0x1ec/0x877 [nf_conntrack]
[51306.149804] ? _raw_read_unlock_bh+0x20/0x22
[51306.15] ? ppp_input+0x14c/0x157 [ppp_generic]
[51306.150196] ipv4_conntrack_in+0x1c/0x1e [nf_conntrack_ipv4]
[51306.150394] nf_hook_slow+0x2a/0x9a
[51306.150589] ip_rcv+0x318/0x337
[51306.150782] ? ip_local_deliver_finish+0x1ba/0x1ba
[51306.150980] __netif_receive_skb_core+0x607/0x852
[51306.151178] ? swiotlb_sync_single+0x16/0x24
[51306.151373] __netif_receive_skb+0x18/0x5a
[51306.151566] process_backlog+0x90/0x113
[51306.151761] net_rx_action+0x114/0x2dc
[51306.151955] ?
igb_msix_ring+0x2e/0x36
[51306.152151] __do_softirq+0xe7/0x259
[51306.152347] irq_exit+0x52/0x93
[51306.152541] do_IRQ+0xaa/0xc2
[51306.152735] common_interrupt+0x83/0x83
[51306.152931] RIP: 0010:mwait_idle+0x9e/0x125
[51306.153125] RSP: 0018:82003e28 EFLAGS: 0246 ORIG_RAX: ff1d
[51306.153459] RAX: RBX: 8200e4c0 RCX:
[51306.153656] RDX: RSI: RDI:
[51306.153853] RBP: 82003e38 R08: 009e R09:
[51306.154051] R10: 82003dc8 R11: R12:
[51306.154247] R13: R14: 8200e4c0 R15: 8200e4c0
[51306.15]
[51306.157951] arch_cpu_idle+0xf/0x11
[51306.158143] default_idle_call+0x25/0x27
[51306.158336] do_idle+0xb6/0x15d
[51306.158527] cpu_startup_entry+0x1f/0x21
[51306.158721] rest_init+0x77/0x79
[51306.158915] start_kernel+0x3c9/0x3d6
[51306.159109] x86_64_start_reservations+0x2a/0x2c
[51306.159307] x86_64_start_kernel+0x16a/0x178
[51306.159507] start_cpu+0x14/0x14
[51306.159701] Code: e8 ed d3 93 e1 5b 5d c3 0f 1f 44 00 00 55 89 c8 48 89 e5 41 57 41 56 49 89 f6 41 55 49 89 fd 41 54 49 89 d4 53 41 50 48 89 45 d0 <8b> 05 b4 b1 00 00 a8 01 74 04 f3 90 eb f2 44 8b 3d ad b1 00 00
[51306.160194] Kernel panic - not syncing: softlockup: hung tasks
[51306.160392] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G L 4.10.5-build-0132 #2
[51306.160725] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
[51306.160922] Call Trace:
[51306.161115]
[51306.161310] dump_stack+0x4d/0x63
[51306.161507] panic+0xd2/0x215
[51306.161702] watchdog_timer_fn+0x1a9/0x1cb
[51306.161896] __hrtimer_run_queues+0xe4/0x1e3
[51306.162092] ? ktime_get_update_offsets_now+0x4f/0xef
[51306.162288] hrtimer_interrupt+0xa5/0x167
[51306.162483] local_apic_timer_interrupt+0x4b/0x4e
[51306.162679] smp_apic_timer_interrupt+0x38/0x48
[51306.162876] apic_timer_interrupt+0x83/0x90
[51306.163073] RIP: 0010:__nf_conntrack_find_get+0x23/0x2ab [nf_conntrack]
[51306.163269] RSP: 0018:880436403c50 EFLAGS: 0203 ORIG_RAX: ff10
[51306.163601] RAX: 60fbc8a03408 RBX: 7fffc4020277d00a RCX: b9a11ba7
[51306.163798] RDX: 88042a40 RSI: 81c03e00 RDI: 820d2340
[51306.163996] RBP: 880436403c80 R08: 4f33ff15 R09: 820d2340
[51306.164195] R10: 88041032bd10 R11: R12: 880436403cd8
[51306.164394] R13: 8
4.9.4 panic, nf_conntrack_tuple_taken
Hi, It seems I'm quite "lucky" and am hitting another bug. This time it is a different server, but I believe I've seen this bug on a few pppoe servers; here it happens once every 1-2 days. An out-of-tree patch is applied to optimize the gc heuristics. I don't exclude a hardware issue (though the chance is very small). For this bug it is very hard to catch the call trace/panic message; I don't know why, but it was not being stored in pstore, and once it stored only half of the message. It happens on 4.9.9 as well, but I haven't captured a call trace there yet, so I don't know whether it is the same; this is the only trace I was able to catch. It might also be related to fragmentation/tunnels, because the reboots started when I brought up an ipip DDoS-protection tunnel.
<4>[160340.861244] general protection fault: [#1] SMP
<4>[160340.861527] Modules linked in: ioatdma w83l786ng w83l785ts w83795 w83793 w83792d w83791d w83781d w83627ehf vt8231 via686a tmp421 tmp401 tmp102 thmc50 tc74 smsc47m192 smm665 sis5595 sht21 sht15 pmbus_core pcf8591 ntc_thermistor nct7904 nct7802 nct6775 mcp3021 max6697 max6650 max6642 max6639 max31790 max197 max1668 max1619 max16065 max ltc4261 ltc4245 ltc4215 ltc4151 ltc2990 lm95245 lm95241 lm95234 lm93 lm92 lm90 lm87 lm85 lm83 lm80 lm78 lm77 lm75 lm73 lm70 lm63 lineage_pem k8temp k10temp jc42 ina3221 ina2xx ina209 ibmpex ibmaem i5k_amb i5500_temp hwmon_vid hih6130 gpio_fan gl518sm g760a ftsteutates fschmd fam15h_power f75375s emc6w201 emc2103 emc1403 ds620 ds1621 coretemp asus_atk0110 asc7621 amc6821 adt7x10 adt7470 adt7462 adt7411 ads7871 ads7828 ads1015 adm1031 adm1029 adm1021 adcxx ad7418 ad7414
<4>[160340.870563] ad7314 acpi_power_meter cls_u32 sch_pie sch_htb msr ipmi_devintf ipmi_si ipmi_msghandler xt_nat xt_set xt_mark xt_connmark iptable_raw xt_CT ip_set_hash_net ip_set nfnetlink xt_hl xt_TCPMSS xt_tcpudp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables netconsole configfs ipip tunnel4 ip_tunnel 8021q garp mrp stp llc ixgbe dca
<4>[160340.875258] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.4-build-0130 #4
<4>[160340.875529] Hardware name: Supermicro X10SLM+-LN4F/X10SLM+-LN4F, BIOS 3.0a 12/17/2015
<4>[160340.875981] task: 88040d5bd5c0 task.stack: c9000194
<4>[160340.876247] RIP: 0010:[] [] nf_conntrack_tuple_taken+0x68/0x196 [nf_conntrack]
<4>[160340.876789] RSP: 0018:88041fdc37c0 EFLAGS: 00010246
<4>[160340.877053] RAX: 02530d1f RBX: ffb00404024062c8 RCX: 0001
<4>[160340.877506] RDX: 1f2f RSI: f3476b40 RDI: 8803f9542640
<4>[160340.877956] RBP: 88041fdc37f0 R08: 2682c87d R09: 5bf0500a
<4>[160340.878410] R10: 001e6b01 R11: 3a8b60eb R12: 88041fdc3800
<4>[160340.878860] R13: 4aeb R14: 880407304780 R15: 820b2dc0
<4>[160340.879315] FS: () GS:88041fdc() knlGS:
<4>[160340.879771] CS: 0010 DS: ES: CR0: 80050033
<4>[160340.880036] CR2: 0062da00 CR3: 02007000 CR4: 001406e0
<4>[160340.880483] Stack:
<4>[160340.880743] 88040728 880407304780 880407304780
<4>[160340.881472] 0008 001e6b01 88041fdc3830 a00b8209
<4>[160340.882197] fa50655f 0e50f05b0002bb01
<4>[160340.882930] Call Trace:
<4>[160340.883185]
<4>[160340.883264] [] nf_nat_used_tuple+0x24/0x2b [nf_nat]
<4>[160340.883789] [] nf_nat_setup_info+0x2bf/0x805 [nf_nat]
<4>[160340.884062] [] ? nf_nat_bysource_hash+0xb0/0xb0 [nf_nat]
<4>[160340.884331] [] xt_snat_target_v0+0x65/0x67 [xt_nat]
<4>[160340.884599] [] ipt_do_table+0x28e/0x5a2 [ip_tables]
<4>[160340.884868] [] ? ipt_do_table+0x586/0x5a2 [ip_tables]
<4>[160340.885135] [] ?
iptable_nat_ipv4_fn+0x12/0x12 [iptable_nat]
<4>[160340.890247] [] iptable_nat_do_chain+0x1a/0x1c [iptable_nat]
<4>[160340.890701] [] nf_nat_ipv4_fn+0xeb/0x177 [nf_nat_ipv4]
<4>[160340.890970] [] nf_nat_ipv4_out+0x35/0x37 [nf_nat_ipv4]
<4>[160340.891239] [] iptable_nat_ipv4_out+0x10/0x12 [iptable_nat]
<4>[160340.891697] [] nf_iterate+0x34/0x57
<4>[160340.891960] [] nf_hook_slow+0x2b/0x91
<4>[160340.892224] [] ip_output+0x99/0xb6
<4>[160340.892493] [] ? ip_fragment.constprop.5+0x77/0x77
<4>[160340.892766] [] ip_forward_finish+0x53/0x58
<4>[160340.893034] [] ip_forward+0x32d/0x33a
<4>[160340.893296] [] ? ip_frag_mem+0x3e/0x3e
<4>[160340.893563] [] ip_rcv_finish+0x2e8/0x2f3
<4>[160340.893828] [] ip_rcv+0x318/0x325
<4>[160340.894095] [] ? ip_local_deliver_finish+0x109/0x109
<4>[160340.894365] [] __netif_receive_skb_core+0x5cf/0x807
<4>[160340.894631] [] ? tcp4_gro_receive+0x17b/0x17f
<4>[160340.894902] [] ? inet_gro_receive+0x229/0x239
<4>[160340.895170] [] __netif_receive_skb+0x13/0x55
<4>[160340.895439] [] netif_receive_skb_internal+0x3b/0x7
Re: 4.9 conntrack performance issues
On 2017-01-30 13:26, Guillaume Nault wrote: On Sun, Jan 15, 2017 at 01:05:58AM +0200, Denys Fedoryshchenko wrote: Hi! Sorry if i added someone wrongly to CC, please let me know, if i should remove. I just run successfully 4.9 on my nat several days ago, and seems panic issue disappeared. Hi Denys, After two weeks running Linux 4.9, do you confirm that the original issue[1] is gone? Regards, Guillaume [1]: https://www.spinics.net/lists/netdev/msg410795.html

Yes, no more reboots at all, and 4.9 patched for the gc issues performs significantly better for NAT (CPU load is almost half of what it was on previous kernels; I don't have exact numbers).
Re: 4.9 conntrack performance issues
On 2017-01-15 02:29, Florian Westphal wrote: Denys Fedoryshchenko wrote: On 2017-01-15 01:53, Florian Westphal wrote: >Denys Fedoryshchenko wrote: > >I suspect you might also have to change > >1011 } else if (expired_count) { >1012 gc_work->next_gc_run /= 2U; >1013 next_run = msecs_to_jiffies(1); >1014 } else { > >line 2013 to >next_run = msecs_to_jiffies(HZ / 2); I think its wrong to rely on "expired_count", with these kinds of numbers (up to 10k entries are scanned per round in Denys setup, its basically always going to be > 0. I think we should only decide to scan more frequently if eviction ratio is large, say, we found more than 1/4 of entries to be stale. I sent a small patch offlist that does just that. >How many total connections is the machine handling on average? >And how many new/delete events happen per second? 1-2 million connections, at current moment 988k I dont know if it is correct method to measure events rate: NAT ~ # timeout -t 5 conntrack -E -e NEW | wc -l conntrack v1.4.2 (conntrack-tools): 40027 flow events have been shown. 40027 NAT ~ # timeout -t 5 conntrack -E -e DESTROY | wc -l conntrack v1.4.2 (conntrack-tools): 40951 flow events have been shown. 40951 Thanks, thats exactly what I was looking for. So I am not at all surprised that gc_worker eats cpu cycles... It is not peak time, so values can be 2-3 higher at peak time, but even right now, it is hogging one core, leaving only 20% idle left, while others are 80-83% idle. I agree its a bug. >> |--54.65%--gc_worker >> | | >> | --3.58%--nf_ct_gc_expired >> | | >> | |--1.90%--nf_ct_delete > >I'd be interested to see how often that shows up on other cores >(from packet path). Other CPU's totally different: This is top entry 99.60% 0.00% swapper [kernel.kallsyms][k] start_secondary | ---start_secondary | --99.42%--cpu_startup_entry | [..] |--36.02%--process_backlog | | | | | | | | | --35.64%--__netif_receive_skb gc_worker didnt appeared on other core at all. Or i am checking something wrong? Look for "nf_ct_gc_expired" and "nf_ct_delete". Its going to be deep down in the call graph. I tried my best to record as much data as possible, but it doesnt show it in callgraph, just a little bit in statistics: 0.01% 0.00% swapper [nf_conntrack][k] nf_ct_delete 0.01% 0.00% swapper [nf_conntrack][k] nf_ct_gc_expired And thats it.
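(A quick cross-check of the numbers quoted above, for readers following along: 40027 NEW events in 5 seconds is roughly 8000 new flows/s, and 40951 DESTROY events roughly 8200 teardowns/s. With about 1 million concurrent entries, a steady state implies an average flow lifetime of about 1000000 / 8200 ≈ 120 seconds, which is plausible given the shortened conntrack timeouts used on this NAT, so the measurement method itself looks sound.)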
Re: 4.9 conntrack performance issues
On 2017-01-15 01:53, Florian Westphal wrote: Denys Fedoryshchenko wrote: [ CC Nicolas since he also played with gc heuristics in the past ]

Sorry if I added someone to CC wrongly; please let me know if I should remove anyone. I just got 4.9 running successfully on my NAT several days ago, and it seems the panic issue disappeared. But I have started to face another issue: it seems the garbage collector is hogging one of the CPUs. It was handling the load very well on 4.8 and below; it might still be fine, but I suspect queues that belong to the hogged CPU might experience issues.

The worker doesn't grab locks for long and calls the scheduler for every bucket to give other threads a chance to run. It also doesn't block softinterrupts.

Is there anything that can be done to improve CPU load distribution or reduce the single-core load?

No, I am afraid we don't export any of the heuristics as tuneables so far. You could try changing the defaults in net/netfilter/nf_conntrack_core.c:

#define GC_MAX_BUCKETS_DIV 64u
/* upper bound of scan intervals */
#define GC_INTERVAL_MAX (2 * HZ)
/* maximum conntracks to evict per gc run */
#define GC_MAX_EVICTS 256u

(the first two result in ~2 minute worst case timeout detection on a fully idle system). For instance you could use GC_MAX_BUCKETS_DIV -> 128, GC_INTERVAL_MAX -> 30 * HZ. (This means that it takes one hour for a dead connection to be picked up on an idle system, but that's only relevant in case you use conntrack events to log when a connection went down and need more precise accounting.)

Not a big deal in my case.

I suspect you might also have to change

1011 } else if (expired_count) {
1012 gc_work->next_gc_run /= 2U;
1013 next_run = msecs_to_jiffies(1);
1014 } else {

line 1013 to next_run = msecs_to_jiffies(HZ / 2); or something like this, to not have frequent rescans.

OK

The gc is also done from the packet path (i.e. accounted towards (k)softirq). How many total connections is the machine handling on average? And how many new/delete events happen per second?

1-2 million connections, at the current moment 988k. I don't know if this is a correct method to measure the event rate:
NAT ~ # timeout -t 5 conntrack -E -e NEW | wc -l
conntrack v1.4.2 (conntrack-tools): 40027 flow events have been shown.
40027
NAT ~ # timeout -t 5 conntrack -E -e DESTROY | wc -l
conntrack v1.4.2 (conntrack-tools): 40951 flow events have been shown.
40951

It is not peak time, so values can be 2-3x higher at peak time, but even right now it is hogging one core, leaving only 20% idle, while the others are 80-83% idle.

88.98% 0.00% kworker/24:1 [kernel.kallsyms] [k] process_one_work | ---process_one_work | |--54.65%--gc_worker | | | --3.58%--nf_ct_gc_expired | | | |--1.90%--nf_ct_delete

I'd be interested to see how often that shows up on other cores (from packet path).

The other CPUs look totally different. This is the top entry:

99.60% 0.00% swapper [kernel.kallsyms][k] start_secondary | ---start_secondary | --99.42%--cpu_startup_entry | --98.04%--default_idle_call arch_cpu_idle | |--48.58%--call_function_single_interrupt | | | --46.36%--smp_call_function_single_interrupt | smp_trace_call_function_single_interrupt | | | |--44.18%--irq_exit | | | | | |--43.37%--__do_softirq | | | | | | | --43.18%--net_rx_action | | | | | | | |--36.02%--process_backlog | | | | | | | | | --35.64%--__netif_receive_skb

gc_worker didn't appear on the other cores at all. Or am I checking something wrong?
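Putting Florian's suggestion together, the tuned 4.9 defaults would look roughly like this as a diff against net/netfilter/nf_conntrack_core.c (a sketch based only on the values quoted above, not a tested or official patch; the surrounding context may differ between 4.9.x point releases):

--- net/netfilter/nf_conntrack_core.c
+++ net/netfilter/nf_conntrack_core.c.gc-tuning
@@
-#define GC_MAX_BUCKETS_DIV	64u
+#define GC_MAX_BUCKETS_DIV	128u	/* scan half as many buckets per run */
@@
 /* upper bound of scan intervals */
-#define GC_INTERVAL_MAX	(2 * HZ)
+#define GC_INTERVAL_MAX	(30 * HZ)	/* idle-system scans far less often;
+					 * dead entries may linger up to ~1h */
@@ gc_worker
 	} else if (expired_count) {
 		gc_work->next_gc_run /= 2U;
-		next_run = msecs_to_jiffies(1);
+		next_run = msecs_to_jiffies(HZ / 2);	/* avoid ~1ms rescans */
 	} else {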
4.9 conntrack performance issues
Hi! Sorry if I added someone to CC wrongly; please let me know if I should remove anyone. I just got 4.9 running successfully on my NAT several days ago, and it seems the panic issue disappeared. But I have started to face another issue: it seems the garbage collector is hogging one of the CPUs. Here is my data: 2x E5-2640 v3, 396G RAM, 2x10G (bonding) with approx 14-15G of traffic at peak time. It was handling the load very well on 4.8 and below; it might still be fine, but I suspect queues that belong to the hogged CPU might experience issues. Is there anything that can be done to improve CPU load distribution or reduce the single-core load?

net.netfilter.nf_conntrack_buckets = 65536
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 1236021
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_expect_max = 1024
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_helper = 0
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 6553600
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 0
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_established = 600
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 20
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 20
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 10
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 20
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 20
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 30
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.nf_conntrack_max = 6553600

These are non-peak values; as adjustments I have shorter-than-default timeouts. Changing net.netfilter.nf_conntrack_buckets to a higher value doesn't fix the issue.
I noticed that one of the CPUs is hogged (CPU 24 in this case):

Linux 4.9.2-build-0127 (NAT)  01/14/17  _x86_64_  (32 CPU)
23:01:54  CPU   %usr  %nice   %sys  %iowait  %irq  %soft  %steal  %guest  %idle
23:02:04  all   0.09   0.00   1.60     0.01  0.00  28.28    0.00    0.00  70.01
23:02:04    0   0.11   0.00   0.00     0.00  0.00  32.38    0.00    0.00  67.51
23:02:04    1   0.12   0.00   0.12     0.00  0.00  29.91    0.00    0.00  69.86
23:02:04    2   0.23   0.00   0.11     0.00  0.00  29.57    0.00    0.00  70.09
23:02:04    3   0.11   0.00   0.11     0.11  0.00  28.80    0.00    0.00  70.86
23:02:04    4   0.23   0.00   0.11     0.11  0.00  31.41    0.00    0.00  68.14
23:02:04    5   0.11   0.00   0.00     0.00  0.00  29.28    0.00    0.00  70.61
23:02:04    6   0.11   0.00   0.11     0.00  0.00  31.81    0.00    0.00  67.96
23:02:04    7   0.11   0.00   0.11     0.00  0.00  32.69    0.00    0.00  67.08
23:02:04    8   0.00   0.00   0.23     0.00  0.00  42.12    0.00    0.00  57.64
23:02:04    9   0.11   0.00   0.00     0.00  0.00  30.86    0.00    0.00  69.02
23:02:04   10   0.11   0.00   0.11     0.00  0.00  30.93    0.00    0.00  68.84
23:02:04   11   0.00   0.00   0.11     0.00  0.00  32.73    0.00    0.00  67.16
23:02:04   12   0.11   0.00   0.11     0.00  0.00  29.85    0.00    0.00  69.92
23:02:04   13   0.00   0.00   0.00     0.00  0.00  30.96    0.00    0.00  69.04
23:02:04   14   0.00   0.00   0.00     0.00  0.00  30.09    0.00    0.00  69.91
23:02:04   15   0.00   0.00   0.11     0.00  0.00  30.63    0.00    0.00  69.26
23:02:04   16   0.11   0.00   0.00     0.00  0.00  25.88    0.00    0.00  74.01
23:02:04   17   0.11   0.00   0.00     0.00  0.00  22.82    0.00    0.00  77.07
23:02:04   18   0.11   0.00   0.00     0.00  0.00  23.75    0.00    0.00  76.14
23:02:04   19   0.11   0.00   0.11     0.00  0.00  24.86    0.00    0.00  74.92
23:02:04   20   0.11   0.00   0.11     0.11  0.00  24.48    0.00    0.00  75.19
23:02:04   21   0.22   0.00   0.11     0.00  0.00  23.43    0.00    0.00  76.24
23:02:04   22   0.11   0.00   0.11     0.00  0.00  25.46    0.00    0.00  74.32
23:02:04   23   0.00   0.00   0.11     0.00  0.00  25.47    0.00    0.00  74.41
23:02:04   24   0.00   0.00  45.06     0.00  0.00  42.18    0.00    0.00  12.76
23:02:04   25   0.11   0.00   0.11     0.11  0.00  25.22    0.00    0.00  74.46
23:02:04   26   0.11   0.00   0.00     0.11  0.00  23.39    0.00    0.00  76.39
23:02:04   27   0.22   0.00   0.11     0.00  0.00  23.83    0.00    0.00  75.85
23:02:04   28   0.11   0.00   0.11     0.00  0.00  24.10    0.00    0.00  75.68
23:02:04   29   0.11   0.00   0.11     0.
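As a rough cross-check of the sizing above: with nf_conntrack_buckets = 65536 and nf_conntrack_count around 1236021, the average hash chain is 1236021 / 65536 ≈ 19 entries (the common rule of thumb is nf_conntrack_max = 4 * buckets, i.e. chains of about 4). Longer chains mainly slow down per-packet lookups, while the gc worker's cost scales with the total number of entries it has to walk rather than with the bucket count, which would explain why raising nf_conntrack_buckets did not reduce the single hogged core visible in the mpstat output.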
Re: probably serious conntrack/netfilter panic, 4.8.14, timers and intel turbo
On 2017-01-11 19:22, Guillaume Nault wrote: Cc: netfilter-de...@vger.kernel.org, I'm afraid I'll need some help for this case. On Sat, Dec 17, 2016 at 09:48:13PM +0200, Denys Fedoryshchenko wrote: Hi, I posted recently several netfilter related crashes, didn't got any answers, one of them started to happen quite often on loaded NAT (17Gbps), so after trying endless ways to make it stable, i found out that in backtrace i can often see timers, and this bug probably appearing on older releases, i've seen such backtrace with timer fired for conntrack on them. I disabled Intel turbo for cpus on this loaded NAT, and voila, panic disappeared for 2nd day! * by wrmsr -a 0x1a0 0x4000850089 I am not sure timers is the reason, but probably turbo creating some condition for bug. Re-formatting the stack-trace for easier reference: [28904.162607] BUG: unable to handle kernel NULL pointer dereference at 0008 [28904.163210] IP: [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack] [28904.163745] PGD 0 [28904.164058] Oops: 0002 [#1] SMP [28904.164323] Modules linked in: nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables netconsole configfs 8021q garp mrp stp llc bonding ixgbe dca [28904.168132] CPU: 27 PID: 0 Comm: swapper/27 Not tainted 4.8.14-build-0124 #2 [28904.168398] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.1008.031920151331 03/19/2015 [28904.168853] task: 885fa42e8c40 task.stack: 885fa42f [28904.169114] RIP: 0010:[] [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack] [28904.169643] RSP: 0018:885fbccc3dd8 EFLAGS: 00010246 [28904.169901] RAX: RBX: 885fbccc RCX: 885fbccc0010 [28904.170169] RDX: 885f87a1c150 RSI: 0142 RDI: 885fbccc [28904.170437] RBP: 885fbccc3de8 R08: cbdee177 R09: 0100 [28904.170704] R10: 885fbccc3dd0 R11: 820050c0 R12: 885f87a1c140 [28904.170971] R13: 0005d948 R14: 000ea942 R15: 885f87a1c160 [28904.171237] FS: () GS:885fbccc() knlGS: [28904.171688] CS: 0010 DS: ES: CR0: 80050033 [28904.171964] CR2: 0008 CR3: 00607f006000 CR4: 001406e0 [28904.172231] Stack: [28904.172482] 885f87a1c140 820a1405 885fbccc3e28 a00abb30 [28904.173182] 0002820a1405 885f87a1c140 885f99a28201 [28904.173884] 820050c8 885fbccc3e58 a00abc62 [28904.174585] Call Trace: [28904.174835] [28904.174912] [] nf_ct_delete_from_lists+0xc9/0xf2 [nf_conntrack] [28904.175613] [] nf_ct_delete+0x109/0x12c [nf_conntrack] [28904.175894] [] ? nf_ct_delete+0x12c/0x12c [nf_conntrack] [28904.176169] [] death_by_timeout+0xd/0xf [nf_conntrack] [28904.176443] [] call_timer_fn.isra.5+0x17/0x6b [28904.176714] [] expire_timers+0x6f/0x7e [28904.176975] [] run_timer_softirq+0x69/0x8b [28904.177238] [] ? clockevents_program_event+0xd0/0xe8 [28904.177504] [] __do_softirq+0xbd/0x1aa [28904.177765] [] irq_exit+0x37/0x7c [28904.178026] [] smp_trace_apic_timer_interrupt+0x7b/0x88 [28904.178300] [] smp_apic_timer_interrupt+0x9/0xb [28904.178565] [] apic_timer_interrupt+0x7c/0x90 [28904.178835] [28904.178907] [] ? mwait_idle+0x64/0x7a [28904.179436] [] ? 
atomic_notifier_call_chain+0x13/0x15 [28904.179712] [] arch_cpu_idle+0xa/0xc [28904.179976] [] default_idle_call+0x27/0x29 [28904.180244] [] cpu_startup_entry+0x11d/0x1c7 [28904.180508] [] start_secondary+0xe8/0xeb [28904.180767] Code: 80 2f 0b 82 48 89 df e8 da 90 84 e1 48 8b 43 10 49 8d 54 24 10 48 8d 4b 10 49 89 4c 24 18 a8 01 49 89 44 24 10 48 89 53 10 75 04 <89> 50 08 c6 03 00 5b 41 5c 5d c3 48 8b 05 10 be 00 00 89 f6 [28904.185546] RIP [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack] [28904.186065] RSP [28904.186319] CR2: 0008 [28904.186593] ---[ end trace 35cbc6c885a5c2d8 ]--- [28904.186860] Kernel panic - not syncing: Fatal exception in interrupt [28904.187155] Kernel Offset: disabled [28904.187419] Rebooting in 5 seconds.. [28909.193662] ACPI MEMORY or I/O RESET_REG. And here's decodecode's output: All code 0: 80 2f 0bsubb $0xb,(%rdi) 3: 82 (bad) 4: 48 89 dfmov%rbx,%rdi 7: e8 da 90 84 e1 callq 0xe18490e6 c: 48 8b 43 10 mov0x10(%rbx),%rax 10: 49 8d 54 24 10 lea0x10(%r12),%rdx 15: 48 8d 4b 10 lea0x10(%rbx),%rcx 19: 49 89 4c 24 18 mov%rcx,0x18(%r12) 1e: a8 01 test $0x1,%al
Re: 4.9.2 panic, __skb_flow_dissect, gro?
Yes, it is in the list (ixgbe).

On 2017-01-11 02:16, Ian Kumlien wrote: Added David Miller to CC since he said it was queued for stable, maybe he can comment. On Wed, Jan 11, 2017 at 12:49 AM, Denys Fedoryshchenko wrote: It seems this patch solves the issue. I hope it will go to stable ASAP, because without it loaded routers crash almost instantly on 4.9. I'm also worried that you could trigger it remotely. I suspect the following: intel: fm10k, i40e, i40ev, igb, ixgbe, ixgbevf; mellanox: mlx4, mlx5; qlogic: qede, since skb_flow_dissect is called by eth_get_headlen in these drivers... My machine was running with igb when it happened, is your network driver in the list? David: Let me know if I can help with the -stable bit in any way, I've been surprised to see it miss .1 and .2

commit d0af683407a26a4437d8fa6e283ea201f2ae8146, net/core/flow_dissector.c:
flow_dissector: Update pptp handling to avoid null pointer deref.
__skb_flow_dissect can be called with a skb or a data packet, either can be NULL. All calls seems to have been moved to __skb_header_pointer except the pptp handling which is still calling skb_header_pointer.
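To make the failure mode concrete, here is a small userspace analog of the two accessors (the names, types and simplifications are mine, not the kernel's): eth_get_headlen() runs the dissector on the raw receive buffer before any skb exists, so a header accessor must be able to work from the (data, hlen) pair alone, which is what __skb_header_pointer() allows and what skb_header_pointer(), which unconditionally dereferences the skb, does not.

#include <stddef.h>
#include <string.h>

struct pkt { const unsigned char *data; size_t len; };  /* stand-in for sk_buff */

/* Analog of __skb_header_pointer(): 'pkt' may be NULL as long as the
 * requested header fits inside the raw linear buffer. */
static const void *hdr_ptr(const struct pkt *pkt, size_t off, size_t hdrlen,
                           const void *raw, size_t raw_len, void *buf)
{
	if (off + hdrlen <= raw_len)
		return (const unsigned char *)raw + off;  /* header fully in raw buffer */
	if (pkt && off + hdrlen <= pkt->len) {
		memcpy(buf, pkt->data + off, hdrlen);     /* fall back to the full packet */
		return buf;
	}
	return NULL;  /* a skb_header_pointer()-style path would have crashed here when pkt == NULL */
}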
Re: 4.9.2 panic, __skb_flow_dissect, gro?
It seems this patch solves the issue. I hope it will go to stable ASAP, because without it loaded routers crash almost instantly on 4.9.

commit d0af683407a26a4437d8fa6e283ea201f2ae8146, net/core/flow_dissector.c:
flow_dissector: Update pptp handling to avoid null pointer deref.
__skb_flow_dissect can be called with a skb or a data packet, either can be NULL. All calls seems to have been moved to __skb_header_pointer except the pptp handling which is still calling skb_header_pointer.

On 2017-01-11 01:26, Denys Fedoryshchenko wrote: Hi, Got panic message on 4.9.2 with latest patches from stable-queue, probably it affects all 4.9 version Panic message: dmesg-erst-6374119981415661569:<6>[ 23.110324] ip_set: protocol 6 dmesg-erst-6374119981415661569:<1>[ 28.117455] BUG: unable to handle kernel NULL pointer dereference at 0078 dmesg-erst-6374119981415661569:<1>[ 28.118036] IP: [] __skb_flow_dissect+0x73f/0x931 dmesg-erst-6374119981415661569:<4>[ 28.118360] PGD 0 dmesg-erst-6374119981415661569:<4>[ 28.118427] dmesg-erst-6374119981415661569:<4>[ 28.118730] Oops: [#1] SMP dmesg-erst-6374119981415661569:<4>[ 28.118977] Modules linked in: xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables 8021q garp mrp stp llc netconsole configfs bonding ixgbe dca ipmi_watchdog ipmi_si acpi_ipmi ipmi_msghandler dmesg-erst-6374119981415661569:<4>[ 28.122784] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.9.2-build-0127 #3 dmesg-erst-6374119981415661569:<4>[ 28.123042] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016 dmesg-erst-6374119981415661569:<4>[ 28.123488] task: 882fa6af24c0 task.stack: c90031338000 dmesg-erst-6374119981415661569:<4>[ 28.123742] RIP: 0010:[] [] __skb_flow_dissect+0x73f/0x931 dmesg-erst-6374119981415661569:<4>[ 28.124243] RSP: 0018:882fbfb03ce8 EFLAGS: 00010206 dmesg-erst-6374119981415661569:<4>[ 28.124497] RAX: 0130 RBX: 0022 RCX: 882f9eabb000 dmesg-erst-6374119981415661569:<4>[ 28.124756] RDX: 0010 RSI: 882f9eabb026 RDI: 002f dmesg-erst-6374119981415661569:<4>[ 28.125015] RBP: 882fbfb03d78 R08: 000c R09: 882f9eabb022 dmesg-erst-6374119981415661569:<4>[ 28.125275] R10: 0140 R11: 0001 R12: 0b88 dmesg-erst-6374119981415661569:<4>[ 28.125532] R13: 882fbfb03d9c R14: R15: 820c11a0 dmesg-erst-6374119981415661569:<4>[ 28.125792] FS: () GS:882fbfb0() knlGS: dmesg-erst-6374119981415661569:<4>[ 28.126227] CS: 0010 DS: ES: CR0: 80050033 dmesg-erst-6374119981415661569:<4>[ 28.126482] CR2: 0078 CR3: 00607f007000 CR4: 001406e0 dmesg-erst-6374119981415661569:<4>[ 28.126741] Stack: dmesg-erst-6374119981415661569:<4>[ 28.126983] 882fbfb03cf8 81885afb 0001bfb03d88 818953b5 dmesg-erst-6374119981415661569:<4>[ 28.127675] 882fbfb03d9c 2f08 882f9eabb000 882fbfb03d48 dmesg-erst-6374119981415661569:<4>[ 28.128350] 818ef3e4 882fa4177400 004e dmesg-erst-6374119981415661569:<4>[ 28.129027] Call Trace: dmesg-erst-6374119981415661569:<4>[ 28.129271] dmesg-erst-6374119981415661569:<4>[ 28.129340] [] ? kfree_skb+0x25/0x27 dmesg-erst-6374119981415661569:<4>[ 28.129655] [] ? __netif_receive_skb_core+0x61b/0x807 dmesg-erst-6374119981415661569:<4>[ 28.129917] [] ?
udp4_gro_receive+0x1f6/0x256 dmesg-erst-6374119981415661569:<4>[ 28.130174] [] eth_get_headlen+0x4c/0x82 dmesg-erst-6374119981415661569:<4>[ 28.130435] [] ixgbe_clean_rx_irq+0x546/0x924 [ixgbe] dmesg-erst-6374119981415661569:<4>[ 28.130694] [] ixgbe_poll+0x4ef/0x679 [ixgbe] dmesg-erst-6374119981415661569:<4>[ 28.130952] [] net_rx_action+0x107/0x27d dmesg-erst-6374119981415661569:<4>[ 28.131207] [] __do_softirq+0xb5/0x1a3 dmesg-erst-6374119981415661569:<4>[ 28.131460] [] irq_exit+0x4d/0x8e dmesg-erst-6374119981415661569:<4>[ 28.131712] [] do_IRQ+0xaa/0xc2 dmesg-erst-6374119981415661569:<4>[ 28.131965] [] common_interrupt+0x7c/0x7c dmesg-erst-6374119981415661569:<4>[ 28.132217] dmesg-erst-6374119981415661569:<4>[ 28.132286] [] ? mwait_idle+0x4e/0x61 dmesg-erst-637
4.9.2 panic, __skb_flow_dissect, gro?
Hi, Got panic message on 4.9.2 with latest patches from stable-queue, probably it affects all 4.9 version Panic message: dmesg-erst-6374119981415661569:<6>[ 23.110324] ip_set: protocol 6 dmesg-erst-6374119981415661569:<1>[ 28.117455] BUG: unable to handle kernel NULL pointer dereference at 0078 dmesg-erst-6374119981415661569:<1>[ 28.118036] IP: [] __skb_flow_dissect+0x73f/0x931 dmesg-erst-6374119981415661569:<4>[ 28.118360] PGD 0 dmesg-erst-6374119981415661569:<4>[ 28.118427] dmesg-erst-6374119981415661569:<4>[ 28.118730] Oops: [#1] SMP dmesg-erst-6374119981415661569:<4>[ 28.118977] Modules linked in: xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables 8021q garp mrp stp llc netconsole configfs bonding ixgbe dca ipmi_watchdog ipmi_si acpi_ipmi ipmi_msghandler dmesg-erst-6374119981415661569:<4>[ 28.122784] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.9.2-build-0127 #3 dmesg-erst-6374119981415661569:<4>[ 28.123042] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016 dmesg-erst-6374119981415661569:<4>[ 28.123488] task: 882fa6af24c0 task.stack: c90031338000 dmesg-erst-6374119981415661569:<4>[ 28.123742] RIP: 0010:[] [] __skb_flow_dissect+0x73f/0x931 dmesg-erst-6374119981415661569:<4>[ 28.124243] RSP: 0018:882fbfb03ce8 EFLAGS: 00010206 dmesg-erst-6374119981415661569:<4>[ 28.124497] RAX: 0130 RBX: 0022 RCX: 882f9eabb000 dmesg-erst-6374119981415661569:<4>[ 28.124756] RDX: 0010 RSI: 882f9eabb026 RDI: 002f dmesg-erst-6374119981415661569:<4>[ 28.125015] RBP: 882fbfb03d78 R08: 000c R09: 882f9eabb022 dmesg-erst-6374119981415661569:<4>[ 28.125275] R10: 0140 R11: 0001 R12: 0b88 dmesg-erst-6374119981415661569:<4>[ 28.125532] R13: 882fbfb03d9c R14: R15: 820c11a0 dmesg-erst-6374119981415661569:<4>[ 28.125792] FS: () GS:882fbfb0() knlGS: dmesg-erst-6374119981415661569:<4>[ 28.126227] CS: 0010 DS: ES: CR0: 80050033 dmesg-erst-6374119981415661569:<4>[ 28.126482] CR2: 0078 CR3: 00607f007000 CR4: 001406e0 dmesg-erst-6374119981415661569:<4>[ 28.126741] Stack: dmesg-erst-6374119981415661569:<4>[ 28.126983] 882fbfb03cf8 81885afb 0001bfb03d88 818953b5 dmesg-erst-6374119981415661569:<4>[ 28.127675] 882fbfb03d9c 2f08 882f9eabb000 882fbfb03d48 dmesg-erst-6374119981415661569:<4>[ 28.128350] 818ef3e4 882fa4177400 004e dmesg-erst-6374119981415661569:<4>[ 28.129027] Call Trace: dmesg-erst-6374119981415661569:<4>[ 28.129271] dmesg-erst-6374119981415661569:<4>[ 28.129340] [] ? kfree_skb+0x25/0x27 dmesg-erst-6374119981415661569:<4>[ 28.129655] [] ? __netif_receive_skb_core+0x61b/0x807 dmesg-erst-6374119981415661569:<4>[ 28.129917] [] ? 
udp4_gro_receive+0x1f6/0x256 dmesg-erst-6374119981415661569:<4>[ 28.130174] [] eth_get_headlen+0x4c/0x82 dmesg-erst-6374119981415661569:<4>[ 28.130435] [] ixgbe_clean_rx_irq+0x546/0x924 [ixgbe] dmesg-erst-6374119981415661569:<4>[ 28.130694] [] ixgbe_poll+0x4ef/0x679 [ixgbe] dmesg-erst-6374119981415661569:<4>[ 28.130952] [] net_rx_action+0x107/0x27d dmesg-erst-6374119981415661569:<4>[ 28.131207] [] __do_softirq+0xb5/0x1a3 dmesg-erst-6374119981415661569:<4>[ 28.131460] [] irq_exit+0x4d/0x8e dmesg-erst-6374119981415661569:<4>[ 28.131712] [] do_IRQ+0xaa/0xc2 dmesg-erst-6374119981415661569:<4>[ 28.131965] [] common_interrupt+0x7c/0x7c dmesg-erst-6374119981415661569:<4>[ 28.132217] dmesg-erst-6374119981415661569:<4>[ 28.132286] [] ? mwait_idle+0x4e/0x61 dmesg-erst-6374119981415661569:<4>[ 28.132773] [] arch_cpu_idle+0xa/0xc dmesg-erst-6374119981415661569:<4>[ 28.133026] [] default_idle_call+0x20/0x22 dmesg-erst-6374119981415661569:<4>[ 28.133282] [] cpu_startup_entry+0xde/0x185 dmesg-erst-6374119981415661569:<4>[ 28.133539] [] start_secondary+0xe8/0xeb dmesg-erst-6374119981415661569:<4>[ 28.133792] Code: f7 e8 eb 63 ff ff 85 c0 0f 88 d5 01 00 00 44 8b 45 80 48 8d 75 b0 66 44 8b 66 0c 41 83 c0 0e e9 87 00 00 00 41 8d 50 04 66 85 c0 <41> 8b 46 78 44 0f 48 c2 41 2b 46 7c 42 8d 34 03 29 f0 83 f8 03 dmesg-erst-6374119981415661569:<1>[ 28.138401] RIP [] __skb_flow_dissect+0x73f/0x931 dmesg-erst-6374119981415661569:<4>[ 28.138718] RSP dmesg-erst-6374119981415661569:<4>[ 28.138964] CR2: 0078 dmesg-erst-6374119981415661569:<4>[ 28.139215] ---[ end trace 46fb1cf5af272
probably serious conntrack/netfilter panic, 4.8.14, timers and intel turbo
Hi, I recently posted several netfilter-related crashes and didn't get any answers. One of them started to happen quite often on a loaded NAT (17Gbps), so after trying endless ways to make it stable, I found that in the backtraces I can often see timers, and this bug probably appears on older releases too; I've seen such a backtrace with a timer fired for conntrack on them. I disabled Intel turbo for the CPUs on this loaded NAT, and voila, the panic has been gone for a second day! * by wrmsr -a 0x1a0 0x4000850089 I am not sure timers are the reason, but turbo probably creates some condition for the bug. Here are example backtraces from the last reboots (kernel 4.8.14); the same kernel worked perfectly without turbo. The last one is a crash on 4.8.0 that looks painfully similar, on a totally different workload, but with conntrack enabled. It happens there much less often, so it is harder to crash and test by disabling turbo.
[28904.162607] BUG: unable to handle kernel NULL pointer dereference at 0008
[28904.163210] IP: [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.163745] PGD 0
[28904.164058] Oops: 0002 [#1] SMP
[28904.164323] Modules linked in: nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables netconsole configfs 8021q garp mrp stp llc bonding ixgbe dca
[28904.168132] CPU: 27 PID: 0 Comm: swapper/27 Not tainted 4.8.14-build-0124 #2
[28904.168398] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.1008.031920151331 03/19/2015
[28904.168853] task: 885fa42e8c40 task.stack: 885fa42f
[28904.169114] RIP: 0010:[] [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.169643] RSP: 0018:885fbccc3dd8 EFLAGS: 00010246
[28904.169901] RAX: RBX: 885fbccc RCX: 885fbccc0010
[28904.170169] RDX: 885f87a1c150 RSI: 0142 RDI: 885fbccc
[28904.170437] RBP: 885fbccc3de8 R08: cbdee177 R09: 0100
[28904.170704] R10: 885fbccc3dd0 R11: 820050c0 R12: 885f87a1c140
[28904.170971] R13: 0005d948 R14: 000ea942 R15: 885f87a1c160
[28904.171237] FS: () GS:885fbccc() knlGS:
[28904.171688] CS: 0010 DS: ES: CR0: 80050033
[28904.171964] CR2: 0008 CR3: 00607f006000 CR4: 001406e0
[28904.172231] Stack:
[28904.172482] 885f87a1c140 820a1405 885fbccc3e28 a00abb30
[28904.173182] 0002820a1405 885f87a1c140 885f99a28201
[28904.173884] 820050c8 885fbccc3e58 a00abc62
[28904.174585] Call Trace:
[28904.174835]
[28904.174912] [] nf_ct_delete_from_lists+0xc9/0xf2 [nf_conntrack]
[28904.175613] [] nf_ct_delete+0x109/0x12c [nf_conntrack]
[28904.175894] [] ? nf_ct_delete+0x12c/0x12c [nf_conntrack]
[28904.176169] [] death_by_timeout+0xd/0xf [nf_conntrack]
[28904.176443] [] call_timer_fn.isra.5+0x17/0x6b
[28904.176714] [] expire_timers+0x6f/0x7e
[28904.176975] [] run_timer_softirq+0x69/0x8b
[28904.177238] [] ? clockevents_program_event+0xd0/0xe8
[28904.177504] [] __do_softirq+0xbd/0x1aa
[28904.177765] [] irq_exit+0x37/0x7c
[28904.178026] [] smp_trace_apic_timer_interrupt+0x7b/0x88
[28904.178300] [] smp_apic_timer_interrupt+0x9/0xb
[28904.178565] [] apic_timer_interrupt+0x7c/0x90
[28904.178835]
[28904.178907] [] ? mwait_idle+0x64/0x7a
[28904.179436] [] ?
atomic_notifier_call_chain+0x13/0x15
[28904.179712] [] arch_cpu_idle+0xa/0xc
[28904.179976] [] default_idle_call+0x27/0x29
[28904.180244] [] cpu_startup_entry+0x11d/0x1c7
[28904.180508] [] start_secondary+0xe8/0xeb
[28904.180767] Code: 80 2f 0b 82 48 89 df e8 da 90 84 e1 48 8b 43 10 49 8d 54 24 10 48 8d 4b 10 49 89 4c 24 18 a8 01 49 89 44 24 10 48 89 53 10 75 04 89 50 08 c6 03 00 5b 41 5c 5d c3 48 8b 05 10 be 00 00 89 f6
[28904.185546] RIP [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.186065] RSP
[28904.186319] CR2: 0008
[28904.186593] ---[ end trace 35cbc6c885a5c2d8 ]---
[28904.186860] Kernel panic - not syncing: Fatal exception in interrupt
[28904.187155] Kernel Offset: disabled
[28904.187419] Rebooting in 5 seconds..
[28909.193662] ACPI MEMORY or I/O RESET_REG.
[14125.227611] BUG: unable to handle kernel NULL pointer dereference at (null)
[14125.228215] IP: [] nf_nat_setup_info+0x6d8/0x755 [nf_nat]
[14125.228564] PGD 0
[14125.228882] Oops: [#1] SMP
[14125.229146] Modules linked in: nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw ipt
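(A note on the wrmsr command above, for anyone reproducing the turbo toggle: MSR 0x1a0 is IA32_MISC_ENABLE, and assuming the stock value on these CPUs was 0x850089, the written value is simply 0x850089 | (1 << 38) = 0x4000850089. Bit 38 is the turbo-mode (IDA) disable bit, so the command switches turbo off while leaving all other feature bits unchanged.)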
Kernel panic in netfilter 4.8.10 probably on conntrack -L
Hi! I have a quite loaded NAT server (approx 17Gbps of traffic) where a periodic "conntrack -L" might trigger a kernel panic once per day. I am not entirely sure whether it is triggered exactly when running the tool, or just by enabling events. Here is the panic message:
[221287.380762] general protection fault: [#1] SMP
[221287.381029] Modules linked in: xt_rateest xt_RATEEST nf_conntrack_netlink netconsole configfs tun nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables 8021q garp mrp stp llc bonding ixgbe dca
[221287.384913] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.10-build-0121 #10
[221287.385184] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.1008.031920151331 03/19/2015
[221287.385634] task: 8200b4c0 task.stack: 8200
[221287.385900] RIP: 0010:[] [] nf_conntrack_eventmask_report+0xba/0x123 [nf_conntrack]
[221287.386428] RSP: 0018:882fbf603df8 EFLAGS: 00010202
[221287.386693] RAX: RBX: 882f96a51da8 RCX:
[221287.387134] RDX: RSI: 882fbf603e00 RDI: 0004
[221287.387575] RBP: 882fbf603e38 R08: ff81822024ff R09: 0004
[221287.388011] R10: 882fbf603de0 R11: 820050c0 R12: 882f810bf0c0
[221287.388445] R13: R14: R15: 0004
[221287.388877] FS: () GS:882fbf60() knlGS:
[221287.389311] CS: 0010 DS: ES: CR0: 80050033
[221287.389567] CR2: 7faff0bd8978 CR3: 02006000 CR4: 001406f0
[221287.389998] Stack:
[221287.390238] 00049f292300 882f810bf0c0 882f810bf0c0
[221287.390913] 882f96a51d80 820050c8
[221287.391587] 882fbf603e68 a0098bd3 8100 a0098c85
[221287.392262] Call Trace:
[221287.392508]
[221287.392579] [] nf_ct_delete+0x7a/0x12c [nf_conntrack]
[221287.393082] [] ? nf_ct_delete+0x12c/0x12c [nf_conntrack]
[221287.393351] [] death_by_timeout+0xd/0xf [nf_conntrack]
[221287.393617] [] call_timer_fn.isra.5+0x17/0x6b
[221287.393881] [] expire_timers+0x6f/0x7e
[221287.394134] [] run_timer_softirq+0x69/0x8b
[221287.394390] [] __do_softirq+0xbd/0x1aa
[221287.394643] [] irq_exit+0x37/0x7c
[221287.394898] [] smp_trace_call_function_single_interrupt+0x2e/0x30
[221287.395341] [] smp_call_function_single_interrupt+0x9/0xb
[221287.395600] [] call_function_single_interrupt+0x7c/0x90
[221287.395857]
[221287.395926] [] ? mwait_idle+0x64/0x7a
[221287.396413] [] arch_cpu_idle+0xa/0xc
[221287.396665] [] default_idle_call+0x27/0x29
[221287.396919] [] cpu_startup_entry+0x11d/0x1c7
[221287.397175] [] rest_init+0x72/0x74
[221287.397428] [] start_kernel+0x3ba/0x3c7
[221287.397681] [] x86_64_start_reservations+0x2a/0x2c
[221287.397937] [] x86_64_start_kernel+0x12a/0x135
[221287.402124] Code: f2 89 75 d0 75 04 4c 8b 73 08 0f b7 73 10 41 89 ff 4d 89 f1 4d 09 f9 31 c0 49 85 f1 74 67 41 89 d5 89 7d c4 48 8d 75 c8 44 09 f7 ff 10 89 c2 c1 ea 1f 75 05 4d 85 f6 74 4b 49 83 c4 04 89 45
[221287.406724] RIP [] nf_conntrack_eventmask_report+0xba/0x123 [nf_conntrack]
[221287.407234] RSP
[221287.407489] ---[ end trace 4b077b9412fc7065 ]---
[221287.407746] Kernel panic - not syncing: Fatal exception in interrupt
[221287.408013] Kernel Offset: disabled
[221287.408270] Rebooting in 5 seconds..
Dec 5 23:17:58 10.0.253.34 Dec 5 23:17:58 10.0.253.34 [221292.408645] ACPI MEMORY or I/O RESET_REG.
Re: SNAT --random & fully is not actually random for ips
On 2016-11-28 13:29, Pablo Neira Ayuso wrote: On Mon, Nov 28, 2016 at 01:12:07PM +0200, Denys Fedoryshchenko wrote: On 2016-11-28 13:06, Pablo Neira Ayuso wrote: >Why does your patch reverts NF_NAT_RANGE_PROTO_RANDOM_FULLY? Ops, sorry i just did mistake with files, actually it is in reverse ( did this patch, and it worked properly with it, with random source ip). Oh, I see 8)

--- nf_nat_core.c 2016-11-21 09:11:59.0 +
+++ nf_nat_core.c.new 2016-11-28 09:55:54.0 +
@@ -282,9 +282,13 @@
  * client coming from the same IP (some Internet Banking sites
  * like this), even across reboots.
  */
-	j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),
+	if (range->flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY) {
+		j = prandom_u32();
+	} else {
+		j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),
 		   range->flags & NF_NAT_RANGE_PERSISTENT ?
 		   0 : (__force u32)tuple->dst.u3.all[max] ^ zone->id);
+	}
 	full_range = false;
 	for (i = 0; i <= max; i++) {

This is current situation, RANDOM_FULLY actually does prandom_u32 for source port only, but not for IP. IP kept as persistent and kind of predictable, because hash function based on source ip. Sure i did tried to specify any combination of flags, but looking to "find_best_ips_proto" function, it wont have any effect.

IIRC the original intention on random-fully was to cover only ports. Did you interpret from git history otherwise? Otherwise, safe procedure is to add a new flag.

No, it seems I didn't read the man page well, sorry. I will check it; maybe I will try to add a new option and submit a patch. I am still studying the impact on "balancing" with this change, and it seems to work great. But I am not really sure such a thing is needed by anyone else; actually, some might have privacy concerns as well and could use such an option for privacy.
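A sketch of the new-flag approach discussed above (the flag name NF_NAT_RANGE_FULLY_RANDOM_IP is invented here purely for illustration; it is not an existing kernel or iptables option, and a matching userspace patch to the iptables SNAT target would also be needed):

--- nf_nat_core.c
+++ nf_nat_core.c.random-ip
@@ -282,9 +282,14 @@
  * client coming from the same IP (some Internet Banking sites
  * like this), even across reboots.
  */
-	j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),
-		   range->flags & NF_NAT_RANGE_PERSISTENT ?
-		   0 : (__force u32)tuple->dst.u3.all[max] ^ zone->id);
+	if (range->flags & NF_NAT_RANGE_FULLY_RANDOM_IP) {
+		/* opt in to a non-deterministic source address instead of
+		 * hashing the client address, so flows from one client can
+		 * spread across the whole SNAT range */
+		j = prandom_u32();
+	} else {
+		j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),
+			   range->flags & NF_NAT_RANGE_PERSISTENT ?
+			   0 : (__force u32)tuple->dst.u3.all[max] ^ zone->id);
+	}

This would keep the existing --random-fully semantics intact and only change behaviour when the new flag is explicitly set.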
Re: SNAT --random & fully is not actually random for ips
On 2016-11-28 13:06, Pablo Neira Ayuso wrote: On Mon, Nov 28, 2016 at 12:45:59PM +0200, Denys Fedoryshchenko wrote: Hello, I noticed that if i specify -j SNAT with options --random --random-fully still it keeps persistence for source IP. So you specify both? Actually truly random src ip required in some scenarios like links balanced by IPs, but seems since 2012 at least it is not possible. But actually if i do something like: --- nf_nat_core.c.new 2016-11-28 09:55:54.0 + +++ nf_nat_core.c 2016-11-21 09:11:59.0 + @@ -282,13 +282,9 @@ * client coming from the same IP (some Internet Banking sites * like this), even across reboots. */ - if (range->flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY) { - j = prandom_u32(); - } else { - j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32), + j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32), range->flags & NF_NAT_RANGE_PERSISTENT ? 0 : (__force u32)tuple->dst.u3.all[max] ^ zone->id); - } full_range = false; for (i = 0; i <= max; i++) { It works as intended. But i guess to not break compatibility it is better should be introduced as new option? Or maybe there is no really need for such option? Why does your patch reverts NF_NAT_RANGE_PROTO_RANDOM_FULLY? Ops, sorry i just did mistake with files, actually it is in reverse ( did this patch, and it worked properly with it, with random source ip). --- nf_nat_core.c 2016-11-21 09:11:59.0 + +++ nf_nat_core.c.new 2016-11-28 09:55:54.0 + @@ -282,9 +282,13 @@ * client coming from the same IP (some Internet Banking sites * like this), even across reboots. */ - j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32), + if (range->flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY) { + j = prandom_u32(); + } else { + j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32), range->flags & NF_NAT_RANGE_PERSISTENT ? 0 : (__force u32)tuple->dst.u3.all[max] ^ zone->id); + } full_range = false; for (i = 0; i <= max; i++) { This is current situation, RANDOM_FULLY actually does prandom_u32 for source port only, but not for IP. IP kept as persistent and kind of predictable, because hash function based on source ip. Sure i did tried to specify any combination of flags, but looking to "find_best_ips_proto" function, it wont have any effect.
SNAT --random & fully is not actually random for ips
Hello, I noticed that if I specify -j SNAT with the options --random --random-fully, it still keeps persistence for the source IP. A truly random source IP is actually required in some scenarios, such as links balanced by IPs, but it seems that since 2012 at least this has not been possible. But if I do something like:

--- nf_nat_core.c.new 2016-11-28 09:55:54.0 +
+++ nf_nat_core.c 2016-11-21 09:11:59.0 +
@@ -282,13 +282,9 @@
  * client coming from the same IP (some Internet Banking sites
  * like this), even across reboots.
  */
-	if (range->flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY) {
-		j = prandom_u32();
-	} else {
-		j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),
+	j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),
 		   range->flags & NF_NAT_RANGE_PERSISTENT ?
 		   0 : (__force u32)tuple->dst.u3.all[max] ^ zone->id);
-	}
 	full_range = false;
 	for (i = 0; i <= max; i++) {

it works as intended. But I guess, to not break compatibility, it had better be introduced as a new option? Or maybe there is no real need for such an option?
Re: kernel panic TPROXY , vanilla 4.7.1
On 2016-08-17 19:04, Eric Dumazet wrote: On Wed, 2016-08-17 at 08:42 -0700, Eric Dumazet wrote: On Wed, 2016-08-17 at 17:31 +0300, Denys Fedoryshchenko wrote: > Hi! > > Tried to run squid on latest kernel, and hit a panic > Sometimes it just shows warning in dmesg (but doesnt work properly) > [ 75.701666] IPv4: Attempt to release TCP socket in state 10 > 88102d430780 > [ 83.866974] squid (2700) used greatest stack depth: 12912 bytes left > [ 87.506644] IPv4: Attempt to release TCP socket in state 10 > 880078a48780 > [ 114.704295] IPv4: Attempt to release TCP socket in state 10 > 881029f8ad00 > > I cannot catch yet oops/panic message, netconsole not working. > > After triggering warning message 3 times, i am unable to run squid > anymore (without reboot), and in netstat it doesnt show port running. > > firewall is: > *mangle > -A PREROUTING -p tcp -m socket -j DIVERT > -A PREROUTING -p tcp -m tcp --dport 80 -i eno1 -j TPROXY --on-port 3129 > --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1 > -A DIVERT -j MARK --set-xmark 0x1/0x > -A DIVERT -j ACCEPT > > routing > ip rule add fwmark 1 lookup 100 > ip route add local default dev eno1 table 100 > > > squid config is default with tproxy option > http_port 3129 tproxy > Hmppff... sorry for this, I will send a fix. Thanks for the report ! Could you try the following ? Thanks ! net/netfilter/xt_TPROXY.c |4 1 file changed, 4 insertions(+) diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c index 7f4414d26a66..663c4c3c9072 100644 --- a/net/netfilter/xt_TPROXY.c +++ b/net/netfilter/xt_TPROXY.c @@ -127,6 +127,8 @@ nf_tproxy_get_sock_v4(struct net *net, struct sk_buff *skb, void *hp, daddr, dport, in->ifindex); + if (sk && !atomic_inc_not_zero(&sk->sk_refcnt)) + sk = NULL; /* NOTE: we return listeners even if bound to * 0.0.0.0, those are filtered out in * xt_socket, since xt_TPROXY needs 0 bound @@ -195,6 +197,8 @@ nf_tproxy_get_sock_v6(struct net *net, struct sk_buff *skb, int thoff, void *hp, daddr, ntohs(dport), in->ifindex); + if (sk && !atomic_inc_not_zero(&sk->sk_refcnt)) + sk = NULL; /* NOTE: we return listeners even if bound to * 0.0.0.0, those are filtered out in * xt_socket, since xt_TPROXY needs 0 bound Yes, everything fine after patch! Thanks a lot
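The essential part of Eric's fix is the atomic_inc_not_zero() call: the TPROXY socket lookup happens under RCU without holding a reference, so by the time the target wants to use the socket, its last reference may already have been dropped. Taking the reference only while the count is still non-zero turns "socket being freed" into "no socket found" instead of resurrecting a dying socket. A minimal userspace sketch of that pattern (the function name is mine):

#include <stdatomic.h>
#include <stdbool.h>

/* Analog of the kernel's atomic_inc_not_zero(): acquire a reference only
 * if at least one reference still exists. */
static bool get_ref_unless_zero(atomic_int *refcnt)
{
	int old = atomic_load(refcnt);

	while (old != 0) {
		/* on failure, 'old' is reloaded with the current count */
		if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
			return true;   /* object can no longer be freed under us */
	}
	return false;                  /* count already hit zero: treat as not found */
}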
kernel panic TPROXY , vanilla 4.7.1
Hi! I tried to run squid on the latest kernel and hit a panic. Sometimes it just shows a warning in dmesg (but doesn't work properly):
[ 75.701666] IPv4: Attempt to release TCP socket in state 10 88102d430780
[ 83.866974] squid (2700) used greatest stack depth: 12912 bytes left
[ 87.506644] IPv4: Attempt to release TCP socket in state 10 880078a48780
[ 114.704295] IPv4: Attempt to release TCP socket in state 10 881029f8ad00
I cannot catch the oops/panic message yet; netconsole is not working. After triggering the warning message 3 times, I am unable to run squid anymore (without a reboot), and netstat doesn't show the port listening.
The firewall is:
*mangle
-A PREROUTING -p tcp -m socket -j DIVERT
-A PREROUTING -p tcp -m tcp --dport 80 -i eno1 -j TPROXY --on-port 3129 --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1
-A DIVERT -j MARK --set-xmark 0x1/0x
-A DIVERT -j ACCEPT
Routing:
ip rule add fwmark 1 lookup 100
ip route add local default dev eno1 table 100
The squid config is default, with the tproxy option:
http_port 3129 tproxy
Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start_xmit
On 2016-08-09 00:05, Guillaume Nault wrote: On Mon, Aug 08, 2016 at 02:25:00PM +0300, Denys Fedoryshchenko wrote: On 2016-08-01 23:59, Guillaume Nault wrote: > Do you still have the vmlinux file with debug symbols that generated > this panic? Sorry for delay, i didn't had same image on all servers and probably i found cause of panic, but still testing on several servers. If i remove SFQ qdisc from ppp shapers, servers not rebooting anymore. Thanks for the feedback. I wonder which interactions between SFQ and PPP can lead to this problem. I'll take a look. But still i need around 2 days to make sure that's the reason. Okay, just let me know if you can confirm that removing SFQ really solves the problem.

After long testing, I can confirm that removing sfq from the rules greatly decreased the panic reboots, tested on many different servers. I will try today to do some stress tests: apply sfq qdiscs on a live system at night, then remove them. Then I will also try to disconnect all users with sfq qdiscs attached. I am not sure it will help to reproduce the bug, but it is worth a try. I am still hitting a different conntrack bug about once per week, and that's why I was confused: I was clearly getting panics in conntrack and then something else, and I was not sure whether it was different bugs, a hardware glitch, or something else.
Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start_xmit
On 2016-08-01 23:59, Guillaume Nault wrote: Do you still have the vmlinux file with debug symbols that generated this panic? Sorry for the delay; I didn't have the same image on all servers. I have probably found the cause of the panic, but I am still testing on several servers. If I remove the SFQ qdisc from the ppp shapers, the servers stop rebooting. But I still need around 2 days to make sure that's the reason.
Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start_xmit
On 2016-08-01 23:59, Guillaume Nault wrote: On Thu, Jul 28, 2016 at 02:28:23PM +0300, Denys Fedoryshchenko wrote: [ 5449.904989] CPU: 1 PID: 6359 Comm: ip Not tainted 4.7.0-build-0109 #2 [ 5449.905255] Hardware name: Supermicro X10SLM+-LN4F/X10SLM+-LN4F, BIOS 3.0 04/24/2015 [ 5449.905712] task: 8803eef4 ti: 8803fd754000 task.ti: 8803fd754000 [ 5449.906168] RIP: 0010:[] [] inet_fill_ifaddr+0x5a/0x264 [ 5449.906710] RSP: 0018:8803fd757b98 EFLAGS: 00010286 [ 5449.906976] RAX: 8803ef65cb90 RBX: 8803f7d2cd00 RCX: [ 5449.907248] RDX: 00080002 RSI: 8803ef65cb90 RDI: 8803ef65cba8 [ 5449.907519] RBP: 8803fd757be0 R08: 0008 R09: 0002 [ 5449.907792] R10: ffa005040269f480 R11: 820a1c00 R12: ffa005040269f480 [ 5449.908067] R13: 8803ef65cb90 R14: R15: 8803f7d2cd00 [ 5449.908339] FS: 7f660674d700() GS:88041fc4() knlGS: [ 5449.908796] CS: 0010 DS: ES: CR0: 80050033 [ 5449.909067] CR2: 008b9018 CR3: 0003f2a11000 CR4: 001406e0 [ 5449.909339] Stack: [ 5449.909598] 0163a8c0869711ac 0080 0003e1d50003e1d5 [ 5449.910329] 8800d54c0ac8 8803f0d9 0005 [ 5449.911066] 8803f7d2cd00 8803fd757c40 818a9f73 820a1c00 [ 5449.911803] Call Trace: [ 5449.912061] [] inet_dump_ifaddr+0xfb/0x185 [ 5449.912332] [] rtnl_dump_all+0xa9/0xc2 [ 5449.912601] [] netlink_dump+0xf0/0x25c [ 5449.912873] [] netlink_recvmsg+0x1a9/0x2d3 [ 5449.913142] [] sock_recvmsg+0x14/0x16 [ 5449.913407] [] ___sys_recvmsg+0xea/0x1a1 [ 5449.913675] [] ? alloc_pages_vma+0x167/0x1a0 [ 5449.913945] [] ? page_add_new_anon_rmap+0xb4/0xbd [ 5449.914212] [] ? lru_cache_add_active_or_unevictable+0x31/0x9d [ 5449.914664] [] ? handle_mm_fault+0x632/0x112d [ 5449.914940] [] ? vma_merge+0x27e/0x2b1 [ 5449.915208] [] __sys_recvmsg+0x3d/0x5e [ 5449.915478] [] ? __sys_recvmsg+0x3d/0x5e [ 5449.915747] [] SyS_recvmsg+0xd/0x17 [ 5449.916017] [] entry_SYSCALL_64_fastpath+0x17/0x93 Do you still have the vmlinux file with debug symbols that generated this panic? I have a slightly different build now (I tried to enable slightly different kernel options), but I also got a new panic in inet_fill_ifaddr on the new build. I will prepare all the files tomorrow (everything is at the office) and provide a link with the sources and vmlinux, plus of course the new panic message from this build. The new panic happened at a completely different location and ISP.
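For reference, once a vmlinux built with debug info is at hand, the faulting source line can be resolved straight from the symbol+offset in the trace, e.g. with gdb (path illustrative):
gdb -batch -ex 'list *(inet_fill_ifaddr+0x5a)' ./vmlinux
This prints the file and line corresponding to inet_fill_ifaddr+0x5a, which is usually enough to see which field dereference blew up.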
Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start_xmit
On 2016-07-28 14:09, Guillaume Nault wrote: On Tue, Jul 12, 2016 at 10:31:18AM -0700, Cong Wang wrote: On Mon, Jul 11, 2016 at 12:45 PM, wrote:
> Hi
> On the latest kernel I noticed a kernel panic happening 1-2 times per day. It is also happening on older kernels (at least 4.5.3).
> ...
> [42916.426463] Call Trace:
> [42916.426719] [] skb_push+0x36/0x37
> [42916.427111] [] ppp_start_xmit+0x10f/0x150 [ppp_generic]
> [42916.427314] [] dev_hard_start_xmit+0x25a/0x2d3
> [42916.427516] [] ? validate_xmit_skb.isra.107.part.108+0x11d/0x238
> [42916.427858] [] sch_direct_xmit+0x89/0x1b5
> [42916.428060] [] __qdisc_run+0x133/0x170
> [42916.428261] [] net_tx_action+0xe3/0x148
> [42916.428462] [] __do_softirq+0xb9/0x1a9
> [42916.428663] [] irq_exit+0x37/0x7c
> [42916.428862] [] smp_apic_timer_interrupt+0x3d/0x48
> [42916.429063] [] apic_timer_interrupt+0x7c/0x90
Interesting, we call skb_cow_head() before skb_push() in ppp_start_xmit(); I have no idea why this could happen.
The skb is corrupted: head is at 8800b0bf2800 while data is at ffa00500b0bf284c. Figuring out how this corruption happened is going to be hard without a way to reproduce the problem. Denys, can you confirm you're using a vanilla kernel? Also I guess the ppp devices and tc settings are handled by accel-ppp. If so, can you share more info about your setup (accel-ppp.conf, radius attributes, iptables...) so that I can try to reproduce it on my machines?
I have a slight modification from vanilla:
--- linux/net/sched/sch_htb.c	2016-06-08 01:23:53.0 +
+++ linux-new/net/sched/sch_htb.c	2016-06-21 14:03:08.398486593 +
@@ -1495,10 +1495,10 @@
 			cl->common.classid);
 		cl->quantum = 1000;
 	}
-	if (!hopt->quantum && cl->quantum > 20) {
+	if (!hopt->quantum && cl->quantum > 200) {
 		pr_warn("HTB: quantum of class %X is big. Consider r2q change.\n",
 			cl->common.classid);
-		cl->quantum = 20;
+		cl->quantum = 200;
 	}
 	if (hopt->quantum)
 		cl->quantum = hopt->quantum;
But I guess it should not be the reason for the crash (it is related to another system; without it I was unable to shape over 7Gbps, and maybe with the latest kernel I will not need this patch). I'm trying to find reproducible conditions for the crash, because right now it happens only on some servers in large networks (completely different ISPs, so I have excluded a hardware fault of a specific server). It is a complex config: I have accel-ppp, plus my own "shaping daemon" that applies several shapers to the ppp interfaces. Worst of all, it happens only with live customers; I am unable to reproduce it in stress tests. Also, until recent kernels I was getting different panic messages (but all related to ppp). I think at least one cause of the crashes was fixed by "ppp: defer netns reference release for ppp channel" in 4.7.0 (maybe that's why I am getting fewer crashes recently). I also tried various kernel debug options that don't cause major performance degradation (lock checking, freed memory poisoning, etc.), without any luck yet. Is it useful if I post panics that occur at least twice? (I will post an example below, got it recently.) Of course, if I manage to find reproducible conditions I will send them immediately.
[ 5449.900988] general protection fault: [#1] SMP [ 5449.901263] Modules linked in: cls_fw act_police cls_u32 sch_ingress sch_sfq sch_htb pppoe pppox ppp_generic slhc netconsole configfs xt_nat ts_bm xt_string xt_connmark xt_TCPMSS xt_tcpudp xt_mark iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables 8021q garp mrp stp llc ixgbe dca [ 5449.904989] CPU: 1 PID: 6359 Comm: ip Not tainted 4.7.0-build-0109 #2 [ 5449.905255] Hardware name: Supermicro X10SLM+-LN4F/X10SLM+-LN4F, BIOS 3.0 04/24/2015 [ 5449.905712] task: 8803eef4 ti: 8803fd754000 task.ti: 8803fd754000 [ 5449.906168] RIP: 0010:[] [] inet_fill_ifaddr+0x5a/0x264 [ 5449.906710] RSP: 0018:8803fd757b98 EFLAGS: 00010286 [ 5449.906976] RAX: 8803ef65cb90 RBX: 8803f7d2cd00 RCX: [ 5449.907248] RDX: 00080002 RSI: 8803ef65cb90 RDI: 8803ef65cba8 [ 5449.907519] RBP: 8803fd757be0 R08: 0008 R09: 0002 [ 5449.907792] R10: ffa005040269f480 R11: 820a1c00 R12: ffa005040269f480 [ 5449.908067] R13: 8803ef65cb90 R14: R15: 8803f7d2cd00 [ 5449.908339] FS: 7f660674d700() GS:88041fc4() knlGS: [ 5449.908796] CS: 0010 DS: ES: CR0: 80050033 [ 5449.909067] CR2: 008b9018 CR3: 0003f2a11000 CR4: 0
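A side note on the quantum patch quoted above, for anyone wondering what the warning is about: when a class has no explicit quantum, HTB derives it as the class rate in bytes per second divided by r2q (default 10), and uses it as the DRR quantum when classes borrow. Worked example: a 10Mbit class gives 10,000,000 / 8 / 10 = 125,000 bytes per round, so at the multi-gigabit rates discussed here the derived quantum sails past sch_htb's upper sanity bound; either r2q must be raised, quantum set explicitly per class, or the clamp relaxed in the kernel, as the patch does.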
Re: kernel panic, __neigh_notify, 4.7.0-rc7, Workqueue: events_power_efficient neigh_periodic_work
On 2016-07-24 21:40, nuclear...@nuclearcat.com wrote: Different hardware, but same workload. It seems to be a different bug; it happened at least twice on this unit (both kernel panic messages are here). As an additional side note that might be useful (found in the commit logs: proxy arp can trigger this kind of bug, e.g. commit "net/neighbour: fix crash at dumping device-agnostic proxy entries"): this is a pppoe server with proxy_arp enabled.
Re: kernel panic in 4.2.3, rb_erase in sch_fq
I can confirm: after the patch this issue never appeared again. So it may be good to push it to stable etc. :) Thanks a lot Eric, you saved me again. I still have some weird panic issues, maybe related to conntrack, but they are rare even under high load, so I am slowly gathering data, and I found at least one more person with similar conntrack crashes on recent kernels. On 2015-11-04 06:46, Eric Dumazet wrote: On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote: On 2015-11-04 00:06, Cong Wang wrote:
> On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko wrote:
>> Hi!
>> Actually it seems I have been getting this panic for a while (once per week) on a loaded pppoe server, but only now was I able to get the full panic message. After checking the commit logs on sch_fq.c I didn't see any fixes, so upgrading to a newer kernel probably won't help?
> Can you share your `tc qdisc show dev ` with us? And how to reproduce it? I tried to set up htb+fq and then flip the interface back and forth but I don't see any crash.
My guess is it won't be easy to reproduce: it is happening on a box with 4.5k interfaces that constantly creates/deletes interfaces, and even then the problem may happen once per day, or may not happen for a week. Here is the script that is fired after a new ppp interface is detected. The pppoe processes are independent from the processes that are "establishing" the shapers.
It is probably a generic bug. sch_fq seems OK to me. Somehow nobody tries to change qdisc hundreds of times per second ;) Could you try the following patch? It seems to 'fix' the issue for me.
diff --git a/net/core/dev.c b/net/core/dev.c
index 8ce3f74cd6b9..bf136103bc7b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2880,6 +2880,12 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		spin_lock(&q->busylock);
 	spin_lock(root_lock);
+	if (unlikely(q != rcu_dereference_bh(txq->qdisc))) {
+		pr_err_ratelimited("Arg, qdisc changed ! state %lx\n", q->state);
+		kfree_skb(skb);
+		rc = NET_XMIT_DROP;
+		goto end;
+	}
 	if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
 		kfree_skb(skb);
 		rc = NET_XMIT_DROP;
@@ -2913,6 +2919,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 			__qdisc_run(q);
 		}
 	}
+end:
 	spin_unlock(root_lock);
 	if (unlikely(contended))
 		spin_unlock(&q->busylock);
4.3.0, neighbour: arp_cache: neighbor table overflow! and panic
Hi I have several pppoe servers running under older kernels, and upgraded two of them to 4.3.0. After that, one of them reboots randomly, and the stack trace is always different. I also noticed a message appearing on both now that didn't exist before on the older kernels: "neighbour: arp_cache: neighbor table overflow!" In ip neigh I didn't notice anything suspicious, there are fewer than 10 entries, but there are quite a lot of arp requests on eth0 (irrelevant to this host) that may cause some issues. Here are the panic messages caught over netconsole: [151784.835507] general protection fault: [#1] SMP [151784.836049] Modules linked in: act_skbedit sch_fq cls_fw act_police cls_u32 sch_ingress sch_sfq sch_htb netconsole configfs pppoe pppox ppp_generic slhc xt_nat ts_bm xt_string xt_connmark xt_TCPMSS xt_tcpudp xt_mark iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables 8021q garp mrp stp llc [151784.840667] CPU: 21 PID: 0 Comm: swapper/21 Not tainted 4.3.0-build-0087 #3 [151784.841014] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012 [151784.841575] task: 88042d29dc00 ti: 88042d2c4000 task.ti: 88042d2c4000 [151784.847603] RIP: 0010:[] [] nf_ct_delete+0x28/0x20e [nf_conntrack] [151784.848421] RSP: 0018:88042f0a3e80 EFLAGS: 00010246 [151784.848797] RAX: ffa2050402d3ab00 RBX: 8803d5087368 RCX: dead0200 [151784.849421] RDX: RSI: RDI: 8803d5087368 [151784.850090] RBP: 88042f0a3ec8 R08: 88042f0a3f08 R09: 0100 [151784.850736] R10: 2710 R11: 0020 R12: a0045380 [151784.851389] R13: 0065 R14: R15: [151784.852096] FS: () GS:88042f0a() knlGS: [151784.852791] CS: 0010 DS: ES: CR0: 80050033 [151784.853204] CR2: 7fec804efcbc CR3: 0200c000 CR4: 000406e0 [151784.853752] Stack: [151784.854110] 88042b074400 1f3c94e2cf92 0001144267c3b61c 172f18fced27 [151784.855136] 8100 a0045380 0065 88042d2c8000 [151784.856203] 0100 88042f0a3ed8 a004538d 88042f0a3ef8 [151784.857224] Call Trace: [151784.857608] [151784.857717] [] ? nf_ct_delete+0x20e/0x20e [nf_conntrack] [151784.858432] [] death_by_timeout+0xd/0xf [nf_conntrack] [151784.858807] [] call_timer_fn.isra.26+0x17/0x6d [151784.859238] [] run_timer_softirq+0x172/0x193 [151784.859630] [] __do_softirq+0xba/0x1a9 [151784.859985] [] irq_exit+0x37/0x7c [151784.860380] [] smp_apic_timer_interrupt+0x3d/0x48 [151784.860774] [] apic_timer_interrupt+0x7c/0x90 [151784.861144] [151784.861272] [] ? mwait_idle+0x68/0x7e [151784.862033] [] ? atomic_notifier_call_chain+0x13/0x15 [151784.862409] [] arch_cpu_idle+0xa/0xc [151784.862754] [] default_idle_call+0x27/0x29 [151784.863127] [] cpu_startup_entry+0x121/0x1da [151784.863518] [] start_secondary+0xe7/0xea [151784.863893] Code: 5f 5d c3 55 48 89 e5 41 57 41 89 d7 41 56 41 89 f6 41 55 41 54 53 48 89 fb 48 83 ec 20 48 8b 87 c8 00 00 00 48 85 c0 74 0c 31 d2 83 78 1c 00 0f 95 c2 eb 02 31 d2 85 d2 74 1e 44 0f b7 60 1c [151784.871213] RIP [] nf_ct_delete+0x28/0x20e [nf_conntrack] [151784.871692] RSP [151784.872062] ---[ end trace 54f9b78db1dfe968 ]--- [151784.886584] Kernel panic - not syncing: Fatal exception in interrupt [151784.886981] Kernel Offset: disabled [151784.922664] Rebooting in 5 seconds..
[ 1722.079874] general protection fault: [#1] SMP
[ 1722.080366] Modules linked in: act_skbedit sch_fq cls_fw act_police cls_u32 sch_ingress sch_sfq sch_htb netconsole configfs pppoe pppox ppp_generic slhc xt_nat ts_bm xt_string xt_connmark xt_TCPMSS xt_tcpudp xt_mark iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables 8021q garp mrp stp llc
[ 1722.085568] CPU: 19 PID: 103 Comm: ksoftirqd/19 Not tainted 4.3.0-build-0087 #3
[ 1722.086291] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
[ 1722.087011] tas
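On the overflow message itself: "neighbour: arp_cache: neighbor table overflow!" is logged when a new neighbour entry cannot be allocated because the table is past its gc_thresh3 hard limit, so a flood of irrelevant ARP on eth0 can trigger it even when ip neigh shows only a handful of resolved entries. If the message is the immediate symptom, the usual mitigation is raising the thresholds; illustrative values:
net.ipv4.neigh.default.gc_thresh1 = 2048
net.ipv4.neigh.default.gc_thresh2 = 4096
net.ipv4.neigh.default.gc_thresh3 = 8192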
Re: kernel panic in 4.2.3, rb_erase in sch_fq
On 2015-11-04 06:58, Eric Dumazet wrote: On Tue, 2015-11-03 at 20:46 -0800, Eric Dumazet wrote: On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote:
> On 2015-11-04 00:06, Cong Wang wrote:
>> On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko wrote:
>>> Hi!
>>> Actually it seems I have been getting this panic for a while (once per week) on a loaded pppoe server, but only now was I able to get the full panic message. After checking the commit logs on sch_fq.c I didn't see any fixes, so upgrading to a newer kernel probably won't help?
>> Can you share your `tc qdisc show dev ` with us? And how to reproduce it? I tried to set up htb+fq and then flip the interface back and forth but I don't see any crash.
> My guess is it won't be easy to reproduce: it is happening on a box with 4.5k interfaces that constantly creates/deletes interfaces, and even then the problem may happen once per day, or may not happen for a week. Here is the script that is fired after a new ppp interface is detected. The pppoe processes are independent from the processes that are "establishing" the shapers.
It is probably a generic bug. sch_fq seems OK to me. Somehow nobody tries to change qdisc hundreds of times per second ;) Could you try the following patch? It seems to 'fix' the issue for me.
The following patch would be more appropriate. The prior one was meant to 'show' the issue.
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cb5d4ad32946..7f5f3e8a10f5 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -706,9 +706,11 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 	spin_lock_bh(root_lock);
 	/* Prune old scheduler */
-	if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
-		qdisc_reset(oqdisc);
-
+	if (oqdisc) {
+		if (atomic_read(&oqdisc->refcnt) <= 1)
+			qdisc_reset(oqdisc);
+		set_bit(__QDISC_STATE_DEACTIVATED, &oqdisc->state);
+	}
 	/* ... and graft new one */
 	if (qdisc == NULL)
 		qdisc = &noop_qdisc;
Applied, will test it, but this bug might be triggered rarely. I will try to push it to more pppoe servers in order to stress test them (and 4.3) more.
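To spell out why the second patch closes the race: the transmit path already drops packets whose qdisc is marked deactivated (the test is visible in the context of the first patch), so flagging the pruned qdisc makes any CPU still holding a stale pointer bail out instead of enqueueing into a qdisc that was just reset. A compressed, illustrative sketch of the two sides of the race (not the exact kernel source):

/* CPU A: dev_graft_qdisc() swaps in a new root qdisc */
old = dev_queue->qdisc;
if (atomic_read(&old->refcnt) <= 1)
	qdisc_reset(old);			/* tears down internal structures */
set_bit(__QDISC_STATE_DEACTIVATED, &old->state);	/* the fix */
rcu_assign_pointer(dev_queue->qdisc, new);

/* CPU B: __dev_xmit_skb() raced and still holds 'old' as q */
spin_lock(root_lock);
if (test_bit(__QDISC_STATE_DEACTIVATED, &q->state)) {
	kfree_skb(skb);				/* qdisc replaced under us: drop */
	rc = NET_XMIT_DROP;
} else {
	rc = q->enqueue(skb, q);		/* qdisc still live */
}
spin_unlock(root_lock);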
Re: HTB, HFSC, PIE, FIFO stuck on 2.4Gbit on default values
On 2015-11-04 06:28, Eric Dumazet wrote: On Wed, 2015-11-04 at 06:12 +0200, Denys Fedoryshchenko wrote: Just enabling gro or gso (or both together) is fine there. Thanks for the advice. It seems only tso is causing problems. Also, I guess keeping tso disabled will solve my MTU issues (I once had an issue where traffic heading to pppoe users who have 14xx MTU was blocked when offloading was enabled on the transit server, but I can't quickly reproduce it again). Should I report this bug to the e1000e maintainers? On a similar setup it happens only at specific locations, and I am not sure what the reason could be. Not sure, have you tried per chance the latest kernel (linux-4.3) for this e1000e issue? Are you using vlan tags on this NIC? Tested now; it can be reproduced on 4.3 as well. What is interesting: if I enable tso alone and leave gso/gro off, it works fine. gso+gro on, tso off: fine as well. But if I enable them all together, I trigger the bug. [ 71.699687] e1000e :00:19.0 eth0: Detected Hardware Unit Hang: [ 71.699687] TDH <96> [ 71.699687] TDT <9c> [ 71.699687] next_to_use <9c> [ 71.699687] next_to_clean<92> [ 71.699687] buffer_info[next_to_clean]: [ 71.699687] time_stamp [ 71.699687] next_to_watch<96> [ 71.699687] jiffies [ 71.699687] next_to_watch.status <0> [ 71.699687] MAC Status <40080083> [ 71.699687] PHY Status <796d> [ 71.699687] PHY 1000BASE-T Status <3800> [ 71.699687] PHY Extended Status<3000> [ 71.699687] PCI Status <10> [ 73.699241] e1000e :00:19.0 eth0: Detected Hardware Unit Hang: [ 73.699241] TDH <96> [ 73.699241] TDT <9c> [ 73.699241] next_to_use <9c> [ 73.699241] next_to_clean<92> [ 73.699241] buffer_info[next_to_clean]: [ 73.699241] time_stamp [ 73.699241] next_to_watch<96> [ 73.699241] jiffies [ 73.699241] next_to_watch.status <0> [ 73.699241] MAC Status <40080083> [ 73.699241] PHY Status <796d> [ 73.699241] PHY 1000BASE-T Status <3800> [ 73.699241] PHY Extended Status<3000> [ 73.699241] PCI Status <10> [ 75.698775] e1000e :00:19.0 eth0: Detected Hardware Unit Hang: [ 75.698775] TDH <96> [ 75.698775] TDT <9c> [ 75.698775] next_to_use <9c> [ 75.698775] next_to_clean<92> [ 75.698775] buffer_info[next_to_clean]: [ 75.698775] time_stamp [ 75.698775] next_to_watch<96> [ 75.698775] jiffies [ 75.698775] next_to_watch.status <0> [ 75.698775] MAC Status <40080083> [ 75.698775] PHY Status <796d> [ 75.698775] PHY 1000BASE-T Status <3800> [ 75.698775] PHY Extended Status<3000> [ 75.698775] PCI Status <10> [ 76.709871] [ cut here ] [ 76.710075] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x17c/0x1e2() [ 76.710383] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out [ 76.710572] Modules linked in: xt_CLASSIFY xt_set ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_recent ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_tcpudp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre ip_set_hash_net ip_set nfnetlink iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables act_nat cls_u32 sch_ingress [ 76.713354] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.3.0-build-0087 #1 [ 76.713547] Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS S1200BT.86B.02.00.0041.120520121743 12/05/2012 [ 76.713868] 88042f003e08 81259d1d 88042f003e50 [ 76.714413] 88042f003e40 810bda73 818654a3 88042c29 [ 76.714946] 8800be758c00 0001 88042f003ea0 [ 76.715481] Call Trace: [ 76.715657][] dump_stack+0x44/0x55 [ 76.715908] [] warn_slowpath_common+0x95/0xae
[ 76.716095] [] ? dev_watchdog+0x17c/0x1e2 [ 76.716281] [] warn_slowpath_fmt+0x47/0x49 [ 76.716470] [] ? mod_timer_pinned+0xaf/0xbe [ 76.716662] [] dev_watchdog+0x17c/0x1e2 [ 76.716850] [] ? dev_graft_qdisc+0x65/0x65 [ 76.717039] [] call_timer_fn.isra.26+0x17/0x6d [ 76.717227] [] run_timer_softirq+0x172/0x193 [ 76.717418] [] __do_softirq+0xba/0x1a9 [ 76.717606] [] irq_exit+0x37/0x7c [ 76.717795] [] smp_apic_timer_interrupt+0x3d/0x48 [ 76.717988] [] apic_timer_interrupt+0x7c/0x90 [ 76.7181
Re: kernel panic in 4.2.3, rb_erase in sch_fq
On 2015-11-04 00:06, Cong Wang wrote: On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko wrote:
Hi!
Actually it seems I have been getting this panic for a while (once per week) on a loaded pppoe server, but only now was I able to get the full panic message. After checking the commit logs on sch_fq.c I didn't see any fixes, so upgrading to a newer kernel probably won't help?
Can you share your `tc qdisc show dev ` with us? And how to reproduce it? I tried to set up htb+fq and then flip the interface back and forth but I don't see any crash.
My guess is it won't be easy to reproduce: it is happening on a box with 4.5k interfaces that constantly creates/deletes interfaces, and even then the problem may happen once per day, or may not happen for a week. Here is the script that is fired after a new ppp interface is detected. The pppoe processes are independent from the processes that are "establishing" the shapers.
/sbin/tc qdisc del root
/sbin/tc qdisc add handle 1: root htb default 3
/sbin/tc filter add parent 1:0 protocol ip prio 4 handle 1 fw flowid 1:3
/sbin/tc filter add parent 1:0 protocol ip prio 3 u32 match ip protocol 6 0xff match ip src 10.0.252.8/32 flowid 1:3
/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 1 0xff flowid 1:0
/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 6 0xff match ip sport 80 0x flowid 1:4
/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 6 0xff match ip sport 443 0x flowid 1:5
/sbin/tc filter add parent 1:0 protocol ip prio 100 u32 match u32 0 0 flowid 1:2
/sbin/tc class add classid 1:1 parent 1:0 htb rate 512Kbit ceil 512Kbit
/sbin/tc class add classid 1:2 parent 1:1 htb rate 32Kbit ceil 512Kbit
/sbin/tc class add classid 1:3 parent 1:0 htb rate 10Mbit ceil 10Mbit
/sbin/tc class add classid 1:4 parent 1:1 htb rate 32Kbit ceil 512Kbit
/sbin/tc class add classid 1:5 parent 1:1 htb rate 32Kbit ceil 512Kbit
/sbin/tc qdisc add parent 1:2 fq limit 300
/sbin/tc qdisc add parent 1:3 pfifo limit 300
/sbin/tc qdisc add parent 1:4 fq limit 300
/sbin/tc qdisc add parent 1:5 fq limit 300
Possible cases that come to my mind (but maybe I missed others):
- the script and tc are running while the interface is deleted in the process (e.g. the interface disappears)
- the script deletes the root while there is heavy traffic on the interface and a lot of packets queued
- the ppp interface is destroyed while there is a lot of traffic queued on it (this one is a rarer situation)
Thanks.
Re: HTB, HFSC, PIE, FIFO stuck on 2.4Gbit on default values
On 2015-11-03 23:23, Eric Dumazet wrote: On Tue, 2015-11-03 at 22:24 +0200, Denys Fedoryshchenko wrote: I won't argue with that, you are right. OK, this one is a bit offtopic for the current case, a different setup, but I know it has easy-to-reproduce issues with offloading, and this is a bug related to that, appearing directly when I enable tso/gso/gro. I lose access to the remote box, so the most I can do right now is: ethtool -K eth0 tso on gso on gro on; sleep 5; ethtool -K eth0 tso off gso off gro off No shapers, just plain nat. I suspect it might be specific to the network card, but I'm not sure. What happens if you enable gro, but disable tso? With GRO enabled, you'll get a good performance increase, as forwarding and qdisc will use big packets. Just enabling gro or gso (or both together) is fine there. Thanks for the advice. It seems only tso is causing problems. Also, I guess keeping tso disabled will solve my MTU issues (I once had an issue where traffic heading to pppoe users who have 14xx MTU was blocked when offloading was enabled on the transit server, but I can't quickly reproduce it again). Should I report this bug to the e1000e maintainers? On a similar setup it happens only at specific locations, and I am not sure what the reason could be.
Re: HTB, HFSC, PIE, FIFO stuck on 2.4Gbit on default values
On 2015-11-03 21:49, Eric Dumazet wrote: Well, I am telling you. Say no to people advising you to turn off GRO/TSO. If you were the guy advising others to do so, it is time to see the light. Let's fix the bugs, if any, instead of spreading disinformation. I am so tired of telling these very simple facts, guys. If you prefer, continue to work on linux-2.0, but don't ask for help on netdev. I won't argue with that, you are right. OK, this one is a bit offtopic for the current case, a different setup, but I know it has easy-to-reproduce issues with offloading, and this is a bug related to that, appearing directly when I enable tso/gso/gro. I lose access to the remote box, so the most I can do right now is: ethtool -K eth0 tso on gso on gro on; sleep 5; ethtool -K eth0 tso off gso off gro off No shapers, just plain nat. I suspect it might be specific to the network card, but I'm not sure. 4.1.4 02:00.0 "Class 0200" "8086" "10d3" "8086" "357a" driver: e1000e version: 2.3.2-k firmware-version: 0.13-4 bus-info: :00:19.0 supports-statistics: yes supports-test: yes supports-eeprom-access: yes supports-register-dump: yes supports-priv-flags: no But after these messages, honestly, I don't know where to dig. [6606122.904234] e1000e :00:19.0 eth0: Detected Hardware Unit Hang: [6606122.904234] TDH [6606122.904234] TDT [6606122.904234] next_to_use [6606122.904234] next_to_clean [6606122.904234] buffer_info[next_to_clean]: [6606122.904234] time_stamp <12761e88c> [6606122.904234] next_to_watch [6606122.904234] jiffies <12761e928> [6606122.904234] next_to_watch.status <0> [6606122.904234] MAC Status <40080083> [6606122.904234] PHY Status <796d> [6606122.904234] PHY 1000BASE-T Status <3800> [6606122.904234] PHY Extended Status<3000> [6606122.904234] PCI Status <10> [6606124.903733] e1000e :00:19.0 eth0: Detected Hardware Unit Hang: [6606124.903733] TDH [6606124.903733] TDT [6606124.903733] next_to_use [6606124.903733] next_to_clean [6606124.903733] buffer_info[next_to_clean]: [6606124.903733] time_stamp <12761e88c> [6606124.903733] next_to_watch [6606124.903733] jiffies <12761e9f0> [6606124.903733] next_to_watch.status <0> [6606124.903733] MAC Status <40080083> [6606124.903733] PHY Status <796d> [6606124.903733] PHY 1000BASE-T Status <3800> [6606124.903733] PHY Extended Status<3000> [6606124.903733] PCI Status <10> [6606126.903291] e1000e :00:19.0 eth0: Detected Hardware Unit Hang: [6606126.903291] TDH [6606126.903291] TDT [6606126.903291] next_to_use [6606126.903291] next_to_clean [6606126.903291] buffer_info[next_to_clean]: [6606126.903291] time_stamp <12761e88c> [6606126.903291] next_to_watch [6606126.903291] jiffies <12761eab8> [6606126.903291] next_to_watch.status <0> [6606126.903291] MAC Status <40080083> [6606126.903291] PHY Status <796d> [6606126.903291] PHY 1000BASE-T Status <3800> [6606126.903291] PHY Extended Status<3000> [6606126.903291] PCI Status <10> [6606127.912352] [ cut here ] [6606127.912566] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x180/0x1e6() [6606127.912877] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out [6606127.913067] Modules linked in: xt_CLASSIFY xt_set ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_recent ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_tcpudp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre ip_set_hash_net ip_set nfnetlink iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables act_nat cls_u32 sch_ingress [6606127.915843] CPU: 0 PID: 0 Comm: swapper/0 Not
tainted 4.1.4-build-0084 #1 [6606127.916035] Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS S1200BT.86B.02.00.0041.120520121743 12/05/2012 [6606127.916356] 0009 88042f003dd8 81896390 00fb [6606127.916903] 88042f003e28 88042f003e18 810bc024 820aad98 [6606127.917451] 81830ab3 8800be47c000 88042a8dce00 0001 [6606127.917991] Call Trace: [6606127.918175][] dump_stack+0x45/0x57 [6606127.918429] [] warn_slowpath_common+0x97/0xb1 [6606127.918621] [] ? dev_watchdog+0x180/0x1e6 [6606127.918812] [] warn_slowpath_fmt+0x41/0x43 [6606127.919007] [] ? nf_ct_delete+0x1ef/0x202 [nf_conntrack] [6606127.919201] [] dev_watchdog+0x180/0x1e6 [6606127.919396] [] ? nf_ct_delete+0x202/0x202 [nf_conntrack] [6606127.919589] [] ? dev_graft_qdisc+0x65/0x65 [6606127.919781] [] call_timer_fn.isra.27+0x17/0x6d [6606127.919
Re: HTB, HFSC, PIE, FIFO stuck on 2.4Gbit on default values
On 2015-11-03 21:11, Eric Dumazet wrote: On Tue, 2015-11-03 at 19:33 +0200, Denys Fedoryshchenko wrote: Hi Recently I was testing shaping over single 10G cards, for speeds up to 3-4Gbps, and noticed an interesting effect. Shaping scheme: incoming bandwidth comes to a switch port with access vlan 100; outgoing bandwidth leaves a switch port with access vlan 200; Linux with an Intel X710 is connected to a trunk port, a bridge is created, eth0.100 bridged to eth0.200; gso/gro/tso disabled (they don't play nice with shapers). Well, this seems an urban legend to me. Something that has been repeatedly copied/pasted on many web pages since last century. Given the nature of a qdisc (being protected by a spinlock), you absolutely want to have some kind of aggregation. I have a patch to allow a sysadmin to set a max gro segs value for incoming packets. You could play with it. Start with 4 segments, allow GSO/TSO on the output and watch performance coming back. It is not, since I have more than 120 servers installed around the country (most of them handling small traffic), in forwarding mode, and the first thing I do on a forwarding setup is disable gro/gso/tso. It has also helped many ISPs on the forum I visit often; the first step in troubleshooting unreliable network traffic forwarding is disabling offloading. The problems start with incorrect shaping and end, in some cases, with network drivers spitting watchdog errors. Sometimes a shaper is not even necessary: plain forwarding with offload enabled can cause issues, though that might be a bug in the network drivers. Should I try to reproduce it and report? Sure, if anybody can look into this issue.
HTB, HFSC, PIE, FIFO stuck on 2.4Gbit on default values
Hi Recently I was testing shaping over single 10G cards, for speeds up to 3-4Gbps, and noticed an interesting effect.
Shaping scheme:
- incoming bandwidth comes to a switch port, with access vlan 100
- outgoing bandwidth leaves a switch port, with access vlan 200
- Linux with an Intel X710 is connected to a trunk port, a bridge is created, eth0.100 bridged to eth0.200
- gso/gro/tso disabled (they don't play nice with shapers)
- the latest kernel, of course
Shapers are installed on eth0.200, and multiqueue seems to work on eth0 in general (I see packets distributed over each queue); CPU load is very low (max 20% on a core, but usually below 5%). I tried: HTB with fq, pfifo, pie qdisc; HFSC with fq, pfifo, pie qdisc. After I run the shaper with default values, I can see traffic start to queue in the classes, and total traffic doesn't reach more than 2.4Gbit; if I remove the shaper it directly reaches 4Gbit. The only trick I found is running pie with burst 1 cburst 1 in the leaf classes, and 10 in the root class (I think 1 in the root class might work as well). If I change the discipline to fq, I am back at 2.4Gbit, but that might just be because fq is not intended to be used in an HTB leaf class. So in my case burst/cburst solved the issue, but I suspect a more elegant solution/tuning may be possible, rather than putting in some random values? Is there any particular reason why I am limited to ~2.4Gbit with any other settings?
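One plausible mechanism for the 2.4Gbit ceiling, for what it's worth: HTB refills tokens with timer-tick granularity, so a class generally needs a burst of at least roughly rate/HZ to stay busy between ticks. Back-of-envelope, assuming HZ=1000: 4 Gbit/s / 8 / 1000 ≈ 500 KB per tick, far more than an MTU-scale burst derived from the defaults, which would be consistent with the observation that large burst/cburst values unstick the shaper.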
Re: kernel panic in 4.2.3, rb_erase in sch_fq
On 2015-11-02 18:12, Eric Dumazet wrote: On Mon, 2015-11-02 at 17:58 +0200, Denys Fedoryshchenko wrote: On 2015-11-02 17:24, Eric Dumazet wrote:
> On Mon, 2015-11-02 at 16:11 +0200, Denys Fedoryshchenko wrote:
>> Hi!
>> Actually it seems I have been getting this panic for a while (once per week) on a loaded pppoe server, but only now was I able to get the full panic message. After checking the commit logs on sch_fq.c I didn't see any fixes, so upgrading to a newer kernel probably won't help?
> I do not think we support sch_fq as an HTB leaf. If you want both HTB and sch_fq, you need to set up a bonding device. HTB on bond0, sch_fq on the slaves. Sure, the kernel should not crash, but HTB+sch_fq on the same net device is certainly not something that will work anyway.
Strange, because except for ppp, on static devices it works really very well in such a scheme. It is the only solution that can reliably throttle incoming bandwidth when the bandwidth is very overbooked, for my use cases, such as 256k+ flows/2.5Gbps with several different classes of traffic; using DRR would end up with simply not enough classes. On the latest kernels I had to patch tc to provide a parameter for the orphan mask in fq, to increase the number of flows for transit traffic. None of the other qdiscs is able to solve this problem: incoming bandwidth simply flows 10-20% over what is set, but fq does magic. The only device that worked with similar efficiency for such cases is the proprietary PacketShaper, but it modifies the tcp window size, can't be called transparent, and also has stability issues over 1Gbps.
Ah, I was thinking you needed more like 10Gb traffic ;) With HTB on bonding, we can use MQ+FQ on the slaves in order to use many cpus to serve local traffic. But yes, if you use HTB+FQ for forwarding, I guess the bonding setup is not really needed.
Well, this country is very underdeveloped in matters of technology; 10G interfaces appeared in some ISPs only this year. On the ppp interfaces where the crash is happening there is even less bandwidth: each user has max 1-2Mbps (average usage 128kbps), 4.5k interfaces. But I have some heavier setups there too, around 9k pppoe users terminated on a single server (meaning 9k interfaces), with about 2Gbps of traffic passing through. If I took a non-FOSS solution, I would have to pay $100k+ for software licenses, which is unbearable for a local ISP. fq is not critical in this specific use case, I can use fifo or similar for the ppp interfaces, but I guess it is better to report a bug :)
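For reference, the bonding arrangement Eric describes would look roughly like this (device names and queue count illustrative): the HTB tree stays on bond0 exactly as before, and each slave gets the stock multiqueue setup with one fq per hardware tx queue:
/sbin/tc qdisc add dev eth0 root handle 1: mq
/sbin/tc qdisc add dev eth0 parent 1:1 fq
/sbin/tc qdisc add dev eth0 parent 1:2 fq
mq creates one class per tx queue, so the per-queue fq instances are served by different CPUs without fighting over a single qdisc spinlock.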
Re: kernel panic in 4.2.3, rb_erase in sch_fq
On 2015-11-02 17:24, Eric Dumazet wrote: On Mon, 2015-11-02 at 16:11 +0200, Denys Fedoryshchenko wrote: Hi! Actually it seems I have been getting this panic for a while (once per week) on a loaded pppoe server, but only now was I able to get the full panic message. After checking the commit logs on sch_fq.c I didn't see any fixes, so upgrading to a newer kernel probably won't help? I do not think we support sch_fq as an HTB leaf. If you want both HTB and sch_fq, you need to set up a bonding device. HTB on bond0, sch_fq on the slaves. Sure, the kernel should not crash, but HTB+sch_fq on the same net device is certainly not something that will work anyway. Strange, because except for ppp, on static devices it works really very well in such a scheme. It is the only solution that can reliably throttle incoming bandwidth when the bandwidth is very overbooked, for my use cases, such as 256k+ flows/2.5Gbps with several different classes of traffic; using DRR would end up with simply not enough classes. On the latest kernels I had to patch tc to provide a parameter for the orphan mask in fq, to increase the number of flows for transit traffic. None of the other qdiscs is able to solve this problem: incoming bandwidth simply flows 10-20% over what is set, but fq does magic. The only device that worked with similar efficiency for such cases is the proprietary PacketShaper, but it modifies the tcp window size, can't be called transparent, and also has stability issues over 1Gbps.
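A side note on the orphan-mask remark: sch_fq's orphan_mask is an ordinary fq parameter, and reasonably recent iproute2 exposes it directly, so the local tc patch should become unnecessary after an upgrade; illustratively:
/sbin/tc qdisc add dev eth0 root fq orphan_mask 1023
A larger mask spreads skbs that are not attached to a local socket (the forwarded/transit case described above) over more flow buckets.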
kernel panic in 4.2.3, rb_erase in sch_fq
Hi! Actually it seems I have been getting this panic for a while (once per week) on a loaded pppoe server, but only now was I able to get the full panic message. After checking the commit logs on sch_fq.c I didn't see any fixes, so upgrading to a newer kernel probably won't help? [237470.633382] general protection fault: [#1] SMP [237470.633832] Modules linked in: netconsole configfs act_skbedit sch_fq cls_fw act_police cls_u32 sch_ingress sch_sfq sch_htb pppoe pppox ppp_generic slhc xt_nat ts_bm xt_string xt_connmark xt_TCPMSS xt_tcpudp xt_mark iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables 8021q garp mrp stp llc [237470.637835] CPU: 1 PID: 14035 Comm: accel-pppd Not tainted 4.2.3-build-0087 #3 [237470.638342] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012 [237470.638859] task: 8803ef8b5080 ti: 8803ed7e task.ti: 8803ed7e [237470.639370] RIP: 0010:[] [] rb_erase+0x37/0x2c4 [237470.639960] RSP: 0018:8803ed7e3b88 EFLAGS: 00010286 [237470.644863] RAX: RBX: 8804106ab000 RCX: 0001 [237470.645366] RDX: ffa2050402210218 RSI: 88040cfe2cf0 RDI: 8803f50d00e0 [237470.645872] RBP: 8803ed7e3b88 R08: R09: 88042ee37d50 [237470.646376] R10: ea000fe7a9c0 R11: 94f1b850 R12: 019e [237470.646881] R13: 88040cfe2cf0 R14: 8803f50d00d0 R15: [237470.647381] FS: 7fcd5d384700() GS:88042ee2() knlGS: [237470.647889] CS: 0010 DS: ES: CR0: 80050033 [237470.648209] CR2: 7fcd003efa90 CR3: 000424b6e000 CR4: 000406e0 [237470.648707] Stack: [237470.648990] 8803ed7e3bb8 a00ef38b 8804106ab000 880416079000 [237470.649791] 0002 8804160790d8 8803ed7e3bd8 8183785c [237470.650589] 0002 8800b021d000 8803ed7e3c18 a00d247a [237470.651387] Call Trace: [237470.651716] [] fq_reset+0x7a/0xf2 [sch_fq] [237470.652084] [] qdisc_reset+0x18/0x42 [237470.652444] [] htb_reset+0x96/0x14d [sch_htb] [237470.652780] [] qdisc_reset+0x18/0x42 [237470.653146] [] dev_deactivate_queue.constprop.34+0x43/0x53 [237470.653726] [] dev_deactivate_many+0x53/0x206 [237470.654088] [] __dev_close_many+0x73/0xbf [237470.654436] [] __dev_close+0x2c/0x41 [237470.654784] [] ? _raw_spin_unlock_bh+0x15/0x17 [237470.655106] [] __dev_change_flags+0xa5/0x13c [237470.655427] [] dev_change_flags+0x23/0x59 [237470.655777] [] ? mutex_lock+0x13/0x24 [237470.656073] [] devinet_ioctl+0x246/0x533 [237470.656372] [] inet_ioctl+0x8c/0xa6 [237470.656667] [] sock_do_ioctl+0x22/0x40 [237470.656960] [] sock_ioctl+0x1f2/0x200 [237470.657253] [] do_vfs_ioctl+0x360/0x41a [237470.657549] [] ? vfs_write+0x105/0x164 [237470.657841] [] SyS_ioctl+0x39/0x61 [237470.658134] [] entry_SYSCALL_64_fastpath+0x16/0x6e [237470.658431] Code: 48 85 c0 75 36 48 8b 0f 48 89 c8 48 83 e0 fc 74 12 48 39 78 10 75 06 48 89 50 10 eb 09 48 89 50 08 eb 03 48 89 16 48 85 d2 74 08 89 0a e9 83 02 00 00 80 e1 01 e9 c3 00 00 00 48 85 d2 75 2c [237470.663930] RIP [] rb_erase+0x37/0x2c4 [237470.664296] RSP [237470.664598] ---[ end trace 32ea40a7de450892 ]--- [237470.673272] Kernel panic - not syncing: Fatal exception in interrupt [237470.673577] Kernel Offset: disabled [237470.704654] Rebooting in 5 seconds..
Re: [PATCH net] ppp: don't override sk->sk_state in pppoe_flush_dev()
On 2015-10-22 03:14, Matt Bennett wrote: On Tue, 2015-10-13 at 05:13 +0300, Denys Fedoryshchenko wrote: On 2015-10-07 15:12, Guillaume Nault wrote:
> On Mon, Oct 05, 2015 at 02:08:44PM +0200, Guillaume Nault wrote:
>> 	if (po) {
>> 		struct sock *sk = sk_pppox(po);
>> -		bh_lock_sock(sk);
>> -
>> -		/* If the user has locked the socket, just ignore
>> -		 * the packet. With the way two rcv protocols hook into
>> -		 * one socket family type, we cannot (easily) distinguish
>> -		 * what kind of SKB it is during backlog rcv.
>> -		 */
>> -		if (sock_owned_by_user(sk) == 0) {
>> -			/* We're no longer connect at the PPPOE layer,
>> -			 * and must wait for ppp channel to disconnect us.
>> -			 */
>> -			sk->sk_state = PPPOX_ZOMBIE;
>> -		}
>> -
>> -		bh_unlock_sock(sk);
>> 		if (!schedule_work(&po->proto.pppoe.padt_work))
>> 			sock_put(sk);
>> 	}
> Finally, I think I'll keep this approach for net-next, to completely remove PPPOX_ZOMBIE. For now, let's just avoid any assumption about the relationship between the PPPOX_ZOMBIE state and the value of po->pppoe_dev, as suggested by Matt.
> Denys, can you let me know if your issue goes away with the following patch?
> ---
> diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
> index 2ed7506..5e0b432 100644
> --- a/drivers/net/ppp/pppoe.c
> +++ b/drivers/net/ppp/pppoe.c
> @@ -589,7 +589,7 @@ static int pppoe_release(struct socket *sock)
>  	po = pppox_sk(sk);
> -	if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
> +	if (po->pppoe_dev) {
>  		dev_put(po->pppoe_dev);
>  		po->pppoe_dev = NULL;
>  	}
I just got the OK to upgrade a server yesterday; it has been working fine for around 12 hours now. I need 1-2 more days, and maybe I will upgrade a few more servers, to say for sure whether it is OK or not. Sorry for the delay; these are production servers and in the current situation they cannot tolerate significant downtime.
Any update on whether this issue is fixed with the suggested patch?
On the server I am allowed to test on: no crashes anymore, but I have not yet been given permission to test on the server where this crash was happening several times per day. All I can say is that it is definitely better now.
Re: [PATCH net] ppp: don't override sk->sk_state in pppoe_flush_dev()
On 2015-10-07 15:12, Guillaume Nault wrote: On Mon, Oct 05, 2015 at 02:08:44PM +0200, Guillaume Nault wrote:
	if (po) {
		struct sock *sk = sk_pppox(po);
-		bh_lock_sock(sk);
-
-		/* If the user has locked the socket, just ignore
-		 * the packet. With the way two rcv protocols hook into
-		 * one socket family type, we cannot (easily) distinguish
-		 * what kind of SKB it is during backlog rcv.
-		 */
-		if (sock_owned_by_user(sk) == 0) {
-			/* We're no longer connect at the PPPOE layer,
-			 * and must wait for ppp channel to disconnect us.
-			 */
-			sk->sk_state = PPPOX_ZOMBIE;
-		}
-
-		bh_unlock_sock(sk);
		if (!schedule_work(&po->proto.pppoe.padt_work))
			sock_put(sk);
	}
Finally, I think I'll keep this approach for net-next, to completely remove PPPOX_ZOMBIE. For now, let's just avoid any assumption about the relationship between the PPPOX_ZOMBIE state and the value of po->pppoe_dev, as suggested by Matt. Denys, can you let me know if your issue goes away with the following patch?
---
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 2ed7506..5e0b432 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -589,7 +589,7 @@ static int pppoe_release(struct socket *sock)
 	po = pppox_sk(sk);
-	if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
+	if (po->pppoe_dev) {
 		dev_put(po->pppoe_dev);
 		po->pppoe_dev = NULL;
 	}
I just got the OK to upgrade a server yesterday; it has been working fine for around 12 hours now. I need 1-2 more days, and maybe I will upgrade a few more servers, to say for sure whether it is OK or not. Sorry for the delay; these are production servers and in the current situation they cannot tolerate significant downtime.
Re: [PATCH net] ppp: don't override sk->sk_state in pppoe_flush_dev()
On 2015-10-02 20:54, Guillaume Nault wrote: On Fri, Oct 02, 2015 at 11:01:45AM +0300, Denys Fedoryshchenko wrote: Here is a similar panic after the patch was applied (it might be a different bug), caught over netconsole: [126348.617115] CPU: 0 PID: 5254 Comm: accel-pppd Not tainted 4.2.2-build-0087 #2 [126348.617632] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014 [126348.618193] task: 8817cfbe ti: 8817c635 task.ti: 8817c635 [126348.618696] RIP: 0010:[] [] pppoe_release+0x56/0x142 [pppoe] [126348.619306] RSP: 0018:8817c6353e28 EFLAGS: 00010202 [126348.619601] RAX: RBX: 8817a92b0400 RCX: [126348.620152] RDX: 0001 RSI: fe01 RDI: 8180c18a [126348.620715] RBP: 8817c6353e68 R08: R09: [126348.621254] R10: 88173c02b210 R11: 0293 R12: 8817b3c18000 [126348.621784] R13: 8817b3c18030 R14: 8817967f1140 R15: 8817d226c920 [126348.622330] FS: 7f9444db9700() GS:8817dee0() knlGS: [126348.622876] CS: 0010 DS: ES: CR0: 80050033 [126348.623202] CR2: 0428 CR3: 0017c70b2000 CR4: 001406f0 [126348.623760] Stack: [126348.624056] 000100200018 0001 8817b3c18000 [126348.624925] a00ec280 8817b3c18030 8817967f1140 8817d226c920 [126348.625736] 8817c6353e88 8180820a 88173c02b200 0008 [126348.626533] Call Trace: [126348.626873] [] sock_release+0x1a/0x70 [126348.627183] [] sock_close+0xd/0x11 [126348.627512] [] __fput+0xdf/0x193 [126348.627845] [] fput+0x9/0xb [126348.628169] [] task_work_run+0x78/0x8f [126348.628517] [] do_notify_resume+0x40/0x4e [126348.628837] [] int_signal+0x12/0x17 Ok, so there's another possibility for pppoe_release() to be called while sk->sk_state is PPPOX_{CONNECTED,BOUND,ZOMBIE} but po->pppoe_dev is NULL. I'll check the code to see if I can find any race wrt. po->pppoe_dev and sk->sk_state settings. In a previous message, you said you'd try reverting 287f3a943fef ("pppoe: Use workqueue to die properly when a PADT is received") and related patches. I guess "related patches" means 665a6cd809f4 ("pppoe: drop pppoe device in pppoe_unbind_sock_work"), right? Did these reverts give any successful result? BTW, please don't top-post. I am just doing a "dirty" patch like the one below; I can't remember for certain whether I did a git revert, because it was a while ago when I spotted this bug. After that, the pppoe server does not reboot.
diff -Naur linux-4.2.2-vanilla/drivers/net/ppp/pppoe.c linux-4.2.2-changed/drivers/net/ppp/pppoe.c
--- linux-4.2.2-vanilla/drivers/net/ppp/pppoe.c	2015-09-29 20:38:27.0 +0300
+++ linux-4.2.2-changed/drivers/net/ppp/pppoe.c	2015-10-04 19:05:55.697732991 +0300
@@ -519,7 +519,7 @@
 	}
 	bh_unlock_sock(sk);
-	if (!schedule_work(&po->proto.pppoe.padt_work))
+//	if (!schedule_work(&po->proto.pppoe.padt_work))
 		sock_put(sk);
 }
@@ -633,7 +633,7 @@
 	lock_sock(sk);
-	INIT_WORK(&po->proto.pppoe.padt_work, pppoe_unbind_sock_work);
+//	INIT_WORK(&po->proto.pppoe.padt_work, pppoe_unbind_sock_work);
 	error = -EINVAL;
 	if (sp->sa_protocol != PX_PROTO_OE)
Re: [PATCH net] ppp: don't override sk->sk_state in pppoe_flush_dev()
Here is a similar panic after the patch was applied (it might be a different bug), caught over netconsole: [126348.610996] BUG: unable to handle kernel NULL pointer dereference at 0428 [126348.611656] IP: [] pppoe_release+0x56/0x142 [pppoe] [126348.612033] PGD 17d0b03067 PUD 17c721b067 PMD 0 [126348.612545] Oops: [#1] SMP [126348.612981] Modules linked in: act_skbedit sch_fq cls_fw act_police cls_u32 sch_ingress sch_sfq sch_htb pppoe pppox ppp_generic slhc netconsole configfs xt_nat ts_bm xt_string xt_connmark xt_TCPMSS xt_tcpudp xt_mark iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables 8021q garp mrp stp llc bonding [126348.617115] CPU: 0 PID: 5254 Comm: accel-pppd Not tainted 4.2.2-build-0087 #2 [126348.617632] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014 [126348.618193] task: 8817cfbe ti: 8817c635 task.ti: 8817c635 [126348.618696] RIP: 0010:[] [] pppoe_release+0x56/0x142 [pppoe] [126348.619306] RSP: 0018:8817c6353e28 EFLAGS: 00010202 [126348.619601] RAX: RBX: 8817a92b0400 RCX: [126348.620152] RDX: 0001 RSI: fe01 RDI: 8180c18a [126348.620715] RBP: 8817c6353e68 R08: R09: [126348.621254] R10: 88173c02b210 R11: 0293 R12: 8817b3c18000 [126348.621784] R13: 8817b3c18030 R14: 8817967f1140 R15: 8817d226c920 [126348.622330] FS: 7f9444db9700() GS:8817dee0() knlGS: [126348.622876] CS: 0010 DS: ES: CR0: 80050033 [126348.623202] CR2: 0428 CR3: 0017c70b2000 CR4: 001406f0 [126348.623760] Stack: [126348.624056] 000100200018 0001 8817b3c18000 [126348.624925] a00ec280 8817b3c18030 8817967f1140 8817d226c920 [126348.625736] 8817c6353e88 8180820a 88173c02b200 0008 [126348.626533] Call Trace: [126348.626873] [] sock_release+0x1a/0x70 [126348.627183] [] sock_close+0xd/0x11 [126348.627512] [] __fput+0xdf/0x193 [126348.627845] [] fput+0x9/0xb [126348.628169] [] task_work_run+0x78/0x8f [126348.628517] [] do_notify_resume+0x40/0x4e [126348.628837] [] int_signal+0x12/0x17 [126348.629131] Code: 48 8b 83 e0 00 00 00 a8 01 74 12 48 89 df e8 0d 24 72 e1 b8 f7 ff ff ff e9 eb 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 a0 02 00 00 8b 80 28 04 00 00 65 ff 08 48 c7 83 a0 02 00 00 00 00 00 00 [126348.635060] RIP [] pppoe_release+0x56/0x142 [pppoe] [126348.635432] RSP [126348.635718] CR2: 0428 [126348.641165] ---[ end trace 911ff90a1416e3d1 ]--- [126348.653235] Kernel panic - not syncing: Fatal exception [126348.653538] Kernel Offset: disabled [126348.677177] Rebooting in 5 seconds.. On 2015-09-30 12:45, Guillaume Nault wrote: Since commit 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"), pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL.
This leads to the following oops: [ 570.140800] BUG: unable to handle kernel NULL pointer dereference at 04e0 [ 570.142931] IP: [] pppoe_release+0x50/0x101 [pppoe] [ 570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0 [ 570.144601] Oops: [#1] SMP [ 570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper acpi_cpufreq evdev serio_raw processor button ext4 crc16 mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio [ 570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 #1 [ 570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 570.144601] task: 88003d30d600 ti: 880036b6 task.ti: 880036b6 [ 570.144601] RIP: 0010:[] [] pppoe_release+0x50/0x101 [pppoe] [ 570.144601] RSP: 0018:880036b63e08 EFLAGS: 00010202 [ 570.144601] RAX: RBX: 88003434 RCX: 0206 [ 570.144601] RDX: 0006 RSI: 88003d30dd20 RDI: 88003d30dd20 [ 570.144601] RBP: 880036b63e28 R08: 0001 R09: [ 570.144601] R10: 7ffee9b50420 R11: 880034340078 R12: 8800387ec780 [ 570.144601] R13: 8800387ec7b0 R14: 88003e222aa0 R15: 8800387ec7b0 [ 570.144601] FS: 7f5672f48700() GS:88003fc8() knlGS: [ 570.144601] CS: 0010 DS: ES: CR0: 80050033 [ 570.144601] CR2: 00
Re: 4.1.0, kernel panic, pppoe_release
On 2015-09-25 17:38, Guillaume Nault wrote: On Tue, Sep 22, 2015 at 04:47:48AM +0300, Denys Fedoryshchenko wrote: Hi, Sorry for the late reply; I was not able to push a new kernel to the pppoe servers without permission (they are production servers), but I just got the OK. I have been testing the patch on another pppoe server with 9k users for ~3 days, and it seems fine. I will also test today on the server that was experiencing crashes within a day. Thanks for the feedback. I'm about to submit a fix. Should I add a Tested-by tag for you? On one of the servers I got the same crash as before, within hours. The 9k users server also crashed after a while, so it seems the patch doesn't help. I will do some more tests tomorrow.
Re: 4.1.0, kernel panic, pppoe_release
Hi, Sorry for the late reply; I was not able to push a new kernel to the pppoe servers without permission (they are production servers), but I just got the OK. I have been testing the patch on another pppoe server with 9k users for ~3 days, and it seems fine. I will also test today on the server that was experiencing crashes within a day. On 2015-09-10 18:56, Guillaume Nault wrote: On Fri, Jul 17, 2015 at 09:16:14PM +0300, Denys Fedoryshchenko wrote: Probably my knowledge of the kernel is not sufficient, but I will try a few approaches. One of them is to add to pppoe_unbind_sock_work:
 	pppox_unbind_sock(sk);
+	/* Signal the death of the socket. */
+	sk->sk_state = PPPOX_DEAD;
I don't believe this will fix anything. pppox_unbind_sock() already sets sk->sk_state when necessary. I will wait first, to make sure this patch was causing the kernel panic (it needs a 24h testing cycle), then I will try this fix. I suspect the problem goes with actions performed on the underlying interface (MAC address, MTU or link state update). This triggers pppoe_flush_dev(), which cleans up the device without announcing it in sk->sk_state. Can you please try the following patch?
---
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 3837ae3..2ed7506 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -313,7 +313,6 @@ static void pppoe_flush_dev(struct net_device *dev)
 		if (po->pppoe_dev == dev &&
 		    sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
 			pppox_unbind_sock(sk);
-			sk->sk_state = PPPOX_ZOMBIE;
 			sk->sk_state_change(sk);
 			po->pppoe_dev = NULL;
 			dev_put(dev);
Re: 4.1.0, kernel panic, pppoe_release
Probably my knowledge of the kernel is not sufficient, but I will try a few approaches. One of them is to add the following to pppoe_unbind_sock_work():

 pppox_unbind_sock(sk);
+/* Signal the death of the socket. */
+sk->sk_state = PPPOX_DEAD;

I will wait first, to make sure this patch is what caused the kernel panic (it needs a 24h testing cycle), and then I will try this fix. A fuller sketch of the worker with this change applied follows at the end of this message.

On 2015-07-17 18:36, Dan Williams wrote:
> On Fri, 2015-07-17 at 12:24 +0300, Denys Fedoryshchenko wrote:
> > As I suspect, this kernel panic is caused by recent changes to pppoe.
> > The problem appears in accel-pppd (server), on loaded servers (2k
> > users and more). It is most probably related to the change "pppoe:
> > Use workqueue to die properly when a PADT is received". I will try
> > to revert this and the related patches.
>
> While I didn't write the patch, I'm the one that started the process
> that got it submitted... Could you review the patch quickly too, to see
> if you can spot anything amiss with it, so that it could get fixed up?
> The original patch does fix a real problem, so ideally we don't have to
> revert the whole thing upstream.
>
> Dan
>
> On 2015-07-14 13:57, Denys Fedoryshchenko wrote:
> > Here is the panic message from netconsole. Please let me know if any
> > additional information is required.
> >
> > [quoted netconsole panic log trimmed; the full log is in the original
> > report below]
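For reference, the whole worker with this change applied would look roughly like this. This assumes the 4.1-era shape of pppoe_unbind_sock_work() in drivers/net/ppp/pppoe.c, so treat it as an untested sketch of the idea, not a final patch:

static void pppoe_unbind_sock_work(struct work_struct *work)
{
	struct pppox_sock *po = container_of(work, struct pppox_sock,
					     proto.pppoe.padt_work);
	struct sock *sk = sk_pppox(po);

	lock_sock(sk);
	pppox_unbind_sock(sk);
	/* Proposed addition: mark the socket dead while the socket lock is
	 * held, so a concurrent pppoe_release() sees PPPOX_DEAD instead of
	 * touching state that the PADT work has already torn down. */
	sk->sk_state = PPPOX_DEAD;
	release_sock(sk);
	sock_put(sk);
}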
Re: 4.1.0, kernel panic, pppoe_release
As I suspect, this kernel panic is caused by recent changes to pppoe. The problem appears in accel-pppd (server), on loaded servers (2k users and more). It is most probably related to the change "pppoe: Use workqueue to die properly when a PADT is received". I will try to revert this and the related patches.

On 2015-07-14 13:57, Denys Fedoryshchenko wrote:
> Here is the panic message from netconsole. Please let me know if any
> additional information is required.
>
> [quoted netconsole panic log trimmed; the full log is in the original
> report below]
4.1.0, kernel panic, pppoe_release
Here is the panic message from netconsole. Please let me know if any additional information is required.

[76078.867822] BUG: unable to handle kernel NULL pointer dereference at 03f0
[76078.868280] IP: [] pppoe_release+0x56/0x142 [pppoe]
[76078.868541] PGD 336e4a067 PUD 333f17067 PMD 0
[76078.868918] Oops: [#1] SMP
[76078.869226] Modules linked in: netconsole configfs coretemp sch_fq cls_fw act_police cls_u32 sch_ingress sch_sfq sch_htb pppoe pppox ppp_generic slhc nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_REDIRECT nf_nat_redirect xt_set xt_TCPMSS ipt_REJECT nf_reject_ipv4 ts_bm xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp iptable_mangle iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc [last unloaded: netconsole]
[76078.873195] CPU: 3 PID: 2940 Comm: accel-pppd Not tainted 4.1.0-build-0074 #7
[76078.873396] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
[76078.873598] task: 8800b1886ba0 ti: 8800b09f4000 task.ti: 8800b09f4000
[76078.873929] RIP: 0010:[] pppoe_release+0x56/0x142 [pppoe]
[76078.874317] RSP: 0018:8800b09f7e28 EFLAGS: 00010202
[76078.874512] RAX: RBX: 88032a214400 RCX:
[76078.874709] RDX: 000d RSI: fe01 RDI: 8180d6da
[76078.874906] RBP: 8800b09f7e68 R08: R09:
[76078.875102] R10: 88031ef6a110 R11: 0293 R12: 88030f8d8fc0
[76078.875299] R13: 88030f8d8ff0 R14: 88033115ee40 R15: 8803394e4920
[76078.875499] FS: 7f79b602c700() GS:88034746() knlGS:
[76078.875837] CS: 0010 DS: ES: CR0: 80050033
[76078.876036] CR2: 03f0 CR3: 000335425000 CR4: 001407e0
[76078.876239] Stack:
[76078.876434]  88033ac45c80 0001 88030f8d8fc0
[76078.877001]  a0120260 88030f8d8ff0 88033115ee40 8803394e4920
[76078.877564]  8800b09f7e88 81809e2e 88031ef6a100 0008
[76078.878128] Call Trace:
[76078.878327]  [] sock_release+0x1a/0x78
[76078.878528]  [] s
Re: circular locking, mirred, 2.6.24.2
What does "too early" mean here? I have custom boot scripts; it is also a custom system based on busybox. There is a chance that I forgot to bring ifb0 up, but that's it. I think such a warning must not appear in response to any action taken from userspace.

On Mon, 25 Feb 2008 09:56:46 +, Jarek Poplawski wrote
> On 24-02-2008 23:20, Denys Fedoryshchenko wrote:
> > 2.6.24.2 with the patches for printk, softlockup, and htb applied (as I
> > understand, they are in 2.6.25 git and they are fixes).
> >
> > I will also send the QoS rules I am using by private mail.
> >
> > [quoted lockdep report trimmed; the full report is in the original
> > message below]
>
> This looks strange: are you sure your tc scripts aren't started too
> early? (Or maybe there are some problems during booting?)
>
> Regards,
> Jarek P.

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
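P.S. For reference, the usual ifb/mirred arrangement under discussion looks roughly like the following. This is a hypothetical sketch (the interface names and the catch-all u32 match are placeholders, not my actual rules):

# load ifb and make sure the target device is actually up;
# this is the step that is easy to miss in custom boot scripts
modprobe ifb
ip link set ifb0 up
# redirect all ingress traffic from eth0 to ifb0 via act_mirred
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
        action mirred egress redirect dev ifb0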
circular locking, mirred, 2.6.24.2
2.6.24.2 with the patches for printk, softlockup, and htb applied (as I understand, they are in 2.6.25 git and they are fixes). I will also send the QoS rules I am using by private mail.

[  118.840072] ===
[  118.840158] [ INFO: possible circular locking dependency detected ]
[  118.840203] 2.6.24.2-build-0022 #7
[  118.840243] ---
[  118.840288] swapper/0 is trying to acquire lock:
[  118.840329]  (&dev->queue_lock){-+..}, at: [] dev_queue_xmit+0x177/0x302
[  118.840490]
[  118.840490] but task is already holding lock:
[  118.840567]  (&p->tcfc_lock){-+..}, at: [] tcf_mirred+0x20/0x180 [act_mirred]
[  118.840727]
[  118.840727] which lock already depends on the new lock.
[  118.840728]
[  118.840842]
[  118.840842] the existing dependency chain (in reverse order) is:
[  118.840921]
[  118.840921] -> #2 (&p->tcfc_lock){-+..}:
[  118.841075]    [] __lock_acquire+0xa30/0xc19
[  118.841324]    [] lock_acquire+0x7a/0x94
[  118.841572]    [] _spin_lock+0x2e/0x58
[  118.841820]    [] tcf_mirred+0x20/0x180 [act_mirred]
[  118.842068]    [] tcf_action_exec+0x44/0x77
[  118.842344]    [] u32_classify+0x119/0x24a [cls_u32]
[  118.842595]    [] tc_classify_compat+0x2f/0x5e
[  118.842845]    [] tc_classify+0x1a/0x80
[  118.843092]    [] ingress_enqueue+0x1a/0x53 [sch_ingress]
[  118.843343]    [] netif_receive_skb+0x296/0x44c
[  118.843592]    [] e100_poll+0x14b/0x26a [e100]
[  118.843843]    [] net_rx_action+0xbf/0x201
[  118.844091]    [] __do_softirq+0x6f/0xe9
[  118.844343]    [] do_softirq+0x61/0xc8
[  118.844591]    [] 0x
[  118.844840]
[  118.844840] -> #1 (&dev->ingress_lock){-+..}:
[  118.844993]    [] __lock_acquire+0xa30/0xc19
[  118.845242]    [] lock_acquire+0x7a/0x94
[  118.845489]    [] _spin_lock+0x2e/0x58
[  118.845737]    [] qdisc_lock_tree+0x1e/0x21
[  118.845984]    [] dev_init_scheduler+0xb/0x53
[  118.846235]    [] register_netdevice+0x2a3/0x2fd
[  118.846483]    [] register_netdev+0x32/0x3f
[  118.846730]    [] loopback_net_init+0x39/0x6c
[  118.846980]    [] register_pernet_operations+0x13/0x15
[  118.847230]    [] register_pernet_device+0x1f/0x4c
[  118.847478]    [] loopback_init+0xd/0xf
[  118.847725]    [] kernel_init+0x155/0x2c6
[  118.847973]    [] kernel_thread_helper+0x7/0x10
[  118.848225]    [] 0x
[  118.848472]
[  118.848472] -> #0 (&dev->queue_lock){-+..}:
[  118.848626]    [] __lock_acquire+0x920/0xc19
[  118.848874]    [] lock_acquire+0x7a/0x94
[  118.849122]    [] _spin_lock+0x2e/0x58
[  118.849370]    [] dev_queue_xmit+0x177/0x302
[  118.849617]    [] tcf_mirred+0x15f/0x180 [act_mirred]
[  118.849866]    [] tcf_action_exec+0x44/0x77
[  118.850114]    [] u32_classify+0x119/0x24a [cls_u32]
[  118.850366]    [] tc_classify_compat+0x2f/0x5e
[  118.850614]    [] tc_classify+0x1a/0x80
[  118.850861]    [] ingress_enqueue+0x1a/0x53 [sch_ingress]
[  118.85]    [] netif_receive_skb+0x296/0x44c
[  118.851360]    [] e100_poll+0x14b/0x26a [e100]
[  118.851612]    [] net_rx_action+0xbf/0x201
[  118.851859]    [] __do_softirq+0x6f/0xe9
[  118.852106]    [] do_softirq+0x61/0xc8
[  118.852355]    [] 0x
[  118.852602]
[  118.852602] other info that might help us debug this:
[  118.852603]
[  118.852716] 5 locks held by swapper/0:
[  118.852756]  #0:  (rcu_read_lock){..--}, at: [] net_rx_action+0x50/0x201
[  118.852940]  #1:  (rcu_read_lock){..--}, at: [] netif_receive_skb+0xf6/0x44c
[  118.853123]  #2:  (&dev->ingress_lock){-+..}, at: [] netif_receive_skb+0x282/0x44c
[  118.853309]  #3:  (&p->tcfc_lock){-+..}, at: [] tcf_mirred+0x20/0x180 [act_mirred]
[  118.853493]  #4:  (rcu_read_lock){..--}, at: [] dev_queue_xmit+0x11d/0x302
[  118.853677]
[  118.853677] stack backtrace:
[  118.853753] Pid: 0, comm: swapper Not tainted 2.6.24.2-build-0022 #7
[  118.853796]  [] show_trace_log_lvl+0x1a/0x2f
[  118.853865]  [] show_trace+0x12/0x14
[  118.853932]  [] dump_stack+0x6c/0x72
[  118.853999]  [] print_circular_bug_tail+0x5f/0x68
[  118.854068]  [] __lock_acquire+0x920/0xc19
[  118.854135]  [] lock_acquire+0x7a/0x94
[  118.854205]  [] _spin_lock+0x2e/0x58
[  118.854272]  [] dev_queue_xmit+0x177/0x302
[  118.854340]  [] tcf_mirred+0x15f/0x180 [act_mirred]
[  118.854409]  [] tcf_action_exec+0x44/0x77
[  118.854477]  [] u32_classify+0x119/0x24a [cls_u32]
[  118.854547]  [] tc_classify_compat+0x2f/0x5e
[  118.854615]  [] tc_classify+0x1a/0x80
[  118.854682]  [] ingress_enqueue+0x1a/0x53 [sch_ingress]
[  118.854752]  [] netif_receive_skb+0x296/0x44c
[  118.854820]  [] e100_poll+0x14b/0x26a [e100]
[  118.854890]  [] net_rx_action+0xbf/0x201
[  118.854958]  [] __do_softirq+0x6f/0xe9
[  118.855025]  [] do_softirq+0x61/0xc8

--
Denys
Re: RESEND, HTB(?) softlockup, vanilla 2.6.24
The server is fully redundant now, so I have applied the patches (I applied both; probably it will make the system more reliable somehow) and enabled the required debug options in the kernel. So I will try to catch this bug a few more times; if it generates more detailed info over netconsole, that will probably be useful.

Is there any project to dump console messages / a kernel dump to disk? Issues like this one are related to networking, and I guess netconsole doesn't always work, especially when the network driver itself has crashed; the techs on location told me there were messages running non-stop on the screen. Some generic code that writes such data over x86 INT 13 (or even a kernel dump?) to a separate partition would probably be useful for debugging this problem. I know there are some third-party patches (for example LKCD), but I prefer not to apply them, so as not to add more bugs. I noticed some code in MTD (CONFIG_MTD_OOPS), but I am not sure it is the right thing, or whether it will work if I set up MTD emulation for a block device. That is just an idea.

On Sat, 16 Feb 2008 21:45:19 +0100, Jarek Poplawski wrote
> On Sat, Feb 16, 2008 at 12:25:31PM +0200, Denys Fedoryshchenko wrote:
> > Thanks, i will try it.
> > You think lockdep can be buggy?
>
> Just like every code... But the main reason is it has quite
> meaningful overhead, so it could be right "in production" only after
> lockups happen. But if it doesn't report anything anyway...
>
> Your report shows there are quite long paths of calls during softirqs
> with some actions (ipt + mirred here?) and qdiscs, so if I'm not
> wrong about this stack problem, this would need some optimization.
> And, of course, there could be some additional bugs involved around too:
> otherwise it seems this should happen more often. But I don't expect
> you would try to debug this on your servers, so I hope it simply
> will be found BTW some day...
>
> Regards,
> Jarek P.

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
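P.S. For completeness, this is the standard way netconsole is configured; the addresses, ports, and MAC below are placeholders, not the production values:

# netconsole=<src-port>@<src-ip>/<dev>,<dst-port>@<dst-ip>/<dst-mac>
modprobe netconsole netconsole=6665@10.0.0.2/eth0,6666@10.0.252.10/00:11:22:33:44:55

# on the logging host, anything listening on UDP will do, e.g. with the
# traditional netcat (option syntax varies between netcat variants):
nc -u -l -p 6666 | tee netconsole.log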
Re: RESEND, HTB(?) softlockup, vanilla 2.6.24
Thanks, I will try it. Do you think lockdep can be buggy?

On Sat, 16 Feb 2008 09:00:36 +0100, Jarek Poplawski wrote
> Denys Fedoryshchenko wrote, On 02/13/2008 09:13 AM:
>
> > It is very difficult to reproduce; it happened after running for about
> > a month. No changes were being made to classes at the time of the crash.
> >
> > Kernel 2.6.24 vanilla
>
> Hi,
>
> I could be wrong, but IMHO this looks like the stack was overridden here,
> so my proposal is to try this:
>
> CONFIG_DEBUG_STACKOVERFLOW=y
>
> But, if you're not very interested in reproducing this, you could
> also try to turn off some other debugging, especially lockdep.
>
> Regards,
> Jarek P.
>
> > Feb 10 15:53:22 SHAPER [ 8271.778915] BUG: NMI Watchdog detected LOCKUP on CPU1, eip c01f0e5d, registers:
> > Feb 10 15:53:22 SHAPER [ 8271.779307] Pid: 0, comm: swapper Not tainted (2.6.24-build-0021 #26)
> > Feb 10 15:53:22 SHAPER [ 8271.779327] EIP: 0060:[] EFLAGS: 0082 CPU: 1
> > Feb 10 15:53:22 SHAPER [ 8271.779349] EIP is at __rb_rotate_right+0x5/0x50
> > Feb 10 15:53:22 SHAPER [ 8271.779366] EAX: f76494a4 EBX: f76494a4 ECX: f76494a4 EDX: c1ff5f80
> > Feb 10 15:53:22 SHAPER [ 8271.779386] ESI: f76494a4 EDI: c1ff5f80 EBP: ESP: f7c29c70
> > Feb 10 15:53:22 SHAPER [ 8271.779406] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
> > Feb 10 15:53:22 SHAPER [ 8271.779425] Process swapper (pid: 0, ti=f7c28000 task=f7c20a60 task.ti=f7c28000)
> > Feb 10 15:53:22 SHAPER [ 8271.779446] Stack: f76494a4 f76494a4 f76494a4 c01f0ef4 c1ff5f80 f76494a4 f76494a8 c1ff5f78
> > [the netconsole capture then repeats the same register dump; duplicate trimmed]

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
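P.S. A minimal .config fragment along the lines suggested above; which of these debug symbols are available varies by kernel version, so treat it as a sketch rather than a tested configuration:

# catch stack overruns in interrupt/softirq paths
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_STACK_USAGE=y
# ...or, to reduce overhead while chasing the lockup itself,
# leave lockdep out:
# CONFIG_PROVE_LOCKING is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set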
Re: BUG/ spinlock lockup, 2.6.24
This server was working fine under load with FreeBSD, and worked fine before with other tasks under Linux. I don't think it is RAM. Additionally, it is server hardware (Dell PowerEdge) with ECC, MCE and other layers, which will most probably report any hardware issue, and I think even better than memtest would. It is also very difficult to run a test on it, because it is in another country and I have limited access to it (I don't have a network KVM). I have similar crashes on completely different hardware doing the same job (QoS), so I think it is actually some nasty bug in networking.

On Fri, 15 Feb 2008 16:24:56 +0100, Bart Van Assche wrote
> 2008/2/15 Denys Fedoryshchenko <[EMAIL PROTECTED]>:
> > I have random crashes, at least once per week. It is very difficult to
> > catch the error message, and only recently did I set up netconsole. Now
> > I got a crash, but there is no traceback, and only the single line
> > mentioned before came over netconsole.
>
> Did you already run memtest? You can run memtest by booting from the
> Knoppix CD-ROM or DVD. Most Linux distributions have also included
> memtest on their bootable distribution CDs/DVDs.
>
> Bart Van Assche.

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
BUG/ spinlock lockup, 2.6.24
[post truncated in the archive; only the tail of the attached /proc/cpuinfo output survives]

fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips        : 6383.76
clflush size    : 64

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
HTB(?) softlockup, vanilla 2.6.24
[post truncated in the archive; only part of the Code: byte dump from the netconsole capture survives]

Code: 03 5b 5e 5f c3 57 89 d7 56 53 c3 8b 50 08 8b 30 8b 4a 04 83 e6 fc 85 c9 89 48 08 74 09 8b

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
kernel panic on 2.6.24 with esfq patch applied
[post truncated in the archive; the netconsole capture begins mid-backtrace]

 [] __remove_hrtimer+0x5d/0x64
[12380.067861] [] hrtimer_interrupt+0x10c/0x19a
[12380.067883] [] smp_apic_timer_interrupt+0x6f/0x80
[12380.067905] [] apic_timer_interrupt+0x28/0x30
[12380.067928] [] _spin_lock_irqsave+0x13/0x27
[12380.067949] [] lock_hrtimer_base+0x15/0x2f
[12380.067970] [] hrtimer_start+0x16/0xf4
[12380.067991] [] qdisc_watchdog_schedule+0x1e/0x21
[12380.068013] [] htb_dequeue+0x6ef/0x6fb [sch_htb]
[12380.068036] [] ip_rcv+0x1fc/0x237
[12380.068057] [] hrtimer_get_next_event+0xae/0xbb
[12380.068078] [] hrtimer_get_next_event+0xae/0xbb
[12380.068099] [] getnstimeofday+0x2b/0xb5
[12380.068118] [] clockevents_program_event+0xe0/0xee
[12380.068140] [] __qdisc_run+0x2a/0x163
[12380.068161] [] net_tx_action+0xa8/0xcc
[12380.068180] [] qdisc_watchdog+0x0/0x1b
[12380.068199] [] qdisc_watchdog+0x18/0x1b
[12380.068218] [] run_hrtimer_softirq+0x4e/0x96
[12380.068241] [] __do_softirq+0x5d/0xc1
[12380.068260] [] do_softirq+0x32/0x36
[12380.068279] [] irq_exit+0x38/0x6b
[12380.068298] [] smp_apic_timer_interrupt+0x74/0x80
[12380.068319] [] apic_timer_interrupt+0x28/0x30
[12380.068343] [] mwait_idle_with_hints+0x3c/0x40
[12380.068365] [] mwait_idle+0x0/0xa
[12380.068384] [] cpu_idle+0x98/0xb9
[12380.068403] [] start_kernel+0x2d7/0x2df
[12380.068422] [] unknown_bootoption+0x0/0x195
[12380.068444] ===
[12380.068460] Code: 01 00 00 8b 4e 08 39 d9 0f 85 85 00 00 00 8b 4e 04 8b 01 a8 01 75 14 83 c8 01 89 ea 89 01 89 f0 83 26 fe e8 1e fd ff ff 8b 4e 04 <8b> 59 08 85 db 74 06 8b 03 a8 01 74 15 8b 41 04 85 c0 0f 84 c6
[12380.068753] EIP: [] rb_erase+0x110/0x22f SS:ESP 0068:c037fda8
[12380.068978] Kernel panic - not syncing: Fatal exception in interrupt

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
pppoe, /proc/net/pppoe wrong (extra entries)
Hi again

I noticed a strange thing with /proc/net/pppoe. I am not sure if it is a bug, but to me it looks wrong. In cat /proc/net/pppoe there are the normal user entries, but at the end I have:

0D00 00:16:D3:0B:F9:34 eth1
4000 00:50:22:00:1C:FC eth1
7E00 00:03:47:BD:34:25 eth1
7E00 00:03:47:BD:34:25 eth1
7E00 00:03:47:BD:34:25 eth1
7E00 00:03:47:BD:34:25 eth1
7E00 00:03:47:BD:34:25 eth1

and the last entry duplicates until the end. I have a script to get customer interfaces, and I am using it to calculate the number of users logged in:

defaulthost ~ # cat /proc/net/pppoe | grep -i '00:03:47:BD:34:25' | wc -l
40
defaulthost ~ # cat /proc/net/pppoe | wc -l
113
defaulthost ~ # pppctrl | wc -l
73

That means there are 40 extra entries. The 00:03:47:BD:34:25 host has a session established, but only one. I am seeing a similar issue on all the remaining pppoe servers: extra entries with the same MAC at the end. If you need more info or access, please let me know.

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
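P.S. A duplicated tail like this is the classic symptom of an off-by-one in a /proc seq_file iterator. As an illustration only (simplified, hypothetical code, not the actual drivers/net/pppoe.c), the walk over the session hash table looks roughly like this; if the bucket-advance step below were wrong, the iterator would keep re-emitting the last socket instead of terminating:

/* Hypothetical, simplified iterator over the PPPoE session hash table. */
static struct pppox_sock *pppoe_next(struct pppox_sock *po, int *bucket)
{
	if (po->next)				/* more sessions in this bucket */
		return po->next;

	while (++(*bucket) < PPPOE_HASH_SIZE)	/* advance to the next bucket */
		if (item_hash_table[*bucket])
			return item_hash_table[*bucket];

	return NULL;	/* must terminate here, or the seq_file keeps
			 * printing the final entry over and over */
}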
WARNING, tcp_fastretrans_alert, rc6-git11
Just got this on one of the proxies, under high load. It is a bit of an old rc, so my report is probably not that interesting, but since these are production machines I cannot change kernels too often. The kernel is 2.6.24-rc6-git11, with some sysctl adjustments done. Please tell me if you need more information.

There are rules in iptables (if it is interesting):

Chain PREROUTING (policy ACCEPT 209M packets, 19G bytes)
 pkts bytes target   prot opt in    out   source      destination
    0     0 DROP     tcp  --  eth+  *     0.0.0.0/0   0.0.0.0/0   tcp dpt:1

Chain POSTROUTING (policy ACCEPT 120M packets, 7408M bytes)
 pkts bytes target   prot opt in    out   source      destination

Chain OUTPUT (policy ACCEPT 18240 packets, 22M bytes)
 pkts bytes target   prot opt in    out   source      destination
<< some local networks skipped, not important, similar ACCEPT as next >>
 200K  245M ACCEPT   all  --  *     *     0.0.0.0/0   172.16.0.0/16
3930K  236M REDIRECT tcp  --  *     eth0  0.0.0.0/0   0.0.0.0/0   tcp flags:0x17/0x02 TOS match 0x04 redir ports 2
 112M 6720M REDIRECT tcp  --  *     eth0  0.0.0.0/0   0.0.0.0/0   tcp dpt:80 flags:0x17/0x02 redir ports 1
 116K 6953K REDIRECT tcp  --  *     eth0  0.0.0.0/0   0.0.0.0/0   OWNER UID match 101 tcp flags:0x17/0x02 redir ports 1

[9561199.893090] WARNING: at net/ipv4/tcp_input.c:2391 tcp_fastretrans_alert()
[9561199.893161] Pid: 32283, comm: squid Not tainted 2.6.24-rc6-git11-build-0020 #9
[9561199.893277]  [] tcp_ack+0xd32/0x18cc
[9561199.893398]  [] ipt_do_table+0x416/0x474 [ip_tables]
[9561199.893479]  [] tcp_rcv_established+0xca/0x7ad
[9561199.893566]  [] tcp_v4_do_rcv+0x2b/0x330
[9561199.893636]  [] nf_ct_deliver_cached_events+0x3e/0x90 [nf_conntrack]
[9561199.893759]  [] tcp_v4_rcv+0x7c4/0x80f
[9561199.893862]  [] ip_local_deliver_finish+0xd9/0x148
[9561199.893932]  [] ip_rcv_finish+0x2bb/0x2da
[9561199.894004]  [] ip_rcv+0x1fc/0x237
[9561199.894063]  [] ip_rcv_finish+0x0/0x2da
[9561199.894122]  [] ip_rcv+0x0/0x237
[9561199.894183]  [] netif_receive_skb+0x376/0x3e2
[9561199.894273]  [] e1000_clean_rx_irq+0x379/0x445 [e1000]
[9561199.894388]  [] e1000_clean_rx_irq+0x0/0x445 [e1000]
[9561199.894462]  [] e1000_clean+0x67/0x1f8 [e1000]
[9561199.894547]  [] net_rx_action+0x8d/0x17c
[9561199.894632]  [] __do_softirq+0x5d/0xc1
[9561199.894698]  [] do_softirq+0x32/0x36
[9561199.894755]  [] irq_exit+0x38/0x6b
[9561199.894813]  [] do_IRQ+0x5c/0x73
[9561199.894867]  [] sys_read+0x5f/0x67
[9561199.894936]  [] common_interrupt+0x23/0x28
[9561199.895040] ===

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
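P.S. For anyone decoding the REDIRECT rules above: flags:0x17/0x02 is how iptables renders --tcp-flags FIN,SYN,RST,ACK SYN, i.e. match only initial SYN packets. A hypothetical command of the shape that would produce the dpt:80 counter line (chain placement as listed):

iptables -t nat -A OUTPUT -o eth0 -p tcp --dport 80 \
         --tcp-flags FIN,SYN,RST,ACK SYN \
         -j REDIRECT --to-ports 1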