Re: 4.19.4 nf_conntrack_count kernel panic

2018-11-26 Thread Denys Fedoryshchenko

On 2018-11-26 21:46, Sami Farin wrote:

4.18.20 works OK, but unfortunately 4.18 series is EOL.
I have Ryzen 1600X, 32 GB RAM, Fedora 28, gcc-8.2.1-5, nosmt=force,
igb module for Intel I211,
using XFS filesystems only.

To reproduce, I only do this: connect to VPN using a tunnel (e.g. tun0),
start downloading a file with qbittorrent (allow port for incoming
TCP connections in qbittorrent and iptables) and wait a couple of
minutes.

I am also using ipset and connlimit modules.
I reproduced this bug three times.
With 4.18 I use fq+htb and with 4.19 I use CAKE for traffic control.

Only this message in kernel log:
[  363.935074] TCP: request_sock_TCP: Possible SYN flooding on port 19044. Dropping request.  Check SNMP counters.
I get this message with both 4.18.20 and 4.19.4.

RIP: 0010:rb_insert_color+0x64
Call Trace:
  nf_conntrack_count [nf_conncount]
  ip_set_test [ip_set]
  connlimit_mt [xt_connlimit]
  set_match_v4 [xt_set]
  ipt_do_table [ip_tables]
  ip_route_input_noref
  nf_hook_slow
  ip_local_deliver
  inet_add_protocol
  ip_rcv
  ip_rcv_finish_core
  __netif_receive_skb_one_core
  netif_receive_skb_internal
  tun_rx_batched
  tun_get_user
  __local_bh_enable_ip
  tun_get_user
  tun_chr_write_iter
  __vfs_write
  vfs_write
  ksys_write
  do_syscall_64
  trace_hardirqs_off_thunk
  entry_SYSCALL_64_after_hwframe

...

Kernel panic - not syncing: Fatal exception in interrupt


Check these patches:
https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=73972&state=*

Relevant discussion:
https://marc.info/?l=linux-netdev&m=154211826106430&w=2


4.15.13 kernel panic, ip_rcv_finish, nf_xfrm_me_harder warnings continue to fill dmesg

2018-04-11 Thread Denys Fedoryshchenko

Apr 11 18:01:34[99194.935520] general protection fault:  [#1] SMP
Apr 11 18:01:34[99194.935998] Modules linked in: pppoe pppox ppp_generic 
slhc ip_set_hash_net xt_nat xt_string xt_connmark xt_TCPMSS xt_mark 
xt_CT xt_set xt_tcpudp ip_set_bitmap_port ip_set nfnetlink
iptable_raw iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables 
netconsole configfs 8021q garp mrp stp llc ixgbe dca ipv6
Apr 11 18:01:34[99194.938313] CPU: 23 PID: 150 Comm: ksoftirqd/23 
Tainted: G        W       4.15.13-build-0135 #4
Apr 11 18:01:34[99194.939258] Hardware name: Intel Corporation 
S2600GZ/S2600GZ, BIOS SE5C600.86B.02.04.0003.102320141138 10/23/2014

Apr 11 18:01:34[99194.940189] RIP: 0010:ip_rcv_finish+0x2b5/0x2e5
Apr 11 18:01:34[99194.940716] RSP: 0018:c9000cad7cf8 EFLAGS: 
00010286
Apr 11 18:01:34[99194.941214] RAX: 00e2476d RBX: 
88178f944400 RCX: 88179a32a800
Apr 11 18:01:34[99194.941741] RDX: 88178f944400 RSI: 
 RDI: 88178f944400
Apr 11 18:01:34[99194.942234] RBP: 882fd580d000 R08: 
0001 R09: 882fd034ee00
Apr 11 18:01:34[99194.942771] R10: c9000cad7b58 R11: 
e92316b9 R12: 88179a32a8d6
Apr 11 18:01:34[99194.943286] R13: 882fd580d000 R14: 
00ea0008 R15: 882fd580d078
Apr 11 18:01:34[99194.943821] FS:  () 
GS:88303fcc() knlGS:
Apr 11 18:01:34[99194.944779] CS:  0010 DS:  ES:  CR0: 
80050033
Apr 11 18:01:34[99194.945287] CR2: 7f8bb37888f0 CR3: 
00303e209003 CR4: 001606e0

Apr 11 18:01:34[99194.945808] Call Trace:
Apr 11 18:01:34[99194.946307]  ip_rcv+0x2f2/0x325
Apr 11 18:01:34[99194.946816]  ? ip_local_deliver_finish+0x187/0x187
Apr 11 18:01:34[99194.947331]  __netif_receive_skb_core+0x81c/0x89c
Apr 11 18:01:34[99194.947872]  ? napi_complete_done+0xb4/0xba
Apr 11 18:01:34[99194.948391]  ? ixgbe_poll+0xf96/0x104d [ixgbe]
Apr 11 18:01:34[99194.948931]  ? process_backlog+0x8b/0x10d
Apr 11 18:01:34[99194.949424]  process_backlog+0x8b/0x10d
Apr 11 18:01:34[99194.949953]  net_rx_action+0x127/0x2b5
Apr 11 18:01:34[99194.950445]  __do_softirq+0xc1/0x1b1
Apr 11 18:01:34[99194.950951]  ? sort_range+0x17/0x17
Apr 11 18:01:34[99194.951442]  run_ksoftirqd+0x11/0x22
Apr 11 18:01:34[99194.951972]  smpboot_thread_fn+0x121/0x136
Apr 11 18:01:34[99194.952489]  kthread+0xfd/0x105
Apr 11 18:01:34[99194.953018]  ? kthread_create_on_node+0x3a/0x3a
Apr 11 18:01:34[99194.953528]  ret_from_fork+0x1f/0x30
Apr 11 18:01:34[99194.954047] Code: 15 77 9e 99 00 83 7a 7c 00 75 37 83 
b8 2c 01 00 00 00 75 2e 48 8b 43 58 48 89 df 5b 5d 48 83 e0 fe 41 5c 41 
5d 41 5e 48 8b 40 50  e0 83 f8 ee 75 10 49 8b 84 24

90 01 00 00 65 48 ff 80 40 02
Apr 11 18:01:34[99194.955449] RIP: ip_rcv_finish+0x2b5/0x2e5 RSP: 
c9000cad7cf8

Apr 11 18:01:34[99194.956008] ---[ end trace 312b0bf537b4709a ]---
Apr 11 18:01:34[99195.007900] Kernel panic - not syncing: Fatal 
exception in interrupt

Apr 11 18:01:34[99195.008400] Kernel Offset: disabled
Apr 11 18:01:34[99195.013950] Rebooting in 5 seconds..
--


I reported the warnings in nf_xfrm_me_harder before, but probably 
nobody had interest in taking a look, and they seem to be plaguing 4.15.x 
and nearby kernel versions. Here is one such warning.


---
Apr 11 00:00:17[34320.802349] dst_release: dst:b32dca17 
refcnt:-2
Apr 11 00:00:19[34323.018468] WARNING: CPU: 7 PID: 0 at 
./include/net/dst.h:256 nf_xfrm_me_harder+0x62/0xfe [nf_nat]
Apr 11 00:00:19[34323.019357] Modules linked in: pppoe pppox ppp_generic 
slhc ip_set_hash_net xt_nat xt_string xt_connmark xt_TCPMSS xt_mark 
xt_CT xt_set xt_tcpudp ip_set_bitmap_port ip_set nfnetlink
iptable_raw iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables 
netconsole configfs 8021q garp mrp stp llc ixgbe dca ipv6
Apr 11 00:00:19[34323.021503] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W
4.15.13-build-0135 #4
Apr 11 00:00:19[34323.022380] Hardware name: Intel Corporation 
S2600GZ/S2600GZ, BIOS SE5C600.86B.02.04.0003.102320141138 10/23/2014
Apr 11 00:00:19[34323.023261] RIP: 0010:nf_xfrm_me_harder+0x62/0xfe 
[nf_nat]
Apr 11 00:00:19[34323.023737] RSP: 0018:88303fa43c90 EFLAGS: 
00010246
Apr 11 00:00:19[34323.024218] RAX:  RBX: 
8817b2c35200 RCX: 
Apr 11 00:00:19[34323.024703] RDX: 0002 RSI: 
88178fab3700 RDI: 88303fa43cd0
Apr 11 00:00:19[34323.025214] RBP: 822a6180 R08: 
0005 R09: 0001
Apr 11 00:00:19[34323.025717] R10: 00d6 R11: 
8817c945bca0 R12: 0001
Apr 11 00:00:19[34323.026214] R13: 88303fa43d60 R14: 
00ce0008 R15: 8817b7477078
Apr 11 00:00:19[34323.026736] FS:  () 
GS:88303fa4() knlGS:
Apr 11 00:00:19[34323.027680] CS

Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-03-03 Thread Denys Fedoryshchenko

On 2018-03-02 19:43, Guillaume Nault wrote:

On Thu, Mar 01, 2018 at 10:07:05PM +0200, Denys Fedoryshchenko wrote:

On 2018-03-01 22:01, Guillaume Nault wrote:
> diff --git a/drivers/net/ppp/ppp_generic.c
> b/drivers/net/ppp/ppp_generic.c
> index 255a5def56e9..2acf4b0eabd1 100644
> --- a/drivers/net/ppp/ppp_generic.c
> +++ b/drivers/net/ppp/ppp_generic.c
> @@ -3161,6 +3161,15 @@ ppp_connect_channel(struct channel *pch, int
> unit)
>goto outl;
>
>ppp_lock(ppp);
> +  spin_lock_bh(&pch->downl);
> +  if (!pch->chan) {
> +  /* Don't connect unregistered channels */
> +  ppp_unlock(ppp);
> +  spin_unlock_bh(&pch->downl);


This is obviously wrong. It should have been
+   spin_unlock_bh(&pch->downl);
+   ppp_unlock(ppp);

Sorry, I shouldn't have hurried.
This is fixed in the official version.


> +  ret = -ENOTCONN;
> +  goto outl;
> +  }
> +  spin_unlock_bh(&pch->downl);
>if (pch->file.hdrlen > ppp->file.hdrlen)
>ppp->file.hdrlen = pch->file.hdrlen;
>hdrlen = pch->file.hdrlen + 2;   /* for protocol bytes */
OK, I will try to test that tonight.
Thanks a lot! For me the problem is solved anyway by removing 
unit-cache; I just think it's nice to have the bug fixed :)

I think this bug has been there forever, indeed it's good to have it 
fixed.

Thanks a lot for your help (and patience!).

FYI, if you see accel-ppp logs like
"ioctl(PPPIOCCONNECT): Transport endpoint is not connected", then that
means the patch prevented the scenario that was leading to the original
crash.
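The corrected error path quoted earlier in this message (drop pch->downl first, then the ppp lock, then fail with -ENOTCONN) can be modelled in userspace. What follows is only a sketch under stated assumptions: the toy_ helpers and struct lock_stack are hypothetical stand-ins that record lock nesting, not kernel APIs, and the log format merely imitates the accel-ppp line quoted above. On glibc, strerror(ENOTCONN) yields the "Transport endpoint is not connected" text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-ins for the kernel locks: they only record nesting order. */
enum { MAX_HELD = 8 };

struct lock_stack {
	const char *held[MAX_HELD];	/* names of locks currently held */
	int depth;
	int misordered;			/* releases that were not innermost-first */
};

static void toy_lock(struct lock_stack *s, const char *name)
{
	assert(s->depth < MAX_HELD);
	s->held[s->depth++] = name;
}

static void toy_unlock(struct lock_stack *s, const char *name)
{
	int i;

	for (i = s->depth - 1; i >= 0; i--)
		if (strcmp(s->held[i], name) == 0)
			break;
	assert(i >= 0);			/* the lock must actually be held */
	if (i != s->depth - 1)
		s->misordered++;	/* an outer lock was dropped first */
	for (; i < s->depth - 1; i++)
		s->held[i] = s->held[i + 1];
	s->depth--;
}

struct toy_channel {
	void *chan;			/* NULL once the channel is unregistered */
};

/* Simplified model of the guarded path; not the kernel function itself. */
static int toy_connect_channel(struct lock_stack *s,
			       const struct toy_channel *pch)
{
	toy_lock(s, "ppp");		/* stands in for ppp_lock(ppp) */
	toy_lock(s, "downl");		/* stands in for spin_lock_bh(&pch->downl) */
	if (!pch->chan) {
		/* Don't connect unregistered channels */
		toy_unlock(s, "downl");	/* inner lock first ... */
		toy_unlock(s, "ppp");	/* ... then the outer one */
		return -ENOTCONN;
	}
	toy_unlock(s, "downl");
	/* the header-length bookkeeping would happen here */
	toy_unlock(s, "ppp");
	return 0;
}

/* Build a log line in the style quoted above (the format is an assumption). */
static int toy_format_error(char *buf, size_t len, int ret)
{
	return snprintf(buf, len, "ioctl(PPPIOCCONNECT): %s", strerror(-ret));
}
```

Calling toy_connect_channel() with a channel whose chan pointer is NULL returns -ENOTCONN and leaves misordered at 0; swapping the two unlock calls in the error branch would bump that counter, which is the mistake the correction above addresses.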

Out of curiosity, did unit-cache really bring performance improvements
on your workload?
On old kernels it definitely did. Due to local specifics (electricity 
outages) I might have a few thousand interfaces deleted and created 
again in a short period of time.
And previously, interface creation/deletion (especially when there are 
thousands of them) was very expensive.
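The reuse idea described above can be sketched as a small pool that parks released units instead of destroying them. This is a hypothetical illustration only: struct unit_cache, unit_get() and unit_put() are made-up names, and nothing here is taken from accel-ppp's actual unit-cache code.

```c
#include <assert.h>
#include <stddef.h>

enum { CACHE_MAX = 4 };			/* assumed cache size, arbitrary */

struct unit_cache {
	int parked[CACHE_MAX];		/* unit numbers kept alive for reuse */
	size_t count;			/* how many units are parked */
	int next_unit;			/* next never-used unit number */
	unsigned created;		/* expensive creations performed */
	unsigned destroyed;		/* expensive deletions performed */
};

/* Hand out a unit: reuse a parked one if possible, else create a new one. */
static int unit_get(struct unit_cache *c)
{
	if (c->count > 0)
		return c->parked[--c->count];	/* cheap path: reuse */
	c->created++;				/* expensive path: new interface */
	return c->next_unit++;
}

/* Give a unit back: park it if there is room, else really destroy it. */
static void unit_put(struct unit_cache *c, int unit)
{
	assert(unit >= 0);
	if (c->count < CACHE_MAX) {
		c->parked[c->count++] = unit;
		return;
	}
	c->destroyed++;
}
```

With such a pool, a mass disconnect followed by a mass reconnect touches the created and destroyed counters only for units beyond the cache size, which is the saving described for old kernels.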


Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-03-01 Thread Denys Fedoryshchenko



On 2018-03-01 22:01, Guillaume Nault wrote:

On Tue, Feb 27, 2018 at 07:56:27PM +0100, Guillaume Nault wrote:

On Tue, Feb 27, 2018 at 12:58:55PM +0200, Denys Fedoryshchenko wrote:
> On 2018-02-23 12:07, Guillaume Nault wrote:
> > On Fri, Feb 23, 2018 at 11:41:43AM +0200, Denys Fedoryshchenko wrote:
> > > On 2018-02-23 11:38, Guillaume Nault wrote:
> > > > On Thu, Feb 22, 2018 at 08:51:19PM +0200, Denys Fedoryshchenko wrote:
> > > > > I'm using accel-ppp that has unit-cache option, i guess for
> > > > > "reusing" ppp
> > > > > interfaces (because creating a lot of interfaces on BRAS with 8k
> > > > > users quite
> > > > > expensive).
> > > > > Maybe it is somehow related and can be that scenario causing this bug?
> > > > >
> > > > Indeed, it'd be interesting to know if unit-cache is part of the
> > > > equation (if it's workable for you to disable it).
> > > Already did that and testing, unfortunately i had to disable KASAN
> > > and full
> > > refcount, as performance hit is too heavy for me. I will try to
> > > enable KASAN
> > > alone tomorrow.
> > >
> > Don't hesitate to post the result even if you can't afford enabling
> > KASAN.
> Till now 4 days and no reboots.
>
That unit-cache information was very useful. I can now reproduce the
issue and work on a fix.


You can try the following patch.

Sorry for the delay, I'm a bit out of time these days.

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 255a5def56e9..2acf4b0eabd1 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -3161,6 +3161,15 @@ ppp_connect_channel(struct channel *pch, int unit)
 		goto outl;
 
 	ppp_lock(ppp);
+	spin_lock_bh(&pch->downl);
+	if (!pch->chan) {
+		/* Don't connect unregistered channels */
+		ppp_unlock(ppp);
+		spin_unlock_bh(&pch->downl);
+		ret = -ENOTCONN;
+		goto outl;
+	}
+	spin_unlock_bh(&pch->downl);
 	if (pch->file.hdrlen > ppp->file.hdrlen)
 		ppp->file.hdrlen = pch->file.hdrlen;
 	hdrlen = pch->file.hdrlen + 2;	/* for protocol bytes */

OK, I will try to test that tonight.
Thanks a lot! For me the problem is solved anyway by removing unit-cache; 
I just think it's nice to have the bug fixed :)


Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-27 Thread Denys Fedoryshchenko

On 2018-02-23 12:07, Guillaume Nault wrote:

On Fri, Feb 23, 2018 at 11:41:43AM +0200, Denys Fedoryshchenko wrote:

On 2018-02-23 11:38, Guillaume Nault wrote:
> On Thu, Feb 22, 2018 at 08:51:19PM +0200, Denys Fedoryshchenko wrote:
> > I'm using accel-ppp that has unit-cache option, i guess for
> > "reusing" ppp
> > interfaces (because creating a lot of interfaces on BRAS with 8k
> > users quite
> > expensive).
> > Maybe it is somehow related and can be that scenario causing this bug?
> >
> Indeed, it'd be interesting to know if unit-cache is part of the
> equation (if it's workable for you to disable it).
Already did that and testing, unfortunately i had to disable KASAN and 
full refcount, as performance hit is too heavy for me. I will try to 
enable KASAN alone tomorrow.

Don't hesitate to post the result even if you can't afford enabling 
KASAN.

Till now 4 days and no reboots.


Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-24 Thread Denys Fedoryshchenko

On 2018-02-23 12:07, Guillaume Nault wrote:

On Fri, Feb 23, 2018 at 11:41:43AM +0200, Denys Fedoryshchenko wrote:

On 2018-02-23 11:38, Guillaume Nault wrote:
> On Thu, Feb 22, 2018 at 08:51:19PM +0200, Denys Fedoryshchenko wrote:
> > I'm using accel-ppp that has unit-cache option, i guess for
> > "reusing" ppp
> > interfaces (because creating a lot of interfaces on BRAS with 8k
> > users quite
> > expensive).
> > Maybe it is somehow related and can be that scenario causing this bug?
> >
> Indeed, it'd be interesting to know if unit-cache is part of the
> equation (if it's workable for you to disable it).
Already did that and testing, unfortunately i had to disable KASAN and 
full refcount, as performance hit is too heavy for me. I will try to 
enable KASAN alone tomorrow.

Don't hesitate to post the result even if you can't afford enabling 
KASAN.

Very likely unit-cache is a major contributor to these reboots.
After disabling it, it has been almost 48h with no reboots yet.


Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-23 Thread Denys Fedoryshchenko

On 2018-02-23 12:07, Guillaume Nault wrote:

On Fri, Feb 23, 2018 at 11:41:43AM +0200, Denys Fedoryshchenko wrote:

On 2018-02-23 11:38, Guillaume Nault wrote:
> On Thu, Feb 22, 2018 at 08:51:19PM +0200, Denys Fedoryshchenko wrote:
> > I'm using accel-ppp that has unit-cache option, i guess for
> > "reusing" ppp
> > interfaces (because creating a lot of interfaces on BRAS with 8k
> > users quite
> > expensive).
> > Maybe it is somehow related and can be that scenario causing this bug?
> >
> Indeed, it'd be interesting to know if unit-cache is part of the
> equation (if it's workable for you to disable it).
Already did that and testing, unfortunately i had to disable KASAN and 
full refcount, as performance hit is too heavy for me. I will try to 
enable KASAN alone tomorrow.

Don't hesitate to post the result even if you can't afford enabling 
KASAN.
For sure. I am expecting it to crash even if KASAN is not enabled (I just 
won't have a clean message about the reason).
Usually it happened for me within 6-10 hours after an upgrade at night, 
when load started to increase. I prefer to wait at least 48h, even if 
there is no crash.


Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-23 Thread Denys Fedoryshchenko

On 2018-02-23 11:38, Guillaume Nault wrote:

On Thu, Feb 22, 2018 at 08:51:19PM +0200, Denys Fedoryshchenko wrote:
I'm using accel-ppp that has unit-cache option, i guess for "reusing" 
ppp interfaces (because creating a lot of interfaces on BRAS with 8k 
users quite expensive).
Maybe it is somehow related and can be that scenario causing this bug?


Indeed, it'd be interesting to know if unit-cache is part of the
equation (if it's workable for you to disable it).
Already did that and testing, unfortunately i had to disable KASAN and 
full refcount, as performance hit is too heavy for me. I will try to 
enable KASAN alone tomorrow.


Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-22 Thread Denys Fedoryshchenko

On 2018-02-22 20:30, Guillaume Nault wrote:

On Wed, Feb 21, 2018 at 12:04:30PM -0800, Cong Wang wrote:
On Thu, Feb 15, 2018 at 11:31 AM, Guillaume Nault wrote:

> On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote:
>> On 2018-02-15 17:55, Guillaume Nault wrote:
>> > On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote:
>> > > Here we go:
>> > >
>> > >   [24558.921549]
>> > > ==
>> > >   [24558.922167] BUG: KASAN: use-after-free in
>> > > ppp_ioctl+0xa6a/0x1522
>> > > [ppp_generic]
>> > >   [24558.922776] Write of size 8 at addr 8803d35bf3f8 by task
>> > > accel-pppd/12622
>> > >   [24558.923113]
>> > >   [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G
>> > > W
>> > > 4.15.3-build-0134 #1
>> > >   [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2,
>> > > BIOS P80
>> > > 04/02/2015
>> > >   [24558.924406] Call Trace:
>> > >   [24558.924753]  dump_stack+0x46/0x59
>> > >   [24558.925103]  print_address_description+0x6b/0x23b
>> > >   [24558.925451]  ? ppp_ioctl+0xa6a/0x1522 [ppp_generic]
>> > >   [24558.925797]  kasan_report+0x21b/0x241
>> > >   [24558.926136]  ppp_ioctl+0xa6a/0x1522 [ppp_generic]
>> > >   [24558.926479]  ? ppp_nl_newlink+0x1da/0x1da [ppp_generic]
>> > >   [24558.926829]  ? sock_sendmsg+0x89/0x99
>> > >   [24558.927176]  ? __vfs_write+0xd9/0x4ad
>> > >   [24558.927523]  ? kernel_read+0xed/0xed
>> > >   [24558.927872]  ? SyS_getpeername+0x18c/0x18c
>> > >   [24558.928213]  ? bit_waitqueue+0x2a/0x2a
>> > >   [24558.928561]  ? wake_atomic_t_function+0x115/0x115
>> > >   [24558.928898]  vfs_ioctl+0x6e/0x81
>> > >   [24558.929228]  do_vfs_ioctl+0xa00/0xb10
>> > >   [24558.929571]  ? sigprocmask+0x1a6/0x1d0
>> > >   [24558.929907]  ? sigsuspend+0x13e/0x13e
>> > >   [24558.930239]  ? ioctl_preallocate+0x14e/0x14e
>> > >   [24558.930568]  ? SyS_rt_sigprocmask+0xf1/0x142
>> > >   [24558.930904]  ? sigprocmask+0x1d0/0x1d0
>> > >   [24558.931252]  SyS_ioctl+0x39/0x55
>> > >   [24558.931595]  ? do_vfs_ioctl+0xb10/0xb10
>> > >   [24558.931942]  do_syscall_64+0x1b1/0x31f
>> > >   [24558.932288]  entry_SYSCALL_64_after_hwframe+0x21/0x86
>> > >   [24558.932627] RIP: 0033:0x7f302849d8a7
>> > >   [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206
>> > > ORIG_RAX:
>> > > 0010
>> > >   [24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX:
>> > > 7f302849d8a7
>> > >   [24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI:
>> > > 3a67
>> > >   [24558.934266] RBP: 7f3029a52b20 R08:  R09:
>> > > 55c8308d8e40
>> > >   [24558.934607] R10: 0008 R11: 0206 R12:
>> > > 7f3023f49358
>> > >   [24558.934947] R13: 7ffe86e5723f R14:  R15:
>> > > 7f3029a53700
>> > >   [24558.935288]
>> > >   [24558.935626] Allocated by task 12622:
>> > >   [24558.935972]  ppp_register_net_channel+0x5f/0x5c6
>> > > [ppp_generic]
>> > >   [24558.936306]  pppoe_connect+0xab7/0xc71 [pppoe]
>> > >   [24558.936640]  SyS_connect+0x14b/0x1b7
>> > >   [24558.936975]  do_syscall_64+0x1b1/0x31f
>> > >   [24558.937319]  entry_SYSCALL_64_after_hwframe+0x21/0x86
>> > >   [24558.937655]
>> > >   [24558.937993] Freed by task 12622:
>> > >   [24558.938321]  kfree+0xb0/0x11d
>> > >   [24558.938658]  ppp_release+0x111/0x120 [ppp_generic]
>> > >   [24558.938994]  __fput+0x2ba/0x51a
>> > >   [24558.939332]  task_work_run+0x11c/0x13d
>> > >   [24558.939676]  exit_to_usermode_loop+0x7c/0xaf
>> > >   [24558.940022]  do_syscall_64+0x2ea/0x31f
>> > >   [24558.940368]  entry_SYSCALL_64_after_hwframe+0x21/0x86
>> > >   [24558.947099]
>> >
>> > Your first guess was right. It looks like we have an issue with
>> > reference counting on the channels. Can you send me your ppp_generic.o?
>> http://nuclearcat.com/ppp_generic.o
>> Compiled with gcc version 6.4.0 (Gentoo 6.4.0-r1 p1.3)
>>
> From what I can see, ppp_release() and ioctl(PPPIOCCONNECT) are called
> concurrently on the same ppp_file. Even if this pp

Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-21 Thread Denys Fedoryshchenko

On 2018-02-21 20:55, Guillaume Nault wrote:

On Wed, Feb 21, 2018 at 12:26:51PM +0200, Denys Fedoryshchenko wrote:
It seems even rebuilding a seemingly stable version triggers crashes too 
(but different ones)

Different ones? The trace following your message looks very similar to
your first KASAN report. Or are you referring to the lockup you posted
on Sun, 18 Feb 2018?

Also, which stable versions are you referring to?
The trace I sent in the previous email is from the latest kernel, vanilla, 
just with more debug options and a few options disabled.
One of the disabled options, CONFIG_XFRM, was spitting some errors in 
nf_xfrm_me_harder (it is obviously a bug; I reported it).

I also disabled namespaces, as they are often a source of trouble.

Today i will try to revert just:
drivers, net, ppp: convert asyncppp.refcnt from atomic_t to refcount_t
drivers, net, ppp: convert syncppp.refcnt from atomic_t to refcount_t
drivers, net, ppp: convert ppp_file.refcnt from atomic_t to refcount_t

Because I suspect that previously, after reverting these patches, I got a 
different kernel panic (I didn't notice it then, and now it is too late 
to distinguish it from the other crashes); it seems it was not KASAN.
I will report results after testing; unfortunately I can't test more 
than once per day.
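For context on the three commits named above, which convert reference counters from atomic_t to refcount_t: as a rough userspace model of the difference (a sketch using C11 atomics, not the kernel implementation; all toy_ names are hypothetical), a refcount-style counter with full checks refuses to take a new reference once the count has reached zero and refuses to drop below zero, whereas a plain atomic counter would wrap silently.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_refcount {
	atomic_int refs;
};

/* Take a reference unless the object is already dead (count == 0). */
static bool toy_refcount_inc_not_zero(struct toy_refcount *r)
{
	int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;	/* would resurrect a freed object */
		if (old == INT_MAX)
			return true;	/* saturated: stay pinned, never wrap */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/* Drop a reference; true only for the caller that drops the last one. */
static bool toy_refcount_dec_and_test(struct toy_refcount *r)
{
	int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;	/* underflow: refuse instead of going to -1 */
		if (old == INT_MAX)
			return false;	/* saturated counters are never freed */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```

A counter that starts at 1 and sees one extra put stays at 0 here instead of going negative; the real kernel helpers additionally emit a warning in that situation.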


"Stable" for me was 4.14.2 - but it looks like on that kernel i am 
getting a different issue now.

I will paste it below.

Another observation: just an hour ago, on another server where I am 
testing 4.15 and 4.14.20 (it was running 4.14.20 at that moment, without 
debug options), I killed accel-pppd (the PPPoE server software) with 8k 
sessions online and got some weird behaviour. The accel-pppd process got 
stuck, as did ifconfig and "ip link", and even kexec -e didn't work (it 
got stuck too) unless I used kexec -e -x (so it won't try to bring the 
interfaces down on kexec).
I will try to reproduce this bug as well, with debug enabled (lockdep 
and so on); I hope it is not related.




I'm interested in the ppp_generic.o file that produced the following
trace. Just to be sure that the differences come from the new debugging
options.

Also kernel config:
https://nuclearcat.com/bughunting/config.txt
https://nuclearcat.com/bughunting/ppp_generic.o

This is on 4.14.2, which was seemingly stable before:

[50401.388670] NETDEV WATCHDOG: eth1 (ixgbe): transmit queue 1 timed out
[50401.389014] ------------[ cut here ]------------
[50401.389340] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 
dev_watchdog+0x15c/0x1b9
[50401.389925] Modules linked in: pppoe pppox ppp_generic slhc 
netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre 
nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 
xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 
xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net 
ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
[50401.391869] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 
4.14.2-build-0134 #4
[50401.392191] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 
04/02/2015

[50401.392513] task: 880434d72640 task.stack: c90001914000
[50401.392836] RIP: 0010:dev_watchdog+0x15c/0x1b9
[50401.393155] RSP: 0018:8804364c3e90 EFLAGS: 00010286
[50401.393470] RAX: 0039 RBX: 88042f6e RCX: 

[50401.393787] RDX: 0001 RSI: 0002 RDI: 
828dbc64
[50401.394103] RBP: 8804364c3eb0 R08: 0001 R09: 

[50401.394420] R10: 0002 R11: 8803fa075c00 R12: 
0001
[50401.394739] R13: 0040 R14: 0003 R15: 
81e05108
[50401.395064] FS:  () GS:8804364c() 
knlGS:

[50401.395645] CS:  0010 DS:  ES:  CR0: 80050033
[50401.395970] CR2: 7fff25fc20a8 CR3: 01e09005 CR4: 
001606e0

[50401.396294] Call Trace:
[50401.396613]  
[50401.396934]  ? qdisc_rcu_free+0x3f/0x3f
[50401.397255]  call_timer_fn.isra.4+0x17/0x7b
[50401.397576]  expire_timers+0x6f/0x7e
[50401.397899]  run_timer_softirq+0x6d/0x8f
[50401.398219]  ? ktime_get+0x3b/0x8c
[50401.398540]  ? lapic_next_event+0x18/0x1c
[50401.398862]  ? clockevents_program_event+0xa3/0xbb
[50401.399186]  __do_softirq+0xbc/0x1ab
[50401.399510]  irq_exit+0x4d/0x8e
[50401.399832]  smp_apic_timer_interrupt+0x73/0x80
[50401.400157]  apic_timer_interrupt+0x8d/0xa0
[50401.400480]  
[50401.400801] RIP: 0010:mwait_idle+0x4e/0x61
[50401.401123] RSP: 0018:c90001917ec0 EFLAGS: 0246 ORIG_RAX: 
ff10
[50401.401714] RAX:  RBX: 880434d72640 RCX: 

[50401.402037] RDX:  RSI:  RDI: 

[50401.402362] RBP: c90001917ec0 R08:  R09: 
0001
[50401.402685] R10: c90001917e58 R11:

Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-21 Thread Denys Fedoryshchenko
It seems even rebuilding a seemingly stable version triggers crashes too 
(but different ones).
Maybe it is a coincidence, and a bug reproducer appeared in the network 
at the same time I decided to upgrade the kernel, as happened with 
xt_MSS (and that bug had existed for years).

I deleted the quoting and added more debug options (as many as the 
performance degradation allows).

This is vanilla again:

[14834.090421] 
==

[14834.091157] BUG: KASAN: use-after-free in __list_add_valid+0x69/0xad
[14834.091521] Read of size 8 at addr 8803dbeb8660 by task 
accel-pppd/12636

[14834.091905]
[14834.092282] CPU: 0 PID: 12636 Comm: accel-pppd Not tainted 
4.15.4-build-0134 #1
[14834.092930] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 
04/02/2015

[14834.093320] Call Trace:
[14834.093680]  dump_stack+0xb3/0x13e
[14834.094050]  ? _atomic_dec_and_lock+0x10f/0x10f
[14834.094434]  print_address_description+0x69/0x236
[14834.094814]  ? __list_add_valid+0x69/0xad
[14834.095197]  kasan_report+0x219/0x23f
[14834.095570]  __list_add_valid+0x69/0xad
[14834.095957]  ppp_ioctl+0x1216/0x2201 [ppp_generic]
[14834.096348]  ? ppp_write+0x1cc/0x1cc [ppp_generic]
[14834.096723]  ? get_usage_char.isra.2+0x36/0x36
[14834.097094]  ? packet_poll+0x362/0x362
[14834.097455]  ? lock_downgrade+0x4d0/0x4d0
[14834.097811]  ? rcu_irq_enter_disabled+0x8/0x8
[14834.098187]  ? get_usage_char.isra.2+0x36/0x36
[14834.098561]  ? __fget+0x3b8/0x3eb
[14834.098936]  ? get_usage_char.isra.2+0x36/0x36
[14834.099309]  ? __fget+0x3a0/0x3eb
[14834.099682]  ? get_usage_char.isra.2+0x36/0x36
[14834.100069]  ? __fget+0x3a0/0x3eb
[14834.100443]  ? lock_downgrade+0x4d0/0x4d0
[14834.100814]  ? rcu_irq_enter_disabled+0x8/0x8
[14834.101203]  ? __fget+0x3b8/0x3eb
[14834.101581]  ? expand_files+0x62f/0x62f
[14834.101945]  ? kernel_read+0xed/0xed
[14834.102322]  ? SyS_getpeername+0x28b/0x28b
[14834.102690]  vfs_ioctl+0x6e/0x81
[14834.103049]  do_vfs_ioctl+0xe2f/0xe62
[14834.103413]  ? ioctl_preallocate+0x211/0x211
[14834.103778]  ? __fget_light+0x28c/0x2ca
[14834.104150]  ? iterate_fd+0x2a8/0x2a8
[14834.104526]  ? SyS_rt_sigprocmask+0x12e/0x181
[14834.104876]  ? sigprocmask+0x23f/0x23f
[14834.105231]  ? SyS_write+0x148/0x173
[14834.105580]  ? SyS_read+0x173/0x173
[14834.105943]  SyS_ioctl+0x39/0x55
[14834.106316]  ? do_vfs_ioctl+0xe62/0xe62
[14834.106694]  do_syscall_64+0x262/0x594
[14834.107076]  ? syscall_return_slowpath+0x351/0x351
[14834.107447]  ? up_read+0x17/0x2c
[14834.107806]  ? __do_page_fault+0x68a/0x763
[14834.108171]  ? entry_SYSCALL_64_after_hwframe+0x36/0x9b
[14834.108550]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[14834.108937]  entry_SYSCALL_64_after_hwframe+0x26/0x9b
[14834.109293] RIP: 0033:0x7fc9be3758a7
[14834.109652] RSP: 002b:7fc9bf92aaf8 EFLAGS: 0206 ORIG_RAX: 
0010
[14834.110313] RAX: ffda RBX: 7fc9bdc5e1e3 RCX: 
7fc9be3758a7
[14834.110707] RDX: 7fc9b7ad13e8 RSI: 4004743a RDI: 
4b9f
[14834.111082] RBP: 7fc9bf92ab20 R08:  R09: 
55f07a27fe40
[14834.111471] R10: 0008 R11: 0206 R12: 
7fc9b7ad12d8
[14834.111845] R13: 7ffd06346a6f R14:  R15: 
7fc9bf92b700

[14834.112231]
[14834.112589] Allocated by task 12636:
[14834.112962]  ppp_register_net_channel+0xc4/0x610 [ppp_generic]
[14834.113331]  pppoe_connect+0xe6d/0x1097 [pppoe]
[14834.113691]  SyS_connect+0x19c/0x274
[14834.114054]  do_syscall_64+0x262/0x594
[14834.114421]  entry_SYSCALL_64_after_hwframe+0x26/0x9b
[14834.114792]
[14834.115139] Freed by task 12636:
[14834.115504]  kfree+0xe2/0x154
[14834.115866]  ppp_release+0x11b/0x12a [ppp_generic]
[14834.116240]  __fput+0x342/0x5ba
[14834.116611]  task_work_run+0x15d/0x198
[14834.116973]  exit_to_usermode_loop+0xc7/0x153
[14834.117320]  do_syscall_64+0x53d/0x594
[14834.117694]  entry_SYSCALL_64_after_hwframe+0x26/0x9b
[14834.118067]
[14834.118426] The buggy address belongs to the object at 
8803dbeb8480

[14834.119087] The buggy address is located 480 bytes inside of
[14834.119755] The buggy address belongs to the page:
[14834.120138] page:ea000f6fae00 count:1 mapcount:0 mapping: 
 (null) index:0x8803dbebd580 compound_mapcount: 0

[14834.120817] flags: 0x17ffe0008100(slab|head)
[14834.121171] raw: 17ffe0008100  8803dbebd580 
0001001c001b
[14834.121800] raw: ea000d718020 ea000d32d620 8803f080ee80 


[14834.122415] page dumped because: kasan: bad access detected
[14834.122787]
[14834.123140] Memory state around the buggy address:
[14834.123503]  8803dbeb8500: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[14834.124150]  8803dbeb8580: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[14834.124806] >8803dbeb8600: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb

[14834.125467]^
[14834.125848]  8803dbeb8680: fb fb fb fb fb 

Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-20 Thread Denys Fedoryshchenko

On 2018-02-16 20:48, Guillaume Nault wrote:

On Fri, Feb 16, 2018 at 01:13:18PM +0200, Denys Fedoryshchenko wrote:

On 2018-02-15 21:42, Guillaume Nault wrote:
> On Thu, Feb 15, 2018 at 09:34:42PM +0200, Denys Fedoryshchenko wrote:
> > On 2018-02-15 21:31, Guillaume Nault wrote:
> > > On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote:
> > > > On 2018-02-15 17:55, Guillaume Nault wrote:
> > > > > On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote:
> > > > > > Here we go:
> > > > > >
> > > > > >   [24558.921549]
> > > > > > ==
> > > > > >   [24558.922167] BUG: KASAN: use-after-free in
> > > > > > ppp_ioctl+0xa6a/0x1522
> > > > > > [ppp_generic]
> > > > > >   [24558.922776] Write of size 8 at addr 8803d35bf3f8 by 
task
> > > > > > accel-pppd/12622
> > > > > >   [24558.923113]
> > > > > >   [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G
> > > > > > W
> > > > > > 4.15.3-build-0134 #1
> > > > > >   [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2,
> > > > > > BIOS P80
> > > > > > 04/02/2015
> > > > > >   [24558.924406] Call Trace:
> > > > > >   [24558.924753]  dump_stack+0x46/0x59
> > > > > >   [24558.925103]  print_address_description+0x6b/0x23b
> > > > > >   [24558.925451]  ? ppp_ioctl+0xa6a/0x1522 [ppp_generic]
> > > > > >   [24558.925797]  kasan_report+0x21b/0x241
> > > > > >   [24558.926136]  ppp_ioctl+0xa6a/0x1522 [ppp_generic]
> > > > > >   [24558.926479]  ? ppp_nl_newlink+0x1da/0x1da [ppp_generic]
> > > > > >   [24558.926829]  ? sock_sendmsg+0x89/0x99
> > > > > >   [24558.927176]  ? __vfs_write+0xd9/0x4ad
> > > > > >   [24558.927523]  ? kernel_read+0xed/0xed
> > > > > >   [24558.927872]  ? SyS_getpeername+0x18c/0x18c
> > > > > >   [24558.928213]  ? bit_waitqueue+0x2a/0x2a
> > > > > >   [24558.928561]  ? wake_atomic_t_function+0x115/0x115
> > > > > >   [24558.928898]  vfs_ioctl+0x6e/0x81
> > > > > >   [24558.929228]  do_vfs_ioctl+0xa00/0xb10
> > > > > >   [24558.929571]  ? sigprocmask+0x1a6/0x1d0
> > > > > >   [24558.929907]  ? sigsuspend+0x13e/0x13e
> > > > > >   [24558.930239]  ? ioctl_preallocate+0x14e/0x14e
> > > > > >   [24558.930568]  ? SyS_rt_sigprocmask+0xf1/0x142
> > > > > >   [24558.930904]  ? sigprocmask+0x1d0/0x1d0
> > > > > >   [24558.931252]  SyS_ioctl+0x39/0x55
> > > > > >   [24558.931595]  ? do_vfs_ioctl+0xb10/0xb10
> > > > > >   [24558.931942]  do_syscall_64+0x1b1/0x31f
> > > > > >   [24558.932288]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> > > > > >   [24558.932627] RIP: 0033:0x7f302849d8a7
> > > > > >   [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206
> > > > > > ORIG_RAX:
> > > > > > 0010
> > > > > >   [24558.933578] RAX: ffda RBX: 7f3027d861e3 
RCX:
> > > > > > 7f302849d8a7
> > > > > >   [24558.933927] RDX: 7f3023f49468 RSI: 4004743a 
RDI:
> > > > > > 3a67
> > > > > >   [24558.934266] RBP: 7f3029a52b20 R08:  
R09:
> > > > > > 55c8308d8e40
> > > > > >   [24558.934607] R10: 0008 R11: 0206 
R12:
> > > > > > 7f3023f49358
> > > > > >   [24558.934947] R13: 7ffe86e5723f R14:  
R15:
> > > > > > 7f3029a53700
> > > > > >   [24558.935288]
> > > > > >   [24558.935626] Allocated by task 12622:
> > > > > >   [24558.935972]  ppp_register_net_channel+0x5f/0x5c6
> > > > > > [ppp_generic]
> > > > > >   [24558.936306]  pppoe_connect+0xab7/0xc71 [pppoe]
> > > > > >   [24558.936640]  SyS_connect+0x14b/0x1b7
> > > > > >   [24558.936975]  do_syscall_64+0x1b1/0x31f
> > > > > >   [24558.937319]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> > > > > >   [24558.937655]
> > > > > >   [24558.937993] Freed by task 12622:
> > > > > >   [24558.938321]  kfree+

a lot of WARNING, nf_xfrm_me_harder in 4.15.x

2018-02-18 Thread Denys Fedoryshchenko

Is there an actual bug behind this, or is it just some sort of log spam?
I ask because I am troubleshooting a "hard to catch" bug in ppp/pppoe at 
the same time.

Workload: PPPoE BRAS.
I am going to try the latest stable 4.14.x in 1-2 days as well, but I 
think I noticed this message appearing there too, under some 
conditions.


[   49.784216] WARNING: CPU: 4 PID: 0 at ./include/net/dst.h:256 
nf_xfrm_me_harder+0x12d/0x2d7 [nf_nat]
[   49.784847] Modules linked in: pppoe pppox ppp_generic slhc 
netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre 
nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 
xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 
xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net 
ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
[   49.786762] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 
4.15.4-build-0134 #2
[   49.787104] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 
04/02/2015

[   49.787448] RIP: 0010:nf_xfrm_me_harder+0x12d/0x2d7 [nf_nat]
[   49.787782] RSP: 0018:8803f23078e0 EFLAGS: 00010246
[   49.788114] RAX:  RBX: 8803d8acad00 RCX: 
11007a875b00
[   49.788463] RDX: 11007b159500 RSI:  RDI: 
8803d8acad48
[   49.788818] RBP: 8803d43ada40 R08: ed007e460f27 R09: 
8803f2307900
[   49.789175] R10: ed007e460f26 R11: 0001 R12: 
11007e460f1c
[   49.789528] R13: 8803d43ada98 R14: 83e2b600 R15: 
8803d8acad80
[   49.789881] FS:  () GS:8803f230() 
knlGS:

[   49.790500] CS:  0010 DS:  ES:  CR0: 80050033
[   49.790850] CR2: 7f758e3aa490 CR3: 000445a0d001 CR4: 
001606e0

[   49.791192] Call Trace:
[   49.791517]  
[   49.791845]  ? __nf_nat_decode_session+0x108/0x108 [nf_nat]
[   49.792180]  ? nf_nat_ipv4_fn+0x33d/0x4df [nf_nat_ipv4]
[   49.792515]  ? iptable_nat_ipv4_fn+0xc/0xc [iptable_nat]
[   49.792849]  nf_nat_ipv4_out+0x235/0x305 [nf_nat_ipv4]
[   49.793183]  ? iptable_nat_ipv4_local_fn+0xc/0xc [iptable_nat]
[   49.793519]  nf_hook_slow+0xb1/0x11b
[   49.793850]  ip_output+0x205/0x243
[   49.794180]  ? ip_mc_output+0x548/0x548
[   49.794508]  ? ip_fragment.constprop.5+0x197/0x197
[   49.794841]  ? iptable_filter_net_init+0x1a/0x1a [iptable_filter]
[   49.795173]  ? nf_hook_slow+0xb1/0x11b
[   49.795504]  ip_forward+0xe9c/0xecb
[   49.795836]  ? ip_forward_finish+0x110/0x110
[   49.796166]  ? ip_frag_mem+0x3d/0x3d
[   49.796493]  ? ip_rcv_finish+0xcf8/0xd91
[   49.796830]  ip_rcv+0x985/0xa12
[   49.797178]  ? ip_local_deliver+0x225/0x225
[   49.797536]  ? ip_local_deliver_finish+0x599/0x599
[   49.797893]  ? ip_local_deliver+0x225/0x225
[   49.798254]  __netif_receive_skb_core+0x10ce/0x1c76
[   49.798613]  ? netif_set_xps_queue+0xbdb/0xbdb
[   49.798972]  ? process_backlog+0x1c5/0x3c0
[   49.799323]  process_backlog+0x1c5/0x3c0
[   49.799674]  net_rx_action+0x3aa/0x840
[   49.800026]  ? napi_complete_done+0x22b/0x22b
[   49.800378]  ? __tick_nohz_idle_enter+0x42b/0x9b3
[   49.800733]  ? get_cpu_iowait_time_us+0x16f/0x16f
[   49.801084]  __do_softirq+0x17f/0x34a
[   49.801411]  ? flush_smp_call_function_queue+0x16a/0x229
[   49.801750]  irq_exit+0x8f/0xf9
[   49.802080]  call_function_single_interrupt+0x92/0xa0
[   49.802420]  
[   49.802765] RIP: 0010:mwait_idle+0x99/0xac
[   49.803106] RSP: 0018:8803f0317ef8 EFLAGS: 0246 ORIG_RAX: 
ff04
[   49.803709] RAX:  RBX: 8803f02e4240 RCX: 

[   49.804042] RDX: 11007e05c848 RSI:  RDI: 

[   49.804372] RBP: 8803f02e4240 R08: 55574f086bb0 R09: 
7f78bd996700
[   49.804705] R10: 8803f0317dd0 R11: 0293 R12: 

[   49.805038] R13: dc00 R14: ed007e05c848 R15: 
8803f02e4240

[   49.805373]  do_idle+0xe6/0x19a
[   49.805700]  cpu_startup_entry+0x18/0x1a
[   49.806033]  secondary_startup_64+0xa5/0xb0
[   49.806359] Code: e0 07 83 c0 03 38 d0 7c 0c 84 d2 74 08 4c 89 ff e8 
65 3f 26 e1 8b 83 80 00 00 00 85 c0 74 0c 8d 50 01 f0 41 0f b1 17 74 04 
eb f0 <0f> ff 48 8d 7d 18 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1

[   49.807242] ---[ end trace 2654a347942730c3 ]---
[   49.807580] dst_release: dst:24366567 refcnt:-1
[  164.894058] WARNING: CPU: 5 PID: 22617 at ./include/net/dst.h:256 
nf_xfrm_me_harder+0x12d/0x2d7 [nf_nat]
[  164.894686] Modules linked in: pppoe pppox ppp_generic slhc 
netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre 
nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 
xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 
xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net 
ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter 
iptable_nat nf_conntrack_ipv4 nf_def

Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-18 Thread Denys Fedoryshchenko

On 2018-02-16 20:48, Guillaume Nault wrote:

On Fri, Feb 16, 2018 at 01:13:18PM +0200, Denys Fedoryshchenko wrote:

On 2018-02-15 21:42, Guillaume Nault wrote:
> On Thu, Feb 15, 2018 at 09:34:42PM +0200, Denys Fedoryshchenko wrote:
> > On 2018-02-15 21:31, Guillaume Nault wrote:
> > > On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote:
> > > > On 2018-02-15 17:55, Guillaume Nault wrote:
> > > > > On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote:
> > > > > > Here we go:
> > > > > >
> > > > > >   [24558.921549]
> > > > > > ==
> > > > > >   [24558.922167] BUG: KASAN: use-after-free in
> > > > > > ppp_ioctl+0xa6a/0x1522
> > > > > > [ppp_generic]
> > > > > >   [24558.922776] Write of size 8 at addr 8803d35bf3f8 by 
task
> > > > > > accel-pppd/12622
> > > > > >   [24558.923113]
> > > > > >   [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G
> > > > > > W
> > > > > > 4.15.3-build-0134 #1
> > > > > >   [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2,
> > > > > > BIOS P80
> > > > > > 04/02/2015
> > > > > >   [24558.924406] Call Trace:
> > > > > >   [24558.924753]  dump_stack+0x46/0x59
> > > > > >   [24558.925103]  print_address_description+0x6b/0x23b
> > > > > >   [24558.925451]  ? ppp_ioctl+0xa6a/0x1522 [ppp_generic]
> > > > > >   [24558.925797]  kasan_report+0x21b/0x241
> > > > > >   [24558.926136]  ppp_ioctl+0xa6a/0x1522 [ppp_generic]
> > > > > >   [24558.926479]  ? ppp_nl_newlink+0x1da/0x1da [ppp_generic]
> > > > > >   [24558.926829]  ? sock_sendmsg+0x89/0x99
> > > > > >   [24558.927176]  ? __vfs_write+0xd9/0x4ad
> > > > > >   [24558.927523]  ? kernel_read+0xed/0xed
> > > > > >   [24558.927872]  ? SyS_getpeername+0x18c/0x18c
> > > > > >   [24558.928213]  ? bit_waitqueue+0x2a/0x2a
> > > > > >   [24558.928561]  ? wake_atomic_t_function+0x115/0x115
> > > > > >   [24558.928898]  vfs_ioctl+0x6e/0x81
> > > > > >   [24558.929228]  do_vfs_ioctl+0xa00/0xb10
> > > > > >   [24558.929571]  ? sigprocmask+0x1a6/0x1d0
> > > > > >   [24558.929907]  ? sigsuspend+0x13e/0x13e
> > > > > >   [24558.930239]  ? ioctl_preallocate+0x14e/0x14e
> > > > > >   [24558.930568]  ? SyS_rt_sigprocmask+0xf1/0x142
> > > > > >   [24558.930904]  ? sigprocmask+0x1d0/0x1d0
> > > > > >   [24558.931252]  SyS_ioctl+0x39/0x55
> > > > > >   [24558.931595]  ? do_vfs_ioctl+0xb10/0xb10
> > > > > >   [24558.931942]  do_syscall_64+0x1b1/0x31f
> > > > > >   [24558.932288]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> > > > > >   [24558.932627] RIP: 0033:0x7f302849d8a7
> > > > > >   [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206
> > > > > > ORIG_RAX:
> > > > > > 0010
> > > > > >   [24558.933578] RAX: ffda RBX: 7f3027d861e3 
RCX:
> > > > > > 7f302849d8a7
> > > > > >   [24558.933927] RDX: 7f3023f49468 RSI: 4004743a 
RDI:
> > > > > > 3a67
> > > > > >   [24558.934266] RBP: 7f3029a52b20 R08:  
R09:
> > > > > > 55c8308d8e40
> > > > > >   [24558.934607] R10: 0008 R11: 0206 
R12:
> > > > > > 7f3023f49358
> > > > > >   [24558.934947] R13: 7ffe86e5723f R14:  
R15:
> > > > > > 7f3029a53700
> > > > > >   [24558.935288]
> > > > > >   [24558.935626] Allocated by task 12622:
> > > > > >   [24558.935972]  ppp_register_net_channel+0x5f/0x5c6
> > > > > > [ppp_generic]
> > > > > >   [24558.936306]  pppoe_connect+0xab7/0xc71 [pppoe]
> > > > > >   [24558.936640]  SyS_connect+0x14b/0x1b7
> > > > > >   [24558.936975]  do_syscall_64+0x1b1/0x31f
> > > > > >   [24558.937319]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> > > > > >   [24558.937655]
> > > > > >   [24558.937993] Freed by task 12622:
> > > > > >   [24558.938321]  kfree+

Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-16 Thread Denys Fedoryshchenko

On 2018-02-15 21:42, Guillaume Nault wrote:

On Thu, Feb 15, 2018 at 09:34:42PM +0200, Denys Fedoryshchenko wrote:

On 2018-02-15 21:31, Guillaume Nault wrote:
> On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote:
> > On 2018-02-15 17:55, Guillaume Nault wrote:
> > > On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote:
> > > > Here we go:
> > > >
> > > >   [24558.921549]
> > > > ==
> > > >   [24558.922167] BUG: KASAN: use-after-free in
> > > > ppp_ioctl+0xa6a/0x1522
> > > > [ppp_generic]
> > > >   [24558.922776] Write of size 8 at addr 8803d35bf3f8 by task
> > > > accel-pppd/12622
> > > >   [24558.923113]
> > > >   [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G
> > > > W
> > > > 4.15.3-build-0134 #1
> > > >   [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2,
> > > > BIOS P80
> > > > 04/02/2015
> > > >   [24558.924406] Call Trace:
> > > >   [24558.924753]  dump_stack+0x46/0x59
> > > >   [24558.925103]  print_address_description+0x6b/0x23b
> > > >   [24558.925451]  ? ppp_ioctl+0xa6a/0x1522 [ppp_generic]
> > > >   [24558.925797]  kasan_report+0x21b/0x241
> > > >   [24558.926136]  ppp_ioctl+0xa6a/0x1522 [ppp_generic]
> > > >   [24558.926479]  ? ppp_nl_newlink+0x1da/0x1da [ppp_generic]
> > > >   [24558.926829]  ? sock_sendmsg+0x89/0x99
> > > >   [24558.927176]  ? __vfs_write+0xd9/0x4ad
> > > >   [24558.927523]  ? kernel_read+0xed/0xed
> > > >   [24558.927872]  ? SyS_getpeername+0x18c/0x18c
> > > >   [24558.928213]  ? bit_waitqueue+0x2a/0x2a
> > > >   [24558.928561]  ? wake_atomic_t_function+0x115/0x115
> > > >   [24558.928898]  vfs_ioctl+0x6e/0x81
> > > >   [24558.929228]  do_vfs_ioctl+0xa00/0xb10
> > > >   [24558.929571]  ? sigprocmask+0x1a6/0x1d0
> > > >   [24558.929907]  ? sigsuspend+0x13e/0x13e
> > > >   [24558.930239]  ? ioctl_preallocate+0x14e/0x14e
> > > >   [24558.930568]  ? SyS_rt_sigprocmask+0xf1/0x142
> > > >   [24558.930904]  ? sigprocmask+0x1d0/0x1d0
> > > >   [24558.931252]  SyS_ioctl+0x39/0x55
> > > >   [24558.931595]  ? do_vfs_ioctl+0xb10/0xb10
> > > >   [24558.931942]  do_syscall_64+0x1b1/0x31f
> > > >   [24558.932288]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> > > >   [24558.932627] RIP: 0033:0x7f302849d8a7
> > > >   [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206
> > > > ORIG_RAX:
> > > > 0010
> > > >   [24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX:
> > > > 7f302849d8a7
> > > >   [24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI:
> > > > 3a67
> > > >   [24558.934266] RBP: 7f3029a52b20 R08:  R09:
> > > > 55c8308d8e40
> > > >   [24558.934607] R10: 0008 R11: 0206 R12:
> > > > 7f3023f49358
> > > >   [24558.934947] R13: 7ffe86e5723f R14:  R15:
> > > > 7f3029a53700
> > > >   [24558.935288]
> > > >   [24558.935626] Allocated by task 12622:
> > > >   [24558.935972]  ppp_register_net_channel+0x5f/0x5c6
> > > > [ppp_generic]
> > > >   [24558.936306]  pppoe_connect+0xab7/0xc71 [pppoe]
> > > >   [24558.936640]  SyS_connect+0x14b/0x1b7
> > > >   [24558.936975]  do_syscall_64+0x1b1/0x31f
> > > >   [24558.937319]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> > > >   [24558.937655]
> > > >   [24558.937993] Freed by task 12622:
> > > >   [24558.938321]  kfree+0xb0/0x11d
> > > >   [24558.938658]  ppp_release+0x111/0x120 [ppp_generic]
> > > >   [24558.938994]  __fput+0x2ba/0x51a
> > > >   [24558.939332]  task_work_run+0x11c/0x13d
> > > >   [24558.939676]  exit_to_usermode_loop+0x7c/0xaf
> > > >   [24558.940022]  do_syscall_64+0x2ea/0x31f
> > > >   [24558.940368]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> > > >   [24558.947099]
> > >
> > > Your first guess was right. It looks like we have an issue with
> > > reference counting on the channels. Can you send me your ppp_generic.o?
> > http://nuclearcat.com/ppp_generic.o
> > Compiled with gcc version 6.4.0 (Gentoo 6.4.0-r1 p1.3)
> >
> From what I can see, ppp_release() and ioct

Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-15 Thread Denys Fedoryshchenko

On 2018-02-15 21:31, Guillaume Nault wrote:

On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote:

On 2018-02-15 17:55, Guillaume Nault wrote:
> On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote:
> > Here we go:
> >
> >   [24558.921549]
> > ==
> >   [24558.922167] BUG: KASAN: use-after-free in
> > ppp_ioctl+0xa6a/0x1522
> > [ppp_generic]
> >   [24558.922776] Write of size 8 at addr 8803d35bf3f8 by task
> > accel-pppd/12622
> >   [24558.923113]
> >   [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G
> > W
> > 4.15.3-build-0134 #1
> >   [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2,
> > BIOS P80
> > 04/02/2015
> >   [24558.924406] Call Trace:
> >   [24558.924753]  dump_stack+0x46/0x59
> >   [24558.925103]  print_address_description+0x6b/0x23b
> >   [24558.925451]  ? ppp_ioctl+0xa6a/0x1522 [ppp_generic]
> >   [24558.925797]  kasan_report+0x21b/0x241
> >   [24558.926136]  ppp_ioctl+0xa6a/0x1522 [ppp_generic]
> >   [24558.926479]  ? ppp_nl_newlink+0x1da/0x1da [ppp_generic]
> >   [24558.926829]  ? sock_sendmsg+0x89/0x99
> >   [24558.927176]  ? __vfs_write+0xd9/0x4ad
> >   [24558.927523]  ? kernel_read+0xed/0xed
> >   [24558.927872]  ? SyS_getpeername+0x18c/0x18c
> >   [24558.928213]  ? bit_waitqueue+0x2a/0x2a
> >   [24558.928561]  ? wake_atomic_t_function+0x115/0x115
> >   [24558.928898]  vfs_ioctl+0x6e/0x81
> >   [24558.929228]  do_vfs_ioctl+0xa00/0xb10
> >   [24558.929571]  ? sigprocmask+0x1a6/0x1d0
> >   [24558.929907]  ? sigsuspend+0x13e/0x13e
> >   [24558.930239]  ? ioctl_preallocate+0x14e/0x14e
> >   [24558.930568]  ? SyS_rt_sigprocmask+0xf1/0x142
> >   [24558.930904]  ? sigprocmask+0x1d0/0x1d0
> >   [24558.931252]  SyS_ioctl+0x39/0x55
> >   [24558.931595]  ? do_vfs_ioctl+0xb10/0xb10
> >   [24558.931942]  do_syscall_64+0x1b1/0x31f
> >   [24558.932288]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> >   [24558.932627] RIP: 0033:0x7f302849d8a7
> >   [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206
> > ORIG_RAX:
> > 0010
> >   [24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX:
> > 7f302849d8a7
> >   [24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI:
> > 3a67
> >   [24558.934266] RBP: 7f3029a52b20 R08:  R09:
> > 55c8308d8e40
> >   [24558.934607] R10: 0008 R11: 0206 R12:
> > 7f3023f49358
> >   [24558.934947] R13: 7ffe86e5723f R14:  R15:
> > 7f3029a53700
> >   [24558.935288]
> >   [24558.935626] Allocated by task 12622:
> >   [24558.935972]  ppp_register_net_channel+0x5f/0x5c6
> > [ppp_generic]
> >   [24558.936306]  pppoe_connect+0xab7/0xc71 [pppoe]
> >   [24558.936640]  SyS_connect+0x14b/0x1b7
> >   [24558.936975]  do_syscall_64+0x1b1/0x31f
> >   [24558.937319]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> >   [24558.937655]
> >   [24558.937993] Freed by task 12622:
> >   [24558.938321]  kfree+0xb0/0x11d
> >   [24558.938658]  ppp_release+0x111/0x120 [ppp_generic]
> >   [24558.938994]  __fput+0x2ba/0x51a
> >   [24558.939332]  task_work_run+0x11c/0x13d
> >   [24558.939676]  exit_to_usermode_loop+0x7c/0xaf
> >   [24558.940022]  do_syscall_64+0x2ea/0x31f
> >   [24558.940368]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> >   [24558.947099]
>
> Your first guess was right. It looks like we have an issue with
> reference counting on the channels. Can you send me your ppp_generic.o?
http://nuclearcat.com/ppp_generic.o
Compiled with gcc version 6.4.0 (Gentoo 6.4.0-r1 p1.3)


From what I can see, ppp_release() and ioctl(PPPIOCCONNECT) are called
concurrently on the same ppp_file. Even if this ppp_file was pointed at
by two different file descriptors, I can't see how this could defeat
the reference counting mechanism. I'm going to think more about it.

Can you test with CONFIG_REFCOUNT_FULL? (and keep
d780cd44e3ce ("drivers, net, ppp: convert ppp_file.refcnt from
atomic_t to refcount_t")).
OK, I will try that tonight. On a vanilla kernel, or with the patch mentioned 
in the previous email reverted?
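
The reference-counting concern discussed above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct obj_ref`, `ref_init()`, `ref_get_not_zero()` and `ref_put_and_test()` are hypothetical names invented for this example, and the code is not the kernel's refcount_t implementation. It shows the general rule that a counter which has reached zero must never be incremented again, because the object may already have been freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a reference count (not kernel code). */
struct obj_ref {
    atomic_int count;
};

/* A freshly created object starts with one reference held by its creator. */
static void ref_init(struct obj_ref *r)
{
    atomic_init(&r->count, 1);
}

/*
 * Take a reference only while the object is still alive (count > 0).
 * Returns false if the count already dropped to zero, i.e. the object
 * is being (or has been) freed and must not be touched.
 */
static bool ref_get_not_zero(struct obj_ref *r)
{
    int old = atomic_load(&r->count);

    while (old > 0) {
        /* On failure, 'old' is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
            return true;
    }
    return false;
}

/*
 * Drop a reference. Returns true when the caller dropped the last
 * reference and is therefore responsible for freeing the object.
 */
static bool ref_put_and_test(struct obj_ref *r)
{
    int old = atomic_fetch_sub(&r->count, 1);

    assert(old > 0); /* dropping below zero would be an underflow bug */
    return old == 1;
}
```

In this model, a release path would call `ref_put_and_test()` and free the object only on a `true` result, while a concurrent user would first call `ref_get_not_zero()` and back off on `false`. A use-after-free like the one KASAN reported is what one would expect if some path used the object without holding its own reference.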


Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-15 Thread Denys Fedoryshchenko

On 2018-02-15 17:55, Guillaume Nault wrote:

On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote:

Here we go:

  [24558.921549]
==
  [24558.922167] BUG: KASAN: use-after-free in 
ppp_ioctl+0xa6a/0x1522

[ppp_generic]
  [24558.922776] Write of size 8 at addr 8803d35bf3f8 by task
accel-pppd/12622
  [24558.923113]
  [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G
W

4.15.3-build-0134 #1
  [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS 
P80

04/02/2015
  [24558.924406] Call Trace:
  [24558.924753]  dump_stack+0x46/0x59
  [24558.925103]  print_address_description+0x6b/0x23b
  [24558.925451]  ? ppp_ioctl+0xa6a/0x1522 [ppp_generic]
  [24558.925797]  kasan_report+0x21b/0x241
  [24558.926136]  ppp_ioctl+0xa6a/0x1522 [ppp_generic]
  [24558.926479]  ? ppp_nl_newlink+0x1da/0x1da [ppp_generic]
  [24558.926829]  ? sock_sendmsg+0x89/0x99
  [24558.927176]  ? __vfs_write+0xd9/0x4ad
  [24558.927523]  ? kernel_read+0xed/0xed
  [24558.927872]  ? SyS_getpeername+0x18c/0x18c
  [24558.928213]  ? bit_waitqueue+0x2a/0x2a
  [24558.928561]  ? wake_atomic_t_function+0x115/0x115
  [24558.928898]  vfs_ioctl+0x6e/0x81
  [24558.929228]  do_vfs_ioctl+0xa00/0xb10
  [24558.929571]  ? sigprocmask+0x1a6/0x1d0
  [24558.929907]  ? sigsuspend+0x13e/0x13e
  [24558.930239]  ? ioctl_preallocate+0x14e/0x14e
  [24558.930568]  ? SyS_rt_sigprocmask+0xf1/0x142
  [24558.930904]  ? sigprocmask+0x1d0/0x1d0
  [24558.931252]  SyS_ioctl+0x39/0x55
  [24558.931595]  ? do_vfs_ioctl+0xb10/0xb10
  [24558.931942]  do_syscall_64+0x1b1/0x31f
  [24558.932288]  entry_SYSCALL_64_after_hwframe+0x21/0x86
  [24558.932627] RIP: 0033:0x7f302849d8a7
  [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206 
ORIG_RAX:

0010
  [24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX:
7f302849d8a7
  [24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI:
3a67
  [24558.934266] RBP: 7f3029a52b20 R08:  R09:
55c8308d8e40
  [24558.934607] R10: 0008 R11: 0206 R12:
7f3023f49358
  [24558.934947] R13: 7ffe86e5723f R14:  R15:
7f3029a53700
  [24558.935288]
  [24558.935626] Allocated by task 12622:
  [24558.935972]  ppp_register_net_channel+0x5f/0x5c6 
[ppp_generic]

  [24558.936306]  pppoe_connect+0xab7/0xc71 [pppoe]
  [24558.936640]  SyS_connect+0x14b/0x1b7
  [24558.936975]  do_syscall_64+0x1b1/0x31f
  [24558.937319]  entry_SYSCALL_64_after_hwframe+0x21/0x86
  [24558.937655]
  [24558.937993] Freed by task 12622:
  [24558.938321]  kfree+0xb0/0x11d
  [24558.938658]  ppp_release+0x111/0x120 [ppp_generic]
  [24558.938994]  __fput+0x2ba/0x51a
  [24558.939332]  task_work_run+0x11c/0x13d
  [24558.939676]  exit_to_usermode_loop+0x7c/0xaf
  [24558.940022]  do_syscall_64+0x2ea/0x31f
  [24558.940368]  entry_SYSCALL_64_after_hwframe+0x21/0x86
  [24558.947099]


Your first guess was right. It looks like we have an issue with
reference counting on the channels. Can you send me your ppp_generic.o?

http://nuclearcat.com/ppp_generic.o
Compiled with gcc version 6.4.0 (Gentoo 6.4.0-r1 p1.3)


Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-15 Thread Denys Fedoryshchenko

On 2018-02-14 19:25, Guillaume Nault wrote:

On Wed, Feb 14, 2018 at 06:49:19PM +0200, Denys Fedoryshchenko wrote:

On 2018-02-14 18:47, Guillaume Nault wrote:
> On Wed, Feb 14, 2018 at 06:29:34PM +0200, Denys Fedoryshchenko wrote:
> > On 2018-02-14 18:07, Guillaume Nault wrote:
> > > On Wed, Feb 14, 2018 at 03:17:23PM +0200, Denys Fedoryshchenko wrote:
> > > > Hi,
> > > >
> > > > Upgraded kernel to 4.15.3, still it crashes after while (several
> > > > hours,
> > > > cannot do bisect, as it is production server).
> > > >
> > > > dev ppp # gdb ppp_generic.o
> > > > GNU gdb (Gentoo 7.12.1 vanilla) 7.12.1
> > > > <>
> > > > Reading symbols from ppp_generic.o...done.
> > > > (gdb) list *ppp_push+0x73
> > > > 0x681 is in ppp_push (drivers/net/ppp/ppp_generic.c:1663).
> > > > 1658  list = list->next;
> > > > 1659  pch = list_entry(list, struct channel, clist);
> > > > 1660
> > > > 1661  spin_lock(&pch->downl);
> > > > 1662  if (pch->chan) {
> > > > 1663      if (pch->chan->ops->start_xmit(pch->chan, skb))
> > > > 1664          ppp->xmit_pending = NULL;
> > > > 1665  } else {
> > > > 1666      /* channel got unregistered */
> > > > 1667      kfree_skb(skb);
> > > >
> > > >
> > > I expect a memory corruption. Do you have the possibility to run with
> > > KASAN by any chance?
> > I will try to enable it tonight. For now i reverted "drivers, net,
> > ppp:
> > convert ppp_file.refcnt from atomic_t to refcount_t" for test.
> >
> This commit looks good to me. Do you have doubts about it because it's
> new in 4.15? Does it mean that your last known-good kernel is 4.14?

I am just doing a "manual" bisect: checking all possibilities and picking
a patch to revert at random.


Must be a painful process. Are all of your networking modules required?
With luck, you might be able to isolate a faulty module in fewer steps.


Yes, correct, my known-good is 4.14.2.


Good to know.

Let me know if you can get a KASAN trace.

Here we go:

  [24558.921549] 
==
  [24558.922167] BUG: KASAN: use-after-free in 
ppp_ioctl+0xa6a/0x1522 [ppp_generic]
  [24558.922776] Write of size 8 at addr 8803d35bf3f8 by task 
accel-pppd/12622

  [24558.923113]
  [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G  
  W4.15.3-build-0134 #1
  [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS 
P80 04/02/2015

  [24558.924406] Call Trace:
  [24558.924753]  dump_stack+0x46/0x59
  [24558.925103]  print_address_description+0x6b/0x23b
  [24558.925451]  ? ppp_ioctl+0xa6a/0x1522 [ppp_generic]
  [24558.925797]  kasan_report+0x21b/0x241
  [24558.926136]  ppp_ioctl+0xa6a/0x1522 [ppp_generic]
  [24558.926479]  ? ppp_nl_newlink+0x1da/0x1da [ppp_generic]
  [24558.926829]  ? sock_sendmsg+0x89/0x99
  [24558.927176]  ? __vfs_write+0xd9/0x4ad
  [24558.927523]  ? kernel_read+0xed/0xed
  [24558.927872]  ? SyS_getpeername+0x18c/0x18c
  [24558.928213]  ? bit_waitqueue+0x2a/0x2a
  [24558.928561]  ? wake_atomic_t_function+0x115/0x115
  [24558.928898]  vfs_ioctl+0x6e/0x81
  [24558.929228]  do_vfs_ioctl+0xa00/0xb10
  [24558.929571]  ? sigprocmask+0x1a6/0x1d0
  [24558.929907]  ? sigsuspend+0x13e/0x13e
  [24558.930239]  ? ioctl_preallocate+0x14e/0x14e
  [24558.930568]  ? SyS_rt_sigprocmask+0xf1/0x142
  [24558.930904]  ? sigprocmask+0x1d0/0x1d0
  [24558.931252]  SyS_ioctl+0x39/0x55
  [24558.931595]  ? do_vfs_ioctl+0xb10/0xb10
  [24558.931942]  do_syscall_64+0x1b1/0x31f
  [24558.932288]  entry_SYSCALL_64_after_hwframe+0x21/0x86
  [24558.932627] RIP: 0033:0x7f302849d8a7
  [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206 
ORIG_RAX: 0010
  [24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX: 
7f302849d8a7
  [24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI: 
3a67
  [24558.934266] RBP: 7f3029a52b20 R08:  R09: 
55c8308d8e40
  [24558.934607] R10: 0008 R11: 0206 R12: 
7f3023f49358
  [24558.934947] R13: 7ffe86e5723f R14:  R15: 
7f3029a53700

  [24558.935288]
  [24558.935626] Allocated by task 12622:
  [24558.935972]  ppp_register_net_channel+0x5f/0x5c6 [ppp_generic]
  [24558.936306]  pppoe_connect+0xab7/0xc71 [pppoe]
  [24558.936640]  SyS_connect+0x14b/0x1b7
  [24558.936975]  do_syscall_64+0x1b1/0x31f
  [24558.937319]  entry_S

Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-14 Thread Denys Fedoryshchenko

On 2018-02-14 18:47, Guillaume Nault wrote:

On Wed, Feb 14, 2018 at 06:29:34PM +0200, Denys Fedoryshchenko wrote:

On 2018-02-14 18:07, Guillaume Nault wrote:
> On Wed, Feb 14, 2018 at 03:17:23PM +0200, Denys Fedoryshchenko wrote:
> > Hi,
> >
> > Upgraded kernel to 4.15.3, still it crashes after while (several
> > hours,
> > cannot do bisect, as it is production server).
> >
> > dev ppp # gdb ppp_generic.o
> > GNU gdb (Gentoo 7.12.1 vanilla) 7.12.1
> > <>
> > Reading symbols from ppp_generic.o...done.
> > (gdb) list *ppp_push+0x73
> > 0x681 is in ppp_push (drivers/net/ppp/ppp_generic.c:1663).
> > 1658  list = list->next;
> > 1659  pch = list_entry(list, struct channel, clist);
> > 1660
> > 1661  spin_lock(&pch->downl);
> > 1662  if (pch->chan) {
> > 1663  if (pch->chan->ops->start_xmit(pch->chan, skb))
> > 1664  ppp->xmit_pending = NULL;
> > 1665  } else {
> > 1666  /* channel got unregistered */
> > 1667  kfree_skb(skb);
> >
> >
> I expect a memory corruption. Do you have the possibility to run with
> KASAN by any chance?
I will try to enable it tonight. For now I have reverted "drivers, net, ppp:
convert ppp_file.refcnt from atomic_t to refcount_t" as a test.


This commit looks good to me. Do you have doubts about it because it's
new in 4.15? Does it mean that your last known-good kernel is 4.14?


I am just doing a "manual" bisect: checking all possibilities and picking 
a patch to revert at random.

Yes, correct, my known-good is 4.14.2.


Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-14 Thread Denys Fedoryshchenko

On 2018-02-14 18:07, Guillaume Nault wrote:

On Wed, Feb 14, 2018 at 03:17:23PM +0200, Denys Fedoryshchenko wrote:

Hi,

I upgraded the kernel to 4.15.3, but it still crashes after a while (several
hours; I cannot bisect, as it is a production server).

dev ppp # gdb ppp_generic.o
GNU gdb (Gentoo 7.12.1 vanilla) 7.12.1
<>
Reading symbols from ppp_generic.o...done.
(gdb) list *ppp_push+0x73
0x681 is in ppp_push (drivers/net/ppp/ppp_generic.c:1663).
1658  list = list->next;
1659  pch = list_entry(list, struct channel, clist);
1660
1661  spin_lock(&pch->downl);
1662  if (pch->chan) {
1663      if (pch->chan->ops->start_xmit(pch->chan, skb))
1664          ppp->xmit_pending = NULL;
1665  } else {
1666      /* channel got unregistered */
1667      kfree_skb(skb);



I expect a memory corruption. Do you have the possibility to run with
KASAN by any chance?
I will try to enable it tonight. For now I have reverted "drivers, net, ppp: 
convert ppp_file.refcnt from atomic_t to refcount_t" as a test.


ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-14 Thread Denys Fedoryshchenko

Hi,

I upgraded the kernel to 4.15.3, but it still crashes after a while (several hours; 
I cannot bisect, as it is a production server).


dev ppp # gdb ppp_generic.o
GNU gdb (Gentoo 7.12.1 vanilla) 7.12.1
<>
Reading symbols from ppp_generic.o...done.
(gdb) list *ppp_push+0x73
0x681 is in ppp_push (drivers/net/ppp/ppp_generic.c:1663).
1658  list = list->next;
1659  pch = list_entry(list, struct channel, clist);
1660
1661  spin_lock(&pch->downl);
1662  if (pch->chan) {
1663      if (pch->chan->ops->start_xmit(pch->chan, skb))
1664          ppp->xmit_pending = NULL;
1665  } else {
1666      /* channel got unregistered */
1667      kfree_skb(skb);
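
A location such as ppp_push+0x73/0x4ec in the oops below reads as symbol, offset into the function, and total function size. gdb resolved ppp_push+0x73 to 0x681 in the object file, so ppp_push itself starts at 0x681 - 0x73 = 0x60e there. As a rough illustration only, the following sketch splits such a token into its three parts; `struct trace_loc` and `parse_trace_loc()` are hypothetical names made up for this example and are not part of gdb or the kernel.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one "symbol+0xOFFSET/0xSIZE" token. */
struct trace_loc {
    char sym[64];        /* function name, e.g. "ppp_push" */
    unsigned long off;   /* offset of the instruction inside the function */
    unsigned long size;  /* total size of the function in bytes */
};

/*
 * Parse a token such as "ppp_push+0x73/0x4ec".
 * Returns 0 on success, -1 if the token does not have all three parts.
 */
static int parse_trace_loc(const char *s, struct trace_loc *out)
{
    assert(s != NULL && out != NULL);

    /* %63[^+] reads the name up to '+'; %lx accepts an optional 0x prefix. */
    if (sscanf(s, "%63[^+]+%lx/%lx", out->sym, &out->off, &out->size) != 3)
        return -1;
    return 0;
}
```

Given the parsed offset, the same lookup as above can be repeated for any other frame with `list *symbol+offset` in gdb, provided the object file matches the running kernel build.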



Feb 14 08:32:00  [17937.863304] general protection fault:  [#1] 
SMP
Feb 14 08:32:00  [17937.863638] Modules linked in: pppoe pppox 
ppp_generic slhc netconsole configfs coretemp nf_nat_pptp 
nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE 
nf_dup_ipv4 xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT 
nf_reject_ipv4 xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp 
ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle 
iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
Feb 14 08:32:00  [17937.865619] CPU: 6 PID: 12543 Comm: accel-pppd 
Not tainted 4.15.3-build-0134 #4
Feb 14 08:32:00  [17937.866211] Hardware name: HP ProLiant DL320e 
Gen8 v2, BIOS P80 04/02/2015
Feb 14 08:32:00  [17937.866542] RIP: 0010:ppp_push+0x73/0x4ec 
[ppp_generic]
Feb 14 08:32:00  [17937.866865] RSP: 0018:c90001fa7d50 EFLAGS: 
00010282
Feb 14 08:32:00  [17937.867191] RAX: 0fd54d16ec03 RBX: 
8803eeb207b8 RCX: 0101
Feb 14 08:32:00  [17937.867517] RDX:  RSI: 
8803f9fb5000 RDI: 8803eed1e443
Feb 14 08:32:00  [17937.867844] RBP: 8803f9fb5000 R08: 
0001 R09: 
Feb 14 08:32:00  [17937.868171] R10: 7f0a75fba758 R11: 
0293 R12: 8021
Feb 14 08:32:00  [17937.868499] R13: 8804144c7880 R14: 
8021 R15: 8804144c7800
Feb 14 08:32:00  [17937.868824] FS:  7f0a7ecd8700() 
GS:88043418() knlGS:
Feb 14 08:32:00  [17937.869408] CS:  0010 DS:  ES:  CR0: 
80050033
Feb 14 08:32:00  [17937.869729] CR2: 7fa87a187978 CR3: 
00042a6cd005 CR4: 001606e0

Feb 14 08:32:00  [17937.870053] Call Trace:
Feb 14 08:32:00  [17937.870375]  ? 
__kmalloc_node_track_caller+0xb5/0xd6
Feb 14 08:32:00  [17937.870700]  __ppp_xmit_process+0x35/0x4c6 
[ppp_generic]
Feb 14 08:32:00  [17937.871025]  ppp_xmit_process+0x35/0x88 
[ppp_generic]

Feb 14 08:32:00  [17937.871350]  ppp_write+0xb1/0xbb [ppp_generic]
Feb 14 08:32:00  [17937.871678]  __vfs_write+0x1c/0x118
Feb 14 08:32:00  [17937.872003]  ? SyS_epoll_ctl+0x399/0x871
Feb 14 08:32:00  [17937.872328]  vfs_write+0xc6/0x169
Feb 14 08:32:00  [17937.872654]  SyS_write+0x48/0x81
Feb 14 08:32:00  [17937.872980]  do_syscall_64+0x5f/0xea
Feb 14 08:32:00  [17937.873310]  
entry_SYSCALL_64_after_hwframe+0x21/0x86

Feb 14 08:32:00  [17937.873638] RIP: 0033:0x7f0a7e4bfb2d
Feb 14 08:32:00  [17937.873963] RSP: 002b:7f0a7ecd7b00 EFLAGS: 
0293 ORIG_RAX: 0001
Feb 14 08:32:00  [17937.874554] RAX: ffda RBX: 
7f0a7d00b1e3 RCX: 7f0a7e4bfb2d
Feb 14 08:32:00  [17937.874881] RDX: 000c RSI: 
7f0a74175c80 RDI: 3ef8
Feb 14 08:32:00  [17937.875207] RBP: 7f0a7ecd7b30 R08: 
 R09: 55776e7a5e40
Feb 14 08:32:00  [17937.875536] R10: 7f0a75fba758 R11: 
0293 R12: 7f0a7550dd18
Feb 14 08:32:00  [17937.875863] R13: 7ffd4c941eaf R14: 
 R15: 7f0a7ecd8700
Feb 14 08:32:00  [17937.876190] Code: 94 00 00 00 49 89 ff 0f ba e0 
0a 72 43 48 8b 5f 68 48 8d 7b e8 e8 88 4f 84 e1 48 8b 7b b8 48 85 ff 74 
10 48 8b 47 08 48 8b 34 24  10 85 c0 75 0b eb 14 48 8b 3c 24 e8 d8 6c 
76 e1 49 c7 87 c8
Feb 14 08:32:00  [17937.877071] RIP: ppp_push+0x73/0x4ec 
[ppp_generic] RSP: c90001fa7d50
Feb 14 08:32:00  [17937.877435] ---[ end trace 30a3cc6a49109783 
]---
Feb 14 08:32:00  [17937.878370] Kernel panic - not syncing: Fatal 
exception in interrupt

Feb 14 08:32:00  [17937.878715] Kernel Offset: disabled
Feb 14 08:32:00  [17937.879771] Rebooting in 5 seconds..


4.15.2 kernel panic, nat, ppp bug?

2018-02-12 Thread Denys Fedoryshchenko

Hello,

I got this warning, and then the server rebooted with a panic (second message).
Workload: PPPoE BRAS, lots of shapers and ppp interfaces.

Please let me know if I need to provide more information.

Feb 12 06:00:58  [13750.606169] WARNING: CPU: 6 PID: 0 at 
./include/net/dst.h:256 nf_xfrm_me_harder+0x52/0xd9 [nf_nat]
Feb 12 06:00:58  [13750.606747] Modules linked in: pppoe pppox 
ppp_generic slhc netconsole configfs coretemp nf_nat_pptp 
nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE 
nf_dup_ipv4 xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT 
nf_reject_ipv4 xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp 
ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle 
iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
Feb 12 06:00:58  [13750.608695] CPU: 6 PID: 0 Comm: swapper/6 Not 
tainted 4.15.2-build-0134 #5
Feb 12 06:00:58  [13750.609017] Hardware name: HP ProLiant DL320e 
Gen8 v2, BIOS P80 04/02/2015
Feb 12 06:00:58  [13750.609345] RIP: 
0010:nf_xfrm_me_harder+0x52/0xd9 [nf_nat]
Feb 12 06:00:58  [13750.609667] RSP: 0018:880434183c88 EFLAGS: 
00010246
Feb 12 06:00:58  [13750.609985] RAX:  RBX: 
8803f997ce00 RCX: 
Feb 12 06:00:58  [13750.610306] RDX: 0001 RSI: 
880406c09a00 RDI: 880434183cc8
Feb 12 06:00:58  [13750.610629] RBP: 822a81c0 R08: 
0005 R09: 0001
Feb 12 06:00:58  [13750.610949] R10: 00ce R11: 
88043154c320 R12: 0001
Feb 12 06:00:58  [13750.611274] R13: 880434183d50 R14: 
00e20008 R15: 88042e320078
Feb 12 06:00:58  [13750.611599] FS:  () 
GS:88043418() knlGS:
Feb 12 06:00:58  [13750.612181] CS:  0010 DS:  ES:  CR0: 
80050033
Feb 12 06:00:58  [13750.612500] CR2: 7f12eed3a140 CR3: 
000446209003 CR4: 001606e0

Feb 12 06:00:58  [13750.612823] Call Trace:
Feb 12 06:00:58  [13750.613138]  
Feb 12 06:00:58  [13750.613457]  nf_nat_ipv4_out+0xa5/0xb9 
[nf_nat_ipv4]

Feb 12 06:00:58  [13750.613780]  nf_hook_slow+0x31/0x93
Feb 12 06:00:58  [13750.614101]  ip_output+0x93/0xaf
Feb 12 06:00:58  [13750.614417]  ? 
ip_fragment.constprop.5+0x6e/0x6e

Feb 12 06:00:58  [13750.614739]  ip_forward+0x36d/0x378
Feb 12 06:00:58  [13750.615057]  ? ip_frag_mem+0x7/0x7
Feb 12 06:00:58  [13750.615376]  ip_rcv+0x2f0/0x325
Feb 12 06:00:58  [13750.615698]  ? 
ip_local_deliver_finish+0x1a8/0x1a8
Feb 12 06:00:58  [13750.616019]  
__netif_receive_skb_core+0x535/0x8b5

Feb 12 06:00:58  [13750.616340]  ? kmem_cache_free_bulk+0x21b/0x233
Feb 12 06:00:58  [13750.616661]  ? process_backlog+0x99/0x115
Feb 12 06:00:58  [13750.616981]  process_backlog+0x99/0x115
Feb 12 06:00:58  [13750.617300]  net_rx_action+0x11c/0x28a
Feb 12 06:00:58  [13750.617620]  __do_softirq+0xc8/0x1bf
Feb 12 06:00:58  [13750.617941]  irq_exit+0x49/0x88
Feb 12 06:00:58  [13750.618262]  
call_function_single_interrupt+0x92/0xa0

Feb 12 06:00:58  [13750.618587]  
Feb 12 06:00:58  [13750.618907] RIP: 0010:mwait_idle+0x4c/0x5e
Feb 12 06:00:58  [13750.619227] RSP: 0018:c9000192bf08 EFLAGS: 
0246 ORIG_RAX: ff04
Feb 12 06:00:58  [13750.619803] RAX:  RBX: 
88043296cc80 RCX: 
Feb 12 06:00:58  [13750.620120] RDX:  RSI: 
 RDI: 
Feb 12 06:00:58  [13750.620436] RBP:  R08: 
00525ccae333e271 R09: 00023738
Feb 12 06:00:58  [13750.620750] R10: c9000192be98 R11: 
000236e0 R12: 
Feb 12 06:00:58  [13750.621065] R13: 0006 R14: 
88043296cc80 R15: 88043296cc80
Feb 12 06:00:58  [13750.621394]  ? 
rcu_eqs_enter_common.constprop.54+0x57/0x5f

Feb 12 06:00:58  [13750.621714]  do_idle+0xa8/0x130
Feb 12 06:00:58  [13750.622032]  cpu_startup_entry+0x18/0x1a
Feb 12 06:00:58  [13750.622349]  secondary_startup_64+0xa5/0xb0
Feb 12 06:00:58  [13750.622667] Code: 48 83 e6 fe 48 83 7e 48 00 74 
07 48 8b b6 80 01 00 00 8b 86 80 00 00 00 85 c0 74 0f 8d 50 01 f0 0f b1 
96 80 00 00 00 74 04 eb ed <0f> ff 48 8b 4b 18 48 8d 54 24 08 45 31 c0 
48 89 ef e8 44 91 8d
Feb 12 06:00:58  [13750.623533] ---[ end trace 807c68f3da1711db 
]---
Feb 12 06:00:58  [13750.623863] dst_release: dst:ad86ddff 
refcnt:-1



Feb 12 09:40:45  [26937.094365] WARNING: CPU: 5 PID: 0 at 
./include/net/dst.h:256 nf_xfrm_me_harder+0x52/0xd9 [nf_nat]
Feb 12 09:40:45  [26937.094958] Modules linked in: pppoe pppox 
ppp_generic slhc netconsole configfs coretemp nf_nat_pptp 
nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE 
nf_dup_ipv4
 xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 
xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net 
ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_

Re: e1000e hardware unit hangs

2018-01-24 Thread Denys Fedoryshchenko

On 2018-01-24 20:31, Ben Greear wrote:

On 01/24/2018 08:34 AM, Neftin, Sasha wrote:

On 1/24/2018 18:11, Alexander Duyck wrote:
On Tue, Jan 23, 2018 at 3:46 PM, Ben Greear wrote:

Hello,

Anyone have any more suggestions for making e1000e work better? This is
from a 4.9.65+ kernel,
with these additional e1000e patches applied:

e1000e: Fix error path in link detection
e1000e: Fix wrong comment related to link detection
e1000e: Fix return value test
e1000e: Separate signaling for link check/link up
e1000e: Avoid receiver overrun interrupt bursts


Most of these patches shouldn't address anything that would trigger Tx
hangs. They are mostly related to just link detection.

Test case is simply to run 3 tcp connections each trying to send 56Kbps
of bi-directional
data between a pair of e1000e interfaces :)

No OOM related issues are seen on this kernel...similar test on 4.13 showed
some OOM
issues, but I have not debugged that yet...


Really a question like this probably belongs on e1000-devel or
intel-wired-lan so I have added those lists and the e1000e maintainer
to the thread.

It would be useful if you could provide more information about the
device itself such as the ID and the kind of test you are running.
Keep in mind the e1000e driver supports a pretty broad swath of
devices so we need to narrow things down a bit.


Please also re-check whether your kernel includes:
e1000e: fix buffer overrun while the I219 is processing DMA transactions
e1000e: fix the use of magic numbers for buffer overrun issue
Also, where do you get the fresh kernel version from?


Hello,

I tried adding those two patches, but I still see this splat shortly
after starting
my test.  The kernel I am using is here:

https://github.com/greearb/linux-ct-4.13

I've seen similar issues at least back to the 4.0 kernel, including
stock kernels and my
own kernels with additional patches.

Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG:
eth2 (e1000e): transmit queue 0 timed out, trans_start: 4295298499,
wd-timeout: 5000 jiffies: 4295304192 tx-queues: 1
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: ------------[ cut here ]------------
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: WARNING: CPU: 0
PID: 0 at
/home/greearb/git/linux-4.13.dev.y/net/sched/sch_generic.c:322
dev_watchdog+0x228/0x250
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: Modules linked in:
nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c
cfg80211 macvlan wanlink(O) pktgen bnep bluetooth f...ss tpm_tis ip
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: CPU: 0 PID: 0
Comm: swapper/0 Tainted: G   O4.13.16+ #22
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: Hardware name:
Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 09/17/2012
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: task:
81e104c0 task.stack: 81e0
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RIP:
0010:dev_watchdog+0x228/0x250
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RSP:
0018:88042fc03e50 EFLAGS: 00010282
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RAX:
0086 RBX:  RCX: 
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RDX:
88042fc15b40 RSI: 88042fc0dbf8 RDI: 88042fc0dbf8
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RBP:
88042fc03e98 R08: 0001 R09: 03c4
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: R10:
 R11: 03c4 R12: 1388
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: R13:
000100050dc3 R14: 88041767 R15: 000100052400
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: FS:
() GS:88042fc0()
knlGS:
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: CS:  0010 DS: 
ES:  CR0: 80050033
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: CR2:
01d14000 CR3: 01e09000 CR4: 001406f0
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: Call Trace:
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  ? 
qdisc_rcu_free+0x40/0x40
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  
call_timer_fn+0x30/0x160
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  ? 
qdisc_rcu_free+0x40/0x40

Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:
run_timer_softirq+0x1f0/0x450
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  ?
lapic_next_deadline+0x21/0x30
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  ?
clockevents_program_event+0x78/0xf0
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  
__do_softirq+0xc1/0x2c0

Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  irq_exit+0xb1/0xc0
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:
smp_apic_timer_interrupt+0x38/0x50
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:
apic_timer_interrupt+0x89/0x90
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: 

Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]

2017-10-07 Thread Denys Fedoryshchenko

On 2017-10-07 15:09, SviMik wrote:


Unfortunately, netconsole has managed to send a kernel panic trace
only once, and it's not related to this bug. Looks like something
crashes really hard to make netconsole unusable.
In some cases I have had luck with pstore when netconsole failed me 
(especially with networking bugs). It stores panic messages more reliably, 
especially on recent platforms that have ERST and EFI.

https://www.kernel.org/doc/Documentation/ABI/testing/pstore
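
A minimal sketch of collecting whatever pstore kept after a crash, in case 
it helps. It assumes pstore is mounted at /sys/fs/pstore (the usual mount 
point) and that the records are readable as text. The function name 
read_pstore_records is made up for this example, and reading the directory 
normally needs root.

```python
import os


def read_pstore_records(pstore_dir="/sys/fs/pstore"):
    """Return {filename: text} for every regular file in pstore_dir.

    Assumption: pstore is mounted at pstore_dir. A missing or unreadable
    directory yields an empty dict instead of an exception.
    """
    records = {}
    try:
        names = sorted(os.listdir(pstore_dir))
    except OSError:
        return records
    for name in names:
        path = os.path.join(pstore_dir, name)
        if not os.path.isfile(path):
            continue
        try:
            # errors="replace" keeps going if a record holds binary debris.
            with open(path, "r", errors="replace") as fh:
                records[name] = fh.read()
        except OSError:
            continue
    return records


if __name__ == "__main__":
    for name, text in read_pstore_records().items():
        print("=== %s (%d chars) ===" % (name, len(text)))
        print(text)
```

Running this once after each reboot and saving the output elsewhere keeps 
the records from being lost when the backing store is cleared.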


Question about "prevent dst uses after free" and WARNING in nf_xfrm_me_harder / refcnt / 4.13.3

2017-10-02 Thread Denys Fedoryshchenko

Hi,

I'm now running 4.13.3; is this patch required for 4.13 as well?
(It doesn't apply cleanly, because in 4.13 tcp_prequeue uses 
skb_dst_force_safe, so I just renamed it there to skb_dst_force.)


This is what I get on a PPPoE BRAS on this kernel with the patch applied
(no idea if it's related to the patch; I'm just mentioning that I applied 
it, as it's not vanilla 4.13.3):


[ 7858.579600] ------------[ cut here ]------------
[ 7858.579818] WARNING: CPU: 2 PID: 0 at ./include/net/dst.h:254 
nf_xfrm_me_harder+0x61/0xec [nf_nat]
[ 7858.580160] Modules linked in: cls_fw act_police cls_u32 sch_ingress 
sch_htb pppoe pppox ppp_generic slhc netconsole configfs coretemp 
nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre 
tun xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT 
nf_reject_ipv4 xt_set ts_bm xt_string xt_connmark xt_DSCP xt_mark 
xt_tcpudp ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle 
iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
[ 7858.581255] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 
4.13.3-build-0133 #27
[ 7858.581456] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 
04/02/2015

[ 7858.581659] task: 880434e6a700 task.stack: c90001904000
[ 7858.581862] RIP: 0010:nf_xfrm_me_harder+0x61/0xec [nf_nat]
[ 7858.582061] RSP: 0018:880436483bc0 EFLAGS: 00010246
[ 7858.582259] RAX:  RBX: 822df000 RCX: 
8803ee9028ce
[ 7858.582461] RDX: 0014 RSI: 88041cd82900 RDI: 
880436483bf8
[ 7858.582661] RBP: 880436483c20 R08: 81e0b400 R09: 
b916
[ 7858.582865] R10: 8803ee9028e8 R11:  R12: 
880401e92100
[ 7858.583068] R13: 0001 R14: 822df000 R15: 
88042e280078
[ 7858.583269] FS:  () GS:88043648() 
knlGS:

[ 7858.583608] CS:  0010 DS:  ES:  CR0: 80050033
[ 7858.583809] CR2: 7f9b2886fc9c CR3: 000429223000 CR4: 
001406e0

[ 7858.584013] Call Trace:
[ 7858.584209]  
[ 7858.584408]  ? nf_nat_ipv4_fn+0x12e/0x189 [nf_nat_ipv4]
[ 7858.584605]  nf_nat_ipv4_out+0xb6/0xd3 [nf_nat_ipv4]
[ 7858.584807]  iptable_nat_ipv4_out+0x15/0x17 [iptable_nat]
[ 7858.585010]  nf_hook_slow+0x2a/0x9a
[ 7858.585209]  ip_output+0x96/0xb4
[ 7858.585410]  ? ip_fragment.constprop.5+0x7c/0x7c
[ 7858.585610]  ip_forward_finish+0x5b/0x60
[ 7858.585811]  ip_forward+0x36d/0x37a
[ 7858.586010]  ? ip_frag_mem+0x11/0x11
[ 7858.586207]  ip_rcv_finish+0x2f9/0x304
[ 7858.586406]  ip_rcv+0x32a/0x337
[ 7858.586604]  ? ip_local_deliver_finish+0x1bb/0x1bb
[ 7858.586808]  __netif_receive_skb_core+0x4f0/0x847
[ 7858.587009]  __netif_receive_skb+0x18/0x5a
[ 7858.587208]  ? __netif_receive_skb+0x18/0x5a
[ 7858.587407]  process_backlog+0xa4/0x127
[ 7858.587606]  net_rx_action+0x11e/0x2d8
[ 7858.587811]  ? sched_clock_cpu+0x15/0x9b
[ 7858.588013]  __do_softirq+0xe7/0x23a
[ 7858.588210]  irq_exit+0x52/0x93
[ 7858.588408]  smp_call_function_single_interrupt+0x33/0x35
[ 7858.588610]  call_function_single_interrupt+0x83/0x90
[ 7858.588811] RIP: 0010:mwait_idle+0x93/0x13c
[ 7858.589007] RSP: 0018:c90001907eb0 EFLAGS: 0246 ORIG_RAX: 
ff04
[ 7858.589347] RAX:  RBX: 880434e6a700 RCX: 

[ 7858.589548] RDX:  RSI:  RDI: 

[ 7858.589750] RBP: c90001907ec0 R08:  R09: 
0001
[ 7858.589952] R10: c90001907e58 R11: 024d R12: 
0002
[ 7858.590149] R13:  R14: 880434e6a700 R15: 
880434e6a700

[ 7858.590347]  
[ 7858.590541]  arch_cpu_idle+0xf/0x11
[ 7858.590738]  default_idle_call+0x25/0x27
[ 7858.590938]  do_idle+0xb8/0x150
[ 7858.591133]  cpu_startup_entry+0x1f/0x21
[ 7858.591332]  start_secondary+0xe8/0xeb
[ 7858.591531]  secondary_startup_64+0x9f/0x9f
[ 7858.591729] Code: 83 7e 48 00 74 07 48 8b b6 80 01 00 00 8b 86 80 00 
00 00 85 c0 74 14 8d 50 01 f0 0f b1 96 80 00 00 00 0f 94 c2 84 d2 75 04 
eb e8 <0f> ff 49 8b 4c 24 18 48 8d 55 a0 45 31 c0 48 89 df e8 d9 de 95

[ 7858.592239] ---[ end trace c089174999ff4fc3 ]---
[ 7858.592448] dst_release: dst:88041cd82900 refcnt:-1
[ 8139.130003] igb :07:00.0 eth0: igb: eth0 NIC Link is Down
[ 8139.130309] igb :07:00.0 eth0: Reset adapter
[ 8164.431523] igb :07:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps 
Full Duplex, Flow Control: RX/TX
[ 9149.190518] perf: interrupt took too long (3132 > 3128), lowering 
kernel.perf_event_max_sample_rate to 63000

[17205.528640] ------------[ cut here ]------------
[17205.528855] WARNING: CPU: 0 PID: 0 at ./include/net/dst.h:254 
nf_xfrm_me_harder+0x61/0xec [nf_nat]
[17205.529197] Modules linked in: cls_fw act_police cls_u32 sch_ingress 
sch_htb pppoe pppox ppp_generic slhc netconsole configfs coretemp 
nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre 
tun xt_REDIREC

Re: [PATCH] bgmac: Remove all offloading features, including GRO.

2017-09-15 Thread Denys Fedoryshchenko

On 2017-09-16 03:18, Eric Dumazet wrote:

On Fri, 2017-09-15 at 17:10 -0700, ros...@gmail.com wrote:

Ok fair enough. Will only disable GRO in the driver.


Well, do not even try.

NETIF_F_SOFT_FEATURES is set by core networking stack in
register_netdevice(), ( commit 212b573f5552c60265da721ff9ce32e3462a2cdd
)

Absolutely no driver disables GRO (excepts the ones playing with LRO)

I also believe iperf alone is definitely an inconclusive test.
Besides iperf, there are a lot of different workloads and configurations 
that might give different results.


Re: HTB going crazy over ~5Gbit/s (4.12.9, but problem present in older kernels as well)

2017-09-13 Thread Denys Fedoryshchenko

On 2017-09-13 20:20, Eric Dumazet wrote:

On Wed, 2017-09-13 at 20:12 +0300, Denys Fedoryshchenko wrote:

In my case, now that the load has increased, I am hitting the same issue
(I tried to play with quantum / bursts as well, it didn't help):

tc -s -d class show dev eth3.777 classid 1:111;sleep 5;tc -s -d class
show dev eth3.777 classid 1:111
class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 20Gbit
ceil 100Gbit linklayer ethernet burst 10b/1 mpu 0b cburst 10b/1
mpu 0b level 0
  Sent 864151559 bytes 730566 pkt (dropped 15111, overlimits 0 requeues 0)
  backlog 73968000b 39934p requeues 0
  lended: 499867 borrowed: 0 giants: 0
  tokens: 608 ctokens: 121



You have drops (and ~40,000 packets in backlog)



class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 20Gbit
ceil 100Gbit linklayer ethernet burst 10b/1 mpu 0b cburst 10b/1
mpu 0b level 0
  Sent 1469352160 bytes 1243649 pkt (dropped 42933, overlimits 0 requeues 0)
  backlog 82536047b 39963p requeues 0
  lended: 810475 borrowed: 0 giants: 0
  tokens: 612 ctokens: 122

(1469352160-864151559)/5*8
968320961.6000
Less than 1Gbit and it's being throttled


It is not : "overlimits 0"  means this class was not throttled.
As far as I know, overlimits never appear in HTB. Here is a simulation on 
this class, which has a constant "at least" 1G of traffic; I throttled it 
to 1Kbit to simulate forced drops:


shapernew ~ # sh /etc/shaper.cfg;sleep 1;tc -s -d class show dev 
eth3.777 classid 1:111;tc qdisc del dev eth3.777 root
class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 1Kbit 
ceil 1Kbit linklayer ethernet burst 31280b/1 mpu 0b cburst 31280b/1 mpu 
0b level 0

 Sent 134350019 bytes 117520 pkt (dropped 7819, overlimits 0 requeues 0)
 backlog 7902126b 4976p requeues 0
 lended: 86694 borrowed: 0 giants: 0
 tokens: -93750 ctokens: -93750



Re: HTB going crazy over ~5Gbit/s (4.12.9, but problem present in older kernels as well)

2017-09-13 Thread Denys Fedoryshchenko

On 2017-09-13 19:55, Eric Dumazet wrote:

On Wed, 2017-09-13 at 09:42 -0700, Eric Dumazet wrote:

On Wed, 2017-09-13 at 19:27 +0300, Denys Fedoryshchenko wrote:
> On 2017-09-13 19:16, Eric Dumazet wrote:
> > On Wed, 2017-09-13 at 18:34 +0300, Denys Fedoryshchenko wrote:
> >> Well, probably i am answering my own question, removing estimator from
> >> classes seems drastically improve situation.
> >> It seems estimator has some issues that cause shaper to behave
> >> incorrectly (throttling traffic while it should not).
> >> But i guess thats a bug?
> >> As i was not able to predict such bottleneck by CPU load measurements.
> >
> > Well, there was a reason we disabled HTB class estimators by default ;)
> >
> >
> > 
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=64153ce0a7b61b2a5cacb01805cbf670142339e9
>
> As soon as disabling it solve my problem - i'm fine, hehe, but i guess
> other people who might hit this problem, should be aware how to find
> reason.
> They should not be disappointed in Linux :)

Well, if they enable rate estimators while kernel does not set them by
default, they get what they want, at a cost.

> Because i can't measure this bottleneck before it happens, i'm seeing on
> mpstat all cpu's are idle, and same time traffic is throttled.

Normally things were supposed to get much better in linux-4.10

( https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=1c0d32fde5bdf1184bc274f864c09799278a1114 )


But I apparently added a scaling bug.

I will try :

diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 0385dece1f6fe5e26df1ce5f40956a79a2eebbf4..7c1ffd6f950172c1915d8e5fa2b5e3f77e4f4c78 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -83,10 +83,10 @@ static void est_timer(unsigned long arg)
 	u64 rate, brate;

 	est_fetch_counters(est, &b);
-	brate = (b.bytes - est->last_bytes) << (8 - est->ewma_log);
+	brate = (b.bytes - est->last_bytes) << (10 - est->ewma_log - est->intvl_log);
 	brate -= (est->avbps >> est->ewma_log);

-	rate = (u64)(b.packets - est->last_packets) << (8 - est->ewma_log);
+	rate = (u64)(b.packets - est->last_packets) << (10 - est->ewma_log - est->intvl_log);
 	rate -= (est->avpps >> est->ewma_log);

 	write_seqcount_begin(&est->seq);



Much better indeed

# tc -s -d class sh dev eth0 classid 7002:11 ; sleep 10 ;tc -s -d class
sh dev eth0 classid 7002:11

class htb 7002:11 parent 7002:1 prio 5 quantum 20 rate 5Gbit ceil
5Gbit linklayer ethernet burst 8b/1 mpu 0b cburst 8b/1 mpu 0b
level 0 rate_handle 1
 Sent 389085117074 bytes 256991500 pkt (dropped 0, overlimits 5926926
requeues 0)
 rate 4999Mbit 412762pps backlog 136260b 2p requeues 0
 TCP pkts/rtx 256991584/0 bytes 389085252840/0
 lended: 5961250 borrowed: 0 giants: 0
 tokens: -1664 ctokens: -1664

class htb 7002:11 parent 7002:1 prio 5 quantum 20 rate 5Gbit ceil
5Gbit linklayer ethernet burst 8b/1 mpu 0b cburst 8b/1 mpu 0b
level 0 rate_handle 1
 Sent 395336315580 bytes 261120429 pkt (dropped 0, overlimits 6021776
requeues 0)
 rate 4999Mbit 412788pps backlog 68Kb 2p requeues 0
 TCP pkts/rtx 261120469/0 bytes 395336384730/0
 lended: 6056793 borrowed: 0 giants: 0
 tokens: -1478 ctokens: -1478


echo "(395336315580-389085117074)/10*8" | bc
5000958800
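
The effect of the shift change in the est_timer() diff quoted above can be 
sketched in a few lines. This is only a model, not the kernel code. It 
assumes three things the quoted hunk does not show: the timer fires every 
0.25 s << intvl_log, avbps holds bytes/sec with 8 fractional bits, and 
brate is added to avbps on every tick. The names est_tick and 
steady_state_bytes_per_sec are made up for the example.

```python
def est_tick(avbps, delta_bytes, ewma_log, intvl_log, patched):
    """One timer tick of the byte-rate EWMA, before or after the patch."""
    shift = (10 - ewma_log - intvl_log) if patched else (8 - ewma_log)
    if shift < 0:
        raise ValueError("negative shift is not handled in this sketch")
    brate = (delta_bytes << shift) - (avbps >> ewma_log)
    return avbps + brate  # assumption: est->avbps += brate


def steady_state_bytes_per_sec(true_rate, ewma_log, intvl_log, patched,
                               ticks=400):
    """Feed constant traffic and return what the estimator settles on."""
    # Assumption: one timer period lasts 0.25 s * 2**intvl_log.
    delta = true_rate * (1 << intvl_log) // 4
    avbps = 0
    for _ in range(ticks):
        avbps = est_tick(avbps, delta, ewma_log, intvl_log, patched)
    return avbps >> 8  # assumption: 8 fractional bits in avbps


if __name__ == "__main__":
    true_rate = 190_000_000  # bytes/sec, about 1.52 Gbit/s
    for patched in (False, True):
        est = steady_state_bytes_per_sec(true_rate, ewma_log=2,
                                         intvl_log=4, patched=patched)
        print("patched=%s: %.2f Gbit/s" % (patched, est * 8 / 1e9))
```

Under these assumptions the unpatched shift is only right for a 1 s 
interval (intvl_log = 2) and overestimates by the interval length 
otherwise, about 4x for a 4 s interval. Whether that matches the inflated 
rates reported in this thread depends on the estimator interval in use, 
which the thread does not show.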
In my case, now that the load has increased, I am hitting the same issue 
(I tried to play with quantum / bursts as well, it didn't help):


tc -s -d class show dev eth3.777 classid 1:111;sleep 5;tc -s -d class 
show dev eth3.777 classid 1:111
class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 20Gbit 
ceil 100Gbit linklayer ethernet burst 10b/1 mpu 0b cburst 10b/1 
mpu 0b level 0
 Sent 864151559 bytes 730566 pkt (dropped 15111, overlimits 0 requeues 0)
 backlog 73968000b 39934p requeues 0
 lended: 499867 borrowed: 0 giants: 0
 tokens: 608 ctokens: 121

class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 20Gbit 
ceil 100Gbit linklayer ethernet burst 10b/1 mpu 0b cburst 10b/1 
mpu 0b level 0
 Sent 1469352160 bytes 1243649 pkt (dropped 42933, overlimits 0 requeues 0)
 backlog 82536047b 39963p requeues 0
 lended: 810475 borrowed: 0 giants: 0
 tokens: 612 ctokens: 122

(1469352160-864151559)/5*8
968320961.6000
Less than 1Gbit and it's being throttled

Total bandwidth:

class htb 1:1 root rate 100Gbit ceil 100Gbit linklayer ethernet burst 
10b/1 mpu 0b cburst 10b/1 mpu 0b level 7

 Sent 7839730635 bytes 8537393 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 lended: 0 borrowed: 0 giants: 0
 tokens: 123 ctokens: 123

class htb 1:1 root rate 100Gbit ceil 100Gbit linklayer ethernet burst 
10b/1 mpu 0b cburst 10b/1 mpu 0b level 7
 Sent 11043190453 bytes 12008366 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 lended: 0 borrowed: 0 giants: 0
 tokens: 124 ctokens: 124

694kpps
5.1Gbit
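
For reference, the per-class and total figures above follow directly from 
the two counter snapshots. A small sketch of that arithmetic, using only 
the byte and packet counters shown above; the helper name rates is made 
up, and the 5 second interval is the "sleep 5" from the command line, so 
the result is approximate.

```python
def rates(bytes0, bytes1, pkts0, pkts1, seconds):
    """Return (bits/sec, packets/sec) between two counter snapshots."""
    return (bytes1 - bytes0) * 8 / seconds, (pkts1 - pkts0) / seconds


# Counters copied from the two "tc -s -d class show" samples above.
leaf_bps, leaf_pps = rates(864151559, 1469352160, 730566, 1243649, 5)
root_bps, root_pps = rates(7839730635, 11043190453, 8537393, 12008366, 5)
leaf_drops = 42933 - 15111  # packets dropped by class 1:111 in the interval

print("class 1:111: %.2f Gbit/s, %.0f kpps, %d drops"
      % (leaf_bps / 1e9, leaf_pps / 1e3, leaf_drops))
print("class 1:1:   %.2f Gbit/s, %.0f kpps"
      % (root_bps / 1e9, root_pps / 1e3))
```

This gives a little under 1 Gbit/s for class 1:111 and roughly 5.1 Gbit/s 
at 694 kpps for the root class, the same numbers as quoted above.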


Re: HTB going crazy over ~5Gbit/s (4.12.9, but problem present in older kernels as well)

2017-09-13 Thread Denys Fedoryshchenko

On 2017-09-13 19:16, Eric Dumazet wrote:

On Wed, 2017-09-13 at 18:34 +0300, Denys Fedoryshchenko wrote:

Well, probably i am answering my own question, removing estimator from
classes seems drastically improve situation.
It seems estimator has some issues that cause shaper to behave
incorrectly (throttling traffic while it should not).
But i guess thats a bug?
As i was not able to predict such bottleneck by CPU load measurements.


Well, there was a reason we disabled HTB class estimators by default ;)


https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=64153ce0a7b61b2a5cacb01805cbf670142339e9


As long as disabling it solves my problem, I'm fine, hehe, but I guess 
other people who might hit this problem should know how to find the 
reason.

They should not be disappointed in Linux :)
Because I can't measure this bottleneck before it happens: mpstat shows 
all CPUs idle, and at the same time traffic is throttled.


Re: HTB going crazy over ~5Gbit/s (4.12.9, but problem present in older kernels as well)

2017-09-13 Thread Denys Fedoryshchenko

On 2017-09-13 18:51, Eric Dumazet wrote:

On Wed, 2017-09-13 at 18:20 +0300, Denys Fedoryshchenko wrote:

Hi,

I noticed after increasing bandwidth over some amount HTB started to
throttle classes it should not throttle.
Also estimated rate in htb totally wrong, while byte counters is
correct.

Is there any overflow or something?


Thanks Denys for the report, I will take a look at this, since I
probably introduced some regression.
It's definitely not something recent: this system ran an older kernel 
with uptime over 200 days, and this bottleneck was already present; I 
noticed it a long time ago.
But I never tried removing the estimators (increasing burst/cburst to 
insane values saved me for a while).


Re: HTB going crazy over ~5Gbit/s (4.12.9, but problem present in older kernels as well)

2017-09-13 Thread Denys Fedoryshchenko
Well, I am probably answering my own question: removing the estimator from 
the classes seems to improve the situation drastically.
It seems the estimator has some issues that cause the shaper to behave 
incorrectly (throttling traffic when it should not).

But I guess that's a bug?
I was not able to predict such a bottleneck from CPU load measurements.

On 2017-09-13 18:20, Denys Fedoryshchenko wrote:

Hi,

I noticed after increasing bandwidth over some amount HTB started to
throttle classes it should not throttle.
Also estimated rate in htb totally wrong, while byte counters is 
correct.


Is there any overflow or something?

X520 card (but XL710 same)
br1 8000.90e2ba86c38c   no  eth3.1777
eth3.777
br2 8000.90e2ba86c38d   no  eth3.360
eth3.361

Inbound traffic is coming over one vlan, leaving another vlan.
Shaper is just bunch of classes and u32 filters, with few fw filters.
qdisc is pie

I put totally high values to not reach them, tried to change
quantum/burst/cburst but... stats below.

First, "root" class is 1:1 showing  rate 18086Mbit, which is
physically impossible.

Class 1:111 showing 5355Mbit, while real traffic is ~1.5Gbit

shaper /etc # tc -s -d class show dev eth3.777 classid 1:111;sleep
5;tc -s -d class show dev eth3.777 classid 1:111
class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 10Gbit
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu
0b level 0
 Sent 6487632263 bytes 5235525 pkt (dropped 0, overlimits 0 requeues 0)
 rate 5529Mbit 557534pps backlog 0b 0p requeues 0
 lended: 2423323 borrowed: 0 giants: 0
 tokens: 124 ctokens: -1

class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 10Gbit
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu
0b level 0
 Sent 7438601731 bytes 6003811 pkt (dropped 0, overlimits 0 requeues 0)
 rate 5631Mbit 568214pps backlog 36624b 8p requeues 0
 lended: 2772486 borrowed: 0 giants: 0
 tokens: 124 ctokens: -1

(7438601731-6487632263)/5*8 = 1.521.551.148

And most important some classes suffering, while they should not (not
reaching limits)
class htb 1:95 parent 1:1 leaf 95: prio 0 quantum 5 rate 10Gbit
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu
0b level 0
 Sent 13556762059 bytes 17474559 pkt (dropped 16017, overlimits 0 
requeues 0)

 rate 2524Mbit 414197pps backlog 31969245b 34513p requeues 0
 lended: 13995723 borrowed: 0 giants: 0
 tokens: 111 ctokens: -2





Full classes stats:

class htb 1:100 parent 1:1 leaf 100: prio 0 quantum 5 rate 10Gbit
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu
0b level 0
 Sent 116 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
 rate 8bit 0pps backlog 0b 0p requeues 0
 lended: 2 borrowed: 0 giants: 0
 tokens: 124 ctokens: -1

class htb 1:120 parent 1:1 leaf 120: prio 0 quantum 5 rate 10Gbit
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu
0b level 0
 Sent 531230043 bytes 782130 pkt (dropped 0, overlimits 0 requeues 0)
 rate 132274Kbit 25240pps backlog 0b 0p requeues 0
 lended: 540693 borrowed: 0 giants: 0
 tokens: 109 ctokens: -2

class htb 1:50 parent 1:1 leaf 50: prio 0 quantum 5 rate 10Gbit
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu
0b level 0
 Sent 773472109 bytes 587335 pkt (dropped 0, overlimits 0 requeues 0)
 rate 215929Kbit 20503pps backlog 0b 0p requeues 0
 lended: 216614 borrowed: 0 giants: 0
 tokens: 91 ctokens: -4

class htb 1:70 parent 1:1 leaf 70: prio 0 quantum 5 rate 10Gbit
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu
0b level 0
 Sent 1574768 bytes 6194 pkt (dropped 0, overlimits 0 requeues 0)
 rate 406272bit 214pps backlog 0b 0p requeues 0
 lended: 5758 borrowed: 0 giants: 0
 tokens: 101 ctokens: -3

class htb 1:90 parent 1:1 leaf 90: prio 0 quantum 5 rate 1Kbit
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu
0b level 0
 Sent 3206 bytes 53 pkt (dropped 0, overlimits 0 requeues 0)
 rate 848bit 1pps backlog 0b 0p requeues 0
 lended: 53 borrowed: 0 giants: 0

class htb 1:110 parent 1:1 leaf 110: prio 0 quantum 5 rate 10Gbit
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu
0b level 0
 Sent 17205952113 bytes 12926008 pkt (dropped 239, overlimits 0 
requeues 0)

 rate 4433Mbit 416825pps backlog 5847785b 2394p requeues 0
 lended: 7021696 borrowed: 0 giants: 0
 tokens: 91 ctokens: -4

class htb 1:45 root leaf 45: prio 0 quantum 5 rate 80Mbit ceil
80Mbit linklayer ethernet burst 1b/1 mpu 0b cburst 1b/1 mpu 0b
level 0
 Sent 2586 bytes 45 pkt (dropped 0, overlimits 0 requeues 0)
 rate 456bit 1pps backlog 0b 0p requeues 0
 lended: 45 borrowed: 0 giants: 0
 tokens: 15540 ctokens: 15540

class htb 1:1 root rate 100Gbit ceil 100Gbit linklayer ethernet burst
0b/1 mpu 0b cburst 0b/1 mpu 0b level 7
 Sent 7227721

HTB going crazy over ~5Gbit/s (4.12.9, but problem present in older kernels as well)

2017-09-13 Thread Denys Fedoryshchenko

Hi,

I noticed that after increasing bandwidth beyond a certain amount, HTB 
started to throttle classes it should not throttle.
Also, the estimated rate in HTB is totally wrong, while the byte counters 
are correct.


Is there any overflow or something?

X520 card (but XL710 same)
br1 8000.90e2ba86c38c   no  eth3.1777
eth3.777
br2 8000.90e2ba86c38d   no  eth3.360
eth3.361

Inbound traffic comes in over one VLAN and leaves over another.
The shaper is just a bunch of classes and u32 filters, with a few fw filters.
The qdisc is pie.

I set the limits far too high to ever be reached, and tried changing 
quantum/burst/cburst, but... stats below.


First, the "root" class 1:1 shows a rate of 18086Mbit, which is physically 
impossible.


Class 1:111 shows 5355Mbit, while real traffic is ~1.5Gbit

shaper /etc # tc -s -d class show dev eth3.777 classid 1:111;sleep 5;tc 
-s -d class show dev eth3.777 classid 1:111
class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 10Gbit 
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b 
level 0

 Sent 6487632263 bytes 5235525 pkt (dropped 0, overlimits 0 requeues 0)
 rate 5529Mbit 557534pps backlog 0b 0p requeues 0
 lended: 2423323 borrowed: 0 giants: 0
 tokens: 124 ctokens: -1

class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 10Gbit 
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b 
level 0

 Sent 7438601731 bytes 6003811 pkt (dropped 0, overlimits 0 requeues 0)
 rate 5631Mbit 568214pps backlog 36624b 8p requeues 0
 lended: 2772486 borrowed: 0 giants: 0
 tokens: 124 ctokens: -1

(7438601731-6487632263)/5*8 = 1.521.551.148
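
The same comparison can be scripted, so that the estimator value and the 
counter-derived rate are printed side by side. A rough sketch follows: 
the regular expressions only cover the "Sent" and "rate ... pps" lines in 
the form shown above, the unit table assumes tc prints decimal (SI) 
units, and parse_class_stats / measured_bps are names made up for this 
example.

```python
import re

# Counters line as printed by "tc -s class show".
SENT_RE = re.compile(
    r"Sent (\d+) bytes (\d+) pkt "
    r"\(dropped (\d+), overlimits (\d+) requeues (\d+)\)"
)
# Estimator line, e.g. "rate 5529Mbit 557534pps".
RATE_RE = re.compile(r"rate (\d+)([KMG]?)bit (\d+)pps")
# Assumption: decimal units (Kbit = 1000 bit).
UNIT = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9}


def parse_class_stats(text):
    """Extract counters and the estimator rate from one class block."""
    flat = " ".join(text.split())  # undo mail line wrapping
    sent = SENT_RE.search(flat)
    if sent is None:
        raise ValueError("no 'Sent ... bytes' line found")
    stats = {
        "bytes": int(sent.group(1)),
        "packets": int(sent.group(2)),
        "dropped": int(sent.group(3)),
    }
    rate = RATE_RE.search(flat)
    if rate is not None:
        stats["est_bps"] = int(rate.group(1)) * UNIT[rate.group(2)]
    return stats


def measured_bps(first, second, seconds):
    """Bits per second derived from the byte counters alone."""
    return (second["bytes"] - first["bytes"]) * 8 / seconds


# The two samples of class 1:111 shown above.
FIRST = """ Sent 6487632263 bytes 5235525 pkt (dropped 0, overlimits 0 requeues 0)
 rate 5529Mbit 557534pps backlog 0b 0p requeues 0"""
SECOND = """ Sent 7438601731 bytes 6003811 pkt (dropped 0, overlimits 0 requeues 0)
 rate 5631Mbit 568214pps backlog 36624b 8p requeues 0"""

if __name__ == "__main__":
    a, b = parse_class_stats(FIRST), parse_class_stats(SECOND)
    real = measured_bps(a, b, 5)
    print("counters:  %.0f bit/s" % real)
    print("estimator: %d bit/s (%.1fx)" % (b["est_bps"], b["est_bps"] / real))
```

For these two samples the estimator value is a bit under four times the 
rate derived from the counters.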

Most importantly, some classes are suffering when they should not be 
(they are not reaching their limits):
class htb 1:95 parent 1:1 leaf 95: prio 0 quantum 5 rate 10Gbit ceil 
100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b 
level 0
 Sent 13556762059 bytes 17474559 pkt (dropped 16017, overlimits 0 requeues 0)
 rate 2524Mbit 414197pps backlog 31969245b 34513p requeues 0
 lended: 13995723 borrowed: 0 giants: 0
 tokens: 111 ctokens: -2





Full classes stats:

class htb 1:100 parent 1:1 leaf 100: prio 0 quantum 5 rate 10Gbit 
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b 
level 0

 Sent 116 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
 rate 8bit 0pps backlog 0b 0p requeues 0
 lended: 2 borrowed: 0 giants: 0
 tokens: 124 ctokens: -1

class htb 1:120 parent 1:1 leaf 120: prio 0 quantum 5 rate 10Gbit 
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b 
level 0

 Sent 531230043 bytes 782130 pkt (dropped 0, overlimits 0 requeues 0)
 rate 132274Kbit 25240pps backlog 0b 0p requeues 0
 lended: 540693 borrowed: 0 giants: 0
 tokens: 109 ctokens: -2

class htb 1:50 parent 1:1 leaf 50: prio 0 quantum 5 rate 10Gbit ceil 
100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b 
level 0

 Sent 773472109 bytes 587335 pkt (dropped 0, overlimits 0 requeues 0)
 rate 215929Kbit 20503pps backlog 0b 0p requeues 0
 lended: 216614 borrowed: 0 giants: 0
 tokens: 91 ctokens: -4

class htb 1:70 parent 1:1 leaf 70: prio 0 quantum 5 rate 10Gbit ceil 
100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b 
level 0

 Sent 1574768 bytes 6194 pkt (dropped 0, overlimits 0 requeues 0)
 rate 406272bit 214pps backlog 0b 0p requeues 0
 lended: 5758 borrowed: 0 giants: 0
 tokens: 101 ctokens: -3

class htb 1:90 parent 1:1 leaf 90: prio 0 quantum 5 rate 1Kbit ceil 
100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b 
level 0

 Sent 3206 bytes 53 pkt (dropped 0, overlimits 0 requeues 0)
 rate 848bit 1pps backlog 0b 0p requeues 0
 lended: 53 borrowed: 0 giants: 0

class htb 1:110 parent 1:1 leaf 110: prio 0 quantum 5 rate 10Gbit 
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b 
level 0
 Sent 17205952113 bytes 12926008 pkt (dropped 239, overlimits 0 requeues 0)
 rate 4433Mbit 416825pps backlog 5847785b 2394p requeues 0
 lended: 7021696 borrowed: 0 giants: 0
 tokens: 91 ctokens: -4

class htb 1:45 root leaf 45: prio 0 quantum 5 rate 80Mbit ceil 
80Mbit linklayer ethernet burst 1b/1 mpu 0b cburst 1b/1 mpu 0b 
level 0

 Sent 2586 bytes 45 pkt (dropped 0, overlimits 0 requeues 0)
 rate 456bit 1pps backlog 0b 0p requeues 0
 lended: 45 borrowed: 0 giants: 0
 tokens: 15540 ctokens: 15540

class htb 1:1 root rate 100Gbit ceil 100Gbit linklayer ethernet burst 
0b/1 mpu 0b cburst 0b/1 mpu 0b level 7
 Sent 72277215121 bytes 72693012 pkt (dropped 0, overlimits 0 requeues 0)
 rate 18086Mbit 2304729pps backlog 0b 0p requeues 0
 lended: 0 borrowed: 0 giants: 0
 tokens: -4 ctokens: -4

class htb 1:111 parent 1:1 leaf 111: prio 0 quantum 5 rate 10Gbit 
ceil 100Gbit linklayer ethernet burst 1b/1 mpu 0b cburst 0b/1 mpu 0b 
level 0
 Sent 21977384237 bytes 17697345

Re: ipset losing entries on its own

2017-09-06 Thread Denys Fedoryshchenko

On 2017-09-06 13:08, Akshat Kakkar wrote:

I am having ipset 6.32

The hash type is hash:ip

I am adding/deleting IP addresses to it dynamically using scripts.

However, it has been observed that at times few IPs (3-4 out of 4000)
are not found in the set though it was added. Also, logs show there
was not request for deletion of that IP from IPSet.

Is it a bug?


I think you should try to write a script that creates at least a
reproducible scenario.
And please post more info about your setup (kernel version, vanilla or
distro).


Re: nf_nat_pptp 4.12.3 kernel lockup/reboot

2017-08-25 Thread Denys Fedoryshchenko

On 2017-08-25 08:21, Florian Westphal wrote:

Denys Fedoryshchenko  wrote:

>>> I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling
>>> approx 2gbps of pppoe users traffic) and noticed that after while server
>>> rebooting(i have set reboot on panic and etc).
>>> I can't run serial console, and in pstore / netconsole there is nothing.
>>> Best i got is some very short message about softlockup in ipmi, but as
>>> storage very limited there - it is near useless.
>>>
>>> By preliminary testing (can't do it much, as it's production) - it seems
>>> following lines causing issue, they worked in 4.11.8 and no more in 4.12.3.
>>
>>Wild guess here, does this help?
>>
>>diff --git a/net/netfilter/nf_conntrack_helper.c
>>b/net/netfilter/nf_conntrack_helper.c
>>--- a/net/netfilter/nf_conntrack_helper.c
>>+++ b/net/netfilter/nf_conntrack_helper.c
>>@@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct,
>>struct nf_conn *tmpl,
>>help = nf_ct_helper_ext_add(ct, helper, flags);
>>if (help == NULL)
>>return -ENOMEM;
>>+   if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
>
>sigh, stupid typo, should be no ';' at the end above.
Sorry, are there any plans to push this to the 4.12 stable queue?


No, sorry, this patch adds the extension for all connections
that use a helper, but the nat extension is only used/required by pptp
helper (and masquerade).

Thing is that this patch should not be needed, I will have
to review pptp again, maybe i missed a case where the extension is not
added.

Do you happen to have an oops backtrace?

That might speed this up a bit.
There is nothing in netconsole, and also nothing in the ERST pstore; I found
the cause just by guessing.

It is also totally headless (no screen, no serial console).
I can try to attach a USB serial adapter for a serial console, but I am not
sure it will help.
If there is any other way to catch it, I can try it, but as it is a
production server, I can't "crash it" more than once per day.





Re: nf_nat_pptp 4.12.3 kernel lockup/reboot

2017-08-24 Thread Denys Fedoryshchenko

On 2017-07-24 19:20, Florian Westphal wrote:

Florian Westphal  wrote:

Denys Fedoryshchenko  wrote:
> Hi,
>
> I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling
> approx 2gbps of pppoe users traffic) and noticed that after while server
> rebooting(i have set reboot on panic and etc).
> I can't run serial console, and in pstore / netconsole there is nothing.
> Best i got is some very short message about softlockup in ipmi, but as
> storage very limited there - it is near useless.
>
> By preliminary testing (can't do it much, as it's production) - it seems
> following lines causing issue, they worked in 4.11.8 and no more in 4.12.3.

Wild guess here, does this help?

diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c

--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, 
struct nf_conn *tmpl,

help = nf_ct_helper_ext_add(ct, helper, flags);
if (help == NULL)
return -ENOMEM;
+   if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));


sigh, stupid typo, should be no ';' at the end above.

Sorry, are there any plans to push this to the 4.12 stable queue?


Re: nf_nat_pptp 4.12.3 kernel lockup/reboot

2017-07-26 Thread Denys Fedoryshchenko

On 2017-07-24 19:20, Florian Westphal wrote:

Florian Westphal  wrote:

Denys Fedoryshchenko  wrote:
> Hi,
>
> I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling
> approx 2gbps of pppoe users traffic) and noticed that after while server
> rebooting(i have set reboot on panic and etc).
> I can't run serial console, and in pstore / netconsole there is nothing.
> Best i got is some very short message about softlockup in ipmi, but as
> storage very limited there - it is near useless.
>
> By preliminary testing (can't do it much, as it's production) - it seems
> following lines causing issue, they worked in 4.11.8 and no more in 4.12.3.

Wild guess here, does this help?

diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c

--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, 
struct nf_conn *tmpl,

help = nf_ct_helper_ext_add(ct, helper, flags);
if (help == NULL)
return -ENOMEM;
+   if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));


sigh, stupid typo, should be no ';' at the end above.


Tested-by: Denys Fedoryshchenko 

Tested, and no more hangs for 2 days; definitely an improvement.
Any chance it will go to stable 4.12.x and the new kernel?

Thank you very much!


Re: nf_nat_pptp 4.12.3 kernel lockup/reboot

2017-07-25 Thread Denys Fedoryshchenko

On 2017-07-24 19:20, Florian Westphal wrote:

Florian Westphal  wrote:

Denys Fedoryshchenko  wrote:
> Hi,
>
> I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling
> approx 2gbps of pppoe users traffic) and noticed that after while server
> rebooting(i have set reboot on panic and etc).
> I can't run serial console, and in pstore / netconsole there is nothing.
> Best i got is some very short message about softlockup in ipmi, but as
> storage very limited there - it is near useless.
>
> By preliminary testing (can't do it much, as it's production) - it seems
> following lines causing issue, they worked in 4.11.8 and no more in 4.12.3.

Wild guess here, does this help?

diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c

--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, 
struct nf_conn *tmpl,

help = nf_ct_helper_ext_add(ct, helper, flags);
if (help == NULL)
return -ENOMEM;
+   if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));


sigh, stupid typo, should be no ';' at the end above.


Tested, and it looks like it is not hanging anymore (before, it was hanging
within 10 minutes).

I will probably wait for a 24h testing cycle.
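
For readers wondering why the stray ';' noted above matters: in C, a semicolon directly after the if condition is an empty statement, so the statement that follows runs unconditionally. A minimal userspace sketch, assuming hypothetical stand-in functions (try_add, assign_buggy and assign_fixed are illustrative names, not kernel APIs, and the remainder of the real patch is not shown in this thread):

```c
#include <stddef.h>

/* Hypothetical stand-in for an allocation that may fail; not a kernel API. */
static void *try_add(int ok)
{
	static int slot;

	return ok ? &slot : NULL;
}

/* Buggy: the ';' ends the if statement, so the return always executes. */
static int assign_buggy(int ok)
{
	if (!try_add(ok));
		return -1;
	return 0;
}

/* Fixed: the return only executes when try_add() reports a failure. */
static int assign_fixed(int ok)
{
	if (!try_add(ok))
		return -1;
	return 0;
}
```

With the stray semicolon, assign_buggy(1) reports an error even though try_add() succeeded, while assign_fixed(1) returns 0. Compilers usually flag this pattern when extra warnings are enabled.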


nf_nat_pptp 4.12.3 kernel lockup/reboot

2017-07-24 Thread Denys Fedoryshchenko

Hi,

I am trying to upgrade from kernel 4.11.8 to 4.12.3 (it is a NAT/router
handling approx. 2 Gbps of PPPoE user traffic) and noticed that after a
while the server reboots (I have set reboot on panic, etc.).

I can't run a serial console, and there is nothing in pstore / netconsole.
The best I got is a very short message about a soft lockup in IPMI, but as
storage is very limited there, it is nearly useless.


From preliminary testing (I can't do much, as it's production), it seems the
following lines cause the issue; they worked in 4.11.8 and no longer work in
4.12.3.


iptables -t raw -A PREROUTING -p tcp -m tcp --dport 1723 -j CT --helper pptp
iptables -t raw -A PREROUTING -p tcp -m tcp --sport 1723 -j CT --helper pptp


(There are no solid examples for helpers; I am not sure the second line is
necessary.)


I will try to do more tests tonight (lockdep debug, etc.), but maybe
someone has an idea what might be wrong?


Re: [PATCH net] netfilter: xt_TCPMSS: add more sanity tests on tcph->doff

2017-04-20 Thread Denys Fedoryshchenko

On 2017-04-08 23:24, Pablo Neira Ayuso wrote:

On Mon, Apr 03, 2017 at 10:55:11AM -0700, Eric Dumazet wrote:

From: Eric Dumazet 

Denys provided an awesome KASAN report pointing to a use-after-free
in xt_TCPMSS.

I have provided three patches to fix this issue, either in xt_TCPMSS or
in xt_tcpudp.c. It seems the xt_TCPMSS patch has the smallest possible
impact.


Applied to nf.git, thanks!

Any plans to queue it for the stable trees?
It seems kernels have been affected for years.


Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8

2017-04-03 Thread Denys Fedoryshchenko

On 2017-04-03 15:09, Eric Dumazet wrote:

On Mon, 2017-04-03 at 11:10 +0300, Denys Fedoryshchenko wrote:


I modified patch a little as:
if (th->doff * 4 < sizeof(_tcph)) {
  par->hotdrop = true;
  WARN_ON_ONCE(!tcpinfo->option);
  return false;
}

And it did trigger the WARN once in the morning, and didn't hit KASAN. I will
run it for a while longer to see if it is OK, and then, if stable, will try
to enable SFQ again.


Excellent news !
We will post an official fix today, thanks a lot for this detective work
Denys.

I am not sure it is finally fixed; maybe we need to test more?
I'm doing extensive tests today with an identical configuration (I had to
run fifo, because the customer cannot afford any more outages). I've now
added sfq in a different way, and I will run the identical config in approx.
3 hours.


Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8

2017-04-03 Thread Denys Fedoryshchenko

On 2017-04-02 20:26, Eric Dumazet wrote:

On Sun, 2017-04-02 at 10:14 -0700, Eric Dumazet wrote:


Could that be that netfilter does not abort earlier if TCP header is
completely wrong ?



Yes, I wonder if this patch would be better, unless we replicate the
th->doff sanity check in all netfilter modules dissecting TCP frames.

diff --git a/net/netfilter/xt_tcpudp.c b/net/netfilter/xt_tcpudp.c
index ade024c90f4f129a7c384e9e1cbfdb8ffe73065f..8cb4eadd5ba1c20e74bc27ee52a0bc36a5b26725 100644
--- a/net/netfilter/xt_tcpudp.c
+++ b/net/netfilter/xt_tcpudp.c
@@ -103,11 +103,11 @@ static bool tcp_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	if (!NF_INVF(tcpinfo, XT_TCP_INV_FLAGS,
 		     (((unsigned char *)th)[13] & tcpinfo->flg_mask) == tcpinfo->flg_cmp))
 		return false;
+	if (th->doff * 4 < sizeof(_tcph)) {
+		par->hotdrop = true;
+		return false;
+	}
 	if (tcpinfo->option) {
-		if (th->doff * 4 < sizeof(_tcph)) {
-			par->hotdrop = true;
-			return false;
-		}
 		if (!tcp_find_option(tcpinfo->option, skb, par->thoff,
 				     th->doff*4 - sizeof(_tcph),
 				     tcpinfo->invflags & XT_TCP_INV_OPTION,

I modified patch a little as:
if (th->doff * 4 < sizeof(_tcph)) {
 par->hotdrop = true;
 WARN_ON_ONCE(!tcpinfo->option);
 return false;
}

And it did trigger the WARN once in the morning, and didn't hit KASAN. I will
run it for a while longer to see if it is OK, and then, if stable, will try
to enable SFQ again.
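
Why the doff check matters can be sketched in plain C: sizeof yields an unsigned size_t, so an expression like th->doff * 4 - sizeof(_tcph) wraps around to a huge value when the header length field claims fewer than 20 bytes. The helpers below are a hypothetical userspace illustration operating on raw header bytes (all names are assumptions, not the kernel code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TCP_MIN_HDR_LEN 20u	/* size of a TCP header without options */

/* Header length in bytes, taken from the 4-bit data offset in byte 12. */
static unsigned int tcp_hdr_len(const uint8_t *tcp)
{
	return (unsigned int)(tcp[12] >> 4) * 4u;
}

/* Hypothetical sanity check: reject headers that claim fewer than 20 bytes
 * or more bytes than are actually available. */
static bool tcp_doff_is_sane(const uint8_t *tcp, size_t avail)
{
	unsigned int hdrlen;

	if (avail < TCP_MIN_HDR_LEN)
		return false;
	hdrlen = tcp_hdr_len(tcp);
	return hdrlen >= TCP_MIN_HDR_LEN && hdrlen <= avail;
}

/* Option bytes after the fixed header. Unsigned arithmetic: this wraps to a
 * huge value if called on a header that failed the check above. */
static size_t tcp_opt_len(const uint8_t *tcp)
{
	return (size_t)tcp_hdr_len(tcp) - TCP_MIN_HDR_LEN;
}
```

A header with doff = 4 (16 bytes) fails tcp_doff_is_sane(), and calling tcp_opt_len() on it anyway yields a value close to SIZE_MAX rather than a negative number, which is the kind of length an option parser must never be handed.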


Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8

2017-04-02 Thread Denys Fedoryshchenko

On 2017-04-02 15:32, Eric Dumazet wrote:

On Sun, 2017-04-02 at 15:25 +0300, Denys Fedoryshchenko wrote:

> */
I will add also WARN_ON_ONCE(tcp_hdrlen >= 15 * 4) before, for
curiosity, if this condition are triggered. Is it fine like that?


Sure.


It didn't trigger the WARN_ON, and with both patches applied here is one more
KASAN report.
What I also noticed is that after this KASAN report many others start to
trigger in TCPMSS and lock up the server by flooding.

There is heavy netlink activity; it is a PPPoE server with a lot of shapers.
I noticed sfq was left there by mistake; usually I remove it, because
it may trigger a kernel panic too (and the reason is hard to trace).

I will try with pfifo instead, after 6 hours.

Here is full log with others: https://nuclearcat.com/kasan.txt


[ 2033.914478] ==
[ 2033.914855] BUG: KASAN: slab-out-of-bounds in tcpmss_tg4+0x6cc/0xee4 [xt_TCPMSS] at addr 8802bfe18140
[ 2033.915218] Read of size 1 by task swapper/1/0
[ 2033.915437] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.8-build-0136-debug #7
[ 2033.915787] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015

[ 2033.916010] Call Trace:
[ 2033.916229]  
[ 2033.916449]  dump_stack+0x99/0xd4
[ 2033.916662]  ? _atomic_dec_and_lock+0x15d/0x15d
[ 2033.916886]  ? tcpmss_tg4+0x6cc/0xee4 [xt_TCPMSS]
[ 2033.917110]  kasan_object_err+0x21/0x81
[ 2033.917335]  kasan_report+0x527/0x69d
[ 2033.917557]  ? tcpmss_tg4+0x6cc/0xee4 [xt_TCPMSS]
[ 2033.917772]  __asan_report_load1_noabort+0x19/0x1b
[ 2033.917995]  tcpmss_tg4+0x6cc/0xee4 [xt_TCPMSS]
[ 2033.918222]  ? tcpmss_tg4_check+0x287/0x287 [xt_TCPMSS]
[ 2033.918451]  ? udp_mt+0x45a/0x45a [xt_tcpudp]
[ 2033.918669]  ? __fib_validate_source+0x46b/0xcd1
[ 2033.918895]  ipt_do_table+0x1432/0x1573 [ip_tables]
[ 2033.919114]  ? ip_tables_net_init+0x15/0x15 [ip_tables]
[ 2033.919338]  ? ip_route_input_slow+0xe9f/0x17e3
[ 2033.919562]  ? rt_set_nexthop+0x9a7/0x9a7
[ 2033.919790]  ? ip_tables_net_exit+0xe/0x15 [ip_tables]
[ 2033.920008]  ? tcf_action_exec+0x14a/0x18c
[ 2033.920227]  ? iptable_mangle_net_exit+0x92/0x92 [iptable_mangle]
[ 2033.920451]  ? iptable_filter_net_exit+0x92/0x92 [iptable_filter]
[ 2033.920667]  iptable_filter_hook+0xc0/0x1c8 [iptable_filter]
[ 2033.920882]  nf_hook_slow+0x7d/0x121
[ 2033.921105]  ip_forward+0x1183/0x11c6
[ 2033.921321]  ? ip_forward_finish+0x168/0x168
[ 2033.921542]  ? ip_frag_mem+0x43/0x43
[ 2033.921755]  ? iptable_nat_net_exit+0x92/0x92 [iptable_nat]
[ 2033.921981]  ? nf_nat_ipv4_in+0xf0/0x209 [nf_nat_ipv4]
[ 2033.922199]  ip_rcv_finish+0xf4c/0xf5b
[ 2033.922420]  ip_rcv+0xb41/0xb72
[ 2033.922635]  ? ip_local_deliver+0x282/0x282
[ 2033.922847]  ? ip_local_deliver_finish+0x6e6/0x6e6
[ 2033.923073]  ? ip_local_deliver+0x282/0x282
[ 2033.923291]  __netif_receive_skb_core+0x1b27/0x21bf
[ 2033.923510]  ? netdev_rx_handler_register+0x1a6/0x1a6
[ 2033.923736]  ? kasan_slab_free+0x137/0x154
[ 2033.923954]  ? save_stack_trace+0x1b/0x1d
[ 2033.924170]  ? kasan_slab_free+0xaa/0x154
[ 2033.924387]  ? net_rx_action+0x6ad/0x6dc
[ 2033.924611]  ? __do_softirq+0x22b/0x5df
[ 2033.924826]  ? irq_exit+0x8a/0xfe
[ 2033.925048]  ? do_IRQ+0x13d/0x155
[ 2033.925269]  ? common_interrupt+0x83/0x83
[ 2033.925483]  ? mwait_idle+0x15a/0x30d
[ 2033.925704]  ? napi_gro_flush+0x1d0/0x1d0
[ 2033.925928]  ? start_secondary+0x2cc/0x2d5
[ 2033.926142]  ? start_cpu+0x14/0x14
[ 2033.926354]  __netif_receive_skb+0x5e/0x191
[ 2033.926576]  process_backlog+0x295/0x573
[ 2033.926799]  ? __netif_receive_skb+0x191/0x191
[ 2033.927022]  napi_poll+0x311/0x745
[ 2033.927245]  ? napi_complete_done+0x3b4/0x3b4
[ 2033.927460]  ? igb_msix_ring+0x2d/0x35
[ 2033.927679]  net_rx_action+0x2e8/0x6dc
[ 2033.927903]  ? napi_poll+0x745/0x745
[ 2033.928133]  ? sched_clock_cpu+0x1f/0x18c
[ 2033.928360]  ? rps_trigger_softirq+0x181/0x1e4
[ 2033.928592]  ? __tick_nohz_idle_enter+0x465/0xa6d
[ 2033.928817]  ? rps_may_expire_flow+0x29b/0x29b
[ 2033.929038]  ? irq_work_run+0x2c/0x2e
[ 2033.929253]  __do_softirq+0x22b/0x5df
[ 2033.929464]  ? smp_call_function_single_async+0x17d/0x17d
[ 2033.929680]  irq_exit+0x8a/0xfe
[ 2033.929905]  smp_call_function_single_interrupt+0x8d/0x90
[ 2033.930136]  call_function_single_interrupt+0x83/0x90
[ 2033.930365] RIP: 0010:mwait_idle+0x15a/0x30d
[ 2033.930581] RSP: 0018:8802d1017e78 EFLAGS: 0246 ORIG_RAX: 
ff04
[ 2033.930934] RAX:  RBX: 8802d1000c80 RCX: 

[ 2033.931160] RDX: 11005a200190 RSI:  RDI: 

[ 2033.931383] RBP: 8802d1017e98 R08: ed00583c4fc1 R09: 
0080
[ 2033.931596] R10: 8802d1017d80 R11: ed00583c4fc1 R12: 
0001
[ 2033.931808] R13:  R14: 8802d1000c80 R15: 
dc00

[ 2033.932031]  
[ 2033.932247]  arch_cpu_idle+0xf/0x11
[ 2033.932472]  default_idle_call+0x59/0x5c
[ 2033.932686]  do_idle+0x11c/0x217
[ 2033.932906]  cpu_startup_entry+0x1

Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8

2017-04-02 Thread Denys Fedoryshchenko

On 2017-04-02 15:19, Eric Dumazet wrote:

On Sun, 2017-04-02 at 04:54 -0700, Eric Dumazet wrote:

On Sun, 2017-04-02 at 13:45 +0200, Florian Westphal wrote:
> Eric Dumazet  wrote:
> > - for (i = sizeof(struct tcphdr); i <= tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
> > + for (i = sizeof(struct tcphdr); i < tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
> >   if (opt[i] == TCPOPT_MSS && opt[i+1] == TCPOLEN_MSS) {
> >   u_int16_t oldmss;
>
> maybe I am low on caffeine but this looks fine, for tcp header with
> only tcpmss this boils down to "20 <= 24 - 4" so we access offsets 20-23
> which seems ok.

I am definitely low on caffeine ;)

An issue in this function is that we might add the missing MSS option,
without checking that TCP options are already full.

But this should not cause a KASAN splat, only some malformed TCP packet
(tcph->doff would wrap)


Something like that maybe.

diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
index 27241a767f17b4b27d24095a31e5e9a2d3e29ce4..1465aaf0e3a15d69d105d0a50b0429b11b6439d3 100644
--- a/net/netfilter/xt_TCPMSS.c
+++ b/net/netfilter/xt_TCPMSS.c
@@ -151,7 +151,9 @@ tcpmss_mangle_packet(struct sk_buff *skb,
 	 */
 	if (len > tcp_hdrlen)
 		return 0;
-
+	/* tcph->doff is 4 bits wide, do not wrap its value to 0 */
+	if (tcp_hdrlen >= 15 * 4)
+		return 0;
 	/*
 	 * MSS Option not found ?! add it..
 	 */
I will also add WARN_ON_ONCE(tcp_hdrlen >= 15 * 4) before it, out of
curiosity, to see whether this condition is triggered. Is that fine?
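
The wrap mentioned in the patch comment can be checked with a few lines of arithmetic. doff is a 4-bit field counting 32-bit words, so the largest header it can describe is 15 * 4 = 60 bytes; adding a 4-byte MSS option to a header that is already 60 bytes long gives 16 words, which truncates to 0. A small userspace sketch (TCP_MAX_HDR_LEN and both function names are hypothetical, not taken from the kernel):

```c
#include <stdbool.h>

#define TCPOLEN_MSS 4u			/* length of the MSS option in bytes */
#define TCP_MAX_HDR_LEN (15u * 4u)	/* largest length a 4-bit doff can encode */

/* Value the 4-bit doff field would hold after growing the header. */
static unsigned int doff_after_growth(unsigned int tcp_hdrlen, unsigned int extra)
{
	return ((tcp_hdrlen + extra) / 4u) & 0xfu;
}

/* Same condition as the quoted patch: refuse when tcp_hdrlen >= 15 * 4. */
static bool can_add_mss_option(unsigned int tcp_hdrlen)
{
	return tcp_hdrlen < TCP_MAX_HDR_LEN;
}
```

For a plain 20-byte header, doff_after_growth(20, TCPOLEN_MSS) is 6, as expected; for a full 60-byte header it is 0, which is the malformed case the guard rejects.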


Re: KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8

2017-04-02 Thread Denys Fedoryshchenko

On 2017-04-02 14:45, Florian Westphal wrote:

Eric Dumazet  wrote:
-	for (i = sizeof(struct tcphdr); i <= tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
+	for (i = sizeof(struct tcphdr); i < tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
 		if (opt[i] == TCPOPT_MSS && opt[i+1] == TCPOLEN_MSS) {
 			u_int16_t oldmss;


maybe I am low on caffeine but this looks fine, for tcp header with
only tcpmss this boils down to "20 <= 24 - 4" so we access offsets
20-23 which seems ok.
It seems some non-standard (or corrupted) packets are getting through, because
even on a ~1G server this might cause corruption once every several days;
KASAN seems to need less time to trigger.


I don't know how these things work internally, but:
[25181.875696] Memory state around the buggy address:
[25181.875919]  8802975fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.876275]  88029760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.876628] >880297600080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.876984] ^
[25181.877203]  880297600100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25181.877569]  880297600180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Why is all the data here zero? I would guess it should be some packet data.


KASAN, xt_TCPMSS finally found nasty use-after-free bug? 4.10.8

2017-04-02 Thread Denys Fedoryshchenko

Repost; being sleepy, I missed a few important points.

I am searching for the cause of crashes on multiple conntrack-enabled
servers. Usually they point to conntrack, but I suspect the use-after-free
might be somewhere else, so I tried to enable KASAN.
And it seems I got something after a few hours, and it looks related to all
the crashes, because all the servers that rebooted had MSS adjustment
(--clamp-mss-to-pmtu or --set-mss).

Please let me know if any additional information is needed.

[25181.855611] ==
[25181.855985] BUG: KASAN: use-after-free in tcpmss_tg4+0x682/0xe9c [xt_TCPMSS] at addr 8802976000ea
[25181.856344] Read of size 1 by task swapper/1/0
[25181.856555] page:ea000a5d8000 count:0 mapcount:0 mapping: (null) index:0x0
[25181.856909] flags: 0x1000()
[25181.857123] raw: 1000
[25181.857630] raw: ea000b0444a0 ea000a0b1f60
[25181.857996] page dumped because: kasan: bad access detected
[25181.858214] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.8-build-0133-debug #3
[25181.858571] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015

[25181.858786] Call Trace:
[25181.859000]  
[25181.859215]  dump_stack+0x99/0xd4
[25181.859423]  ? _atomic_dec_and_lock+0x15d/0x15d
[25181.859644]  ? __dump_page+0x447/0x4e3
[25181.859859]  ? tcpmss_tg4+0x682/0xe9c [xt_TCPMSS]
[25181.860080]  kasan_report+0x577/0x69d
[25181.860291]  ? __ip_route_output_key_hash+0x14ce/0x1503
[25181.860512]  ? tcpmss_tg4+0x682/0xe9c [xt_TCPMSS]
[25181.860736]  __asan_report_load1_noabort+0x19/0x1b
[25181.860956]  tcpmss_tg4+0x682/0xe9c [xt_TCPMSS]
[25181.861180]  ? tcpmss_tg4_check+0x287/0x287 [xt_TCPMSS]
[25181.861407]  ? udp_mt+0x45a/0x45a [xt_tcpudp]
[25181.861634]  ? __fib_validate_source+0x46b/0xcd1
[25181.861860]  ipt_do_table+0x1432/0x1573 [ip_tables]
[25181.862088]  ? igb_msix_ring+0x2d/0x35
[25181.862318]  ? ip_tables_net_init+0x15/0x15 [ip_tables]
[25181.862537]  ? ip_route_input_slow+0xe9f/0x17e3
[25181.862759]  ? handle_irq_event_percpu+0x141/0x141
[25181.862985]  ? rt_set_nexthop+0x9a7/0x9a7
[25181.863203]  ? ip_tables_net_exit+0xe/0x15 [ip_tables]
[25181.863419]  ? tcf_action_exec+0xce/0x18c
[25181.863628]  ? iptable_mangle_net_exit+0x92/0x92 [iptable_mangle]
[25181.863856]  ? iptable_filter_net_exit+0x92/0x92 [iptable_filter]
[25181.864084]  iptable_filter_hook+0xc0/0x1c8 [iptable_filter]
[25181.864311]  nf_hook_slow+0x7d/0x121
[25181.864536]  ip_forward+0x1183/0x11c6
[25181.864752]  ? ip_forward_finish+0x168/0x168
[25181.864967]  ? ip_frag_mem+0x43/0x43
[25181.865194]  ? iptable_nat_net_exit+0x92/0x92 [iptable_nat]
[25181.865423]  ? nf_nat_ipv4_in+0xf0/0x209 [nf_nat_ipv4]
[25181.865648]  ip_rcv_finish+0xf4c/0xf5b
[25181.865861]  ip_rcv+0xb41/0xb72
[25181.866086]  ? ip_local_deliver+0x282/0x282
[25181.866308]  ? ip_local_deliver_finish+0x6e6/0x6e6
[25181.866524]  ? ip_local_deliver+0x282/0x282
[25181.866752]  __netif_receive_skb_core+0x1b27/0x21bf
[25181.866971]  ? netdev_rx_handler_register+0x1a6/0x1a6
[25181.867186]  ? enqueue_hrtimer+0x232/0x240
[25181.867401]  ? hrtimer_start_range_ns+0xd1c/0xd4b
[25181.867630]  ? __ppp_xmit_process+0x101f/0x104e [ppp_generic]
[25181.867852]  ? hrtimer_cancel+0x20/0x20
[25181.868081]  ? ppp_push+0x1402/0x1402 [ppp_generic]
[25181.868301]  ? __pskb_pull_tail+0xb0f/0xb25
[25181.868523]  ? ppp_xmit_process+0x47/0xaf [ppp_generic]
[25181.868749]  __netif_receive_skb+0x5e/0x191
[25181.868968]  process_backlog+0x295/0x573
[25181.869180]  ? __netif_receive_skb+0x191/0x191
[25181.869401]  napi_poll+0x311/0x745
[25181.869611]  ? napi_complete_done+0x3b4/0x3b4
[25181.869836]  ? __qdisc_run+0x4ec/0xb7f
[25181.870061]  ? sch_direct_xmit+0x60b/0x60b
[25181.870286]  net_rx_action+0x2e8/0x6dc
[25181.870512]  ? napi_poll+0x745/0x745
[25181.870732]  ? rps_trigger_softirq+0x181/0x1e4
[25181.870956]  ? rps_may_expire_flow+0x29b/0x29b
[25181.871184]  ? irq_work_run+0x2c/0x2e
[25181.871411]  __do_softirq+0x22b/0x5df
[25181.871629]  ? smp_call_function_single_async+0x17d/0x17d
[25181.871854]  irq_exit+0x8a/0xfe
[25181.872069]  smp_call_function_single_interrupt+0x8d/0x90
[25181.872297]  call_function_single_interrupt+0x83/0x90
[25181.872519] RIP: 0010:mwait_idle+0x15a/0x30d
[25181.872733] RSP: 0018:8802d1017e78 EFLAGS: 0246 ORIG_RAX: 
ff04
[25181.873091] RAX:  RBX: 8802d1000c80 RCX: 

[25181.873311] RDX: 11005a200190 RSI:  RDI: 

[25181.873532] RBP: 8802d1017e98 R08: 003f R09: 
7f75f7fff700
[25181.873751] R10: 8802d1017d80 R11: 8802c9b0 R12: 
0001
[25181.873971] R13:  R14: 8802d1000c80 R15: 
dc00

[25181.874182]  
[25181.874393]  arch_cpu_idle+0xf/0x11
[25181.874602]  default_idle_call+0x59/0x5c
[25181.874818]  do_idle+0x11c/0x217
[2

finally found nasty use-after-free bug? 4.10.8

2017-04-02 Thread Denys Fedoryshchenko
I am searching for the cause of crashes on multiple NAT servers, and tried to
enable KASAN.
It seems I got something, and it looks very likely related to all the
crashes, because on all those servers I have MSS adjustment.



[25181.855611] ==
[25181.855985] BUG: KASAN: use-after-free in tcpmss_tg4+0x682/0xe9c [xt_TCPMSS] at addr 8802976000ea
[25181.856344] Read of size 1 by task swapper/1/0
[25181.856555] page:ea000a5d8000 count:0 mapcount:0 mapping: (null) index:0x0
[25181.856909] flags: 0x1000()
[25181.857123] raw: 1000
[25181.857630] raw: ea000b0444a0 ea000a0b1f60
[25181.857996] page dumped because: kasan: bad access detected
[25181.858214] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.8-build-0133-debug #3
[25181.858571] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015

[25181.858786] Call Trace:
[25181.859000]  
[25181.859215]  dump_stack+0x99/0xd4
[25181.859423]  ? _atomic_dec_and_lock+0x15d/0x15d
[25181.859644]  ? __dump_page+0x447/0x4e3
[25181.859859]  ? tcpmss_tg4+0x682/0xe9c [xt_TCPMSS]
[25181.860080]  kasan_report+0x577/0x69d
[25181.860291]  ? __ip_route_output_key_hash+0x14ce/0x1503
[25181.860512]  ? tcpmss_tg4+0x682/0xe9c [xt_TCPMSS]
[25181.860736]  __asan_report_load1_noabort+0x19/0x1b
[25181.860956]  tcpmss_tg4+0x682/0xe9c [xt_TCPMSS]
[25181.861180]  ? tcpmss_tg4_check+0x287/0x287 [xt_TCPMSS]
[25181.861407]  ? udp_mt+0x45a/0x45a [xt_tcpudp]
[25181.861634]  ? __fib_validate_source+0x46b/0xcd1
[25181.861860]  ipt_do_table+0x1432/0x1573 [ip_tables]
[25181.862088]  ? igb_msix_ring+0x2d/0x35
[25181.862318]  ? ip_tables_net_init+0x15/0x15 [ip_tables]
[25181.862537]  ? ip_route_input_slow+0xe9f/0x17e3
[25181.862759]  ? handle_irq_event_percpu+0x141/0x141
[25181.862985]  ? rt_set_nexthop+0x9a7/0x9a7
[25181.863203]  ? ip_tables_net_exit+0xe/0x15 [ip_tables]
[25181.863419]  ? tcf_action_exec+0xce/0x18c
[25181.863628]  ? iptable_mangle_net_exit+0x92/0x92 [iptable_mangle]
[25181.863856]  ? iptable_filter_net_exit+0x92/0x92 [iptable_filter]
[25181.864084]  iptable_filter_hook+0xc0/0x1c8 [iptable_filter]
[25181.864311]  nf_hook_slow+0x7d/0x121
[25181.864536]  ip_forward+0x1183/0x11c6
[25181.864752]  ? ip_forward_finish+0x168/0x168
[25181.864967]  ? ip_frag_mem+0x43/0x43
[25181.865194]  ? iptable_nat_net_exit+0x92/0x92 [iptable_nat]
[25181.865423]  ? nf_nat_ipv4_in+0xf0/0x209 [nf_nat_ipv4]
[25181.865648]  ip_rcv_finish+0xf4c/0xf5b
[25181.865861]  ip_rcv+0xb41/0xb72
[25181.866086]  ? ip_local_deliver+0x282/0x282
[25181.866308]  ? ip_local_deliver_finish+0x6e6/0x6e6
[25181.866524]  ? ip_local_deliver+0x282/0x282
[25181.866752]  __netif_receive_skb_core+0x1b27/0x21bf
[25181.866971]  ? netdev_rx_handler_register+0x1a6/0x1a6
[25181.867186]  ? enqueue_hrtimer+0x232/0x240
[25181.867401]  ? hrtimer_start_range_ns+0xd1c/0xd4b
[25181.867630]  ? __ppp_xmit_process+0x101f/0x104e [ppp_generic]
[25181.867852]  ? hrtimer_cancel+0x20/0x20
[25181.868081]  ? ppp_push+0x1402/0x1402 [ppp_generic]
[25181.868301]  ? __pskb_pull_tail+0xb0f/0xb25
[25181.868523]  ? ppp_xmit_process+0x47/0xaf [ppp_generic]
[25181.868749]  __netif_receive_skb+0x5e/0x191
[25181.868968]  process_backlog+0x295/0x573
[25181.869180]  ? __netif_receive_skb+0x191/0x191
[25181.869401]  napi_poll+0x311/0x745
[25181.869611]  ? napi_complete_done+0x3b4/0x3b4
[25181.869836]  ? __qdisc_run+0x4ec/0xb7f
[25181.870061]  ? sch_direct_xmit+0x60b/0x60b
[25181.870286]  net_rx_action+0x2e8/0x6dc
[25181.870512]  ? napi_poll+0x745/0x745
[25181.870732]  ? rps_trigger_softirq+0x181/0x1e4
[25181.870956]  ? rps_may_expire_flow+0x29b/0x29b
[25181.871184]  ? irq_work_run+0x2c/0x2e
[25181.871411]  __do_softirq+0x22b/0x5df
[25181.871629]  ? smp_call_function_single_async+0x17d/0x17d
[25181.871854]  irq_exit+0x8a/0xfe
[25181.872069]  smp_call_function_single_interrupt+0x8d/0x90
[25181.872297]  call_function_single_interrupt+0x83/0x90
[25181.872519] RIP: 0010:mwait_idle+0x15a/0x30d
[25181.872733] RSP: 0018:8802d1017e78 EFLAGS: 0246 ORIG_RAX: 
ff04
[25181.873091] RAX:  RBX: 8802d1000c80 RCX: 

[25181.873311] RDX: 11005a200190 RSI:  RDI: 

[25181.873532] RBP: 8802d1017e98 R08: 003f R09: 
7f75f7fff700
[25181.873751] R10: 8802d1017d80 R11: 8802c9b0 R12: 
0001
[25181.873971] R13:  R14: 8802d1000c80 R15: 
dc00

[25181.874182]  
[25181.874393]  arch_cpu_idle+0xf/0x11
[25181.874602]  default_idle_call+0x59/0x5c
[25181.874818]  do_idle+0x11c/0x217
[25181.875039]  cpu_startup_entry+0x1f/0x21
[25181.875258]  start_secondary+0x2cc/0x2d5
[25181.875481]  start_cpu+0x14/0x14
[25181.875696] Memory state around the buggy address:
[25181.875919]  8802975fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
[25181.876275]  fff

Re: probably serious conntrack/netfilter panic, 4.8.14, timers and intel turbo

2017-03-31 Thread Denys Fedoryshchenko
I am not sure if it is the same issue, but panics still happen, though much
less often. Same server, NAT.
I will upgrade to the latest 4.10.x build, because for this one I don't have
the files anymore (for symbols, etc.).


 [864288.511464] Modules linked in: nf_conntrack_netlink nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables netconsole configfs 8021q garp mrp stp llc bonding ixgbe dca
 [864288.512740] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 4.10.1-build-0132 #2
 [864288.513005] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016

 [864288.513454] task: 881038cb6000 task.stack: c9000c678000
 [864288.513719] RIP: 0010:nf_nat_cleanup_conntrack+0xe2/0x1bc [nf_nat]
 [864288.513980] RSP: 0018:88103fc43ba0 EFLAGS: 00010206
 [864288.514237] RAX: 140504021ad8 RBX: 881004021ad8 RCX: 
0100
 [864288.514677] RDX: 140504021ad8 RSI: 88103279628c RDI: 
88103279628c
 [864288.515117] RBP: 88103fc43be0 R08: c9003b47b558 R09: 
0004
 [864288.515558] R10: 8820083d00ce R11: 881038480b00 R12: 
881004021a40
 [864288.515998] R13:  R14: a00d406e R15: 
c90036e11000
 [864288.516438] FS:  () GS:88103fc4() 
knlGS:

 [864288.516882] CS:  0010 DS:  ES:  CR0: 80050033
 [864288.517142] CR2: 7fbfc303f978 CR3: 00202267c000 CR4: 
001406e0

 [864288.517580] Call Trace:
 [864288.517831]  
 [864288.518090]  __nf_ct_ext_destroy+0x3f/0x57 [nf_conntrack]
 [864288.518352]  nf_conntrack_free+0x25/0x55 [nf_conntrack]
 [864288.518615]  destroy_conntrack+0x80/0x8c [nf_conntrack]
 [864288.518880]  nf_conntrack_destroy+0x19/0x1b
 [864288.519137]  nf_ct_gc_expired+0x6e/0x71 [nf_conntrack]
 [864288.519400]  __nf_conntrack_find_get+0x89/0x2ab [nf_conntrack]
 [864288.519663]  nf_conntrack_in+0x1ec/0x877 [nf_conntrack]
 [864288.519925]  ipv4_conntrack_in+0x1c/0x1e [nf_conntrack_ipv4]
 [864288.520185]  nf_hook_slow+0x2a/0x9a
 [864288.520439]  ip_rcv+0x318/0x337
 [864288.520692]  ? ip_local_deliver_finish+0x1ba/0x1ba
 [864288.520953]  __netif_receive_skb_core+0x607/0x852
 [864288.521213]  ? kmem_cache_free_bulk+0x232/0x274
 [864288.521471]  __netif_receive_skb+0x18/0x5a
 [864288.521727]  process_backlog+0x90/0x113
 [864288.521981]  net_rx_action+0x114/0x2dc
 [864288.522238]  ? sched_clock_cpu+0x15/0x94
 [864288.522496]  __do_softirq+0xe7/0x259
 [864288.522753]  irq_exit+0x52/0x93
 [864288.523006]  smp_call_function_single_interrupt+0x33/0x35
 [864288.523267]  call_function_single_interrupt+0x83/0x90
 [864288.523531] RIP: 0010:mwait_idle+0x9e/0x125
 [864288.523786] RSP: 0018:c9000c67beb0 EFLAGS: 0246 ORIG_RAX: 
ff04
 [864288.524229] RAX:  RBX: 881038cb6000 RCX: 

 [864288.524669] RDX:  RSI:  RDI: 

 [864288.525110] RBP: c9000c67bec0 R08: 0001 R09: 

 [864288.525551] R10: c9000c67be50 R11:  R12: 
0011
 [864288.525991] R13:  R14: 881038cb6000 R15: 
881038cb6000

 [864288.526429]  
 [864288.526682]  arch_cpu_idle+0xf/0x11
 [864288.526937]  default_idle_call+0x25/0x27
 [864288.527193]  do_idle+0xb6/0x15d
 [864288.527446]  cpu_startup_entry+0x1f/0x21
 [864288.527702]  start_secondary+0xe8/0xeb
 [864288.527961]  start_cpu+0x14/0x14
 [864288.528212] Code: 48 89 f7 48 89 75 c8 e8 6e e8 8f e1 8b 45 c4 48 
8b 75 c8 48 83 c0 08 4d 8d 04 c7 49 8b 04 c7 a8 01 75 46 48 39 c3 74 1e 
48 89 c2 <48> 8b 7a 08 48 85 ff 0f 84 b3 00 00 00 48 39 fb 0f 84 9e 00 
00
 [864288.528905] RIP: nf_nat_cleanup_conntrack+0xe2/0x1bc [nf_nat] RSP: 88103fc43ba0
 [864288.529362] ---[ end trace e3c40a5e4bf43e26 ]---
 [864288.567835] Kernel panic - not syncing: Fatal exception in interrupt

 [864288.568122] Kernel Offset: disabled
 [864288.587619] Rebooting in 5 seconds..


__nf_conntrack_find_get - NMI watchdog, 4.10.5

2017-03-25 Thread Denys Fedoryshchenko

Hi,

While applying/removing shapers on a few thousand ppp interfaces, the
pppoe server rebooted with this message:
[51306.144984] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
[51306.145319] Modules linked in: sch_sfq cls_fw act_police cls_u32 sch_ingress sch_htb pppoe pppox ppp_generic slhc netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre tun xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 xt_set ts_bm xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc
[51306.146381] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.5-build-0132 #2
[51306.146577] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015

[51306.146775] task: 8200e4c0 task.stack: 8200
[51306.146976] RIP: 0010:__nf_conntrack_find_get+0x23/0x2ab 
[nf_conntrack]
[51306.147173] RSP: 0018:880436403c50 EFLAGS: 0203 ORIG_RAX: 
ff10
[51306.147505] RAX: 60fbc8a03408 RBX: 7fffc4020277d00a RCX: 
b9a11ba7
[51306.147703] RDX: 88042a40 RSI: 81c03e00 RDI: 
820d2340
[51306.147900] RBP: 880436403c80 R08: 4f33ff15 R09: 
820d2340
[51306.148098] R10: 88041032bd10 R11:  R12: 
880436403cd8
[51306.148295] R13: 820d2340 R14: 81c03e00 R15: 
0005cd08
[51306.148492] FS:  () GS:88043640() 
knlGS:

[51306.148824] CS:  0010 DS:  ES:  CR0: 80050033
[51306.149020] CR2: 7f455449f768 CR3: 00042bd64000 CR4: 
001406f0

[51306.149217] Call Trace:
[51306.149410]  
[51306.149605]  nf_conntrack_in+0x1ec/0x877 [nf_conntrack]
[51306.149804]  ? _raw_read_unlock_bh+0x20/0x22
[51306.15]  ? ppp_input+0x14c/0x157 [ppp_generic]
[51306.150196]  ipv4_conntrack_in+0x1c/0x1e [nf_conntrack_ipv4]
[51306.150394]  nf_hook_slow+0x2a/0x9a
[51306.150589]  ip_rcv+0x318/0x337
[51306.150782]  ? ip_local_deliver_finish+0x1ba/0x1ba
[51306.150980]  __netif_receive_skb_core+0x607/0x852
[51306.151178]  ? swiotlb_sync_single+0x16/0x24
[51306.151373]  __netif_receive_skb+0x18/0x5a
[51306.151566]  process_backlog+0x90/0x113
[51306.151761]  net_rx_action+0x114/0x2dc
[51306.151955]  ? igb_msix_ring+0x2e/0x36
[51306.152151]  __do_softirq+0xe7/0x259
[51306.152347]  irq_exit+0x52/0x93
[51306.152541]  do_IRQ+0xaa/0xc2
[51306.152735]  common_interrupt+0x83/0x83
[51306.152931] RIP: 0010:mwait_idle+0x9e/0x125
[51306.153125] RSP: 0018:82003e28 EFLAGS: 0246 ORIG_RAX: 
ff1d
[51306.153459] RAX:  RBX: 8200e4c0 RCX: 

[51306.153656] RDX:  RSI:  RDI: 

[51306.153853] RBP: 82003e38 R08: 009e R09: 

[51306.154051] R10: 82003dc8 R11:  R12: 

[51306.154247] R13:  R14: 8200e4c0 R15: 
8200e4c0

[51306.15]  
[51306.157951]  arch_cpu_idle+0xf/0x11
[51306.158143]  default_idle_call+0x25/0x27
[51306.158336]  do_idle+0xb6/0x15d
[51306.158527]  cpu_startup_entry+0x1f/0x21
[51306.158721]  rest_init+0x77/0x79
[51306.158915]  start_kernel+0x3c9/0x3d6
[51306.159109]  x86_64_start_reservations+0x2a/0x2c
[51306.159307]  x86_64_start_kernel+0x16a/0x178
[51306.159507]  start_cpu+0x14/0x14
[51306.159701] Code: e8 ed d3 93 e1 5b 5d c3 0f 1f 44 00 00 55 89 c8 48 
89 e5 41 57 41 56 49 89 f6 41 55 49 89 fd 41 54 49 89 d4 53 41 50 48 89 
45 d0 <8b> 05 b4 b1 00 00 a8 01 74 04 f3 90 eb f2 44 8b 3d ad b1 00 00
[51306.160194] Kernel panic - not syncing: softlockup: hung tasks
[51306.160392] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G L  
4.10.5-build-0132 #2
[51306.160725] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 
04/02/2015

[51306.160922] Call Trace:
[51306.161115]  
[51306.161310]  dump_stack+0x4d/0x63
[51306.161507]  panic+0xd2/0x215
[51306.161702]  watchdog_timer_fn+0x1a9/0x1cb
[51306.161896]  __hrtimer_run_queues+0xe4/0x1e3
[51306.162092]  ? ktime_get_update_offsets_now+0x4f/0xef
[51306.162288]  hrtimer_interrupt+0xa5/0x167
[51306.162483]  local_apic_timer_interrupt+0x4b/0x4e
[51306.162679]  smp_apic_timer_interrupt+0x38/0x48
[51306.162876]  apic_timer_interrupt+0x83/0x90
[51306.163073] RIP: 0010:__nf_conntrack_find_get+0x23/0x2ab 
[nf_conntrack]
[51306.163269] RSP: 0018:880436403c50 EFLAGS: 0203 ORIG_RAX: 
ff10
[51306.163601] RAX: 60fbc8a03408 RBX: 7fffc4020277d00a RCX: 
b9a11ba7
[51306.163798] RDX: 88042a40 RSI: 81c03e00 RDI: 
820d2340
[51306.163996] RBP: 880436403c80 R08: 4f33ff15 R09: 
820d2340
[51306.164195] R10: 88041032bd10 R11:  R12: 
880436403cd8
[51306.164394] R13: 8

4.9.4 panic, nf_conntrack_tuple_taken

2017-02-12 Thread Denys Fedoryshchenko

Hi,

It seems I'm quite "lucky" and am hitting another bug.
This time it is a different server. I believe I've seen this bug on a 
few pppoe servers, but here it happens once every 1-2 days.


An out-of-tree patch is applied to optimize the gc heuristics. I don't 
exclude a hardware issue (though the chance is very small). For this bug 
it is very hard to capture the call trace/panic message: I don't know 
why, but it was not stored in pstore, and the one time it was stored, 
only half of the message was there.
It happens on 4.9.9 as well, but I haven't captured a call trace there 
yet, so I can't say whether it is the same or not; this is the only 
trace I was able to catch.
It also might be related to fragmentation/tunnels, because the reboots 
started when I ran an ipip ddos protection tunnel.


<4>[160340.861244] general protection fault:  [#1] SMP
<4>[160340.861527] Modules linked in: ioatdma w83l786ng w83l785ts w83795 
w83793 w83792d w83791d w83781d w83627ehf vt8231 via686a tmp421 tmp401 
tmp102 thmc50 tc74 smsc47m192 smm665 sis5595 sht21 sht15 pmbus_core 
pcf8591 ntc_thermistor nct7904 nct7802 nct6775 mcp3021 max6697 max6650 
max6642 max6639 max31790 max197 max1668 max1619 max16065 max ltc4261 
ltc4245 ltc4215 ltc4151 ltc2990 lm95245 lm95241 lm95234 lm93 lm92 lm90 
lm87 lm85 lm83 lm80 lm78 lm77 lm75 lm73 lm70 lm63 lineage_pem k8temp 
k10temp jc42 ina3221 ina2xx ina209 ibmpex ibmaem i5k_amb i5500_temp 
hwmon_vid hih6130 gpio_fan gl518sm g760a ftsteutates fschmd fam15h_power 
f75375s emc6w201 emc2103 emc1403 ds620 ds1621 coretemp asus_atk0110 
asc7621 amc6821 adt7x10 adt7470 adt7462 adt7411 ads7871 ads7828 ads1015 
adm1031 adm1029 adm1021 adcxx ad7418 ad7414
<4>[160340.870563]  ad7314 acpi_power_meter cls_u32 sch_pie sch_htb msr 
ipmi_devintf ipmi_si ipmi_msghandler xt_nat xt_set xt_mark xt_connmark 
iptable_raw xt_CT ip_set_hash_net ip_set nfnetlink xt_hl xt_TCPMSS 
xt_tcpudp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp 
nf_conntrack_proto_gre iptable_filter iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables 
x_tables netconsole configfs ipip tunnel4 ip_tunnel 8021q garp mrp stp 
llc ixgbe dca
<4>[160340.875258] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 
4.9.4-build-0130 #4
<4>[160340.875529] Hardware name: Supermicro X10SLM+-LN4F/X10SLM+-LN4F, 
BIOS 3.0a 12/17/2015

<4>[160340.875981] task: 88040d5bd5c0 task.stack: c9000194
<4>[160340.876247] RIP: 0010:[]  [] 
nf_conntrack_tuple_taken+0x68/0x196 [nf_conntrack]

<4>[160340.876789] RSP: 0018:88041fdc37c0  EFLAGS: 00010246
<4>[160340.877053] RAX: 02530d1f RBX: ffb00404024062c8 RCX: 
0001
<4>[160340.877506] RDX: 1f2f RSI: f3476b40 RDI: 
8803f9542640
<4>[160340.877956] RBP: 88041fdc37f0 R08: 2682c87d R09: 
5bf0500a
<4>[160340.878410] R10: 001e6b01 R11: 3a8b60eb R12: 
88041fdc3800
<4>[160340.878860] R13: 4aeb R14: 880407304780 R15: 
820b2dc0
<4>[160340.879315] FS:  () GS:88041fdc() 
knlGS:

<4>[160340.879771] CS:  0010 DS:  ES:  CR0: 80050033
<4>[160340.880036] CR2: 0062da00 CR3: 02007000 CR4: 
001406e0

<4>[160340.880483] Stack:
<4>[160340.880743]  88040728 880407304780 880407304780 

<4>[160340.881472]  0008 001e6b01 88041fdc3830 
a00b8209
<4>[160340.882197]  fa50655f  0e50f05b0002bb01 


<4>[160340.882930] Call Trace:
<4>[160340.883185]  
<4>[160340.883264]  [] nf_nat_used_tuple+0x24/0x2b 
[nf_nat]
<4>[160340.883789]  [] nf_nat_setup_info+0x2bf/0x805 
[nf_nat]
<4>[160340.884062]  [] ? 
nf_nat_bysource_hash+0xb0/0xb0 [nf_nat]
<4>[160340.884331]  [] xt_snat_target_v0+0x65/0x67 
[xt_nat]
<4>[160340.884599]  [] ipt_do_table+0x28e/0x5a2 
[ip_tables]
<4>[160340.884868]  [] ? ipt_do_table+0x586/0x5a2 
[ip_tables]
<4>[160340.885135]  [] ? iptable_nat_ipv4_fn+0x12/0x12 
[iptable_nat]
<4>[160340.890247]  [] iptable_nat_do_chain+0x1a/0x1c 
[iptable_nat]
<4>[160340.890701]  [] nf_nat_ipv4_fn+0xeb/0x177 
[nf_nat_ipv4]
<4>[160340.890970]  [] nf_nat_ipv4_out+0x35/0x37 
[nf_nat_ipv4]
<4>[160340.891239]  [] iptable_nat_ipv4_out+0x10/0x12 
[iptable_nat]

<4>[160340.891697]  [] nf_iterate+0x34/0x57
<4>[160340.891960]  [] nf_hook_slow+0x2b/0x91
<4>[160340.892224]  [] ip_output+0x99/0xb6
<4>[160340.892493]  [] ? 
ip_fragment.constprop.5+0x77/0x77

<4>[160340.892766]  [] ip_forward_finish+0x53/0x58
<4>[160340.893034]  [] ip_forward+0x32d/0x33a
<4>[160340.893296]  [] ? ip_frag_mem+0x3e/0x3e
<4>[160340.893563]  [] ip_rcv_finish+0x2e8/0x2f3
<4>[160340.893828]  [] ip_rcv+0x318/0x325
<4>[160340.894095]  [] ? 
ip_local_deliver_finish+0x109/0x109
<4>[160340.894365]  [] 
__netif_receive_skb_core+0x5cf/0x807

<4>[160340.894631]  [] ? tcp4_gro_receive+0x17b/0x17f
<4>[160340.894902]  [] ? inet_gro_receive+0x229/0x239
<4>[160340.895170]  [] __netif_receive_skb+0x13/0x55
<4>[160340.895439]  [] 
netif_receive_skb_internal+0x3b/0x7

Re: 4.9 conntrack performance issues

2017-01-30 Thread Denys Fedoryshchenko

On 2017-01-30 13:26, Guillaume Nault wrote:

On Sun, Jan 15, 2017 at 01:05:58AM +0200, Denys Fedoryshchenko wrote:

Hi!

Sorry if i added someone wrongly to CC, please let me know, if i should
remove.
I just run successfully 4.9 on my nat several days ago, and seems panic
issue disappeared.


Hi Denys,

After two weeks running Linux 4.9, do you confirm that the original
issue[1] is gone?

Regards,

Guillaume

[1]: https://www.spinics.net/lists/netdev/msg410795.html

Yes, no more reboots at all, and 4.9 patched for the gc issues seems 
significantly better for NAT performance (CPU load is almost half that 
of previous kernels; I don't have exact numbers).


Re: 4.9 conntrack performance issues

2017-01-14 Thread Denys Fedoryshchenko

On 2017-01-15 02:29, Florian Westphal wrote:

Denys Fedoryshchenko  wrote:

On 2017-01-15 01:53, Florian Westphal wrote:
>Denys Fedoryshchenko  wrote:
>
>I suspect you might also have to change
>
>1011 } else if (expired_count) {
>1012 gc_work->next_gc_run /= 2U;
>1013 next_run = msecs_to_jiffies(1);
>1014 } else {
>
>line 1013 to
>next_run = msecs_to_jiffies(HZ / 2);


I think it's wrong to rely on "expired_count"; with these
kinds of numbers (up to 10k entries are scanned per round
in Denys' setup), it's basically always going to be > 0.

I think we should only decide to scan more frequently if
eviction ratio is large, say, we found more than 1/4 of
entries to be stale.

I sent a small patch offlist that does just that.
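
The idea described above can be sketched roughly as follows. This is only an illustration in Python, not the patch that was sent offlist; the function name and the `ratio_den` parameter are made up for the example, and only the "more than 1/4 stale" threshold comes from the text.

```python
def should_scan_more_often(scanned: int, expired: int, ratio_den: int = 4) -> bool:
    """Return True when more than 1/ratio_den of the scanned entries were stale.

    Hypothetical helper: it models the proposed decision only, not the
    kernel's gc_worker() code.
    """
    if scanned <= 0:
        return False  # nothing was scanned, so there is no ratio to act on
    # Integer comparison avoids floating point: expired / scanned > 1 / ratio_den
    return expired * ratio_den > scanned
```

With about 10k entries scanned per round, a rule based on `expired_count > 0` fires on practically every run, while this check asks for a faster rescan only once more than 2500 of them were stale.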


>How many total connections is the machine handling on average?
>And how many new/delete events happen per second?
1-2 million connections, 988k at the moment.
I don't know if this is the correct method to measure the event rate:

NAT ~ # timeout -t 5 conntrack -E -e NEW | wc -l
conntrack v1.4.2 (conntrack-tools): 40027 flow events have been shown.
40027
NAT ~ # timeout -t 5 conntrack -E -e DESTROY | wc -l
conntrack v1.4.2 (conntrack-tools): 40951 flow events have been shown.
40951


Thanks, thats exactly what I was looking for.
So I am not at all surprised that gc_worker eats cpu cycles...

It is not peak time, so values can be 2-3 times higher at peak time, but
even right now, it is hogging one core, leaving only 20% idle left,
while others are 80-83% idle.


I agree its a bug.


>>   |--54.65%--gc_worker
>>   |  |
>>   |   --3.58%--nf_ct_gc_expired
>>   | |
>>   | |--1.90%--nf_ct_delete
>
>I'd be interested to see how often that shows up on other cores
>(from packet path).
Other CPU's totally different:
This is top entry
99.60% 0.00%  swapper  [kernel.kallsyms]  [k] start_secondary
|
---start_secondary
   |
    --99.42%--cpu_startup_entry
              |
[..]
              |--36.02%--process_backlog
              |          |
              |           --35.64%--__netif_receive_skb

gc_worker didn't appear on the other cores at all.
Or am I checking something wrong?


Look for "nf_ct_gc_expired" and "nf_ct_delete".
Its going to be deep down in the call graph.
I tried my best to record as much data as possible, but it doesn't show 
up in the callgraph, just a little bit in the statistics:


 0.01% 0.00%  swapper  [nf_conntrack]  [k] nf_ct_delete
 0.01% 0.00%  swapper  [nf_conntrack]  [k] nf_ct_gc_expired

And that's it.


Re: 4.9 conntrack performance issues

2017-01-14 Thread Denys Fedoryshchenko

On 2017-01-15 01:53, Florian Westphal wrote:

Denys Fedoryshchenko  wrote:

[ CC Nicolas since he also played with gc heuristics in the past ]

Sorry if i added someone wrongly to CC, please let me know, if i should
remove.
I just run successfully 4.9 on my nat several days ago, and seems panic
issue disappeared. But i started to face another issue, it seems garbage
collector is hogging one of CPU's.

It was handling load very well at 4.8 and below, it might be still fine,
but i suspect queues that belong to hogged cpu might experience issues.


The worker doesn't grab locks for long and calls scheduler for every
bucket to give a chance for other threads to run.

It also doesn't block softinterrupts.

Is there anything can be done to improve cpu load distribution or reduce
single core load?


No, I am afraid we don't export any of the heuristics as tuneables so
far.

You could try changing defaults in net/netfilter/nf_conntrack_core.c:

#define GC_MAX_BUCKETS_DIV  64u
/* upper bound of scan intervals */
#define GC_INTERVAL_MAX (2 * HZ)
/* maximum conntracks to evict per gc run */
#define GC_MAX_EVICTS   256u

(the first two result in ~2 minute worst case timeout detection
 on a fully idle system).

For instance you could use

GC_MAX_BUCKETS_DIV -> 128
GC_INTERVAL_MAX    -> 30 * HZ

(This means that it takes one hour for a dead connection to be picked
 up on an idle system, but thats only relevant in case you use
 conntrack events to log when connection went down and need more precise
 accounting).
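
The two worst-case figures quoted above (~2 minutes with the defaults, about one hour with the suggested values) can be checked with a little arithmetic. The sketch below assumes, as the text implies, that each gc run covers 1/GC_MAX_BUCKETS_DIV of the table and that on an idle system runs are GC_INTERVAL_MAX apart; the function name is made up for illustration.

```python
def worst_case_detection_seconds(buckets_div: int, interval_max_seconds: int) -> int:
    """Seconds until every bucket has been visited once on a fully idle system.

    Assumption: one gc run scans 1/buckets_div of the hash table, and the
    worker sleeps interval_max_seconds between runs (GC_INTERVAL_MAX / HZ).
    """
    return buckets_div * interval_max_seconds


default_case = worst_case_detection_seconds(64, 2)     # 128 s, roughly 2 minutes
relaxed_case = worst_case_detection_seconds(128, 30)   # 3840 s, 64 minutes
```

So the suggested values trade detection latency (128 s becomes roughly an hour) for far fewer wakeups of the worker.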

Not a big deal in my case.



I suspect you might also have to change

1011 } else if (expired_count) {
1012 gc_work->next_gc_run /= 2U;
1013 next_run = msecs_to_jiffies(1);
1014 } else {

line 1013 to
next_run = msecs_to_jiffies(HZ / 2);

or something like this to not have frequent rescans.

OK


The gc is also done from the packet path (i.e. accounted
towards (k)softirq).

How many total connections is the machine handling on average?
And how many new/delete events happen per second?

1-2 million connections, 988k at the moment.
I don't know if this is the correct method to measure the event rate:

NAT ~ # timeout -t 5 conntrack -E -e NEW | wc -l
conntrack v1.4.2 (conntrack-tools): 40027 flow events have been shown.
40027
NAT ~ # timeout -t 5 conntrack -E -e DESTROY | wc -l
conntrack v1.4.2 (conntrack-tools): 40951 flow events have been shown.
40951
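
Since each `conntrack -E` run above was limited to 5 seconds by `timeout`, the counts translate into per-second rates with a simple division. A minimal sketch (the helper name is made up for illustration, and it assumes the events were spread over the whole 5-second window):

```python
def events_per_second(event_count: int, window_seconds: float) -> float:
    """Average event rate over a measurement window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return event_count / window_seconds


new_rate = events_per_second(40027, 5)      # NEW events per second
destroy_rate = events_per_second(40951, 5)  # DESTROY events per second
```

That works out to roughly 8000 new and 8200 destroyed connections per second at non-peak time.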

It is not peak time, so values can be 2-3 times higher at peak time, but 
even right now, it is hogging one core, leaving only 20% idle left,
while others are 80-83% idle.




88.98% 0.00%  kworker/24:1  [kernel.kallsyms]  [k] process_one_work
|
---process_one_work
   |
   |--54.65%--gc_worker
   |  |
   |   --3.58%--nf_ct_gc_expired
   | |
   | |--1.90%--nf_ct_delete


I'd be interested to see how often that shows up on other cores
(from packet path).

Other CPU's totally different:
This is top entry
99.60% 0.00%  swapper  [kernel.kallsyms]  [k] start_secondary
|
---start_secondary
   |
    --99.42%--cpu_startup_entry
              |
               --98.04%--default_idle_call
                         arch_cpu_idle
                         |
                         |--48.58%--call_function_single_interrupt
                         |          |
                         |           --46.36%--smp_call_function_single_interrupt
                         |                     smp_trace_call_function_single_interrupt
                         |                     |
                         |                     |--44.18%--irq_exit
                         |                     |          |
                         |                     |          |--43.37%--__do_softirq
                         |                     |          |          |
                         |                     |          |           --43.18%--net_rx_action
                         |                     |          |                     |
                         |                     |          |                     |--36.02%--process_backlog
                         |                     |          |                     |          |
                         |                     |          |                     |           --35.64%--__netif_receive_skb



gc_worker didn't appear on the other cores at all.
Or am I checking something wrong?






4.9 conntrack performance issues

2017-01-14 Thread Denys Fedoryshchenko

Hi!

Sorry if I added someone wrongly to CC; please let me know if I should 
remove anyone.
I have been running 4.9 successfully on my NAT for several days, and it 
seems the panic issue has disappeared. But I have started to face 
another issue: it seems the garbage collector is hogging one of the CPUs.


Here is my data:
2xE5-2640 v3
396G ram
2x10G (bonding) with approx 14-15G load at peak time
It was handling the load very well on 4.8 and below. It might still be 
fine, but I suspect the queues that belong to the hogged CPU might 
experience issues.
Is there anything that can be done to improve the CPU load distribution 
or reduce the single-core load?


net.netfilter.nf_conntrack_buckets = 65536
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 1236021
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_expect_max = 1024
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_helper = 0
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 6553600
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 0
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_established = 600
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 20
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 20
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 10
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 20
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 20
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 30
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.nf_conntrack_max = 6553600


These are non-peak values; as adjustments, I have shorter than default 
timeouts. Changing net.netfilter.nf_conntrack_buckets to a higher value 
doesn't fix the issue.

I noticed that one of the CPUs is hogged (CPU 24 in this case):

Linux 4.9.2-build-0127 (NAT)    01/14/17    _x86_64_    (32 CPU)

23:01:54  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
23:02:04  all    0.09    0.00    1.60    0.01    0.00   28.28    0.00    0.00   70.01
23:02:04    0    0.11    0.00    0.00    0.00    0.00   32.38    0.00    0.00   67.51
23:02:04    1    0.12    0.00    0.12    0.00    0.00   29.91    0.00    0.00   69.86
23:02:04    2    0.23    0.00    0.11    0.00    0.00   29.57    0.00    0.00   70.09
23:02:04    3    0.11    0.00    0.11    0.11    0.00   28.80    0.00    0.00   70.86
23:02:04    4    0.23    0.00    0.11    0.11    0.00   31.41    0.00    0.00   68.14
23:02:04    5    0.11    0.00    0.00    0.00    0.00   29.28    0.00    0.00   70.61
23:02:04    6    0.11    0.00    0.11    0.00    0.00   31.81    0.00    0.00   67.96
23:02:04    7    0.11    0.00    0.11    0.00    0.00   32.69    0.00    0.00   67.08
23:02:04    8    0.00    0.00    0.23    0.00    0.00   42.12    0.00    0.00   57.64
23:02:04    9    0.11    0.00    0.00    0.00    0.00   30.86    0.00    0.00   69.02
23:02:04   10    0.11    0.00    0.11    0.00    0.00   30.93    0.00    0.00   68.84
23:02:04   11    0.00    0.00    0.11    0.00    0.00   32.73    0.00    0.00   67.16
23:02:04   12    0.11    0.00    0.11    0.00    0.00   29.85    0.00    0.00   69.92
23:02:04   13    0.00    0.00    0.00    0.00    0.00   30.96    0.00    0.00   69.04
23:02:04   14    0.00    0.00    0.00    0.00    0.00   30.09    0.00    0.00   69.91
23:02:04   15    0.00    0.00    0.11    0.00    0.00   30.63    0.00    0.00   69.26
23:02:04   16    0.11    0.00    0.00    0.00    0.00   25.88    0.00    0.00   74.01
23:02:04   17    0.11    0.00    0.00    0.00    0.00   22.82    0.00    0.00   77.07
23:02:04   18    0.11    0.00    0.00    0.00    0.00   23.75    0.00    0.00   76.14
23:02:04   19    0.11    0.00    0.11    0.00    0.00   24.86    0.00    0.00   74.92
23:02:04   20    0.11    0.00    0.11    0.11    0.00   24.48    0.00    0.00   75.19
23:02:04   21    0.22    0.00    0.11    0.00    0.00   23.43    0.00    0.00   76.24
23:02:04   22    0.11    0.00    0.11    0.00    0.00   25.46    0.00    0.00   74.32
23:02:04   23    0.00    0.00    0.11    0.00    0.00   25.47    0.00    0.00   74.41
23:02:04   24    0.00    0.00   45.06    0.00    0.00   42.18    0.00    0.00   12.76
23:02:04   25    0.11    0.00    0.11    0.11    0.00   25.22    0.00    0.00   74.46
23:02:04   26    0.11    0.00    0.00    0.11    0.00   23.39    0.00    0.00   76.39
23:02:04   27    0.22    0.00    0.11    0.00    0.00   23.83    0.00    0.00   75.85
23:02:04   28    0.11    0.00    0.11    0.00    0.00   24.10    0.00    0.00   75.68
23:02:04   29    0.11    0.00    0.11    0.

Re: probably serious conntrack/netfilter panic, 4.8.14, timers and intel turbo

2017-01-11 Thread Denys Fedoryshchenko

On 2017-01-11 19:22, Guillaume Nault wrote:

Cc: netfilter-de...@vger.kernel.org, I'm afraid I'll need some help
for this case.

On Sat, Dec 17, 2016 at 09:48:13PM +0200, Denys Fedoryshchenko wrote:

Hi,

I posted recently several netfilter related crashes, didn't got any answers,
one of them started to happen quite often on loaded NAT (17Gbps),
so after trying endless ways to make it stable, i found out that in
backtrace i can often see timers, and this bug probably appearing on older
releases,
i've seen such backtrace with timer fired for conntrack on them.
I disabled Intel turbo for cpus on this loaded NAT, and voila, panic
disappeared for 2nd day!
* by wrmsr -a 0x1a0 0x4000850089
I am not sure timers is the reason, but probably turbo creating some
condition for bug.
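
For reference, MSR 0x1a0 is IA32_MISC_ENABLE, and bit 38 of it is the turbo-disable bit on Intel CPUs that support it. A small sketch to check that the value written above has that bit set (the helper and constant names are made up for illustration; the other bits of the value are machine-specific and are not interpreted here):

```python
TURBO_DISABLE_BIT = 38  # IA32_MISC_ENABLE[38]: turbo mode disable


def turbo_disabled(misc_enable_value: int) -> bool:
    """Return True if the turbo-disable bit is set in an IA32_MISC_ENABLE value."""
    return bool((misc_enable_value >> TURBO_DISABLE_BIT) & 1)


written_value = 0x4000850089          # value passed to wrmsr in the message
disabled = turbo_disabled(written_value)
```

Here 0x4000850089 differs from 0x850089 only in bit 38, which is what turns turbo off.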



Re-formatting the stack-trace for easier reference:

[28904.162607] BUG: unable to handle kernel NULL pointer dereference
at 0008
[28904.163210] IP: []
nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.163745] PGD 0
[28904.164058] Oops: 0002 [#1] SMP
[28904.164323] Modules linked in: nf_nat_pptp nf_nat_proto_gre
xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat
xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT
xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw
iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables
netconsole configfs 8021q garp mrp stp llc bonding ixgbe dca
[28904.168132] CPU: 27 PID: 0 Comm: swapper/27 Not tainted 
4.8.14-build-0124 #2

[28904.168398] Hardware name: Intel Corporation S2600WTT/S2600WTT,
BIOS SE5C610.86B.01.01.1008.031920151331 03/19/2015
[28904.168853] task: 885fa42e8c40 task.stack: 885fa42f
[28904.169114] RIP: 0010:[] []
nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.169643] RSP: 0018:885fbccc3dd8 EFLAGS: 00010246
[28904.169901] RAX:  RBX: 885fbccc RCX: 
885fbccc0010
[28904.170169] RDX: 885f87a1c150 RSI: 0142 RDI: 
885fbccc
[28904.170437] RBP: 885fbccc3de8 R08: cbdee177 R09: 
0100
[28904.170704] R10: 885fbccc3dd0 R11: 820050c0 R12: 
885f87a1c140
[28904.170971] R13: 0005d948 R14: 000ea942 R15: 
885f87a1c160

[28904.171237] FS: () GS:885fbccc()
knlGS:
[28904.171688] CS: 0010 DS:  ES:  CR0: 80050033
[28904.171964] CR2: 0008 CR3: 00607f006000 CR4: 
001406e0

[28904.172231] Stack:
[28904.172482] 885f87a1c140 820a1405 885fbccc3e28
a00abb30
[28904.173182] 0002820a1405 885f87a1c140 885f99a28201

[28904.173884]  820050c8 885fbccc3e58
a00abc62
[28904.174585] Call Trace:
[28904.174835] 
[28904.174912] [] nf_ct_delete_from_lists+0xc9/0xf2
[nf_conntrack]
[28904.175613] [] nf_ct_delete+0x109/0x12c 
[nf_conntrack]
[28904.175894] [] ? nf_ct_delete+0x12c/0x12c 
[nf_conntrack]
[28904.176169] [] death_by_timeout+0xd/0xf 
[nf_conntrack]

[28904.176443] [] call_timer_fn.isra.5+0x17/0x6b
[28904.176714] [] expire_timers+0x6f/0x7e
[28904.176975] [] run_timer_softirq+0x69/0x8b
[28904.177238] [] ? 
clockevents_program_event+0xd0/0xe8

[28904.177504] [] __do_softirq+0xbd/0x1aa
[28904.177765] [] irq_exit+0x37/0x7c
[28904.178026] [] 
smp_trace_apic_timer_interrupt+0x7b/0x88

[28904.178300] [] smp_apic_timer_interrupt+0x9/0xb
[28904.178565] [] apic_timer_interrupt+0x7c/0x90
[28904.178835] 
[28904.178907] [] ? mwait_idle+0x64/0x7a
[28904.179436] [] ? 
atomic_notifier_call_chain+0x13/0x15

[28904.179712] [] arch_cpu_idle+0xa/0xc
[28904.179976] [] default_idle_call+0x27/0x29
[28904.180244] [] cpu_startup_entry+0x11d/0x1c7
[28904.180508] [] start_secondary+0xe8/0xeb
[28904.180767] Code: 80 2f 0b 82 48 89 df e8 da 90 84 e1 48 8b 43 10
49 8d 54 24 10 48 8d 4b 10 49 89 4c 24 18 a8 01 49 89 44 24 10 48 89
53 10 75 04 <89> 50 08 c6 03 00 5b 41 5c 5d c3 48 8b 05 10 be 00 00 89
f6
[28904.185546] RIP []
nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.186065] RSP 
[28904.186319] CR2: 0008
[28904.186593] ---[ end trace 35cbc6c885a5c2d8 ]---
[28904.186860] Kernel panic - not syncing: Fatal exception in interrupt
[28904.187155] Kernel Offset: disabled
[28904.187419] Rebooting in 5 seconds..
[28909.193662] ACPI MEMORY or I/O RESET_REG.

And here's decodecode's output:

All code

   0:   80 2f 0b                subb   $0xb,(%rdi)
   3:   82                      (bad)
   4:   48 89 df                mov    %rbx,%rdi
   7:   e8 da 90 84 e1          callq  0xe18490e6
   c:   48 8b 43 10             mov    0x10(%rbx),%rax
  10:   49 8d 54 24 10          lea    0x10(%r12),%rdx
  15:   48 8d 4b 10             lea    0x10(%rbx),%rcx
  19:   49 89 4c 24 18          mov    %rcx,0x18(%r12)
  1e:   a8 01                   test   $0x1,%al

Re: 4.9.2 panic, __skb_flow_dissect, gro?

2017-01-10 Thread Denys Fedoryshchenko

Yes, it is in the list (ixgbe)

On 2017-01-11 02:16, Ian Kumlien wrote:

Added David Miller to CC since he said it was queued for stable, maybe
he can comment

On Wed, Jan 11, 2017 at 12:49 AM, Denys Fedoryshchenko
 wrote:

It seems this patch solve issue. I hope it will go to stable asap, because
without it loaded routers crashing almost instantly on 4.9.


I'm also worried that you could trigger it remotely

I suspect the following:
intel: fm10k, i40e, i40ev, igb, ixgbe, ixgbevf
mellanox: mlx4, mlx5
qlogic: qede

since skb_flow_dissect is called by eth_get_headlen in these drivers...

My machine was running with igb when it happened, is your network
driver in the list?

David: Let me know if i can help with the -stable bit in any way, i've
been surprised to see it miss .1 and .2


commit  d0af683407a26a4437d8fa6e283ea201f2ae8146 (patch)
tree    e769779cf59b0b7b50a68db5d0b8897a7cb6 /net/core/flow_dissector.c
parent  94ba998b63c41e92da1b2f0cd8679e038181ef48 (diff)
flow_dissector: Update pptp handling to avoid null pointer deref.
__skb_flow_dissect can be called with a skb or a data packet, either
can be NULL. All calls seems to have been moved to __skb_header_pointer
except the pptp handling which is still calling skb_header_pointer.


Re: 4.9.2 panic, __skb_flow_dissect, gro?

2017-01-10 Thread Denys Fedoryshchenko

It seems this patch solves the issue. I hope it will go to stable asap, 
because without it loaded routers crash almost instantly on 4.9.


commit  d0af683407a26a4437d8fa6e283ea201f2ae8146 (patch)
tree    e769779cf59b0b7b50a68db5d0b8897a7cb6 /net/core/flow_dissector.c
parent  94ba998b63c41e92da1b2f0cd8679e038181ef48 (diff)
flow_dissector: Update pptp handling to avoid null pointer deref.
__skb_flow_dissect can be called with a skb or a data packet, either
can be NULL. All calls seems to have been moved to __skb_header_pointer
except the pptp handling which is still calling skb_header_pointer.
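
To illustrate why the choice of helper matters, here is a toy userspace model in Python, not kernel code. The `Skb` class and both function names are stand-ins invented for this sketch: `header_pointer_raw` mimics the shape of `__skb_header_pointer` (it works from an explicit data pointer and length, so the skb may be absent), while `header_pointer` mimics `skb_header_pointer`, which always reads through the skb.

```python
from typing import Optional


class Skb:
    """Toy stand-in for struct sk_buff: only the linear data area."""

    def __init__(self, data: bytes):
        self.data = data


def header_pointer_raw(skb: Optional[Skb], offset: int, length: int,
                       data: bytes, hlen: int) -> Optional[bytes]:
    """Model of __skb_header_pointer(): uses (data, hlen); skb may be None."""
    if hlen - offset >= length:
        return data[offset:offset + length]
    if skb is None:
        return None  # no skb to fall back on, report failure instead of crashing
    chunk = skb.data[offset:offset + length]  # stands in for skb_copy_bits()
    return chunk if len(chunk) == length else None


def header_pointer(skb: Skb, offset: int, length: int) -> Optional[bytes]:
    """Model of skb_header_pointer(): always dereferences skb."""
    return header_pointer_raw(skb, offset, length, skb.data, len(skb.data))
```

Calling `header_pointer(None, ...)` raises `AttributeError` in this model, which plays the role of the NULL pointer dereference in the oops, whereas `header_pointer_raw(None, ...)` still works when the caller supplies the data directly, as `eth_get_headlen` does.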

On 2017-01-11 01:26, Denys Fedoryshchenko wrote:

Hi,

Got a panic message on 4.9.2 with the latest patches from stable-queue;
it probably affects all 4.9 versions.

Panic message:

dmesg-erst-6374119981415661569:<6>[   23.110324] ip_set: protocol 6
dmesg-erst-6374119981415661569:<1>[   28.117455] BUG: unable to handle
kernel NULL pointer dereference at 0078
dmesg-erst-6374119981415661569:<1>[   28.118036] IP:
[] __skb_flow_dissect+0x73f/0x931
dmesg-erst-6374119981415661569:<4>[   28.118360] PGD 0
dmesg-erst-6374119981415661569:<4>[   28.118427]
dmesg-erst-6374119981415661569:<4>[   28.118730] Oops:  [#1] SMP
dmesg-erst-6374119981415661569:<4>[   28.118977] Modules linked in:
xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat
xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT
xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw
iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables
8021q garp mrp stp llc netconsole configfs bonding ixgbe dca
ipmi_watchdog ipmi_si acpi_ipmi ipmi_msghandler
dmesg-erst-6374119981415661569:<4>[   28.122784] CPU: 4 PID: 0 Comm:
swapper/4 Not tainted 4.9.2-build-0127 #3
dmesg-erst-6374119981415661569:<4>[   28.123042] Hardware name: Intel
Corporation S2600WTT/S2600WTT, BIOS
SE5C610.86B.01.01.0019.101220160604 10/12/2016
dmesg-erst-6374119981415661569:<4>[   28.123488] task:
882fa6af24c0 task.stack: c90031338000
dmesg-erst-6374119981415661569:<4>[   28.123742] RIP:
0010:[]  []
__skb_flow_dissect+0x73f/0x931
dmesg-erst-6374119981415661569:<4>[   28.124243] RSP:
0018:882fbfb03ce8  EFLAGS: 00010206
dmesg-erst-6374119981415661569:<4>[   28.124497] RAX: 0130
RBX: 0022 RCX: 882f9eabb000
dmesg-erst-6374119981415661569:<4>[   28.124756] RDX: 0010
RSI: 882f9eabb026 RDI: 002f
dmesg-erst-6374119981415661569:<4>[   28.125015] RBP: 882fbfb03d78
R08: 000c R09: 882f9eabb022
dmesg-erst-6374119981415661569:<4>[   28.125275] R10: 0140
R11: 0001 R12: 0b88
dmesg-erst-6374119981415661569:<4>[   28.125532] R13: 882fbfb03d9c
R14:  R15: 820c11a0
dmesg-erst-6374119981415661569:<4>[   28.125792] FS:
() GS:882fbfb0()
knlGS:
dmesg-erst-6374119981415661569:<4>[   28.126227] CS:  0010 DS: 
ES:  CR0: 80050033
dmesg-erst-6374119981415661569:<4>[   28.126482] CR2: 0078
CR3: 00607f007000 CR4: 001406e0
dmesg-erst-6374119981415661569:<4>[   28.126741] Stack:
dmesg-erst-6374119981415661569:<4>[   28.126983]  882fbfb03cf8
81885afb 0001bfb03d88 818953b5
dmesg-erst-6374119981415661569:<4>[   28.127675]  882fbfb03d9c
2f08 882f9eabb000 882fbfb03d48
dmesg-erst-6374119981415661569:<4>[   28.128350]  818ef3e4
882fa4177400 004e 
dmesg-erst-6374119981415661569:<4>[   28.129027] Call Trace:
dmesg-erst-6374119981415661569:<4>[   28.129271]  
dmesg-erst-6374119981415661569:<4>[   28.129340]  []
? kfree_skb+0x25/0x27
dmesg-erst-6374119981415661569:<4>[   28.129655]  []
? __netif_receive_skb_core+0x61b/0x807
dmesg-erst-6374119981415661569:<4>[   28.129917]  []
? udp4_gro_receive+0x1f6/0x256
dmesg-erst-6374119981415661569:<4>[   28.130174]  []
eth_get_headlen+0x4c/0x82
dmesg-erst-6374119981415661569:<4>[   28.130435]  []
ixgbe_clean_rx_irq+0x546/0x924 [ixgbe]
dmesg-erst-6374119981415661569:<4>[   28.130694]  []
ixgbe_poll+0x4ef/0x679 [ixgbe]
dmesg-erst-6374119981415661569:<4>[   28.130952]  []
net_rx_action+0x107/0x27d
dmesg-erst-6374119981415661569:<4>[   28.131207]  []
__do_softirq+0xb5/0x1a3
dmesg-erst-6374119981415661569:<4>[   28.131460]  []
irq_exit+0x4d/0x8e
dmesg-erst-6374119981415661569:<4>[   28.131712]  []
do_IRQ+0xaa/0xc2
dmesg-erst-6374119981415661569:<4>[   28.131965]  []
common_interrupt+0x7c/0x7c
dmesg-erst-6374119981415661569:<4>[   28.132217]  
dmesg-erst-6374119981415661569:<4>[   28.132286]  []
? mwait_idle+0x4e/0x61
dmesg-erst-637

4.9.2 panic, __skb_flow_dissect, gro?

2017-01-10 Thread Denys Fedoryshchenko

Hi,

Got a panic message on 4.9.2 with the latest patches from stable-queue; 
it probably affects all 4.9 versions.


Panic message:

dmesg-erst-6374119981415661569:<6>[   23.110324] ip_set: protocol 6
dmesg-erst-6374119981415661569:<1>[   28.117455] BUG: unable to handle 
kernel NULL pointer dereference at 0078
dmesg-erst-6374119981415661569:<1>[   28.118036] IP: 
[] __skb_flow_dissect+0x73f/0x931

dmesg-erst-6374119981415661569:<4>[   28.118360] PGD 0
dmesg-erst-6374119981415661569:<4>[   28.118427]
dmesg-erst-6374119981415661569:<4>[   28.118730] Oops:  [#1] SMP
dmesg-erst-6374119981415661569:<4>[   28.118977] Modules linked in: 
xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat 
xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT 
xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw 
iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack iptable_filter ip_tables x_tables 8021q garp mrp stp 
llc netconsole configfs bonding ixgbe dca ipmi_watchdog ipmi_si 
acpi_ipmi ipmi_msghandler
dmesg-erst-6374119981415661569:<4>[   28.122784] CPU: 4 PID: 0 Comm: 
swapper/4 Not tainted 4.9.2-build-0127 #3
dmesg-erst-6374119981415661569:<4>[   28.123042] Hardware name: Intel 
Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0019.101220160604 
10/12/2016
dmesg-erst-6374119981415661569:<4>[   28.123488] task: 882fa6af24c0 
task.stack: c90031338000
dmesg-erst-6374119981415661569:<4>[   28.123742] RIP: 
0010:[]  [] 
__skb_flow_dissect+0x73f/0x931
dmesg-erst-6374119981415661569:<4>[   28.124243] RSP: 
0018:882fbfb03ce8  EFLAGS: 00010206
dmesg-erst-6374119981415661569:<4>[   28.124497] RAX: 0130 
RBX: 0022 RCX: 882f9eabb000
dmesg-erst-6374119981415661569:<4>[   28.124756] RDX: 0010 
RSI: 882f9eabb026 RDI: 002f
dmesg-erst-6374119981415661569:<4>[   28.125015] RBP: 882fbfb03d78 
R08: 000c R09: 882f9eabb022
dmesg-erst-6374119981415661569:<4>[   28.125275] R10: 0140 
R11: 0001 R12: 0b88
dmesg-erst-6374119981415661569:<4>[   28.125532] R13: 882fbfb03d9c 
R14:  R15: 820c11a0
dmesg-erst-6374119981415661569:<4>[   28.125792] FS:  
() GS:882fbfb0() knlGS:
dmesg-erst-6374119981415661569:<4>[   28.126227] CS:  0010 DS:  ES: 
 CR0: 80050033
dmesg-erst-6374119981415661569:<4>[   28.126482] CR2: 0078 
CR3: 00607f007000 CR4: 001406e0

dmesg-erst-6374119981415661569:<4>[   28.126741] Stack:
dmesg-erst-6374119981415661569:<4>[   28.126983]  882fbfb03cf8 
81885afb 0001bfb03d88 818953b5
dmesg-erst-6374119981415661569:<4>[   28.127675]  882fbfb03d9c 
2f08 882f9eabb000 882fbfb03d48
dmesg-erst-6374119981415661569:<4>[   28.128350]  818ef3e4 
882fa4177400 004e 

dmesg-erst-6374119981415661569:<4>[   28.129027] Call Trace:
dmesg-erst-6374119981415661569:<4>[   28.129271]  
dmesg-erst-6374119981415661569:<4>[   28.129340]  [] ? 
kfree_skb+0x25/0x27
dmesg-erst-6374119981415661569:<4>[   28.129655]  [] ? 
__netif_receive_skb_core+0x61b/0x807
dmesg-erst-6374119981415661569:<4>[   28.129917]  [] ? 
udp4_gro_receive+0x1f6/0x256
dmesg-erst-6374119981415661569:<4>[   28.130174]  [] 
eth_get_headlen+0x4c/0x82
dmesg-erst-6374119981415661569:<4>[   28.130435]  [] 
ixgbe_clean_rx_irq+0x546/0x924 [ixgbe]
dmesg-erst-6374119981415661569:<4>[   28.130694]  [] 
ixgbe_poll+0x4ef/0x679 [ixgbe]
dmesg-erst-6374119981415661569:<4>[   28.130952]  [] 
net_rx_action+0x107/0x27d
dmesg-erst-6374119981415661569:<4>[   28.131207]  [] 
__do_softirq+0xb5/0x1a3
dmesg-erst-6374119981415661569:<4>[   28.131460]  [] 
irq_exit+0x4d/0x8e
dmesg-erst-6374119981415661569:<4>[   28.131712]  [] 
do_IRQ+0xaa/0xc2
dmesg-erst-6374119981415661569:<4>[   28.131965]  [] 
common_interrupt+0x7c/0x7c

dmesg-erst-6374119981415661569:<4>[   28.132217]  
dmesg-erst-6374119981415661569:<4>[   28.132286]  [] ? 
mwait_idle+0x4e/0x61
dmesg-erst-6374119981415661569:<4>[   28.132773]  [] 
arch_cpu_idle+0xa/0xc
dmesg-erst-6374119981415661569:<4>[   28.133026]  [] 
default_idle_call+0x20/0x22
dmesg-erst-6374119981415661569:<4>[   28.133282]  [] 
cpu_startup_entry+0xde/0x185
dmesg-erst-6374119981415661569:<4>[   28.133539]  [] 
start_secondary+0xe8/0xeb
dmesg-erst-6374119981415661569:<4>[   28.133792] Code: f7 e8 eb 63 ff ff 
85 c0 0f 88 d5 01 00 00 44 8b 45 80 48 8d 75 b0 66 44 8b 66 0c 41 83 c0 
0e e9 87 00 00 00 41 8d 50 04 66 85 c0 <41> 8b 46 78 44 0f 48 c2 41 2b 
46 7c 42 8d 34 03 29 f0 83 f8 03
dmesg-erst-6374119981415661569:<1>[   28.138401] RIP  
[] __skb_flow_dissect+0x73f/0x931

dmesg-erst-6374119981415661569:<4>[   28.138718]  RSP 
dmesg-erst-6374119981415661569:<4>[   28.138964] CR2: 0078
dmesg-erst-6374119981415661569:<4>[   28.139215] ---[ end trace 
46fb1cf5af272

probably serious conntrack/netfilter panic, 4.8.14, timers and intel turbo

2016-12-17 Thread Denys Fedoryshchenko

Hi,

I recently posted several netfilter-related crashes and did not get any
answers. One of them started to happen quite often on a loaded NAT box
(17Gbps), so after trying endless ways to make it stable, I noticed that
timers often show up in the backtrace. The bug probably also exists in
older releases; I have seen the same backtrace, with a timer fired for
conntrack, on them.

I disabled Intel Turbo for the CPUs on this loaded NAT box, and voila, the
panic has not appeared for the 2nd day!

* by wrmsr -a 0x1a0 0x4000850089

I am not sure timers are the reason, but turbo is probably creating some
condition that triggers the bug.
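
For anyone decoding the wrmsr value above, here is a minimal user-space sketch. It assumes that MSR 0x1a0 is IA32_MISC_ENABLE and that bit 38 is the turbo-disable bit, as Intel documents for these CPUs; the helper name is made up for illustration and nothing here touches the real MSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 38 of IA32_MISC_ENABLE (MSR 0x1a0) means "turbo disabled". */
#define TURBO_DISABLE_BIT 38

/* Hypothetical helper: returns true if the given MSR value has turbo disabled. */
static bool turbo_disabled(uint64_t misc_enable)
{
    return ((misc_enable >> TURBO_DISABLE_BIT) & 1ULL) != 0;
}
```

With the value from the mail, turbo_disabled(0x4000850089ULL) is true, and the same value without the top bit (0x850089) gives false.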




Here are example backtraces from the last reboots (kernel 4.8.14); the same
kernel worked perfectly without turbo.
The last one is a crash on 4.8.0 that looks painfully similar, on a
totally different workload, but with conntrack enabled. It happens there
much less often, so it is harder to trigger the crash and test with turbo
disabled.

[28904.162607] BUG: unable to handle kernel NULL pointer dereference at 0008
[28904.163210] IP: [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.163745] PGD 0

[28904.164058] Oops: 0002 [#1] SMP
[28904.164323] Modules linked in: nf_nat_pptp nf_nat_proto_gre xt_TCPMSS
xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest
xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl
xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack iptable_filter ip_tables x_tables netconsole configfs 8021q
garp mrp stp llc bonding ixgbe dca

[28904.168132] CPU: 27 PID: 0 Comm: swapper/27 Not tainted 
4.8.14-build-0124 #2
[28904.168398] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
SE5C610.86B.01.01.1008.031920151331 03/19/2015

[28904.168853] task: 885fa42e8c40 task.stack: 885fa42f
[28904.169114] RIP: 0010:[] [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.169643] RSP: 0018:885fbccc3dd8 EFLAGS: 00010246
[28904.169901] RAX:  RBX: 885fbccc RCX: 
885fbccc0010
[28904.170169] RDX: 885f87a1c150 RSI: 0142 RDI: 
885fbccc
[28904.170437] RBP: 885fbccc3de8 R08: cbdee177 R09: 
0100
[28904.170704] R10: 885fbccc3dd0 R11: 820050c0 R12: 
885f87a1c140
[28904.170971] R13: 0005d948 R14: 000ea942 R15: 
885f87a1c160
[28904.171237] FS: () GS:885fbccc() 
knlGS:

[28904.171688] CS: 0010 DS:  ES:  CR0: 80050033
[28904.171964] CR2: 0008 CR3: 00607f006000 CR4: 
001406e0

[28904.172231] Stack:
[28904.172482] 885f87a1c140
820a1405
885fbccc3e28
a00abb30

[28904.173182] 0002820a1405
885f87a1c140
885f99a28201


[28904.173884] 
820050c8
885fbccc3e58
a00abc62

[28904.174585] Call Trace:
[28904.174835] 

[28904.174912] [] nf_ct_delete_from_lists+0xc9/0xf2 
[nf_conntrack]
[28904.175613] [] nf_ct_delete+0x109/0x12c 
[nf_conntrack]
[28904.175894] [] ? nf_ct_delete+0x12c/0x12c 
[nf_conntrack]
[28904.176169] [] death_by_timeout+0xd/0xf 
[nf_conntrack]

[28904.176443] [] call_timer_fn.isra.5+0x17/0x6b
[28904.176714] [] expire_timers+0x6f/0x7e
[28904.176975] [] run_timer_softirq+0x69/0x8b
[28904.177238] [] ? 
clockevents_program_event+0xd0/0xe8

[28904.177504] [] __do_softirq+0xbd/0x1aa
[28904.177765] [] irq_exit+0x37/0x7c
[28904.178026] [] 
smp_trace_apic_timer_interrupt+0x7b/0x88

[28904.178300] [] smp_apic_timer_interrupt+0x9/0xb
[28904.178565] [] apic_timer_interrupt+0x7c/0x90
[28904.178835] 

[28904.178907] [] ? mwait_idle+0x64/0x7a
[28904.179436] [] ? 
atomic_notifier_call_chain+0x13/0x15

[28904.179712] [] arch_cpu_idle+0xa/0xc
[28904.179976] [] default_idle_call+0x27/0x29
[28904.180244] [] cpu_startup_entry+0x11d/0x1c7
[28904.180508] [] start_secondary+0xe8/0xeb
[28904.180767] Code: 80 2f 0b 82 48 89 df e8 da 90 84 e1 48 8b 43 10 49 8d 54 24 10 48
8d 4b 10 49 89 4c 24 18 a8 01 49 89 44 24 10 48 89 53 10 75 04

89 50 08 c6 03 00 5b 41 5c 5d c3 48 8b 05 10 be 00 00 89 f6

[28904.185546] RIP [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.186065] RSP 
[28904.186319] CR2: 0008
[28904.186593] ---[ end trace 35cbc6c885a5c2d8 ]---
[28904.186860] Kernel panic - not syncing: Fatal exception in interrupt
[28904.187155] Kernel Offset: disabled
[28904.187419] Rebooting in 5 seconds..

[28909.193662] ACPI MEMORY or I/O RESET_REG.



[14125.227611] BUG: unable to handle kernel NULL pointer dereference at (null)
[14125.228215] IP: [] nf_nat_setup_info+0x6d8/0x755 [nf_nat]
[14125.228564] PGD 0

[14125.228882] Oops:  [#1] SMP
[14125.229146] Modules linked in: nf_nat_pptp nf_nat_proto_gre xt_TCPMSS
xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest
xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl
xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw ipt

Kernel panic in netfilter 4.8.10 probably on conntrack -L

2016-12-05 Thread Denys Fedoryshchenko

Hi!

I have a quite heavily loaded NAT server (approx. 17Gbps of traffic) where a
periodic "conntrack -L" might trigger a kernel panic about once per day.
I am not entirely sure whether it is triggered exactly by running the tool, or
just by enabling events.

Here is panic message:

 [221287.380762] general protection fault:  [#1] SMP
 [221287.381029] Modules linked in: xt_rateest xt_RATEEST
nf_conntrack_netlink netconsole configfs tun nf_nat_pptp nf_nat_proto_gre
xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat
nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp
ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_filter ip_tables x_tables 8021q garp mrp stp llc bonding ixgbe dca

 [221287.384913] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.8.10-build-0121 #10
 [221287.385184] Hardware name: Intel Corporation 
S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.1008.031920151331 03/19/2015
 [221287.385634] task: 8200b4c0 task.stack: 
8200

 [221287.385900] RIP: 0010:[] [] nf_conntrack_eventmask_report+0xba/0x123 [nf_conntrack]

 [221287.386428] RSP: 0018:882fbf603df8  EFLAGS: 00010202
 [221287.386693] RAX:  RBX: 882f96a51da8 RCX: 

 [221287.387134] RDX:  RSI: 882fbf603e00 RDI: 
0004
 [221287.387575] RBP: 882fbf603e38 R08: ff81822024ff R09: 
0004
 [221287.388011] R10: 882fbf603de0 R11: 820050c0 R12: 
882f810bf0c0
 [221287.388445] R13:  R14:  R15: 
0004
 [221287.388877] FS:  () 
GS:882fbf60() knlGS:

 [221287.389311] CS:  0010 DS:  ES:  CR0: 80050033
 [221287.389567] CR2: 7faff0bd8978 CR3: 02006000 CR4: 
001406f0

 [221287.389998] Stack:
 [221287.390238]  00049f292300
 882f810bf0c0
 
 882f810bf0c0

 [221287.390913]  882f96a51d80
 
 
 820050c8

 [221287.391587]  882fbf603e68
 a0098bd3
 8100
 a0098c85

 [221287.392262] Call Trace:
 [221287.392508]  

 [221287.392579]  [] nf_ct_delete+0x7a/0x12c 
[nf_conntrack]
 [221287.393082]  [] ? nf_ct_delete+0x12c/0x12c 
[nf_conntrack]
 [221287.393351]  [] death_by_timeout+0xd/0xf 
[nf_conntrack]
 [221287.393617]  [] 
call_timer_fn.isra.5+0x17/0x6b

 [221287.393881]  [] expire_timers+0x6f/0x7e
 [221287.394134]  [] run_timer_softirq+0x69/0x8b
 [221287.394390]  [] __do_softirq+0xbd/0x1aa
 [221287.394643]  [] irq_exit+0x37/0x7c
 [221287.394898]  [] 
smp_trace_call_function_single_interrupt+0x2e/0x30
 [221287.395341]  [] 
smp_call_function_single_interrupt+0x9/0xb
 [221287.395600]  [] 
call_function_single_interrupt+0x7c/0x90

 [221287.395857]  

 [221287.395926]  [] ? mwait_idle+0x64/0x7a
 [221287.396413]  [] arch_cpu_idle+0xa/0xc
 [221287.396665]  [] default_idle_call+0x27/0x29
 [221287.396919]  [] 
cpu_startup_entry+0x11d/0x1c7

 [221287.397175]  [] rest_init+0x72/0x74
 [221287.397428]  [] start_kernel+0x3ba/0x3c7
 [221287.397681]  [] 
x86_64_start_reservations+0x2a/0x2c
 [221287.397937]  [] 
x86_64_start_kernel+0x12a/0x135

 [221287.402124] Code: f2 89 75 d0 75 04 4c 8b 73 08 0f b7 73 10 41 89 ff 4d 89 f1 4d 09
f9 31 c0 49 85 f1 74 67 41 89 d5 89 7d c4 48 8d 75 c8 44 09 f7

ff 10 89 c2 c1 ea 1f 75 05 4d 85 f6 74 4b 49 83 c4 04 89 45

 [221287.406724] RIP [] nf_conntrack_eventmask_report+0xba/0x123 [nf_conntrack]

 [221287.407234]  RSP 
 [221287.407489] ---[ end trace 4b077b9412fc7065 ]---
 [221287.407746] Kernel panic - not syncing: Fatal exception in 
interrupt

 [221287.408013] Kernel Offset: disabled
 [221287.408270] Rebooting in 5 seconds..
Dec  5 23:17:58 10.0.253.34
Dec  5 23:17:58 10.0.253.34 [221292.408645] ACPI MEMORY or I/O 
RESET_REG.


Re: SNAT --random & fully is not actually random for ips

2016-11-28 Thread Denys Fedoryshchenko

On 2016-11-28 13:29, Pablo Neira Ayuso wrote:

On Mon, Nov 28, 2016 at 01:12:07PM +0200, Denys Fedoryshchenko wrote:

On 2016-11-28 13:06, Pablo Neira Ayuso wrote:
>Why does your patch reverts NF_NAT_RANGE_PROTO_RANDOM_FULLY?

Ops, sorry i just did mistake with files, actually it is in reverse ( 
did

this patch, and it worked properly with it, with random source ip).


Oh, I see 8)


--- nf_nat_core.c   2016-11-21 09:11:59.0 +
+++ nf_nat_core.c.new   2016-11-28 09:55:54.0 +
@@ -282,9 +282,13 @@
 * client coming from the same IP (some Internet Banking sites
 * like this), even across reboots.
 */
-	j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),
+   if (range->flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY) {
+   j = prandom_u32();
+   } else {
+	j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),

   range->flags & NF_NAT_RANGE_PERSISTENT ?
0 : (__force u32)tuple->dst.u3.all[max] ^ zone->id);
+   }

full_range = false;
for (i = 0; i <= max; i++) {

This is current situation, RANDOM_FULLY actually does prandom_u32 for 
source

port only, but not for IP.
IP kept as persistent and kind of predictable, because hash function 
based

on source ip.

Sure i did tried to specify any combination of flags, but looking to
"find_best_ips_proto" function, it wont have any effect.


IIRC the original intention on random-fully was to cover only ports.
Did you interpret from git history otherwise? Otherwise, safe
procedure is to add a new flag.

No, it seems I didn't read the man page well, sorry.
I will check it and maybe try to add a new option and submit a patch. I am
still studying the impact of this change on "balancing"; it seems to work
great.
But I am not really sure anyone else needs such a thing. Actually, some
might have privacy concerns as well, and could use such an option for
privacy.


Re: SNAT --random & fully is not actually random for ips

2016-11-28 Thread Denys Fedoryshchenko

On 2016-11-28 13:06, Pablo Neira Ayuso wrote:

On Mon, Nov 28, 2016 at 12:45:59PM +0200, Denys Fedoryshchenko wrote:

Hello,

I noticed that if i specify -j SNAT with options --random 
--random-fully

still it keeps persistence for source IP.


So you specify both?

Actually truly random src ip required in some scenarios like links 
balanced

by IPs, but seems since 2012 at least it is not possible.

But actually if i do something like:
--- nf_nat_core.c.new   2016-11-28 09:55:54.0 +
+++ nf_nat_core.c   2016-11-21 09:11:59.0 +
@@ -282,13 +282,9 @@
 * client coming from the same IP (some Internet Banking sites
 * like this), even across reboots.
 */
-   if (range->flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY) {
-   j = prandom_u32();
-   } else {
-	j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),
+	j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),

   range->flags & NF_NAT_RANGE_PERSISTENT ?
0 : (__force u32)tuple->dst.u3.all[max] ^ zone->id);
-   }

full_range = false;
for (i = 0; i <= max; i++) {

It works as intended. But i guess to not break compatibility it is 
better

should be introduced as new option?
Or maybe there is no really need for such option?


Why does your patch reverts NF_NAT_RANGE_PROTO_RANDOM_FULLY?
Oops, sorry, I just mixed up the files; it is actually the reverse (I
made this patch, and with it things worked properly, with a random source IP).

--- nf_nat_core.c   2016-11-21 09:11:59.0 +
+++ nf_nat_core.c.new   2016-11-28 09:55:54.0 +
@@ -282,9 +282,13 @@
 * client coming from the same IP (some Internet Banking sites
 * like this), even across reboots.
 */
-   j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),
+   if (range->flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY) {
+   j = prandom_u32();
+   } else {
+	j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),

   range->flags & NF_NAT_RANGE_PERSISTENT ?
0 : (__force u32)tuple->dst.u3.all[max] ^ zone->id);
+   }

full_range = false;
for (i = 0; i <= max; i++) {

This is the current situation: RANDOM_FULLY actually uses prandom_u32 for the
source port only, but not for the IP.
The IP is kept persistent and is somewhat predictable, because the hash
function is based on the source IP.
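
To illustrate the difference, here is a small user-space sketch of how a selector j could be mapped onto an address range. It assumes the selector is scaled into the range the way the kernel's reciprocal_scale() does; toy_hash() is only a stand-in for jhash2(), and pick_nat_ip() is a hypothetical name, not the code in find_best_ips_proto.

```c
#include <stdint.h>

/* Same arithmetic as the kernel's reciprocal_scale(): maps val into [0, ep_ro). */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
    return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Toy stand-in for jhash2(): any deterministic mix of the source address. */
static uint32_t toy_hash(uint32_t src_ip)
{
    src_ip ^= src_ip >> 16;
    src_ip *= 0x45d9f3bU;
    src_ip ^= src_ip >> 16;
    return src_ip;
}

/*
 * Hypothetical helper: pick an address from [min_ip, max_ip] given selector j.
 * Assumes the range is smaller than the full 32-bit space.
 */
static uint32_t pick_nat_ip(uint32_t j, uint32_t min_ip, uint32_t max_ip)
{
    return min_ip + reciprocal_scale(j, max_ip - min_ip + 1);
}
```

With j = toy_hash(src_ip), the same client always lands on the same address in the pool, which is the persistence described above. Feeding a fresh random number in as j instead is what would spread each new connection across the pool.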


Sure, I did try specifying every combination of flags, but looking at the
"find_best_ips_proto" function, it won't have any effect.


SNAT --random & fully is not actually random for ips

2016-11-28 Thread Denys Fedoryshchenko

Hello,

I noticed that if I specify -j SNAT with the options --random --random-fully,
it still keeps persistence for the source IP.
A truly random source IP is actually required in some scenarios, such as links
balanced by IPs, but it seems this has not been possible since 2012 at least.


But actually if i do something like:
--- nf_nat_core.c.new   2016-11-28 09:55:54.0 +
+++ nf_nat_core.c   2016-11-21 09:11:59.0 +
@@ -282,13 +282,9 @@
 * client coming from the same IP (some Internet Banking sites
 * like this), even across reboots.
 */
-   if (range->flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY) {
-   j = prandom_u32();
-   } else {
-	j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),
+   j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),
   range->flags & NF_NAT_RANGE_PERSISTENT ?
0 : (__force u32)tuple->dst.u3.all[max] ^ zone->id);
-   }

full_range = false;
for (i = 0; i <= max; i++) {

It works as intended. But I guess that, to avoid breaking compatibility, it
would be better to introduce it as a new option?

Or maybe there is no real need for such an option?


Re: kernel panic TPROXY , vanilla 4.7.1

2016-08-17 Thread Denys Fedoryshchenko

On 2016-08-17 19:04, Eric Dumazet wrote:

On Wed, 2016-08-17 at 08:42 -0700, Eric Dumazet wrote:

On Wed, 2016-08-17 at 17:31 +0300, Denys Fedoryshchenko wrote:
> Hi!
>
> Tried to run squid on latest kernel, and hit a panic
> Sometimes it just shows warning in dmesg (but doesnt work properly)
> [   75.701666] IPv4: Attempt to release TCP socket in state 10
> 88102d430780
> [   83.866974] squid (2700) used greatest stack depth: 12912 bytes left
> [   87.506644] IPv4: Attempt to release TCP socket in state 10
> 880078a48780
> [  114.704295] IPv4: Attempt to release TCP socket in state 10
> 881029f8ad00
>
> I cannot catch yet oops/panic message, netconsole not working.
>
> After triggering warning message 3 times, i am unable to run squid
> anymore (without reboot), and in netstat it doesnt show port running.
>
> firewall is:
> *mangle
> -A PREROUTING -p tcp -m socket -j DIVERT
> -A PREROUTING -p tcp -m tcp --dport 80 -i eno1 -j TPROXY --on-port 3129
> --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1
> -A DIVERT -j MARK --set-xmark 0x1/0x
> -A DIVERT -j ACCEPT
>
> routing
> ip rule add fwmark 1 lookup 100
> ip route add local default dev eno1 table 100
>
>
> squid config is default with tproxy option
> http_port 3129 tproxy
>

Hmppff... sorry for this, I will send a fix.

Thanks for the report !




Could you try the following ?

Thanks !

 net/netfilter/xt_TPROXY.c |4 
 1 file changed, 4 insertions(+)

diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
index 7f4414d26a66..663c4c3c9072 100644
--- a/net/netfilter/xt_TPROXY.c
+++ b/net/netfilter/xt_TPROXY.c
@@ -127,6 +127,8 @@ nf_tproxy_get_sock_v4(struct net *net, struct
sk_buff *skb, void *hp,
daddr, dport,
in->ifindex);

+   if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
+   sk = NULL;
/* NOTE: we return listeners even if bound to
 * 0.0.0.0, those are filtered out in
 * xt_socket, since xt_TPROXY needs 0 bound
@@ -195,6 +197,8 @@ nf_tproxy_get_sock_v6(struct net *net, struct
sk_buff *skb, int thoff, void *hp,
   daddr, ntohs(dport),
   in->ifindex);

+   if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
+   sk = NULL;
/* NOTE: we return listeners even if bound to
 * 0.0.0.0, those are filtered out in
 * xt_socket, since xt_TPROXY needs 0 bound

Yes, everything fine after patch!
Thanks a lot
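
For readers unfamiliar with the pattern in the patch above, here is a user-space analogue of atomic_inc_not_zero() written with C11 atomics. It is only a sketch of the "take a reference unless the count already dropped to zero" idea; ref_get_unless_zero() is a made-up name and this is not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still alive (count != 0).
 * Returns true if the count was incremented, false if it was already zero.
 */
static bool ref_get_unless_zero(atomic_int *refcnt)
{
    int old = atomic_load(refcnt);

    while (old != 0) {
        /* On failure, old is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

A lookup that finds an object with a zero count treats it as "not found", which mirrors the patch setting sk to NULL in that case.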


kernel panic TPROXY , vanilla 4.7.1

2016-08-17 Thread Denys Fedoryshchenko

Hi!

Tried to run squid on the latest kernel and hit a panic.
Sometimes it just shows a warning in dmesg (but squid does not work properly):
[   75.701666] IPv4: Attempt to release TCP socket in state 10 
88102d430780

[   83.866974] squid (2700) used greatest stack depth: 12912 bytes left
[   87.506644] IPv4: Attempt to release TCP socket in state 10 
880078a48780
[  114.704295] IPv4: Attempt to release TCP socket in state 10 
881029f8ad00


I cannot capture the oops/panic message yet; netconsole is not working.

After the warning message has been triggered 3 times, I am unable to run squid
anymore (without a reboot), and netstat does not show the port as listening.


firewall is:
*mangle
-A PREROUTING -p tcp -m socket -j DIVERT
-A PREROUTING -p tcp -m tcp --dport 80 -i eno1 -j TPROXY --on-port 3129 
--on-ip 0.0.0.0 --tproxy-mark 0x1/0x1

-A DIVERT -j MARK --set-xmark 0x1/0x
-A DIVERT -j ACCEPT

routing
ip rule add fwmark 1 lookup 100
ip route add local default dev eno1 table 100


squid config is default with tproxy option
http_port 3129 tproxy



Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start_xmit

2016-08-17 Thread Denys Fedoryshchenko

On 2016-08-09 00:05, Guillaume Nault wrote:

On Mon, Aug 08, 2016 at 02:25:00PM +0300, Denys Fedoryshchenko wrote:

On 2016-08-01 23:59, Guillaume Nault wrote:
> Do you still have the vmlinux file with debug symbols that generated
> this panic?
Sorry for delay, i didn't had same image on all servers and probably i 
found

cause of panic, but still testing on several servers.
If i remove SFQ qdisc from ppp shapers, servers not rebooting anymore.


Thanks for the feedback. I wonder which interactions between SFQ and
PPP can lead to this problem. I'll take a look.


But still i need around 2 days to make sure that's the reason.


Okay, just let me know if you can confirm that removing SFQ really
solves the problem.
After long testing, I can confirm that removing sfq from the rules greatly
decreased the panic reboots; this was tested on many different servers.
Today I will try some stress tests: apply sfq qdiscs on a live system at
night, then remove them.

Then I will also try to disconnect all users with sfq qdiscs attached.
I am not sure it will help to reproduce the bug, but it is worth a try.

I am still hitting a different conntrack bug about once per week, and
that is why I was confused: I was clearly getting panics in conntrack and
then something else, and I was not sure whether these were different bugs,
a hardware glitch or something else.


Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start_xmit

2016-08-08 Thread Denys Fedoryshchenko

On 2016-08-01 23:59, Guillaume Nault wrote:

Do you still have the vmlinux file with debug symbols that generated
this panic?
Sorry for the delay, I didn't have the same image on all servers. I have
probably found the cause of the panic, but I am still testing on several
servers.

If I remove the SFQ qdisc from the ppp shapers, the servers do not reboot
anymore. But I still need around 2 days to make sure that's the reason.


Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start_xmit

2016-08-01 Thread Denys Fedoryshchenko

On 2016-08-01 23:59, Guillaume Nault wrote:

On Thu, Jul 28, 2016 at 02:28:23PM +0300, Denys Fedoryshchenko wrote:

 [ 5449.904989] CPU: 1 PID: 6359 Comm: ip Not tainted
4.7.0-build-0109 #2
 [ 5449.905255] Hardware name: Supermicro
X10SLM+-LN4F/X10SLM+-LN4F, BIOS 3.0 04/24/2015
 [ 5449.905712] task: 8803eef4 ti: 8803fd754000
task.ti: 8803fd754000
 [ 5449.906168] RIP: 0010:[]
 [] inet_fill_ifaddr+0x5a/0x264
 [ 5449.906710] RSP: 0018:8803fd757b98  EFLAGS: 00010286
 [ 5449.906976] RAX: 8803ef65cb90 RBX: 8803f7d2cd00 RCX:
 [ 5449.907248] RDX: 00080002 RSI: 8803ef65cb90 RDI: 8803ef65cba8
 [ 5449.907519] RBP: 8803fd757be0 R08: 0008 R09: 0002
 [ 5449.907792] R10: ffa005040269f480 R11: 820a1c00 R12: ffa005040269f480
 [ 5449.908067] R13: 8803ef65cb90 R14:  R15: 8803f7d2cd00
 [ 5449.908339] FS:  7f660674d700() GS:88041fc4() knlGS:
 [ 5449.908796] CS:  0010 DS:  ES:  CR0: 80050033
 [ 5449.909067] CR2: 008b9018 CR3: 0003f2a11000 CR4: 001406e0
 [ 5449.909339] Stack:
 [ 5449.909598]  0163a8c0869711ac
 0080
 
 0003e1d50003e1d5

 [ 5449.910329]  8800d54c0ac8
 8803f0d9
 0005
 

 [ 5449.911066]  8803f7d2cd00
 8803fd757c40
 818a9f73
 820a1c00

 [ 5449.911803] Call Trace:
 [ 5449.912061]  [] 
inet_dump_ifaddr+0xfb/0x185
 [ 5449.912332]  [] 
rtnl_dump_all+0xa9/0xc2
 [ 5449.912601]  [] 
netlink_dump+0xf0/0x25c
 [ 5449.912873]  [] 
netlink_recvmsg+0x1a9/0x2d3

 [ 5449.913142]  [] sock_recvmsg+0x14/0x16
 [ 5449.913407]  [] 
___sys_recvmsg+0xea/0x1a1

 [ 5449.913675]  [] ?
alloc_pages_vma+0x167/0x1a0
 [ 5449.913945]  [] ?
page_add_new_anon_rmap+0xb4/0xbd
 [ 5449.914212]  [] ?
lru_cache_add_active_or_unevictable+0x31/0x9d
 [ 5449.914664]  [] ?
handle_mm_fault+0x632/0x112d
 [ 5449.914940]  [] ? 
vma_merge+0x27e/0x2b1
 [ 5449.915208]  [] 
__sys_recvmsg+0x3d/0x5e
 [ 5449.915478]  [] ? 
__sys_recvmsg+0x3d/0x5e

 [ 5449.915747]  [] SyS_recvmsg+0xd/0x17
 [ 5449.916017]  []
entry_SYSCALL_64_fastpath+0x17/0x93


Do you still have the vmlinux file with debug symbols that generated
this panic?


I have a slightly different build now (I tried enabling slightly different
kernel options), but I also got a new panic in inet_fill_ifaddr on the new
build. Tomorrow I will prepare all the files (everything is at the office)
and provide a link to the sources and vmlinux, and of course the new panic
message from this build.

The new panic happened at a completely different location and ISP.


Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start_xmit

2016-07-28 Thread Denys Fedoryshchenko

On 2016-07-28 14:09, Guillaume Nault wrote:

On Tue, Jul 12, 2016 at 10:31:18AM -0700, Cong Wang wrote:

On Mon, Jul 11, 2016 at 12:45 PM,   wrote:
> Hi
>
> On latest kernel i noticed kernel panic happening 1-2 times per day. It is
> also happening on older kernel (at least 4.5.3).
>
...
>  [42916.426463] Call Trace:
>  [42916.426658]  
>
>  [42916.426719]  [] skb_push+0x36/0x37
>  [42916.427111]  [] ppp_start_xmit+0x10f/0x150
> [ppp_generic]
>  [42916.427314]  [] dev_hard_start_xmit+0x25a/0x2d3
>  [42916.427516]  [] ?
> validate_xmit_skb.isra.107.part.108+0x11d/0x238
>  [42916.427858]  [] sch_direct_xmit+0x89/0x1b5
>  [42916.428060]  [] __qdisc_run+0x133/0x170
>  [42916.428261]  [] net_tx_action+0xe3/0x148
>  [42916.428462]  [] __do_softirq+0xb9/0x1a9
>  [42916.428663]  [] irq_exit+0x37/0x7c
>  [42916.428862]  [] smp_apic_timer_interrupt+0x3d/0x48
>  [42916.429063]  [] apic_timer_interrupt+0x7c/0x90

Interesting, we call a skb_cow_head() before skb_push() in 
ppp_start_xmit(),

I have no idea why this could happen.


The skb is corrupted: head is at 8800b0bf2800 while data is at
ffa00500b0bf284c.

Figuring out how this corruption happened is going to be hard without a
way to reproduce the problem.
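
As background for the trace above, here is a toy user-space model of the headroom check behind skb_push(). It is a sketch only: struct toy_skb is not the real struct sk_buff, and toy_skb_push() is a hypothetical helper that returns false where the kernel would panic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy buffer with only the two pointers that matter here. */
struct toy_skb {
    unsigned char *head; /* start of the allocated buffer */
    unsigned char *data; /* start of the payload; assumed to point inside the buffer */
};

/*
 * Prepend len bytes by moving data towards head.
 * Returns false if there is not enough headroom, i.e. data would end up
 * before head, which is the condition that makes the kernel call skb_panic().
 */
static bool toy_skb_push(struct toy_skb *skb, size_t len)
{
    size_t headroom = (size_t)(skb->data - skb->head);

    if (headroom < len)
        return false;
    skb->data -= len;
    return true;
}
```

With sane pointers, skb_cow_head() guarantees enough headroom before the push. With a corrupted data pointer such as the one quoted above, the comparison against head no longer means anything, so the check can fire even though the headroom was ensured.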

Denys, can you confirm you're using a vanilla kernel?
Also I guess the ppp devices and tc settings are handled by accel-ppp.
If so, can you share more info about your setup (accel-ppp.conf, radius
attributes, iptables...) so that I can try to reproduce it on my
machines?


I have slight modification from vanilla:

--- linux/net/sched/sch_htb.c   2016-06-08 01:23:53.0 +
+++ linux-new/net/sched/sch_htb.c   2016-06-21 14:03:08.398486593 +
@@ -1495,10 +1495,10 @@
cl->common.classid);
cl->quantum = 1000;
}
-   if (!hopt->quantum && cl->quantum > 20) {
+   if (!hopt->quantum && cl->quantum > 200) {
pr_warn("HTB: quantum of class %X is big. Consider r2q change.\n",
cl->common.classid);
-   cl->quantum = 20;
+   cl->quantum = 200;
}
if (hopt->quantum)
cl->quantum = hopt->quantum;

But I guess it should not be the reason for the crash (it is needed for
another system; without it I was unable to shape over 7Gbps. Maybe with the
latest kernel I will not need this patch).


I'm trying to find reproducible conditions for the crash, because right now
it happens only on some servers in large networks (completely different
ISPs, so I have excluded a hardware fault of a specific server). It is a
complex config: I have accel-ppp, plus my own "shaping daemon" that
applies several shapers to the ppp interfaces. The worst thing is that it
happens only with live customers; I am unable to reproduce it in stress
tests. Also, until the recent kernel I was getting different panic messages
(but all related to ppp).


I think at least one cause of the crashes was also fixed by "ppp: defer
netns reference release for ppp channel" in 4.7.0 (maybe that's why I am
getting fewer crashes recently).
I also tried various kernel debug options that don't cause major
performance degradation (lock checking, freed memory poisoning,
etc.), without any luck yet. Would it be useful if I post panics that
occur at least twice? (I will post an example below, which I got recently.)
Of course, if I manage to find reproducible conditions, I will send them
immediately.



 [ 5449.900988] general protection fault:  [#1] SMP
 [ 5449.901263] Modules linked in: cls_fw act_police cls_u32 sch_ingress
sch_sfq sch_htb pppoe pppox ppp_generic slhc netconsole configfs xt_nat
ts_bm xt_string xt_connmark xt_TCPMSS xt_tcpudp xt_mark iptable_filter
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack iptable_mangle ip_tables x_tables 8021q garp mrp stp llc
ixgbe dca

 [ 5449.904989] CPU: 1 PID: 6359 Comm: ip Not tainted 
4.7.0-build-0109 #2
 [ 5449.905255] Hardware name: Supermicro 
X10SLM+-LN4F/X10SLM+-LN4F, BIOS 3.0 04/24/2015
 [ 5449.905712] task: 8803eef4 ti: 8803fd754000 
task.ti: 8803fd754000

 [ 5449.906168] RIP: 0010:[]
 [] inet_fill_ifaddr+0x5a/0x264
 [ 5449.906710] RSP: 0018:8803fd757b98  EFLAGS: 00010286
 [ 5449.906976] RAX: 8803ef65cb90 RBX: 8803f7d2cd00 
RCX: 
 [ 5449.907248] RDX: 00080002 RSI: 8803ef65cb90 
RDI: 8803ef65cba8
 [ 5449.907519] RBP: 8803fd757be0 R08: 0008 
R09: 0002
 [ 5449.907792] R10: ffa005040269f480 R11: 820a1c00 
R12: ffa005040269f480
 [ 5449.908067] R13: 8803ef65cb90 R14:  
R15: 8803f7d2cd00
 [ 5449.908339] FS:  7f660674d700() 
GS:88041fc4() knlGS:
 [ 5449.908796] CS:  0010 DS:  ES:  CR0: 
80050033
 [ 5449.909067] CR2: 008b9018 CR3: 0003f2a11000 
CR4: 0

Re: kernel panic, __neigh_notify, 4.7.0-rc7, Workqueue: events_power_efficient neigh_periodic_work

2016-07-24 Thread Denys Fedoryshchenko

On 2016-07-24 21:40, nuclear...@nuclearcat.com wrote:

Different hardware, but the same workload. It seems to be a different bug; it
has happened at least twice on this unit (both kernel panic messages are here).
As an additional side note that might be useful (I found in the commit log
that proxy ARP might induce this bug, e.g. in commit "net/neighbour: fix
crash at dumping device-agnostic proxy entries"): this is a pppoe server
with proxy_arp running on it.


Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-13 Thread Denys Fedoryshchenko
I can confirm that after the patch this issue never appeared again. So maybe
it would be good to push it to stable etc. :) Thanks a lot Eric, you saved me
again.



I still have some weird panic issues, maybe related to conntrack, but
they are rare even under high load, so I am slowly gathering data. I have
found at least one more person with similar conntrack crashes on the latest
kernels.


On 2015-11-04 06:46, Eric Dumazet wrote:

On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote:

On 2015-11-04 00:06, Cong Wang wrote:
> On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko
>  wrote:
>> Hi!
>>
>> Actually seems i was getting this panic for a while (once per week) on
>> loaded pppoe server, but just now was able to get full panic message.
>> After checking commit logs on sch_fq.c i didnt seen any fixes, so
>> probably
>> upgrading to newer kernel wont help?
>
>
> Can you share your `tc qdisc show dev ` with us? And how to
> reproduce
> it? I tried to setup htb+fq and then flip the interface back and forth
> but I don't
> see any crash.
My guess it wont be easy to reproduce, it is happening on box with 
4.5k

interfaces, that constantly create/delete interfaces,
and even with that this problem may happen once per day, or may not
happen for 1 week.

Here is script that is being fired after new ppp interface detected. 
But

pppoe process are independent from
process that are "establishing" shapers.



It is probably a generic bug. sch_fq seems OK to me.

Somehow nobody tries to change qdisc hundred times per second ;)

Could you try following patch ?

It seems to 'fix' the issue for me.

diff --git a/net/core/dev.c b/net/core/dev.c
index 8ce3f74cd6b9..bf136103bc7b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2880,6 +2880,12 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
spin_lock(&q->busylock);

spin_lock(root_lock);
+   if (unlikely(q != rcu_dereference_bh(txq->qdisc))) {
+   pr_err_ratelimited("Arg, qdisc changed ! state %lx\n", q->state);
+   kfree_skb(skb);
+   rc = NET_XMIT_DROP;
+   goto end;
+   }
if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
kfree_skb(skb);
rc = NET_XMIT_DROP;
@@ -2913,6 +2919,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
__qdisc_run(q);
}
}
+end:
spin_unlock(root_lock);
if (unlikely(contended))
spin_unlock(&q->busylock);

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


4.3.0, neighbour: arp_cache: neighbor table overflow! and panic

2015-11-06 Thread Denys Fedoryshchenko

Hi

I have several pppoe servers running older kernels, and I upgraded two of
them to 4.3.0. After that, one of them is rebooting randomly, and the
stack trace is always different. I also noticed a message that did not
appear on older kernels and now appears on both:

"neighbour: arp_cache: neighbor table overflow!"

In ip neigh I did not notice anything suspicious: there are fewer than 10
entries, but there are quite a lot of ARP requests on eth0 (irrelevant to
this host) that may cause some issues.

Here are the panic messages caught over netconsole:

 [151784.835507] general protection fault:  [#1] SMP
 [151784.836049] Modules linked in: act_skbedit sch_fq cls_fw act_police
 cls_u32 sch_ingress sch_sfq sch_htb netconsole configfs pppoe pppox
 ppp_generic slhc xt_nat ts_bm xt_string xt_connmark xt_TCPMSS xt_tcpudp
 xt_mark iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables 8021q
 garp mrp stp llc
 [151784.840667] CPU: 21 PID: 0 Comm: swapper/21 Not tainted 4.3.0-build-0087 #3
 [151784.841014] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
 [151784.841575] task: 88042d29dc00 ti: 88042d2c4000 task.ti: 88042d2c4000
 [151784.847603] RIP: 0010:[]  [] nf_ct_delete+0x28/0x20e [nf_conntrack]
 [151784.848421] RSP: 0018:88042f0a3e80  EFLAGS: 00010246
 [151784.848797] RAX: ffa2050402d3ab00 RBX: 8803d5087368 RCX: dead0200
 [151784.849421] RDX:  RSI:  RDI: 8803d5087368
 [151784.850090] RBP: 88042f0a3ec8 R08: 88042f0a3f08 R09: 0100
 [151784.850736] R10: 2710 R11: 0020 R12: a0045380
 [151784.851389] R13: 0065 R14:  R15: 
 [151784.852096] FS:  () GS:88042f0a() knlGS:
 [151784.852791] CS:  0010 DS:  ES:  CR0: 80050033
 [151784.853204] CR2: 7fec804efcbc CR3: 0200c000 CR4: 000406e0
 [151784.853752] Stack:
 [151784.854110]  88042b074400 1f3c94e2cf92 0001144267c3b61c 172f18fced27
 [151784.855136]  8100 a0045380 0065 88042d2c8000
 [151784.856203]  0100 88042f0a3ed8 a004538d 88042f0a3ef8
 [151784.857224] Call Trace:
 [151784.857608]  
 [151784.857717]  [] ? nf_ct_delete+0x20e/0x20e [nf_conntrack]
 [151784.858432]  [] death_by_timeout+0xd/0xf [nf_conntrack]
 [151784.858807]  [] call_timer_fn.isra.26+0x17/0x6d
 [151784.859238]  [] run_timer_softirq+0x172/0x193
 [151784.859630]  [] __do_softirq+0xba/0x1a9
 [151784.859985]  [] irq_exit+0x37/0x7c
 [151784.860380]  [] smp_apic_timer_interrupt+0x3d/0x48
 [151784.860774]  [] apic_timer_interrupt+0x7c/0x90
 [151784.861144]  
 [151784.861272]  [] ? mwait_idle+0x68/0x7e
 [151784.862033]  [] ? atomic_notifier_call_chain+0x13/0x15
 [151784.862409]  [] arch_cpu_idle+0xa/0xc
 [151784.862754]  [] default_idle_call+0x27/0x29
 [151784.863127]  [] cpu_startup_entry+0x121/0x1da
 [151784.863518]  [] start_secondary+0xe7/0xea
 [151784.863893] Code: 5f 5d c3 55 48 89 e5 41 57 41 89 d7 41 56 41 89 f6 41 55 41 54 53 48 89 fb 48 83 ec 20 48 8b 87 c8 00 00 00 48 85 c0 74 0c 31 d2
 83 78 1c 00 0f 95 c2 eb 02 31 d2 85 d2 74 1e 44 0f b7 60 1c
 [151784.871213] RIP  [] nf_ct_delete+0x28/0x20e [nf_conntrack]
 [151784.871692]  RSP 
 [151784.872062] ---[ end trace 54f9b78db1dfe968 ]---
 [151784.886584] Kernel panic - not syncing: Fatal exception in interrupt
 [151784.886981] Kernel Offset: disabled
 [151784.922664] Rebooting in 5 seconds..


 10.0.253.10 [ 1722.079874] general protection fault:  [#1] SMP
 10.0.253.10 [ 1722.080366] Modules linked in: act_skbedit sch_fq cls_fw
 act_police cls_u32 sch_ingress sch_sfq sch_htb netconsole configfs pppoe
 pppox ppp_generic slhc xt_nat ts_bm xt_string xt_connmark xt_TCPMSS
 xt_tcpudp xt_mark iptable_filter iptable_nat nf_conntrack_ipv4
 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables
 x_tables 8021q garp mrp stp llc
 10.0.253.10 [ 1722.085568] CPU: 19 PID: 103 Comm: ksoftirqd/19 Not tainted 4.3.0-build-0087 #3
 10.0.253.10 [ 1722.086291] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
 10.0.253.10 [ 1722.087011] tas

Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-03 Thread Denys Fedoryshchenko

On 2015-11-04 06:58, Eric Dumazet wrote:

On Tue, 2015-11-03 at 20:46 -0800, Eric Dumazet wrote:

On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote:
> On 2015-11-04 00:06, Cong Wang wrote:
> > On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko
> >  wrote:
> >> Hi!
> >>
> >> Actually seems i was getting this panic for a while (once per week) on
> >> loaded pppoe server, but just now was able to get full panic message.
> >> After checking commit logs on sch_fq.c i didnt seen any fixes, so
> >> probably
> >> upgrading to newer kernel wont help?
> >
> >
> > Can you share your `tc qdisc show dev ` with us? And how to
> > reproduce
> > it? I tried to setup htb+fq and then flip the interface back and forth
> > but I don't
> > see any crash.
> My guess it wont be easy to reproduce, it is happening on box with 4.5k
> interfaces, that constantly create/delete interfaces,
> and even with that this problem may happen once per day, or may not
> happen for 1 week.
>
> Here is script that is being fired after new ppp interface detected. But
> pppoe process are independent from
> process that are "establishing" shapers.


It is probably a generic bug. sch_fq seems OK to me.

Somehow nobody tries to change qdisc hundred times per second ;)

Could you try following patch ?

It seems to 'fix' the issue for me.


Following patch would be more appropriate.
Prior one was meant to 'show' the issue.

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cb5d4ad32946..7f5f3e8a10f5 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -706,9 +706,11 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
spin_lock_bh(root_lock);

/* Prune old scheduler */
-   if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
-   qdisc_reset(oqdisc);
-
+   if (oqdisc) {
+   if (atomic_read(&oqdisc->refcnt) <= 1)
+   qdisc_reset(oqdisc);
+   set_bit(__QDISC_STATE_DEACTIVATED, &oqdisc->state);
+   }
/* ... and graft new one */
if (qdisc == NULL)
qdisc = &noop_qdisc;


Applied, will test it, but this bug might be triggered rarely.
I will try to push it to more pppoe servers in order to stress test them 
(and 4.3) more.
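
The idea behind the second patch can be illustrated with a toy model. This is only a single-threaded Python sketch under simplifying assumptions, not kernel code: the class and function names are made up for illustration, the `deactivated` attribute stands in for `__QDISC_STATE_DEACTIVATED`, and the model resets the old qdisc unconditionally, whereas the patch resets it only when the refcount is <= 1.

```python
# Toy model of a sender holding a stale qdisc pointer across a qdisc change.
# All names here are hypothetical; this is not how the kernel is structured.

class Qdisc:
    def __init__(self, name):
        self.name = name
        self.queue = []
        self.deactivated = False  # stands in for __QDISC_STATE_DEACTIVATED


class TxQueue:
    def __init__(self, qdisc):
        self.qdisc = qdisc


def graft(txq, new_qdisc):
    """Swap the qdisc on txq and always mark the old one deactivated."""
    old = txq.qdisc
    if old is not None:
        old.queue.clear()        # simplified: the patch checks refcnt first
        old.deactivated = True   # the step the second patch adds
    txq.qdisc = new_qdisc
    return old


def xmit(qdisc, packet):
    """Enqueue on a qdisc pointer that may be stale; drop if deactivated."""
    if qdisc.deactivated:
        return "DROP"
    qdisc.queue.append(packet)
    return "QUEUED"


txq = TxQueue(Qdisc("fq-old"))
stale = txq.qdisc                # sender looked the qdisc up before the change
graft(txq, Qdisc("fq-new"))      # the qdisc is replaced in the meantime
print(xmit(stale, "pkt1"))       # stale pointer: the packet is dropped
print(xmit(txq.qdisc, "pkt2"))   # current pointer: the packet is queued
```

Without the `deactivated` flag on the old qdisc, the first `xmit` would append to a queue that has already been reset, which is the kind of state the rb_erase panic suggests.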



Re: HTB, HFSC, PIE, FIFO stuck on 2.4Gbit on default values

2015-11-03 Thread Denys Fedoryshchenko

On 2015-11-04 06:28, Eric Dumazet wrote:

On Wed, 2015-11-04 at 06:12 +0200, Denys Fedoryshchenko wrote:
Just enabling gro or gso (or together) is fine there. Thanks for advice.
Seems only tso causing problems.
Also i guess if i keep tso disabled, it will solve my MTU issues (i had
once issue, that traffic heading to pppoe users,
who have 14xx mtu, was blocked, when offloading enabled on transit
server, but can't reproduce it quickly again).
Should i try to report to e1000e maintainers this bug? On similar setup
it is happening only at specific locations,
but i am not definitely sure what can be the reason.


Not sure, have you tried per chance latest kernel (linux-4.3) for this
e1000e issue ?

Are you using vlan tags on this NIC ?

Tested now, can be reproduced on 4.3 as well.
What is interesting, if i enable tso alone, and leave gso/gro off - it 
is working fine. gso+gro on, tso off - fine also.

But if i enable them all together - i trigger the bug.
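
Since the result depends on which offloads are combined, it may help to walk through all eight tso/gso/gro combinations systematically. The following is only a sketch that prints the `ethtool -K` command lines; the `offload_commands` helper is hypothetical, and the device name `eth0` is taken from the logs in this thread.

```python
from itertools import product

FEATURES = ("tso", "gso", "gro")


def offload_commands(dev):
    """Return one ethtool command line per on/off combination of FEATURES."""
    cmds = []
    for states in product(("off", "on"), repeat=len(FEATURES)):
        args = " ".join(f"{feat} {state}" for feat, state in zip(FEATURES, states))
        cmds.append(f"ethtool -K {dev} {args}")
    return cmds


for cmd in offload_commands("eth0"):
    print(cmd)
```

Each line can be run by hand (with a traffic test in between) so that a Hardware Unit Hang can be attributed to one specific combination.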

[   71.699687] e1000e :00:19.0 eth0: Detected Hardware Unit Hang:
[   71.699687]   TDH  <96>
[   71.699687]   TDT  <9c>
[   71.699687]   next_to_use  <9c>
[   71.699687]   next_to_clean<92>
[   71.699687] buffer_info[next_to_clean]:
[   71.699687]   time_stamp   
[   71.699687]   next_to_watch<96>
[   71.699687]   jiffies  
[   71.699687]   next_to_watch.status <0>
[   71.699687] MAC Status <40080083>
[   71.699687] PHY Status <796d>
[   71.699687] PHY 1000BASE-T Status  <3800>
[   71.699687] PHY Extended Status<3000>
[   71.699687] PCI Status <10>
[   73.699241] e1000e :00:19.0 eth0: Detected Hardware Unit Hang:
[   73.699241]   TDH  <96>
[   73.699241]   TDT  <9c>
[   73.699241]   next_to_use  <9c>
[   73.699241]   next_to_clean<92>
[   73.699241] buffer_info[next_to_clean]:
[   73.699241]   time_stamp   
[   73.699241]   next_to_watch<96>
[   73.699241]   jiffies  
[   73.699241]   next_to_watch.status <0>
[   73.699241] MAC Status <40080083>
[   73.699241] PHY Status <796d>
[   73.699241] PHY 1000BASE-T Status  <3800>
[   73.699241] PHY Extended Status<3000>
[   73.699241] PCI Status <10>
[   75.698775] e1000e :00:19.0 eth0: Detected Hardware Unit Hang:
[   75.698775]   TDH  <96>
[   75.698775]   TDT  <9c>
[   75.698775]   next_to_use  <9c>
[   75.698775]   next_to_clean<92>
[   75.698775] buffer_info[next_to_clean]:
[   75.698775]   time_stamp   
[   75.698775]   next_to_watch<96>
[   75.698775]   jiffies  
[   75.698775]   next_to_watch.status <0>
[   75.698775] MAC Status <40080083>
[   75.698775] PHY Status <796d>
[   75.698775] PHY 1000BASE-T Status  <3800>
[   75.698775] PHY Extended Status<3000>
[   75.698775] PCI Status <10>
[   76.709871] ------------[ cut here ]------------
[   76.710075] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 
dev_watchdog+0x17c/0x1e2()
[   76.710383] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed 
out
[   76.710572] Modules linked in: xt_CLASSIFY xt_set ipt_REJECT 
nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_recent ipt_MASQUERADE 
nf_nat_masquerade_ipv4 xt_nat xt_tcpudp nf_nat_pptp nf_nat_proto_gre 
nf_conntrack_pptp nf_conntrack_proto_gre ip_set_hash_net ip_set 
nfnetlink iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables 
act_nat cls_u32 sch_ingress
[   76.713354] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.3.0-build-0087 #1
[   76.713547] Hardware name: Intel Corporation SandyBridge Platform/To 
be filled by O.E.M., BIOS S1200BT.86B.02.00.0041.120520121743 12/05/2012
[   76.713868]   88042f003e08 81259d1d 
88042f003e50
[   76.714413]  88042f003e40 810bda73 818654a3 
88042c29
[   76.714946]  8800be758c00 0001  
88042f003ea0

[   76.715481] Call Trace:
[   76.715657][] dump_stack+0x44/0x55
[   76.715908]  [] warn_slowpath_common+0x95/0xae
[   76.716095]  [] ? dev_watchdog+0x17c/0x1e2
[   76.716281]  [] warn_slowpath_fmt+0x47/0x49
[   76.716470]  [] ? mod_timer_pinned+0xaf/0xbe
[   76.716662]  [] dev_watchdog+0x17c/0x1e2
[   76.716850]  [] ? dev_graft_qdisc+0x65/0x65
[   76.717039]  [] call_timer_fn.isra.26+0x17/0x6d
[   76.717227]  [] run_timer_softirq+0x172/0x193
[   76.717418]  [] __do_softirq+0xba/0x1a9
[   76.717606]  [] irq_exit+0x37/0x7c
[   76.717795]  [] smp_apic_timer_interrupt+0x3d/0x48
[   76.717988]  [] apic_timer_interrupt+0x7c/0x90
[   76.7181

Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-03 Thread Denys Fedoryshchenko

On 2015-11-04 00:06, Cong Wang wrote:

On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko
 wrote:

Hi!

Actually seems i was getting this panic for a while (once per week) on
loaded pppoe server, but just now was able to get full panic message.
After checking commit logs on sch_fq.c i didnt seen any fixes, so
probably upgrading to newer kernel wont help?

Can you share your `tc qdisc show dev ` with us? And how to reproduce
it? I tried to setup htb+fq and then flip the interface back and forth
but I don't see any crash.
My guess is that it won't be easy to reproduce: it is happening on a box
with 4.5k interfaces that constantly creates/deletes interfaces, and even
then the problem may happen once per day, or may not happen for a week.

Here is the script that is fired after a new ppp interface is detected.
But the pppoe processes are independent from the processes that are
"establishing" the shapers.

/sbin/tc qdisc del  root
/sbin/tc qdisc add  handle 1: root htb default 3

/sbin/tc filter add parent 1:0 protocol ip prio 4 handle 1 fw flowid 1:3
/sbin/tc filter add parent 1:0 protocol ip prio 3 u32 match ip protocol 6 0xff match ip src 10.0.252.8/32 flowid 1:3
/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 1 0xff flowid 1:0
/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 6 0xff match ip sport 80 0x flowid 1:4
/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 6 0xff match ip sport 443 0x flowid 1:5
/sbin/tc filter add parent 1:0 protocol ip prio 100 u32 match u32 0 0 flowid 1:2

/sbin/tc class add  classid 1:1 parent 1:0 htb rate 512Kbit ceil 512Kbit.

/sbin/tc class add  classid 1:2 parent 1:1 htb rate 32Kbit ceil 512Kbit
/sbin/tc class add  classid 1:3 parent 1:0 htb rate 10Mbit ceil 10Mbit
/sbin/tc class add  classid 1:4 parent 1:1 htb rate 32Kbit ceil 512Kbit
/sbin/tc class add  classid 1:5 parent 1:1 htb rate 32Kbit ceil 512Kbit

/sbin/tc qdisc add parent 1:2 fq limit 300
/sbin/tc qdisc add parent 1:3 pfifo limit 300
/sbin/tc qdisc add parent 1:4 fq limit 300
/sbin/tc qdisc add parent 1:5 fq limit 300

Possible cases that come to my mind (but maybe I missed others):
 The script and tc are still working while the interface is being
deleted (e.g. the interface disappears)
 The script deletes root while there is heavy traffic on the interface
and a lot of packets are queued
 The ppp interface is destroyed while a lot of traffic is queued on it
(this one is a rather rare situation)
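
To make the class hierarchy of the script above easier to see, here is a sketch that generates the qdisc and class commands for one interface. It is only an illustration: the `dev <name>` argument is an assumption (the device is blank in the script as posted), `shaper_commands` and `ppp0` are hypothetical names, and the filter rules are left out because their masks are incomplete in the posted script.

```python
# (classid, parent, rate, ceil) as listed in the posted script
CLASSES = [
    ("1:1", "1:0", "512Kbit", "512Kbit"),
    ("1:2", "1:1", "32Kbit", "512Kbit"),
    ("1:3", "1:0", "10Mbit", "10Mbit"),
    ("1:4", "1:1", "32Kbit", "512Kbit"),
    ("1:5", "1:1", "32Kbit", "512Kbit"),
]

# (parent class, leaf qdisc kind), each with "limit 300" as in the script
LEAVES = [("1:2", "fq"), ("1:3", "pfifo"), ("1:4", "fq"), ("1:5", "fq")]


def shaper_commands(dev):
    """Build the tc command lines for one device (assumed 'dev' argument)."""
    cmds = [
        f"/sbin/tc qdisc del dev {dev} root",
        f"/sbin/tc qdisc add dev {dev} handle 1: root htb default 3",
    ]
    for classid, parent, rate, ceil in CLASSES:
        cmds.append(
            f"/sbin/tc class add dev {dev} classid {classid} parent {parent} "
            f"htb rate {rate} ceil {ceil}"
        )
    for parent, kind in LEAVES:
        cmds.append(f"/sbin/tc qdisc add dev {dev} parent {parent} {kind} limit 300")
    return cmds


for cmd in shaper_commands("ppp0"):
    print(cmd)
```

The first command is the one relevant to the cases listed above: `qdisc del ... root` tears down the HTB tree, and with it the fq leaves, possibly while packets are still queued.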




Thanks.



Re: HTB, HFSC, PIE, FIFO stuck on 2.4Gbit on default values

2015-11-03 Thread Denys Fedoryshchenko

On 2015-11-03 23:23, Eric Dumazet wrote:

On Tue, 2015-11-03 at 22:24 +0200, Denys Fedoryshchenko wrote:


I wont argue on that, you are right.
Ok, then it is a bit offtopic in current case, different setup, but i
know this one has easy to reproduce issues with offloading. but this is
bug related to that, directly appearing when i enable tso/gso/gro. I am
losing access to remote box, so max i can do right now:
ethtool -K eth0 tso on gso on gro on; sleep 5;ethtool -K eth0 tso off gso off gro off

No shapers, just plain nat. I suspect it might be specific to network
card, but not sure.





What happens if you enable gro, but disable tso ?

With GRO enabled, you'll get a good performance increase, as forwarding
and qdisc will use big packets.

Just enabling gro or gso (or both together) is fine there. Thanks for
the advice. It seems only tso is causing problems.
Also, I guess that keeping tso disabled will solve my MTU issues (I once
had an issue where traffic heading to pppoe users, who have 14xx mtu,
was blocked when offloading was enabled on the transit server, but I
can't quickly reproduce it again).
Should I try to report this bug to the e1000e maintainers? On similar
setups it is happening only at specific locations, but I am not sure
what the reason could be.


Re: HTB, HFSC, PIE, FIFO stuck on 2.4Gbit on default values

2015-11-03 Thread Denys Fedoryshchenko

On 2015-11-03 21:49, Eric Dumazet wrote:


Well, I am telling you.

Say no to people advising to turn off GRO/TSO.

If you were the guy advising others to do so, it is time to see the
light.

Lets fix the bugs if any, instead of spreading disinformation.

I am so tired of telling these very simple facts guys.

If you prefer, continue to work on linux-2.0 but don't ask help on
netdev.

I won't argue with that, you are right.
OK, this is a bit off-topic for the current case (it is a different
setup), but I know this one has easy-to-reproduce issues with
offloading. This bug is directly related to it: it appears as soon as I
enable tso/gso/gro. I am losing access to the remote box, so the most I
can do right now is:
ethtool -K eth0 tso on gso on gro on; sleep 5; ethtool -K eth0 tso off gso off gro off

No shapers, just plain NAT. I suspect it might be specific to the network
card, but I am not sure.

4.1.4
02:00.0 "Class 0200" "8086" "10d3" "8086" "357a"

driver: e1000e
version: 2.3.2-k
firmware-version: 0.13-4
bus-info: :00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

But after those messages, honestly, I don't know where to dig.

[6606122.904234] e1000e :00:19.0 eth0: Detected Hardware Unit Hang:
[6606122.904234]   TDH  
[6606122.904234]   TDT  
[6606122.904234]   next_to_use  
[6606122.904234]   next_to_clean
[6606122.904234] buffer_info[next_to_clean]:
[6606122.904234]   time_stamp   <12761e88c>
[6606122.904234]   next_to_watch
[6606122.904234]   jiffies  <12761e928>
[6606122.904234]   next_to_watch.status <0>
[6606122.904234] MAC Status <40080083>
[6606122.904234] PHY Status <796d>
[6606122.904234] PHY 1000BASE-T Status  <3800>
[6606122.904234] PHY Extended Status<3000>
[6606122.904234] PCI Status <10>
[6606124.903733] e1000e :00:19.0 eth0: Detected Hardware Unit Hang:
[6606124.903733]   TDH  
[6606124.903733]   TDT  
[6606124.903733]   next_to_use  
[6606124.903733]   next_to_clean
[6606124.903733] buffer_info[next_to_clean]:
[6606124.903733]   time_stamp   <12761e88c>
[6606124.903733]   next_to_watch
[6606124.903733]   jiffies  <12761e9f0>
[6606124.903733]   next_to_watch.status <0>
[6606124.903733] MAC Status <40080083>
[6606124.903733] PHY Status <796d>
[6606124.903733] PHY 1000BASE-T Status  <3800>
[6606124.903733] PHY Extended Status<3000>
[6606124.903733] PCI Status <10>
[6606126.903291] e1000e :00:19.0 eth0: Detected Hardware Unit Hang:
[6606126.903291]   TDH  
[6606126.903291]   TDT  
[6606126.903291]   next_to_use  
[6606126.903291]   next_to_clean
[6606126.903291] buffer_info[next_to_clean]:
[6606126.903291]   time_stamp   <12761e88c>
[6606126.903291]   next_to_watch
[6606126.903291]   jiffies  <12761eab8>
[6606126.903291]   next_to_watch.status <0>
[6606126.903291] MAC Status <40080083>
[6606126.903291] PHY Status <796d>
[6606126.903291] PHY 1000BASE-T Status  <3800>
[6606126.903291] PHY Extended Status<3000>
[6606126.903291] PCI Status <10>
[6606127.912352] ------------[ cut here ]------------
[6606127.912566] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 
dev_watchdog+0x180/0x1e6()
[6606127.912877] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed 
out
[6606127.913067] Modules linked in: xt_CLASSIFY xt_set ipt_REJECT 
nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_recent ipt_MASQUERADE 
nf_nat_masquerade_ipv4 xt_nat xt_tcpudp nf_nat_pptp nf_nat_proto_gre 
nf_conntrack_pptp nf_conntrack_proto_gre ip_set_hash_net ip_set 
nfnetlink iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables 
act_nat cls_u32 sch_ingress
[6606127.915843] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.1.4-build-0084 #1
[6606127.916035] Hardware name: Intel Corporation SandyBridge 
Platform/To be filled by O.E.M., BIOS 
S1200BT.86B.02.00.0041.120520121743 12/05/2012
[6606127.916356]  0009 88042f003dd8 81896390 
00fb
[6606127.916903]  88042f003e28 88042f003e18 810bc024 
820aad98
[6606127.917451]  81830ab3 8800be47c000 88042a8dce00 
0001

[6606127.917991] Call Trace:
[6606127.918175][] dump_stack+0x45/0x57
[6606127.918429]  [] warn_slowpath_common+0x97/0xb1
[6606127.918621]  [] ? dev_watchdog+0x180/0x1e6
[6606127.918812]  [] warn_slowpath_fmt+0x41/0x43
[6606127.919007]  [] ? nf_ct_delete+0x1ef/0x202 
[nf_conntrack]

[6606127.919201]  [] dev_watchdog+0x180/0x1e6
[6606127.919396]  [] ? nf_ct_delete+0x202/0x202 
[nf_conntrack]

[6606127.919589]  [] ? dev_graft_qdisc+0x65/0x65
[6606127.919781]  [] call_timer_fn.isra.27+0x17/0x6d
[6606127.919

Re: HTB, HFSC, PIE, FIFO stuck on 2.4Gbit on default values

2015-11-03 Thread Denys Fedoryshchenko

On 2015-11-03 21:11, Eric Dumazet wrote:

On Tue, 2015-11-03 at 19:33 +0200, Denys Fedoryshchenko wrote:

Hi

Recently i was testing shaping over single 10G cards, for speeds up to
3-4Gbps, and noticed interesting effect.

Shaping scheme:
Incoming bandwidth comes to switch port, with access vlan 100
Outgoing bandwidth leaves switch port with access vlan 200
Linux with Intel X710 connected to trunk port, bridge created, eth0.100
bridged to eth0.200
gso/gro/tso disabled (they doesn't work nice with shapers)


Well, this seems urban legend to me.

Something that is repeatedly copied/pasted on many web pages since last
century.

Given the nature of qdisc (being protected by a spinlock), you
absolutely want to have some kind of aggregation.

I have a patch to allow a sysadmin to set a max gro segs value to
incoming packets. You could play with it. Start with 4 segments,
allow GSO/TSO on the output and watch performance coming back.


It is not: I have more than 120 servers installed across the country
(most of them handle little traffic) in forwarding mode, and the first
thing I do on a forwarding setup is disable gro/gso/tso. It has also
helped many ISPs on a forum I visit often: the first step when
troubleshooting unreliable forwarding of network traffic is disabling
offloading.
The problems start with incorrect shaping and in some cases end with
network drivers spitting watchdog errors. Sometimes a shaper is not even
necessary: plain forwarding with offload enabled can cause issues, but
that might be a bug in the network drivers.
Should I try to reproduce and report it? Sure, if anybody can look into
this issue.



HTB, HFSC, PIE, FIFO stuck on 2.4Gbit on default values

2015-11-03 Thread Denys Fedoryshchenko

Hi

Recently I was testing shaping over single 10G cards, for speeds up to
3-4Gbps, and noticed an interesting effect.


Shaping scheme:
Incoming bandwidth comes to switch port, with access vlan 100
Outgoing bandwidth leaves switch port with access vlan 200
Linux with Intel X710 connected to trunk port, bridge created, eth0.100
bridged to eth0.200
gso/gro/tso disabled (they don't work nicely with shapers)
Latest kernel, of course

The shapers are installed on eth0.200, and multiqueue seems to work on
eth0 in general (I see packets distributed over each queue). CPU load is
very low (max 20% per core, but usually below 5%).

I tried:
HTB with fq, pfifo, pie qdisc
HFSC with fq, pfifo, pie qdisc

After I run the shaper with default values, I can see traffic start to
queue in the classes, and total traffic doesn't exceed 2.4Gbit; if I
remove the shaper, it immediately reaches 4Gbit.
The only trick I found is running pie with burst 1 cburst 1 in the leaf
classes, and 10 in the root class (I think 1 in the root class might
work as well). If I change the discipline to fq, I am back at 2.4Gbit,
but that might be just because fq is not intended to be used in an HTB
leaf class.
So in my case burst/cburst solved the issue, but I suspect there may be a
more elegant solution/tuning than putting in some random values.
Is there any particular reason why I am limited to ~2.4Gbit with any
other settings?
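
One way to pick burst values less randomly is the rule of thumb from the tc-htb manual page: if the shaper can only release data once per timer tick, burst must be at least rate / HZ bytes. The sketch below just does that arithmetic. Whether this tick-based rule explains the 2.4Gbit ceiling on this setup is not established in this thread, and the HZ values used are assumptions; `min_burst_bytes` is a hypothetical helper.

```python
def min_burst_bytes(rate_bit_per_s, tick_hz):
    """Smallest burst (bytes) that sustains the rate at one release per tick."""
    bits_per_byte = 8
    # Ceiling division on integers, so the result is never rounded down.
    return -(-rate_bit_per_s // (bits_per_byte * tick_hz))


# The manual page example: 10mbit with 100 ticks per second needs about 12 KB.
print(min_burst_bytes(10_000_000, 100))

# Rates from this report, for a few assumed tick frequencies.
for hz in (100, 250, 1000):
    print(hz, min_burst_bytes(2_400_000_000, hz), min_burst_bytes(4_000_000_000, hz))
```

If the computed value is far above the burst that tc picks by default for the class, that would be consistent with the observation that raising burst/cburst removes the limit.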



Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-02 Thread Denys Fedoryshchenko

On 2015-11-02 18:12, Eric Dumazet wrote:

On Mon, 2015-11-02 at 17:58 +0200, Denys Fedoryshchenko wrote:

On 2015-11-02 17:24, Eric Dumazet wrote:
> On Mon, 2015-11-02 at 16:11 +0200, Denys Fedoryshchenko wrote:
>> Hi!
>>
>> Actually seems i was getting this panic for a while (once per week) on
>> loaded pppoe server, but just now was able to get full panic message.
>> After checking commit logs on sch_fq.c i didnt seen any fixes, so
>> probably upgrading to newer kernel wont help?
>
> I do not think we support sch_fq as a HTB leaf.
>
> If you want both HTB and sch_fq, you need to setup a bonding device.
>
> HTB on bond0
>
> sch_fq on the slaves
>
> Sure, the kernel should not crash, but HTB+sch_fq on same net device is
> certainly not something that will work anyway.
Strange, because except ppp, on static devices it works really very well
in such scheme. It is the only solution that can throttle incoming
bandwidth, when bandwidth is very overbooked - reliably, for my use
cases, such as 256k+ flows/2.5Gbps and several different classes of
traffic, so using DRR will end up in just not enough classes.

On latest kernels i had to patch tc to provide parameter for orphan mask
in fq, to increase number for flows for transit traffic.
None of other qdiscs able to solve this problem, incoming bandwidth
simply flowing 10-20% more than set, but fq is doing magic.
The only device that was working with similar efficiency for such cases
- proprietary PacketShaper, but is modifying tcp window size, and can't
be called transparent, and also has stability issues over 1Gbps.


Ah, I was thinking you needed more like 10Gb traffic ;)

with HTB on bonding, we can use MQ+FQ on the slaves in order to use many
cpus to serve local traffic.

But yes, if you use HTB+FQ for forwarding, I guess the bonding setup is
not really needed.

Well, the country here is very underdeveloped in matters of technology.
10G interfaces appeared at some ISPs only this year.
On the ppp interfaces where the crash is happening there is even less
bandwidth: each user gets max 1-2Mbps (average usage 128kbps), with 4.5k
interfaces.
But I have some heavier setups there, around 9k pppoe users terminated on
a single server (meaning 9k interfaces), with about 2Gbps of traffic
passing through.
If I took a non-FOSS solution, I would have to pay $100k+ for software
licenses, which is unbearable for a local ISP. fq is not critical in this
specific use case, I can use fifo or similar for the ppp interfaces, but
I guess it is better to report a bug :)



Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-02 Thread Denys Fedoryshchenko

On 2015-11-02 17:24, Eric Dumazet wrote:

On Mon, 2015-11-02 at 16:11 +0200, Denys Fedoryshchenko wrote:

Hi!

Actually seems i was getting this panic for a while (once per week) on
loaded pppoe server, but just now was able to get full panic message.
After checking commit logs on sch_fq.c i didnt seen any fixes, so
probably upgrading to newer kernel wont help?


I do not think we support sch_fq as a HTB leaf.

If you want both HTB and sch_fq, you need to setup a bonding device.

HTB on bond0

sch_fq on the slaves

Sure, the kernel should not crash, but HTB+sch_fq on same net device is
certainly not something that will work anyway.

Strange, because apart from ppp, on static devices it works really well
in such a scheme. It is the only solution that can reliably throttle
incoming bandwidth when the bandwidth is heavily overbooked, for my use
cases such as 256k+ flows/2.5Gbps and several different classes of
traffic, where using DRR would end up with just not enough classes.

On the latest kernels I had to patch tc to provide a parameter for the
orphan mask in fq, to increase the number of flows for transit traffic.
None of the other qdiscs are able to solve this problem: incoming
bandwidth simply flows 10-20% above the set limit, but fq does magic.
The only device that worked with similar efficiency for such cases is the
proprietary PacketShaper, but it modifies the tcp window size, so it
can't be called transparent, and it also has stability issues above
1Gbps.



kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-02 Thread Denys Fedoryshchenko

Hi!

It actually seems I have been getting this panic for a while (once per
week) on a loaded pppoe server, but only now was I able to capture the
full panic message.
After checking the commit logs for sch_fq.c I haven't seen any fixes, so
probably upgrading to a newer kernel won't help?



 [237470.633382] general protection fault:  [#1]
 SMP

 [237470.633832] Modules linked in:
 netconsole
 configfs
 act_skbedit
 sch_fq
 cls_fw
 act_police
 cls_u32
 sch_ingress
 sch_sfq
 sch_htb
 pppoe
 pppox
 ppp_generic
 slhc
 xt_nat
 ts_bm
 xt_string
 xt_connmark
 xt_TCPMSS
 xt_tcpudp
 xt_mark
 iptable_filter
 iptable_nat
 nf_conntrack_ipv4
 nf_defrag_ipv4
 nf_nat_ipv4
 nf_nat
 nf_conntrack
 iptable_mangle
 ip_tables
 x_tables
 8021q
 garp
 mrp
 stp
 llc

 [237470.637835] CPU: 1 PID: 14035 Comm: accel-pppd Not tainted 
4.2.3-build-0087 #3
 [237470.638342] Hardware name: Intel Corporation 
S2600GZ/S2600GZ, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
 [237470.638859] task: 8803ef8b5080 ti: 8803ed7e 
task.ti: 8803ed7e

 [237470.639370] RIP: 0010:[]
 [] rb_erase+0x37/0x2c4
 [237470.639960] RSP: 0018:8803ed7e3b88  EFLAGS: 00010286
 [237470.644863] RAX:  RBX: 8804106ab000 
RCX: 0001
 [237470.645366] RDX: ffa2050402210218 RSI: 88040cfe2cf0 
RDI: 8803f50d00e0
 [237470.645872] RBP: 8803ed7e3b88 R08:  
R09: 88042ee37d50
 [237470.646376] R10: ea000fe7a9c0 R11: 94f1b850 
R12: 019e
 [237470.646881] R13: 88040cfe2cf0 R14: 8803f50d00d0 
R15: 
 [237470.647381] FS:  7fcd5d384700() 
GS:88042ee2() knlGS:
 [237470.647889] CS:  0010 DS:  ES:  CR0: 
80050033
 [237470.648209] CR2: 7fcd003efa90 CR3: 000424b6e000 
CR4: 000406e0

 [237470.648707] Stack:
 [237470.648990]  8803ed7e3bb8
 a00ef38b
 8804106ab000
 880416079000

 [237470.649791]  0002
 8804160790d8
 8803ed7e3bd8
 8183785c

 [237470.650589]  0002
 8800b021d000
 8803ed7e3c18
 a00d247a

 [237470.651387] Call Trace:
 [237470.651716]  [] fq_reset+0x7a/0xf2 
[sch_fq]

 [237470.652084]  [] qdisc_reset+0x18/0x42
 [237470.652444]  [] htb_reset+0x96/0x14d 
[sch_htb]

 [237470.652780]  [] qdisc_reset+0x18/0x42
 [237470.653146]  [] 
dev_deactivate_queue.constprop.34+0x43/0x53
 [237470.653726]  [] 
dev_deactivate_many+0x53/0x206
 [237470.654088]  [] 
__dev_close_many+0x73/0xbf

 [237470.654436]  [] __dev_close+0x2c/0x41
 [237470.654784]  [] ? 
_raw_spin_unlock_bh+0x15/0x17
 [237470.655106]  [] 
__dev_change_flags+0xa5/0x13c
 [237470.655427]  [] 
dev_change_flags+0x23/0x59

 [237470.655777]  [] ? mutex_lock+0x13/0x24
 [237470.656073]  [] devinet_ioctl+0x246/0x533
 [237470.656372]  [] inet_ioctl+0x8c/0xa6
 [237470.656667]  [] sock_do_ioctl+0x22/0x40
 [237470.656960]  [] sock_ioctl+0x1f2/0x200
 [237470.657253]  [] do_vfs_ioctl+0x360/0x41a
 [237470.657549]  [] ? vfs_write+0x105/0x164
 [237470.657841]  [] SyS_ioctl+0x39/0x61
 [237470.658134]  [] 
entry_SYSCALL_64_fastpath+0x16/0x6e

 [237470.658431] Code: 48 85 c0 75 36 48 8b 0f 48 89 c8 48 83 e0 fc 74 12 48 39 78 10 75 06 48 89 50 10 eb 09 48 89 50 08 eb 03 48 89 16 48 85 d2 74 08
 89 0a e9 83 02 00 00 80 e1 01 e9 c3 00 00 00 48 85 d2 75 2c

 [237470.663930] RIP
 [] rb_erase+0x37/0x2c4
 [237470.664296]  RSP 
 [237470.664598] ---[ end trace 32ea40a7de450892 ]---
 [237470.673272] Kernel panic - not syncing: Fatal exception in 
interrupt

 [237470.673577] Kernel Offset: disabled
 [237470.704654] Rebooting in 5 seconds..
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] ppp: don't override sk->sk_state in pppoe_flush_dev()

2015-10-21 Thread Denys Fedoryshchenko

On 2015-10-22 03:14, Matt Bennett wrote:

On Tue, 2015-10-13 at 05:13 +0300, Denys Fedoryshchenko wrote:

On 2015-10-07 15:12, Guillaume Nault wrote:
> On Mon, Oct 05, 2015 at 02:08:44PM +0200, Guillaume Nault wrote:
>>if (po) {
>>struct sock *sk = sk_pppox(po);
>>
>> -  bh_lock_sock(sk);
>> -
>> -  /* If the user has locked the socket, just ignore
>> -   * the packet.  With the way two rcv protocols hook into
>> -   * one socket family type, we cannot (easily) distinguish
>> -   * what kind of SKB it is during backlog rcv.
>> -   */
>> -  if (sock_owned_by_user(sk) == 0) {
>> -  /* We're no longer connect at the PPPOE layer,
>> -   * and must wait for ppp channel to disconnect us.
>> -   */
>> -  sk->sk_state = PPPOX_ZOMBIE;
>> -  }
>> -
>> -  bh_unlock_sock(sk);
>>if (!schedule_work(&po->proto.pppoe.padt_work))
>>sock_put(sk);
>>}
>>
> Finally, I think I'll keep this approach for net-next, to completely
> remove PPPOX_ZOMBIE.
> For now, let's just avoid any assumption about the relationship between
> the PPPOX_ZOMBIE state and the value of po->pppoe_dev, as suggested by
> Matt.
>
> Denys, can you let me know if your issue goes away with the following
> patch?
> ---
> diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
> index 2ed7506..5e0b432 100644
> --- a/drivers/net/ppp/pppoe.c
> +++ b/drivers/net/ppp/pppoe.c
> @@ -589,7 +589,7 @@ static int pppoe_release(struct socket *sock)
>
>po = pppox_sk(sk);
>
> -  if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
> +  if (po->pppoe_dev) {
>dev_put(po->pppoe_dev);
>po->pppoe_dev = NULL;
>}
I only got the OK to upgrade the server yesterday; so far it has been
working fine for around 12 hours. I need 1-2 more days, and may upgrade a
few more servers, before I can say for sure whether it is OK.
Sorry for the delay; these are production servers and, in the current
situation, they cannot tolerate significant downtime.


Any update on whether this issue is fixed with the suggested patch?


On the server I am allowed to test on, there are no more crashes, but I
have not yet been able to get permission to test on the server where this
crash was happening several times per day. All I can say is that it is
definitely better now.




Re: [PATCH net] ppp: don't override sk->sk_state in pppoe_flush_dev()

2015-10-12 Thread Denys Fedoryshchenko

On 2015-10-07 15:12, Guillaume Nault wrote:

On Mon, Oct 05, 2015 at 02:08:44PM +0200, Guillaume Nault wrote:

if (po) {
struct sock *sk = sk_pppox(po);

-   bh_lock_sock(sk);
-
-   /* If the user has locked the socket, just ignore
-* the packet.  With the way two rcv protocols hook into
-* one socket family type, we cannot (easily) distinguish
-* what kind of SKB it is during backlog rcv.
-*/
-   if (sock_owned_by_user(sk) == 0) {
-   /* We're no longer connect at the PPPOE layer,
-* and must wait for ppp channel to disconnect us.
-*/
-   sk->sk_state = PPPOX_ZOMBIE;
-   }
-
-   bh_unlock_sock(sk);
if (!schedule_work(&po->proto.pppoe.padt_work))
sock_put(sk);
}


Finally, I think I'll keep this approach for net-next, to completely
remove PPPOX_ZOMBIE.
For now, let's just avoid any assumption about the relationship between
the PPPOX_ZOMBIE state and the value of po->pppoe_dev, as suggested by
Matt.

Denys, can you let me know if your issue goes away with the following
patch?
---
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 2ed7506..5e0b432 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -589,7 +589,7 @@ static int pppoe_release(struct socket *sock)

po = pppox_sk(sk);

-   if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
+   if (po->pppoe_dev) {
dev_put(po->pppoe_dev);
po->pppoe_dev = NULL;
}
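
As a minimal userspace sketch of what the patch above changes (this is not
kernel code): the old guard in pppoe_release() infers from sk->sk_state that
po->pppoe_dev is set, while the new guard tests the pointer directly. The
names model_dev, model_sock, old_guard, new_guard and model_release are
hypothetical, and the PPPOX_* values are assumed to match
include/linux/if_pppox.h of that period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State bits; values assumed to match include/linux/if_pppox.h. */
enum {
	PPPOX_NONE      = 0,
	PPPOX_CONNECTED = 1,
	PPPOX_BOUND     = 2,
	PPPOX_RELAY     = 4,
	PPPOX_ZOMBIE    = 8,
	PPPOX_DEAD      = 16,
};

/* Hypothetical stand-ins for struct net_device and the pppoe socket. */
struct model_dev { int refcnt; };
struct model_sock {
	int sk_state;
	struct model_dev *pppoe_dev;
};

/* Old guard: trusts sk_state to imply that pppoe_dev is set. */
static bool old_guard(const struct model_sock *s)
{
	return (s->sk_state &
		(PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) != 0;
}

/* New guard from the patch: tests the pointer itself. */
static bool new_guard(const struct model_sock *s)
{
	return s->pppoe_dev != NULL;
}

/* Release path using the new guard; true if a reference was dropped. */
static bool model_release(struct model_sock *s)
{
	if (!new_guard(s))
		return false;
	s->pppoe_dev->refcnt--;		/* stands in for dev_put() */
	s->pppoe_dev = NULL;
	return true;
}
```

With sk_state == PPPOX_ZOMBIE and pppoe_dev == NULL, old_guard() is true
(the case that would reach dev_put(NULL)), while new_guard() is false and
model_release() does nothing.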
I only got the OK to upgrade the server yesterday; so far it has been
working fine for around 12 hours. I need 1-2 more days, and may upgrade a
few more servers, before I can say for sure whether it is OK.
Sorry for the delay; these are production servers and, in the current
situation, they cannot tolerate significant downtime.




Re: [PATCH net] ppp: don't override sk->sk_state in pppoe_flush_dev()

2015-10-04 Thread Denys Fedoryshchenko

On 2015-10-02 20:54, Guillaume Nault wrote:

On Fri, Oct 02, 2015 at 11:01:45AM +0300, Denys Fedoryshchenko wrote:
Here is a similar panic after the patch was applied (it might be a
different bug), captured over netconsole:

 [126348.617115] CPU: 0 PID: 5254 Comm: accel-pppd Not tainted
4.2.2-build-0087 #2
 [126348.617632] Hardware name: Intel Corporation S2600GZ/S2600GZ,
BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
 [126348.618193] task: 8817cfbe ti: 8817c635 task.ti:
8817c635
 [126348.618696] RIP: 0010:[]
 [] pppoe_release+0x56/0x142 [pppoe]
 [126348.619306] RSP: 0018:8817c6353e28  EFLAGS: 00010202
 [126348.619601] RAX:  RBX: 8817a92b0400 RCX:

 [126348.620152] RDX: 0001 RSI: fe01 RDI:
8180c18a
 [126348.620715] RBP: 8817c6353e68 R08:  R09:

 [126348.621254] R10: 88173c02b210 R11: 0293 R12:
8817b3c18000
 [126348.621784] R13: 8817b3c18030 R14: 8817967f1140 R15:
8817d226c920
 [126348.622330] FS:  7f9444db9700() GS:8817dee0()
knlGS:
 [126348.622876] CS:  0010 DS:  ES:  CR0: 80050033
 [126348.623202] CR2: 0428 CR3: 0017c70b2000 CR4:
001406f0
 [126348.623760] Stack:
 [126348.624056]  000100200018
 
 0001
 8817b3c18000

 [126348.624925]  a00ec280
 8817b3c18030
 8817967f1140
 8817d226c920

 [126348.625736]  8817c6353e88
 8180820a
 88173c02b200
 0008

 [126348.626533] Call Trace:
 [126348.626873]  [] sock_release+0x1a/0x70
 [126348.627183]  [] sock_close+0xd/0x11
 [126348.627512]  [] __fput+0xdf/0x193
 [126348.627845]  [] fput+0x9/0xb
 [126348.628169]  [] task_work_run+0x78/0x8f
 [126348.628517]  [] do_notify_resume+0x40/0x4e
 [126348.628837]  [] int_signal+0x12/0x17


Ok, so there's another possibility for pppoe_release() to be called
while sk->sk_state is PPPOX_{CONNECTED,BOUND,ZOMBIE} but po->pppoe_dev
is NULL.


I'll check the code to see if I can find any race wrt. po->pppoe_dev
and sk->sk_state settings.

In a previous message, you said you'd try reverting 287f3a943fef
("pppoe: Use workqueue to die properly when a PADT is received") and
related patches. I guess "related patches" means 665a6cd809f4 ("pppoe:
drop pppoe device in pppoe_unbind_sock_work"), right?
Did these reverts give any successful result?

BTW, please don't top-post.
I am just applying a "dirty" patch like the one below. I cannot remember
for certain whether I also did a git revert, because it has been a while
since I spotted this bug. With this patch the pppoe server no longer
reboots.


diff -Naur linux-4.2.2-vanilla/drivers/net/ppp/pppoe.c linux-4.2.2-changed/drivers/net/ppp/pppoe.c
--- linux-4.2.2-vanilla/drivers/net/ppp/pppoe.c 2015-09-29 20:38:27.0 +0300
+++ linux-4.2.2-changed/drivers/net/ppp/pppoe.c 2015-10-04 19:05:55.697732991 +0300

@@ -519,7 +519,7 @@
}

bh_unlock_sock(sk);
-   if (!schedule_work(&po->proto.pppoe.padt_work))
+// if (!schedule_work(&po->proto.pppoe.padt_work))
sock_put(sk);
}

@@ -633,7 +633,7 @@

lock_sock(sk);

-   INIT_WORK(&po->proto.pppoe.padt_work, pppoe_unbind_sock_work);
+// INIT_WORK(&po->proto.pppoe.padt_work, pppoe_unbind_sock_work);

error = -EINVAL;
if (sp->sa_protocol != PX_PROTO_OE)






Re: [PATCH net] ppp: don't override sk->sk_state in pppoe_flush_dev()

2015-10-02 Thread Denys Fedoryshchenko
Here is a similar panic after the patch was applied (it might be a
different bug), captured over netconsole:


 [126348.610996] BUG: unable to handle kernel
 NULL pointer dereference
 at 0428
 [126348.611656] IP:
 [] pppoe_release+0x56/0x142 [pppoe]
 [126348.612033] PGD 17d0b03067
 PUD 17c721b067
 PMD 0

 [126348.612545] Oops:  [#1]
 SMP

 [126348.612981] Modules linked in:
 act_skbedit
 sch_fq
 cls_fw
 act_police
 cls_u32
 sch_ingress
 sch_sfq
 sch_htb
 pppoe
 pppox
 ppp_generic
 slhc
 netconsole
 configfs
 xt_nat
 ts_bm
 xt_string
 xt_connmark
 xt_TCPMSS
 xt_tcpudp
 xt_mark
 iptable_filter
 iptable_nat
 nf_conntrack_ipv4
 nf_defrag_ipv4
 nf_nat_ipv4
 nf_nat
 nf_conntrack
 iptable_mangle
 ip_tables
 x_tables
 8021q
 garp
 mrp
 stp
 llc
 bonding

 [126348.617115] CPU: 0 PID: 5254 Comm: accel-pppd Not tainted 
4.2.2-build-0087 #2
 [126348.617632] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS 
SE5C600.86B.02.03.0003.041920141333 04/19/2014
 [126348.618193] task: 8817cfbe ti: 8817c635 task.ti: 
8817c635

 [126348.618696] RIP: 0010:[]
 [] pppoe_release+0x56/0x142 [pppoe]
 [126348.619306] RSP: 0018:8817c6353e28  EFLAGS: 00010202
 [126348.619601] RAX:  RBX: 8817a92b0400 RCX: 

 [126348.620152] RDX: 0001 RSI: fe01 RDI: 
8180c18a
 [126348.620715] RBP: 8817c6353e68 R08:  R09: 

 [126348.621254] R10: 88173c02b210 R11: 0293 R12: 
8817b3c18000
 [126348.621784] R13: 8817b3c18030 R14: 8817967f1140 R15: 
8817d226c920
 [126348.622330] FS:  7f9444db9700() GS:8817dee0() 
knlGS:

 [126348.622876] CS:  0010 DS:  ES:  CR0: 80050033
 [126348.623202] CR2: 0428 CR3: 0017c70b2000 CR4: 
001406f0

 [126348.623760] Stack:
 [126348.624056]  000100200018
 
 0001
 8817b3c18000

 [126348.624925]  a00ec280
 8817b3c18030
 8817967f1140
 8817d226c920

 [126348.625736]  8817c6353e88
 8180820a
 88173c02b200
 0008

 [126348.626533] Call Trace:
 [126348.626873]  [] sock_release+0x1a/0x70
 [126348.627183]  [] sock_close+0xd/0x11
 [126348.627512]  [] __fput+0xdf/0x193
 [126348.627845]  [] fput+0x9/0xb
 [126348.628169]  [] task_work_run+0x78/0x8f
 [126348.628517]  [] do_notify_resume+0x40/0x4e
 [126348.628837]  [] int_signal+0x12/0x17
 [126348.629131] Code: 48 8b 83 e0 00 00 00 a8 01 74 12 48 89 df e8 0d 24 72 e1 b8 f7 ff ff ff e9 eb 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 a0 02 00 00
 8b 80 28 04 00 00 65 ff 08 48 c7 83 a0 02 00 00 00 00 00 00

 [126348.635060] RIP
 [] pppoe_release+0x56/0x142 [pppoe]
 [126348.635432]  RSP 
 [126348.635718] CR2: 0428
 [126348.641165] ---[ end trace 911ff90a1416e3d1 ]---
 [126348.653235] Kernel panic - not syncing: Fatal exception
 [126348.653538] Kernel Offset: disabled
 [126348.677177] Rebooting in 5 seconds..
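
The fault address in this oops can be cross-checked against the "Code:"
bytes. One byte (the one the kernel marks as the trapping instruction)
did not survive in this archive, so the sketch below decodes only the bytes
that are present: `8b 80 28 04 00 00` is a `mov` whose ModRM byte 0x80
selects "base register + 32-bit displacement", and the displacement is
0x428, matching the reported NULL dereference at 0428. The preceding
`48 8b 83 a0 02 00 00` loads from offset 0x2a0 of RBX; reading that as the
load of po->pppoe_dev is an interpretation, not something the log states.
The helper name mov_disp32 is hypothetical, and the decoder handles only
this one instruction form (no SIB byte).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode "[REX.W] 8b /r" when ModRM has mod=10 (disp32) and no SIB byte.
 * On success stores the raw 32-bit displacement in *disp and returns true.
 */
static bool mov_disp32(const uint8_t *insn, size_t len, uint32_t *disp)
{
	size_t i = 0;

	if (len > 0 && insn[0] == 0x48)		/* optional REX.W prefix */
		i = 1;
	if (len < i + 6 || insn[i] != 0x8b)	/* opcode + ModRM + disp32 */
		return false;

	uint8_t modrm = insn[i + 1];
	if ((modrm >> 6) != 2 || (modrm & 7) == 4)
		return false;			/* not disp32, or SIB follows */

	/* The displacement is stored little-endian. */
	*disp = (uint32_t)insn[i + 2] |
		((uint32_t)insn[i + 3] << 8) |
		((uint32_t)insn[i + 4] << 16) |
		((uint32_t)insn[i + 5] << 24);
	return true;
}
```

Feeding it the two byte sequences quoted above yields displacements 0x428
and 0x2a0 respectively.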




On 2015-09-30 12:45, Guillaume Nault wrote:
Since commit 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"),
pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the
PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to
PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the
following oops:

[  570.140800] BUG: unable to handle kernel NULL pointer dereference
at 04e0
[  570.142931] IP: [] pppoe_release+0x50/0x101 
[pppoe]

[  570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0
[  570.144601] Oops:  [#1] SMP
[  570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core
ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop
crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac
drbg ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul
glue_helper acpi_cpufreq evdev serio_raw processor button ext4 crc16
mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio
[  570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 #1
[  570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Debian-1.8.2-1 04/01/2014
[  570.144601] task: 88003d30d600 ti: 880036b6 task.ti:
880036b6
[  570.144601] RIP: 0010:[]  []
pppoe_release+0x50/0x101 [pppoe]
[  570.144601] RSP: 0018:880036b63e08  EFLAGS: 00010202
[  570.144601] RAX:  RBX: 88003434 RCX: 
0206
[  570.144601] RDX: 0006 RSI: 88003d30dd20 RDI: 
88003d30dd20
[  570.144601] RBP: 880036b63e28 R08: 0001 R09: 

[  570.144601] R10: 7ffee9b50420 R11: 880034340078 R12: 
8800387ec780
[  570.144601] R13: 8800387ec7b0 R14: 88003e222aa0 R15: 
8800387ec7b0

[  570.144601] FS:  7f5672f48700() GS:88003fc8()
knlGS:
[  570.144601] CS:  0010 DS:  ES:  CR0: 80050033
[  570.144601] CR2: 00

Re: 4.1.0, kernel panic, pppoe_release

2015-09-25 Thread Denys Fedoryshchenko

On 2015-09-25 17:38, Guillaume Nault wrote:

On Tue, Sep 22, 2015 at 04:47:48AM +0300, Denys Fedoryshchenko wrote:

Hi,
Sorry for the late reply; I could not push a new kernel to the pppoe
servers without permission (they are production servers), and I have only
just got the OK.

I have been testing the patch on another pppoe server with 9k users for
~3 days, and it seems fine. Today I will also test it on the server that
was experiencing crashes within 1 day.


Thanks for the feedback. I'm about to submit a fix. Should I add a
Tested-by tag for you?
On one of the servers I got the same crash as before, within hours. The
9k-user server also crashed after a while, so it seems the patch does not
help.

I will do some more tests tomorrow.


Re: 4.1.0, kernel panic, pppoe_release

2015-09-21 Thread Denys Fedoryshchenko

Hi,
Sorry for the late reply; I could not push a new kernel to the pppoe
servers without permission (they are production servers), and I have only
just got the OK.

I have been testing the patch on another pppoe server with 9k users for
~3 days, and it seems fine. Today I will also test it on the server that
was experiencing crashes within 1 day.

On 2015-09-10 18:56, Guillaume Nault wrote:

On Fri, Jul 17, 2015 at 09:16:14PM +0300, Denys Fedoryshchenko wrote:

My knowledge of the kernel is probably not sufficient, but I will try a
few approaches.
One of them is to add the following to pppoe_unbind_sock_work:

pppox_unbind_sock(sk);
+/* Signal the death of the socket. */
+sk->sk_state = PPPOX_DEAD;


I don't believe this will fix anything. pppox_unbind_sock() already
sets sk->sk_state when necessary.

I will first wait to confirm that this patch was causing the kernel panic
(that needs a 24h testing cycle), then I will try this fix.


I suspect the problem goes with actions performed on the underlying
interface (MAC address, MTU or link state update). This triggers
pppoe_flush_dev(), which cleans up the device without announcing it
in sk->sk_state.

Can you please try the following patch?

---
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 3837ae3..2ed7506 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -313,7 +313,6 @@ static void pppoe_flush_dev(struct net_device *dev)
if (po->pppoe_dev == dev &&
    sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
pppox_unbind_sock(sk);
-   sk->sk_state = PPPOX_ZOMBIE;
sk->sk_state_change(sk);
po->pppoe_dev = NULL;
dev_put(dev);
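
A small userspace model of the change in this patch, under stated
assumptions (this is not kernel code). It assumes that pppox_unbind_sock()
moves an attached socket to PPPOX_DEAD, which is one reading of "already
sets sk->sk_state when necessary" above. The names flush_sock,
model_unbind, model_flush_dev and state_matches_dev are hypothetical. The
model covers only this one path and does not claim to cover every way the
state and the device pointer can disagree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values assumed to match include/linux/if_pppox.h. */
enum {
	PPPOX_CONNECTED = 1,
	PPPOX_BOUND     = 2,
	PPPOX_ZOMBIE    = 8,
	PPPOX_DEAD      = 16,
};
#define ATTACHED_STATES (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)

/* Hypothetical stand-in for the pppoe socket. */
struct flush_sock {
	int sk_state;
	const void *pppoe_dev;
};

/* Assumed behaviour of pppox_unbind_sock(): attached sockets become dead. */
static void model_unbind(struct flush_sock *s)
{
	if (s->sk_state & ATTACHED_STATES)
		s->sk_state = PPPOX_DEAD;
}

/* force_zombie = true models pppoe_flush_dev() before the patch. */
static void model_flush_dev(struct flush_sock *s, const void *dev,
			    bool force_zombie)
{
	if (s->pppoe_dev == dev && (s->sk_state & ATTACHED_STATES)) {
		model_unbind(s);
		if (force_zombie)
			s->sk_state = PPPOX_ZOMBIE;	/* line the patch removes */
		s->pppoe_dev = NULL;
	}
}

/* Invariant a state-based check in pppoe_release() relies on. */
static bool state_matches_dev(const struct flush_sock *s)
{
	return !(s->sk_state & ATTACHED_STATES) || s->pppoe_dev != NULL;
}
```

In this model, flushing with force_zombie set leaves the socket in
PPPOX_ZOMBIE with a NULL device, so state_matches_dev() is false; without
it the socket ends up in PPPOX_DEAD and the invariant holds.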



Re: 4.1.0, kernel panic, pppoe_release

2015-07-17 Thread Denys Fedoryshchenko
My knowledge of the kernel is probably not sufficient, but I will try a
few approaches.

One of them is to add the following to pppoe_unbind_sock_work:

pppox_unbind_sock(sk);
+/* Signal the death of the socket. */
+sk->sk_state = PPPOX_DEAD;

I will first wait to confirm that this patch was causing the kernel panic
(that needs a 24h testing cycle), then I will try this fix.


On 2015-07-17 18:36, Dan Williams wrote:

On Fri, 2015-07-17 at 12:24 +0300, Denys Fedoryshchenko wrote:

As I suspected, this kernel panic is caused by recent changes to pppoe.
The problem appears with accel-pppd (server) on loaded servers (2k
users and more).
It is most probably related to the change "pppoe: Use workqueue to die
properly when a PADT is received".
I will try to revert this and related patches.


While I didn't write the patch, I'm the one that started the process
that got it submitted...  Could you review the patch quickly too to see
if you can spot anything amiss with it, so that it could get fixed up?
The original patch does fix a real problem so ideally we don't have to
revert the whole thing upstream.

Dan


On 2015-07-14 13:57, Denys Fedoryshchenko wrote:
> Here is the panic message from netconsole. Please let me know if any
> additional information is required.
>
> Jul 14 13:49:16 10.0.252.10 [76078.867822] BUG: unable to handle kernel
> Jul 14 13:49:16 10.0.252.10 NULL pointer dereference
> Jul 14 13:49:16 10.0.252.10 at 03f0
> Jul 14 13:49:16 10.0.252.10 [76078.868280] IP:
> Jul 14 13:49:16 10.0.252.10 []
> pppoe_release+0x56/0x142 [pppoe]
> Jul 14 13:49:16 10.0.252.10 [76078.868541] PGD 336e4a067
> Jul 14 13:49:16 10.0.252.10 PUD 333f17067
> Jul 14 13:49:16 10.0.252.10 PMD 0
> Jul 14 13:49:16 10.0.252.10
> Jul 14 13:49:16 10.0.252.10 [76078.868918] Oops:  [#1]
> Jul 14 13:49:16 10.0.252.10 SMP
> Jul 14 13:49:16 10.0.252.10
> Jul 14 13:49:16 10.0.252.10 [76078.869226] Modules linked in:
> Jul 14 13:49:16 10.0.252.10 netconsole
> Jul 14 13:49:16 10.0.252.10 configfs
> Jul 14 13:49:16 10.0.252.10 coretemp
> Jul 14 13:49:16 10.0.252.10 sch_fq
> Jul 14 13:49:16 10.0.252.10 cls_fw
> Jul 14 13:49:16 10.0.252.10 act_police
> Jul 14 13:49:16 10.0.252.10 cls_u32
> Jul 14 13:49:16 10.0.252.10 sch_ingress
> Jul 14 13:49:16 10.0.252.10 sch_sfq
> Jul 14 13:49:16 10.0.252.10 sch_htb
> Jul 14 13:49:16 10.0.252.10 pppoe
> Jul 14 13:49:16 10.0.252.10 pppox
> Jul 14 13:49:16 10.0.252.10 ppp_generic
> Jul 14 13:49:16 10.0.252.10 slhc
> Jul 14 13:49:16 10.0.252.10 nf_nat_pptp
> Jul 14 13:49:16 10.0.252.10 nf_nat_proto_gre
> Jul 14 13:49:16 10.0.252.10 nf_conntrack_pptp
> Jul 14 13:49:16 10.0.252.10 nf_conntrack_proto_gre
> Jul 14 13:49:16 10.0.252.10 tun
> Jul 14 13:49:16 10.0.252.10 xt_REDIRECT
> Jul 14 13:49:16 10.0.252.10 nf_nat_redirect
> Jul 14 13:49:16 10.0.252.10 xt_set
> Jul 14 13:49:16 10.0.252.10 xt_TCPMSS
> Jul 14 13:49:16 10.0.252.10 ipt_REJECT
> Jul 14 13:49:16 10.0.252.10 nf_reject_ipv4
> Jul 14 13:49:16 10.0.252.10 ts_bm
> Jul 14 13:49:16 10.0.252.10 xt_string
> Jul 14 13:49:16 10.0.252.10 xt_connmark
> Jul 14 13:49:16 10.0.252.10 xt_DSCP
> Jul 14 13:49:16 10.0.252.10 xt_mark
> Jul 14 13:49:16 10.0.252.10 xt_tcpudp
> Jul 14 13:49:16 10.0.252.10 iptable_mangle
> Jul 14 13:49:16 10.0.252.10 iptable_filter
> Jul 14 13:49:16 10.0.252.10 iptable_nat
> Jul 14 13:49:16 10.0.252.10 nf_conntrack_ipv4
> Jul 14 13:49:16 10.0.252.10 nf_defrag_ipv4
> Jul 14 13:49:16 10.0.252.10 nf_nat_ipv4
> Jul 14 13:49:16 10.0.252.10 nf_nat
> Jul 14 13:49:16 10.0.252.10 nf_conntrack
> Jul 14 13:49:16 10.0.252.10 ip_tables
> Jul 14 13:49:16 10.0.252.10 x_tables
> Jul 14 13:49:16 10.0.252.10 ip_set_hash_ip
> Jul 14 13:49:16 10.0.252.10 ip_set
> Jul 14 13:49:16 10.0.252.10 nfnetlink
> Jul 14 13:49:16 10.0.252.10 8021q
> Jul 14 13:49:16 10.0.252.10 garp
> Jul 14 13:49:16 10.0.252.10 mrp
> Jul 14 13:49:16 10.0.252.10 stp
> Jul 14 13:49:16 10.0.252.10 llc
> Jul 14 13:49:16 10.0.252.10 [last unloaded: netconsole]
> Jul 14 13:49:16 10.0.252.10
> Jul 14 13:49:16 10.0.252.10 [76078.873195] CPU: 3 PID: 2940 Comm:
> accel-pppd Not tainted 4.1.0-build-0074 #7
> Jul 14 13:49:16 10.0.252.10 [76078.873396] Hardware name: HP ProLiant
> DL320e Gen8 v2, BIOS P80 04/02/2015
> Jul 14 13:49:16 10.0.252.10 [76078.873598] task: 8800b1886ba0 ti:
> 8800b09f4000 task.ti: 8800b09f4000
> Jul 14 13:49:16 10.0.252.10 [76078.873929] RIP:
> 0010:[]
> Jul 14 13:49:16 10.0.252.10 []
> pppoe_release+0x56/0x142 [pppoe]
> Jul 14 13:49:16 10.0.252.10 [76078.874317] RSP: 0018:8800b09f7e28
> EFLAGS: 00010202
> Jul 14 13:49:16 10.0.252.10 [76078.874512] RAX:  RBX:
> 88032a214400 RCX: 
> Jul 14 13:49:16 10.0.252.10 [76078.874709] RDX: 000d RSI:
> 00

Re: 4.1.0, kernel panic, pppoe_release

2015-07-17 Thread Denys Fedoryshchenko

As I suspected, this kernel panic is caused by recent changes to pppoe.
The problem appears with accel-pppd (server) on loaded servers (2k
users and more).
It is most probably related to the change "pppoe: Use workqueue to die
properly when a PADT is received".

I will try to revert this and related patches.

On 2015-07-14 13:57, Denys Fedoryshchenko wrote:

Here is the panic message from netconsole. Please let me know if any
additional information is required.

Jul 14 13:49:16 10.0.252.10 [76078.867822] BUG: unable to handle kernel
Jul 14 13:49:16 10.0.252.10 NULL pointer dereference
Jul 14 13:49:16 10.0.252.10 at 03f0
Jul 14 13:49:16 10.0.252.10 [76078.868280] IP:
Jul 14 13:49:16 10.0.252.10 []
pppoe_release+0x56/0x142 [pppoe]
Jul 14 13:49:16 10.0.252.10 [76078.868541] PGD 336e4a067
Jul 14 13:49:16 10.0.252.10 PUD 333f17067
Jul 14 13:49:16 10.0.252.10 PMD 0
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.868918] Oops:  [#1]
Jul 14 13:49:16 10.0.252.10 SMP
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.869226] Modules linked in:
Jul 14 13:49:16 10.0.252.10 netconsole
Jul 14 13:49:16 10.0.252.10 configfs
Jul 14 13:49:16 10.0.252.10 coretemp
Jul 14 13:49:16 10.0.252.10 sch_fq
Jul 14 13:49:16 10.0.252.10 cls_fw
Jul 14 13:49:16 10.0.252.10 act_police
Jul 14 13:49:16 10.0.252.10 cls_u32
Jul 14 13:49:16 10.0.252.10 sch_ingress
Jul 14 13:49:16 10.0.252.10 sch_sfq
Jul 14 13:49:16 10.0.252.10 sch_htb
Jul 14 13:49:16 10.0.252.10 pppoe
Jul 14 13:49:16 10.0.252.10 pppox
Jul 14 13:49:16 10.0.252.10 ppp_generic
Jul 14 13:49:16 10.0.252.10 slhc
Jul 14 13:49:16 10.0.252.10 nf_nat_pptp
Jul 14 13:49:16 10.0.252.10 nf_nat_proto_gre
Jul 14 13:49:16 10.0.252.10 nf_conntrack_pptp
Jul 14 13:49:16 10.0.252.10 nf_conntrack_proto_gre
Jul 14 13:49:16 10.0.252.10 tun
Jul 14 13:49:16 10.0.252.10 xt_REDIRECT
Jul 14 13:49:16 10.0.252.10 nf_nat_redirect
Jul 14 13:49:16 10.0.252.10 xt_set
Jul 14 13:49:16 10.0.252.10 xt_TCPMSS
Jul 14 13:49:16 10.0.252.10 ipt_REJECT
Jul 14 13:49:16 10.0.252.10 nf_reject_ipv4
Jul 14 13:49:16 10.0.252.10 ts_bm
Jul 14 13:49:16 10.0.252.10 xt_string
Jul 14 13:49:16 10.0.252.10 xt_connmark
Jul 14 13:49:16 10.0.252.10 xt_DSCP
Jul 14 13:49:16 10.0.252.10 xt_mark
Jul 14 13:49:16 10.0.252.10 xt_tcpudp
Jul 14 13:49:16 10.0.252.10 iptable_mangle
Jul 14 13:49:16 10.0.252.10 iptable_filter
Jul 14 13:49:16 10.0.252.10 iptable_nat
Jul 14 13:49:16 10.0.252.10 nf_conntrack_ipv4
Jul 14 13:49:16 10.0.252.10 nf_defrag_ipv4
Jul 14 13:49:16 10.0.252.10 nf_nat_ipv4
Jul 14 13:49:16 10.0.252.10 nf_nat
Jul 14 13:49:16 10.0.252.10 nf_conntrack
Jul 14 13:49:16 10.0.252.10 ip_tables
Jul 14 13:49:16 10.0.252.10 x_tables
Jul 14 13:49:16 10.0.252.10 ip_set_hash_ip
Jul 14 13:49:16 10.0.252.10 ip_set
Jul 14 13:49:16 10.0.252.10 nfnetlink
Jul 14 13:49:16 10.0.252.10 8021q
Jul 14 13:49:16 10.0.252.10 garp
Jul 14 13:49:16 10.0.252.10 mrp
Jul 14 13:49:16 10.0.252.10 stp
Jul 14 13:49:16 10.0.252.10 llc
Jul 14 13:49:16 10.0.252.10 [last unloaded: netconsole]
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.873195] CPU: 3 PID: 2940 Comm:
accel-pppd Not tainted 4.1.0-build-0074 #7
Jul 14 13:49:16 10.0.252.10 [76078.873396] Hardware name: HP ProLiant
DL320e Gen8 v2, BIOS P80 04/02/2015
Jul 14 13:49:16 10.0.252.10 [76078.873598] task: 8800b1886ba0 ti:
8800b09f4000 task.ti: 8800b09f4000
Jul 14 13:49:16 10.0.252.10 [76078.873929] RIP: 
0010:[]

Jul 14 13:49:16 10.0.252.10 []
pppoe_release+0x56/0x142 [pppoe]
Jul 14 13:49:16 10.0.252.10 [76078.874317] RSP: 0018:8800b09f7e28
EFLAGS: 00010202
Jul 14 13:49:16 10.0.252.10 [76078.874512] RAX:  RBX:
88032a214400 RCX: 
Jul 14 13:49:16 10.0.252.10 [76078.874709] RDX: 000d RSI:
fe01 RDI: 8180d6da
Jul 14 13:49:16 10.0.252.10 [76078.874906] RBP: 8800b09f7e68 R08:
 R09: 
Jul 14 13:49:16 10.0.252.10 [76078.875102] R10: 88031ef6a110 R11:
0293 R12: 88030f8d8fc0
Jul 14 13:49:16 10.0.252.10 [76078.875299] R13: 88030f8d8ff0 R14:
88033115ee40 R15: 8803394e4920
Jul 14 13:49:16 10.0.252.10 [76078.875499] FS:  7f79b602c700()
GS:88034746() knlGS:
Jul 14 13:49:16 10.0.252.10 [76078.875837] CS:  0010 DS:  ES: 
CR0: 80050033
Jul 14 13:49:16 10.0.252.10 [76078.876036] CR2: 03f0 CR3:
000335425000 CR4: 001407e0
Jul 14 13:49:16 10.0.252.10 [76078.876239] Stack:
Jul 14 13:49:16 10.0.252.10 [76078.876434]  88033ac45c80
Jul 14 13:49:16 10.0.252.10 
Jul 14 13:49:16 10.0.252.10 0001
Jul 14 13:49:16 10.0.252.10 88030f8d8fc0
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.877001]  a0120260
Jul 14 13:49:16 10.0.252.10 88030f8d8ff0
Jul 14 13:49:16 10.0.252.10 88033115ee40
Jul 14 13:49:16 10.0.252.10 8803394e4920
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.877564]  8800b0

4.1.0, kernel panic, pppoe_release

2015-07-14 Thread Denys Fedoryshchenko
Here is the panic message from netconsole. Please let me know if any
additional information is required.


Jul 14 13:49:16 10.0.252.10 [76078.867822] BUG: unable to handle kernel
Jul 14 13:49:16 10.0.252.10 NULL pointer dereference
Jul 14 13:49:16 10.0.252.10 at 03f0
Jul 14 13:49:16 10.0.252.10 [76078.868280] IP:
Jul 14 13:49:16 10.0.252.10 [] 
pppoe_release+0x56/0x142 [pppoe]

Jul 14 13:49:16 10.0.252.10 [76078.868541] PGD 336e4a067
Jul 14 13:49:16 10.0.252.10 PUD 333f17067
Jul 14 13:49:16 10.0.252.10 PMD 0
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.868918] Oops:  [#1]
Jul 14 13:49:16 10.0.252.10 SMP
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.869226] Modules linked in:
Jul 14 13:49:16 10.0.252.10 netconsole
Jul 14 13:49:16 10.0.252.10 configfs
Jul 14 13:49:16 10.0.252.10 coretemp
Jul 14 13:49:16 10.0.252.10 sch_fq
Jul 14 13:49:16 10.0.252.10 cls_fw
Jul 14 13:49:16 10.0.252.10 act_police
Jul 14 13:49:16 10.0.252.10 cls_u32
Jul 14 13:49:16 10.0.252.10 sch_ingress
Jul 14 13:49:16 10.0.252.10 sch_sfq
Jul 14 13:49:16 10.0.252.10 sch_htb
Jul 14 13:49:16 10.0.252.10 pppoe
Jul 14 13:49:16 10.0.252.10 pppox
Jul 14 13:49:16 10.0.252.10 ppp_generic
Jul 14 13:49:16 10.0.252.10 slhc
Jul 14 13:49:16 10.0.252.10 nf_nat_pptp
Jul 14 13:49:16 10.0.252.10 nf_nat_proto_gre
Jul 14 13:49:16 10.0.252.10 nf_conntrack_pptp
Jul 14 13:49:16 10.0.252.10 nf_conntrack_proto_gre
Jul 14 13:49:16 10.0.252.10 tun
Jul 14 13:49:16 10.0.252.10 xt_REDIRECT
Jul 14 13:49:16 10.0.252.10 nf_nat_redirect
Jul 14 13:49:16 10.0.252.10 xt_set
Jul 14 13:49:16 10.0.252.10 xt_TCPMSS
Jul 14 13:49:16 10.0.252.10 ipt_REJECT
Jul 14 13:49:16 10.0.252.10 nf_reject_ipv4
Jul 14 13:49:16 10.0.252.10 ts_bm
Jul 14 13:49:16 10.0.252.10 xt_string
Jul 14 13:49:16 10.0.252.10 xt_connmark
Jul 14 13:49:16 10.0.252.10 xt_DSCP
Jul 14 13:49:16 10.0.252.10 xt_mark
Jul 14 13:49:16 10.0.252.10 xt_tcpudp
Jul 14 13:49:16 10.0.252.10 iptable_mangle
Jul 14 13:49:16 10.0.252.10 iptable_filter
Jul 14 13:49:16 10.0.252.10 iptable_nat
Jul 14 13:49:16 10.0.252.10 nf_conntrack_ipv4
Jul 14 13:49:16 10.0.252.10 nf_defrag_ipv4
Jul 14 13:49:16 10.0.252.10 nf_nat_ipv4
Jul 14 13:49:16 10.0.252.10 nf_nat
Jul 14 13:49:16 10.0.252.10 nf_conntrack
Jul 14 13:49:16 10.0.252.10 ip_tables
Jul 14 13:49:16 10.0.252.10 x_tables
Jul 14 13:49:16 10.0.252.10 ip_set_hash_ip
Jul 14 13:49:16 10.0.252.10 ip_set
Jul 14 13:49:16 10.0.252.10 nfnetlink
Jul 14 13:49:16 10.0.252.10 8021q
Jul 14 13:49:16 10.0.252.10 garp
Jul 14 13:49:16 10.0.252.10 mrp
Jul 14 13:49:16 10.0.252.10 stp
Jul 14 13:49:16 10.0.252.10 llc
Jul 14 13:49:16 10.0.252.10 [last unloaded: netconsole]
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.873195] CPU: 3 PID: 2940 Comm: 
accel-pppd Not tainted 4.1.0-build-0074 #7
Jul 14 13:49:16 10.0.252.10 [76078.873396] Hardware name: HP ProLiant 
DL320e Gen8 v2, BIOS P80 04/02/2015
Jul 14 13:49:16 10.0.252.10 [76078.873598] task: 8800b1886ba0 ti: 
8800b09f4000 task.ti: 8800b09f4000
Jul 14 13:49:16 10.0.252.10 [76078.873929] RIP: 
0010:[]
Jul 14 13:49:16 10.0.252.10 [] 
pppoe_release+0x56/0x142 [pppoe]
Jul 14 13:49:16 10.0.252.10 [76078.874317] RSP: 0018:8800b09f7e28  
EFLAGS: 00010202
Jul 14 13:49:16 10.0.252.10 [76078.874512] RAX:  RBX: 
88032a214400 RCX: 
Jul 14 13:49:16 10.0.252.10 [76078.874709] RDX: 000d RSI: 
fe01 RDI: 8180d6da
Jul 14 13:49:16 10.0.252.10 [76078.874906] RBP: 8800b09f7e68 R08: 
 R09: 
Jul 14 13:49:16 10.0.252.10 [76078.875102] R10: 88031ef6a110 R11: 
0293 R12: 88030f8d8fc0
Jul 14 13:49:16 10.0.252.10 [76078.875299] R13: 88030f8d8ff0 R14: 
88033115ee40 R15: 8803394e4920
Jul 14 13:49:16 10.0.252.10 [76078.875499] FS:  7f79b602c700() 
GS:88034746() knlGS:
Jul 14 13:49:16 10.0.252.10 [76078.875837] CS:  0010 DS:  ES:  
CR0: 80050033
Jul 14 13:49:16 10.0.252.10 [76078.876036] CR2: 03f0 CR3: 
000335425000 CR4: 001407e0

Jul 14 13:49:16 10.0.252.10 [76078.876239] Stack:
Jul 14 13:49:16 10.0.252.10 [76078.876434]  88033ac45c80
Jul 14 13:49:16 10.0.252.10 
Jul 14 13:49:16 10.0.252.10 0001
Jul 14 13:49:16 10.0.252.10 88030f8d8fc0
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.877001]  a0120260
Jul 14 13:49:16 10.0.252.10 88030f8d8ff0
Jul 14 13:49:16 10.0.252.10 88033115ee40
Jul 14 13:49:16 10.0.252.10 8803394e4920
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.877564]  8800b09f7e88
Jul 14 13:49:16 10.0.252.10 81809e2e
Jul 14 13:49:16 10.0.252.10 88031ef6a100
Jul 14 13:49:16 10.0.252.10 0008
Jul 14 13:49:16 10.0.252.10
Jul 14 13:49:16 10.0.252.10 [76078.878128] Call Trace:
Jul 14 13:49:16 10.0.252.10 [76078.878327]  [] 
sock_release+0x1a/0x78
Jul 14 13:49:16 10.0.252.10 [76078.878528]  [] 
s

Re: circular locking, mirred, 2.6.24.2

2008-02-25 Thread Denys Fedoryshchenko
What does "early" mean here?
I have custom boot scripts, and it is also a custom system based on
busybox. There is a chance that I forgot to bring ifb0 up, but that's it.
I think no userspace action should be able to trigger such a warning.

On Mon, 25 Feb 2008 09:56:46 +, Jarek Poplawski wrote
> On 24-02-2008 23:20, Denys Fedoryshchenko wrote:
> > 2.6.24.2 with patches applied for printk, softlockup, and htb (as I
> > understand it, these are fixes that are already in 2.6.25 git).
> > 
> > I will also send the QoS rules I am using by private mail.
> > 
> > [  118.840072] ===
> > [  118.840158] [ INFO: possible circular locking dependency detected ]
> > [  118.840203] 2.6.24.2-build-0022 #7
> > [  118.840243] ---
> > [  118.840288] swapper/0 is trying to acquire lock:
> > [  118.840329]  (&dev->queue_lock){-+..}, at: [] dev_queue_xmit+0x177/0x302
> > [  118.840490]
> > [  118.840490] but task is already holding lock:
> > [  118.840567]  (&p->tcfc_lock){-+..}, at: [] tcf_mirred+0x20/0x180 [act_mirred]
> > [  118.840727]
> > [  118.840727] which lock already depends on the new lock.
> > [  118.840728]
> > [  118.840842]
> > [  118.840842] the existing dependency chain (in reverse order) is:
> > [  118.840921]
> > [  118.840921] -> #2 (&p->tcfc_lock){-+..}:
> > [  118.841075][] __lock_acquire+0xa30/0xc19
> > [  118.841324][] lock_acquire+0x7a/0x94
> > [  118.841572][] _spin_lock+0x2e/0x58
> > [  118.841820][] tcf_mirred+0x20/0x180 [act_mirred]
> > [  118.842068][] tcf_action_exec+0x44/0x77
> > [  118.842344][] u32_classify+0x119/0x24a [cls_u32]
> > [  118.842595][] tc_classify_compat+0x2f/0x5e
> > [  118.842845][] tc_classify+0x1a/0x80
> > [  118.843092][] ingress_enqueue+0x1a/0x53 [sch_ingress]
> > [  118.843343][] netif_receive_skb+0x296/0x44c
> > [  118.843592][] e100_poll+0x14b/0x26a [e100]
> > [  118.843843][] net_rx_action+0xbf/0x201
> > [  118.844091][] __do_softirq+0x6f/0xe9
> > [  118.844343][] do_softirq+0x61/0xc8
> > [  118.844591][] 0x
> > [  118.844840]
> > [  118.844840] -> #1 (&dev->ingress_lock){-+..}:
> > [  118.844993][] __lock_acquire+0xa30/0xc19
> > [  118.845242][] lock_acquire+0x7a/0x94
> > [  118.845489][] _spin_lock+0x2e/0x58
> > [  118.845737][] qdisc_lock_tree+0x1e/0x21
> > [  118.845984][] dev_init_scheduler+0xb/0x53
> > [  118.846235][] register_netdevice+0x2a3/0x2fd
> > [  118.846483][] register_netdev+0x32/0x3f
> > [  118.846730][] loopback_net_init+0x39/0x6c
> > [  118.846980][] register_pernet_operations+0x13/0x15
> > [  118.847230][] register_pernet_device+0x1f/0x4c
> > [  118.847478][] loopback_init+0xd/0xf
> > [  118.847725][] kernel_init+0x155/0x2c6
> 
> This looks strange: are you sure your tc scripts aren't started too
> early? (Or maybe there are some problems during booting?)
> 
> Regards,
> Jarek P.
> 
> > [  118.847973][] kernel_thread_helper+0x7/0x10
> > [  118.848225][] 0x
> > [  118.848472]
> > [  118.848472] -> #0 (&dev->queue_lock){-+..}:
> > [  118.848626][] __lock_acquire+0x920/0xc19
> > [  118.848874][] lock_acquire+0x7a/0x94
> > [  118.849122][] _spin_lock+0x2e/0x58
> > [  118.849370][] dev_queue_xmit+0x177/0x302
> > [  118.849617][] tcf_mirred+0x15f/0x180 [act_mirred]
> > [  118.849866][] tcf_action_exec+0x44/0x77
> > [  118.850114][] u32_classify+0x119/0x24a [cls_u32]
> > [  118.850366][] tc_classify_compat+0x2f/0x5e
> > [  118.850614][] tc_classify+0x1a/0x80
> > [  118.850861][] ingress_enqueue+0x1a/0x53 [sch_ingress]
> > [  118.85][] netif_receive_skb+0x296/0x44c
> > [  118.851360][] e100_poll+0x14b/0x26a [e100]
> > [  118.851612][] net_rx_action+0xbf/0x201
> > [  118.851859][] __do_softirq+0x6f/0xe9
> > [  118.852106][] do_softirq+0x61/0xc8
> > [  118.852355][] 0x
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



circular locking, mirred, 2.6.24.2

2008-02-24 Thread Denys Fedoryshchenko
2.6.24.2 with patches applied for printk and softlockup, plus a patch for htb
(as I understand it, these are fixes already in 2.6.25 git).

I will also send the QoS rules I am using by private mail.

[  118.840072] ===
[  118.840158] [ INFO: possible circular locking dependency detected ]
[  118.840203] 2.6.24.2-build-0022 #7
[  118.840243] ---
[  118.840288] swapper/0 is trying to acquire lock:
[  118.840329]  (&dev->queue_lock){-+..}, at: [] dev_queue_xmit+0x177/0x302
[  118.840490]
[  118.840490] but task is already holding lock:
[  118.840567]  (&p->tcfc_lock){-+..}, at: [] tcf_mirred+0x20/0x180 [act_mirred]
[  118.840727]
[  118.840727] which lock already depends on the new lock.
[  118.840728]
[  118.840842]
[  118.840842] the existing dependency chain (in reverse order) is:
[  118.840921]
[  118.840921] -> #2 (&p->tcfc_lock){-+..}:
[  118.841075][] __lock_acquire+0xa30/0xc19
[  118.841324][] lock_acquire+0x7a/0x94
[  118.841572][] _spin_lock+0x2e/0x58
[  118.841820][] tcf_mirred+0x20/0x180 [act_mirred]
[  118.842068][] tcf_action_exec+0x44/0x77
[  118.842344][] u32_classify+0x119/0x24a [cls_u32]
[  118.842595][] tc_classify_compat+0x2f/0x5e
[  118.842845][] tc_classify+0x1a/0x80
[  118.843092][] ingress_enqueue+0x1a/0x53 [sch_ingress]
[  118.843343][] netif_receive_skb+0x296/0x44c
[  118.843592][] e100_poll+0x14b/0x26a [e100]
[  118.843843][] net_rx_action+0xbf/0x201
[  118.844091][] __do_softirq+0x6f/0xe9
[  118.844343][] do_softirq+0x61/0xc8
[  118.844591][] 0x
[  118.844840]
[  118.844840] -> #1 (&dev->ingress_lock){-+..}:
[  118.844993][] __lock_acquire+0xa30/0xc19
[  118.845242][] lock_acquire+0x7a/0x94
[  118.845489][] _spin_lock+0x2e/0x58
[  118.845737][] qdisc_lock_tree+0x1e/0x21
[  118.845984][] dev_init_scheduler+0xb/0x53
[  118.846235][] register_netdevice+0x2a3/0x2fd
[  118.846483][] register_netdev+0x32/0x3f
[  118.846730][] loopback_net_init+0x39/0x6c
[  118.846980][] register_pernet_operations+0x13/0x15
[  118.847230][] register_pernet_device+0x1f/0x4c
[  118.847478][] loopback_init+0xd/0xf
[  118.847725][] kernel_init+0x155/0x2c6
[  118.847973][] kernel_thread_helper+0x7/0x10
[  118.848225][] 0x
[  118.848472]
[  118.848472] -> #0 (&dev->queue_lock){-+..}:
[  118.848626][] __lock_acquire+0x920/0xc19
[  118.848874][] lock_acquire+0x7a/0x94
[  118.849122][] _spin_lock+0x2e/0x58
[  118.849370][] dev_queue_xmit+0x177/0x302
[  118.849617][] tcf_mirred+0x15f/0x180 [act_mirred]
[  118.849866][] tcf_action_exec+0x44/0x77
[  118.850114][] u32_classify+0x119/0x24a [cls_u32]
[  118.850366][] tc_classify_compat+0x2f/0x5e
[  118.850614][] tc_classify+0x1a/0x80
[  118.850861][] ingress_enqueue+0x1a/0x53 [sch_ingress]
[  118.85][] netif_receive_skb+0x296/0x44c
[  118.851360][] e100_poll+0x14b/0x26a [e100]
[  118.851612][] net_rx_action+0xbf/0x201
[  118.851859][] __do_softirq+0x6f/0xe9
[  118.852106][] do_softirq+0x61/0xc8
[  118.852355][] 0x
[  118.852602]
[  118.852602] other info that might help us debug this:
[  118.852603]
[  118.852716] 5 locks held by swapper/0:
[  118.852756]  #0:  (rcu_read_lock){..--}, at: [] net_rx_action+0x50/0x201
[  118.852940]  #1:  (rcu_read_lock){..--}, at: [] netif_receive_skb+0xf6/0x44c
[  118.853123]  #2:  (&dev->ingress_lock){-+..}, at: [] netif_receive_skb+0x282/0x44c
[  118.853309]  #3:  (&p->tcfc_lock){-+..}, at: [] tcf_mirred+0x20/0x180 [act_mirred]
[  118.853493]  #4:  (rcu_read_lock){..--}, at: [] dev_queue_xmit+0x11d/0x302
[  118.853677]
[  118.853677] stack backtrace:
[  118.853753] Pid: 0, comm: swapper Not tainted 2.6.24.2-build-0022 #7
[  118.853796]  [] show_trace_log_lvl+0x1a/0x2f
[  118.853865]  [] show_trace+0x12/0x14
[  118.853932]  [] dump_stack+0x6c/0x72
[  118.853999]  [] print_circular_bug_tail+0x5f/0x68
[  118.854068]  [] __lock_acquire+0x920/0xc19
[  118.854135]  [] lock_acquire+0x7a/0x94
[  118.854205]  [] _spin_lock+0x2e/0x58
[  118.854272]  [] dev_queue_xmit+0x177/0x302
[  118.854340]  [] tcf_mirred+0x15f/0x180 [act_mirred]
[  118.854409]  [] tcf_action_exec+0x44/0x77
[  118.854477]  [] u32_classify+0x119/0x24a [cls_u32]
[  118.854547]  [] tc_classify_compat+0x2f/0x5e
[  118.854615]  [] tc_classify+0x1a/0x80
[  118.854682]  [] ingress_enqueue+0x1a/0x53 [sch_ingress]
[  118.854752]  [] netif_receive_skb+0x296/0x44c
[  118.854820]  [] e100_poll+0x14b/0x26a [e100]
[  118.854890]  [] net_rx_action+0xbf/0x201
[  118.854958]  [] __do_softirq+0x6f/0xe9
[  118.855025]  [] do_softirq+0x61/0xc8
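
The dependency chain in the report above can be read as a small directed graph
of lock orderings. A minimal Python sketch of how a cycle shows up in such a
graph, assuming my reading of the three chains is right; the function name
find_cycle and the short lock names are hypothetical, and this is not how
lockdep itself is implemented:

```python
def find_cycle(edges):
    """Return one cycle in a directed lock-order graph, or None.

    edges maps a held lock to the locks acquired while holding it.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in edges.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: that closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(edges):
        if colour.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Orderings as read from the report (an interpretation, not lockdep output):
#   #1: ingress_lock was taken after queue_lock (qdisc_lock_tree)
#   #2: tcfc_lock was taken while ingress_lock was held (tcf_mirred)
#   #0: queue_lock is now wanted while tcfc_lock is held (dev_queue_xmit)
observed = {
    "queue_lock": ["ingress_lock"],
    "ingress_lock": ["tcfc_lock"],
    "tcfc_lock": ["queue_lock"],
}
```

Calling find_cycle(observed) returns the path queue_lock, ingress_lock,
tcfc_lock, queue_lock. Lockdep tracks lock classes rather than individual
lock instances, so a cycle like this does not by itself say whether the two
queue_lock entries belong to the same device.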


--
Denys 

Re: RESEND, HTB(?) softlockup, vanilla 2.6.24

2008-02-16 Thread Denys Fedoryshchenko
The server is fully redundant now, so I have applied the patches (both of
them; that will probably make the system somewhat more reliable) and enabled
the required debug options in the kernel. I will try to catch this bug a few
more times; if it produces more detailed output over netconsole, that should
be useful.

Is there any project for dumping console messages or a kernel dump to disk?
Issues like this one are network-related, and I guess netconsole doesn't
always work, especially when the network driver has crashed; the technicians
on site told me that messages were scrolling non-stop on the screen. Some
generic code that writes this data to a separate partition via x86 INT 13 (or
even a full kernel dump?) would probably help debug this problem. I know there
are some third-party patches (LKCD, for example), but I prefer not to apply
them, to avoid adding more bugs.

I noticed some code in MTD (CONFIG_MTD_OOPS), but I am not sure it is the
right tool, or that it will work if I set up MTD emulation on a block device.
That is just an idea.

On Sat, 16 Feb 2008 21:45:19 +0100, Jarek Poplawski wrote
> On Sat, Feb 16, 2008 at 12:25:31PM +0200, Denys Fedoryshchenko wrote:
> > Thanks, i will try it.
> > You think lockdep can be buggy?
> 
> Just like every code... But the main reason is it has quite 
> meaningful overhead, so could be right "in production" only after 
> lockups happen. But if it doesn't report anything anyway...
> 
> Your report shows there are quite long paths of calls during softirqs
> with some actions (ipt + mirred here?) and qdiscs, so if I'm not 
> wrong with this stack problem, this would need some optimization. 
> And, of course, there could be some additional bugs involved around too:
> otherwise it seems this should happen more often. But I don't expect
> you would try to debug this on your servers, so I hope, it simply 
> will be found BTW some day...
> 
> Regards,
> Jarek P.


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



Re: RESEND, HTB(?) softlockup, vanilla 2.6.24

2008-02-16 Thread Denys Fedoryshchenko
Thanks, I will try it.
Do you think lockdep can be buggy?

On Sat, 16 Feb 2008 09:00:36 +0100, Jarek Poplawski wrote
> Denys Fedoryshchenko wrote, On 02/13/2008 09:13 AM:
> 
> > It is very difficult to reproduce; it happened after running for about
> > 1 month. No changes were made to the classes at the time of the crash.
> > 
> > Kernel 2.6.24 vanilla
> 
> Hi,
> 
> I could be wrong, but IMHO this looks like stack was overridden here,
> so my proposal is to try this:
> 
> CONFIG_DEBUG_STACKOVERFLOW=y
> 
> But, if you're not very interested in reproducing this, you could 
> also try to turn off some other debugging, especially lockdep.
> 
> Regards,
> Jarek P.
> 
> 
> 
> > Feb 10 15:53:22 SHAPER [ 8271.778915] BUG: NMI Watchdog detected LOCKUP
> > Feb 10 15:53:22 SHAPER on CPU1, eip c01f0e5d, registers:
> 
> 
> 
> > Feb 10 15:53:22 SHAPER [ 8271.779307] Pid: 0, comm: swapper Not tainted 
> > (2.6.24-build-0021 #26)
> > Feb 10 15:53:22 SHAPER [ 8271.779327] EIP: 0060:[] EFLAGS: 
0082 
> > CPU: 1
> > Feb 10 15:53:22 SHAPER [ 8271.779349] EIP is at __rb_rotate_right+0x5/0x50
> > Feb 10 15:53:22 SHAPER [ 8271.779366] EAX: f76494a4 EBX: f76494a4 ECX: 
> > f76494a4 EDX: c1ff5f80
> > Feb 10 15:53:22 SHAPER [ 8271.779386] ESI: f76494a4 EDI: c1ff5f80 EBP: 
> >  ESP: f7c29c70
> > Feb 10 15:53:22 SHAPER [ 8271.779406]  DS: 007b ES: 007b FS: 00d8 GS: 
 
> > SS: 0068
> > Feb 10 15:53:22 SHAPER [ 8271.779425] Process swapper (pid: 0, 
ti=f7c28000 
> > task=f7c20a60 task.ti=f7c28000)
> > Feb 10 15:53:22 SHAPER
> > Feb 10 15:53:22 SHAPER [ 8271.779446] Stack:
> > Feb 10 15:53:22 SHAPER f76494a4
> > Feb 10 15:53:22 SHAPER f76494a4
> > Feb 10 15:53:22 SHAPER f76494a4
> > Feb 10 15:53:22 SHAPER c01f0ef4
> > Feb 10 15:53:22 SHAPER c1ff5f80
> > Feb 10 15:53:22 SHAPER f76494a4
> > Feb 10 15:53:22 SHAPER f76494a8
> > Feb 10 15:53:22 SHAPER c1ff5f78
> > Feb 10 15:53:22 SHAPER
> > Feb 10 15:53:22 SHAPER [ 8271.779493]
> > Feb 10 15:53:22 SHAPER [ 8271.779307] Pid: 0, comm: swapper Not tainted 
> > (2.6.24-build-0021 #26)
> > Feb 10 15:53:22 SHAPER [ 8271.779327] EIP: 0060:[] EFLAGS: 
0082 
> > CPU: 1
> > Feb 10 15:53:22 SHAPER [ 8271.779349] EIP is at __rb_rotate_right+0x5/0x50
> > Feb 10 15:53:22 SHAPER [ 8271.779366] EAX: f76494a4 EBX: f76494a4 ECX: 
> > f76494a4 EDX: c1ff5f80
> > Feb 10 15:53:22 SHAPER [ 8271.779386] ESI: f76494a4 EDI: c1ff5f80 EBP: 
> >  ESP: f7c29c70
> > Feb 10 15:53:22 SHAPER [ 8271.779406]  DS: 007b ES: 007b FS: 00d8 GS: 
 
> > SS: 0068
> > Feb 10 15:53:22 SHAPER [ 8271.779425] Process swapper (pid: 0, 
ti=f7c28000 
> > task=f7c20a60 task.ti=f7c28000)
> > Feb 10 15:53:22 SHAPER
> > Feb 10 15:53:22 SHAPER [ 8271.779446] Stack:
> > Feb 10 15:53:22 SHAPER f76494a4
> > Feb 10 15:53:22 SHAPER f76494a4
> > Feb 10 15:53:22 SHAPER f76494a4
> 
> 


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



Re: BUG/ spinlock lockup, 2.6.24

2008-02-15 Thread Denys Fedoryshchenko
This server worked fine under load with FreeBSD, and it worked fine before
with other tasks under Linux. I don't think it is the RAM.
It is also server hardware (Dell PowerEdge) with ECC, MCE and other layers
that would most probably report any hardware issue, and I think they do that
even better than memtest.
In addition, it is very difficult to run a test on it, because it is in
another country and I have limited access to it (I don't have a network KVM).

I have similar crashes on completely different hardware doing the same job
(QoS), so I think it is actually some nasty bug in networking.


On Fri, 15 Feb 2008 16:24:56 +0100, Bart Van Assche wrote
> 2008/2/15 Denys Fedoryshchenko <[EMAIL PROTECTED]>:
> >  I have random crashes, at least once per week. It is very difficult to
> >  catch the error message, and only recently I set up netconsole. Now I got
> >  a crash, but there is no traceback; only a single line came over
> >  netconsole, as mentioned before.
> 
> Did you already run memtest ? You can run memtest by booting from the
> Knoppix CD-ROM or DVD. Most Linux distributions also have included
> memtest on their bootable distribution CD's/DVD's.
> 
> Bart Van Assche.


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



BUG/ spinlock lockup, 2.6.24

2008-02-15 Thread Denys Fedoryshchenko
 : yes
fpu_exception   : yes
cpuid level : 6
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips: 6383.76
clflush size: 64


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



HTB(?) softlockup, vanilla 2.6.24

2008-02-10 Thread Denys Fedoryshchenko
 03
Feb 10 15:53:22 SHAPER 5b
Feb 10 15:53:22 SHAPER 5e
Feb 10 15:53:22 SHAPER 5f
Feb 10 15:53:22 SHAPER c3
Feb 10 15:53:22 SHAPER 57
Feb 10 15:53:22 SHAPER 89
Feb 10 15:53:22 SHAPER d7
Feb 10 15:53:22 SHAPER 56
Feb 10 15:53:22 SHAPER 53
Feb 10 15:53:22 SHAPER
Feb 10 15:53:22 SHAPER c3
Feb 10 15:53:22 SHAPER 8b
Feb 10 15:53:22 SHAPER 50
Feb 10 15:53:22 SHAPER 08
Feb 10 15:53:22 SHAPER 8b
Feb 10 15:53:22 SHAPER 30
Feb 10 15:53:22 SHAPER 8b
Feb 10 15:53:22 SHAPER 4a
Feb 10 15:53:22 SHAPER 04
Feb 10 15:53:22 SHAPER 83
Feb 10 15:53:22 SHAPER e6
Feb 10 15:53:22 SHAPER fc
Feb 10 15:53:22 SHAPER 85
Feb 10 15:53:22 SHAPER c9
Feb 10 15:53:22 SHAPER 89
Feb 10 15:53:22 SHAPER 48
Feb 10 15:53:22 SHAPER 08
Feb 10 15:53:22 SHAPER 74
Feb 10 15:53:22 SHAPER 09
Feb 10 15:53:22 SHAPER 8b
Feb 10 15:53:22 SHAPER


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



kernel panic on 2.6.24 with esfq patch applied

2008-02-01 Thread Denys Fedoryshchenko
 __remove_hrtimer+0x5d/0x64
Feb  1 09:08:50 SERVER [12380.067861]  []
Feb  1 09:08:50 SERVER hrtimer_interrupt+0x10c/0x19a
Feb  1 09:08:50 SERVER [12380.067883]  []
Feb  1 09:08:50 SERVER smp_apic_timer_interrupt+0x6f/0x80
Feb  1 09:08:50 SERVER [12380.067905]  []
Feb  1 09:08:50 SERVER apic_timer_interrupt+0x28/0x30
Feb  1 09:08:50 SERVER [12380.067928]  []
Feb  1 09:08:50 SERVER _spin_lock_irqsave+0x13/0x27
Feb  1 09:08:50 SERVER [12380.067949]  []
Feb  1 09:08:50 SERVER lock_hrtimer_base+0x15/0x2f
Feb  1 09:08:50 SERVER [12380.067970]  []
Feb  1 09:08:50 SERVER hrtimer_start+0x16/0xf4
Feb  1 09:08:50 SERVER [12380.067991]  []
Feb  1 09:08:50 SERVER qdisc_watchdog_schedule+0x1e/0x21
Feb  1 09:08:50 SERVER [12380.068013]  []
Feb  1 09:08:50 SERVER htb_dequeue+0x6ef/0x6fb [sch_htb]
Feb  1 09:08:50 SERVER [12380.068036]  []
Feb  1 09:08:50 SERVER ip_rcv+0x1fc/0x237
Feb  1 09:08:50 SERVER [12380.068057]  []
Feb  1 09:08:50 SERVER hrtimer_get_next_event+0xae/0xbb
Feb  1 09:08:50 SERVER [12380.068078]  []
Feb  1 09:08:50 SERVER hrtimer_get_next_event+0xae/0xbb
Feb  1 09:08:50 SERVER [12380.068099]  []
Feb  1 09:08:50 SERVER getnstimeofday+0x2b/0xb5
Feb  1 09:08:50 SERVER [12380.068118]  []
Feb  1 09:08:50 SERVER clockevents_program_event+0xe0/0xee
Feb  1 09:08:50 SERVER [12380.068140]  []
Feb  1 09:08:50 SERVER __qdisc_run+0x2a/0x163
Feb  1 09:08:50 SERVER [12380.068161]  []
Feb  1 09:08:50 SERVER net_tx_action+0xa8/0xcc
Feb  1 09:08:50 SERVER [12380.068180]  []
Feb  1 09:08:50 SERVER qdisc_watchdog+0x0/0x1b
Feb  1 09:08:50 SERVER [12380.068199]  []
Feb  1 09:08:50 SERVER qdisc_watchdog+0x18/0x1b
Feb  1 09:08:50 SERVER [12380.068218]  []
Feb  1 09:08:50 SERVER run_hrtimer_softirq+0x4e/0x96
Feb  1 09:08:50 SERVER [12380.068241]  []
Feb  1 09:08:50 SERVER __do_softirq+0x5d/0xc1
Feb  1 09:08:50 SERVER [12380.068260]  []
Feb  1 09:08:50 SERVER do_softirq+0x32/0x36
Feb  1 09:08:50 SERVER [12380.068279]  []
Feb  1 09:08:50 SERVER irq_exit+0x38/0x6b
Feb  1 09:08:50 SERVER [12380.068298]  []
Feb  1 09:08:50 SERVER smp_apic_timer_interrupt+0x74/0x80
Feb  1 09:08:50 SERVER [12380.068319]  []
Feb  1 09:08:50 SERVER apic_timer_interrupt+0x28/0x30
Feb  1 09:08:50 SERVER [12380.068343]  []
Feb  1 09:08:50 SERVER mwait_idle_with_hints+0x3c/0x40
Feb  1 09:08:50 SERVER [12380.068365]  []
Feb  1 09:08:50 SERVER mwait_idle+0x0/0xa
Feb  1 09:08:50 SERVER [12380.068384]  []
Feb  1 09:08:50 SERVER cpu_idle+0x98/0xb9
Feb  1 09:08:50 SERVER [12380.068403]  []
Feb  1 09:08:50 SERVER start_kernel+0x2d7/0x2df
Feb  1 09:08:50 SERVER [12380.068422]  []
Feb  1 09:08:50 SERVER unknown_bootoption+0x0/0x195
Feb  1 09:08:50 SERVER [12380.068444]  ===
Feb  1 09:08:50 SERVER [12380.068460] Code:
Feb  1 09:08:50 SERVER 01
Feb  1 09:08:50 SERVER 00
Feb  1 09:08:50 SERVER 00
Feb  1 09:08:50 SERVER 8b
Feb  1 09:08:50 SERVER 4e
Feb  1 09:08:50 SERVER 08
Feb  1 09:08:50 SERVER 39
Feb  1 09:08:50 SERVER d9
Feb  1 09:08:50 SERVER 0f
Feb  1 09:08:50 SERVER 85
Feb  1 09:08:50 SERVER 85
Feb  1 09:08:50 SERVER 00
Feb  1 09:08:50 SERVER 00
Feb  1 09:08:50 SERVER 00
Feb  1 09:08:50 SERVER 8b
Feb  1 09:08:50 SERVER 4e
Feb  1 09:08:50 SERVER 04
Feb  1 09:08:50 SERVER 8b
Feb  1 09:08:50 SERVER 01
Feb  1 09:08:50 SERVER a8
Feb  1 09:08:50 SERVER 01
Feb  1 09:08:50 SERVER 75
Feb  1 09:08:50 SERVER 14
Feb  1 09:08:50 SERVER 83
Feb  1 09:08:50 SERVER c8
Feb  1 09:08:50 SERVER 01
Feb  1 09:08:50 SERVER 89
Feb  1 09:08:50 SERVER ea
Feb  1 09:08:50 SERVER 89
Feb  1 09:08:50 SERVER 01
Feb  1 09:08:50 SERVER 89
Feb  1 09:08:50 SERVER f0
Feb  1 09:08:50 SERVER 83
Feb  1 09:08:50 SERVER 26
Feb  1 09:08:50 SERVER fe
Feb  1 09:08:50 SERVER e8
Feb  1 09:08:50 SERVER 1e
Feb  1 09:08:50 SERVER fd
Feb  1 09:08:50 SERVER ff
Feb  1 09:08:50 SERVER ff
Feb  1 09:08:50 SERVER 8b
Feb  1 09:08:50 SERVER 4e
Feb  1 09:08:50 SERVER 04
Feb  1 07:08:49 SERVER unparseable log message: "<8b> "
Feb  1 09:08:50 SERVER 59
Feb  1 09:08:50 SERVER 08
Feb  1 09:08:50 SERVER 85
Feb  1 09:08:50 SERVER db
Feb  1 09:08:50 SERVER 74
Feb  1 09:08:50 SERVER 06
Feb  1 09:08:50 SERVER 8b
Feb  1 09:08:50 SERVER 03
Feb  1 09:08:50 SERVER a8
Feb  1 09:08:50 SERVER 01
Feb  1 09:08:50 SERVER 74
Feb  1 09:08:50 SERVER 15
Feb  1 09:08:50 SERVER 8b
Feb  1 09:08:50 SERVER 41
Feb  1 09:08:50 SERVER 04
Feb  1 09:08:50 SERVER 85
Feb  1 09:08:50 SERVER c0
Feb  1 09:08:50 SERVER 0f
Feb  1 09:08:50 SERVER 84
Feb  1 09:08:50 SERVER c6
Feb  1 09:08:50 SERVER
Feb  1 09:08:50 SERVER [12380.068753] EIP: []
Feb  1 09:08:50 SERVER rb_erase+0x110/0x22f
Feb  1 09:08:50 SERVER SS:ESP 0068:c037fda8
Feb  1 09:08:50 SERVER [12380.068978] Kernel panic - not syncing: Fatal exception in interrupt
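
The oops above arrived through syslog with each byte of the Code: line as a
separate message, which makes it hard to read or to feed to a disassembler.
A minimal Python sketch for rejoining such fragments, assuming every line
starts with a "Mon DD HH:MM:SS HOST" prefix and that a bracketed kernel
timestamp marks the start of a new record; the function name rejoin is
hypothetical:

```python
import re

# Assumed syslog prefix: "Mon DD HH:MM:SS HOST " (as in the lines above).
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+ ?")
# A kernel printk timestamp such as "[12380.068460]" starts a new record.
STAMP = re.compile(r"^\[\s*\d+\.\d+\]")
# Wrapper the syslog daemon put around a fragment it could not parse.
UNPARSEABLE = re.compile(r'^unparseable log message: "(.*?)\s*"$')


def rejoin(lines):
    """Merge syslog fragments into one string per printk record."""
    records = []
    for raw in lines:
        body = PREFIX.sub("", raw.rstrip("\n"), count=1).strip()
        wrapped = UNPARSEABLE.match(body)
        if wrapped:
            # Keep only the quoted fragment, e.g. "<8b>".
            body = wrapped.group(1)
        if not body:
            continue  # prefix-only line, nothing to add
        if STAMP.match(body) or not records:
            records.append(body)
        else:
            records[-1] += " " + body
    return records


sample = [
    "Feb  1 09:08:50 SERVER [12380.068460] Code:",
    "Feb  1 09:08:50 SERVER 8b",
    "Feb  1 09:08:50 SERVER 4e",
    'Feb  1 07:08:49 SERVER unparseable log message: "<8b> "',
    "Feb  1 09:08:50 SERVER 59",
    "Feb  1 09:08:50 SERVER",
    "Feb  1 09:08:50 SERVER [12380.068753] EIP: []",
    "Feb  1 09:08:50 SERVER rb_erase+0x110/0x22f",
]
for record in rejoin(sample):
    print(record)
```

Any fragment without its own timestamp is attached to the previous record, so
the function name and the SS:ESP values end up back on the EIP line, and the
<8b> fragment that syslog flagged as unparseable stays in place within the
Code: bytes.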


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



pppoe, /proc/net/pppoe wrong (extra entries)

2008-01-29 Thread Denys Fedoryshchenko
Hi again

I noticed a strange thing with /proc/net/pppoe. I am not sure whether it is a
bug, but it looks wrong to me.

cat /proc/net/pppoe
shows the normal user entries, but at the end I have:

0D00 00:16:D3:0B:F9:34 eth1
4000 00:50:22:00:1C:FC eth1
7E00 00:03:47:BD:34:25 eth1
7E00 00:03:47:BD:34:25 eth1
7E00 00:03:47:BD:34:25 eth1
7E00 00:03:47:BD:34:25 eth1
7E00 00:03:47:BD:34:25 eth1

and the last entry repeats until the end of the file.

I have a script that lists customer interfaces, so I use it to count the
number of users logged in:

defaulthost ~ #cat /proc/net/pppoe |grep -i '00:03:47:BD:34:25'|wc -l
40
defaulthost ~ #cat /proc/net/pppoe |wc -l
113
defaulthost ~ #pppctrl |wc -l
73

That means there are 40 extra entries. The host 00:03:47:BD:34:25 has
established only one session. I see a similar issue on all our other pppoe
servers: extra entries with the same MAC at the end.
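
The same check can be done without grepping for one known MAC. A minimal
Python sketch, assuming each entry has the three-column "id MAC device" form
shown above; the function name duplicate_sessions is hypothetical:

```python
from collections import Counter


def duplicate_sessions(text):
    """Return {(sid, mac, dev): count} for entries listed more than once."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Keep only "<id> <aa:bb:cc:dd:ee:ff> <dev>" lines; this also skips
        # blank lines and a column header, if the file has one (assumption).
        if len(fields) != 3 or fields[1].count(":") != 5:
            continue
        counts[tuple(fields)] += 1
    return {key: n for key, n in counts.items() if n > 1}


sample = """\
0D00 00:16:D3:0B:F9:34 eth1
4000 00:50:22:00:1C:FC eth1
7E00 00:03:47:BD:34:25 eth1
7E00 00:03:47:BD:34:25 eth1
7E00 00:03:47:BD:34:25 eth1
"""
for entry, n in duplicate_sessions(sample).items():
    print(entry, n)
```

On a live server the text would come from reading /proc/net/pppoe. Counting
by the whole (id, MAC, device) tuple keeps two genuine sessions from the same
MAC apart from a single session that is listed repeatedly.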

If you need more info or access, please let me know.

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



WARNING, tcp_fastretrans_alert, rc6-git11

2008-01-22 Thread Denys Fedoryshchenko
I just got this on one of our proxies, under high load.
It is a somewhat old rc, so my report is probably not that interesting, but
since these are production machines, I cannot change kernels too often.
The kernel is 2.6.24-rc6-git11.
Some sysctl adjustments have been made. Please tell me if you need more
information.


These are the iptables rules (in case they are relevant):
Chain PREROUTING (policy ACCEPT 209M packets, 19G bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DROP       tcp  --  eth+   *       0.0.0.0/0            0.0.0.0/0           tcp dpt:1

Chain POSTROUTING (policy ACCEPT 120M packets, 7408M bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 18240 packets, 22M bytes)
 pkts bytes target     prot opt in     out     source               destination
<< some local networks skipped, not important, similar ACCEPT as next >>
 200K  245M ACCEPT     all  --  *      *       0.0.0.0/0            172.16.0.0/16
3930K  236M REDIRECT   tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           tcp flags:0x17/0x02 TOS match 0x04 redir ports 2
 112M 6720M REDIRECT   tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           tcp dpt:80 flags:0x17/0x02 redir ports 1
 116K 6953K REDIRECT   tcp  --  *      eth0    0.0.0.0/0            0.0.0.0/0           OWNER UID match 101 tcp flags:0x17/0x02 redir ports 1


[9561199.893090] WARNING: at net/ipv4/tcp_input.c:2391 tcp_fastretrans_alert()
[9561199.893161] Pid: 32283, comm: squid Not tainted 2.6.24-rc6-git11-build-0020 #9
[9561199.893277]  [] tcp_ack+0xd32/0x18cc
[9561199.893398]  [] ipt_do_table+0x416/0x474 [ip_tables]
[9561199.893479]  [] tcp_rcv_established+0xca/0x7ad
[9561199.893566]  [] tcp_v4_do_rcv+0x2b/0x330
[9561199.893636]  [] nf_ct_deliver_cached_events+0x3e/0x90 [nf_conntrack]
[9561199.893759]  [] tcp_v4_rcv+0x7c4/0x80f
[9561199.893862]  [] ip_local_deliver_finish+0xd9/0x148
[9561199.893932]  [] ip_rcv_finish+0x2bb/0x2da
[9561199.894004]  [] ip_rcv+0x1fc/0x237
[9561199.894063]  [] ip_rcv_finish+0x0/0x2da
[9561199.894122]  [] ip_rcv+0x0/0x237
[9561199.894183]  [] netif_receive_skb+0x376/0x3e2
[9561199.894273]  [] e1000_clean_rx_irq+0x379/0x445 [e1000]
[9561199.894388]  [] e1000_clean_rx_irq+0x0/0x445 [e1000]
[9561199.894462]  [] e1000_clean+0x67/0x1f8 [e1000]
[9561199.894547]  [] net_rx_action+0x8d/0x17c
[9561199.894632]  [] __do_softirq+0x5d/0xc1
[9561199.894698]  [] do_softirq+0x32/0x36
[9561199.894755]  [] irq_exit+0x38/0x6b
[9561199.894813]  [] do_IRQ+0x5c/0x73
[9561199.894867]  [] sys_read+0x5f/0x67
[9561199.894936]  [] common_interrupt+0x23/0x28
[9561199.895040]  ===

    

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


