https://launchpad.net/~cascardo/+archive/ubuntu/ppa/+sourcepub/11419106/+listing-archive-extra

So, this package in my PPA is built for bionic, but it should work on other
series too.

It has a service that calls a wrapper, which starts the reproducer and then
reboots. The reason for the reboot is that once we add a task to the net_prio
cgroup, cgroup bpf is disabled and we can't run the reproducer again. Also,
while the reproducer causes the refcount to go below 0 every time, it won't
always cause the exact crash from this bug.
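
To give an idea of the shape of the operation, here is a minimal sketch of the
sequence described above (an illustration only, not the reproducer shipped in
the package; the cgroup path and socket count are placeholders):

/*
 * Illustrative sketch only: churn some sockets, move this task into a
 * net_prio cgroup (which disables cgroup bpf socket accounting for the
 * whole system, hence the one-shot behaviour and the reboot), then
 * churn sockets again. The packaged reproducer is the reference.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/stat.h>

#define NSOCKS 256

static void churn_sockets(void)
{
    int fds[NSOCKS];

    /* Create and close a batch of TCP sockets. */
    for (int i = 0; i < NSOCKS; i++)
        fds[i] = socket(AF_INET, SOCK_STREAM, 0);
    for (int i = 0; i < NSOCKS; i++)
        if (fds[i] >= 0)
            close(fds[i]);
}

static int join_net_prio_cgroup(void)
{
    /* Path assumes the usual v1 net_prio hierarchy under /sys/fs/cgroup. */
    const char *tasks = "/sys/fs/cgroup/net_prio/repro/tasks";
    FILE *f;

    mkdir("/sys/fs/cgroup/net_prio/repro", 0755);
    f = fopen(tasks, "w");
    if (!f)
        return -1;
    /* Adding a task to a net_prio cgroup is what disables cgroup bpf. */
    fprintf(f, "%d\n", (int)getpid());
    return fclose(f);
}

int main(void)
{
    churn_sockets();                /* cgroup bpf accounting still active */
    if (join_net_prio_cgroup()) {   /* disables cgroup bpf accounting */
        perror("net_prio cgroup");
        return 1;
    }
    churn_sockets();                /* accounting now disabled */
    return 0;
}

(Run as root, otherwise the cgroup writes will fail. Again, this is just the
idea; the actual logic lives in the package.)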

When you want to disable the reproducer, add the parameter
"systemd.mask=cgroup-bpf-net-prio-crash.service" to the kernel cmdline.
Then remove the package to get your system back.
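
For example, on a stock GRUB setup you can make that persistent by appending
the parameter to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (keeping
whatever options are already there, typically "quiet splash") and running
update-grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash systemd.mask=cgroup-bpf-net-prio-crash.service"

Since the machine keeps rebooting while the reproducer is installed, it is
usually easier to just edit the boot entry once from the GRUB menu and append
the parameter there for a single boot.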

You may be running some service that adds a task to the net_prio or net_cls
cgroup, thus preventing the reproducer from running at all (though not
stopping it from rebooting your system over and over again). lxd comes to
mind here.

You can check whether that is the case (before installing the reproducer) by
looking at dmesg and searching for:
cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation

The following WARN demonstrates that the refcount underflow has happened
(though it is not the crash itself):
[   12.581125] ------------[ cut here ]------------
[   12.585021] percpu ref (cgroup_bpf_release_fn) <= 0 (-357) after switching to atomic
[   12.585092] WARNING: CPU: 2 PID: 665 at lib/percpu-refcount.c:160 percpu_ref_switch_to_atomic_rcu+0x12e/0x140

The crash will cause a panic and likely prevent the system from
rebooting, showing you have reproduced the issue.

If you never see the WARN, the bug has been mitigated, though it can
still happen if we modify the reproducer slightly to also change
net_cls.classid.
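
For reference, that net_cls variant would just mean the reproducer also
writing a classid, roughly like this (illustrative path and value, same
caveats as the sketch above):

#include <stdio.h>

int main(void)
{
    /* Changing net_cls.classid is the other trigger mentioned in the
     * text; the cgroup path and the value are only examples. */
    FILE *f = fopen("/sys/fs/cgroup/net_cls/repro/net_cls.classid", "w");

    if (!f) {
        perror("net_cls.classid");
        return 1;
    }
    fprintf(f, "0x100001\n");   /* arbitrary classid */
    return fclose(f);
}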

Cascardo.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1886668

Title:
  linux 4.15.0-109-generic network DoS regression vs -108

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Eoan:
  In Progress
Status in linux source package in Focal:
  In Progress
Status in linux source package in Groovy:
  In Progress

Bug description:
  [Impact]
  On systems that use cgroups and sockets extensively, such as docker, kubernetes, lxd, or libvirt, a crash might happen when running linux 4.15.0-109-generic.

  [Fix]
  Revert the patch that disables sk_alloc cgroup refcounting when tasks are 
added to net_prio cgroup.

  [Test case]
  Test that environments where the issue is reproduced survive some hours of uptime. A different bug was reproduced with work-in-progress code and was no longer reproduced with the culprit reverted.

  [Regression potential]
  The reverted commit fixes a memory leak in similar scenarios, but a leak is better than a crash. Two other bugs have been opened to track a proper fix for this issue and for the leak.

  ----------------------------------------------------------

  Reported from a user:

  Several of our infrastructure VMs recently started crashing (oops
  attached), after they upgraded to -109.  -108 appears to be stable.

  Analysing the crash, it appears to be a wild pointer access in a BPF
  filter, which makes this (probably) a network-traffic triggered crash.

  [  696.396831] general protection fault: 0000 [#1] SMP PTI
  [  696.396843] Modules linked in: iscsi_target_mod target_core_mod 
ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user 
xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge nfsv3 cmac 
arc4 md4 rpcsec_gss_krb5 nfsv4 nls_utf8 cifs nfs aufs ccm fscache binfmt_misc 
overlay xfs libcrc32c intel_rapl crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd glue_helper 
cryptd input_leds joydev intel_rapl_perf serio_raw parport_pc parport mac_hid 
sch_fq_codel nfsd 8021q auth_rpcgss garp nfs_acl mrp lockd stp llc grace xenfs 
sunrpc xen_privcmd ip_tables x_tables autofs4 hid_generic usbhid hid psmouse 
i2c_piix4 pata_acpi floppy
  [  696.396966] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.15.0-109-generic 
#110-Ubuntu
  [  696.396979] Hardware name: Xen HVM domU, BIOS 4.7.6-1.26 12/03/2018
  [  696.396993] RIP: 0010:__cgroup_bpf_run_filter_skb+0xbb/0x1e0
  [  696.397005] RSP: 0018:ffff893fdcb83a70 EFLAGS: 00010292
  [  696.397015] RAX: 6d69546e6f697469 RBX: 0000000000000000 RCX: 
0000000000000014
  [  696.397028] RDX: 0000000000000000 RSI: ffff893fd0360000 RDI: 
ffff893fb5154800
  [  696.397041] RBP: ffff893fdcb83ad0 R08: 0000000000000001 R09: 
0000000000000000
  [  696.397058] R10: 0000000000000000 R11: 0000000000000003 R12: 
0000000000000014
  [  696.397075] R13: ffff893fb5154800 R14: 0000000000000020 R15: 
ffff893fc6ba4d00
  [  696.397091] FS:  0000000000000000(0000) GS:ffff893fdcb80000(0000) 
knlGS:0000000000000000
  [  696.397107] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  696.397119] CR2: 000000c0001b4000 CR3: 00000006dce0a004 CR4: 
00000000003606e0
  [  696.397135] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
  [  696.397152] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
  [  696.397169] Call Trace:
  [  696.397175]  <IRQ>
  [  696.397183]  sk_filter_trim_cap+0xd0/0x1b0
  [  696.397191]  tcp_v4_rcv+0x8b7/0xa80
  [  696.397199]  ip_local_deliver_finish+0x66/0x210
  [  696.397208]  ip_local_deliver+0x7e/0xe0
  [  696.397215]  ? ip_rcv_finish+0x430/0x430
  [  696.397223]  ip_rcv_finish+0x129/0x430
  [  696.397230]  ip_rcv+0x296/0x360
  [  696.397238]  ? inet_del_offload+0x40/0x40
  [  696.397249]  __netif_receive_skb_core+0x432/0xb80
  [  696.397261]  ? skb_send_sock+0x50/0x50
  [  696.397271]  ? tcp4_gro_receive+0x137/0x1a0
  [  696.397280]  __netif_receive_skb+0x18/0x60
  [  696.397290]  ? __netif_receive_skb+0x18/0x60
  [  696.397300]  netif_receive_skb_internal+0x45/0xe0
  [  696.397309]  napi_gro_receive+0xc5/0xf0
  [  696.397317]  xennet_poll+0x9ca/0xbc0
  [  696.397325]  net_rx_action+0x140/0x3a0
  [  696.397334]  __do_softirq+0xe4/0x2d4
  [  696.397344]  irq_exit+0xc5/0xd0
  [  696.397352]  xen_evtchn_do_upcall+0x30/0x50
  [  696.397361]  xen_hvm_callback_vector+0x90/0xa0
  [  696.397371]  </IRQ>
  [  696.397378] RIP: 0010:native_safe_halt+0x12/0x20
  [  696.397390] RSP: 0018:ffff94c4862cbe80 EFLAGS: 00000246 ORIG_RAX: 
ffffffffffffff0c
  [  696.397405] RAX: ffffffff8efc1800 RBX: 0000000000000006 RCX: 
0000000000000000
  [  696.397419] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000000000000
  [  696.397435] RBP: ffff94c4862cbe80 R08: 0000000000000002 R09: 
0000000000000001
  [  696.397449] R10: 0000000000100000 R11: 0000000000000397 R12: 
0000000000000006
  [  696.397462] R13: 0000000000000000 R14: 0000000000000000 R15: 
0000000000000000
  [  696.397479]  ? __sched_text_end+0x1/0x1
  [  696.397489]  default_idle+0x20/0x100
  [  696.397499]  arch_cpu_idle+0x15/0x20
  [  696.397507]  default_idle_call+0x23/0x30
  [  696.397515]  do_idle+0x172/0x1f0
  [  696.397522]  cpu_startup_entry+0x73/0x80
  [  696.397530]  start_secondary+0x1ab/0x200
  [  696.397538]  secondary_startup_64+0xa5/0xb0
  [  696.397545] Code: 89 5d b0 49 29 cc 45 01 a7 80 00 00 00 44 89 e1 48 29 c8 
48 89 4d a8 49 89 87 d8 00 00 00 89 d2 48 8d 84 d6 38 03 00 00 48 8b 00 <4c> 8b 
70 10 4c 8d 68 10 4d 85 f6 0f 84 f6 00 00 00 49 8d 47 30
  [  696.397584] RIP: __cgroup_bpf_run_filter_skb+0xbb/0x1e0 RSP: 
ffff893fdcb83a70
  [  696.397607] ---[ end trace ec5c84424d511a6f ]---
  [  696.397616] Kernel panic - not syncing: Fatal exception in interrupt
  [  696.397876] Kernel Offset: 0xd600000 from 0xffffffff81000000 (relocation 
range: 0xffffffff80000000-0xffffffffbfffffff)

  We've correlated some of the other crashes, and the ASCII-looking pointer
  value was a bit of a red herring.  All the others are a NULL pointer
  dereference in the same place, so the problem is likely an out-of-bounds
  memory read (possibly a use-after-free) of a piece of memory which is
  usually zero, but not always.

  It is actually the control VMs for our test farms that were impacted:
  one was reliably crashing every 5 minutes or so, and others at more
  sporadic intervals of up to about a day.  In all cases, reverting to
  the -108 kernel has resolved the crashes.

  Unfortunately, attempts to reproduce this outside our production environment
  with a packet trace aren't going quite so well.  We're still experimenting.

