Never mind, 4.18.17 is not enough to fix the crash. I am currently
testing with 4.19.1.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1802480

Title:
  Crash when using IPsec VTI interfaces on 4.15 and 4.18.

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Hey!

  After upgrading a few VPNs to 4.15.0-38.41 (on either Xenial or
  Bionic), we get random crashes. This also happens with the 4.18 kernel
  in bionic-proposed. These crashes didn't happen with 4.4 from Xenial.
  Here is a stack trace:

  [   31.154360] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
  [   31.162233] PGD 0 P4D 0
  [   31.164786] Oops: 0000 [#1] SMP PTI
  [   31.168291] CPU: 5 PID: 42 Comm: ksoftirqd/5 Not tainted 4.18.0-11-generic #12~18.04.1-Ubuntu
  [   31.176854] Hardware name: Supermicro Super Server/X10SDV-4C-7TP4F, BIOS 1.0b 11/21/2016
  [   31.184980] RIP: 0010:vti_rcv_cb+0xb9/0x1a0 [ip_vti]
  [   31.189962] Code: 8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00 00 8b 80 f8 14 00 00 85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe <f6> 40 38 04 74 7d 44 89 b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20
  [   31.208916] RSP: 0018:ffffbc61832e3920 EFLAGS: 00010246
  [   31.214160] RAX: 0000000000000000 RBX: ffff9a3504964a00 RCX: 0000000000000002
  [   31.221328] RDX: ffff9a351add4080 RSI: ffff9a351aa08000 RDI: ffff9a3504964a00
  [   31.228485] RBP: ffffbc61832e3940 R08: 0000000000000004 R09: ffffffffc0aa612b
  [   31.235643] R10: 0008f09b99881884 R11: 1884bd4e2d6b1fac R12: ffff9a3507b31900
  [   31.242803] R13: ffff9a3507b31000 R14: 0000000000000000 R15: ffff9a3504964a00
  [   31.249964] FS:  0000000000000000(0000) GS:ffff9a35bfd40000(0000) knlGS:0000000000000000
  [   31.258077] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   31.263848] CR2: 0000000000000038 CR3: 000000041a40a003 CR4: 00000000003606e0
  [   31.271004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [   31.278163] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [   31.285320] Call Trace:
  [   31.287789]  xfrm4_rcv_cb+0x4a/0x70
  [   31.291297]  xfrm_input+0x58f/0x8f0
  [   31.294807]  vti_input+0xaa/0x110 [ip_vti]
  [   31.298926]  vti_rcv+0x33/0x3c [ip_vti]
  [   31.302783]  xfrm4_esp_rcv+0x39/0x50
  [   31.306375]  ip_local_deliver_finish+0x62/0x200
  [   31.310923]  ip_local_deliver+0xdf/0xf0
  [   31.314775]  ? ip_rcv_finish+0x420/0x420
  [   31.318718]  ip_rcv_finish+0x126/0x420
  [   31.322486]  ip_rcv+0x28f/0x360
  [   31.325655]  ? inet_del_offload+0x40/0x40
  [   31.329686]  __netif_receive_skb_core+0x48c/0xb70
  [   31.334413]  ? kmem_cache_alloc+0xb4/0x1d0
  [   31.338532]  ? __build_skb+0x2b/0xf0
  [   31.342128]  __netif_receive_skb+0x18/0x60
  [   31.346244]  ? __netif_receive_skb+0x18/0x60
  [   31.350536]  netif_receive_skb_internal+0x45/0xe0
  [   31.355263]  napi_gro_receive+0xc5/0xf0
  [   31.359141]  mlx5e_handle_rx_cqe+0x1b2/0x5d0 [mlx5_core]
  [   31.364476]  ? skb_release_all+0x24/0x30
  [   31.368430]  mlx5e_poll_rx_cq+0xd3/0x990 [mlx5_core]
  [   31.373432]  mlx5e_napi_poll+0x9b/0xc60 [mlx5_core]
  [   31.378333]  ? __switch_to_asm+0x34/0x70
  [   31.382270]  ? __switch_to_asm+0x40/0x70
  [   31.386214]  ? __switch_to_asm+0x34/0x70
  [   31.391056]  ? __switch_to_asm+0x40/0x70
  [   31.395905]  ? __switch_to_asm+0x34/0x70
  [   31.400743]  net_rx_action+0x140/0x3a0
  [   31.405379]  ? __switch_to+0xad/0x500
  [   31.409887]  __do_softirq+0xe4/0x2bb
  [   31.414448]  run_ksoftirqd+0x2b/0x40
  [   31.418862]  smpboot_thread_fn+0xfc/0x170
  [   31.423700]  kthread+0x121/0x140
  [   31.427701]  ? sort_range+0x30/0x30
  [   31.432040]  ? kthread_create_worker_on_cpu+0x70/0x70
  [   31.437816]  ret_from_fork+0x35/0x40
  [   31.442219] Modules linked in: esp6 authenc echainiv xfrm6_mode_tunnel xfrm4_mode_tunnel xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo ip_vti ip_tunnel ip6_vti ip6_tunnel tunnel6 8021q garp mrp stp llc bonding ipt_REJECT nf_reject_ipv4 nfnetlink_log nfnetlink xt_NFLOG xt_hl xt_limit xt_nat xt_TCPMSS xt_HL xt_comment xt_tcpudp xt_multiport xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_connmark xt_mark iptable_mangle xt_CT nf_conntrack xt_addrtype iptable_raw bpfilter ipmi_ssif gpio_ich intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate intel_rapl_perf input_leds joydev mei_me intel_pch_thermal ioatdma mei lpc_ich ipmi_si ipmi_devintf ipmi_msghandler acpi_pad mac_hid sch_fq_codel
  [   31.519488]  ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ast pcbc ttm drm_kms_helper aesni_intel syscopyarea aes_x86_64 sysfillrect mxm_wmi crypto_simd sysimgblt cryptd glue_helper fb_sys_fops mlx5_core ixgbe igb mpt3sas drm ahci tls libahci i2c_algo_bit mlxfw raid_class dca devlink mdio scsi_transport_sas wmi
  [   31.578877] CR2: 0000000000000038
  [   31.583249] ---[ end trace c4bada38847a0075 ]---
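
  As a reading aid for the trace above, here is a minimal Python sketch
  that checks the fault address against the faulting instruction. It
  assumes the usual oops convention that the byte in angle brackets on
  the Code: line is the first byte at RIP. The helper names (registers,
  faulting_bytes, decode_testb_disp8) are hypothetical and only handle
  the one instruction form seen here. The bytes f6 40 38 04 decode to
  testb $0x4,0x38(%rax); with RAX = 0 that reads address 0x38, which
  matches CR2. Which structure field sits at offset 0x38 is not
  established by this sketch.

```python
import re

# Excerpt copied from the oops above (wrapped lines rejoined, Code: line shortened).
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
Code: 48 8b 43 58 48 83 e0 fe <f6> 40 38 04 74 7d
RAX: 0000000000000000 RBX: ffff9a3504964a00 RCX: 0000000000000002
CR2: 0000000000000038 CR3: 000000041a40a003 CR4: 00000000003606e0
"""

REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def registers(text):
    """Map register names (RAX, CR2, ...) to their integer values."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}):\s+([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_bytes(text, count=4):
    """Return `count` opcode bytes starting at the <xx> marker on the Code: line."""
    tokens = re.search(r"Code:(.*)", text).group(1).split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:start + count]]


def decode_testb_disp8(insn):
    """Decode only `f6 /0 ib` with an 8-bit displacement and no SIB byte."""
    opcode, modrm, disp, imm = insn
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0xF6 or mod != 1 or reg != 0 or rm == 4:
        raise ValueError("not the simple testb form handled by this sketch")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REG_NAMES[rm], disp, imm


regs = registers(OOPS)
base, disp, imm = decode_testb_disp8(faulting_bytes(OOPS))
accessed = regs[base.upper()] + disp
print(f"testb ${imm:#x},{disp:#x}(%{base}) reads {accessed:#x}; CR2 is {regs['CR2']:#x}")
```

  The two instructions just before the marker (48 8b 43 58 and
  48 83 e0 fe) load a pointer from 0x58(%rbx) and clear its low bit, so
  the NULL in RAX comes from that load rather than from an argument.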

  Upgrading to mainline 4.18.17 seems to solve the issue. It's difficult
  to bisect as it doesn't happen often. 4.18.17 contains
  c473a489d4098969ffafda913e1ad71da31b1104 (xfrm: Fix NULL pointer
  dereference when skb_dst_force clears the dst_entry) but it doesn't
  match the stack trace (the stack trace is on the input path, while the
  patch touches the output and forward paths). There is also
  fdb06c787b34fd397f28f515105627307d615025 (xfrm: Fix NULL pointer
  dereference when skb_dst_force clears the dst_entry) which is also in
  4.17 and may better match the problem, but I am unsure what it means
  to have several transformations (we use VTI interfaces, but other than
  that, we don't do anything fancy).

  Hardware is Mellanox ConnectX-4 Lx (no ESP offload).

  May I suggest upgrading 4.18 to 4.18.17 and backporting these two
  patches to Bionic 4.15?

  Thanks.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1802480/+subscriptions
