Private bug reported:

Kernel panic with a NULL pointer dereference when the RAID10 rebuild target
is disconnected during rebuild. The issue is sporadic.

Steps to reproduce:
1) Create raid10 with mdadm
2) Wait for resync to end
3) Add spare drive
4) Fail one of the member drives
- The RAID becomes degraded, and rebuild to the spare from step 3 starts.
5) Disconnect the drive added in step 3 (the rebuild target)
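
The steps above could be scripted roughly as follows. This is only a sketch: the array name, device paths, and the helper name build_repro_commands are hypothetical, and the report does not state the number of member drives or the exact mdadm options used. The helper only builds the mdadm command lines for steps 1-4 and does not run them; step 5 (disconnecting the rebuild target) is left as a manual action because the report does not say how the drive was disconnected.

```python
import shlex
from typing import List


def build_repro_commands(array: str, members: List[str], spare: str) -> List[List[str]]:
    """Build argv lists for steps 1-4 of the reproduction. Nothing is executed.

    All device names are placeholders chosen by the caller (assumptions,
    not values taken from the bug report).
    """
    if len(members) < 2:
        raise ValueError("md RAID10 needs at least two member devices")
    return [
        # Step 1: create the RAID10 array from the member devices.
        ["mdadm", "--create", array, "--level=10",
         f"--raid-devices={len(members)}", *members],
        # Step 2: block until the initial resync has finished.
        ["mdadm", "--wait", array],
        # Step 3: add the spare drive.
        ["mdadm", array, "--add", spare],
        # Step 4: fail one member so that rebuild to the spare starts.
        ["mdadm", array, "--fail", members[0]],
        # Step 5 (disconnect the spare during rebuild) is done by hand.
    ]


if __name__ == "__main__":
    # Hypothetical device names; print the commands instead of running them.
    for argv in build_repro_commands(
        "/dev/md0", ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"], "/dev/sdf"
    ):
        print(shlex.join(argv))
```

Since the issue is sporadic, the sequence would probably have to be repeated several times, with the disconnect timed while the rebuild is still in progress.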

trace:
[ 1022.872118] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f0
[ 1022.881072] IP: raid10d+0xaec/0x1430 [raid10]
[ 1022.886071] PGD 0 P4D 0 
[ 1022.889033] Oops: 0002 [#1] SMP PTI
[ 1022.893056] Modules linked in: xt_CHECKSUM ipt_MASQUERADE 
nf_nat_masquerade_ipv4 xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge 
stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables 
iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack snd_hda_codec_hdmi intel_rapl 
x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek 
snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass 
snd_hda_core ipmi_ssif intel_cstate joydev snd_hwdep intel_rapl_perf input_leds 
snd_pcm snd_timer ioatdma dca lpc_ich snd soundcore shpchp ipmi_si ipmi_devintf 
ipmi_msghandler tpm_crb acpi_pad acpi_power_meter mac_hid sch_fq_codel ib_iser 
rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi sunrpc
[ 1022.973751]  ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid0 multipath linear dm_mirror dm_region_hash dm_log nouveau 
hid_generic mxm_wmi mgag200 video i2c_algo_bit usbhid ttm e1000e i40e 
crct10dif_pclmul hid ptp crc32_pclmul ghash_clmulni_intel pcbc drm_kms_helper 
aesni_intel syscopyarea aes_x86_64 raid1 sysfillrect crypto_simd sysimgblt 
glue_helper uas fb_sys_fops cryptd ahci vmd drm usb_storage pps_core libahci wmi
[ 1023.026580] CPU: 90 PID: 6373 Comm: md126_raid10 Not tainted 4.15.0-10-generic #11-Ubuntu
[ 1023.035831] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDTRL1.86B.0151.R03.1801050249 01/05/2018
[ 1023.046913] RIP: 0010:raid10d+0xaec/0x1430 [raid10]
[ 1023.052479] RSP: 0018:ffffb5178747bd70 EFLAGS: 00010246
[ 1023.058429] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff99b3bd0d5e20
[ 1023.066517] RDX: ffffffffc025f8c0 RSI: 0000000000000286 RDI: ffff99b7a1ed5c00
[ 1023.074605] RBP: ffffb5178747be90 R08: 0000000000000349 R09: 0000000000000000
[ 1023.082697] R10: ffffb5178747bd70 R11: 0000000000000365 R12: 0000000000000000
[ 1023.090790] R13: ffff99b3d97dbf70 R14: ffff99b3bd0d5e00 R15: ffff99b3bd0d5e00
[ 1023.098883] FS:  0000000000000000(0000) GS:ffff99b7ed880000(0000) knlGS:0000000000000000
[ 1023.108051] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1023.114602] CR2: 00000000000000f0 CR3: 00000001d0c0a004 CR4: 00000000007606e0
[ 1023.122707] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1023.130804] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1023.138915] PKRU: 55555554
[ 1023.142086] Call Trace:
[ 1023.144964]  ? __clear_rsb+0x15/0x3d
[ 1023.149105]  ? __schedule+0x29f/0x8a0
[ 1023.153340]  ? __clear_rsb+0x25/0x3d
[ 1023.157478]  ? schedule+0x2c/0x80
[ 1023.161326]  md_thread+0x129/0x170
[ 1023.165273]  ? raid10_start_reshape+0x630/0x630 [raid10]
[ 1023.171349]  ? md_thread+0x129/0x170
[ 1023.175484]  ? wait_woken+0x80/0x80
[ 1023.179521]  kthread+0x121/0x140
[ 1023.183267]  ? find_pers+0x70/0x70
[ 1023.187204]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1023.192986]  ? do_syscall_64+0x118/0x130
[ 1023.197505]  ret_from_fork+0x35/0x40
[ 1023.201631] Code: e4 48 8b 57 48 0f 84 92 08 00 00 49 83 7c 24 48 00 0f 84 86 08 00 00 48 63 d8 48 c1 e3 05 48 85 d2 74 41 49 8b 46 08 48 8b 04 18 <f0> ff 80 f0 00 00 00 49 8b 46 08 48 8b 04 18 48 8b 40 30 48 8b
[ 1023.223245] RIP: raid10d+0xaec/0x1430 [raid10] RSP: ffffb5178747bd70
[ 1023.230490] CR2: 00000000000000f0
[ 1023.234340] ---[ end trace 12e1280fca9f2646 ]---
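
As a cross-check of the trace, the faulting bytes from the Code: line (starting at the marked <f0>) can be decoded by hand. The sketch below handles only this one instruction form; the function name decode_faulting_insn is made up for illustration. The bytes f0 ff 80 f0 00 00 00 are a LOCK-prefixed 32-bit increment of the memory at RAX + 0xf0, and with RAX = 0 that gives the CR2 value 0xf0 reported above. That is consistent with an atomic increment of a field at offset 0xf0 through a NULL pointer. Which structure and field are involved is not stated in the report, so that part remains an inference.

```python
def decode_faulting_insn(code: bytes, rax: int) -> dict:
    """Decode 'lock inc dword ptr [rax+disp32]' and compute the accessed address.

    Deliberately narrow: it accepts only the instruction form seen in this
    oops and raises ValueError for anything else.
    """
    if len(code) < 7 or code[0] != 0xF0 or code[1] != 0xFF:
        raise ValueError("expected LOCK prefix (f0) followed by opcode ff")
    modrm = code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod=2: register base plus 32-bit displacement; reg=0: INC; rm=0: RAX.
    if (mod, reg, rm) != (2, 0, 0):
        raise ValueError("not an 'inc [rax+disp32]' encoding")
    disp = int.from_bytes(code[3:7], "little", signed=True)
    return {"disp": disp, "fault_addr": (rax + disp) & (2**64 - 1)}


def decode_oops_error_code(err: int) -> dict:
    """Split the x86 page-fault error code shown as 'Oops: 0002'."""
    return {
        "page_present": bool(err & 1),  # 0: no page was mapped at the address
        "write": bool(err & 2),         # 1: the access was a write
        "user_mode": bool(err & 4),     # 0: the fault happened in kernel mode
    }


if __name__ == "__main__":
    insn = bytes.fromhex("f0ff80f0000000")  # bytes from the Code: line
    info = decode_faulting_insn(insn, rax=0x0)  # RAX from the register dump
    print(hex(info["fault_addr"]))
    print(decode_oops_error_code(0x0002))
```

The decoded error code (write access, kernel mode, page not present) fits a locked increment on an unmapped low address.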

Additional information:
The following upstream patches solve the issue:

md: document lifetime of internal rdev pointer.
https://marc.info/?l=linux-raid&m=151761002007155&w=2
https://git.kernel.org/pub/scm/linux/kernel/git/shli/md.git/commit/?h=for-next&id=f2785b527cda46314805123ddcbc871655b7c4c4

md: only allow remove_and_add_spares when no sync_thread running.
https://marc.info/?l=linux-raid&m=151761004007159&w=2
https://git.kernel.org/pub/scm/linux/kernel/git/shli/md.git/commit/?h=for-next&id=39772f0a7be3b3dc26c74ea13fe7847fd1522c8b

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Information type changed from Public to Private

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1759279

Title:
  [Intel Ubuntu 18.04 Bug] Null pointer dereference, when disconnecting
  RAID rebuild target

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1759279/+subscriptions

