For reference, the kernel spew of the BUG_ON:

[   78.354129] kernel BUG at /home/ubuntu/xenial-aws/drivers/nvme/host/pci.c:619!
[   78.357297] invalid opcode: 0000 [#1] SMP
[   78.359613] Modules linked in: dm_snapshot dm_bufio xfs ppdev serio_raw parport_pc 8250_fintek parport i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ena
[   78.387878] CPU: 0 PID: 1687 Comm: mount Not tainted 4.4.0-1105-aws #116
[   78.390837] Hardware name: Amazon EC2 c5d.large/, BIOS 1.0 10/16/2017
[   78.393692] task: ffff8800bb155400 ti: ffff8800b93bc000 task.ti: ffff8800b93bc000
[   78.396973] RIP: 0010:[<ffffffff815dbd06>]  [<ffffffff815dbd06>] nvme_queue_rq+0x8c6/0xa60
[   78.400787] RSP: 0018:ffff8800b93bf7c8  EFLAGS: 00010286
[   78.403151] RAX: 0000000000000078 RBX: 0000000000001000 RCX: 0000000000001000
[   78.406276] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000000
[   78.409390] RBP: ffff8800b93bf8a8 R08: ffff8800b916c700 R09: 0000000000001000
[   78.412518] R10: 000000000001ec00 R11: ffff8800b8e30000 R12: 00000000fffffc00
[   78.417056] R13: 0000000000000010 R14: 000000000000fc00 R15: 0000000035fd5000
[   78.421581] FS:  00007f30fe043840(0000) GS:ffff880130a00000(0000) knlGS:0000000000000000
[   78.427884] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   78.431827] CR2: 00007f57d4057889 CR3: 0000000035974000 CR4: 0000000000360670
[   78.436322] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   78.440821] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   78.445316] Stack:
[   78.447706]  ffff880036009480 ffff880036009700 ffff8800b7782800 0000000000000ff8
[   78.454583]  ffff8800b8e30420 ffff8800360a9400 ffff88000001fc00 ffff8800b7697b00
[   78.461462]  ffff880100001000 ffff8800b8e30000 ffff88003604c000 00000001ffc00400
[   78.468332] Call Trace:
[   78.470921]  [<ffffffff813e6617>] blk_mq_make_request+0x407/0x550
[   78.475001]  [<ffffffff813d8f14>] generic_make_request+0x114/0x2d0
[   78.479110]  [<ffffffff813d0371>] ? bvec_alloc+0x91/0x100
[   78.482936]  [<ffffffff813d9146>] submit_bio+0x76/0x160
[   78.486680]  [<ffffffffc0347a14>] _xfs_buf_ioapply+0x2e4/0x4a0 [xfs]
[   78.490866]  [<ffffffff810b22e0>] ? wake_up_q+0x70/0x70
[   78.494601]  [<ffffffffc0349c94>] ? xfs_bwrite+0x24/0x60 [xfs]
[   78.498583]  [<ffffffffc034975d>] xfs_buf_submit_wait+0x5d/0x230 [xfs]
[   78.502861]  [<ffffffffc0349c94>] xfs_bwrite+0x24/0x60 [xfs]
[   78.506785]  [<ffffffffc037108f>] xlog_bwrite+0x7f/0x100 [xfs]
[   78.510787]  [<ffffffffc0371f34>] xlog_write_log_records+0x1a4/0x230 [xfs]
[   78.515192]  [<ffffffffc0372077>] xlog_clear_stale_blocks+0xb7/0x1b0 [xfs]
[   78.519596]  [<ffffffffc037198f>] ? xlog_bread+0x3f/0x50 [xfs]
[   78.523588]  [<ffffffffc03765eb>] xlog_find_tail+0x2db/0x3b0 [xfs]
[   78.527705]  [<ffffffffc03766ed>] xlog_recover+0x2d/0x160 [xfs]
[   78.531720]  [<ffffffffc036a11b>] xfs_log_mount+0xdb/0x2a0 [xfs]
[   78.535767]  [<ffffffffc03612e3>] xfs_mountfs+0x4f3/0x870 [xfs]
[   78.539788]  [<ffffffffc036216b>] ? xfs_mru_cache_create+0x12b/0x180 [xfs]
[   78.544197]  [<ffffffffc036463b>] xfs_fs_fill_super+0x3bb/0x4e0 [xfs]
[   78.548400]  [<ffffffff8121dc70>] mount_bdev+0x270/0x2c0
[   78.552169]  [<ffffffffc0364280>] ? xfs_parseargs+0xab0/0xab0 [xfs]
[   78.556338]  [<ffffffffc03628e5>] xfs_fs_mount+0x15/0x20 [xfs]
[   78.560337]  [<ffffffff8121e65d>] mount_fs+0x3d/0x170
[   78.564091]  [<ffffffff811bc405>] ? __alloc_percpu+0x15/0x20
[   78.568060]  [<ffffffff8123b257>] vfs_kern_mount+0x67/0x110
[   78.571971]  [<ffffffff8123d95f>] do_mount+0x25f/0xda0
[   78.575692]  [<ffffffff8123bad4>] ? mntput+0x24/0x40
[   78.579334]  [<ffffffff811fbf06>] ? __kmalloc_track_caller+0x1b6/0x250
[   78.583595]  [<ffffffff8121c483>] ? __fput+0x193/0x230
[   78.587296]  [<ffffffff811b6952>] ? memdup_user+0x42/0x70
[   78.591111]  [<ffffffff8123e7df>] SyS_mount+0x9f/0x100
[   78.594804]  [<ffffffff818449db>] entry_SYSCALL_64_fastpath+0x22/0xcb
[   78.599029] Code: 11 e3 e3 ff 44 8b 95 50 ff ff ff 48 89 85 68 ff ff ff 4c 8b 48 10 44 8b 58 18 8b 95 58 ff ff ff 8b 8d 60 ff ff ff e9 0a fd ff ff <0f> 0b 48 8b 73 68 48 8b bd 70 ff ff ff e8 58 c5 e2 ff 83 f8 01
[   78.625198] RIP  [<ffffffff815dbd06>] nvme_queue_rq+0x8c6/0xa60
[   78.629410]  RSP <ffff8800b93bf7c8>
[   78.632442] ---[ end trace de20412ccd13806e ]---

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1869229

Title:
  Mounting LVM snapshots with xfs can hit kernel BUG in nvme driver

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed

Bug description:
  [Impact]
  When mounting LVM snapshots using xfs, it's possible to hit a BUG_ON()
  in the nvme driver.

  Upstream commit 729204ef49ec ("block: relax check on sg gap")
  introduced a way to merge bios if they are physically contiguous. This
  can lead to issues if one rq starts with a non-aligned buffer, as it
  can cause the merged segment to end in an unaligned virtual boundary.
  In some AWS instances, it's possible to craft such a request when
  attempting to mount LVM snapshots using xfs. This will then cause a
  kernel spew due to a BUG_ON in nvme_setup_prps(), which checks if
  dma_len is aligned to the page size.
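
  To illustrate why a segment that ends in the middle of a page is a
  problem, here is a minimal userspace sketch of a page-by-page walk over
  DMA segments, loosely modeled on the description above. It is not the
  kernel code: struct seg, PAGE_SZ and prp_walk_ok() are hypothetical
  names, a 4 KiB device page is assumed, and only segment lengths are
  tracked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096 /* assumed device page size */

/* Hypothetical stand-in for one DMA-mapped scatterlist element. */
struct seg {
    uint64_t addr; /* DMA address of the segment */
    int len;       /* length in bytes, assumed > 0 */
};

/*
 * Walk the segments one device page at a time, the way a PRP list is
 * filled in. Returns false when a segment other than the last one ends
 * in the middle of a page, which is the point where the remaining length
 * of the current segment goes negative.
 */
static bool prp_walk_ok(const struct seg *segs, size_t nsegs)
{
    if (nsegs == 0)
        return true;

    int length = 0; /* bytes left in the whole request */
    for (size_t i = 0; i < nsegs; i++)
        length += segs[i].len;

    size_t cur = 0;
    uint64_t dma_addr = segs[cur].addr;
    int dma_len = segs[cur].len; /* bytes left in the current segment */

    /* The first entry may start mid-page; it covers the rest of that page. */
    int first = (int)(PAGE_SZ - (dma_addr & (PAGE_SZ - 1)));
    length -= first;
    dma_len -= first;

    while (length > 0) {
        if (dma_len < 0)
            return false; /* segment ended mid-page with data still left */
        if (dma_len == 0) {
            if (++cur == nsegs)
                return true; /* defensive; not reached for positive lengths */
            dma_len = segs[cur].len;
        }
        /* Every later entry consumes exactly one full page. */
        dma_len -= PAGE_SZ;
        length -= PAGE_SZ;
    }
    return true;
}
```

  In this model, two page-aligned 4 KiB segments pass the walk, while a
  first segment at offset 0x200 that is only 0x400 bytes long, followed by
  a second segment, fails it.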

  [Fix]
  Upstream commit 5a8d75a1b8c9 ("block: fix bio_will_gap() for first bvec
  with offset") prevents requests that begin with an unaligned buffer from
  being merged.
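
  The effect of the two commits on the merge decision can be sketched in
  userspace as follows. This is a simplified model under stated
  assumptions, not the kernel implementation: struct vec,
  phys_contiguous(), gap_to_prev() and will_gap() are hypothetical names,
  and the with_fix flag only models the extra check on the first buffer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a bio_vec. */
struct vec {
    uint64_t page_phys; /* physical address of the page */
    uint32_t offset;    /* byte offset within the page */
    uint32_t len;       /* length in bytes */
};

/* True if b starts exactly where a ends in physical memory. */
static bool phys_contiguous(const struct vec *a, const struct vec *b)
{
    return a->page_phys + a->offset + a->len == b->page_phys + b->offset;
}

/*
 * Basic gap rule: the next buffer must start at offset 0 and the
 * previous one must end on the virtual boundary.
 */
static bool gap_to_prev(const struct vec *prev, uint32_t next_offset,
                        uint32_t boundary_mask)
{
    return next_offset != 0 ||
           ((prev->offset + prev->len) & boundary_mask) != 0;
}

/*
 * Returns true if the merge must be refused.
 *   first: first vector of the request being extended
 *   last:  last vector of that request
 *   next:  first vector of the bio to be appended
 *   with_fix: model the additional check on the first vector
 */
static bool will_gap(const struct vec *first, const struct vec *last,
                     const struct vec *next, uint32_t boundary_mask,
                     bool with_fix)
{
    /* Modeled fix: never merge onto a request that starts unaligned. */
    if (with_fix && first->offset != 0)
        return true;

    /* Relaxed check: physically contiguous vectors form one segment. */
    if (phys_contiguous(last, next))
        return false;

    return gap_to_prev(last, next->offset, boundary_mask);
}
```

  With a boundary mask of 0xfff, a request whose only buffer starts at
  offset 0x200 and is 0x400 bytes long can be merged with a physically
  contiguous follow-up buffer when with_fix is false, leaving a segment
  that ends mid-page. With with_fix set to true the same merge is refused.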

  [Test Case]
  This has been verified on AWS with c5d.large instances:

  1) Prepare the LVM device + snapshot
  $ sudo vgcreate vg0 /dev/nvme1n1
  $ sudo lvcreate -L5G -n data0 vg0
  $ sudo mkfs.xfs /dev/vg0/data0
  $ sudo mount /dev/vg0/data0 /mnt
  $ sudo touch /mnt/test
  $ sudo touch /mnt/test2
  $ sudo ls /mnt
  $ sudo umount /mnt
  $ sudo lvcreate -l100%FREE -s /dev/vg0/data0 -n data0_snap

  2) Attempting to mount the previously created snapshot results in the Oops:
  $ sudo mount /dev/vg0/data0_snap /mnt
  Segmentation fault (core dumped)

  [Regression Potential]
  The fix prevents some bios from being merged, so it can have a
  performance impact in certain scenarios. The patch only targets
  misaligned segments, so the impact should be less noticeable in the
  general case.
  The commit has also been present in mainline kernels since 4.13 and
  hasn't changed significantly, so the potential for other regressions
  should be low.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1869229/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
