[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-10-13 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.4.0-51.56

---
linux (5.4.0-51.56) focal; urgency=medium

  * Packaging resync (LP: #1786013)
- update dkms package versions

linux (5.4.0-50.55) focal; urgency=medium

  * CVE-2020-16119
- SAUCE: dccp: avoid double free of ccid on child socket

  * CVE-2020-16120
- Revert "UBUNTU: SAUCE: overlayfs: ensure mounter privileges when reading
  directories"
- ovl: pass correct flags for opening real directory
- ovl: switch to mounter creds in readdir
- ovl: verify permissions in ovl_path_open()
- ovl: call secutiry hook in ovl_real_ioctl()
- ovl: check permission to open real file

linux (5.4.0-49.53) focal; urgency=medium

  * focal/linux: 5.4.0-49.53 -proposed tracker (LP: #1896007)

  * Comet Lake PCH-H RAID not support on Ubuntu20.04 (LP: #1892288)
- ahci: Add Intel Comet Lake PCH-H PCI ID

  * Novalink (mkvterm command failure) (LP: #1892546)
- tty: hvcs: Don't NULL tty->driver_data until hvcs_cleanup()

  * Oops and hang when starting LVM snapshots on 5.4.0-47 (LP: #1894780)
- SAUCE: Revert "mm: memcg/slab: fix memory leak at non-root kmem_cache
  destroy"

  * Intel x710 LOMs do not work on Focal (LP: #1893956)
- i40e: Fix LED blinking flow for X710T*L devices
- i40e: enable X710 support

  * Add/Backport EPYC-v3 and EPYC-Rome CPU model (LP: #1887490)
- kvm: svm: Update svm_xsaves_supported

  * Fix non-working NVMe after S3 (LP: #1895718)
- SAUCE: PCI: Enable ACS quirk on CML root port

  * Focal update: v5.4.65 upstream stable release (LP: #1895881)
- ipv4: Silence suspicious RCU usage warning
- ipv6: Fix sysctl max for fib_multipath_hash_policy
- netlabel: fix problems with mapping removal
- net: usb: dm9601: Add USB ID of Keenetic Plus DSL
- sctp: not disable bh in the whole sctp_get_port_local()
- taprio: Fix using wrong queues in gate mask
- tipc: fix shutdown() of connectionless socket
- net: disable netpoll on fresh napis
- Linux 5.4.65

  * Focal update: v5.4.64 upstream stable release (LP: #1895880)
- HID: quirks: Always poll three more Lenovo PixArt mice
- drm/msm/dpu: Fix scale params in plane validation
- tty: serial: qcom_geni_serial: Drop __init from qcom_geni_console_setup
- drm/msm: add shutdown support for display platform_driver
- hwmon: (applesmc) check status earlier.
- nvmet: Disable keep-alive timer when kato is cleared to 0h
- drm/msm: enable vblank during atomic commits
- habanalabs: validate FW file size
- habanalabs: check correct vmalloc return code
- drm/msm/a6xx: fix gmu start on newer firmware
- ceph: don't allow setlease on cephfs
- drm/omap: fix incorrect lock state
- cpuidle: Fixup IRQ state
- nbd: restore default timeout when setting it to zero
- s390: don't trace preemption in percpu macros
- drm/amd/display: Reject overlay plane configurations in multi-display
  scenarios
- drivers: gpu: amd: Initialize amdgpu_dm_backlight_caps object to 0 in
  amdgpu_dm_update_backlight_caps
- drm/amd/display: Retry AUX write when fail occurs
- drm/amd/display: Fix memleak in amdgpu_dm_mode_config_init
- xen/xenbus: Fix granting of vmalloc'd memory
- fsldma: fix very broken 32-bit ppc ioread64 functionality
- dmaengine: of-dma: Fix of_dma_router_xlate's of_dma_xlate handling
- batman-adv: Avoid uninitialized chaddr when handling DHCP
- batman-adv: Fix own OGM check in aggregated OGMs
- batman-adv: bla: use netif_rx_ni when not in interrupt context
- dmaengine: at_hdmac: check return value of of_find_device_by_node() in
  at_dma_xlate()
- rxrpc: Keep the ACK serial in a var in rxrpc_input_ack()
- rxrpc: Make rxrpc_kernel_get_srtt() indicate validity
- MIPS: mm: BMIPS5000 has inclusive physical caches
- MIPS: BMIPS: Also call bmips_cpu_setup() for secondary cores
- mmc: sdhci-acpi: Fix HS400 tuning for AMDI0040
- netfilter: nf_tables: add NFTA_SET_USERDATA if not null
- netfilter: nf_tables: incorrect enum nft_list_attributes definition
- netfilter: nf_tables: fix destination register zeroing
- net: hns: Fix memleak in hns_nic_dev_probe
- net: systemport: Fix memleak in bcm_sysport_probe
- ravb: Fixed to be able to unload modules
- net: arc_emac: Fix memleak in arc_mdio_probe
- dmaengine: pl330: Fix burst length if burst size is smaller than bus width
- gtp: add GTPA_LINK info to msg sent to userspace
- net: ethernet: ti: cpsw: fix clean up of vlan mc entries for host port
- bnxt_en: Don't query FW when netif_running() is false.
- bnxt_en: Check for zero dir entries in NVRAM.
- bnxt_en: Fix PCI AER error recovery flow
- bnxt_en: Fix possible crash in bnxt_fw_reset_task().
- bnxt_en: fix HWRM error when querying VF temperature
- xfs: fix boundary test in xfs_attr_shortform_verify
- bnxt: don't enable NAPI until rings are ready
  

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-10-09 Thread Ian
I've confirmed, using the steps William provided, that the crash no
longer occurs on 5.4.0-49.53.

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1894780

Title:
  Oops and hang when starting LVM snapshots on 5.4.0-47

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1894780/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-10-08 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.8.0-21.22

---
linux (5.8.0-21.22) groovy; urgency=medium

  * groovy/linux: 5.8.0-21.22 -proposed tracker (LP: #1898150)

  * Packaging resync (LP: #1786013)
- update dkms package versions

  * Fix broken e1000e device after S3 (LP: #1897755)
- SAUCE: e1000e: Increase polling timeout on MDIC ready bit

  * EFA: add support for 0xefa1 devices (LP: #1896791)
- RDMA/efa: Expose maximum TX doorbell batch
- RDMA/efa: Expose minimum SQ size
- RDMA/efa: User/kernel compatibility handshake mechanism
- RDMA/efa: Add EFA 0xefa1 PCI ID

  * Groovy update: v5.8.13 upstream stable release (LP: #1898076)
- device_cgroup: Fix RCU list debugging warning
- ASoC: pcm3168a: ignore 0 Hz settings
- ASoC: wm8994: Skip setting of the WM8994_MICBIAS register for WM1811
- ASoC: wm8994: Ensure the device is resumed in wm89xx_mic_detect functions
- ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN Converter9 2-in-1
- clk: versatile: Add of_node_put() before return statement
- RISC-V: Take text_mutex in ftrace_init_nop()
- i2c: aspeed: Mask IRQ status to relevant bits
- s390/init: add missing __init annotations
- lockdep: fix order in trace_hardirqs_off_caller()
- EDAC/ghes: Check whether the driver is on the safe list correctly
- drm/amdkfd: fix a memory leak issue
- drm/amd/display: Don't use DRM_ERROR() for DTM add topology
- drm/amd/display: update nv1x stutter latencies
- drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC is
- drm/amd/display: Don't log hdcp module warnings in dmesg
- objtool: Fix noreturn detection for ignored functions
- i2c: mediatek: Send i2c master code at more than 1MHz
- riscv: Fix Kendryte K210 device tree
- ieee802154: fix one possible memleak in ca8210_dev_com_init
- ieee802154/adf7242: check status of adf7242_read_reg
- clocksource/drivers/h8300_timer8: Fix wrong return value in
  h8300_8timer_init()
- batman-adv: bla: fix type misuse for backbone_gw hash indexing
- libbpf: Fix build failure from uninitialized variable warning
- atm: eni: fix the missed pci_disable_device() for eni_init_one()
- batman-adv: mcast/TT: fix wrongly dropped or rerouted packets
- netfilter: ctnetlink: add a range check for l3/l4 protonum
- netfilter: ctnetlink: fix mark based dump filtering regression
- netfilter: conntrack: nf_conncount_init is failing with IPv6 disabled
- netfilter: nft_meta: use socket user_ns to retrieve skuid and skgid
- mac802154: tx: fix use-after-free
- bpf: Fix clobbering of r2 in bpf_gen_ld_abs
- tools/libbpf: Avoid counting local symbols in ABI check
- drm/vc4/vc4_hdmi: fill ASoC card owner
- net: qed: Disable aRFS for NPAR and 100G
- net: qede: Disable aRFS for NPAR and 100G
- net: qed: RDMA personality shouldn't fail VF load
- igc: Fix wrong timestamp latency numbers
- igc: Fix not considering the TX delay for timestamps
- drm/sun4i: sun8i-csc: Secondary CSC register correction
- hv_netvsc: Switch the data path at the right time during hibernation
- spi: spi-fsl-dspi: use XSPI mode instead of DMA for DPAA2 SoCs
- RDMA/core: Fix ordering of CQ pool destruction
- batman-adv: Add missing include for in_interrupt()
- xsk: Fix number of pinned pages/umem size discrepancy
- nvme-tcp: fix kconfig dependency warning when !CRYPTO
- batman-adv: mcast: fix duplicate mcast packets in BLA backbone from LAN
- batman-adv: mcast: fix duplicate mcast packets in BLA backbone from mesh
- batman-adv: mcast: fix duplicate mcast packets from BLA backbone to mesh
- bpf: Fix a rcu warning for bpffs map pretty-print
- lib80211: fix unmet direct dependendices config warning when !CRYPTO
- mac80211: do not disable HE if HT is missing on 2.4 GHz
- cfg80211: fix 6 GHz channel conversion
- mac80211: fix 80 MHz association to 160/80+80 AP on 6 GHz
- ALSA: asihpi: fix iounmap in error handler
- io_uring: fix openat/openat2 unified prep handling
- SUNRPC: Fix svc_flush_dcache()
- regmap: fix page selection for noinc reads
- regmap: fix page selection for noinc writes
- net/mlx5e: mlx5e_fec_in_caps() returns a boolean
- MIPS: Loongson-3: Fix fp register access if MSA enabled
- PM / devfreq: tegra30: Disable clock on error in probe
- MIPS: Add the missing 'CPU_1074K' into __get_cpu_type()
- regulator: axp20x: fix LDO2/4 description
- spi: bcm-qspi: Fix probe regression on iProc platforms
- KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE
- KVM: SVM: Add a dedicated INVD intercept routine
- mm: validate pmd after splitting
- arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback
- x86/irq: Make run_on_irqstack_cond() typesafe
- x86/ioapic: Unbreak check_timer()
- scsi: lpfc: Fix initial FLOGI failure due to BBSCN 

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-10-06 Thread Ian
Hi William,

We are working to get verification testing completed. Would you be able
to confirm that this issue no longer occurs in 5.4.0-49.53?

Thanks,
Ian

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-21 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-focal' to 'verification-done-focal'. If the problem
still exists, change the tag 'verification-needed-focal' to
'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!
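
Before testing, it helps to confirm that the booted kernel really is the
-proposed build. The following is a minimal sketch, not part of the
official verification process: version_at_least is a hypothetical helper,
it assumes GNU "sort -V" is available, and the 5.4.0-49 threshold is taken
from the focal changelog in this thread (it says nothing about other
series such as groovy).

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): succeed if version $1
# is greater than or equal to version $2, using GNU sort's version
# ordering (-V). The smaller of the two versions sorts first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, version_at_least "$(uname -r | cut -d- -f1,2)" 5.4.0-49
compares only the ABI part of the running kernel's version (such as
5.4.0-49 from 5.4.0-49-generic); the upload number (.53) is not visible
in "uname -r".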


** Tags added: verification-needed-focal

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-17 Thread Ian
** Changed in: linux (Ubuntu Focal)
   Status: New => Fix Committed

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-16 Thread Thadeu Lima de Souza Cascardo
** Description changed:

+ [Impact]
+ kmem caches will fail to be created when a cache has just been removed but
not completely torn down. This will cause some drivers (such as LVM snapshots)
to not work properly and will leave kernel traces in the logs.
+ 
+ [Test case]
+ See comment #9.
+ 
+ [Regression potential]
+ The fix reverts a commit, so we go back to the state of a previously released
kernel, where a leak was possible. That regression, though, is preferable to
the current impact, which also leads to a different leak and prevents users
from correctly using LVM snapshots.
+ 
+ =
+ 
  One of my bionic servers with HWE 5.4.0 hangs on boot (apparently while
  starting LVM snapshots) after upgrading from Linux 5.4.0-42 to 5.4.0-47,
  with the following trace:
  
    [   29.126292] kobject_add_internal failed for :a-152 with -EEXIST, don't try to register things with the same name in the same directory.
    [   29.138854] BUG: kernel NULL pointer dereference, address: 0020
    [   29.145977] #PF: supervisor read access in kernel mode
    [   29.145979] #PF: error_code(0x) - not-present page
    [   29.145981] PGD 0 P4D 0
    [   29.158800] Oops:  [#1] SMP NOPTI
    [   29.162468] CPU: 6 PID: 2532 Comm: lvm Not tainted 5.4.0-46-generic #50~18.04.1-Ubuntu
    [   29.170378] Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.3 07/15/2019
    [   29.178038] RIP: 0010:free_percpu+0x120/0x1f0
    [   29.183786] Code: 43 64 48 01 d0 49 39 c4 0f 83 71 ff ff ff 65 8b 05 a5 4e bc 58 48 8b 15 0e 4e 20 01 48 98 48 8b 3c c2 4c 01 e7 e8 f0 97 02 00 <48> 8b 58 20 48 8b 53 38 e9 48 ff ff ff f3 c3 48 8b 43 38 48 89 45
    [   29.202530] RSP: 0018:a2f69c3d38e8 EFLAGS: 00010046
    [   29.209204] RAX:  RBX: 92202ff397c0 RCX: a880a000
    [   29.216336] RDX: cf35c0f24f2cc3c0 RSI: 43817c451b92afcb RDI: 
    [   29.223469] RBP: a2f69c3d3918 R08:  R09: a74a5300
    [   29.230609] R10: a2f69c3d3820 R11:  R12: cf35c0f24f14c3c0
    [   29.237745] R13: cf362fb2a054c3c0 R14: 0287 R15: 0008
    [   29.244878] FS:  7f93a04b0900() GS:913faed8() knlGS:
    [   29.252961] CS:  0010 DS:  ES:  CR0: 80050033
    [   29.258707] CR2: 0020 CR3: 003fa9d9 CR4: 003406e0
    [   29.265883] Call Trace:
    [   29.268346]  __kmem_cache_release+0x1a/0x30
    [   29.273913]  __kmem_cache_create+0x4f9/0x550
    [   29.278192]  ? __kmalloc_node+0x1eb/0x320
    [   29.282205]  ? kvmalloc_node+0x31/0x80
    [   29.285962]  create_cache+0x120/0x1f0
    [   29.291003]  kmem_cache_create_usercopy+0x17d/0x270
    [   29.295882]  kmem_cache_create+0x16/0x20
    [   29.300152]  dm_bufio_client_create+0x1af/0x3f0 [dm_bufio]
    [   29.305644]  ? snapshot_map+0x5e0/0x5e0 [dm_snapshot]
    [   29.310693]  persistent_read_metadata+0x1ed/0x500 [dm_snapshot]
    [   29.316627]  ? _cond_resched+0x19/0x40
    [   29.320384]  snapshot_ctr+0x79e/0x910 [dm_snapshot]
    [   29.325276]  dm_table_add_target+0x18d/0x370
    [   29.329552]  table_load+0x12a/0x370
    [   29.333045]  ctl_ioctl+0x1e2/0x590
    [   29.336450]  ? retrieve_status+0x1c0/0x1c0
    [   29.340551]  dm_ctl_ioctl+0xe/0x20
    [   29.343958]  do_vfs_ioctl+0xa9/0x640
    [   29.347547]  ? ksys_semctl.constprop.19+0xf7/0x190
    [   29.352337]  ksys_ioctl+0x75/0x80
    [   29.355663]  __x64_sys_ioctl+0x1a/0x20
    [   29.359421]  do_syscall_64+0x57/0x190
    [   29.363094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [   29.368144] RIP: 0033:0x7f939f0286d7
    [   29.371732] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
    [   29.390478] RSP: 002b:7ffe918df168 EFLAGS: 0202 ORIG_RAX: 0010
    [   29.398045] RAX: ffda RBX: 561c107f672c RCX: 7f939f0286d7
    [   29.405175] RDX: 561c1107c610 RSI: c138fd09 RDI: 0009
    [   29.412309] RBP: 7ffe918df220 R08: 7f939f59d120 R09: 7ffe918defd0
    [   29.419442] R10: 561c1107c6c0 R11: 0202 R12: 7f939f59c4e6
    [   29.426623] R13: 7f939f59c4e6 R14: 7f939f59c4e6 R15: 7f939f59c4e6
    [   29.433778] Modules linked in: dm_snapshot dm_bufio dm_zero nls_iso8859_1 ipmi_ssif input_leds amd64_edac_mod edac_mce_amd joydev kvm_amd kvm ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core bcache crc64 

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-16 Thread Thadeu Lima de Souza Cascardo
After reverting commit 79ffe7107b13042c69c4a06394175362121b06b5
(upstream commit d38a2b7a9c939e6d7329ab92b96559ccebf7b135) ("mm:
memcg/slab: fix memory leak at non-root kmem_cache destroy"), things
seem to go back to normal.

The probable reason this one causes a problem is that it has:

@@ -326,6 +326,14 @@ int slab_unmergeable(struct kmem_cache *s)
         if (s->refcount < 0)
                 return 1;
 
+#ifdef CONFIG_MEMCG_KMEM
+        /*
+         * Skip the dying kmem_cache.
+         */
+        if (s->memcg_params.dying)
+                return 1;
+#endif
+
         return 0;
 }

So, this causes the same-sized slab to become unmergeable, and when a
new slab is created, it will fail creating the sysfs entry.

I haven't investigated why memcg is at play here or why this cache would
be dying; that would involve memcgs being removed during the operation.
But not allowing merges will certainly cause problems here.

The other issue is that this memcg code has been totally
replaced/discarded on 5.9, so it will make things interesting trying to
upstream a proper fix here.

But considering the commit fixes a leak and here we will have a different
leak and failures to create slabs, the revert is preferable for now.

Cascardo.

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-10 Thread William Grant
As expected, it's easy enough to repro with raw dm_snapshot:

  root@bug-1894780-focal-3:~# for f in base snap1 snap2; do dd if=/dev/zero of=$f.img bs=1M count=1 seek=512; done
  root@bug-1894780-focal-3:~# losetup -f base.img
  root@bug-1894780-focal-3:~# losetup -f snap1.img
  root@bug-1894780-focal-3:~# losetup -f snap2.img
  root@bug-1894780-focal-3:~# dmsetup create snap-base --table "0 524288 snapshot-origin /dev/loop3"
  root@bug-1894780-focal-3:~# for i in 4 5; do dmsetup create snap-$i --table "0 524288 snapshot /dev/mapper/snap-base /dev/loop$i P 16" & done

It works fine once you sneak a "sleep 1" into the loop.
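
The difference between the crashing and the working loop can be sketched
as a dry run. This is a minimal illustration, not a command from the
report: print_snapshot_cmds is a hypothetical helper that only prints the
dmsetup invocations instead of running them, and placing the sleep
directly after each backgrounded command is an assumption about where the
"sleep 1" was added.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not from the bug report): print the
# snapshot-creation commands from the reproducer without executing them.
# $1 is a delay in seconds; the remaining arguments are loop device numbers.
# A delay of 0 prints the racy variant; a positive delay staggers the jobs.
print_snapshot_cmds() {
    delay=$1
    shift
    for i in "$@"; do
        printf 'dmsetup create snap-%s --table "0 524288 snapshot /dev/mapper/snap-base /dev/loop%s P 16" &\n' "$i" "$i"
        # Assumed placement of the sleep: right after each background job.
        if [ "$delay" -gt 0 ]; then
            printf 'sleep %s\n' "$delay"
        fi
    done
    # Let the caller's script wait for all background jobs to finish.
    printf 'wait\n'
}
```

Calling print_snapshot_cmds 0 4 5 prints the parallel variant, and
print_snapshot_cmds 1 4 5 prints the staggered one. The output can be
reviewed and then run as root in a disposable VM.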

Re: [Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-09 Thread William Grant
On 10/9/20 7:44 am, Jay Vosburgh wrote:
> wgrant, you said:
> 
> That :a-152 is meant to be /sys/kernel/slab/:a-152. Even a
> working kernel shows some trouble there:
> 
>   $ uname -a
>   Linux  5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>   $ ls -l /sys/kernel/slab | grep a-152
>   lrwxrwxrwx 1 root root 0 Sep 8 03:20 dm_bufio_buffer -> :a-152
> 
> Are you saying that the symlink is "some trouble" here?  Because that
> part isn't an error, that's the effect of slab merge (that the kernel
> normally treats all slabs of the same size as one big slab with multiple
> references, more or less).

The symlink itself is indeed not a bug. But there's one reference, and
the thing it's referencing doesn't exist. I don't think that symlink
should be dangling.
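
The dangling alias described above can be checked with a short script.
This is a minimal sketch, not something taken from the bug report:
dangling_slab_aliases is a hypothetical helper name, and the only
assumption about the system is that SLUB exposes its caches under
/sys/kernel/slab.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): list symlinks in a
# directory whose target does not exist.
# $1 is the directory to scan; it defaults to /sys/kernel/slab.
dangling_slab_aliases() {
    dir=${1:-/sys/kernel/slab}
    for entry in "$dir"/*; do
        # -L is true for a symlink; -e follows the link, so it is false
        # when the link target is missing.
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s -> %s\n' "$(basename "$entry")" "$(readlink "$entry")"
        fi
    done
}
```

Run without arguments it scans /sys/kernel/slab; passing another
directory makes it easy to try on a scratch tree first.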

> Slab merge can be disabled via "slab_nomerge" on the command line.

Thanks for the slab_nomerge hint. That gets 5.4.0-47 to boot, but
dm_bufio_buffer interestingly doesn't show up in /proc/slabinfo or
/sys/kernel/slab at all, unlike in earlier kernels. There's no 152-byte
slab:

  $ sudo cat /sys/kernel/slab/*/slab_size | grep ^152$
  $

I've also just reproduced this on a second host by rebooting it into the
same updated kernel -- identical hardware except for a couple of things
like SSDs, and fairly similar software configuration.

... some digging later ...

The trigger on boot is the parallel pvscans launched by
lvm2-pvscan@.service in the presence of several PVs. If I mask that
service, the system boots fine on the updated kernel (without
slab_nomerge). And then this crashes it:

  for i in 259:1 259:2 259:3 8:32 8:48 8:64 8:80; do sudo /sbin/lvm pvscan --cache --activate ay $i & done

I think the key is to have no active VGs with snapshots, then
simultaneously activate two VGs with snapshots.

Armed with that hypothesis, I set up a boring local bionic qemu-kvm
instance, installed linux-generic-hwe-18.04, and reproduced the problem
with a couple of loop devices:

  $ sudo dd if=/dev/zero of=pv1.img bs=1M count=1 seek=1024
  $ sudo dd if=/dev/zero of=pv2.img bs=1M count=1 seek=1024
  $ sudo losetup -f pv1.img
  $ sudo losetup -f pv2.img
  $ sudo vgcreate vg1 /dev/loop0
  $ sudo vgcreate vg2 /dev/loop1
  $ sudo lvcreate --type snapshot -L4M -V10G -n test vg1
  $ sudo lvcreate --type snapshot -L4M -V10G -n test vg2
  $ sudo systemctl mask lvm2-pvscan@.service
  $ sudo reboot

  $ sudo losetup -f pv1.img
  $ sudo losetup -f pv2.img
  $ for i in 7:0 7:1; do sudo /sbin/lvm pvscan --cache --activate ay $i & done
  $ # Be glad if you can still type by this point.

The oops is not 100% reproducible in this configuration, but it seems
fairly reliable with four vCPUs. If not, a few cycles of rebooting and
running those last three commands always worked for me.

The console sometimes remains responsive after the oops, allowing me to
capture good and bad `dmsetup table -v` output. Not sure how helpful
that is, but I've attached an example (from a slightly different
configuration, where each VG has a linear LV with a snapshot,
rather than a snapshot-backed thin LV).


I've also been able to reproduce the fault on a pure focal system, but
it doesn't always happen on boot; lvm2-pvscan@.service (or a manual
pvscan afterwards) fails to activate the VGs. Something is creating
/run/lvm/vgs_online/$VG too early, so pvscan thinks it's already done
and I end up needing to activate them manually later. This seems
unrelated, and only affects a subset of my VMs. But when it happens,
that actually makes it easier to reproduce, since the system boots
without having the unit masked. So you can then crash with just:

  $ for VG in vg1 vg2; do sudo vgchange -ay $VG & done

While debugging locally I also found that groovy with 5.8.0-18 is
affected: when I stopped a VM with PVs on real block devices, the
host (my desktop, on which I nearly lost this email, oops) dutifully ran
pvscan over them, got very sad, and needed to be rebooted with
slab_nomerge to recover:

  [ DO NOT BLINDLY RUN THIS, it may well crash the host. ]
  $ lxc launch --vm ubuntu:focal bug-1894780-focal-2
  $ lxc storage volume create default lvm-1 --type=block size=10GB
  $ lxc storage volume create default lvm-2 --type=block size=10GB
  $ lxc stop bug-1894780-focal-2
  $ lxc storage volume attach default lvm-1 bug-1894780-focal-2 lvm-1
  $ lxc storage volume attach default lvm-2 bug-1894780-focal-2 lvm-2
  $ lxc start bug-1894780-focal-2
  $ lxc exec bug-1894780-focal-2 bash
  # vgcreate vg1 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_lxd_lvm-1
  # vgcreate vg2 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_lxd_lvm-2
  # lvcreate --type snapshot -L4M -V10G -n test vg1
  # lvcreate --type snapshot -L4M -V10G -n test vg2
  # poweroff
  $ # Host sadness here, unless you're somehow immune.
  $ lxc start bug-1894780-focal-2
  $ lxc exec bug-1894780-focal-2 bash
  # for VG in vg1 vg2; do sudo vgchange -ay $VG & done
  # # Guest sadness here.

So 

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-09 Thread Jay Vosburgh
wgrant, you said:

That :a-152 is meant to be /sys/kernel/slab/:a-152. Even a
working kernel shows some trouble there:

  $ uname -a
  Linux  5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  $ ls -l /sys/kernel/slab | grep a-152
  lrwxrwxrwx 1 root root 0 Sep 8 03:20 dm_bufio_buffer -> :a-152

Are you saying that the symlink is "some trouble" here?  Because that
part isn't an error, that's the effect of slab merge (that the kernel
normally treats all slabs of the same size as one big slab with multiple
references, more or less).

Slab merge can be disabled via "slab_nomerge" on the command line.

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-08 Thread William Grant
https://lore.kernel.org/linux-
mm/alpine.lrh.2.02.1806151817130.6...@file01.intranet.prod.int.rdu2.redhat.com/
(2018's "slub: fix failure when we delete and create a slab cache")
looks relevant to similar problems with this particular slub callsite.

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-08 Thread William Grant
[   31.566946] kobject_add_internal failed for :a-152 with -EEXIST, don't try to register things with the same name in the same directory.
[   31.580027] BUG: kernel NULL pointer dereference, address: 0020
[   31.586990] #PF: supervisor read access in kernel mode
[   31.592130] #PF: error_code(0x) - not-present page
[   31.597269] PGD 0 P4D 0 
[   31.599826] Oops:  [#1] SMP NOPTI
[   31.599829] CPU: 103 PID: 2399 Comm: lvm Not tainted 5.4.0-45-generic #49~18.04.2-Ubuntu

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-08 Thread Kleber Sacilotto de Souza
Hi William,

Could you please test with the HWE 5.4.0-45.49~18.04.2 kernel?

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
   Status: New

[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-07 Thread Andy Whitcroft
** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed
