This issue seems to have appeared somewhere between zfs-linux
0.8.4-1ubuntu11 (last known working version) and 0.8.4-1ubuntu16.

When the issue first hit, I had zfs-dkms installed, which was on
0.8.4-1ubuntu16 where as the kernel build had 0.8.4-1ubuntu11. I removed
zfs-dkms to go back to the kernel built version and it was working OK.
linux-image-5.8.0-36-generic is now released on Hirsute with
0.8.4-1ubuntu16 and so now the out of the box kernel is also broken and
I am regularly having problems with this.

linux-image-5.8.0-29-generic: working
linux-image-5.8.0-36-generic: broken

`
lathiat@optane ~/src/zfs[zfs-2.0-release]$ sudo modinfo 
/lib/modules/5.8.0-29-generic/kernel/zfs/zfs.ko|grep version
version: 0.8.4-1ubuntu11

lathiat@optane ~/src/zfs[zfs-2.0-release]$ sudo modinfo 
/lib/modules/5.8.0-36-generic/kernel/zfs/zfs.ko|grep version
version: 0.8.4-1ubuntu16
`

I don't have a good quick/easy reproducer but just using my desktop for
a day or two seems I am likely to hit the issue after a while.

I tried to install the upstream zfs-dkms package for 2.0 to see if I can
bisect the issue on upstream versions but it breaks my boot for some
weird systemd reason I cannot quite figure out as yet.

Looking at the Ubuntu changelog I'd say the fix for
https://bugs.launchpad.net/bugs/1899826 that landed in 0.8.4-1ubuntu13
to backport the 5.9 and 5.10 compataibility patches is a prime suspect
but could also be any other version. I'm going to try and 'bisect'
0.8.4-1ubuntu11 through 0.8.4-1ubuntu16 to figure out which version
actually hit it.

Since the default kernel is now hitting this, there have been 2 more
user reports of the same things in the upstream bug in the past few days
since that kernel landed and I am regularly getting inaccessible files
not just from chrome but even a linux git tree among other things I am
going to raise the priority on this bug to Critical as you lose access
to files so has data loss potential. I have not yet determined if you
can somehow get the data back, so far it's only affected files I can
replace such as cache/git files. It seems like snapshots might be OK
(which would make sense).

** Changed in: zfs-linux (Ubuntu)
   Importance: High => Critical

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

Status in zfs-linux package in Ubuntu:
  Confirmed

Bug description:
  Since today while running Ubuntu 21.04 Hirsute I started getting a ZFS
  panic in the kernel log which was also hanging Disk I/O for all
  Chrome/Electron Apps.

  I have narrowed down a few important notes:
  - It does not happen with module version 0.8.4-1ubuntu11 built and included 
with 5.8.0-29-generic

  - It was happening when using zfs-dkms 0.8.4-1ubuntu16 built with DKMS
  on the same kernel and also on 5.8.18-acso (a custom kernel).

  - For whatever reason multiple Chrome/Electron apps were affected,
  specifically Discord, Chrome and Mattermost. In all cases they seem
  (but I was unable to strace the processes so it was a bit hard ot
  confirm 100% but by deduction from /proc/PID/fd and the hanging ls)
  they seem hung trying to open files in their 'Cache' directory, e.g.
  ~/.cache/google-chrome/Default/Cache and ~/.config/Mattermost/Cache ..
  while the issue was going on I could not list that directory either
  "ls" would just hang.

  - Once I removed zfs-dkms only to revert to the kernel built-in
  version it immediately worked without changing anything, removing
  files, etc.

  - It happened over multiple reboots and kernels every time, all my
  Chrome apps weren't working but for whatever reason nothing else
  seemed affected.

  - It would log a series of spl_panic dumps into kern.log that look like this:
  Dec  2 12:36:42 optane kernel: [   72.857033] VERIFY(0 == 
sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) 
failed
  Dec  2 12:36:42 optane kernel: [   72.857036] PANIC at 
zfs_znode.c:335:zfs_znode_sa_init()

  I could only find one other google reference to this issue, with 2 other 
users reporting the same error but on 20.04 here:
  https://github.com/openzfs/zfs/issues/10971

  - I was not experiencing the issue on 0.8.4-1ubuntu14 and fairly sure
  it was working on 0.8.4-1ubuntu15 but broken after upgrade to
  0.8.4-1ubuntu16. I will reinstall those zfs-dkms versions to verify
  that.

  There were a few originating call stacks but the first one I hit was

  Call Trace:
   dump_stack+0x74/0x95
   spl_dumpstack+0x29/0x2b [spl]
   spl_panic+0xd4/0xfc [spl]
   ? sa_cache_constructor+0x27/0x50 [zfs]
   ? _cond_resched+0x19/0x40
   ? mutex_lock+0x12/0x40
   ? dmu_buf_set_user_ie+0x54/0x80 [zfs]
   zfs_znode_sa_init+0xe0/0xf0 [zfs]
   zfs_znode_alloc+0x101/0x700 [zfs]
   ? arc_buf_fill+0x270/0xd30 [zfs]
   ? __cv_init+0x42/0x60 [spl]
   ? dnode_cons+0x28f/0x2a0 [zfs]
   ? _cond_resched+0x19/0x40
   ? _cond_resched+0x19/0x40
   ? mutex_lock+0x12/0x40
   ? aggsum_add+0x153/0x170 [zfs]
   ? spl_kmem_alloc_impl+0xd8/0x110 [spl]
   ? arc_space_consume+0x54/0xe0 [zfs]
   ? dbuf_read+0x4a0/0xb50 [zfs]
   ? _cond_resched+0x19/0x40
   ? mutex_lock+0x12/0x40
   ? dnode_rele_and_unlock+0x5a/0xc0 [zfs]
   ? _cond_resched+0x19/0x40
   ? mutex_lock+0x12/0x40
   ? dmu_object_info_from_dnode+0x84/0xb0 [zfs]
   zfs_zget+0x1c3/0x270 [zfs]
   ? dmu_buf_rele+0x3a/0x40 [zfs]
   zfs_dirent_lock+0x349/0x680 [zfs]
   zfs_dirlook+0x90/0x2a0 [zfs]
   ? zfs_zaccess+0x10c/0x480 [zfs]
   zfs_lookup+0x202/0x3b0 [zfs]
   zpl_lookup+0xca/0x1e0 [zfs]
   path_openat+0x6a2/0xfe0
   do_filp_open+0x9b/0x110
   ? __check_object_size+0xdb/0x1b0
   ? __alloc_fd+0x46/0x170
   do_sys_openat2+0x217/0x2d0
   ? do_sys_openat2+0x217/0x2d0
   do_sys_open+0x59/0x80
   __x64_sys_openat+0x20/0x30

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to