Hi Colin,

It seems to happen on two different controllers (both LSI). I'm using the 9201-16e at the moment and it performs much faster overall, but doing something like a zfs scrub on a pool still causes the resets, and the zfs lockups follow from those. It basically seems to happen under heavy IO load, although this controller can handle much heavier IO than just a single scrub. It also seems much more likely to happen on my 3.5" 2TB and 3TB HDDs than on my 2.5" 1TB SSDs. The SSDs scream through a scrub in about an hour, whereas the HDDs take a day or more.
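One way to check whether the hangs really line up with scrub activity would be to pull the hung-task reports out of a saved kernel log and compare their timestamps with the scrub start times. The following is only a minimal sketch: the function names and the regular expression are my own illustration (not part of any ZFS or kernel tooling), and it assumes the log lines look like the "blocked for more than 120 seconds" reports quoted in the bug description.

```python
import re
from collections import Counter

# Assumed line format, taken from the reports in this bug:
# [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (timestamp, task name, pid, timeout) for each hung-task report."""
    return [
        (float(m["ts"]), m["name"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_TASK.finditer(log_text)
    ]


def count_by_task(reports):
    """Count how many hung-task reports each task name produced."""
    return Counter(name for _, name, _, _ in reports)


# Two report headers copied from the bug description, used as sample input.
SAMPLE = (
    "[184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds.\n"
    "[184866.787445] INFO: task nfsd:6585 blocked for more than 120 seconds.\n"
)

reports = find_hung_tasks(SAMPLE)
for ts, name, pid, secs in reports:
    print(f"{ts:.6f}  {name}:{pid}  blocked > {secs}s")
```

Running the same functions over the full saved log instead of SAMPLE should show whether txg_sync is always the first task to block and roughly how long after a scrub starts it happens.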
The only thing I can think of is maybe to increase direct cooling on the controller in case it's overheating. But this is a Dell R710 server chassis with lots of high volume airflow. It's really hard to pinpoint the problem between controller, driver, and filesystem.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1889110

Title:
  zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than
  120 seconds."

Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  ZFS filesystem becomes unresponsive and subsequent NFS shares
  unresponsive. ESXi sees all paths down.

  See this error 3 times in a row.

  [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds.
  [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu
  [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [184383.479655] txg_sync D 0 4307 2 0x80004000
  [184383.479658] Call Trace:
  [184383.479670] __schedule+0x2e3/0x740
  [184383.479673] schedule+0x42/0xb0
  [184383.479676] schedule_timeout+0x152/0x2f0
  [184383.479683] ? __next_timer_interrupt+0xe0/0xe0
  [184383.479685] io_schedule_timeout+0x1e/0x50
  [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl]
  [184383.479702] ? wait_woken+0x80/0x80
  [184383.479710] __cv_timedwait_io+0x19/0x20 [spl]
  [184383.479816] zio_wait+0x11b/0x230 [zfs]
  [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs]
  [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs]
  [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs]
  [184383.480156] spa_sync+0x312/0x5b0 [zfs]
  [184383.480245] txg_sync_thread+0x27a/0x310 [zfs]
  [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
  [184383.480344] thread_generic_wrapper+0x83/0xa0 [spl]
  [184383.480347] kthread+0x104/0x140
  [184383.480356] ? clear_bit+0x20/0x20 [spl]
  [184383.480358] ? kthread_park+0x90/0x90
  [184383.480361] ret_from_fork+0x35/0x40

  Then nfsd hangs as well.

  [184866.787445] INFO: task nfsd:6585 blocked for more than 120 seconds.
  [184866.787485] Tainted: P IO 5.4.0-42-generic #46-Ubuntu
  [184866.787526] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [184866.787573] nfsd D 0 6585 2 0x80004000
  [184866.787575] Call Trace:
  [184866.787578] __schedule+0x2e3/0x740
  [184866.787675] ? __raw_spin_unlock+0x9/0x10 [zfs]
  [184866.787678] schedule+0x42/0xb0
  [184866.787685] cv_wait_common+0x133/0x180 [spl]
  [184866.787688] ? wait_woken+0x80/0x80
  [184866.787695] __cv_wait+0x15/0x20 [spl]
  [184866.787764] dmu_tx_wait+0x1ee/0x210 [zfs]
  [184866.787834] dmu_tx_assign+0x49/0x70 [zfs]
  [184866.787929] zfs_write+0x461/0xd40 [zfs]
  [184866.788025] ? atomic_sub_return.constprop.0+0xd/0x20 [zfs]
  [184866.788033] ? atomic_dec+0xd/0x20 [spl]
  [184866.788116] ? __raw_spin_unlock+0x9/0x10 [zfs]
  [184866.788122] ? __d_obtain_alias+0x36/0x90
  [184866.788217] zpl_write_common_iovec+0xad/0x120 [zfs]
  [184866.788313] zpl_iter_write_common+0x8e/0xb0 [zfs]
  [184866.788409] zpl_iter_write+0x56/0x90 [zfs]
  [184866.788413] do_iter_readv_writev+0x14f/0x1d0
  [184866.788416] do_iter_write+0x84/0x1a0
  [184866.788418] vfs_iter_write+0x19/0x30
  [184866.788442] nfsd_vfs_write+0xe0/0x480 [nfsd]
  [184866.788454] nfsd_write+0x7a/0x160 [nfsd]
  [184866.788458] ? kmem_cache_alloc+0x16d/0x230
  [184866.788472] nfsd3_proc_write+0xc3/0x170 [nfsd]
  [184866.788483] nfsd_dispatch+0xd6/0x220 [nfsd]
  [184866.788508] svc_process_common+0x3af/0x700 [sunrpc]
  [184866.788527] ? svc_sock_secure_port+0x16/0x30 [sunrpc]
  [184866.788538] ? nfsd_svc+0x2d0/0x2d0 [nfsd]
  [184866.788557] svc_process+0xd9/0x110 [sunrpc]
  [184866.788568] nfsd+0xe8/0x150 [nfsd]
  [184866.788570] kthread+0x104/0x140
  [184866.788581] ? nfsd_destroy+0x60/0x60 [nfsd]
  [184866.788583] ? kthread_park+0x90/0x90
  [184866.788585] ret_from_fork+0x35/0x40

  Linux zfs-01 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  root@zfs-01:/# lsb_release -rd
  Description: Ubuntu 20.04 LTS
  Release: 20.04

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1889110/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp