[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
** Changed in: zfs-linux (Ubuntu) Assignee: Colin Ian King (colin-king) => Dimitri John Ledkov (xnox) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Incomplete Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_thread+0x27a/0x310 [zfs] [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs] [184383.480344] thread_generic_wrapper+0x83/0xa0 [spl]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
root@zfs-01:/# dpkg -l |grep -i zfs ii libzfs2linux 0.8.3-1ubuntu12.2 amd64OpenZFS filesystem library for Linux ii libzpool2linux 0.8.3-1ubuntu12.2 amd64OpenZFS pool library for Linux ii zfs-auto-snapshot1.2.4-2 all ZFS automatic snapshot service ii zfs-zed 0.8.3-1ubuntu12.2 amd64OpenZFS Event Daemon ii zfsutils-linux 0.8.3-1ubuntu12.2 amd64command-line tools to manage OpenZFS filesystems -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: New Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
When this happens, do you see "arc prune" or "z_upgrade" processes in top, using a lot of cpu? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: New Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_thread+0x27a/0x310 [zfs] [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs] [184383.480344] thread_generic_wrapper+0x83/0xa0 [spl]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
Neither of those ring a bell. I'm trying to figure out how to reproduce as the load was very minimal at the time. There might have been a copy of a few large video files (15-50GB?)from 1 zfs array to another. Is there a way to trigger a snapshot of top next time this happens? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: New Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_thread+0x27a/0x310 [zfs] [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
I was able to capture the attached log messages during the last time window. I was doing a scrub on 2 different pools simultaneously. This may not be related to zfs, but may be related to the mpt3sas driver or the lsi card itself. It sort of looks to me like the mpt3sas driver is resetting the lsi card, which looks like a power-on event, which drops the connections to the sas switch. Is that a correct assessment? ** Attachment added: "scsireset_log_capture.txt" https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1889110/+attachment/5400025/+files/scsireset_log_capture.txt -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: New Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_thread+0x27a/0x310 [zfs]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
Hi Rob, did the suggestions in comments #8 and #9 help? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Incomplete Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_thread+0x27a/0x310 [zfs] [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs] [184383.480344] thread_generic_wrapper+0x83/0xa0 [spl]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
So it may be that the write-back-throttling (wbt) for the underlying devices is getting confused about the exact throttle rates are for these devices and somehow getting stuck. It maybe worth experimenting by disabling the throttling and seeing if this gets I/O working again. For example, to disable wbt for a device /dev/sda use: echo 0 | sudo tee /sys/block/sda/queue/wbt_lat_usec and if you need to reset it back to the default: echo -1 | sudo tee /sys/block/sda/queue/wbt_lat_usec ..use the appropriate block device name for the block devices you have attached. It may even be worth setting the wbt_lat_usec to 0 for all the block devices in your pool as early as possible after boot and see if this helps. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Incomplete Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
I would say this is still an open issue. There is a long discussion on github about it and seems like there's currently no solution (yet?) https://github.com/openzfs/zfs/issues/9130 ** Bug watch added: github.com/openzfs/zfs/issues #9130 https://github.com/openzfs/zfs/issues/9130 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Incomplete Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_thread+0x27a/0x310 [zfs] [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
@Rob, from the logs it does appear that you are seeing underlying issues with the devices and the error is not in ZFS. For example: Aug 7 20:59:54 zfs-01 kernel: [26266.626836] blk_update_request: I/O error, dev sdn, sector 5806167448 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0 Aug 7 20:59:54 zfs-01 kernel: [26266.628131] zio pool=Red-ZFS-RZ2-3TB vdev=/dev/sdn1 error=5 type=1 offset=2972756684800 size=24576 flags=1808b0 Aug 7 20:59:55 zfs-01 kernel: [26267.374865] sd 2:0:14:0: Power-on or device reset occurred Aug 7 20:59:55 zfs-01 kernel: [26267.374882] sd 2:0:14:0: [sdn] tag#5309 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK Aug 7 20:59:55 zfs-01 kernel: [26267.374888] sd 2:0:14:0: [sdn] tag#5309 CDB: Read(16) 88 00 00 00 00 01 5a 13 1a f8 00 00 00 58 00 00 Aug 7 20:59:55 zfs-01 kernel: [26267.374894] sd 2:0:14:0: [sdn] tag#5309 Sense Key : Unit Attention [current] Aug 7 20:59:55 zfs-01 kernel: [26267.374899] sd 2:0:14:0: [sdn] tag#5309 Add. Sense: Power on, reset, or bus device reset occurred So, it may be a physical device error, and/or a combination of issues with the drive and/or controller. It may be worth double checking the H/W at this point. It does appear ZFS gets blocked on device failures. ** Changed in: zfs-linux (Ubuntu) Status: New => Incomplete -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Incomplete Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
Can you double check the H/W and let me know if this resolves the ZFS issue? ** Changed in: zfs-linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Incomplete Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_thread+0x27a/0x310 [zfs] [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs] [184383.480344] thread_generic_wrapper+0x83/0xa0 [spl]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
** Changed in: zfs-linux (Ubuntu) Importance: Undecided => Medium -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Incomplete Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_thread+0x27a/0x310 [zfs] [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs] [184383.480344] thread_generic_wrapper+0x83/0xa0 [spl]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
Hi Colin, It seems to happen on 2 different controllers (both LSI). I'm using the 9201-16e at the moment and it performs much faster overall, but doing something like a zfs scrub on a pool still causes the resets and thus zfs locks as part of those. Basically seems to be under heavy IO load. Although, this controller can handle much heavier IO that just a single scrub. For example, it seems much more likely to happen on my 3.5" 2TB and 3TB HDD drives, as opposed to my 2.5" 1TB SSD's. The SSD's scream through a scrub in about an hour, whereas the HDD's take a day or more. The only thing I can think of is maybe to increase direct cooling on the controller in case it's overheating. But this is a Dell R710 server chassis with lots of high volume airflow. It's really hard to pinpoint the problem between controller, driver, and filesystem. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Incomplete Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
Hi Rob, It may be worth exercising the raw device by performing a lot of random I/O reads rather that via zfs, that way we can eliminate zfs out of the equation to see if it's a controller or driver issue and not a zfs error. That way we can at least use occams razor to remove a lot of software complexity from the debugging process. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Incomplete Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_thread+0x27a/0x310 [zfs] [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
The write-back-throttling didn't do anything different. I have not had an opportunity yet to power down the unit and install extra fans directly on the card to see if that helps. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Incomplete Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_thread+0x27a/0x310 [zfs] [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs] [184383.480344] thread_generic_wrapper+0x83/0xa0 [spl]
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
Hi Rob, is this still an open issue? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Incomplete Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_thread+0x27a/0x310 [zfs] [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs] [184383.480344] thread_generic_wrapper+0x83/0xa0 [spl] [18
[Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
** Changed in: zfs-linux (Ubuntu) Assignee: Dimitri John Ledkov (xnox) => (unassigned) ** Changed in: zfs-linux (Ubuntu) Status: Incomplete => Invalid -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Invalid Bug description: ZFS filesystem becomes unresponsive and subsequent NFS shares unresponsive. ESXi sees all paths down. See this error 3 times in a row. [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds. [184383.479565] Tainted: P IO 5.4.0-42-generic #46-Ubuntu [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [184383.479655] txg_syncD0 4307 2 0x80004000 [184383.479658] Call Trace: [184383.479670] __schedule+0x2e3/0x740 [184383.479673] schedule+0x42/0xb0 [184383.479676] schedule_timeout+0x152/0x2f0 [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 [184383.479685] io_schedule_timeout+0x1e/0x50 [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] [184383.479702] ? wait_woken+0x80/0x80 [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] [184383.479816] zio_wait+0x11b/0x230 [zfs] [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] [184383.480156] spa_sync+0x312/0x5b0 [zfs] [184383.480245] txg_sync_thread+0x27a/0x310 [zfs] [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs] [184383.480344] thread_generic_wrapper+0x83/0xa0 [spl]
Re: [Kernel-packages] [Bug 1889110] Re: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. "
Probably not. System has been pretty stable lately and I haven't been pushing it too hard either. On Fri, Apr 9, 2021 at 10:35 AM Colin Ian King <1889...@bugs.launchpad.net> wrote: > Hi Rob, is this still an open issue? > > -- > You received this bug notification because you are subscribed to the bug > report. > https://bugs.launchpad.net/bugs/1889110 > > Title: > zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than > 120 seconds. " > > Status in zfs-linux package in Ubuntu: > Incomplete > > Bug description: > ZFS filesystem becomes unresponsive and subsequent NFS shares > unresponsive. ESXi sees all paths down. > > See this error 3 times in a row. > > > [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 > seconds. > > [184383.479565] Tainted: P IO 5.4.0-42-generic > #46-Ubuntu > > [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > > [184383.479655] txg_syncD0 4307 2 0x80004000 > > > [184383.479658] Call Trace: > > > [184383.479670] __schedule+0x2e3/0x740 > > > [184383.479673] schedule+0x42/0xb0 > > > [184383.479676] schedule_timeout+0x152/0x2f0 > > > [184383.479683] ? __next_timer_interrupt+0xe0/0xe0 > > > [184383.479685] io_schedule_timeout+0x1e/0x50 > > > [184383.479697] __cv_timedwait_common+0x15e/0x1c0 [spl] > > > [184383.479702] ? wait_woken+0x80/0x80 > > > [184383.479710] __cv_timedwait_io+0x19/0x20 [spl] > > > [184383.479816] zio_wait+0x11b/0x230 [zfs] > > > [184383.479905] ? __raw_spin_unlock+0x9/0x10 [zfs] > > > [184383.479983] dsl_pool_sync+0xbc/0x410 [zfs] > > > [184383.480069] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs] > > > [184383.480156] spa_sync+0x312/0x5b0 [zfs] > > > [184383.480245] txg_sync_thread+0x27a/0x310 [zfs] > > > [184383.480334] ? txg_dispatch_callbacks+0x100/0x100 [zfs] > > > [184383.480344] thread_generic_wrapper+0x83/0xa0 [spl] > > > [184383.480347] kthread+0x104/0x140 > > > [184383.480356] ? clear_bit+0x20/0x20 [spl] > > > [184383.480358] ? kthread_park+0x90/0x90 > > > [184383.480361] ret_from_fork+0x35/0x40 > > > > Then nfsd hangs as well. > > > [184866.787445] INFO: task nfsd:6585 blocked for more than 120 seconds. > [184866.787485] Tainted: P IO 5.4.0-42-generic > #46-Ubuntu > [184866.787526] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [184866.787573] nfsdD0 6585 2 0x80004000 > [184866.787575] Call Trace: > [184866.787578] __schedule+0x2e3/0x740 > [184866.787675] ? __raw_spin_unlock+0x9/0x10 [zfs] > [184866.787678] schedule+0x42/0xb0 > [184866.787685] cv_wait_common+0x133/0x180 [spl] > [184866.787688] ? wait_woken+0x80/0x80 > [184866.787695] __cv_wait+0x15/0x20 [spl] > [184866.787764] dmu_tx_wait+0x1ee/0x210 [zfs] > [184866.787834] dmu_tx_assign+0x49/0x70 [zfs] > [184866.787929] zfs_write+0x461/0xd40 [zfs] > [184866.788025] ? atomic_sub_return.constprop.0+0xd/0x20 [zfs] > [184866.788033] ? atomic_dec+0xd/0x20 [spl] > [184866.788116] ? __raw_spin_unlock+0x9/0x10 [zfs] > [184866.788122] ? __d_obtain_alias+0x36/0x90 > [184866.788217] zpl_write_common_iovec+0xad/0x120 [zfs] > [184866.788313] zpl_iter_write_common+0x8e/0xb0 [zfs] > [184866.788409] zpl_iter_write+0x56/0x90 [zfs] > [184866.788413] do_iter_readv_writev+0x14f/0x1d0 > [184866.788416] do_iter_write+0x84/0x1a0 > [184866.788418] vfs_iter_write+0x19/0x30 > [184866.788442] nfsd_vfs_write+0xe0/0x480 [nfsd] > [184866.788454] nfsd_write+0x7a/0x160 [nfsd] > [184866.788458] ? kmem_cache_alloc+0x16d/0x230 > [184866.788472] nfsd3_proc_write+0xc3/0x170 [nfsd] > [184866.788483] nfsd_dispatch+0xd6/0x220 [nfsd] > [184866.788508] svc_process_common+0x3af/0x700 [sunrpc] > [184866.788527] ? svc_sock_secure_port+0x16/0x30 [sunrpc] > [184866.788538] ? nfsd_svc+0x2d0/0x2d0 [nfsd] > [184866.788557] svc_process+0xd9/0x110 [sunrpc] > [184866.788568] nfsd+0xe8/0x150 [nfsd] > [184866.788570] kthread+0x104/0x140 > [184866.788581] ? nfsd_destroy+0x60/0x60 [nfsd] > [184866.788583] ? kthread_park+0x90/0x90 > [184866.788585] ret_from_fork+0x35/0x40 > > > Linux zfs-01 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC > 2020 x86_64 x86_64 x86_64 GNU/Linux > > root@zfs-01:/# lsb_release -rd > Description:Ubuntu 20.04 LTS > Release:20.04 > > To manage notifications about this bug go to: > > https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1889110/+subscriptions > -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1889110 Title: zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than 120 seconds. " Status in zfs-linux package in Ubuntu: Incomplete Bug description: ZFS filesystem becomes unresponsiv