Sorry, a correction to the above: I meant that perhaps something changed since 4.4.11, since the 4.4.10-based xenial kernels seem fine (except for the curious issue with the lowlatency build). On newer builds I've run into some stack traces involving __pthread and occasional failures in the spare tests with "returned 0, expected 75".
There are a few interesting upstream patches that potentially fix these new errors, though this should probably be brought to the attention of upstream if ztest has issues on 0.6.5-release:

  "Skip ctldir znode in zfs_rezget to fix snapdir issues"
  "OpenZFS 6739 - assumption in cv_timedwait_hires"
  "Fix do_div() types in condvar:timeout"

Either way, I think xenial should probably sync with upstream 0.6.5.x, though I understand this is a sensitive matter. I reported the missing "Fix ztest truncated cache file" patch upstream against 0.6.5-release; hopefully it will be queued for the next point release, and maybe these other issues will be discovered and fixed as well.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1587686

Title:
  ZFS: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"

Status in Native ZFS for Linux:
  New
Status in linux package in Ubuntu:
  Incomplete
Status in zfs-linux package in Ubuntu:
  In Progress

Bug description:

Problem: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"

This bug affects the xenial kernel's built-in ZFS as well as the zfs-dkms package. I don't believe ZFS 0.6.3-stable or 0.6.4-release are affected; 0.6.5-release seems to have included the offending commit.

Sorry for the excessive "Affects" tagging. I'm still new to this and unsure of the proper packages to report this against and/or how to properly add the upstream issues/commits.

Upstream bug report:
https://github.com/zfsonlinux/zfs/issues/4129
"ztest can occasionally fail because zdb cannot locate the pool after several hours of run time. This appears to be caused be an empty cache file."
How to reproduce: run ztest repeatedly, for example with a command like this, and it will eventually fail:

  ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z*

(I have /tmp mounted on tmpfs with a 10G limit, but I don't believe this is related in any way, and I've confirmed it's not running out of space.)

Upstream fix:
https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51
Description: Fix ztest truncated cache file
"Commit efc412b updated spa_config_write() for Linux 4.2 kernels to truncate and overwrite rather than rename the cache file. This is the correct fix but it should have only been applied for the kernel build. In user space rename(2) is needed because ztest depends on the cache file."

Associated pull request for the above commit:
https://github.com/zfsonlinux/zfs/pull/4130

I'm not sure why this wasn't backported to the release branch, but it's in zfs master.

I've reproduced this bug on xenial kernels 4.4.0-22-generic, 4.4.0-23-generic, 4.4.0-22-lowlatency, and 4.4.0-23-lowlatency, as well as on various xenial master-next builds.

After applying the above commit to the kernel and building and installing the kernel manually, ztest runs fine. I've also separately tested the commit on the zfs-dkms package, where it also appears to fix the issue.

Note, however, that there may still be other outstanding ztest-related issues upstream, especially when preempt and hires timers are used. I'm currently testing more heavily against lowlatency builds and master-next.
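As a rough sketch, the chained reproduction command could also be written as a loop that stops at the first failing pass. The function name run_ztest_loop and the variables ZTEST_DIR, ZTEST_SECS and ZTEST_PAUSE are illustrative names made up for this example, not part of ZFS; the ztest invocation itself is the same "ztest -T 3600" used in the report. Unlike the original chain, this sketch uses "rm -f" so that a pass which leaves no matching files does not end the loop.

```shell
#!/bin/sh
# Sketch only: repeat "ztest -T <secs>" until a pass fails or the limit is hit.
# run_ztest_loop, ZTEST_DIR, ZTEST_SECS and ZTEST_PAUSE are hypothetical names.

run_ztest_loop() {
    passes=${1:-12}              # number of passes to attempt (assumed default)
    dir=${ZTEST_DIR:-/tmp}       # where the z* files are cleaned up (/tmp in the report)
    secs=${ZTEST_SECS:-3600}     # value passed to ztest -T, as in the report
    pause=${ZTEST_PAUSE:-3}      # seconds to sleep between passes, as in the report

    i=1
    while [ "$i" -le "$passes" ]; do
        echo "pass $i of $passes"
        if ! ztest -T "$secs"; then
            # Leave the z* files in place so the failed run can be inspected.
            echo "ztest failed on pass $i" >&2
            return 1
        fi
        rm -f "$dir"/z*          # same cleanup as "rm /tmp/z*" in the report
        sleep "$pause"
        i=$((i + 1))
    done
    return 0
}
```

Calling "run_ztest_loop 24" after sourcing the file would attempt 24 one-hour passes; the pass number printed on failure gives a rough idea of how long the run took to hit the problem.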
(I'm unsure how to associate this bug with multiple packages, but the zfs-dkms and linux-image-* packages are both affected.)

P.S. Also of note is https://github.com/zfsonlinux/zfs/commit/60a4ea3f948f1596b92b666fc7dd21202544edbb "Fix inverted logic on none elevator comparison", which interestingly was signed off by Canonical but curiously not included in the xenial kernel or zfs-dkms packages. It was, however, backported to 0.6.5-release upstream.