Thanks a ton for the tests, Ryan! Well, at least the hung-task timeout trace is different, so we're making some progress.
With the new kernel it seems that we're stuck in bch_bucket_alloc(). I've identified other upstream fixes that could help prevent this problem. If you're willing to do a few more tests, here's a new test kernel (based on 4.15.0-54-generic plus a set of upstream bcache fixes):

https://kernel.ubuntu.com/~arighi/LP-1796292/4.15.0-54.58+lp1796292/

And, just in case, I've also applied the same set of fixes to the latest bionic master-next:

https://kernel.ubuntu.com/~arighi/LP-1796292/4.15.0-55.60+lp1796292/

Testing these two kernels should give us more information about the nature of the problem.

--
https://bugs.launchpad.net/bugs/1796292

Title:
  Tight timeout for bcache removal causes spurious failures

Status in curtin:
  Fix Released
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  New
Status in linux source package in Cosmic:
  New
Status in linux source package in Disco:
  New
Status in linux source package in Eoan:
  Confirmed

Bug description:

I've had a number of deployment faults where curtin would report "Timeout exceeded for removal of /sys/fs/bcache/xxx" when doing a mass-deployment of 30+ nodes. Upon retrying, the node would usually deploy fine.

Experimentally, I've set the timeout ridiculously high, and I seem to get no faults with that. I'm wondering if the timeout for removal is set too tight, or might need to be made configurable.
--- curtin/util.py~	2018-05-18 18:40:48.000000000 +0000
+++ curtin/util.py	2018-10-05 09:40:06.807390367 +0000
@@ -263,7 +263,7 @@
     return _subp(*args, **kwargs)


-def wait_for_removal(path, retries=[1, 3, 5, 7]):
+def wait_for_removal(path, retries=[1, 3, 5, 7, 1200, 1200]):
     if not path:
         raise ValueError('wait_for_removal: missing path parameter')
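To see why the default schedule is tight, it helps to add up the sleeps in the retries list. The sketch below is a hypothetical re-implementation written for illustration only, not curtin's actual code. It assumes wait_for_removal polls for the path, sleeps retries[i] between checks, does one final check and then raises the "Timeout exceeded" error. The names wait_for_removal_sketch and max_wait_seconds, and the injectable sleep parameter, are made up for this example.

```python
import os
import time
from typing import Callable, Sequence

# Default schedule from the diff; a tuple avoids the
# mutable-default-argument pitfall of the original list.
DEFAULT_RETRIES = (1, 3, 5, 7)


def max_wait_seconds(retries: Sequence[float]) -> float:
    """Total sleep time before giving up if the path never disappears."""
    return sum(retries)


def wait_for_removal_sketch(path: str,
                            retries: Sequence[float] = DEFAULT_RETRIES,
                            sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll until `path` disappears, sleeping retries[i] between checks.

    Hypothetical illustration of the assumed behaviour; not curtin's code.
    """
    if not path:
        raise ValueError('wait_for_removal: missing path parameter')
    for wait in retries:
        if not os.path.exists(path):
            return  # removed before we ran out of retries
        sleep(wait)
    # One final check after the last sleep.
    if not os.path.exists(path):
        return
    raise OSError('Timeout exceeded for removal of %s' % path)


if __name__ == '__main__':
    print(max_wait_seconds(DEFAULT_RETRIES))           # 16
    print(max_wait_seconds((1, 3, 5, 7, 1200, 1200)))  # 2416
```

Under this assumption, the default schedule gives a bcache device only about 16 seconds to go away, while the patched list allows roughly 40 minutes. That would be consistent with the reporter seeing spurious timeouts when 30+ nodes deploy at once and no faults with the very high value. Passing sleep=list.append in a test lets you inspect the schedule without actually sleeping.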