Some additional info about the deadlock:
crash> bt 16588
PID: 16588 TASK: ffff9ffd7f332b00 CPU: 1 COMMAND: "bcache_allocato"
[exception RIP: bch_crc64+57]
RIP: ffffffffc093b2c9 RSP: ffffab9585767e28 RFLAGS: 00000286
RAX: f1f51403756de2bd RBX: 0000000000000000 RCX: 0000000000000065
RDX: 0000000000000065 RSI: ffff9ffd63980000 RDI: ffff9ffd63925346
RBP: ffffab9585767e28 R8: ffffffffc093db60 R9: ffffab9585739000
R10: 000000000000007f R11: 000000001ffef001 R12: 0000000000000000
R13: 0000000000000008 R14: ffff9ffd63900000 R15: ffff9ffd683d0000
CS: 0010 SS: 0018
#0 [ffffab9585767e30] bch_prio_write at ffffffffc09325c0 [bcache]
#1 [ffffab9585767eb0] bch_allocator_thread at ffffffffc091bdc5 [bcache]
#2 [ffffab9585767f08] kthread at ffffffffa80b2481
#3 [ffffab9585767f50] ret_from_fork at ffffffffa8a00205
crash> bt 14658
PID: 14658 TASK: ffff9ffd7a9f0000 CPU: 0 COMMAND: "python3"
#0 [ffffab958380bb48] __schedule at ffffffffa89ae441
#1 [ffffab958380bbe8] schedule at ffffffffa89aea7c
#2 [ffffab958380bbf8] bch_bucket_alloc at ffffffffc091c370 [bcache]
#3 [ffffab958380bc68] __bch_bucket_alloc_set at ffffffffc091c5ce [bcache]
#4 [ffffab958380bcb8] bch_bucket_alloc_set at ffffffffc091c66e [bcache]
#5 [ffffab958380bcf8] __uuid_write at ffffffffc0931b69 [bcache]
#6 [ffffab958380bda0] bch_uuid_write at ffffffffc0931f76 [bcache]
#7 [ffffab958380bdc0] __cached_dev_store at ffffffffc0937c08 [bcache]
#8 [ffffab958380be20] bch_cached_dev_store at ffffffffc0938309 [bcache]
#9 [ffffab958380be50] sysfs_kf_write at ffffffffa830c97c
#10 [ffffab958380be60] kernfs_fop_write at ffffffffa830c3e5
#11 [ffffab958380bea0] __vfs_write at ffffffffa827e5bb
#12 [ffffab958380beb0] vfs_write at ffffffffa827e781
#13 [ffffab958380bee8] sys_write at ffffffffa827e9fc
#14 [ffffab958380bf30] do_syscall_64 at ffffffffa8003b03
#15 [ffffab958380bf50] entry_SYSCALL_64_after_hwframe at ffffffffa8a00081
RIP: 00007faffc7bd154 RSP: 00007ffe307cbc88 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007faffc7bd154
RDX: 0000000000000008 RSI: 00000000011ce7f0 RDI: 0000000000000003
RBP: 00007faffccb86c0 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000000000 R14: 00000000011ce7f0 R15: 0000000000f33e60
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
In this case the task "python3" (pid 14658) gets stuck in a wait inside
bch_bucket_alloc() that never completes. The task that should wake
"python3" from this wait is "bcache_allocator" (pid 16588), but the
wakeup never happens, because bcache_allocator is itself stuck in this
"retry_invalidate" busy loop:
static int bch_allocator_thread(void *arg)
{
        ...
retry_invalidate:
                allocator_wait(ca, ca->set->gc_mark_valid &&
                               !ca->invalidate_needs_gc);
                invalidate_buckets(ca);

                /*
                 * Now, we write their new gens to disk so we can start writing
                 * new stuff to them:
                 */
                allocator_wait(ca, !atomic_read(&ca->set->prio_blocked));
                if (CACHE_SYNC(&ca->set->sb)) {
                        /*
                         * This could deadlock if an allocation with a btree
                         * node locked ever blocked - having the btree node
                         * locked would block garbage collection, but here we're
                         * waiting on garbage collection before we invalidate
                         * and free anything.
                         *
                         * But this should be safe since the btree code always
                         * uses btree_check_reserve() before allocating now, and
                         * if it fails it blocks without btree nodes locked.
                         */
                        if (!fifo_full(&ca->free_inc))
                                goto retry_invalidate;

                        if (bch_prio_write(ca, false) < 0) {
                                ca->invalidate_needs_gc = 1;
                                wake_up_gc(ca->set);
                                goto retry_invalidate;
                        }
                }
        ...
}
The exact code path is this: bch_prio_write() fails because the
bch_bucket_alloc() it calls fails (out of free buckets); it then wakes
up the garbage collector (trying to free up some buckets) and jumps
back to retry_invalidate, but apparently that is not enough:
bch_prio_write() fails over and over again (no buckets ever become
available), and the thread is unable to break out of the busy loop =>
deadlock.
Looking more closely at the code, it seems safe to resume the
bcache_allocator main loop when bch_prio_write() fails (still keeping
the wake-up event to the garbage collector), instead of jumping back to
retry_invalidate. This should give the allocator a better chance to
free up some buckets, possibly preventing the "out of buckets" deadlock
condition.
I'm currently testing a kernel with this change; if I can't trigger any
deadlock in the next hour or so, I'll upload a new test kernel.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1796292
Title:
Tight timeout for bcache removal causes spurious failures
Status in curtin:
Fix Released
Status in linux package in Ubuntu:
Confirmed
Status in linux source package in Bionic:
Confirmed
Status in linux source package in Cosmic:
Confirmed
Status in linux source package in Disco:
Confirmed
Status in linux source package in Eoan:
Confirmed
Bug description:
I've had a number of deployment faults where curtin would report
"Timeout exceeded for removal of /sys/fs/bcache/xxx" when doing a
mass-deployment of 30+ nodes. Upon retrying, the node would usually
deploy fine. Experimentally I've set the timeout ridiculously high, and
it seems I'm getting no faults with this. I'm wondering if the timeout
for removal is set too tight, or might need to be made configurable.
--- curtin/util.py~     2018-05-18 18:40:48.000000000 +0000
+++ curtin/util.py      2018-10-05 09:40:06.807390367 +0000
@@ -263,7 +263,7 @@
         return _subp(*args, **kwargs)


-def wait_for_removal(path, retries=[1, 3, 5, 7]):
+def wait_for_removal(path, retries=[1, 3, 5, 7, 1200, 1200]):
     if not path:
         raise ValueError('wait_for_removal: missing path parameter')
To manage notifications about this bug go to:
https://bugs.launchpad.net/curtin/+bug/1796292/+subscriptions