We have two reports of frequent crashes in btrfs where asserts in
clear_page_dirty_for_io() were triggering on missing page locks.

The crashes were much easier to trigger when processes were catching
ctrl-c's, and after much debugging it really looked like lock_page was a
noop.

This recent commit looks pretty suspect to me, and I confirmed that we
were exiting __wait_on_bit_lock() with -EINTR when it was called with
TASK_UNINTERRUPTIBLE

commit 68985633bccb6066bf1803e316fbc6c1f5b796d6
Author: Peter Zijlstra <pet...@infradead.org>
Date:   Tue Dec 1 14:04:04 2015 +0100

    sched/wait: Fix signal handling in bit wait helpers

The patch below is mostly untested, and probably not the right solution.
Dave's trinity run doesn't explode immediately anymore, and I wanted to
get this out for discussion.  A quick look on the list doesn't show
anyone else has tracked this down, sorry if it's a dup.

Reported-by: Dave Jones <d...@fb.com>, 
Reported-by: Jon Christopherson <j...@jons.org>
Signed-off-by: Chris Mason <c...@fb.com>

diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index f10bd87..12f69df 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -434,6 +434,8 @@ __wait_on_bit_lock(wait_queue_head_t *wq, struct 
wait_bit_queue *q,
                ret = action(&q->key);
                if (!ret)
                        continue;
+               if (ret == -EINTR && mode == TASK_UNINTERRUPTIBLE)
+                       continue;
                abort_exclusive_wait(wq, &q->wait, mode, &q->key);
                return ret;
        } while (test_and_set_bit(q->key.bit_nr, q->key.flags));
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to