Excerpts from Stephane Chazelas's message of 2011-07-08 11:41:23 -0400:
> 2011-07-08 11:06:08 -0400, Chris Mason:
> [...]
> > So the invalid opcode in btrfs-fixup-0 is the big problem.  We're
> > either failing to write because we weren't able to allocate memory (and
> > not dealing with it properly) or there is a bigger problem.
> > 
> > Does the btrfs-fixup-0 oops come before or after the ooms?
> 
> Hi Chris, thanks for looking into this.
> 
> It comes long before. Hours before there's any problem. So it
> seems unrelated.

It could be the cause of the problem.  We're calling BUG() with the page
locked, which means that page will never ever be freed, and since this
worker thread is gone, it could be messing up various parts of the
reclaim code.

But this worker thread isn't supposed to get called very often... it's
really catching a corner of a corner of a strange race.  So we do need
to get a better understanding of why it's being called and how often.

You described this workload as rsync.  Is there anything else running?

I'd definitely try without -o compress_force.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
