Excerpts from Stephane Chazelas's message of 2011-07-08 11:41:23 -0400:
> 2011-07-08 11:06:08 -0400, Chris Mason:
> [...]
> > So the invalid opcode in btrfs-fixup-0 is the big problem.  We're
> > either failing to write because we weren't able to allocate memory (and
> > not dealing with it properly) or there is a bigger problem.
> >
> > Does the btrfs-fixup-0 oops come before or after the ooms?
>
> Hi Chris, thanks for looking into this.
>
> It comes long before. Hours before there's any problem. So it
> seems unrelated.

It could be the cause of the problem.  We're calling BUG() with the page
locked, which means that page will never ever be freed, and since this
worker thread is gone, it could be messing up various parts of the
reclaim code.

But, this worker thread isn't supposed to get called very often...it's
really catching a corner of a corner of a strange race.  So we do need
to get a better understanding of why and how often.

You described this workload as rsync, is there anything else running?

I'd definitely try without -o compress_force.

-chris