On Tue, 18 Jul 2000, Linus Torvalds wrote:
>>  I managed to reproduce this and, at least for me, it is caused by a
>>  deadlock when kflushd tries to write out data via raid1, raid1 tries
>>  to allocate memory, which blocks waiting for kflushd to free up some
>>  memory.
>
>Hmm.. This is actually what "GFP_BUFFER" was meant for: GFP_BUFFER is not
>atomic, but it will not block for IO.
>
>So for example, GFP_BUFFER can still walk the page tables and the LRU
>lists (because it's not called from an interrupt context), but it will
>drop only pages that don't need IO to be dropped.
>
>That is, of course, unless GFP_BUFFER has had bit-rot. It's simple enough
>that I don't think it has, and I'd love to hear if your deadlock goes away
>using GFP_BUFFER instead of GFP_ATOMIC, which would be the right thing to
>do..
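
To make sure I was testing the right thing, here is how I read the
distinction described above, as a small user-space sketch.  It is only a
model under my own assumptions, not kernel code: the sketch_* names, the
flag bits and the page pool are all made up for illustration and are not
the kernel's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only; not the kernel's values. */
enum {
    SKETCH_MAY_WAIT = 1 << 0, /* caller may sleep and scan for droppable pages */
    SKETCH_MAY_IO   = 1 << 1  /* caller may start disk IO and wait for it */
};

#define SKETCH_ATOMIC 0                                 /* models GFP_ATOMIC */
#define SKETCH_BUFFER SKETCH_MAY_WAIT                   /* models GFP_BUFFER */
#define SKETCH_KERNEL (SKETCH_MAY_WAIT | SKETCH_MAY_IO) /* ordinary allocation */

/* Toy memory state: counts of pages in each condition. */
struct sketch_pool {
    int free_pages;  /* immediately available */
    int clean_pages; /* can be dropped without any IO */
    int dirty_pages; /* must be written out before they can be reused */
};

/*
 * Try to obtain one page.  Returns true on success.  *waits_for_io is set
 * when the page could only be obtained by waiting for writeout, which is
 * the step that deadlocks if the caller is itself the writeout path.
 */
bool sketch_alloc_page(struct sketch_pool *pool, unsigned flags,
                       bool *waits_for_io)
{
    assert(pool != NULL && waits_for_io != NULL);
    *waits_for_io = false;

    if (pool->free_pages > 0) {
        pool->free_pages--;
        return true;
    }
    if (!(flags & SKETCH_MAY_WAIT))
        return false;           /* atomic: never scans, fails at once */

    if (pool->clean_pages > 0) {
        pool->clean_pages--;    /* dropped without IO */
        return true;
    }
    if (!(flags & SKETCH_MAY_IO))
        return false;           /* buffer-style: fails rather than wait on IO */

    if (pool->dirty_pages > 0) {
        pool->dirty_pages--;
        *waits_for_io = true;   /* would block until writeout completes */
        return true;
    }
    return false;
}
```

In this model a SKETCH_BUFFER request can still pick up a clean page that
a SKETCH_ATOMIC request would not look for, and once only dirty pages are
left it fails instead of waiting for writeout.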

First I tried Neil's patch with GFP_BUFFER, but it behaved no differently
from the original situation.

Then I tried Neil's patch as he originally wrote it, and it worked slightly
better: it lets me kill the bonnie++ process and have my system return to a
functional state.  In this case I had started bonnie++ immediately after
booting.  After about 30 minutes the resync was only 2% done, the hard drive
access light was only flickering intermittently, the rebuild speed was
reported as 254K/s, and bonnie++ was in D state and didn't seem to be doing
anything (but it was still killable).


Neil's patch doesn't really seem to help things for me.  The best case is
that my system is only temporarily non-functional; the worst case is that
it's as bad as it was before.


NB  I did my testing on 2.4.0-test2-ac2 because 2.4.0-test3 and 2.4.0-test4
won't even let me start RAID-1; they give me error 22 (EINVAL).



Russell Coker
