On Tue, 18 Jul 2000, Linus Torvalds wrote:
>> I managed to reproduce this and, at least for me, it is caused by a
>> deadlock when kflushd tries to write out data via raid1, raid1 tries
>> to allocate memory, which blocks waiting for kflushd to free up some
>> memory.
>
>Hmm.. This is actually what "GFP_BUFFER" was meant for: GFP_BUFFER is not
>atomic, but it will not block for IO.
>
>So for example, GFP_BUFFER can still walk the page tables and the LRU
>lists (because it's not called from an interrupt context), but it will
>drop only pages that don't need IO to be dropped.
>
>That is, of course, unless GFP_BUFFER has had bit-rot. It's simple enough
>that I don't think it has, and I'd love to hear if your deadlock goes away
>using GFP_BUFFER instead of GFP_ATOMIC, which would be the right thing to
>do..

First I tried Neil's patch with GFP_BUFFER, but it performed no differently
from the original situation. Then I tried Neil's patch as he originally wrote
it, and it worked slightly better: it lets me kill the bonnie++ process and
have my system return to a functional state.

In this case I had started bonnie++ immediately after booting. After about
30 minutes the resync was only 2% done, the hard drive access light was
flickering only intermittently, the rebuild speed was reported as 254K/s,
and bonnie++ was in D state and didn't seem to be doing anything (but it
was still killable).

Neil's patch doesn't really seem to help things for me. The best case is
that my system is only temporarily non-functional; the worst case is that
it's as bad as it was before.

NB I did my testing on 2.4.0-test2-ac2 because 2.4.0-test3 and 2.4.0-test4
won't even let me start RAID-1; they give me error 22.

Russell Coker