Ken Moffat wrote:
> When I woke up in the morning, I was surprised to find that my OpenJDK
> script was still running rm -rf for the source directory, and had been
> doing so for more than 5 hours (both wall-clock time and CPU time), and
> was now at 99%-100% of one CPU, according to top.
...Odd. I upgraded to 3.13.5 a day after it got released, and just yesterday saw something vaguely but not exactly similar.

Firefox (same binary as before the kernel upgrade) crashed when I was mid-mouse-movement, which seemed odd, so I poked at kernel logs. There were a couple of "BUG: Bad page map in process firefox" and "BUG: bad page state in process firefox", when trying to madvise() (in zap_page_range -> unmap_single_vma) and page_fault, respectively, about 15 minutes before the crash.

Then again at crash time, the same pair of BUG messages, both in int_signal; the first was down in do_group_exit -> unmap_vmas -> unmap_single_vma, and the second was down in unmap_single_vma -> release_pages -> free_pages_prepare.

Then it logged "BUG: bad rss-counter state" twice, followed by "INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 4, t=18002 jiffies, g=81303, c=81302, q=7261" and "INFO: Stall ended before state dump start".

And about when it logged the rcu_preempt message, CPU 4 went busy-looping in kernel space (according to gkrellm, which showed 100% in orange instead of the userspace cyan or userspace-niced green) in a kworker thread (according to top). Had to reboot to get it back (trying to exit X also hung; most likely something got scheduled onto that worker during handoff to the console driver or something like that; had to alt-sysrq-u / b to get it to actually reboot).

So I guess this is a long way of saying -- are you sure the rm userspace code is what was hung, and not something in the kernel? Might be a prevalence of cosmic rays I suppose, or it might be a memory corruption bug somewhere causing issues with RCU.

(OTOH this system isn't really anywhere near stock LFS, either. Not sure how different it is from yours, but it's multilib with a pretty old gcc/glibc.)
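If it happens again, one rough way to tell whether a spinning process like that rm is burning its CPU time in userspace or in the kernel is to compare the utime and stime fields (fields 14 and 15, in clock ticks) of /proc/<pid>/stat. The following is only a sketch of that check: the helper names cpu_split and kernel_fraction are made up for illustration, and it assumes a Linux /proc layout as described in proc(5).

```python
import sys


def cpu_split(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' rather than on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    return int(rest[11]), int(rest[12])


def kernel_fraction(stat_line):
    """Fraction of the process's CPU time that was spent in kernel mode."""
    utime, stime = cpu_split(stat_line)
    total = utime + stime
    return stime / total if total else 0.0


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 cpusplit.py <pid>
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/stat") as f:
        line = f.read()
    utime, stime = cpu_split(line)
    print(f"utime={utime} stime={stime} "
          f"kernel fraction={kernel_fraction(line):.2f}")
```

A kernel fraction near 1.0 for the rm process would suggest the time is going into a syscall rather than rm's own code. The per-CPU split that top shows (us versus sy) gives the same hint at a coarser level.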
-- http://linuxfromscratch.org/mailman/listinfo/lfs-dev FAQ: http://www.linuxfromscratch.org/faq/ Unsubscribe: See the above information page