On Sat, Mar 30, 2013 at 10:46 AM, Linus Torvalds <torva...@linux-foundation.org> wrote:
> On Fri, Mar 29, 2013 at 8:02 PM, Emmanuel Benisty <benist...@gmail.com> wrote:
>>
>> Then I start building a random package and the problems start. They
>> may also happen without compiling but this seems to trigger the bug
>> quite quickly.
>
> I suspect it's about preemption, and the build just results in enough
> scheduling load that you start hitting whatever race there is.
>
>> Anyway, some progress here, I hope: dmesg seems to be
>> willing to reveal some secrets (using some pastebin service since this
>> is pretty big):
>>
>> https://gist.github.com/anonymous/5275120
>
> That looks like exactly the exit_sem() bug that Davidlohr was talking
> about, where the
>
>         /* exit_sem raced with IPC_RMID, nothing to do */
>         if (IS_ERR(sma))
>                 continue;
>
> should be moved to *before* the
>
>         sem_lock(sma, NULL, -1);
>
> call. And apparently the bug I had found is already fixed in -next.

I just tried the 7 original patches + the 2 one-liners from -next + a
modified version of Linus' patch (attached) on top of 3.9-rc4, using
PREEMPT_NONE and after moving sem_lock(sma, NULL, -1) as explained
above. I was building two packages at the same time, went away for 30
seconds, came back, and everything froze as soon as I touched the
laptop's touchpad. Maybe a coincidence, but anyway...

Another shot in the dark: I got this weird message when trying to build
gcc:

semop(2): encountered an error: Identifier removed
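
For reference, "Identifier removed" is the strerror() text for EIDRM, which semop(2) documents as the error returned when the semaphore set is removed while a caller is waiting on it. Below is a minimal userspace sketch of that documented behaviour, not a reproducer for the kernel race discussed above. The helper name errno_after_rmid() and the one-second delay are assumptions made for illustration only.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* On Linux the caller has to define union semun itself (see semctl(2)). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Hypothetical helper: a child blocks in semop() on a semaphore whose
 * value is 0, then the parent removes the set with IPC_RMID.
 * Returns the errno the child saw (0 if semop() unexpectedly succeeded),
 * or -1 if the setup itself failed.
 */
int errno_after_rmid(void)
{
    int fds[2];
    int seen = -1;
    union semun arg;
    pid_t pid;

    int semid = semget(IPC_PRIVATE, 1, 0600);
    if (semid < 0)
        return -1;

    arg.val = 0; /* make sure a decrement will block */
    if (semctl(semid, 0, SETVAL, arg) < 0 || pipe(fds) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    if (pid == 0) {
        /* Child: try to decrement a zero-valued semaphore, which blocks. */
        struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
        int child_seen = (semop(semid, &op, 1) == 0) ? 0 : errno;

        if (write(fds[1], &child_seen, sizeof child_seen) < 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: give the child time to block (assumed delay), then remove. */
    close(fds[1]);
    sleep(1);
    semctl(semid, 0, IPC_RMID);

    if (read(fds[0], &seen, sizeof seen) != (ssize_t)sizeof seen)
        seen = -1;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return seen;
}
```

Passing the return value to strerror() should yield the same text as in the gcc build message, which would suggest that something removed the semaphore set while the build was still waiting on it.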
patch.diff
Description: Binary data