Hello,

I may have found something weird while running FreeBSD 9.2-RC3 #0
r255393 (ZFS-only setup).

Quick history of the problem:

- Lately, using a very recent -STABLE, the host would hang randomly while
  building ports with poudriere (-J2) and using X11, without producing a
  core dump (solid deadlock, apparently). It works perfectly when using the
  console only, and it can run a large build overnight without hanging.
  Since I was in X11 I could not see what was happening on the console,
  and this desktop PC has no proper serial port, so there is not much I
  can observe. In any case it does not reboot automatically.

- To rule out recent -STABLE changes I moved to 9.2-RC3 using SVN, but the
  system kept hanging under the same conditions.

- I also enabled DDB to get a minidump, but I still got nothing except
  solid lockups.

- I downgraded the nvidia-driver port, just in case it has something to do
  with the crashes, but the crashes continued.

- I downgraded to a known-safe -STABLE from July, then June, but the host
  would still crash. The very weird thing is that I have always built
  stuff while using X11, and it never hung. After downgrading both the OS
  and nvidia-driver I was effectively back to a configuration that did not
  hang at the time, but the issue persisted.

- However, this time I managed to get a minidump from the old -STABLE. I
  saved it here:

    http://olgeni.olgeni.com/~olgeni/core.txt.0

- After seeing the reference to kqueue, I remembered another thing that
  changed when the crashes started: gio-fam-backend went away, and glib20
  now uses kqueue (r324037).

- I tried the same workload while using X11 with openbox only, and it
  worked fine.

- Then I went back to Gnome but made sure that anything related to gvfsd
  was periodically killed by a script, and the system returned to normal
  (i.e. flawless builds).

- I remember that the gamin implementation used to open and poll a lot of
  files, even files that were not used by the X11 environment or Nautilus
  specifically, and the gamin daemon could take a good 5% of CPU for
  polling; restarting it brought it back to 0%.

- Not sure if it is related in any way, but running a standard "buildworld"
  does not crash the host. The only difference that I could think of is
  that poudriere uses jails.
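
The gvfsd workaround mentioned above can be sketched roughly as follows.
This is only an illustration, not the actual script: the match pattern,
the 30-second interval and all function names are assumptions, and it
relies on pkill(1) being available in the base system.

```python
import subprocess
import time

PATTERN = "gvfsd"       # assumption: substring matching the gvfsd processes
INTERVAL_SECONDS = 30   # assumption: the original interval is not stated


def build_kill_command(pattern):
    """Build a pkill(1) command matching against the full command line."""
    return ["pkill", "-f", pattern]


def kill_matching(pattern, run=subprocess.call):
    """Return True if pkill reported at least one matching process."""
    # pkill exits with 0 when one or more processes matched.
    return run(build_kill_command(pattern)) == 0


def watchdog(pattern=PATTERN, interval=INTERVAL_SECONDS, rounds=None,
             sleep=time.sleep, run=subprocess.call):
    """Kill matching processes every `interval` seconds.

    Runs forever when `rounds` is None. Returns the number of rounds in
    which something was actually killed.
    """
    hits = 0
    done = 0
    while rounds is None or done < rounds:
        if kill_matching(pattern, run):
            hits += 1
        sleep(interval)
        done += 1
    return hits
```

Calling watchdog() with no arguments loops until interrupted. Because the
match is done on the full command line, the script itself should not have
"gvfsd" in its file name or arguments, or it may end up killing itself.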

Unfortunately I'm not able to get a minidump for the latest RC, but at this
point I suspect that something is going on with glib20 and kqueue on both
-STABLE and -RC.

If anybody has an idea, I can test it easily, since it usually takes only
a few minutes to hang everything.

--
jimmy