John Baldwin wrote:
On Sunday 15 June 2008 07:23:19 am Stef Walter wrote:
I've been trying to track down a deadlock on some newish production
servers running FreeBSD 6.3-RELEASE-p2. The deadlock occurs on a
specific (although mundane) hardware configuration, and each of several
servers running this hardware deadlocks about once per week.

However, from a (naive) perusal of the attached stack traces, I suspect
that this is not hardware related.

Forgive me if my interpretation of this is all wrong, but I'm pretty
desperate for help. So here's my basic understanding of the deadlock:

These processes seem to be waiting on the page queue mutex:
 sendmail (in vm_mmap > vm_map_find > vm_map_insert > vm_map_pmap_enter)
 bsnmpd (in malloc, uma_large_malloc > page_alloc > kmem_malloc)
 httpd (in trap > trap_pfault > vm_fault)
 [g_up] (in g_vfs_done > bufdone)

The page queue mutex is held by the rsync process:
 rsync (in trap > trap_pfault > vm_fault > pmap_enter)

Was the rsync process (in pmap_enter) interrupted while holding the
page queue lock?


Giant is enabled in loader.conf due to the needs of the pf firewall when
dealing with user credentials lookups. I do not believe that Giant plays
into this deadlock. Kernel config attached.

Any and all help or info is welcome. Thanks in advance.

Try this change:

jhb         2007-10-27 22:07:40 UTC

  FreeBSD src repository

  Modified files:
    sys/kern             sched_4bsd.c
  Log:
  Change the roundrobin implementation in the 4BSD scheduler to trigger a
  userland preemption directly from hardclock() via sched_clock() when a
  thread uses up a full quantum instead of using a periodic timeout to cause
  a userland preemption every so often.  This fixes a potential deadlock
  when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held
  by a thread pinned or bound to another CPU.  The current thread on that
  CPU will never be preempted while softclock is blocked.

  Note that ULE already drives its round-robin userland preemption from
  sched_clock() as well and always enables IPI_PREEMPT.

  MFC after:      1 week

  Revision  Changes    Path
  1.108     +8 -29     src/sys/kern/sched_4bsd.c

We use it at work on 6.x. Without this fix, round-robin stops working on 4BSD when softclock() (swi4: clock) blocks on a lock like Giant.
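
To make the difference described in the commit log above concrete, here is a rough toy model of the two schemes. It is not kernel code: the function names, the QUANTUM value, and the softclock_blocked flag are all made up for illustration, and it only counts how many userland preemptions get requested over a number of clock ticks.

```python
# Toy model only, not FreeBSD code. Every name here is hypothetical.
QUANTUM = 10  # clock ticks per time slice (illustrative value)


def run_timeout_driven(ticks, softclock_blocked):
    """Old scheme: a periodic timeout, run from softclock, requests the switch."""
    switches = 0
    for tick in range(1, ticks + 1):
        timeout_due = (tick % QUANTUM == 0)
        # The timeout handler can only run if softclock itself can run.
        # If softclock is stuck on a lock, no preemption is ever requested.
        if timeout_due and not softclock_blocked:
            switches += 1
    return switches


def run_hardclock_driven(ticks, softclock_blocked):
    """New scheme: the clock interrupt counts down the quantum directly."""
    # softclock_blocked is deliberately ignored: this path does not go
    # through softclock at all, which is the point of the change.
    switches = 0
    remaining = QUANTUM
    for _ in range(ticks):
        remaining -= 1
        if remaining == 0:
            switches += 1        # request preemption of the current thread
            remaining = QUANTUM  # start a fresh quantum
    return switches


if __name__ == "__main__":
    for blocked in (False, True):
        print("softclock blocked:", blocked,
              "timeout-driven:", run_timeout_driven(100, blocked),
              "hardclock-driven:", run_hardclock_driven(100, blocked))
```

In this model the timeout-driven variant stops requesting preemptions as soon as softclock is blocked, while the hardclock-driven variant keeps going, which is the behaviour the commit log describes.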

I've been seeing similar troubles on 6.2 and I'll have to give this a try as we upgrade to 6.3. I notice "MFC after: 1 week" in the log; it's been a week - any chance of seeing this fix rolled into 6.x?

- Jamie
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable