Hi -

I've learned some interesting things about libpager that I'd like to share
with the list.

I've found two bugs in the existing code that were "fixed" by the new
demuxer that sequentially services requests on each pager object.

For example, the data return code sets a flag PAGINGOUT in the pagemap
before it starts calling pager_write_page (which could be slow; writing to
a hard drive, say).  Future data returns check the PAGINGOUT flag and wait
on a condition variable if it's set.  The problem is that if multiple
threads start waiting on that, pthreads doesn't guarantee what order they
will run in when the condition variable is signaled, so the data writes
can get reordered.  If three data returns come in as 1, 2, 3 (maybe because
pager_sync is called three times), number 1 starts writing, but if it
doesn't finish quickly enough, 2 and 3 can get reordered.
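
To make the pattern concrete, here is a minimal sketch of the flag plus
condition variable scheme.  It is not the libpager code: the names
(data_return, run_data_returns, write_order, and so on) are made up for
illustration, a single bool stands in for the PAGINGOUT bit in the
pagemap, and the slow pager_write_page call is only a comment.  In this
sketch even the arrival order of the threads is uncontrolled, so it only
shows that nothing in the locking itself fixes the order of the writes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NRETURNS 3

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wakeup = PTHREAD_COND_INITIALIZER;
static bool pagingout;              /* stand-in for the PAGINGOUT flag */
static int write_order[NRETURNS];   /* ids in the order the writes ran */
static int nwritten;
static int ids[NRETURNS];

/* One simulated data return (hypothetical, not the libpager routine). */
static void *data_return(void *arg)
{
    int id = *(int *)arg;

    pthread_mutex_lock(&lock);
    while (pagingout)                       /* someone else is writing */
        pthread_cond_wait(&wakeup, &lock);
    assert(!pagingout);
    pagingout = true;
    pthread_mutex_unlock(&lock);

    /* The slow pager_write_page() would run here, with the lock dropped. */

    pthread_mutex_lock(&lock);
    write_order[nwritten++] = id;
    pagingout = false;
    /* Every waiter wakes up; which one retakes the mutex first is up to
       the scheduler, not the order in which the waiters arrived. */
    pthread_cond_broadcast(&wakeup);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Start NRETURNS data returns; report how many writes completed. */
int run_data_returns(void)
{
    pthread_t threads[NRETURNS];
    int i, started = 0;

    pagingout = false;
    nwritten = 0;
    for (i = 0; i < NRETURNS; i++) {
        ids[i] = i + 1;
        if (pthread_create(&threads[i], NULL, data_return, &ids[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return nwritten;
}

/* True if each id 1..NRETURNS was written exactly once, in any order. */
int every_write_ran_once(void)
{
    int seen[NRETURNS + 1] = {0};
    int i;

    for (i = 0; i < nwritten; i++) {
        int id = write_order[i];
        if (id < 1 || id > NRETURNS || seen[id]++)
            return 0;
    }
    return nwritten == NRETURNS;
}
```

The flag guarantees that only one write runs at a time and that every
write eventually runs, but write_order can come out as any permutation of
1, 2, 3.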

Except that they can't.  The new demux code queues the second and third
writes.  They don't process until the first one is done.  The pager object
is essentially locked until the pager_write_page() completes.
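
Roughly, the effect is the same as a per-object FIFO.  The sketch below
is my own simplification and not the actual demuxer: struct request,
struct pager_queue, queue_request and service_queue are hypothetical
names, and the "write" is just recording the sequence number.  The point
is only that requests queued in arrival order and serviced one at a time
cannot be reordered.

```c
#include <stddef.h>
#include <stdlib.h>

/* One pending request on a pager object (hypothetical layout). */
struct request {
    int seqno;
    struct request *next;
};

/* Per-pager FIFO of pending requests. */
struct pager_queue {
    struct request *head;
    struct request *tail;
};

/* Append a request at the tail; returns 0 on success, -1 on failure. */
int queue_request(struct pager_queue *q, int seqno)
{
    struct request *r = malloc(sizeof *r);

    if (r == NULL)
        return -1;
    r->seqno = seqno;
    r->next = NULL;
    if (q->tail != NULL)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
    return 0;
}

/* Service queued requests strictly from the head, one at a time.
   Records each seqno in order[] and returns how many were serviced. */
int service_queue(struct pager_queue *q, int *order, int max)
{
    int n = 0;

    while (q->head != NULL && n < max) {
        struct request *r = q->head;

        q->head = r->next;
        if (q->head == NULL)
            q->tail = NULL;
        /* The slow write would happen here; nothing else on this
           pager object is serviced until it returns. */
        order[n++] = r->seqno;
        free(r);
    }
    return n;
}
```

Queueing 1, 2, 3 and then servicing the queue always yields 1, 2, 3,
which is why the reordering can't show up, and also why a slow write on
one page holds up every other request on the same object.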

I went so far as to write a test case to exercise the bug!  Just good
coding practice - develop tests for your known bugs first.  Then I ran it,
and it couldn't reproduce the bug!  Only after thinking about the code more
did I understand why.

I know the demuxer code was rewritten to avoid thread storms, but
serializing every request on a pager object has a cost, and it could become
a performance bottleneck at some point.  There's no good reason to block
all access to page 100 while a disk
operation completes on page 1.  I'm not looking to re-write it right now,
but I'm curious.  Does anybody remember what characterized the thread
storms?  What conditions triggered them?  What kind of pager operations
were being done?

    agape
    brent
