I added some instrumentation to apr_sms_trivial_malloc
to study in more detail where its bottlenecks were.

As a result, I found a few interesting phenomena:

1. The good news is that allocations of small amounts
  of memory are very efficient.  They almost always
  take the fastest path through the code, in which
  some available space is reserved from the
  "sms->used_sentinel.prev" block with a handful of
  pointer arithmetic operations.

2. The bad news is that allocations for larger blocks
  (in the >=8KB range) typically require a call to the
  parent SMS to get data.  On my test machine, I'm seeing
  elapsed times in the 30 microsecond range when this
  happens, compared to less than 1 microsecond for small
  allocations that don't require more memory from the
  parent SMS.  And when an allocation falls through to
  the parent, it often seems to fall all the way through
  to the root SMS (I suspect that 30us includes a malloc).
  The problem seems to be particularly bad for things that
  create subrequests, like mod_include.

3. The worse news is that there seems to be a lot of
  fragmentation.  For example, I saw this pattern
  during a server-parsed request:
    - the application code requests 12296 bytes
      from a pool
    - not enough memory is available in the SMS, so it
      requests 16400 bytes from its parent SMS.
    - the parent SMS doesn't have enough free space
      either, so it requests 20504 bytes from the
      grandparent SMS.
    - the grandparent SMS doesn't have enough space
      either, but it has to iterate through 15 blocks in
      its free list to figure that out.  Each of these
      blocks has between 8176 and 12272 bytes available.
    - the grandparent calls through to the great-grandparent
      to get 24608 bytes.  The great-grandparent doesn't
      have a block with that much free space, but it
      iterates through 9 blocks in its free list in
      search of one; all of these blocks have 16376 bytes
      free.
    - the great-grandparent thus requests 28712 bytes from
      the great-great-grandparent.  The great-great-grandparent
      doesn't have any blocks in its free list, so it calls
      through to its parent, which at last is an SMS that
      does a real malloc.
  This type of pattern may explain the reported higher memory
  use of the SMS-based httpd compared with the original pools;
  there's a lot of memory in those free lists that can't be
  used in this example.

For an SMS that's going to be a parent of other SMSs, we'll
need something with more sophisticated policies for reassigning
freed space than the current trivial-SMS.

--Brian




