I added some instrumentation to apr_sms_trivial_malloc
to study in more detail where its bottlenecks were.
As a result, I found a few interesting phenomena:
1. The good news is that allocations of small amounts
of memory are very efficient. They almost always
take the fastest path through the code, in which
some available space is reserved from the
"sms->used_sentinel.prev" block with a handful of
pointer arithmetic operations.
2. The bad news is that allocations for larger blocks
(in the >=8KB range) typically require a call to the
parent SMS to get more memory. On my test machine, I'm seeing
elapsed times in the 30 microsecond range when this
happens, compared to less than 1 microsecond for small
allocations that don't require more memory from the
parent SMS. And when an allocation falls through to
the parent, it often seems to fall all the way through
to the root SMS (I suspect that 30us includes a malloc).
The problem seems to be particularly bad for things that
create subrequests, like mod_include.
3. The worse news is that there seems to be a lot of
fragmentation. For example, I saw this pattern
during a server-parsed request:
- the application code requests 12296 bytes
from a pool
- not enough memory is available in the SMS, so it
requests 16400 bytes from its parent SMS.
- the parent SMS doesn't have enough free space
either, so it requests 20504 bytes from the
grandparent SMS.
- the grandparent SMS doesn't have enough space
either, but it has to iterate through 15 blocks in
its free list to figure that out. Each of these
blocks has between 8176 and 12272 bytes available.
- the grandparent calls through to the great-grandparent
to get 24608 bytes. The great-grandparent doesn't
have a block with that much free space, but it
iterates through 9 blocks in its free list in
search of one; all of these blocks have 16376 bytes
free.
- the great-grandparent thus requests 28712 bytes from
the great-great-grandparent. The great-great-grandparent
doesn't have any blocks in its free list, so it calls
through to its parent, which at last is an SMS that
does a real malloc.
This type of pattern may explain the reported higher memory
use of the SMS-based httpd compared with the original pools;
there's a lot of memory in those free lists that can't be
used in this example.
For an SMS that's going to be a parent of other SMSs, we'll
need something with more sophisticated policies for reassigning
freed space than the current trivial-SMS.
--Brian