Hi there,

As part of my Subversion scalability testing, I ran into
two issues related to APR pools.

The first and more important one was an OOM error caused
by hitting the mmap region count limit on Ubuntu
(mmap-based allocation appears to be enabled in the
system libraries at least since 12.04 LTS). The root
cause was a large data structure (many small allocations)
being built up in a single pool. Multiplied by the number
of threads / concurrent requests, that exceeded the
~0.5GB limit (~64k regions x 8kB each).
[apr-pool-growth.*]
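For reference, the per-process region limit can be checked as
follows (assuming a Linux system; 65530 is the common default,
and the arithmetic below is just the back-of-the-envelope
calculation from above, one mmap region per 8kB node):

```shell
# Per-process mmap region limit (commonly 65530 on Linux)
sysctl vm.max_map_count

# Worst case with one 8kB node per region:
echo $((65530 * 8192))   # roughly 0.5GB in bytes
```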

In a related scenario, we would allocate a large buffer
(e.g. 64kB) in a temporary pool, clear the pool, and then
create another pool that is small (few allocations) but
relatively long-lived. The APR allocator would then tend
to reuse the large node for the new pool, wasting quite
a bit of memory. [apr-allocator-efficiency.*]

Attached are patches against apr/trunk for both issues,
plus the corresponding commit messages. Neither should
change the behavior of "common" pool usage; both improve
"worst-case" behavior.

-- Stefan^2.
Don't waste memory when creating or allocating from small, long-lived
APR pools after clearing or destroying pools that allocated larger
nodes.

When allocating nodes from the bucket allocator (<= 80kB or 20 pages),
we would eagerly reuse *any* free node that is at least the minimum
size.  Depending on the pool usage scheme, that extra memory may
never be used.  This patch limits the reused node to approximately
twice the minimum size.

* memory/unix/apr_pools.c
  (allocator_alloc): When searching the buckets for free nodes, limit
                     the range to twice the starting index. 

Patch by: stefan2

Index: memory/unix/apr_pools.c
===================================================================
--- memory/unix/apr_pools.c	(revision 1593752)
+++ memory/unix/apr_pools.c	(working copy)
@@ -239,7 +239,7 @@ static APR_INLINE
 apr_memnode_t *allocator_alloc(apr_allocator_t *allocator, apr_size_t in_size)
 {
     apr_memnode_t *node, **ref;
-    apr_uint32_t max_index;
+    apr_uint32_t max_index, upper_index;
     apr_size_t size, i, index;
 
     /* Round up the block size to the next boundary, but always
@@ -273,6 +273,11 @@ apr_memnode_t *allocator_alloc(apr_alloc
         /* Walk the free list to see if there are
          * any nodes on it of the requested size
          *
+         * If there is no exact match, look for nodes
+         * of up to twice the requested size, so we
+         * won't unnecessarily allocate more memory
+         * nor waste too much of what we have.
+         *
          * NOTE: an optimization would be to check
          * allocator->free[index] first and if no
          * node is present, directly use
@@ -281,9 +286,10 @@ apr_memnode_t *allocator_alloc(apr_alloc
          * memory waste.
          */
         max_index = allocator->max_index;
+        upper_index = 2 * index < max_index ? 2 * index : max_index;
         ref = &allocator->free[index];
         i = index;
-        while (*ref == NULL && i < max_index) {
+        while (*ref == NULL && i < upper_index) {
            ref++;
            i++;
         }

Reduce the number of OS-side memory allocations when allocating many
small elements from a single APR pool.  This reduces the probability
of hitting the mmap region count limit on systems that have it and use
mmap-based allocators (e.g. Ubuntu).

While building e.g. a large hash table, large amounts of memory
would be allocated through many 8kB OS-side allocations.  Instead,
allocate progressively larger nodes as the total size of the pool
grows.  Also, don't apply this logic when the user requests larger
allocations, as they may want closer control over the actual
allocation scheme.

The node growth scheme implemented here allows for TB-sized pools
without exceeding the 65530 mmap region default on Ubuntu.  It also
keeps the node size lower than a naive exponential scheme would,
and the allocation sizes are powers of two.  Both help fight
address space fragmentation.

* memory/unix/apr_pools.c
  (apr_pool_t): Add a field to track the current pool size.
  (apr_palloc): When the pool needs to allocate a new node to serve
                a small allocation request, use the new node growth
                scheme.  Always update the total pool size after
                allocating new nodes.
  (apr_pool_clear): Update / reset the pool size.
  (apr_pool_create,
   apr_pool_create_ex): Initialize the new struct member.

Patch by: stefan2

Index: memory/unix/apr_pools.c
===================================================================
--- memory/unix/apr_pools.c	(revision 1593752)
+++ memory/unix/apr_pools.c	(working copy)
@@ -575,6 +581,7 @@ struct apr_pool_t {
     volatile apr_uint32_t in_use;
     apr_os_thread_t       in_use_by;
 #endif /* APR_POOL_CONCURRENCY_CHECK */
+    apr_size_t            total_alloc; /* sum of pages across all nodes */
 };
 
 #define SIZEOF_POOL_T       APR_ALIGN_DEFAULT(sizeof(apr_pool_t))
@@ -803,13 +810,48 @@ APR_DECLARE(void *) apr_palloc(apr_pool_
         list_remove(node);
     }
     else {
-        if ((node = allocator_alloc(pool->allocator, size)) == NULL) {
+        /* If we allocate lots of small blocks in the same pool, we should
+         * ask for progressively larger nodes such that the number of malloc()
+         * or mmap() calls is reduced. */
+
+        /* Is this a smallish allocation?  Only then will we potentially
+         * request a larger node. */
+        apr_size_t max_bucket_alloc = MAX_INDEX << BOUNDARY_INDEX;
+        apr_size_t to_alloc;
+        if (size > max_bucket_alloc / 4)
+          {
+            /* If the user requested a larger allocation, they may know best
+             * how to control their allocation pattern.  So, let them have
+             * their will and do as they asked. */
+            to_alloc = size;
+          }
+        else
+          {
+            /* Slowly(!) grow the minimum node size.  We don't want to be
+             * wasteful here.  Quadruple the amount of memory allocated
+             * before doubling the allocation size.  This also keeps the
+             * OS-side allocation size low, accounting for possibly
+             * pre-existing address space fragmentation. */
+            apr_size_t min_alloc;
+            apr_size_t shift = 1;
+            while ((pool->total_alloc >> (2 * shift)) > 1)
+              ++shift;
+
+            /* To minimize further address space fragmentation, allocate
+             * nodes that are a power-of-two times the page size. */
+            min_alloc = (BOUNDARY_SIZE << shift) - APR_MEMNODE_T_SIZE;
+            to_alloc = size < min_alloc ? min_alloc : size;
+          }
+
+        if ((node = allocator_alloc(pool->allocator, to_alloc)) == NULL) {
             pool_concurrency_set_idle(pool);
             if (pool->abort_fn)
                 pool->abort_fn(APR_ENOMEM);
 
             return NULL;
         }
+
+        pool->total_alloc += node->index + 1;
     }
 
     node->free_index = 0;
@@ -929,6 +971,8 @@ APR_DECLARE(void) apr_pool_clear(apr_poo
     active->next = active;
     active->ref = &active->next;
 
+    pool->total_alloc = active->index + 1;
+
     pool_concurrency_set_idle(pool);
 }
 
@@ -1058,6 +1102,7 @@ APR_DECLARE(apr_status_t) apr_pool_creat
     pool->self_first_avail = (char *)pool + SIZEOF_POOL_T;
 #endif
     node->first_avail = pool->self_first_avail;
+    pool->total_alloc = node->index + 1;
 
     pool->allocator = allocator;
     pool->active = pool->self = node;
@@ -1145,6 +1190,7 @@ APR_DECLARE(apr_status_t) apr_pool_creat
 
     pool = (apr_pool_t *)node->first_avail;
     node->first_avail = pool->self_first_avail = (char *)pool + SIZEOF_POOL_T;
+    pool->total_alloc = node->index + 1;
 
     pool->allocator = pool_allocator;
     pool->active = pool->self = node;
