We'd love to see such a patch. We've been hampered by this problem for years.
We use apr_hash_t to look up several in-memory lexicons, each with over
half a million words. The slot array starts at 2^n - 1 entries with n = 4
(INITIAL_MAX = 15), so a table of that size has to expand many times, and
since we need to keep all the hash lexicons in persistent pools, the
superseded arrays are never reclaimed and the waste adds up. The easiest
remedy we've been using is to add a new "API" function like this:
APR_DECLARE(apr_hash_t *) apr_hash_make_ex(apr_pool_t *pool,
                                           unsigned int init_size)
{
    apr_hash_t *ht;

    ht = apr_palloc(pool, sizeof(apr_hash_t));
    ht->pool = pool;
    ht->free = NULL;
    ht->count = 0;
    ht->max = init_size;    /* instead of INITIAL_MAX */
    ht->array = alloc_array(ht, ht->max);
    ht->hash_func = apr_hashfunc_default;
    return ht;
}
which resembles the other two APR functions:

APR_DECLARE(apr_table_t *) apr_table_make(apr_pool_t *p, int nelts);
APR_DECLARE(apr_array_header_t *) apr_array_make(apr_pool_t *p, int nelts,
                                                 int elt_size);
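One caveat with such a constructor: since find_entry() indexes buckets with
hash & ht->max, init_size should be of the form 2^n - 1, or some slots can
never be reached. A hypothetical call for one of our lexicons (the pool name
and size are just illustrative):

/* Pre-size for ~500,000 words: (1 << 20) - 1 slots stays on the
 * 2^n - 1 progression that expand_array() follows and keeps the
 * entry count below the count > max expansion threshold. */
apr_hash_t *lexicon = apr_hash_make_ex(persistent_pool, (1 << 20) - 1);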
Also, we'd very much like to see whether apr_hash_t could be extended to
make use of the "move-to-front" technique presented in the following papers:
"In-memory hash tables for accumulating text vocabularies", J. Zobel,
S. Heinz, and H.E. Williams, Information Processing Letters, 80(6):271-277,
December 2001. (http://www.cs.rmit.edu.au/~jz/fulltext/ipl01.pdf)
"Cache-conscious collision resolution in string hash tables", N.
Askitis and J. Zobel, Proceedings of the SPIRE String Processing and
Information Retrieval Symposium, November 2005, pp. 91-102. LNCS 3772.
(http://www.cs.rmit.edu.au/~jz/fulltext/spire05.pdf)
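Applied to apr_hash's collision chains, the idea is to splice each entry
found by a lookup to the head of its bucket, so the most frequently queried
words are found after one or two probes. A rough sketch against apr_hash.c's
private apr_hash_entry_t layout (illustrative only, not a tested patch):

/* Look up key and move a hit to the front of its chain (move-to-front). */
static apr_hash_entry_t *find_entry_mtf(apr_hash_t *ht,
                                        const void *key, apr_ssize_t klen)
{
    apr_hash_entry_t **bucket, **hep, *he;
    unsigned int hash = ht->hash_func(key, &klen);

    bucket = &ht->array[hash & ht->max];
    for (hep = bucket; (he = *hep) != NULL; hep = &he->next) {
        if (he->hash == hash && he->klen == klen
            && memcmp(he->key, key, klen) == 0) {
            if (hep != bucket) {     /* not already at the front */
                *hep = he->next;     /* unlink from its old spot  */
                he->next = *bucket;  /* relink at the chain head  */
                *bucket = he;
            }
            return he;
        }
    }
    return NULL;
}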
Cheers,
Bing Swen
-----Original Message-----
From: Neil Conway [mailto:[email protected]]
Sent: Tuesday, September 15, 2009 3:55 AM
To: [email protected]
Subject: apr_hash: Memory consumption
I notice that the implementation of expand_array() in
tables/apr_hash.c allocates a new bucket array without making any
attempt to release the memory allocated for the previous bucket array.
That wastes memory: each expansion roughly doubles the array (15, 31,
63, ... slots), so the abandoned arrays together take up about as much
space as the final one, i.e. O(n) extra memory for a table of n buckets.
Is there a rationale for the current behavior? If not, perhaps the
easiest fix is to allocate the bucket array in a subpool, and then
have expand_array() create a new subpool for the new array and destroy
the old one. If a patch to do this would be accepted, let me know and
I'll submit one.
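Concretely, I'm imagining something like the following (untested sketch;
array_pool is a new apr_hash_t field that apr_hash_make() would also have
to create):

/* Sketch of the subpool approach. The hypothetical ht->array_pool
 * subpool owns only the bucket array; entries still live in ht->pool.
 * Destroying the old subpool returns the superseded array's memory. */
static void expand_array(apr_hash_t *ht)
{
    apr_pool_t *new_pool;
    apr_hash_index_t *hi;
    apr_hash_entry_t **new_array;
    unsigned int new_max;

    if (apr_pool_create(&new_pool, ht->pool) != APR_SUCCESS)
        return;                   /* keep the old array on failure */

    new_max = ht->max * 2 + 1;
    new_array = apr_pcalloc(new_pool, sizeof(*new_array) * (new_max + 1));

    /* Rehash every entry into the new array, as the current code does. */
    for (hi = apr_hash_first(NULL, ht); hi; hi = apr_hash_next(hi)) {
        unsigned int i = hi->this->hash & new_max;
        hi->this->next = new_array[i];
        new_array[i] = hi->this;
    }

    apr_pool_destroy(ht->array_pool);   /* releases the old array */
    ht->array_pool = new_pool;
    ht->array = new_array;
    ht->max = new_max;
}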
Neil