On Mon, Sep 14, 2009 at 7:43 PM, Bing Swen <bs...@pku.edu.cn> wrote: > We'd love to see such a patch. we've been hampered by this problem for years.
Attached is a patch that implements the scheme I suggested earlier (allocating the bucket array in a subpool). The patch is against the 1.3 branch. Unfortunately, the APR hash table API doesn't provide any means to report errors, so we can't easily propagate apr_pool_create() failures back to the client program. Presuming changing the APR hash API isn't an option for 1.3.x / 1.4.x, is there a better solution than just ignoring apr_pool_create() failures? I've also attached a trivial benchmark program that performs 10 million insertions. Vanilla APR 1.3 branch (r813588): ./testhashperf 0.42s user 0.10s system 96% cpu 0.537 total ./testhashperf 0.42s user 0.09s system 97% cpu 0.526 total ./testhashperf 0.42s user 0.09s system 97% cpu 0.527 total ./testhashperf 0.42s user 0.09s system 97% cpu 0.526 total ./testhashperf 0.42s user 0.10s system 98% cpu 0.526 total ./testhashperf 0.42s user 0.10s system 98% cpu 0.527 total With the patch applied: ./testhashperf 0.39s user 0.10s system 97% cpu 0.501 total ./testhashperf 0.39s user 0.10s system 98% cpu 0.501 total ./testhashperf 0.39s user 0.09s system 97% cpu 0.502 total ./testhashperf 0.39s user 0.10s system 97% cpu 0.501 total ./testhashperf 0.39s user 0.10s system 98% cpu 0.501 total ./testhashperf 0.39s user 0.10s system 97% cpu 0.501 total (OSX 10.6, gcc 4.2.1, "-O2") > The easiest remedy we've been using is to add a new "API" function like this: > > APR_DECLARE(apr_hash_t *) apr_hash_make_ex(apr_pool_t *pool, unsigned int > init_size) Yeah, I added something similar to my local copy in the meanwhile. When the hash table size is predictable, this avoids the excess memory consumption, but can also improve performance significantly: I've found expand_array() to be quite expensive when a hash table grows from empty to a few million entries. Such an API might be worth considering for APR 1.4 or 2.0. > Also, we'd like very much to see the possiblity whether apr_hash_t could be > extended to make use of the "move-to-front" technique that is presented by > the following papers: > > "In-memory hash tables for accumulating text vocabularies", J. Zobel, > S. Heinz, and H.E. Williams, Information Processing Letters, 80(6):271-277, > December 2001. (http://www.cs.rmit.edu.au/~jz/fulltext/ipl01.pdf) > > "Cache-conscious collision resolution in string hash tables", N. > Askitis and J. Zobel, Proceedings of the SPIRE String Processing and > Information Retrieval Symposium, November 2005, pp. 91-102. LNCS 3772. > (http://www.cs.rmit.edu.au/~jz/fulltext/spire05.pdf) Seems like it would be easy to do, but I wonder if it is worth including in APR: this technique is only sensible when the hash workload has significant temporal locality. Neil
apr_hash_mem_leak-1.patch
Description: Binary data
testhashperf.c
Description: Binary data