Re: PCRE modules in APR?

Jacques Amar Tue, 30 Dec 2008 10:08:59 -0800

Thanks for the reply.

Wes Garland wrote:

1. There is no APR equivalent for free, as it is neither needed nor desired. Simply allocate your memory from a pool, and destroy the pool when it is no longer needed. I would suggest making a subpool on RE create and bury it in an opaque pointer describing your RE, if you're actually going to go whole-hog on this. Me? I use the OS regexec/regcomp (search only) and register an apr_pool_cleanup handler to avoid leaking memory.

I'm creating a series of pre-compiled/analyzed regex expressions at server startup, and doing a lot of S&R during processing. I do create a dedicated pool for this; however, I can never destroy it, since the pre-compiled expressions are stored there and should stay there until server shutdown. And the PCRE documentation states that I should use one memory allocation function before first usage. I will try to use one pool for the regex creations, and another to be used for the search part, and see if that works.
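
As a rough sketch of the compile-once, search-many pattern described above, here is a plain C version that uses the OS regcomp/regexec in place of PCRE and malloc in place of APR pools (both swapped out so the example stands alone). The names sr_rule, sr_rule_init, sr_rule_apply and sr_rule_destroy are hypothetical, the replacement is treated as literal text, and empty matches simply end the replacement loop.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for one pattern compiled once at startup. */
typedef struct {
    regex_t re;
    const char *replacement;   /* literal text, no back-references */
} sr_rule;

/* Compile the pattern once; returns 0 on success, regcomp's error code otherwise. */
int sr_rule_init(sr_rule *rule, const char *pattern, const char *replacement)
{
    rule->replacement = replacement;
    return regcomp(&rule->re, pattern, REG_EXTENDED);
}

/* Release the compiled pattern (call once, at shutdown). */
void sr_rule_destroy(sr_rule *rule)
{
    regfree(&rule->re);
}

/* Append n bytes of src to a growable, NUL-terminated buffer. */
static int append(char **buf, size_t *len, size_t *cap, const char *src, size_t n)
{
    if (*len + n + 1 > *cap) {
        size_t new_cap = (*len + n + 1) * 2;
        char *grown = realloc(*buf, new_cap);
        if (grown == NULL) return -1;
        *buf = grown;
        *cap = new_cap;
    }
    memcpy(*buf + *len, src, n);
    *len += n;
    (*buf)[*len] = '\0';
    return 0;
}

/* Replace every non-empty match in input; caller frees the result. */
char *sr_rule_apply(const sr_rule *rule, const char *input)
{
    size_t rep_len, len = 0, cap;
    const char *cur = input;
    regmatch_t m;
    char *out;

    assert(rule != NULL && input != NULL);
    rep_len = strlen(rule->replacement);
    cap = strlen(input) + 1;
    out = malloc(cap);
    if (out == NULL) return NULL;
    out[0] = '\0';

    /* After the first match, cur no longer sits at the start of the line. */
    while (regexec(&rule->re, cur, 1, &m, cur == input ? 0 : REG_NOTBOL) == 0
           && m.rm_eo > m.rm_so) {
        if (append(&out, &len, &cap, cur, (size_t)m.rm_so) != 0 ||
            append(&out, &len, &cap, rule->replacement, rep_len) != 0) {
            free(out);
            return NULL;
        }
        cur += m.rm_eo;
    }
    /* Copy whatever follows the last match. */
    if (append(&out, &len, &cap, cur, strlen(cur)) != 0) {
        free(out);
        return NULL;
    }
    return out;
}
```

In an APR setting, the sr_rule_destroy step is the kind of thing one would hand to a pool cleanup rather than call directly, so the compiled patterns live exactly as long as the long-lived pool does.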

2. Personally, I would never roll my own search and replace except under exceptional circumstances. That said, your approach doesn't sound unreasonable, but it's difficult to say what your problem is without profiling the code and looking at memory consumption. Start by consulting the literature, S&R is a well-understood problem; and maybe google some stuff on ropes, they may serve you better than strings.

For those interested, I traced the issue to UTF-8 handling: the PCRE_UTF8 flag will significantly slow down the searches. Not all my regexes need to have UTF-8 enabled, only those dealing with embedded strings, so I shaved a lot of time off by being more selective.

Here's a paper on ropes which discusses concatenation, which *should* be where you're spending your search and replace time: http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf


Will read, thanks! But with UTF-8 out of the way,
output = apr_array_pstrcat ( subpool, strip_arr, 0 );
works perfectly fine and fast.
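
For comparison, a minimal plain-C sketch of the concatenation step: measure all the pieces first, allocate once, then copy. This is a malloc-based stand-in written for illustration, not APR's implementation, and join_pieces is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join count strings into one malloc'ed buffer: measure first, allocate once. */
char *join_pieces(const char *const *pieces, size_t count)
{
    size_t total = 0, i;
    char *out, *dst;

    assert(pieces != NULL || count == 0);

    /* First pass: total length, so no reallocation is needed later. */
    for (i = 0; i < count; i++)
        total += strlen(pieces[i]);

    out = malloc(total + 1);
    if (out == NULL) return NULL;

    /* Second pass: copy each piece to its final position. */
    dst = out;
    for (i = 0; i < count; i++) {
        size_t n = strlen(pieces[i]);
        memcpy(dst, pieces[i], n);
        dst += n;
    }
    *dst = '\0';
    return out;
}
```

Each byte is copied once, which is why a single join over an array of pieces tends to stay cheap compared with appending to a growing string piece by piece.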

Note - if your S&R is regexp instead of strcmp, you could also be spending most of your time in the regex state machine. Profile!
Wes

correct!

I guess I now have to deal with my UTF-8 issues... ugh. I wonder if UTF-16 would be faster as all chars are 2 bytes long. I'll also try memcached to cache the results so I don't have to do the same processing on every request.
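
One caveat on the UTF-16 idea: characters outside the Basic Multilingual Plane (code points above U+FFFF) take two 16-bit units, so UTF-16 is not strictly fixed-width either. A small sketch that counts how many UTF-16 code units a UTF-8 string would need makes this visible; utf16_units is a hypothetical helper, and it does not validate its input.

```c
#include <assert.h>
#include <stddef.h>

/* Count the UTF-16 code units needed to encode a NUL-terminated UTF-8 string. */
size_t utf16_units(const char *utf8)
{
    const unsigned char *p = (const unsigned char *)utf8;
    size_t units = 0;

    assert(utf8 != NULL);

    for (; *p != '\0'; p++) {
        /* Continuation bytes (10xxxxxx) belong to a character already counted. */
        if ((*p & 0xC0) == 0x80)
            continue;
        /* A lead byte of 0xF0 or above starts a 4-byte sequence, i.e. a code
           point above U+FFFF, which needs a surrogate pair in UTF-16. */
        units += (*p >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

For example, the four-byte UTF-8 sequence F0 9F 98 80 (U+1F600) comes out as 2 units, i.e. 4 bytes in UTF-16, while plain ASCII text doubles in size.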


Thanks again

Jacques
