Re: [PHP-DEV] Algorithm Optimizations - string search

Michal Dziemianko Thu, 19 Jun 2008 13:25:36 -0700

Hi,

I am also sorry for the delay - I got ill recently and spent a day in bed after a night at the emergency room. I am working on other things now, and hope to post some patches soon. I will create a patch for zend_memnstr as you suggest and post it here, probably tomorrow. I have some ideas/implementations/data already prepared for other functions (not only string related) but haven't had time to publish them either here or on the page (http://212.85.117.53/gsoc/); I will try to do so by Monday.

Michal


On 2008-06-17, at 23:57, Nuno Lopes wrote:

Hi,
Sorry for taking so long to answer, but I'm trying to catch up on the latest stuff.
It's known that optimizing things for longer inputs usually ends up making things worse for short inputs. So IMHO, I think you should have the len==1 optimization and then use the KMP algorithm. Your implementation can be further optimized (at least on 32-bit machines), but seems OK for now.
I suggest you produce a final patch, send it here, and then move on to optimizing other things (like strtr(), which I'm sure the Wikipedia guys will appreciate).
Nuno
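
For illustration, the combination suggested above (a len==1 shortcut followed by KMP) could look roughly like the sketch below. This is not the patch discussed in the thread: the function name sketch_memnstr_kmp is hypothetical, the empty-needle convention and the heap-allocated failure table are assumptions made here, and the real zend_memnstr has a different signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the real zend_memnstr. Returns a pointer to the
 * first occurrence of needle in haystack, or NULL if there is none. */
const char *sketch_memnstr_kmp(const char *haystack, size_t haystack_len,
                               const char *needle, size_t needle_len)
{
    assert(haystack != NULL && needle != NULL);

    if (needle_len == 0) {
        return haystack;            /* convention chosen for this sketch */
    }
    if (needle_len > haystack_len) {
        return NULL;
    }
    if (needle_len == 1) {
        /* len == 1 special case: a plain byte scan, no table needed */
        return memchr(haystack, (unsigned char)needle[0], haystack_len);
    }

    /* fail[i] = length of the longest proper prefix of needle[0..i]
     * that is also a suffix of needle[0..i] */
    size_t *fail = malloc(needle_len * sizeof *fail);
    if (fail == NULL) {
        /* Out of memory: fall back to a simple quadratic scan. */
        for (size_t i = 0; i + needle_len <= haystack_len; i++) {
            if (memcmp(haystack + i, needle, needle_len) == 0) {
                return haystack + i;
            }
        }
        return NULL;
    }

    fail[0] = 0;
    for (size_t i = 1, k = 0; i < needle_len; i++) {
        while (k > 0 && needle[i] != needle[k]) {
            k = fail[k - 1];
        }
        if (needle[i] == needle[k]) {
            k++;
        }
        fail[i] = k;
    }

    /* Scan the haystack once; k is the number of needle bytes matched. */
    const char *found = NULL;
    for (size_t i = 0, k = 0; i < haystack_len; i++) {
        while (k > 0 && haystack[i] != needle[k]) {
            k = fail[k - 1];
        }
        if (haystack[i] == needle[k]) {
            k++;
        }
        if (k == needle_len) {
            found = haystack + i + 1 - needle_len;
            break;
        }
    }

    free(fail);
    return found;
}
```

The table allocation is the fixed cost that makes KMP unattractive for very short inputs, which is one reason to keep the one-byte shortcut in front of it.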


----- Original Message -----
Hello again
I have set up a small page where I am publishing all the profiling and evaluation results. It is here: http://83.168.205.202/~michal/standard15/
So far I have put there the function usage profile, the zend_memnstr analysis, and the stripos and strrpos analyses, including some charts etc. CVS diffs, where applicable, are also posted along with a comparison of the original and patched code.
Michal

On 2008-06-11, at 09:47, Stanislav Malyshev wrote:
Hi!
Here are some statistics:
- average haystack length: 624.2
- average needle length: 1.9 ! -> 63% of needles of length 1
- avg length of haystacks shorter than avg: 41.0 -> 85% of all haystacks
- avg length of haystacks longer than avg: 5685.11
I think it would be interesting to see the same excluding 1-char needles, since in this case it should do a one-char lookup (btw, if we don't do it on the C level, it might be a good idea to).
Although strpos implements a fix for that, some other functions don't. My idea is then to implement ZEND_MEMNSTR once again in this shape:
if (needle_len == 1)
    here just linear sweep
else if (haystack_len < 5000)
    (5000 is arbitrary - maybe some more tests needed to choose a good value)
    original implementation (as it is the best one in this case)
else
    BM/KMP (I think BM will be better in this case, as some people suggested)
I'm not sure very big haystacks are really worth the trouble - how many of them are used? It may be interesting to see medians instead of averages for that. But len=1 I think is worth having as a special case.
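
The three-way split proposed above can be sketched in C as follows. This is only an illustration under stated assumptions: all sketch_* names are hypothetical, the "original implementation" is approximated here by a memchr + memcmp loop rather than copied from the PHP source, and Horspool (a simplified Boyer-Moore variant) stands in for the BM branch. The 5000 cutoff is the arbitrary value from the message and would need benchmarking.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Arbitrary cutoff taken from the message; not a measured value. */
#define SKETCH_LONG_HAYSTACK 5000

/* Simple search: find the first byte with memchr, then memcmp the rest.
 * Assumes 1 <= nlen <= hlen. */
const char *sketch_naive(const char *h, size_t hlen, const char *n, size_t nlen)
{
    const char *end = h + hlen - nlen;      /* last possible start */
    for (const char *p = h; p <= end; p++) {
        p = memchr(p, (unsigned char)n[0], (size_t)(end - p) + 1);
        if (p == NULL) {
            return NULL;
        }
        if (memcmp(p, n, nlen) == 0) {
            return p;
        }
    }
    return NULL;
}

/* Horspool search, a simplified Boyer-Moore. Assumes 1 <= nlen <= hlen. */
const char *sketch_horspool(const char *h, size_t hlen, const char *n, size_t nlen)
{
    size_t shift[UCHAR_MAX + 1];

    /* Bytes not in the needle allow a jump of the full needle length. */
    for (size_t c = 0; c <= UCHAR_MAX; c++) {
        shift[c] = nlen;
    }
    /* For bytes in the needle (except its last position), jump so that the
     * rightmost such byte lines up with the end of the current window. */
    for (size_t i = 0; i + 1 < nlen; i++) {
        shift[(unsigned char)n[i]] = nlen - 1 - i;
    }

    size_t pos = 0;
    while (pos + nlen <= hlen) {
        if (memcmp(h + pos, n, nlen) == 0) {
            return h + pos;
        }
        pos += shift[(unsigned char)h[pos + nlen - 1]];
    }
    return NULL;
}

/* Dispatch on needle and haystack length, as outlined in the message. */
const char *sketch_memnstr(const char *h, size_t hlen, const char *n, size_t nlen)
{
    assert(h != NULL && n != NULL);

    if (nlen == 0) {
        return h;                   /* convention chosen for this sketch */
    }
    if (nlen > hlen) {
        return NULL;
    }
    if (nlen == 1) {
        return memchr(h, (unsigned char)n[0], hlen);    /* linear sweep */
    }
    if (hlen < SKETCH_LONG_HAYSTACK) {
        return sketch_naive(h, hlen, n, nlen);
    }
    return sketch_horspool(h, hlen, n, nlen);
}
```

With 63% of needles being one byte long and 85% of haystacks averaging 41 bytes, most calls in such a scheme would take one of the first two branches, so the table setup cost of the last branch is only paid on long inputs.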
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]
