Hi,
I am also sorry for delay - got ill recently and spend a day in bed after night at emergency. I am working on other things now, and hope to post some patches soon. I will create patch for zend_memnstr as you suggest and post it here probably tomorrow. I have some ideas/ implementations/data already prepared for other functions (not only string related) but haven't got time to publish it neither here, nor on the page (http://212.85.117.53/gsoc/), but will try to do it till Monday.
Michal

On 2008-06-17, at 23:57, Nuno Lopes wrote:

Hi,

Sorry for taking so long to answer, but I'm trying to catch up last stuff. It's known that usually to optimize things for longer inputs you usually end up making things for short inputs worst. So IMHO, I think you should have the len==1 optimization and then use the KMP algorithm. Your implementation can be further optimized (at least on 32 bits machines), but seems ok for now. I suggest you to produce a final patch, send it here, and then move on to optimize other things (like strtr() that I'm sure wikipedia guys will appreciate).

Nuno


----- Original Message -----
Hello again
I have setup small page where I am publishing all the profiling and evaluation results. It is here: http://83.168.205.202/~michal/ standard15/ So far I have put there function usage profile, zend_memnstr analysis, stripos and strrpos analysis including some charts etc. CVS diffs where applicable are also posted along with comparison of original and patched code.
Michal

On 2008-06-11, at 09:47, Stanislav Malyshev wrote:

Hi!

Here are some statistics:
- average haystack length: 624.2
- average needle length: 1.9 ! -> 63% of needles of length 1
- avg length of haystacks shorter than avg: 41.0 -> 85% of all haystacks
- avg length of haystacks  longer than avg: 5685.11

I think it would be interesting to see same excluding 1-char needles since in this case it should do one-char lookup (btw, if we don't do it on C level, it might be a good idea to).


Although strpos implements fix for that, some other functions don't. My idea
is than to implement ZEND_MEMNSTR once again in shape:
if (needle_len = 1)
    here just linear sweep
else if haystack_len < 5000 (5000 is arbitrary - maybe some more tests
needed to choose good value)
    original implementation (as it is the best one in this case)
else
    BM/KMP (i think BM will be better in this case, as some people
suggested)

I'm not sure very big haystacks really worth the trouble - how many of them are used? It may be interesting to see medians instead of averages for that. But len=1 I think worth having special case.
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]


Reply via email to