Hi,
I am also sorry for delay - got ill recently and spend a day in bed
after night at emergency. I am working on other things now, and hope
to post some patches soon. I will create patch for zend_memnstr as
you suggest and post it here probably tomorrow. I have some ideas/
implementations/data already prepared for other functions (not only
string related) but haven't got time to publish it neither here, nor
on the page (http://212.85.117.53/gsoc/), but will try to do it till
Monday.
Michal
On 2008-06-17, at 23:57, Nuno Lopes wrote:
Hi,
Sorry for taking so long to answer, but I'm trying to catch up last
stuff.
It's known that usually to optimize things for longer inputs you
usually end up making things for short inputs worst. So IMHO, I
think you should have the len==1 optimization and then use the KMP
algorithm. Your implementation can be further optimized (at least
on 32 bits machines), but seems ok for now.
I suggest you to produce a final patch, send it here, and then move
on to optimize other things (like strtr() that I'm sure wikipedia
guys will appreciate).
Nuno
----- Original Message -----
Hello again
I have setup small page where I am publishing all the profiling
and evaluation results. It is here: http://83.168.205.202/~michal/
standard15/
So far I have put there function usage profile, zend_memnstr
analysis, stripos and strrpos analysis including some charts etc.
CVS diffs where applicable are also posted along with comparison
of original and patched code.
Michal
On 2008-06-11, at 09:47, Stanislav Malyshev wrote:
Hi!
Here are some statistics:
- average haystack length: 624.2
- average needle length: 1.9 ! -> 63% of needles of length 1
- avg length of haystacks shorter than avg: 41.0 -> 85% of all
haystacks
- avg length of haystacks longer than avg: 5685.11
I think it would be interesting to see same excluding 1-char
needles since in this case it should do one-char lookup (btw, if
we don't do it on C level, it might be a good idea to).
Although strpos implements fix for that, some other functions
don't. My idea
is than to implement ZEND_MEMNSTR once again in shape:
if (needle_len = 1)
here just linear sweep
else if haystack_len < 5000 (5000 is arbitrary - maybe some
more tests
needed to choose good value)
original implementation (as it is the best one in this case)
else
BM/KMP (i think BM will be better in this case, as some people
suggested)
I'm not sure very big haystacks really worth the trouble - how
many of them are used? It may be interesting to see medians
instead of averages for that. But len=1 I think worth having
special case.
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED] http://www.zend.com/
(408)253-8829 MSN: [EMAIL PROTECTED]