On Friday, 9 January 2015 at 15:57:21 UTC, ketmar via Digitalmars-d-learn wrote:
On Fri, 09 Jan 2015 15:36:21 +0000, FrankLike via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
On Friday, 9 January 2015 at 14:03:21 UTC, ketmar via Digitalmars-d-learn wrote:
> On Fri, 09 Jan 2015 13:54:00 +0000
> Robert burner Schadek via Digitalmars-d-learn
> <digitalmars-d-learn@puremagic.com> wrote:
>
>> On Friday, 9 January 2015 at 13:25:17 UTC, ketmar via
>> Digitalmars-d-learn wrote:
>> > if you're *really* concerned with speed here, you'd better
>> > consider using regular expressions, as a regular expression
>> > can be precompiled and then search for multiple words with
>> > only one pass over the source string. i believe that
>> > std.regex will use a variation of the Thompson algorithm for
>> > regular expressions when it is able to do so.
>>
>> IMO that is not sound advice. Creating the state machine
>> and running it will be more costly than using canFind or
>> indexOf, which basically only compare char by char.
>>
>> If speed is really needed, use strstr and check whether it
>> uses SSE to compare multiple chars at a time. Anyway,
>> benchmark and then benchmark some more.
> std.regex can use CTFE to compile regular expressions (yet
> it is sometimes slower than the non-CTFE variant), and i mean
> that we compile the regexp before doing a lot of searches, not
> before each single search. if you have a lot of words to match
> or a lot of strings to check, regexp can give a huge boost.
>
> sure, it all depends on code patterns.
import std.regex;
auto ctr = ctRegex!(`(home|office|sea|plane)`);
auto c2 = !matchFirst("He is in the sea.", ctr).empty;
----------------------------------------------------------
Tested with: auto r = benchmark!(f0, f1, f2, f3, f4, f5)(10_0000);
Results:
  filter            42ms 85us
  findAmong         37ms 268us
  foreach indexOf   37ms 841us
  canFind           13ms
  canFind indexOf   39ms 455us
  ctRegex           138ms
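
For anyone who wants to repeat the comparison, here is a minimal sketch of how two of these cases could be timed. It is not the code behind the numbers above: the names viaCanFind, viaCtRegex, sink, text and words are made up for illustration, and it assumes a Phobos version where benchmark lives in std.datetime.stopwatch.

```d
import std.algorithm.searching : canFind;
import std.datetime.stopwatch : benchmark;
import std.regex;
import std.stdio : writeln;

// Hypothetical test data, not the inputs behind the results above.
string text = "He is in the sea.";
string[] words = ["home", "office", "sea", "plane"];
bool sink; // stores each result so the work is not trivially unused

// One substring search per keyword, stopping at the first hit.
void viaCanFind()
{
    sink = false;
    foreach (w; words)
    {
        if (text.canFind(w))
        {
            sink = true;
            break;
        }
    }
}

// One pass with a regex compiled at compile time.
void viaCtRegex()
{
    auto re = ctRegex!(`(home|office|sea|plane)`);
    sink = !matchFirst(text, re).empty;
}

void main()
{
    // Runs each function 100_000 times and returns one Duration per function.
    auto r = benchmark!(viaCanFind, viaCtRegex)(100_000);
    writeln("canFind is ", r[0]);
    writeln("ctRegex is ", r[1]);
}
```

The absolute times depend on compiler, flags and input, so only the relative order on your own data is meaningful.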
1. stop doing captures in the regexp; this will speed up the
comparison.
2. your sample is very artificial. i was talking about a lot more
keywords and a lot longer strings. sorry, i didn't say that
clearly enough.
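
A minimal sketch of point 1, assuming the same four keywords as in the sample: std.regex accepts the `(?:...)` syntax, which groups the alternatives without saving a submatch. The helper name hasKeyword is made up for illustration, and whether the change is measurable on a given input still has to be benchmarked.

```d
import std.regex;
import std.stdio : writeln;

// Hypothetical helper: true if any of the keywords occurs in s.
bool hasKeyword(string s)
{
    // (?:...) groups the alternatives without saving the matched text.
    auto re = ctRegex!(`(?:home|office|sea|plane)`);
    return !matchFirst(s, re).empty;
}

void main()
{
    writeln(hasKeyword("He is in the sea.")); // true
    writeln(hasKeyword("He is in the car.")); // false
}
```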
Yes. With 'a lot more keywords and a lot longer strings', regex
will do better.
Thank you.