I like regular expressions, but I always think of them as a last resort,
sort of like finding your way through a labyrinth by feel. When you know
more about the structure of the mystery -- "keep your left hand on the
wall" or "spaces separate tokens" -- then other tools and approaches can
help tremendously in terms of speed and size.

If there are keywords, and there is a simple way to separate them without
deep understanding of what comes before or after, then a two-stage
lex/parse scheme may be suggested.
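
As a rough sketch of that two-stage idea in Go: stage one splits the text
into tokens, stage two looks each token up in a table. The keywords A1..B3
are the placeholders from the pattern in the question, and the names classOf,
lex and findKeywords are made up for illustration. It also assumes a token is
a run of letters and digits, which is close to, but not identical to, what \b
does (\b counts the underscore as a word character).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// classOf maps each keyword to its class. The keywords and class names
// are placeholders, not real data.
var classOf = map[string]string{
	"A1": "A", "A2": "A", "A3": "A",
	"B1": "B", "B2": "B", "B3": "B",
}

// Match records one keyword hit and the class it belongs to.
type Match struct {
	Word  string
	Class string
}

// lex is stage one: split the text on anything that is not a letter
// or a digit (an assumption about what separates tokens).
func lex(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// findKeywords is stage two: one map lookup per token, no regexp involved.
func findKeywords(text string) []Match {
	var out []Match
	for _, tok := range lex(text) {
		if class, ok := classOf[tok]; ok {
			out = append(out, Match{tok, class})
		}
	}
	return out
}

func main() {
	for _, m := range findKeywords("x A2, then B1; not A22.") {
		fmt.Printf("%s is in class %s\n", m.Word, m.Class)
	}
}
```

The cost per token is one hash lookup regardless of how many keywords are in
the table, which is the point of separating the lexing from the matching.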

If the quest is mostly about keywords rather than the relative structure of
keywords, as in "find these hundred thousand words in this hundred billion
word corpus," then sophisticated linear approaches are suggested.
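
One example of such a linear approach is an Aho-Corasick automaton, which
reports every keyword occurrence in a single pass over the text. The sketch
below is my own minimal version, not taken from any library; node, Hit, build
and search are illustrative names. It works on bytes and finds substrings, so
a whole-word search would still need a boundary check on each hit.

```go
package main

import "fmt"

// node is one automaton state.
type node struct {
	next map[byte]int // goto edges
	fail int          // failure link (longest proper suffix that is also a prefix)
	out  []string     // keywords that end at this state
}

// Hit is one keyword occurrence, with its byte offset in the text.
type Hit struct {
	Start int
	Word  string
}

// build constructs the trie, then sets failure links breadth-first.
func build(words []string) []node {
	t := []node{{next: map[byte]int{}}}
	for _, w := range words {
		if w == "" {
			continue
		}
		s := 0
		for i := 0; i < len(w); i++ {
			n, ok := t[s].next[w[i]]
			if !ok {
				n = len(t)
				t = append(t, node{next: map[byte]int{}})
				t[s].next[w[i]] = n
			}
			s = n
		}
		t[s].out = append(t[s].out, w)
	}
	var queue []int
	for _, n := range t[0].next {
		queue = append(queue, n) // depth-1 states fail to the root
	}
	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		for c, n := range t[s].next {
			// Walk the parent's failure chain until a state has an edge on c.
			for f := t[s].fail; ; f = t[f].fail {
				if m, ok := t[f].next[c]; ok {
					t[n].fail = m
					break
				}
				if f == 0 {
					break // no suffix matches; fail stays at the root
				}
			}
			// Inherit keywords that end at the failure state.
			t[n].out = append(t[n].out, t[t[n].fail].out...)
			queue = append(queue, n)
		}
	}
	return t
}

// search scans the text once and reports every keyword occurrence.
func search(t []node, text string) []Hit {
	var hits []Hit
	s := 0
	for i := 0; i < len(text); i++ {
		for {
			if n, ok := t[s].next[text[i]]; ok {
				s = n
				break
			}
			if s == 0 {
				break
			}
			s = t[s].fail
		}
		for _, w := range t[s].out {
			hits = append(hits, Hit{Start: i + 1 - len(w), Word: w})
		}
	}
	return hits
}

func main() {
	t := build([]string{"he", "she", "his", "hers"})
	for _, h := range search(t, "ushers") {
		fmt.Printf("%q at offset %d\n", h.Word, h.Start)
	}
}
```

The automaton is built once, and the scan cost then depends on the length of
the text plus the number of hits, not on how many keywords there are.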

I have often been able to get huge speedups by doing just what Andy
suggested as I am typing this: considering the match process as iterative,
or as a series of mappings and matchings. Often the "or else" and "or
backup" cases can be avoided by an earlier pass.
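
One simple form of "earlier pass," sketched here under my own assumptions: each
rule carries a literal that any match must contain, and a cheap
strings.Contains check decides whether the regexp needs to run at all. The
rule type, the two sample rules, and matchRules are hypothetical and only
illustrate the shape of the idea; whether it pays off depends on how often
rules can be skipped.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule pairs a compiled pattern with a literal that must appear in any
// text the pattern can match. Both sample rules are made up.
type rule struct {
	name    string
	literal string
	re      *regexp.Regexp
}

var rules = []rule{
	{"date", "-", regexp.MustCompile(`\b\d{4}-\d{2}-\d{2}\b`)},
	{"email", "@", regexp.MustCompile(`\b\w+@\w+\.\w+\b`)},
}

// matchRules runs the cheap literal check first and the regexp only
// for rules that survive it.
func matchRules(text string) map[string][]string {
	found := map[string][]string{}
	for _, r := range rules {
		if !strings.Contains(text, r.literal) {
			continue // earlier pass: this rule cannot match
		}
		if m := r.re.FindAllString(text, -1); m != nil {
			found[r.name] = m
		}
	}
	return found
}

func main() {
	fmt.Println(matchRules("released 2016-12-13"))
}
```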

Good luck!

On Tue, Dec 13, 2016 at 11:34 AM, David Sofo <sofodav...@gmail.com> wrote:

> Thanks Tamas. I am not aware of Ragel.
> Regards
> David
>
>
> On Tuesday, December 13, 2016 at 20:24:18 UTC+1, Tamás Gulácsi wrote:
>>
>> On Tuesday, December 13, 2016 at 16:53:45 UTC+1, David Sofo wrote:
>>>
>>>  Hi,
>>>
>>> I have a set of rules expressed as regular expressions (around 1000 rules
>>> expected) to find some keywords in text files (~50 KB each). How can I
>>> speed up the execution time? Currently I compile the regex rules at
>>> initialization time in an init function and put them in a map at package
>>> level, then run the regex rules in a loop. The regexes have this form:
>>>
>>> \b(?:( (A1|A2|A3) | (B1|B2|B3) ) )\b
>>>
>>> Spaces are added for readability. A and B are classes of keywords.
>>>
>>> How can I speed up the execution: at the regular expression level or at
>>> other levels (such as execution priority)? I am using Ubuntu 14.04. Any
>>> suggestion is welcome. Thank you.
>>>
>>> Here a code
>>>
>>> Regards
>>> David
>>>
>>>
>> Are these really just words? Then split the string first, then match
>> these tokens.
>> Or try to create one huge regexp - maybe that's faster.
>> Or create one huge regexp, roll it through Ragel, and use the generated
>> Go code.
>>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Michael T. Jones
michael.jo...@gmail.com
