I like regular expressions, but I always think of them as a last resort, sort of like finding your way through a labyrinth by feel. When you know more about the structure of the mystery ("keep your left hand on the wall" or "spaces separate tokens"), other tools and approaches can help tremendously, in both speed and size.
If there are keywords, and there is a simple way to separate them without a deep understanding of what has come before or what comes after, then a two-stage lex/parse scheme may be suggested. If the quest is mostly about the keywords themselves rather than their relative structure, as in "find these hundred thousand words in this hundred-billion-word corpus", then sophisticated linear approaches are suggested. I have often been able to get huge speedups by doing just what Andy suggested as I was typing this: treating the match process as iterative, as a series of mappings and matchings. Often the "or else" and "or backup" can be avoided by an earlier pass.

Good luck!

On Tue, Dec 13, 2016 at 11:34 AM, David Sofo <sofodav...@gmail.com> wrote:
> Thanks Tamas. I am not aware of Ragel.
> Regards
> David
>
>
> On Tuesday, December 13, 2016 at 20:24:18 UTC+1, Tamás Gulácsi wrote:
>>
>> On Tuesday, December 13, 2016 at 16:53:45 UTC+1, David Sofo wrote:
>>>
>>> Hi,
>>>
>>> For a set of rules expressed as regular expressions (around 1000 rules
>>> expected) to find some keywords in a text file (~50 KB each file), how
>>> can I speed up the execution time? Currently I compile the regex rules at
>>> initialization time in an init function and put them in a map at package
>>> level, then run the regex rules in a loop. The regexes have this form:
>>>
>>> \b(?:( (A1|A2|A3) | (B1|B2|B3) ) )\b
>>>
>>> Spaces are put in for readability. A and B are classes of keywords.
>>>
>>> How can I speed up the execution: at the regular-expression level or at
>>> other levels (such as execution priority)? I am using Ubuntu 14.04. Any
>>> suggestion is welcome. Thank you.
>>>
>>> Here is the code
>>>
>>> Regards
>>> David
>>>
>>>
>> Are these really just words? Then split the string first, then match
>> these tokens.
>> Or try to create one huge regexp - maybe that's faster.
>> Or create one huge regexp, roll it through Ragel, and use the generated
>> Go code.
>>
> --
> You received this message because you are subscribed to the Google Groups "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

--
Michael T. Jones
michael.jo...@gmail.com