Hi Andrew,

Thanks for the advice and tips.
Originally, I wrote my code this way (tokenization...), but later I was
asked to support regular expressions in the patterns so I had to use a
regular expression engine, and the question is "How?".
BTW, isn't it more efficient to let the DFA run a single pass on the
input string than to scan it several times?

Thanks,
Ronen.

-----Original Message-----
From: Andrew Ho [mailto:[email protected]]
Sent: Wednesday, July 29, 2009 8:28 PM
To: Ronen Hod
Cc: PCRE Developers
Subject: Re: [pcre-dev] What is the best way to multi match

Hi Ronen,

>I am parsing an HTTP query-string ("s1&s2&...&sn"), and need to find
>which of the patterns (p1, p2, ..., pm) exist there.
>So far the best way that I found was to use the RegExp
>^(|.+&)p1($|&)(?C0)|^(|.+&)p2($|&)(?C0)|...|^(|.+&)pm($|&)(?C0)
>and remember the position of every "|" that follows the callout so I
>can identify them when I get the callout (using pcre_dfa_exec()). Does
>anybody have any better working solution for this problem?

To be honest, a regular expression is the wrong tool for the job for
parsing an HTTP query string. I would do the parsing by hand: separate
by '&' characters, then, for each token, separate by '=' characters.

In either the regex or manual parsing cases, you will need to do URI
unescaping (for example, "%61" to "a").

You can do your manual parsing either using a simple state machine (at
any given time you are either parsing a name, or a value), or with
multiple calls to strtok() or strtok_r().

Humbly,

Andrew

----------------------------------------------------------------------
'Twas brillig, and the slithy toves                 Andrew Ho
Did gyre and gimble in the wabe.                    [email protected]
All mimsy were the borogoves,
And the mome raths outgrabe.                        http://www.zeuscat.com/andrew/
----------------------------------------------------------------------

--
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
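
For reference, the manual approach Andrew describes (split on '&', split
each token on '=', then undo URI escaping) might look roughly like the
sketch below. It is only an illustration under stated assumptions: the
names query_pair, url_unescape() and parse_query() are hypothetical and
come from neither PCRE nor the original posts, the query string is
assumed to be in a writable buffer, and only %XX escapes are decoded.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r() */
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for one decoded name/value pair. */
struct query_pair {
    const char *name;
    const char *value;   /* "" when the token had no '=' */
};

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Decode %XX escapes in place (hypothetical helper).
 * Malformed escapes are copied through unchanged. */
static void url_unescape(char *s)
{
    char *out = s;
    while (*s) {
        if (s[0] == '%' && isxdigit((unsigned char)s[1])
                        && isxdigit((unsigned char)s[2])) {
            *out++ = (char)(hex_value(s[1]) * 16 + hex_value(s[2]));
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Split a query string on '&', then each token on the first '='.
 * The buffer is modified in place; at most max pairs are stored.
 * Returns the number of pairs stored. */
static size_t parse_query(char *query, struct query_pair *pairs, size_t max)
{
    size_t n = 0;
    char *save = NULL;
    char *tok;

    for (tok = strtok_r(query, "&", &save);
         tok != NULL && n < max;
         tok = strtok_r(NULL, "&", &save)) {
        char *eq = strchr(tok, '=');
        const char *value = "";

        /* Split before unescaping, so an escaped "%3D" or "%26"
         * inside a name or value is not treated as a separator. */
        if (eq != NULL) {
            *eq = '\0';
            url_unescape(eq + 1);
            value = eq + 1;
        }
        url_unescape(tok);

        pairs[n].name = tok;
        pairs[n].value = value;
        n++;
    }
    return n;
}
```

Calling parse_query() on a writable copy of "name=%61ndrew&flag" with
room for a few pairs would store the pairs ("name", "andrew") and
("flag", ""). Each decoded name or value could then be compared against
the patterns individually, whether with strcmp() or with a regular
expression engine. Note that strtok_r() skips empty tokens, so "a&&b"
yields two pairs.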
