Re: [pcre-dev] What is the best way to multi match

Ronen Hod Thu, 30 Jul 2009 08:12:03 -0700

Hi Andrew,

Thanks for the advice and tips.
Originally, I wrote my code this way (tokenization...), but later I was asked 
to support regular expressions in the patterns, so I had to use a regular 
expression engine. The question is how.
BTW, isn't it more efficient to let the DFA run a single pass on the input 
string than to scan it several times?

Thanks, Ronen.

-----Original Message-----
From: Andrew Ho [mailto:[email protected]] 
Sent: Wednesday, July 29, 2009 8:28 PM
To: Ronen Hod
Cc: PCRE Developers
Subject: Re: [pcre-dev] What is the best way to multi match

Hi Ronen,

>I am parsing an HTTP query-string ("s1&s2&...&sn"), and need to find 
>which of the patterns (p1, p2, ..., pm) exist there.
>So far the best way that I found was to use the RegExp
>^(|.+&)p1($|&)(?C0)|^(|.+&)p2($|&)(?C0)|...|^(|.+&)pm($|&)(?C0)
>and remember the position of every "|" that follows the callout so I 
>can identify them when I get the callout (using pcre_dfa_exec()). Does 
>anybody have any better working solution for this problem?

To be honest, a regular expression is the wrong tool for parsing an 
HTTP query string.

I would do the parsing by hand: separate by '&' characters, then, for 
each token, separate by '=' characters. With either regex or manual 
parsing, you will need to do URI unescaping (for example, "%61" to 
"a"). You can do your manual parsing either using a simple state machine 
(at any given time you are either parsing a name, or a value), or with 
multiple calls to strtok() or strtok_r().
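
As a rough illustration of the strtok_r() approach described above, here is a 
minimal sketch in C. The names url_unescape(), parse_query() and struct 
query_pair are hypothetical and made up for this example; it assumes a POSIX 
system, a writable NUL-terminated query string, and it decodes only "%XX" 
escapes. Treat it as a starting point, not a complete parser.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r() */
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one name/value pair; both pointers refer
 * into the caller's (modified) query buffer. */
struct query_pair {
    char *name;
    char *value;
};

/* Decode "%XX" escapes in place (for example "%61" becomes "a").
 * Malformed escapes are copied through unchanged. Returns s. */
static char *url_unescape(char *s)
{
    char *in = s, *out = s;

    while (*in) {
        if (in[0] == '%' && isxdigit((unsigned char)in[1])
                         && isxdigit((unsigned char)in[2])) {
            char hex[3] = { in[1], in[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return s;
}

/* Split "s1&s2&...&sn" on '&', then split each token on the first '='.
 * The buffer is modified in place. A token without '=' gets an empty
 * value. Returns the number of pairs stored (at most max_pairs). */
static size_t parse_query(char *query, struct query_pair *pairs,
                          size_t max_pairs)
{
    size_t n = 0;
    char *save = NULL;

    assert(query != NULL);

    for (char *tok = strtok_r(query, "&", &save);
         tok != NULL && n < max_pairs;
         tok = strtok_r(NULL, "&", &save)) {
        char *eq = strchr(tok, '=');
        /* With no '=', point the value at the token's own terminator. */
        char *value = eq ? eq + 1 : tok + strlen(tok);

        if (eq)
            *eq = '\0';
        pairs[n].name = url_unescape(tok);
        pairs[n].value = url_unescape(value);
        n++;
    }
    return n;
}
```

Each decoded token could then be compared against the patterns one at a 
time. Note that strtok_r() skips empty tokens, so "a&&b" yields two pairs, 
not three.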

Humbly,

Andrew

----------------------------------------------------------------------
'Twas brillig, and the slithy toves                         Andrew Ho
  Did gyre and gimble in the wabe.                  [email protected]
  All mimsy were the borogoves,
  And the mome raths outgrabe.          http://www.zeuscat.com/andrew/
----------------------------------------------------------------------

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
