On Tue, Oct 01, 2002 at 06:32:07PM -0400, Mike Lambert wrote:
> > guaranteeing that the subsqls have all text up to, but not including the string
> > "union".
> >
> > I suppose I could say:
> >
> > rule nonunion { (.*) :: { fail if ($1 =~ m"union$"); } }
>
> What's wrong with: ?
>
> rule getstuffbeforeunion { (.*?) union | (.*) }
>
> "a union" => "a "
> "b" => "b"
>
> Am I missing something here?
>
> Mike Lambert
>
Hmm... well, it works, but it's not very efficient. It basically scans the
whole string to the end to see if there is a "union" string, and only when that
fails does it backtrack to take the alternative. Hence it's not very scalable.
It also doesn't 'complexify' very well.
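
To make the behaviour concrete, here is a rough Python rendition of the quoted rule. It is only a sketch: Python's `re` stands in for the Perl 6 rule, and the names `LAZY_UNION` and `stuff_before_union` are made up for illustration.

```python
import re

# Assumption: Python's `re` stands in for the Perl 6 rule; names are hypothetical.
LAZY_UNION = re.compile(r"(.*?)union|(.*)", re.DOTALL)


def stuff_before_union(text):
    """Return the text before the first "union", or all of it if there is none."""
    m = LAZY_UNION.match(text)
    # Group 1 is set when "union" was found; group 2 when the fallback matched.
    return m.group(1) if m.group(1) is not None else m.group(2)


print(repr(stuff_before_union("a union")))  # 'a '
print(repr(stuff_before_union("b")))        # 'b'
```

In the second call the first alternative has to walk the entire input looking for "union" before the second alternative gets a chance.
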
Suppose you had a long string of text, and you wanted to 'harden' your regex
against the substring "union" appearing in double-quoted strings, single-quoted
strings, etc., without writing a SQL parser. I just don't see how to do this
with .*?; I would do something like this (taking a page from Mr. Friedl's book):

rule regex_matching_sql
{
    [
        <-[u()"']>+ :     |
        <parens> :        |
        <double_string> : |
        <single_string> : |
        <non_union>
    ]*
}

rule parens
{
    \(
    [
        <-["'()]>+ :      |
        <double_string> : |
        <single_string> : |
        <self>
    ]*
    \)
}

rule single_string
{
    \' [ <-[\'\\]>+ : | \.\' ]* \'
}

rule double_string
{
    \" [ <-[\"\\]>+ : | \.\" ]* \"
}

rule non_union { [ u <-['"()n]> | un ... | uni ... | unio ... | u$ ]* }

Of course I could also be missing something, but I just don't see how to do this
with .*?.
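
For comparison, the same idea can be sketched in Python. This is an approximation under stated assumptions, not a translation of the rules above: Python's `re` stands in for Perl 6 rules, the names `BEFORE_UNION` and `before_union` are hypothetical, a negative lookahead replaces the hand-written non_union alternatives, matching is case-sensitive, and the recursive parens rule is left out because `re` has no recursive patterns.

```python
import re

# Assumption: names are hypothetical; nested parentheses are not handled.
BEFORE_UNION = re.compile(
    r"""
    (?:
        [^u"']+                       # common case first: no "u", no quote
      | "[^"\\]*(?:\\.[^"\\]*)*"      # double-quoted string, backslash escapes
      | '[^'\\]*(?:\\.[^'\\]*)*'      # single-quoted string, backslash escapes
      | u(?!nion)                     # a "u" that does not start "union"
    )*
    """,
    re.VERBOSE | re.DOTALL,
)


def before_union(sql):
    """Return the leading part of sql up to the first unquoted "union"."""
    # The pattern can match the empty string, so match() never returns None.
    return BEFORE_UNION.match(sql).group(0)


print(repr(before_union("select a from t union select b from u")))
# 'select a from t '
print(repr(before_union('select "union" from t union select 1')))
# 'select "union" from t '
```

The alternatives are mutually exclusive on their first character, and nothing follows the outer loop, so the match only ever moves forward. The quoted-string branches use the unrolled-loop form so that an unterminated quote fails quickly.
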
Ed
(ps:
As for:
/(.*) <commit> <!{ $1 =~ rx{union} }>/
I'm not sure how that works, or whether it's very 'complexifiable'
(as per above). If it does a match against every single substring (take all
characters, look for union; if it exists, roll back a character and do
the same thing, and so on), then this isn't good enough. The non_union
rule listed above is about as efficient as it can get: it does no backtracking,
and it keeps the common matches up front so they match first without
alternation.
)
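
The worry in the postscript can be modelled with a few lines of Python. This sketch simulates only the hypothesised behaviour (take everything, test for "union", give back one character, test again). It makes no claim about what the Perl 6 engine actually does, and the name `greedy_with_assertion` is made up for illustration.

```python
def greedy_with_assertion(text, needle="union"):
    """Model of the feared strategy: shrink a greedy prefix until it passes.

    Returns the accepted prefix and the number of substring checks performed.
    """
    checks = 0
    # Start with the whole text and give back one character per failure.
    for end in range(len(text), -1, -1):
        checks += 1
        if needle not in text[:end]:  # each check rescans the whole prefix
            return text[:end], checks
    return "", checks


prefix, checks = greedy_with_assertion("a union b")
print(repr(prefix), checks)
```

Each failed check rescans the prefix from the start, so in this model the work grows with both the length of the text and the amount that follows the first "union".
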