On 30/10/2007, Stijn Verholen <[EMAIL PROTECTED]> wrote:
> Hey list,
>
> I'm having problems with grouped alternative patterns.
> The regex I would like to use, is the following:
>
> /\s*(`?.+`?)\s*int\s*(\(([0-9]+)\))?\s*(unsigned)?\s*(((auto_increment)?\s*(primary\s*key)?)|((not\s*null)?\s*(default\s*(`.*`|[0-9]*)?)?))\s*/i
>
> It matches this statement:
>
> `id` INT(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY
>
> But not this:
>
> `test4` INT(11) UNSIGNED NOT NULL DEFAULT 5
>
> However, if I switch the alternatives, the first statement doesn't
> match, but the second does.
> FYI: In both cases, the column name and data type are matched, as expected.
> It appears to be doing lazy evaluation on the pattern, even though every
> resource I can find states that every alternative is tried in turn until
> a match is found.

It's not lazy.

Given alternative subpatterns, the PCRE engine chooses the leftmost
alternative that matches, not the longest. For instance:

<?php
  preg_match("/a|ab/", "abbot", $matches);
  print_r($matches);
?>

Array
(
    [0] => a
)

This isn't what you'd expect if you were familiar with POSIX regular
expressions, but matches Perl's behaviour.

Because each of your subpatterns can match an empty string, the
lefthand subpattern always matches and the righthand subpattern might
as well not be there.
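
To see the effect in isolation, here is a small sketch using a cut-down
stand-in for your pattern (the subject string and the variable names are
made up for illustration). Both alternatives can match the empty string,
so whichever one is written first wins:

```php
<?php
// Hypothetical, simplified version of the original pattern: each
// alternative is optional and can therefore match the empty string.
$subject = 'int not null';

// "auto_increment" alternative first: it matches empty, so "not null"
// is never tried.
preg_match('/int\s*((auto_increment)?|(not\s*null)?)/i', $subject, $leftFirst);

// Alternatives swapped: now "not null" is tried first and is captured.
preg_match('/int\s*((not\s*null)?|(auto_increment)?)/i', $subject, $rightFirst);

var_dump($leftFirst[1]);  // string(0) ""
var_dump($rightFirst[1]); // string(8) "not null"
```

This is the same swap you described: the pattern always "succeeds", but
only the first alternative ever gets a chance to capture anything.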

The simplest solution, if you don't want to completely rethink your
regexp, might be to replace \s with [[:space:]], remove the delimiters
and the i modifier, and just use eregi(), like so:

$pattern = 
'[[:space:]]*(`?.+`?)[[:space:]]*int[[:space:]]*(\(([0-9]+)\))?[[:space:]]*(unsigned)?[[:space:]]*(((auto_increment)?[[:space:]]*(primary[[:space:]]*key)?)|((not[[:space:]]*null)?[[:space:]]*(default[[:space:]]*(`.*`|[0-9]*)?)?))[[:space:]]*';

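// The two column definitions from the question.
$column1 = '`id` INT(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY';
$column2 = '`test4` INT(11) UNSIGNED NOT NULL DEFAULT 5';

eregi($pattern, $column1, $matches); print_r($matches); // match
eregi($pattern, $column2, $matches); print_r($matches); // match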
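
If eregi() isn't an option (the ereg functions were deprecated in PHP 5.3
and removed in PHP 7.0), one possible way to stay with preg_match() is to
anchor the pattern with ^ and $. This is only a sketch and it assumes each
subject string holds exactly one column definition and nothing else. With
the end anchor in place, an empty match from the left alternative no
longer lets the whole pattern succeed, so the engine backtracks and tries
the right alternative:

```php
<?php
// Same pattern as in the question, with ^ and $ added (an assumption:
// the whole string is a single column definition).
$pattern = '/^\s*(`?.+`?)\s*int\s*(\(([0-9]+)\))?\s*(unsigned)?\s*(((auto_increment)?\s*(primary\s*key)?)|((not\s*null)?\s*(default\s*(`.*`|[0-9]*)?)?))\s*$/i';

$column1 = '`id` INT(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY';
$column2 = '`test4` INT(11) UNSIGNED NOT NULL DEFAULT 5';

preg_match($pattern, $column1, $m1);
preg_match($pattern, $column2, $m2);

// Group 5 is the outer group wrapping both alternatives.
var_dump($m1[5]); // string(26) "AUTO_INCREMENT PRIMARY KEY"
var_dump($m2[5]); // string(18) "NOT NULL DEFAULT 5"
```

The trade-off is that the anchored pattern rejects any string with
trailing text it doesn't recognise, where the unanchored one would
quietly match a prefix.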

-robin
