Edit report at http://bugs.php.net/bug.php?id=53309&edit=1

 ID:                 53309
 Updated by:         fel...@php.net
 Reported by:        michael at squiloople dot com
 Summary:            Capturing group failing with a colon
 Status:             Bogus
 Type:               Bug
 Package:            Regexps related
 Operating System:   Vista
 PHP Version:        5.3.3
 Block user comment: N

 New Comment:

You're comparing the regexes incorrectly.

For a like-for-like comparison, the dash version should also use a dash
inside group 1:

/^(([a-z])(?:-(?2))*)::(?:(?1)-)[a-z]$/



You can fix this either by repeating the pattern instead of calling the
subpattern with (?1), or by making the * quantifier ungreedy. Either way
the match no longer depends on backtracking into the atomic subroutine
call.



e.g.

/^(([a-z])(?::(?2))*)::(?:([a-z])(?::(?2))*:)[a-z]$/

/^(([a-z])(?::(?2))*?)::(?:(?1):)[a-z]$/

/^(([a-z])(?::(?2))*)::(?:(?1):)[a-z]$/U
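
As a quick check, the three rewrites above can be run against the
reporter's subject string. This is only a sketch: the labels and variable
names are made up for illustration, and array() is used so it also runs
on PHP 5.3.

```php
<?php
// Subject string from the original report.
$subject = 'a::a:a';

// The three suggested rewrites; the labels are illustrative only.
$patterns = array(
    'repeated pattern' => '/^(([a-z])(?::(?2))*)::(?:([a-z])(?::(?2))*:)[a-z]$/',
    'lazy quantifier'  => '/^(([a-z])(?::(?2))*?)::(?:(?1):)[a-z]$/',
    'U modifier'       => '/^(([a-z])(?::(?2))*)::(?:(?1):)[a-z]$/U',
);

$results = array();
foreach ($patterns as $label => $pattern) {
    // preg_match() returns 1 on a match, 0 on no match.
    $results[$label] = preg_match($pattern, $subject);
    printf("%-16s => %d\n", $label, $results[$label]);
}
```

Each of the three should report 1 for "a::a:a". Note that the lazy and /U
variants make the called group stop after its first letter, so they are
a narrower fix than repeating the pattern.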



When you call (?1), PCRE internally treats it as: (?>([a-z])(?::(?2))*)

which is an atomic group, i.e. once it has matched, no backtracking into
it happens, and that backtracking is exactly what is needed to match
your "a::a:a" string.
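
The effect of atomic grouping can be seen in isolation with a reduced
pattern. This sketch is not the reporter's regex: it writes the group out
with an explicit (?>...) so the result does not depend on how a given
PCRE version handles subroutine calls, and the variable names are
illustrative.

```php
<?php
// The group must give back ":a" so the trailing ":[a-z]" can match.
$subject = 'a:a';

// Ordinary non-capturing group: the engine may backtrack into it.
$plain = preg_match('/^(?:[a-z](?::[a-z])*):[a-z]$/', $subject);

// Atomic group: once it has consumed "a:a" it never gives anything back.
$atomic = preg_match('/^(?>[a-z](?::[a-z])*):[a-z]$/', $subject);

var_dump($plain);  // int(1)
var_dump($atomic); // int(0)
```

The greedy * first takes all of "a:a"; only the non-atomic version can
release the final ":a" when the following ":" fails to match.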


Previous Comments:
------------------------------------------------------------------------
[2010-11-15 01:43:31] michael at squiloople dot com

I don't understand. The only difference between the two cases is that
one has a colon after the backreference and the other has a dash. Look
at it like this:



GROUP 1 :: COPY OF GROUP ONE :

GROUP 1 :: COPY OF GROUP ONE -



Why would the first fail but the second not? They should both work.

------------------------------------------------------------------------
[2010-11-14 22:19:13] fel...@php.net

This is the behavior of the PCRE library.

The PCRE man page says:



       Like recursive subpatterns, a subroutine call is always treated as
       an atomic group. That is, once it has matched some of the subject
       string, it is never re-entered, even if it contains untried
       alternatives and there is a subsequent matching failure. Any
       capturing parentheses that are set during the subroutine call
       revert to their previous values afterwards.

------------------------------------------------------------------------
[2010-11-14 16:16:33] michael at squiloople dot com

Description:
------------
In some circumstances, when a colon is a specified character in a
capturing group, it unexpectedly fails.

Test script:
---------------
var_dump(preg_match('/^(([a-z])(?::(?2))*)::(?:(?1):)[a-z]$/', 'a::a:a'));

var_dump(preg_match('/^(([a-z])(?::(?2))*)::(?:(?1)-)[a-z]$/', 'a::a-a'));

Expected result:
----------------
int(1)

int(1)

Actual result:
--------------
int(0)

int(1)


------------------------------------------------------------------------


