ID:                 53309
Updated by:         fel...@php.net
Reported by:        michael at squiloople dot com
Summary:            Capturing group failing with a colon
Status:             Bogus
Type:               Bug
Package:            Regexps related
Operating System:   Vista
PHP Version:        5.3.3
Block user comment: N

New Comment:

You're comparing the regexes incorrectly. The dash version should be:

  /^(([a-z])(?:-(?2))*)::(?:(?1)-)[a-z]$/

You can fix this either by not calling the subpattern with (?1) and
repeating the pattern instead, or by making the * quantifier ungreedy.
Both avoid the atomic matching. For example:

  /^(([a-z])(?::(?2))*)::(?:([a-z])(?::(?2))*:)[a-z]$/
  /^(([a-z])(?::(?2))*?)::(?:(?1):)[a-z]$/
  /^(([a-z])(?::(?2))*)::(?:(?1):)[a-z]$/U

When you use (?1), PCRE internally does:

  (?>([a-z])(?::(?2))*)

which matches atomically, i.e. no backtracking happens, and backtracking
is exactly what is needed to match your "a::a:a" string.

Previous Comments:
------------------------------------------------------------------------
[2010-11-15 01:43:31] michael at squiloople dot com

I don't understand. The only difference between the two cases is that
one has a colon after the backreference and the other has a dash. Look
at it like this:

  GROUP 1 :: COPY OF GROUP ONE :
  GROUP 1 :: COPY OF GROUP ONE -

Why would the first fail but the second not? They should both work.

------------------------------------------------------------------------
[2010-11-14 22:19:13] fel...@php.net

This is a behavior of the PCRE library. The PCRE manpage says:

  Like recursive subpatterns, a subroutine call is always treated as an
  atomic group. That is, once it has matched some of the subject string,
  it is never re-entered, even if it contains untried alternatives and
  there is a subsequent matching failure. Any capturing parentheses that
  are set during the subroutine call revert to their previous values
  afterwards.
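The atomic behavior quoted from the manpage can be reproduced without a subroutine call by spelling out group 1 inline, once as an explicit atomic group (?>...) and once as an ordinary non-capturing group (?:...). This is an illustrative sketch, not part of the original report, and the variable names are hypothetical. Because explicit atomic groups are atomic in every PCRE version, the first pattern is expected to fail on the reporter's string and the second to match.

```php
<?php
// Subject string from the reporter's first test case.
$subject = 'a::a:a';

// Hypothetical illustration: the second copy of group 1 is written as an
// explicit atomic group. Once (?>...) has consumed "a:a" it is never
// re-entered, so the literal ":" after it has nothing left to match.
$atomic = '/^(([a-z])(?::(?2))*)::(?>([a-z])(?::(?2))*):[a-z]$/';

// Same pattern with an ordinary non-capturing group: the greedy * can give
// back ":a", leaving ":" for the literal colon and "a" for the final [a-z].
$plain = '/^(([a-z])(?::(?2))*)::(?:([a-z])(?::(?2))*):[a-z]$/';

var_dump(preg_match($atomic, $subject)); // int(0)
var_dump(preg_match($plain, $subject));  // int(1)
```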
------------------------------------------------------------------------
[2010-11-14 16:16:33] michael at squiloople dot com

Description:
------------
In some circumstances, when a colon is a specified character in a
capturing group, the match unexpectedly fails.

Test script:
---------------
var_dump(preg_match('/^(([a-z])(?::(?2))*)::(?:(?1):)[a-z]$/', 'a::a:a'));
var_dump(preg_match('/^(([a-z])(?::(?2))*)::(?:(?1)-)[a-z]$/', 'a::a-a'));

Expected result:
----------------
int(1)
int(1)

Actual result:
--------------
int(0)
int(1)

------------------------------------------------------------------------
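As a quick check, the three workarounds from the new comment can be run against the reporter's first subject string. This is a sketch, not part of the original report; the labels in $workarounds are hypothetical. Each pattern should match regardless of PCRE version. The reporter's original colon pattern is deliberately left out: PHP 7.3 switched to PCRE2, and PCRE2 10.30 and later no longer treats subroutine calls as atomic, so on current PHP versions that test case is expected to return 1 as well.

```php
<?php
// Subject string from the reporter's first test case.
$subject = 'a::a:a';

// The three workarounds from the new comment, in the same order.
// The array keys are illustrative labels only.
$workarounds = [
    // Repeat group 1's body inline instead of calling it with (?1).
    'inline copy'     => '/^(([a-z])(?::(?2))*)::(?:([a-z])(?::(?2))*:)[a-z]$/',
    // Lazy *? so the atomic call to (?1) stops after the first letter.
    'lazy quantifier' => '/^(([a-z])(?::(?2))*?)::(?:(?1):)[a-z]$/',
    // The U modifier inverts greediness, so * behaves like *?.
    'U modifier'      => '/^(([a-z])(?::(?2))*)::(?:(?1):)[a-z]$/U',
];

foreach ($workarounds as $name => $pattern) {
    // preg_match() returns 1 on a match and 0 on no match; each line should end in 1.
    echo $name, ': ', preg_match($pattern, $subject), PHP_EOL;
}
```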