Edit report at https://bugs.php.net/bug.php?id=61780&edit=1
 ID:                 61780
 Comment by:         michael at mbaas dot de
 Reported by:        danielklein at airpost dot net
 Summary:            Inconsistent PCRE captures in match results
 Status:             Open
 Type:               Bug
 Package:            PCRE related
 PHP Version:        5.4.0
 Block user comment: N
 Private report:     N

 New Comment:

Here is a reproducible example (PHP 5.3.20 and 5.3.21) where named captures do not return any matches at all. I've tested this pattern against the PCRE implementation in another language and it worked.

<?php
$QQ = chr(92) . chr(34); // backslash + double quote
$delimeters = "{}";
$del0 = preg_quote($delimeters[0]);
$del1 = preg_quote($delimeters[1]);
$tag = "language";
$string = "fdfdfdfdf{language=1}testhgg";
$preg = "~" . $del0 . $tag . "\s*=\s*(?P<" . "quote>[" . $QQ . "\']*)(?P<att>.*?)(?P=quote)\s*/" . $del1 . "~";
$match = array();
preg_match($preg, $string, $match);
echo "<br>string = " . htmlspecialchars($string) . "<br>preg=" . htmlspecialchars($preg) . "<br>match:<pre>";
var_dump($match);
echo "</pre>";
?>


Previous Comments:
------------------------------------------------------------------------
[2012-04-20 00:54:39] danielklein at airpost dot net

Description:
------------
Named and unnamed captures in both preg_match and preg_match_all (and probably preg_replace and the other PCRE functions too, but I haven't tested them all) can report the wrong set of capture groups if alternation or a quantifier that allows zero repetitions (such as ? or *) is used.

If the pattern '/(?<b>b)|(?<c>c)|(?<d>d)/' is used to match 'c', both 'b' and 'c' are set in the results array but 'd' is not. 'b' should not be set (even to an empty string), as it did not match anything. However, if the pattern were '/(?<b>b?)(?<c>c)/' (note: optional 'b' AND mandatory 'c'), 'b' _should_ be set to '', as it is allowed to match a zero-length string.

If a group is tried and fails, and a group later in the pattern matches, the skipped group should never produce a key in the results array. It should be OK to leave holes in the numbered sequence (e.g. match 0 and 2 but not 1).
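In the default output both cases above yield '' for 'b', so the difference only shows up through offsets. A minimal sketch, assuming the PHP build in use reports groups the way this report describes (behaviour may vary between versions and flags): with PREG_OFFSET_CAPTURE, a group that did not participate is reported as '' at offset -1, while a group that matched an empty string gets a real offset. The variable names are illustrative.

```php
<?php
// With PREG_OFFSET_CAPTURE each entry is [text, offset].

// Alternation: only the (?<c>c) branch participates when the subject is "c".
preg_match('/(?<b>b)|(?<c>c)|(?<d>d)/', 'c', $alt, PREG_OFFSET_CAPTURE);

// Optional group: (?<b>b?) participates and matches a zero-length string at offset 0.
preg_match('/(?<b>b?)(?<c>c)/', 'c', $opt, PREG_OFFSET_CAPTURE);

var_export(array_keys($alt)); // 'b' is present although it did not participate; 'd' is absent
echo "\n";
var_export($alt['b']);        // '' at offset -1: did not participate
echo "\n";
var_export($opt['b']);        // '' at offset 0: matched the empty string
echo "\n";
```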
Currently, you need to use PREG_OFFSET_CAPTURE, test whether the key exists, and if it does, test whether the capture position is -1. If this bug is fixed, capture positions will never be -1, as the key won't exist.

Alternatively, an additional flag could be added (e.g. PREG_KEEP_NONMATCHES) to create keys for ALL captures, whether used or not. With the first pattern above, keys would then be created for 'b', 'c' and 'd' in all cases, and when matching the string 'c' the offsets for both 'b' and 'd' would be -1.

In summary, if the pattern '/(?<b>b)|(?<c>c)|(?<d>d)/' is used to match 'c', by default it should only ever create a key for 'c'. If desired, an additional flag could make it create keys for all captures: 'b', 'c' and 'd'. The current behaviour, where it creates keys for 'b' and 'c' but not 'd', should be considered a bug and fixed.

Test script:
---------------
print('<pre>');
$offset = 0;
while (preg_match('/(?:(?<b>b)|(?<c>c)|(?<d>d))(?<e>e)?/', 'cdec', $matches, PREG_OFFSET_CAPTURE, $offset)) {
    $offset = $matches[0][1] + strlen($matches[0][0]);
    var_export($matches);
    print("\n\n");
}
print("****************\n\n");
preg_match_all('/(?:(?<b>b)|(?<c>c)|(?<d>d))(?<e>e)?/', 'cdec', $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
var_export($matches);
print('</pre>');

Expected result:
----------------
array (
  0 => array ( 0 => 'c', 1 => 0, ),
  'c' => array ( 0 => 'c', 1 => 0, ),
  2 => array ( 0 => 'c', 1 => 0, ),
)

array (
  0 => array ( 0 => 'de', 1 => 1, ),
  'd' => array ( 0 => 'd', 1 => 1, ),
  3 => array ( 0 => 'd', 1 => 1, ),
  'e' => array ( 0 => 'e', 1 => 2, ),
  4 => array ( 0 => 'e', 1 => 2, ),
)

array (
  0 => array ( 0 => 'c', 1 => 3, ),
  'c' => array ( 0 => 'c', 1 => 3, ),
  2 => array ( 0 => 'c', 1 => 3, ),
)

****************

array (
  0 => array (
    0 => array ( 0 => 'c', 1 => 0, ),
    'c' => array ( 0 => 'c', 1 => 0, ),
    2 => array ( 0 => 'c', 1 => 0, ),
  ),
  1 => array (
    0 => array ( 0 => 'de', 1 => 1, ),
    'd' => array ( 0 => 'd', 1 => 1, ),
    3 => array ( 0 => 'd', 1 => 1, ),
    'e' => array ( 0 => 'e', 1 => 2, ),
    4 => array ( 0 => 'e', 1 => 2, ),
  ),
  2 => array (
    0 => array ( 0 => 'c', 1 => 3, ),
    'c' => array ( 0 => 'c', 1 => 3, ),
    2 => array ( 0 => 'c', 1 => 3, ),
  ),
)

Actual result:
--------------
array (
  0 => array ( 0 => 'c', 1 => 0, ),
  'b' => array ( 0 => '', 1 => -1, ),
  1 => array ( 0 => '', 1 => -1, ),
  'c' => array ( 0 => 'c', 1 => 0, ),
  2 => array ( 0 => 'c', 1 => 0, ),
)

array (
  0 => array ( 0 => 'de', 1 => 1, ),
  'b' => array ( 0 => '', 1 => -1, ),
  1 => array ( 0 => '', 1 => -1, ),
  'c' => array ( 0 => '', 1 => -1, ),
  2 => array ( 0 => '', 1 => -1, ),
  'd' => array ( 0 => 'd', 1 => 1, ),
  3 => array ( 0 => 'd', 1 => 1, ),
  'e' => array ( 0 => 'e', 1 => 2, ),
  4 => array ( 0 => 'e', 1 => 2, ),
)

array (
  0 => array ( 0 => 'c', 1 => 3, ),
  'b' => array ( 0 => '', 1 => -1, ),
  1 => array ( 0 => '', 1 => -1, ),
  'c' => array ( 0 => 'c', 1 => 3, ),
  2 => array ( 0 => 'c', 1 => 3, ),
)

****************

array (
  0 => array (
    0 => array ( 0 => 'c', 1 => 0, ),
    'b' => array ( 0 => '', 1 => -1, ),
    1 => array ( 0 => '', 1 => -1, ),
    'c' => array ( 0 => 'c', 1 => 0, ),
    2 => array ( 0 => 'c', 1 => 0, ),
  ),
  1 => array (
    0 => array ( 0 => 'de', 1 => 1, ),
    'b' => array ( 0 => '', 1 => -1, ),
    1 => array ( 0 => '', 1 => -1, ),
    'c' => array ( 0 => '', 1 => -1, ),
    2 => array ( 0 => '', 1 => -1, ),
    'd' => array ( 0 => 'd', 1 => 1, ),
    3 => array ( 0 => 'd', 1 => 1, ),
    'e' => array ( 0 => 'e', 1 => 2, ),
    4 => array ( 0 => 'e', 1 => 2, ),
  ),
  2 => array (
    0 => array ( 0 => 'c', 1 => 3, ),
    'b' => array ( 0 => '', 1 => -1, ),
    1 => array ( 0 => '', 1 => -1, ),
    'c' => array ( 0 => 'c', 1 => 3, ),
    2 => array ( 0 => 'c', 1 => 3, ),
  ),
)

------------------------------------------------------------------------
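The workaround described in the report (request PREG_OFFSET_CAPTURE and discard any group whose offset is -1) can be wrapped in a small helper so that the results take the "Expected result" shape. A minimal sketch: `drop_unmatched_groups()` is a hypothetical name, not a PHP function, and it relies on PHP reporting non-participating groups at offset -1, as shown under "Actual result". It does not model the proposed PREG_KEEP_NONMATCHES flag.

```php
<?php
// Hypothetical helper (not part of PHP): remove capture groups that did not
// participate in the match. Expects a match array produced with
// PREG_OFFSET_CAPTURE, where non-participating groups carry offset -1.
function drop_unmatched_groups(array $match): array
{
    return array_filter($match, function ($pair) {
        return $pair[1] !== -1;
    });
}

$pattern = '/(?:(?<b>b)|(?<c>c)|(?<d>d))(?<e>e)?/';
$subject = 'cdec';

// Same loop as the test script, collecting the filtered results.
$results = [];
$offset = 0;
while (preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, $offset)) {
    $offset = $matches[0][1] + strlen($matches[0][0]);
    $results[] = drop_unmatched_groups($matches);
}

// preg_match_all with PREG_SET_ORDER yields one match array per set.
preg_match_all($pattern, $subject, $sets, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
$setResults = array_map('drop_unmatched_groups', $sets);

var_export($results);
```

Because `array_filter()` preserves keys, the numbered keys keep their gaps (0 and 2 without 1), which is the layout the report asks for. Trailing groups that never participated are already absent from PHP's output, so the helper only has to deal with the '' / -1 entries.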
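Regarding the New Comment by michael at mbaas dot de: before treating that example as a capture-reporting bug, it may be worth checking the pattern itself. Reading `(? P<att>` as `(?P<att>` (the space is most likely a wrapping artefact), the script builds `~\{language\s*=\s*(?P<quote>[\"\']*)(?P<att>.*?)(?P=quote)\s*/\}~`, which requires a literal `/` just before the closing brace. The subject `fdfdfdfdf{language=1}testhgg` contains no such `/`, so `preg_match()` would return 0 and leave `$match` empty. The sketch below compares the pattern as posted with the same pattern minus the `/`; all variable names are illustrative.

```php
<?php
// Rebuild the pattern from the New Comment's script ("{" and "}" as tag delimiters).
$QQ = chr(92) . chr(34);   // backslash + double quote
$del0 = preg_quote('{');   // "\{"
$del1 = preg_quote('}');   // "\}"
$tag = 'language';
$subject = 'fdfdfdfdf{language=1}testhgg';

// As posted: a literal "/" is required just before the closing "}".
$withSlash = "~" . $del0 . $tag . "\s*=\s*(?P<quote>[" . $QQ . "\']*)(?P<att>.*?)(?P=quote)\s*/" . $del1 . "~";
// Same pattern without the "/".
$withoutSlash = "~" . $del0 . $tag . "\s*=\s*(?P<quote>[" . $QQ . "\']*)(?P<att>.*?)(?P=quote)\s*" . $del1 . "~";

$found1 = preg_match($withSlash, $subject, $m1);           // no "/" in the subject, so no match
$found2 = preg_match($withoutSlash, $subject, $m2);        // matches "{language=1}"
$found3 = preg_match($withSlash, 'x{language=1/}y', $m3);  // matches once the subject has the "/"

var_dump($found1, $found2, $m2['att'], $found3);
```

If the `/` is intended (for self-closing tags such as `{language=1/}`), it is the subject rather than the pattern that would need to change; either way, this example does not appear to exercise the capture-reporting issue described in the original report.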