Edit report at https://bugs.php.net/bug.php?id=61780&edit=1

 ID:                 61780
 Comment by:         michael at mbaas dot de
 Reported by:        danielklein at airpost dot net
 Summary:            Inconsistent PCRE captures in match results
 Status:             Open
 Type:               Bug
 Package:            PCRE related
 PHP Version:        5.4.0
 Block user comment: N
 Private report:     N

 New Comment:

Here is a reproducible example (PHP 5.3.20 and 5.3.21) where named captures do 
not return matches at all! I've tested this pattern against the PCRE 
implementation in another language and it worked...

<?php


$QQ=chr(92) . chr(34);
$delimeters = "{}";
$del0 = preg_quote($delimeters[0]);
$del1 = preg_quote($delimeters[1]);
$tag="language";
$string="fdfdfdfdf{language=1}testhgg";
$preg = "~" . $del0 . $tag . "\s*=\s*(?P<" . "quote>[" . $QQ . "\']*)(?P<att>.*?)(?P=quote)\s*/" . $del1 . "~";
$match=array();
preg_match($preg,$string,$match);
echo "<br>string = " . htmlspecialchars($string) . "<br>preg=" . htmlspecialchars($preg) . "<br>match:<pre>";
var_dump($match);
echo "</pre>";


?>


Previous Comments:
------------------------------------------------------------------------
[2012-04-20 00:54:39] danielklein at airpost dot net

Description:
------------
Named and unnamed captures in both preg_match and preg_match_all (and probably 
preg_replace and the other PCRE functions too, but I haven't tested them all) 
can report the wrong set of capture groups if alternation or a zero-or-more 
quantifier is used.

If the pattern '/(?<b>b)|(?<c>c)|(?<d>d)/' is used to match 'c', both 'b' and 
'c' will be set in the results array but 'd' won't be. 'b' should not be set 
(even to an empty string) as it failed to match anything. However, if it was 
trying to match '/(?<b>b?)(?<c>c)/' (note: optional 'b' AND mandatory 'c'), 'b' 
_should_ be set to '' as it's allowed to match a zero-length string. If a match 
gets tried but it fails and a capture later in the pattern works, the skipped 
capture should never produce a key in the results array. It should be OK to 
leave holes in the numbered sequence (e.g. match 0 and 2 but not 1).
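
The distinction drawn above can be sketched as follows. This is only a minimal 
illustration using the two patterns from the description; the variable names 
$alt and $opt are made up for the example, and PREG_OFFSET_CAPTURE is used so 
that the offset shows whether a group took part in the match.

```php
<?php
// Alternation: group 'b' does not take part when the subject is 'c'.
preg_match('/(?<b>b)|(?<c>c)|(?<d>d)/', 'c', $alt, PREG_OFFSET_CAPTURE);

// Optional content: group 'b' takes part and matches the empty string.
preg_match('/(?<b>b?)(?<c>c)/', 'c', $opt, PREG_OFFSET_CAPTURE);

// A key for 'b' exists in both results; only the offset tells them apart.
var_export($alt['b']);        // empty string with offset -1 (did not participate)
print("\n");
var_export($opt['b']);        // empty string with offset 0 (zero-length match)
print("\n");
var_export(isset($alt['d'])); // false: the trailing group gets no key at all
print("\n");
```

Without PREG_OFFSET_CAPTURE both 'b' entries would be plain empty strings, so 
the two cases could not be told apart from the results array alone.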

Currently, you need to use PREG_OFFSET_CAPTURE and test to see if the key 
exists, and if it does, test to see if the capture position is -1. If this bug 
is fixed, capture positions will never be -1 as the key won't exist. 
Alternatively, an additional flag could be added (e.g. PREG_KEEP_NONMATCHES) to 
create keys for ALL captures whether used or not (so, in the first pattern 
above, keys would be created for 'b', 'c' and 'd' in all cases, and if matching 
the string 'c' the offsets for both 'b' and 'd' would be -1).
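
The workaround described above might look roughly like this. It is a sketch 
only: drop_unmatched_captures() is a hypothetical helper written for this 
example, not an existing PHP function, and it assumes the results come from a 
single preg_match call made with PREG_OFFSET_CAPTURE.

```php
<?php
// Hypothetical helper: keep only the captures that took part in the match.
function drop_unmatched_captures(array $matches)
{
    $kept = array();
    foreach ($matches as $key => $capture) {
        // With PREG_OFFSET_CAPTURE each entry is array(text, offset);
        // an offset of -1 marks a group that did not participate.
        if (is_array($capture) && $capture[1] !== -1) {
            $kept[$key] = $capture;
        }
    }
    return $kept;
}

preg_match('/(?<b>b)|(?<c>c)|(?<d>d)/', 'c', $matches, PREG_OFFSET_CAPTURE);
$clean = drop_unmatched_captures($matches);

// Only the whole match and the 'c' capture (by name and by number) remain.
var_export(array_keys($clean));
print("\n");
```

After filtering, a simple isset($clean['b']) check is enough to tell whether a 
group matched, which is the default behaviour this report asks for.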

In summary, if the pattern '/(?<b>b)|(?<c>c)|(?<d>d)/' is used to match 'c', by 
default it should only ever create a key for 'c'. If desired, an additional 
flag could be added so that it creates keys for all captures: 'b', 'c' and 'd'. 
The current behaviour where it creates a key for 'b' and 'c' but not 'd' should 
be considered a bug and fixed.

Test script:
---------------
print('<pre>');
$offset = 0;
while (preg_match('/(?:(?<b>b)|(?<c>c)|(?<d>d))(?<e>e)?/', 'cdec', $matches, 
PREG_OFFSET_CAPTURE, $offset)) {
  $offset = $matches[0][1] + strlen($matches[0][0]);
  var_export($matches);
  print("\n\n");
}

print("****************\n\n");

preg_match_all('/(?:(?<b>b)|(?<c>c)|(?<d>d))(?<e>e)?/', 'cdec', $matches, 
PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
var_export($matches);
print('</pre>');


Expected result:
----------------
array (
  0 => 
  array (
    0 => 'c',
    1 => 0,
  ),
  'c' => 
  array (
    0 => 'c',
    1 => 0,
  ),
  2 => 
  array (
    0 => 'c',
    1 => 0,
  ),
)

array (
  0 => 
  array (
    0 => 'de',
    1 => 1,
  ),
  'd' => 
  array (
    0 => 'd',
    1 => 1,
  ),
  3 => 
  array (
    0 => 'd',
    1 => 1,
  ),
  'e' => 
  array (
    0 => 'e',
    1 => 2,
  ),
  4 => 
  array (
    0 => 'e',
    1 => 2,
  ),
)

array (
  0 => 
  array (
    0 => 'c',
    1 => 3,
  ),
  'c' => 
  array (
    0 => 'c',
    1 => 3,
  ),
  2 => 
  array (
    0 => 'c',
    1 => 3,
  ),
)

****************

array (
  0 => 
  array (
    0 => 
    array (
      0 => 'c',
      1 => 0,
    ),
    'c' => 
    array (
      0 => 'c',
      1 => 0,
    ),
    2 => 
    array (
      0 => 'c',
      1 => 0,
    ),
  ),
  1 => 
  array (
    0 => 
    array (
      0 => 'de',
      1 => 1,
    ),
    'd' => 
    array (
      0 => 'd',
      1 => 1,
    ),
    3 => 
    array (
      0 => 'd',
      1 => 1,
    ),
    'e' => 
    array (
      0 => 'e',
      1 => 2,
    ),
    4 => 
    array (
      0 => 'e',
      1 => 2,
    ),
  ),
  2 => 
  array (
    0 => 
    array (
      0 => 'c',
      1 => 3,
    ),
    'c' => 
    array (
      0 => 'c',
      1 => 3,
    ),
    2 => 
    array (
      0 => 'c',
      1 => 3,
    ),
  ),
)

Actual result:
--------------
array (
  0 => 
  array (
    0 => 'c',
    1 => 0,
  ),
  'b' => 
  array (
    0 => '',
    1 => -1,
  ),
  1 => 
  array (
    0 => '',
    1 => -1,
  ),
  'c' => 
  array (
    0 => 'c',
    1 => 0,
  ),
  2 => 
  array (
    0 => 'c',
    1 => 0,
  ),
)

array (
  0 => 
  array (
    0 => 'de',
    1 => 1,
  ),
  'b' => 
  array (
    0 => '',
    1 => -1,
  ),
  1 => 
  array (
    0 => '',
    1 => -1,
  ),
  'c' => 
  array (
    0 => '',
    1 => -1,
  ),
  2 => 
  array (
    0 => '',
    1 => -1,
  ),
  'd' => 
  array (
    0 => 'd',
    1 => 1,
  ),
  3 => 
  array (
    0 => 'd',
    1 => 1,
  ),
  'e' => 
  array (
    0 => 'e',
    1 => 2,
  ),
  4 => 
  array (
    0 => 'e',
    1 => 2,
  ),
)

array (
  0 => 
  array (
    0 => 'c',
    1 => 3,
  ),
  'b' => 
  array (
    0 => '',
    1 => -1,
  ),
  1 => 
  array (
    0 => '',
    1 => -1,
  ),
  'c' => 
  array (
    0 => 'c',
    1 => 3,
  ),
  2 => 
  array (
    0 => 'c',
    1 => 3,
  ),
)

****************

array (
  0 => 
  array (
    0 => 
    array (
      0 => 'c',
      1 => 0,
    ),
    'b' => 
    array (
      0 => '',
      1 => -1,
    ),
    1 => 
    array (
      0 => '',
      1 => -1,
    ),
    'c' => 
    array (
      0 => 'c',
      1 => 0,
    ),
    2 => 
    array (
      0 => 'c',
      1 => 0,
    ),
  ),
  1 => 
  array (
    0 => 
    array (
      0 => 'de',
      1 => 1,
    ),
    'b' => 
    array (
      0 => '',
      1 => -1,
    ),
    1 => 
    array (
      0 => '',
      1 => -1,
    ),
    'c' => 
    array (
      0 => '',
      1 => -1,
    ),
    2 => 
    array (
      0 => '',
      1 => -1,
    ),
    'd' => 
    array (
      0 => 'd',
      1 => 1,
    ),
    3 => 
    array (
      0 => 'd',
      1 => 1,
    ),
    'e' => 
    array (
      0 => 'e',
      1 => 2,
    ),
    4 => 
    array (
      0 => 'e',
      1 => 2,
    ),
  ),
  2 => 
  array (
    0 => 
    array (
      0 => 'c',
      1 => 3,
    ),
    'b' => 
    array (
      0 => '',
      1 => -1,
    ),
    1 => 
    array (
      0 => '',
      1 => -1,
    ),
    'c' => 
    array (
      0 => 'c',
      1 => 3,
    ),
    2 => 
    array (
      0 => 'c',
      1 => 3,
    ),
  ),
)


------------------------------------------------------------------------


