From:             webmaster at unitedscripters dot com
Operating system: Windows XP
PHP version:      5.0.2
PHP Bug Type:     Regexps related
Bug description:  INDEX POSITIONS OF A REGULAR EXPRESSION
Description:
------------
Object: FINDING THE INDEX POSITIONS OF A REGULAR EXPRESSION MATCH IS APPARENTLY AN UNAVAILABLE FEATURE

I might be wrong, but PHP apparently lacks a way to find not only matches but also their _index_ positions within a string.

At first I thought that, once the matches had been found with preg_match_all, all one had to do to also obtain their index positions in the input string was to iterate over the returned array of matches and look up each match in the string with strpos, discarding the part of the string already inspected. Obvious as that idea may seem, it may not work: the position found by a string-oriented function is not necessarily the same position found by a regular-expression-oriented function.

Consider this example. The input string is:

"A thesaurus for the pupil"

and the regular expression searches for:

/the\b/

which is obviously the word "the" followed by a word boundary (\b). preg_match_all correctly reports only the isolated article "the", because only that occurrence is followed by a word boundary. But retrieving the index position of that match with strpos reports the index position of THEsaurus (offset 2) instead of the article (offset 16).

So do _not_ combine strpos with preg_match_all to retrieve the index positions of the matches: it will not work the expected way.
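The mismatch described above can be reproduced in a few lines. This is a minimal sketch using only the example string and pattern from this report; it relies on the PREG_OFFSET_CAPTURE flag of preg_match(), which turns each entry of the match array into a pair of [matched text, byte offset], so the regex's own idea of the position can be compared with strpos().

```php
<?php
// Input string and pattern taken from the example above.
$in = "A thesaurus for the pupil";
$pattern = '/the\b/';

// strpos() only knows the literal text "the", so it stops at the first
// occurrence, which is the start of "thesaurus".
$literalPos = strpos($in, 'the');

// With PREG_OFFSET_CAPTURE, $m[0] is [matched text, byte offset],
// so $m[0][1] is where the regular expression actually matched.
preg_match($pattern, $in, $m, PREG_OFFSET_CAPTURE);
$regexPos = $m[0][1];

echo "strpos(): $literalPos\n";         // strpos(): 2
echo "preg_match offset: $regexPos\n";  // preg_match offset: 16
```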
Reproduce code:
---------------
<?php
function foo($string, $regexp) {
    $found = 0;
    $indexes = array();
    preg_match_all($regexp, $string, $matches);
    print("<strong>" . $matches[0][0] . "</strong>");
    $matchSize = sizeof($matches[0]);
    for ($m = 0; $m < $matchSize; $m++) {
        $found = strlen(substr($string, 0, $found));
        preg_match($regexp, $string, $specificMatch, PREG_OFFSET_CAPTURE, $found);
        // strpos() searches for the literal matched text, so it can stop at an
        // earlier occurrence that the regular expression itself rejected.
        $indexes[$m] = $found + strpos(substr($string, $found), $specificMatch[0][0]); /* shortcoming: it's not a real index */
        // Move past this match ($matches[0][$m] is the matched text itself).
        $found = $indexes[$m] + strlen($matches[0][$m]);
    }
    return $indexes;
}

$in = "A thesaurus for the pupil";
print "In string <strong>$in</strong>, match is: ";
$out = foo($in, "/the\\b/");
print "<br>Wrong Index reported: ";
print_r($out);

Expected result:
----------------
The output is what this code should produce; the problem is the missing feature, which apparently cannot even be implemented: retrieving the correct index of a regular expression match.

Whatever the case, the feature is needed. JavaScript has it: the regular-expression-oriented string method search() reports the index of the first match, so it can be called repeatedly on progressively shorter substrings of the input string to retrieve the positions of all the matches.

If there is a way and I was not aware of it, I apologize. Yet the list of Perl-compatible regular expression (preg_*) functions clearly lacks a function for retrieving match indexes.
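For comparison, here is a sketch of how the positions of all matches might be collected. It assumes PHP 7.4 or later (for the type declarations and the fn arrow function), so it will not run as-is on the 5.0.2 release this report was filed against; regexIndexes() and regexIndexesIterative() are hypothetical helper names. The first helper relies on preg_match_all() with PREG_OFFSET_CAPTURE. The second mirrors the JavaScript search()-in-a-loop idea, but passes preg_match()'s offset argument instead of cutting substrings, so assertions such as \b still see the characters before the offset.

```php
<?php
// Hypothetical helper: byte offset of every match, via preg_match_all().
function regexIndexes(string $string, string $regexp): array
{
    // With PREG_OFFSET_CAPTURE, $matches[0] is a list of [matched text, offset].
    preg_match_all($regexp, $string, $matches, PREG_OFFSET_CAPTURE);
    return array_map(fn(array $hit): int => $hit[1], $matches[0]);
}

// Hypothetical helper: repeated preg_match() calls that advance an offset
// over the whole string rather than searching shrinking substrings.
function regexIndexesIterative(string $string, string $regexp): array
{
    $indexes = [];
    $offset = 0;
    while ($offset <= strlen($string)
        && preg_match($regexp, $string, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        $pos = $m[0][1];
        $indexes[] = $pos;
        // Skip past the match; step one byte on an empty match to avoid looping forever.
        $offset = $pos + max(strlen($m[0][0]), 1);
    }
    return $indexes;
}

$in = "A thesaurus for the pupil";
echo implode(', ', regexIndexes($in, '/the\b/')) . "\n";          // 16
echo implode(', ', regexIndexesIterative($in, '/the\b/')) . "\n"; // 16
```

Both report 16 for the example string. The offsets are byte positions, so for multibyte (e.g. UTF-8) text they are not character indexes.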
-- 
Edit bug report at http://bugs.php.net/?id=30618&edit=1