From:             webmaster at unitedscripters dot com
Operating system: Windows XP
PHP version:      5.0.2
PHP Bug Type:     Regexps related
Bug description:  INDEX POSITIONS OF A REGULAR EXPRESSION

Description:
------------
Subject: FINDING THE INDEX POSITIONS OF A REGULAR EXPRESSION MATCH IS
APPARENTLY NOT AN AVAILABLE FEATURE

I might be wrong, but PHP apparently lacks a way to find not only the
matches but also their _index_ positions within a string.

At first I thought that, once the matches had been found with
preg_match_all, all one had to do to obtain their index positions in the
input string was to iterate over the returned array of matches and locate
each match in the string with strpos, repeatedly discarding the part of
the string already inspected.

Obvious as this idea may seem, it does not always work.

The position found by a string-oriented function is not necessarily the
same position found by a regular-expression-oriented function.

Consider this example. The input string is:
"A thesaurus for the pupil"
whereas the regular expression searches for:
"/the\\b/"
which is obviously the word "the" followed by a word boundary (\b).

preg_match_all would correctly report only the standalone article "the",
because that is the only occurrence followed by a word boundary.
But attempting to retrieve the index position of that match with strpos
would report the index position of THEsaurus.
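The discrepancy can be checked directly. The following is a minimal sketch, assuming PHP 4.3.0 or later, where the PREG_OFFSET_CAPTURE flag of preg_match() is available; with that flag each captured entry is a [text, offset] pair, so the offset comes from the regex engine rather than from a second, plain-text search.

```php
<?php
// Compare a plain string search with the offset reported by the regex engine.
$in = "A thesaurus for the pupil";

// strpos() finds the first occurrence of the text "the", inside "thesaurus".
$plain = strpos($in, "the");

// With PREG_OFFSET_CAPTURE, $m[0] is array(matched text, byte offset),
// so the offset belongs to the occurrence the pattern actually accepted.
preg_match('/the\b/', $in, $m, PREG_OFFSET_CAPTURE);
$regex = $m[0][1];

print "strpos: $plain, regex: $regex\n"; // strpos: 2, regex: 16
```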

So do _not_ combine strpos with preg_match_all in order to retrieve the
index positions of the matches: it will not work as expected.

Reproduce code:
---------------
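function foo($string, $regexp) {
    $found = 0;
    $indexes = array();
    preg_match_all($regexp, $string, $matches);
    print("<strong>" . $matches[0][0] . "</strong>");
    $matchSize = sizeof($matches[0]);
    for ($m = 0; $m < $matchSize; $m++) {
        // Find the next regex match, starting the search at offset $found.
        preg_match($regexp, $string, $specificMatch, PREG_OFFSET_CAPTURE, $found);
        // Shortcoming: strpos() looks for the matched text as a plain string,
        // so it can hit an earlier occurrence (e.g. "THEsaurus") that the
        // regex itself rejected. The result is not the real index.
        $indexes[$m] = $found + strpos(substr($string, $found), $specificMatch[0][0]);
        // Resume the search just past the reported occurrence.
        $found = $indexes[$m] + strlen($matches[0][$m]);
    }
    return $indexes;
}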

$in="A thesaurus for the pupil";
print "In string <strong>$in</strong>, match is: ";
$out=foo($in, "/the\\b/");
print "<br>Wrong Index reported: ";
print_r($out);

Expected result:
----------------
The output above is what PHP is expected to produce; the problem is the
missing feature, which _apparently_ cannot even be implemented in
userland: grabbing the correct index of a regular expression match.
In any case, the feature is needed. JavaScript has it: its
regular-expression-oriented method search() reports at least one index,
so it can be applied repeatedly to gradually shrinking substrings of the
input string to retrieve the positions of all the matches.
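A rough PHP counterpart of that search()-based loop might look like the sketch below. It assumes PHP 4.3.3 or later, where preg_match() accepts an $offset argument after $flags, and the helper name regex_positions is hypothetical. Passing the offset instead of cutting off substrings keeps the text before the offset visible to assertions such as \b, so a match at the start of the remaining text is still judged against its real left neighbour.

```php
<?php
// Hypothetical helper: collect every match offset by calling preg_match()
// repeatedly with its $offset argument, instead of searching substrings.
function regex_positions($pattern, $subject)
{
    $positions = array();
    $offset = 0;
    while ($offset <= strlen($subject)
        && preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $positions[] = $m[0][1];          // byte offset of this match
        // Resume after this match; step one byte past an empty match
        // so the loop cannot get stuck at the same position.
        $offset = $m[0][1] + max(strlen($m[0][0]), 1);
    }
    return $positions;
}

print_r(regex_positions('/the\b/', "A thesaurus for the pupil")); // one entry: 16
```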

If there is a way and I was not aware of it, I apologize. Yet the list of
Perl-compatible regular expression (preg_*) functions clearly lacks one
for retrieving the indexes.
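For reference, preg_match_all() accepts the same PREG_OFFSET_CAPTURE flag, which appears to return every index in a single call. This is a sketch assuming PHP 4.3.0 or later; the input string is illustrative, chosen to contain two standalone "the" words plus one inside "thesaurus".

```php
<?php
// With the default PREG_PATTERN_ORDER plus PREG_OFFSET_CAPTURE,
// $matches[0] holds one array(text, byte offset) pair per full match.
$in = "the pupil read the thesaurus";
preg_match_all('/the\b/', $in, $matches, PREG_OFFSET_CAPTURE);

$offsets = array();
foreach ($matches[0] as $match) {
    list($text, $offset) = $match;   // $text is always "the" here
    $offsets[] = $offset;
}
print_r($offsets); // offsets 0 and 15; "thesaurus" at 19 is rejected
```

Note that these offsets count bytes, not characters, so for multibyte (e.g. UTF-8) text they may differ from character positions.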


-- 
Edit bug report at http://bugs.php.net/?id=30618&edit=1