From:             
Operating system: All (AFAIK)
PHP version:      Irrelevant
Package:          *Regular Expressions
Bug Type:         Feature/Change Request
Bug description:  Adding additional backreferencing indicators for use with PREG_OFFSET_CAPTURE

Description:
------------
This suggestion relates to preg_match_all() when it is called with the
PREG_OFFSET_CAPTURE flag.



When PREG_OFFSET_CAPTURE is specified as a flag, every matched subpattern
is returned in the $matches array as a pair: the matched string and the
offset of that string.  Yet there are cases where I need only one of these
pieces of information for one subpattern, but want the other piece (or
both pieces) for a different subpattern within the same expression.  In
these cases, memory is wasted storing undesired information in the
$matches array.
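
To make the current behaviour concrete, here is a minimal sketch of what
PREG_OFFSET_CAPTURE stores for each subpattern.  The sample input and the
shortened tag list are assumptions made for illustration only.

```php
<?php
// Sample input, assumed for illustration.
$bbc = 'x[B]bold';

preg_match_all('/\\[(B|I|U)]/i', $bbc, $m, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE, each entry is a two-element array:
// index 0 holds the matched string, index 1 holds its byte offset.
$whole = $m[0][0]; // pair for the whole match
$tag   = $m[0][1]; // pair for subpattern 1

var_dump($whole, $tag);
```

Both the string and the offset are stored for every subpattern, whether or
not the caller needs both.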



My suggestion is to add two additional indicators for backreference
capturing that can be used when the PREG_OFFSET_CAPTURE flag is specified.
These indicators would tell the engine to set either the offset or the
subpattern string to null in the $matches array.  I believe this change
would reduce the space required to hold the information in $matches, while
extending the typical use of preg_match_all() with PREG_OFFSET_CAPTURE
(the same could also be done for preg_split() with
PREG_SPLIT_OFFSET_CAPTURE).
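
The shape of the result I have in mind can be emulated in userland, as
sketched below.  prune_captures() is a hypothetical helper, not part of
PHP, and it runs after the full $matches array has already been built, so
it shows only the desired output, not the memory saving an engine-level
indicator would give.  The sample input and pattern are assumptions.

```php
<?php
// Hypothetical helper: not part of PHP and not the proposed engine change.
// $keep maps a subpattern index to 'string' or 'offset';
// subpatterns that are not listed keep both pieces.
function prune_captures(array $sets, array $keep)
{
    foreach ($sets as $i => $set) {
        foreach ($set as $group => $pair) {
            if (!isset($keep[$group])) {
                continue;
            }
            if ($keep[$group] === 'string') {
                $sets[$i][$group][1] = null; // drop the offset
            } elseif ($keep[$group] === 'offset') {
                $sets[$i][$group][0] = null; // drop the string
            }
        }
    }
    return $sets;
}

$bbc = 'x[B]bold'; // sample input, assumed for illustration
preg_match_all('/\\[(B|I|U)](?=(\\s*?[^\\s]))/i', $bbc, $m, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

// Keep only the string of subpattern 1 and only the offset of subpattern 2.
$pruned = prune_captures($m, array(1 => 'string', 2 => 'offset'));
```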

Test script:
---------------
Take, for instance, the following preg_match_all expressions to match
opening tags of BBCode:



1.

preg_match_all('/\\[(B|I|U|URL|COLOR|SIZE|LIST)(?:=([^]]*?))?](?=\\s*?[^\\s])/iu',$bbc,$openers,PREG_SET_ORDER|PREG_OFFSET_CAPTURE);

foreach($openers as $key => $val) {
        foreach($val as $key2 => $val2) {
                foreach($val2 as $key3 => $val3) {
                        echo '$openers['.$key.']['.$key2.']['.$key3.'] = '.$val3.'<br>';
                }
        }
}



2.

preg_match_all('/\\[(B|I|U|URL|COLOR|SIZE|LIST)(?:=([^]]*?))?](?=(\\s*?[^\\s]))/iu',$bbc,$openers,PREG_SET_ORDER|PREG_OFFSET_CAPTURE);

foreach($openers as $key => $val) {
        foreach($val as $key2 => $val2) {
                foreach($val2 as $key3 => $val3) {
                        echo '$openers['.$key.']['.$key2.']['.$key3.'] = '.$val3.'<br>';
                }
        }
}

Expected result:
----------------
In expression 1, the lookahead '(?=\\s*?[^\\s])' is used to check the
basic validity of an opening tag.  The start of the tag's contents has to
be computed from the offset of the whole match ($matches[#][0][1]) plus the
length of the whole match ($matches[#][0][0]), where $matches is the
$openers array in the scripts above:
 $matches[#][0][1] + strlen($matches[#][0][0]) = $contentstartposition.



In expression 2, the lookahead '(?=(\\s*?[^\\s]))' is used to check the
basic validity of an opening tag AND to capture the position where the
content starts, which avoids an addition and a strlen() call to find that
position:
$matches[#][3][1] = $contentstartposition.



In terms of processing power involved, expression 2 is superior to
expression 1, as it is merely relaying information already gathered and
known by the engine instead of performing addition and a strlen(). 
However, in terms of the resources required to store the match information,
expression 1 is superior to expression 2 and still ensures a valid tag is
found (but will require additional processing to get a piece of information
returned by expression 2).
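
As a quick check that the two expressions yield the same content start
position, the sketch below runs both against a sample string.  The input,
the $tags and $flags variables, and the $starts1 / $starts2 names are
assumptions made for illustration.

```php
<?php
// Sample input, assumed for illustration.
$bbc = 'Hi [B]bold[/B] and [URL=http://example.com] link[/URL]';

$tags  = 'B|I|U|URL|COLOR|SIZE|LIST';
$flags = PREG_SET_ORDER | PREG_OFFSET_CAPTURE;

// Expression 1: the lookahead is not captured.
preg_match_all('/\\[(' . $tags . ')(?:=([^]]*?))?](?=\\s*?[^\\s])/iu', $bbc, $m1, $flags);

// Expression 2: the body of the lookahead is captured as subpattern 3.
preg_match_all('/\\[(' . $tags . ')(?:=([^]]*?))?](?=(\\s*?[^\\s]))/iu', $bbc, $m2, $flags);

// Expression 1: offset of the whole match plus its length.
$starts1 = array();
foreach ($m1 as $set) {
    $starts1[] = $set[0][1] + strlen($set[0][0]);
}

// Expression 2: read the offset of subpattern 3 directly.
$starts2 = array();
foreach ($m2 as $set) {
    $starts2[] = $set[3][1];
}

var_dump($starts1 === $starts2);
```

In expression 2, every match set carries one extra string/offset pair
(subpattern 3) whose string is never used.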



The two expressions have the following in common:

- Neither requires the offsets for subpattern [1] or [2], merely their
contents (for parsing / filtering).  The unneeded offsets are returned
anyway and cost memory to store.  The only alternative that obtains the
contents without that memory cost is to spend significant processing
resources parsing for the same contents the subpattern match already
returns in $matches.

- Neither requires the contents of the last subpattern (captured or not):
the offset is the only desired portion.  In expression 1, the offset is
obtained at the cost of processing resources; in expression 2, it is
obtained at the cost of memory resources.



If there were additional indicators to restrict the value returned in
$matches for each subpattern, the $matches array could require
substantially fewer resources to store, while retaining its current
functionality and becoming usable in situations where trading increased
memory use for decreased CPU use is not feasible.



Thanks for your time!


-- 
Edit bug report at http://bugs.php.net/bug.php?id=51531&edit=1