On Tue, Mar 19, 2019 at 10:58 AM Nikita Popov <nikita....@gmail.com> wrote:

> After thinking about this some more, while this may be a minor performance
> improvement, it still does more work than necessary. In particular the use
> of OFFSET_CAPTURE (which would be pretty much required here) needs one new
> two-element array for each subpattern. If the captured strings are short,
> this is where the main cost is going to be.
>

The primary use of this feature is when the captured strings are *long*, as
that's when we most want to avoid copying a substring.


> I'm wondering if we shouldn't consider a new object oriented API for PCRE
> which can return a match object where subpattern positions and contents can
> be queried via method calls, so you only pay for the parts that you do
> access.
>

This seems like letting the perfect be the enemy of the good. The
LENGTH_CAPTURE flag significantly reduces allocation for long match strings,
and it allocates the same two-element arrays that OFFSET_CAPTURE would -- it
just stores an integer where there would otherwise be an expensive
substring copy. Furthermore, since the array structure is left mostly alone,
it would not be too hard to support earlier PHP versions, with something like:

$hasLengthCapture = defined('PREG_LENGTH_CAPTURE') ? PREG_LENGTH_CAPTURE : 0;
$r = preg_match($pat, $sub, $m, PREG_OFFSET_CAPTURE | $hasLengthCapture);
$matchOneLength = $hasLengthCapture ? $m[1][0] : strlen($m[1][0]);
$matchOneOffset = $m[1][1];
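
The same fallback could be wrapped in a small helper so that callers never
look at the array shape directly. This is only a rough sketch: match_span()
is a made-up name, and PREG_LENGTH_CAPTURE is the flag proposed in this
thread, assumed to put an integer length in slot 0 of each pair. Where the
constant is undefined, the flag collapses to 0 and the strlen() path runs.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): returns [offset, length] for
 * capture group $group, or null when the pattern does not match or the
 * group did not participate in the match.
 */
function match_span(string $pattern, string $subject, int $group = 0): ?array
{
    // Proposed flag from this thread; 0 when this PHP does not define it.
    $lengthFlag = defined('PREG_LENGTH_CAPTURE') ? PREG_LENGTH_CAPTURE : 0;

    if (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE | $lengthFlag) !== 1) {
        return null; // no match, or a PCRE error
    }
    // Trailing unmatched groups are absent; others report offset -1.
    if (!isset($m[$group]) || $m[$group][1] < 0) {
        return null;
    }

    // Assumed semantics: with the proposed flag, slot 0 already holds the
    // length; without it, slot 0 is the substring and we measure it.
    $length = $lengthFlag ? $m[$group][0] : strlen($m[$group][0]);

    return [$m[$group][1], $length];
}
```

For example, match_span('/b(c+)d/', 'abcccde', 1) gives [2, 3] on the
fallback path, and the caller's code would not need to change if the flag
became available.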

If you introduce a whole new OO accessor object, it starts becoming very
hard to write backward-compatible code.
 --scott
