On Tue, Mar 19, 2019 at 10:58 AM Nikita Popov <nikita....@gmail.com> wrote:
> After thinking about this some more, while this may be a minor performance
> improvement, it still does more work than necessary. In particular the use
> of OFFSET_CAPTURE (which would be pretty much required here) needs one new
> two-element array for each subpattern. If the captured strings are short,
> this is where the main cost is going to be.
>

The primary use of this feature is when the captured strings are *long*, as that's when we most want to avoid copying a substring.

> I'm wondering if we shouldn't consider a new object oriented API for PCRE
> which can return a match object where subpattern positions and contents can
> be queried via method calls, so you only pay for the parts that you do
> access.
>

This seems like letting the perfect be the enemy of the good. PREG_LENGTH_CAPTURE significantly reduces allocation for long match strings, and it allocates no more than OFFSET_CAPTURE already does: the same two-element arrays, just with an integer stored where there would otherwise be an expensive substring. Furthermore, since the array structure is left mostly alone, it would not be too hard to support earlier PHP versions, with something like:

$hasLengthCapture = defined('PREG_LENGTH_CAPTURE') ? PREG_LENGTH_CAPTURE : 0;
$r = preg_match($pat, $sub, $m, PREG_OFFSET_CAPTURE | $hasLengthCapture);
$matchOneLength = $hasLengthCapture ? $m[1][0] : strlen($m[1][0]);
$matchOneOffset = $m[1][1];

If you introduce a whole new OO accessor object, it becomes very hard to write backward-compatible code.

--scott
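A minimal sketch of the fallback idea, wrapped in a hypothetical helper `captureLengthAndOffset()` (the name is illustrative only, not part of any PHP API). PREG_LENGTH_CAPTURE is only the flag proposed in this thread and does not exist in released PHP, so on current versions the `strlen()` fallback always runs. The helper also returns null when a group did not participate in the match, which PREG_OFFSET_CAPTURE reports with an offset of -1 (or by omitting trailing unmatched groups).

```php
<?php
// Sketch only: PREG_LENGTH_CAPTURE is the flag *proposed* in this thread and is
// assumed here; on PHP versions without it, the strlen() fallback is used.

/**
 * Hypothetical helper: returns [length, offset] for capture group $group,
 * or null if the pattern did not match or the group did not participate.
 */
function captureLengthAndOffset(string $pattern, string $subject, int $group): ?array
{
    // constant() avoids referencing the possibly-undefined name directly.
    $lengthFlag = defined('PREG_LENGTH_CAPTURE') ? constant('PREG_LENGTH_CAPTURE') : 0;

    if (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE | $lengthFlag) !== 1) {
        return null; // no match (or a PCRE error)
    }

    // Unmatched trailing groups are omitted; unmatched inner groups get offset -1.
    if (!isset($m[$group]) || $m[$group][1] === -1) {
        return null;
    }

    [$value, $offset] = $m[$group];

    // With the proposed flag, $value would already be an integer length;
    // otherwise it is the captured substring, so measure it in bytes.
    $length = $lengthFlag !== 0 ? $value : strlen($value);

    return [$length, $offset];
}

// "bbb" is 3 bytes long and starts at byte offset 3 in the subject.
$result = captureLengthAndOffset('/a(b+)c/', 'xxabbbcyy', 1);
```

The return shape is the same whether or not the proposed flag exists, so calling code would not need to branch on the PHP version; an object-based match API would not offer that kind of drop-in fallback as easily.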