On 15 March 2011 12:41, Ben Schmidt wrote:
>>>>> static $re = '/(^|[^\\\\])\'/';
>>>
>>> Did no one see why the regex was wrong?
>
> I saw what the regex was. Unlike you, I didn't think it was 'wrong'.
>
> Once you unescape the characters in the PHP single-quoted string above
> (where two backslashes count as one, and backslash-quote counts as a
> quote), the actual pattern that reaches the preg_replace function is:
>
> /(^|[^\\])'/
>
>>> RegexBuddy (a Windows app) explains regexes VERY VERY well.
>
> What kind of patterns? Does it support PCRE ones?
>
Yep, and MANY other flavours (C#, C++, Delphi, Groovy, Java,
JavaScript, MySQL, ...)
>> The important bit (where the problem lies with regard to the regex) is
>> ...
>>
>> Match a single character NOT present in the list below «[^\\\\]»
>> A \ character «\\»
>> A \ character «\\»
>
> This is not the case.
>
> 1. As above, the pattern reaching preg_replace is /(^|[^\\])'/
>
> 2. PCRE, unlike some other regular expression implementations (POSIX
> bracket expressions, for example), allows backslash-escaping inside
> character classes (square brackets). So the doubled backslash counts
> as a single backslash character to be excluded from the set of
> characters the atom will match.
>
> There is no error here. (And even if there were two backslashes being
> excluded, of course, it wouldn't hurt anything or change the meaning of
> the pattern.)
>
>> The issue is the word _single_.
>
> I don't think anybody thought otherwise.
>
> The problem was that, to a casual observer, the pattern seems to mean "a
> quote which doesn't already have a backslash before it". I believe this
> was its intent. (And the replacement added the 'missing' backslash.)
>
> But the pattern doesn't mean that. It actually means "a character which
> isn't a backslash, followed by a quote". This is subtly different.
>
> And it's most noticeable when two quotes follow each other in the
> subject string. In
>
> str''str
>
> first the pattern matches "r'" (non-backslash followed by quote), and
> then it keeps searching from that point, i.e. it searches "'str". Since
> this isn't the beginning of the string, and there is no quote following
> a non-backslash character, there are no further matches.
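
For anyone following along, here is a minimal sketch of the behaviour
described above. The replacement string was never quoted in this thread,
so the one below is only my assumption of what it looked like (put back
the captured character, then add a backslash before the quote):

```php
<?php
// Pattern from the thread: "start of string or a non-backslash, then a quote".
$re = '/(^|[^\\\\])\'/';

// ASSUMPTION: the original replacement is not shown in this thread.
// The string below is the five characters  $1\\'  which preg_replace reads
// as: captured group 1, one literal backslash, a quote.
$replacement = '$1\\\\\'';

$subject = "str''str";

// The first match consumes "r'", so when the search resumes at the second
// quote there is no unconsumed non-backslash character left in front of it.
$count  = preg_match_all($re, $subject, $matches);
$result = preg_replace($re, $replacement, $subject);

echo $count, "\n";   // number of matches found
echo $result, "\n";  // only the first quote gains a backslash
```

With this subject the pattern matches once, and the second quote is
left unescaped.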
>
> Now, here is a pattern which actually means "a quote which doesn't
> already have a backslash before it". It uses a lookbehind assertion,
> which consumes no characters: even when the search resumes after the
> first match, at "'str", it still 'looks back' at the earlier part of
> the string, recognises that the second quote is not preceded by a
> backslash, and matches a second time:
>
> /(^|(?<!\\))'/
>
> As a PHP single-quoted string this is:
>
> '/(^|(?<!\\\\))\'/'
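
A quick sketch to check the lookbehind version, under the same caveat
as before: the replacement string and the two sample subjects are my own
assumptions, not something taken from the original code.

```php
<?php
// Lookbehind pattern from the message above, as a PHP single-quoted string.
$re = '/(^|(?<!\\\\))\'/';

// ASSUMPTION: illustrative replacement. The lookbehind consumes nothing,
// so group 1 is always empty and we only emit a backslash and a quote.
// The string below is the three characters  \\'  which preg_replace reads
// as one literal backslash followed by a quote.
$replacement = '\\\\\'';

// Two adjacent unescaped quotes: both should now be matched.
$escaped = preg_replace($re, $replacement, "str''str", -1, $count);

// A quote that already has a backslash in front of it: should be left alone.
$untouched = preg_replace($re, $replacement, "it\\'s");

echo $count, "\n";      // number of replacements in the first subject
echo $escaped, "\n";    // both quotes escaped
echo $untouched, "\n";  // unchanged
```

Both quotes in the first subject are replaced, and the already-escaped
quote in the second is not touched.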
>
> Hope this helps,
>
> Ben.
>
If I say ...
<?php
echo '/(^|[^\\\\])\'/';
?>
I get ...
/(^|[^\\])'/
which is explained as ...
(^|[^\\])'

Options: case insensitive; ^ and $ match at line breaks

Match the regular expression below and capture its match into
backreference number 1 «(^|[^\\])»
   Match either the regular expression below (attempting the next
   alternative only if this one fails) «^»
      Assert position at the beginning of a line (at beginning of the
      string or after a line break character) «^»
   Or match regular expression number 2 below (the entire group fails
   if this one fails to match) «[^\\]»
      Match any character that is NOT a \ character «[^\\]»
Match the character “'” literally «'»
And that certainly makes a LOT more sense.
Decoding regexes and handling the escaping each language needs is
a real headache sometimes.
Just imagine using PHP to generate regex code for client-side JavaScript:
eight backslashes in a row for a single \ wouldn't be impossible.
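
To make the arithmetic concrete, here is a rough sketch of one way the
layers can stack up. It assumes the JavaScript side builds its regex
from a string literal with new RegExp(), and the variable names are
made up for illustration:

```php
<?php
// Goal: a JavaScript regex that matches ONE literal backslash, built from a
// JavaScript string literal that is itself emitted by PHP.
//
//   regex engine needs    2 backslashes (an escaped backslash)
//   JS string literal     4 backslashes in the JavaScript source
//   PHP single-quoted     8 backslashes in the PHP source
$js = 'var re = new RegExp("\\\\\\\\");';

echo $js, "\n";                      // the JavaScript source PHP sends out
echo substr_count($js, '\\'), "\n";  // backslashes left after PHP unescaping
```

PHP halves the eight backslashes to four, the JavaScript string literal
halves those to two, and the regex engine reads the remaining pair as a
single literal backslash.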
Sorry for the confusion.
--
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY
--
PHP Internals - PHP Runtime Development Mailing List