Re: Raw string substitution problem

MRAB Thu, 17 Dec 2009 11:48:13 -0800

Alan G Isaac wrote:

Alan G Isaac wrote:
          >>> re.sub('abc', r'a\nb\n.c\a', '123abcdefg') == re.sub('abc', 'a\\nb\\n.c\\a', '123abcdefg') == re.sub('abc', 'a\nb\n.c\a', '123abcdefg')
          True
Why are the first two strings being treated as if they are the last one?
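A minimal sketch of what is going on, assuming Python 3: the first two replacement strings are identical to each other and differ from the third. However, `re.sub` runs a string replacement through its own template processing, which turns the two-character sequence backslash-n into a newline (and backslash-a into a bell character). After that step, all three templates expand to the same text.

```python
import re

raw_template = r'a\nb\n.c\a'        # backslash + letter pairs, written as a raw string
escaped_template = 'a\\nb\\n.c\\a'  # the same characters, spelled with doubled backslashes
literal_template = 'a\nb\n.c\a'     # real newline and bell characters, no backslashes

# The first two are the same string; the third really is different.
print(raw_template == escaped_template)   # True
print(raw_template == literal_template)   # False

# re.sub runs a string template through its own escape processing, so the
# backslash-n pairs in the first two templates become newlines before use.
results = [re.sub('abc', t, '123abcdefg')
           for t in (raw_template, escaped_template, literal_template)]
print(results[0] == results[1] == results[2])  # True
print(repr(results[0]))  # '123a\nb\n.c\x07defg'
```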

On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:

They aren't.  The last string is different.


Of course it is different.
That is the basis of my question.
Why is it being treated as if it is the same?
(See the end of this post.)

Alan G Isaac wrote:

More simply, consider::

          >>> re.sub('abc', '\\', '123abcdefg')
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
            File "C:\Python26\lib\re.py", line 151, in sub
              return _compile(pattern, 0).sub(repl, string, count)
            File "C:\Python26\lib\re.py", line 273, in _subx
              template = _compile_repl(template, pattern)
            File "C:\Python26\lib\re.py", line 260, in _compile_repl
              raise error, v # invalid expression
          sre_constants.error: bogus escape (end of line)

Why is this the proper handling of what one might think would be an
obvious substitution?
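The traceback can be reproduced with a short sketch, assuming Python 3. The error message wording differs from 2.6's "bogus escape", but the cause is the same. The source literal '\\' is a single backslash character, and a template that ends in a lone backslash leaves that backslash with nothing to escape. The helper `try_sub` is hypothetical and exists only for illustration.

```python
import re

def try_sub(template):
    """Hypothetical helper: return the result, or the re.error message."""
    try:
        return re.sub('abc', template, '123abcdefg')
    except re.error as exc:
        return 'error: %s' % exc

lone = '\\'              # in source code this literal is ONE backslash
print(len(lone))         # 1
print(try_sub(lone))     # "error: ..." (exact wording depends on the Python version)
print(try_sub('x'))      # 123xdefg, a template without backslashes is used as-is
```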



On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:

Is this what you want?  What you have is a re expression consisting of
a single backslash that doesn't escape anything (EOL), so it barfs.

        >>> re.sub('abc', r'\\', '123abcdefg')
        '123\\defg'
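D'Arcy's suggestion works because the template processor sees an escaped backslash and emits one literal backslash. A short check, assuming Python 3, confirms that the '123\\defg' shown at the prompt is the repr of a string containing a single backslash:

```python
import re

# r'\\' and '\\\\' are the same two-character string: backslash, backslash.
template = r'\\'
print(template == '\\\\', len(template))   # True 2

# The template processor reduces the escaped pair to one literal backslash.
result = re.sub('abc', template, '123abcdefg')
print(repr(result))   # '123\\defg'  (the repr doubles the backslash)
print(len(result))    # 8: '1', '2', '3', one backslash, 'd', 'e', 'f', 'g'
```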


Turning again to the documentation:
        "if it is a string, any backslash escapes in it are processed.
        That is, \n is converted to a single newline character, \r is
        converted to a linefeed, and so forth."
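One reason a string replacement gets its own backslash processing is that the same syntax carries group references: per the `re.sub` documentation, \1 or \g<name> in the template is replaced by the text that group matched. The escape pass that expands these references is also the one that turns backslash-n into a newline. A sketch, assuming Python 3:

```python
import re

# In a template, backslash sequences are interpreted by re, not by Python's
# string-literal parser: \1 and \g<word> refer to captured groups, and
# backslash-n is turned into a newline by the same pass.
swapped = re.sub(r'(\w+)-(\w+)', r'\2-\1', 'left-right')
named = re.sub(r'(?P<word>\w+)!', r'<\g<word>>', 'hello!')
with_newline = re.sub('-', r'\n', 'a-b')

print(swapped)             # right-left
print(named)               # <hello>
print(repr(with_newline))  # 'a\nb'
```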
So why is '\n' converted to a newline but '\\' does not become a literal
backslash?  OK, I don't do much string processing, so perhaps this is where
I am missing the point: how is the replacement being "converted"?
(As Peter's example shows, if you supply the replacement via
a function, this does not happen.) You suggest it is just a matter of
it being an re, but::

        >>> re.sub('abc', 'a\\nc','1abcd') == re.sub('abc', 'a\nc','1abcd')
        True
        >>> re.compile('a\\nc') == re.compile('a\nc')
        False

So I have two strings that are not the same, nor do they compile
equivalently, yet apparently they are "converted" to something
equivalent for the substitution. Why? Is my question clearer?
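Peter's point can be sketched as follows, assuming Python 3. The helper name `literal_sub` is hypothetical. A callable replacement's return value is inserted verbatim with no template processing, which makes it the simplest way to substitute text that contains backslashes. Doubling every backslash before passing a string template gives the same result.

```python
import re

def literal_sub(pattern, replacement, text):
    """Hypothetical helper: substitute `replacement` exactly as written.

    A function passed as repl has its return value used verbatim,
    so backslashes in `replacement` are never interpreted.
    """
    return re.sub(pattern, lambda match: replacement, text)

text = '1abcd'
via_template = re.sub('abc', 'a\\nc', text)      # backslash-n becomes a newline
via_function = literal_sub('abc', 'a\\nc', text)  # backslash-n stays two characters
# Escaping each backslash first makes the template processor restore it.
via_doubled = re.sub('abc', 'a\\nc'.replace('\\', '\\\\'), text)

print(repr(via_template))           # '1a\ncd'
print(repr(via_function))           # '1a\\ncd'
print(via_function == via_doubled)  # True
```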

re.compile('a\\nc') _does_ compile to the same regex as
re.compile('a\nc').

However, regex objects never compare equal to each other, so, strictly
speaking, re.compile('a\nc') != re.compile('a\nc').

Having said that, the re module contains a cache (keyed on the
string and options supplied), so the first re.compile('a\nc') will put
the regex object in the cache and the second re.compile('a\nc') will
return that same regex object from the cache. If you clear the cache in
between the two calls (do re._cache.clear()) you'll get two different
regex objects which won't compare equal even though they are to all
intents identical.
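Both points can be checked directly. The sketch below assumes Python 3 and uses the public `re.purge()` to clear the cache instead of the private `re._cache`, whose layout has changed between versions. It checks identity with `is` instead of `==`, because current CPython 3 releases compare compiled patterns by value (pattern string and flags), so the 2009 behaviour of `==` no longer holds.

```python
import re

re.purge()  # start from an empty cache so the result does not depend on earlier calls

escaped = re.compile('a\\nc')   # backslash-n: the regex engine reads it as "newline"
literal = re.compile('a\nc')    # an actual newline character in the pattern

# Different pattern strings, but they accept exactly the same input.
print(escaped.pattern == literal.pattern)                     # False
print(bool(escaped.fullmatch('a\nc')), bool(literal.fullmatch('a\nc')))  # True True
print(escaped.fullmatch('a\\nc'), literal.fullmatch('a\\nc'))  # None None

# The module-level cache hands back the very same object for a repeated call...
first = re.compile('a\nc')
same = first is re.compile('a\nc')
# ...but after purging the cache a fresh, distinct object is compiled.
re.purge()
fresh = first is re.compile('a\nc')
print(same, fresh)   # True False
```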

If the answer looks too obvious to state, assume I'm missing it anyway
and please state it.  As I said, I seldom use the re module.

--
http://mail.python.org/mailman/listinfo/python-list
