Alan G Isaac wrote:
>> Alan G Isaac wrote:
>>>     >>> re.sub('abc', r'a\nb\n.c\a','123abcdefg') == re.sub('abc', 'a\\nb\\n.c\\a','123abcdefg') == re.sub('abc', 'a\nb\n.c\a','123abcdefg')
>>>     True
>>> Why are the first two strings being treated as if they are the last one?
>
> On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:
>> They aren't. The last string is different.
>
> Of course it is different. That is the basis of my question.
> Why is it being treated as if it is the same?
> (See the end of this post.)
>
>> Alan G Isaac wrote:
>>> More simply, consider::
>>>     >>> re.sub('abc', '\\', '123abcdefg')
>>>     Traceback (most recent call last):
>>>       File "<stdin>", line 1, in <module>
>>>       File "C:\Python26\lib\re.py", line 151, in sub
>>>         return _compile(pattern, 0).sub(repl, string, count)
>>>       File "C:\Python26\lib\re.py", line 273, in _subx
>>>         template = _compile_repl(template, pattern)
>>>       File "C:\Python26\lib\re.py", line 260, in _compile_repl
>>>         raise error, v # invalid expression
>>>     sre_constants.error: bogus escape (end of line)
>>> Why is this the proper handling of what one might think would be an
>>> obvious substitution?
>
> On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:
>> Is this what you want? What you have is a re expression consisting of
>> a single backslash that doesn't escape anything (EOL) so it barfs.
>>     >>> re.sub('abc', r'\\', '123abcdefg')
>>     '123\\defg'
>
> Turning again to the documentation:
> "if it is a string, any backslash escapes in it are processed.
> That is, \n is converted to a single newline character,
> \r is converted to a linefeed, and so forth."
> So why is '\n' converted to a newline but '\\' does not become a literal
> backslash? OK, I don't do much string processing, so perhaps this is where
> I am missing the point: how is the replacement being "converted"?
> (As Peter's example shows, if you supply the replacement via a function,
> this does not happen.)
>
> You suggest it is just a matter of it being an re, but::
>
>     >>> re.sub('abc', 'a\\nc','1abcd') == re.sub('abc', 'a\nc','1abcd')
>     True
>     >>> re.compile('a\\nc') == re.compile('a\nc')
>     False
>
> So I have two strings that are not the same, nor do they compile
> equivalently, yet apparently they are "converted" to something
> equivalent for the substitution. Why? Is my question clearer?

re.compile('a\\nc') _does_ compile to the same regex as
re.compile('a\nc').
However, regex objects never compare equal to each other, so, strictly
speaking, re.compile('a\nc') != re.compile('a\nc').
However, having said that, the re module contains a cache (keyed on the
string and options supplied), so the first re.compile('a\nc') will put
the regex object in the cache and the second re.compile('a\nc') will
return that same regex object from the cache. If you clear the cache in
between the two calls (do re._cache.clear()) you'll get two different
regex objects which won't compare equal even though they are to all
intents identical.
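
Here is a minimal sketch of the points above, in case it helps to see them run. It assumes a reasonably current CPython, uses the public re.purge() in place of re._cache.clear() (which touches a private name), and checks object identity rather than ==, since how == behaves on two distinct regex objects may differ between Python versions. The variable names are only for illustration.

```python
import re

escaped = re.compile('a\\nc')   # pattern text: a, backslash, n, c
literal = re.compile('a\nc')    # pattern text: a, newline, c

# The pattern strings differ, yet both match the same subject text.
subject = 'xa\ncx'
same_match = escaped.search(subject).group() == literal.search(subject).group()

# While the cache is warm, compiling the same string twice
# hands back the very same object.
first = re.compile('a\nc')
cached = re.compile('a\nc')
from_cache = first is cached

# After emptying the cache, the same string compiles to a new object.
re.purge()
fresh = re.compile('a\nc')
rebuilt = fresh is not first
```

All three flags (same_match, from_cache, rebuilt) should come out True.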
> If the answer looks too obvious to state, assume I'm missing it anyway and
> please state it. As I said, I seldom use the re module.
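
As for how the replacement gets "converted", my understanding (a sketch, assuming current CPython behaviour rather than anything specific to 2.6) is that there are two separate layers. First the Python string literal is evaluated, so 'a\\nc' becomes the four characters a, backslash, n, c, and '\\' becomes one backslash. Then re.sub processes backslash escapes in that replacement template. A string returned by a replacement function skips the second layer. The variable names below are only for illustration.

```python
import re

# Both spellings insert a real newline: the literal newline passes through
# untouched, and the two characters \n are converted by the template step.
from_escape = re.sub('abc', 'a\\nc', '1abcd')
from_newline = re.sub('abc', 'a\nc', '1abcd')

# A lone backslash is an incomplete escape in a template, so re.sub raises.
try:
    re.sub('abc', '\\', '123abcdefg')
    lone_backslash_ok = True
except re.error:
    lone_backslash_ok = False

# Two backslash characters in the template give one backslash in the result.
doubled = re.sub('abc', r'\\', '123abcdefg')

# A function's return value is inserted as-is, with no escape processing,
# so a single backslash is fine here.
via_function = re.sub('abc', lambda m: '\\', '123abcdefg')
```

So '\\' does become a literal backslash, but only once the template itself contains two backslash characters, which is what r'\\' (or '\\\\') supplies.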
