On Tue, 06 Jul 2010 19:10:17 +0200, Javier Collado wrote:

> Hello,
>
> Let's imagine that we have a simple function that generates a
> replacement for a regular expression:
>
> def process(match):
>     return match.string
>
> If we use that simple function with re.sub using a simple pattern and a
> string we get the expected output:
> re.sub('123', process, '123')
> '123'
>
> However, if the string passed to re.sub contains a trailing new line
> character, then we get an extra new line character unexpectedly:
> re.sub(r'123', process, '123\n')
> '123\n\n'

I don't know why you say it is unexpected. The regex "123" matched the
first three characters of "123\n". Those three characters are replaced by
a copy of the string you are searching "123\n", which gives "123\n\n"
exactly as expected.

Perhaps these examples might help:

>>> re.sub('W', process, 'Hello World')
'Hello Hello Worldorld'
>>> re.sub('o', process, 'Hello World')
'HellHello World WHello Worldrld'

Here's a simplified pure-Python equivalent of what you are doing:

def replace_with_match_string(target, s):
    n = s.find(target)
    if n != -1:
        s = s[:n] + s + s[n+len(target):]
    return s


> If we try to get the same result using a replacement string, instead of
> a function, the strange behaviour cannot be reproduced:
> re.sub(r'123', '123', '123')
> '123'
>
> re.sub('123', '123', '123\n')
> '123\n'

The regex "123" matches the first three characters of "123\n", which is
then replaced by "123", giving "123\n", exactly as expected.

>>> re.sub("o", "123", "Hello World")
'Hell123 W123rld'


-- 
Steven
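
P.S. If the intent was for the function to hand back only the text that the pattern matched, rather than the whole string being searched, then match.group(0) is probably what was wanted. This is only a guess at the original poster's goal. A minimal sketch, reusing the name process from the original post (the result variables are just for illustration):

```python
import re

def process(match):
    # match.group(0) is only the substring matched by the pattern.
    # match.string, by contrast, is the entire string passed to re.sub.
    return match.group(0)

# Each match is replaced by itself, so the input comes back unchanged.
result_newline = re.sub(r'123', process, '123\n')
result_hello = re.sub('o', process, 'Hello World')

print(repr(result_newline))
print(repr(result_hello))
```

With this version the trailing newline is not duplicated, because the replacement for each match is just the matched characters.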