504cr...@gmail.com wrote:
> I wonder if you (or anyone else) might attempt a different explanation
> for the use of the special sequence '\1' in the RegEx syntax.
>
> The Python documentation explains:
>
> \number
>     Matches the contents of the group of the same number. Groups are
>     numbered starting from 1. For example, (.+) \1 matches 'the the' or
>     '55 55', but not 'the end' (note the space after the group). This
>     special sequence can only be used to match one of the first 99 groups.
>     If the first digit of number is 0, or number is 3 octal digits long,
>     it will not be interpreted as a group match, but as the character with
>     octal value number. Inside the '[' and ']' of a character class, all
>     numeric escapes are treated as characters.
>
> In practice, this appears to be the key to the key device to your
> clever solution:
>
> >>> re.compile(r"(\d+)").sub(r"INSERT \1", string)
> 'abc INSERT 123 def INSERT 456 ghi INSERT 789'
>
> >>> re.compile(r"(\d+)").sub(r"INSERT ", string)
> 'abc INSERT  def INSERT  ghi INSERT '
>
> I don't, however, precisely understand what is meant by "the group of
> the same number" -- or maybe I do, but it isn't explicit. Is this just
> a shorthand reference to match.group(1) -- if that were valid --
> implying that the group match result is printed in the compile
> execution?

If I understand you correctly, you are right: within the replacement
string, "\1" stands for what match.group(1) returns for each match.
Another example:

>>> re.compile(r"([a-z]+)(\d+)").sub(r"number=\2 word=\1", "a1 zzz42")
'number=1 word=a number=42 word=zzz'

For every match of "([a-z]+)(\d+)" in the original string, "\1" in
"number=\2 word=\1" is replaced with the text matched by the first
group, "[a-z]+", and "\2" is replaced with the text matched by the
second group, "\d+". The result, e.g. "number=1 word=a", then replaces
the whole match (group 0), i.e. "a1" in the example.

Peter
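
To make the correspondence with match.group() explicit, here is a minimal sketch that writes the same substitution three ways. The names `pattern`, `text`, and `swap` are made up for illustration. sub() also accepts a function in place of a replacement string, and match.expand() applies a template to a single match; assuming the standard `re` module, all three should agree.

```python
import re

pattern = re.compile(r"([a-z]+)(\d+)")
text = "a1 zzz42"

# Template form: \1 and \2 are filled in from the groups of each match.
with_template = pattern.sub(r"number=\2 word=\1", text)

# Callable form: sub() calls the function once per match and inserts
# whatever string it returns. "swap" is a hypothetical helper name.
def swap(match):
    return "number=" + match.group(2) + " word=" + match.group(1)

with_function = pattern.sub(swap, text)

# expand() applies the same template to one match object only.
first_match = pattern.search(text)
expanded_first = first_match.expand(r"number=\2 word=\1")

print(with_template)
print(with_function)
print(expanded_first)
```

The callable form is the one to reach for when the replacement needs more than rearranging groups, for example converting match.group(2) to an int and doing arithmetic on it.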