Steven D'Aprano wrote:

> > or you can use a more well-suited function:
> >
> > # replace runs of _ and . with a single character
> > newname = re.sub("_+", "_", newname)
> > newname = re.sub("\.+", ".", newname)
>
> You know, I really must sit down and learn how to use
> reg exes one of these days. But somehow, every time I
> try, I get the feeling that the work required to learn
> to use them effectively is infinitely greater than the
> work required to re-invent the wheel every time.

here's all you need to understand the code above:

    . ^ $ * + ? ( ) [ ] { } | \ are reserved characters
    all other characters match themselves

    reserved characters must be escaped to match themselves;
    to match a dot, use "\\." (which the RE engine sees as \.)

    + means match one or more of the preceding item
    so _+ matches one or more underscores, and \.+ matches
    one or more dots

    re.sub(pattern, replacement, text) replaces all matches
    for the given pattern in text with the given replacement
    string

    so re.sub("_+", "_", newname) replaces runs of underscores
    with a single underscore.

> > or, slightly more obscure:
> >
> > newname = re.sub("([_.])\\1+", "\\1", newname)
>
> _Slightly_?

this introduces three new concepts:

    [ ] defines a set of characters
    so [_.] will match either _ or .

    ( ) defines a group of matched characters.

    \\1 (which the RE engine sees as \1) refers to the first group
    this can be used both in the pattern and in the replacement
    string

    so re.sub("([_.])\\1+", "\\1", newname) replaces runs consisting
    of either a . or an _ followed by one or more copies of itself,
    with a single instance of itself.

(using r-strings lets you remove some of the extra backslashes, btw)

</F>
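
for anyone who wants to try this out, here's a minimal sketch of both variants written with r-strings. the function names and the sample filename are made up for illustration; they are not from the thread.

```python
import re

def collapse_two_step(name):
    # hypothetical helper: collapse runs of "_" and runs of "." separately
    name = re.sub(r"_+", "_", name)
    name = re.sub(r"\.+", ".", name)
    return name

def collapse_backref(name):
    # hypothetical helper: ([_.]) captures one "_" or ".", and \1+ requires
    # one or more further copies of that same character
    return re.sub(r"([_.])\1+", r"\1", name)

# an r-string spells the same pattern without doubled backslashes
assert r"([_.])\1+" == "([_.])\\1+"

sample = "my__file...name_.txt"   # made-up filename
print(collapse_two_step(sample))  # my_file.name_.txt
print(collapse_backref(sample))   # my_file.name_.txt
```

the "_." in the sample is left alone by both versions, since the backreference only matches a repeat of the same character. also note that a non-raw "\." only works because Python leaves unrecognised escapes in place, and newer Python versions warn about that, so r"\." or "\\." is the safer spelling.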