>> Well, if all you want to do is remove everything from a "<" to a
>> ">", you can use
>>
>>   >>> s = "<B>Today</B> is <U>Friday</U>"
>>   >>> import re
>>   >>> r = re.compile('<[^>]*>')
>>   >>> print r.sub('', s)
>>   Today is Friday
>>
>> [Tim's ramblings about pathological cases snipped]
>
> The real answer to this question is "learn how to use Beautiful Soup" --
> see http://www.crummy.com/software/BeautifulSoup/

Yes, for more pathological cases, BS does a great job of parsing junk :)

However, as BS isn't batteries-included [Aside: BS and pyparsing are
two common solutions to problems that would make great additions to
the standard library], using a RE to make a best-effort guess is a
good first approximation of a solution without needing to download
extra packages--no matter how useful those extra packages may be.

-tkc

-- 
http://mail.python.org/mailman/listinfo/python-list
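
P.S. For anyone who wants to try the best-effort approach, here is a
minimal sketch using only the standard library. It assumes Python 3 (the
quoted session uses the Python 2 print statement), and strip_tags is just
a name made up for illustration, not anything from the re module. The
second call shows one of the pathological cases where the guess goes wrong.

```python
import re

# "<", then any run of characters that are not ">", then ">".
TAG_RE = re.compile(r'<[^>]*>')

def strip_tags(text):
    """Best-effort removal of anything that looks like a tag.

    Hypothetical helper for illustration; it does no real HTML parsing.
    """
    return TAG_RE.sub('', text)

# The simple case from the quoted message.
print(strip_tags("<B>Today</B> is <U>Friday</U>"))  # Today is Friday

# A pathological case: the ">" inside the attribute value ends the
# match early, so part of the tag leaks into the output.
print(strip_tags('<a title="x > y">link</a>'))      #  y">link
```

If leftovers like that second result show up in real input, that is
probably the point at which downloading a proper parser becomes worth it.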