> I have some marked up text and would like to convert it to plain text,
> by simply removing all the tags. Of course I can do it from first
> principles but I felt that among all Python's markup tools there must
> be something that would do this simply, without having to create an
> XML parser etc.
>
> I've looked around a bit but failed to find anything, any tips?
>
> (e.g. convert "<B>Today</B> is <U>Friday</U>" to "Today is Friday")

Well, if all you want to do is remove everything from a "<" to a ">",
you can use

  >>> s = "<B>Today</B> is <U>Friday</U>"
  >>> import re
  >>> r = re.compile('<[^>]*>')
  >>> print(r.sub('', s))
  Today is Friday

It should even work for semi-pathological cases such as

  s = """You can find my <a href='http://example.com'>thesis</a
  > online"""

where the tag contents are split across lines, because [^>] also
matches newlines. There are more pathological cases where tags aren't
well-formed, e.g.

  s = "This <tag>has a > sign in it and <odd<ly>-nested> tags"

in which case you get what you deserve for making such pathological
conditions ;-)

-tkc
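
For anyone who wants to try this, here is a minimal sketch of the same regex wrapped in a function, assuming Python 3. The names TAG_RE and strip_tags are made up for illustration and do not come from any library. It runs the three kinds of input discussed above, including the badly formed one, so you can see what the regex leaves behind there:

```python
import re

# "<", then any run of characters other than ">" (newlines included), then ">".
TAG_RE = re.compile(r'<[^>]*>')


def strip_tags(text):
    """Remove every <...> span from text (hypothetical helper name)."""
    return TAG_RE.sub('', text)


# Well-formed tags on a single line.
print(strip_tags("<B>Today</B> is <U>Friday</U>"))
# Today is Friday

# A closing tag split across two lines is still removed.
print(strip_tags("my <a href='http://example.com'>thesis</a\n> online"))
# my thesis online

# Badly formed input: the stray ">" and the tail of the nested tag survive.
print(strip_tags("This <tag>has a > sign in it and <odd<ly>-nested> tags"))
# This has a > sign in it and -nested> tags
```

The last case shows the limit of the approach: the pattern only knows about "<" and ">", not about tag structure, so it is only suitable when the markup is known to be well-formed.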