Hi! I do this to convert HTML to a text file:

    import htmllib
    import formatter
    import StringIO

    class mvbHTMLParser(htmllib.HTMLParser):
        def __init__(self, formatter, verbose=0):
            htmllib.HTMLParser.__init__(self, formatter, verbose)
            self.imglist = []  # collects the src of every <img> tag

        def handle_image(self, src, alt, *args):
            self.imglist.append(src)

    # Render the parsed HTML as plain text into an in-memory buffer.
    file = StringIO.StringIO()
    f = formatter.AbstractFormatter(formatter.DumbWriter(file))
    p = mvbHTMLParser(f)
    p.feed(html)
    p.close()
    print file.getvalue()

But then the _ characters are gone from the output. Is it possible to keep that character in file.getvalue()?

(p.anchorlist is OK: it contains test_bla.html.)