I've got an XML feed from a vendor that is not well-formed, and having them change it is not an option. I'm trying to figure out how to create an error-handler that will ignore the invalid token and continue on.
The file is large, so I'd prefer not to put it all in memory or save it off and strip out the bad characters before I parse it. I've included one of the problematic characters in a small XML snippet below. I'm new to Python, and I don't know how to accomplish this. Any help is greatly appreciated!

-----------------------------------------------------------------

Here is my code:

from xml.sax import make_parser
from xml.sax.handler import ContentHandler
import StringIO

class ErrorHandler:
    def __init__(self, parser):
        self.parser = parser

    def warning(self, msg):
        print '*** (ErrorHandler.warning) msg:', msg

    def error(self, msg):
        print '*** (ErrorHandler.error) msg:', msg

    def fatalError(self, msg):
        print msg

class ContentHandler(ContentHandler):
    def __init__ (self):
        pass

    def startElement(self, name, attrs):
        pass

    def characters (self, ch):
        pass

    def endElement(self, name):
        pass

xmlstr = """
<cities>
    <city>
        <name>Tampa</name>
        <description>A great city and place to live</description>
    </city>
    <city>
        <name>Clearwater</name>
        <description>Beautiful beaches</description>
    </city>
</cities>
"""

parser = make_parser()
curHandler = ContentHandler()
errorHandler = ErrorHandler(parser)
parser.setContentHandler(curHandler)
parser.setErrorHandler(errorHandler)
parser.parse(StringIO.StringIO(xmlstr))
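
For illustration, here is a minimal sketch of one possible way around the problem. It makes several assumptions. It assumes the invalid tokens are control characters that XML 1.0 does not allow, and it would not help with structural problems such as an unescaped `&`. It also sidesteps the error handler entirely, since as far as I can tell the expat-based parser cannot resume after a fatal error. The sketch reads the feed in small chunks, strips the bad characters from each chunk, and passes the result to the parser's incremental `feed()` method, so the whole file is never held in memory or rewritten to disk. It is written in Python 3 syntax. The names `INVALID_XML_CHARS`, `NameCollector`, `parse_tolerantly` and `SAMPLE` are made up for this example, and the vertical-tab character in `SAMPLE` is only a stand-in for whatever the vendor's feed really contains.

```python
import io
import re
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

# Assumption: the offending tokens are C0 control characters (everything
# below 0x20 except tab, LF and CR), which XML 1.0 forbids in a document.
INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


class NameCollector(ContentHandler):
    """Hypothetical handler that collects the text of every <name> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buf = None  # list of text pieces while inside <name>, else None

    def startElement(self, name, attrs):
        if name == "name":
            self._buf = []

    def characters(self, content):
        # characters() may be called several times for one text node,
        # so the pieces are accumulated and joined in endElement().
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._buf))
            self._buf = None


def parse_tolerantly(stream, handler, chunk_size=64 * 1024):
    """Feed a text-mode stream to a SAX parser chunk by chunk,
    dropping disallowed control characters from each chunk first."""
    parser = make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)  # str chunks, so no split characters
        if not chunk:
            break
        cleaned = INVALID_XML_CHARS.sub("", chunk)
        if cleaned:
            parser.feed(cleaned)
    parser.close()  # signals end of input to the parser


# Stand-in sample: "\x0b" (vertical tab) plays the role of the bad character.
SAMPLE = """
<cities>
  <city>
    <name>Tampa</name>
    <description>A great city\x0b and place to live</description>
  </city>
  <city>
    <name>Clearwater</name>
    <description>Beautiful beaches</description>
  </city>
</cities>
"""

handler = NameCollector()
parse_tolerantly(io.StringIO(SAMPLE), handler, chunk_size=16)
print(handler.names)
```

For a real feed, the `io.StringIO` object would be replaced by a file opened in text mode with the feed's encoding. Opening it with `errors="replace"` might also be worth considering if the feed contains invalid byte sequences, though that is a separate problem from the one shown here.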