On Mar 27, 9:59 am, [EMAIL PROTECTED] wrote:
> I've been using the xml.sax.handler module to do event-driven parsing
> of XML files in this python application I'm working on. However, I
> keep having really pesky invalid token exceptions. Initially, I was
> only getting them on control characters, and a little "sed -e 's/
> [^[:print:]]/ /g' $1;" took care of that just fine. But recently, I've
> been getting these invalid token exceptions with n-tildes (like the n
> in España), smart/fancy/curly quotes and other seemingly harmless
> characters. Specifying encoding="utf-8" in the xml header hasn't
> helped matters.
>
> Any ideas? As a last resort, I'd be willing to scrub invalid
> characters.... it just seems strange that curly quotes and n-tildes
> wouldn't be valid XML! Is that really the case?
>
> TIA!
>
> Jason

Are you making sure that the strings you pass into the parser are
actually encoded in UTF-8 or UTF-16? Declaring encoding="utf-8" in the
XML header does not convert anything; the bytes in the file have to
match the declaration. This article was illuminating in that respect
and may be helpful in diagnosing your problem:

http://www.xml.com/pub/a/2002/11/13/py-xml.html?page=2

Mike
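
To make the encoding point concrete, here is a minimal sketch (Python 3, standard library only) of what probably happens when the bytes do not match the declared encoding. The TextCollector handler, the parse_bytes helper and the sample document are made up for illustration; they are not from Jason's application, and the guess that his files are Latin-1 is only an assumption.

```python
import xml.sax
import xml.sax.handler


class TextCollector(xml.sax.handler.ContentHandler):
    """Hypothetical handler that only accumulates character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so collect them all.
        self.chunks.append(content)


def parse_bytes(data):
    """Parse an XML document given as bytes and return its text content."""
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks)


# Sample document (assumed for illustration); \u00f1 is the n-tilde.
doc = '<?xml version="1.0" encoding="utf-8"?><place>Espa\u00f1a</place>'

# Case 1: the bytes really are UTF-8, matching the header. Parses cleanly.
good = doc.encode("utf-8")
print(parse_bytes(good))

# Case 2: the bytes are Latin-1 but the header still claims UTF-8.
# The single byte 0xF1 is not a valid UTF-8 sequence, so the parser rejects it.
bad = doc.encode("latin-1")
try:
    parse_bytes(bad)
except xml.sax.SAXParseException as exc:
    print("parse failed:", exc.getMessage())
```

If the second case matches what you are seeing, the characters themselves are valid XML; either re-encode the input as UTF-8 before parsing, or change the header to name the encoding the file is really in.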