Rickard Lindberg, 08.02.2011 16:57:
Hi,Here is a bash script to reproduce my error: #!/bin/sh cat> å.timeline<<EOF <?xml version="1.0" encoding="utf-8"?> <timeline> <version>0.13.0devb38ace0a572b+</version> <categories> </categories> <events> <event> <start>2011-02-01 00:00:00</start> <end>2011-02-03 08:46:00</end> <text>asdsd</text> </event> </events> <view> <displayed_period> <start>2011-01-24 16:38:11</start> <end>2011-02-23 16:38:11</end> </displayed_period> <hidden_categories> </hidden_categories> </view> </timeline> EOF python<<EOF # encoding: utf-8 from xml.sax import parse from xml.sax.handler import ContentHandler parse(u"å.timeline", ContentHandler()) EOF If I instead do parse(u"å.timeline".encode("utf-8"), ContentHandler()) the script runs without errors. Is this a bug or expected behavior?
Expected behaviour. You cannot parse XML from unicode strings, especially not when the XML data explicitly declares itself as being encoded in UTF-8.
Parse from a byte string instead, as you do in your fixed code. Stefan -- http://mail.python.org/mailman/listinfo/python-list
