Re: unicode + xml

Mark Tolonen Mon, 07 Sep 2009 21:16:56 -0700

"Laurent Luce" <[email protected]> wrote in message news:[email protected]...

Hello,
I am trying to do the following:
- read the list of folders in a specific directory with os.listdir(); some folders have Japanese characters
- post the list of folders as XML to a web server: I use content-type 'text/xml' and start the XML data with '<?xml version="1.0" encoding="utf-8"?>'
- on the server side (Django), I get the data using post_data and parse it with minidom.parseString(). I get an exception because of the following in the XML for one of the folder names:
'\ufffdX\ufffd^\ufffd[\ufffdg \ufffd\ufffd\ufffdj\ufffd\ufffd\ufffd['
The weird thing is that I see 5 bytes for each Unicode character, i.e. \ufffdX
Should I format the data differently inside the XML so minidom is happy?
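For the round trip described in the question (build XML from folder names, post it, parse it with minidom.parseString()), here is a minimal Python 3 sketch. It assumes the folder names are already correctly decoded str objects; the element names `folders`/`folder`, the helper functions, and the sample names are hypothetical. Letting minidom write both the XML declaration and the UTF-8 bytes keeps the declared encoding and the actual encoding in agreement.

```python
from xml.dom import minidom

def folders_to_xml(names):
    """Serialize folder names (str) to UTF-8 encoded XML bytes (hypothetical helper)."""
    doc = minidom.Document()
    root = doc.createElement("folders")
    doc.appendChild(root)
    for name in names:
        elem = doc.createElement("folder")
        # Text nodes are escaped automatically when serialized.
        elem.appendChild(doc.createTextNode(name))
        root.appendChild(elem)
    # With an encoding argument, toxml() returns bytes and writes
    # the matching <?xml version="1.0" encoding="utf-8"?> declaration.
    return doc.toxml(encoding="utf-8")

def xml_to_folders(data):
    """Parse UTF-8 XML bytes (e.g. a request body) back into a list of str."""
    doc = minidom.parseString(data)
    return [node.firstChild.data if node.firstChild else ""
            for node in doc.getElementsByTagName("folder")]

# Hypothetical folder names, including Japanese characters.
names = ["documents", "写真", "テスト フォルダ"]
payload = folders_to_xml(names)
print(payload.decode("utf-8"))
assert xml_to_folders(payload) == names
```

The receiving side can pass the raw body bytes directly to minidom.parseString(), which reads the encoding from the declaration. This only helps if the names were decoded correctly before being put into the XML.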

You aren't seeing 5 bytes for each Unicode character. You are seeing '\ufffd' (the code point U+FFFD, REPLACEMENT CHARACTER) intermixed with other characters. The wrong encoding was probably used to decode the filename byte strings to Unicode.
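A short Python 3 sketch of how such a string can arise. It assumes, hypothetically, that the folder was named スタート メニュー ("Start Menu"), that its name existed as cp932 (Windows Shift_JIS) bytes, and that those bytes were decoded as UTF-8 with errors replaced. Under those assumptions the result matches the quoted string exactly: every byte that is invalid UTF-8 becomes U+FFFD, while second bytes that happen to be ASCII ('X', '^', '[', 'g', 'j') survive.

```python
# Hypothetical folder name (assumption, not confirmed by the original poster).
name = "スタート メニュー"

# The bytes a Japanese Windows system would use (cp932 is its Shift_JIS variant).
raw = name.encode("cp932")

# The mistake: decoding those bytes as UTF-8. Each invalid byte becomes U+FFFD;
# ASCII-range second bytes of the two-byte characters pass through unchanged.
wrong = raw.decode("utf-8", errors="replace")
assert wrong == "\ufffdX\ufffd^\ufffd[\ufffdg \ufffd\ufffd\ufffdj\ufffd\ufffd\ufffd["

# The fix: decode with the same encoding that produced the bytes.
right = raw.decode("cp932")
assert right == name
print(ascii(wrong))
print(right)
```

Once the replacement characters are in the string the original name is lost, so the fix belongs at the point where the filename bytes are decoded, not in the XML.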

We can give more specific help if you specify your operating system and version of Python used.
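A small Python 3 sketch for collecting those details, plus the encodings that govern how os.listdir() turns filenames into str. The helper name `environment_report` is hypothetical. Passing a bytes path to os.listdir() returns the raw, undecoded names, which makes it possible to see exactly which bytes are on disk.

```python
import locale
import os
import platform
import sys

def environment_report(path="."):
    """Collect details that affect filename decoding (hypothetical helper)."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        # Encoding os.listdir() uses to decode names when given a str path.
        "filesystem_encoding": sys.getfilesystemencoding(),
        "locale_encoding": locale.getpreferredencoding(False),
        # A bytes path makes os.listdir() return undecoded bytes names.
        "raw_names": os.listdir(os.fsencode(path)),
    }

for key, value in environment_report().items():
    print(f"{key}: {value}")
```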


-Mark


--
http://mail.python.org/mailman/listinfo/python-list
