Re: unicode + xml
Still doesn't work from Windows Japanese python (2.6.2) to Django Python 2.5.2. Works fine from Linux python 2.5.2 to Django Python 2.5.2.

Here is the flow:
- post xml utf-8 encoded data from Windows client to Django server
- On server: pass raw_post_data to minidom.parseString() --- throws exception

Here is the code I use to post data:

    url = mysite
    req = urllib2.Request(url)
    req.add_header('Content-Type', 'text/xml; charset=utf-8')
    opener.open(req, data.encode('utf-8'))

data is the xml data.
opener is a urllib2 opener I create when user logs in.

Here is the code I use to receive the data:

    dom = minidom.parseString(request.raw_post_data)

default charset on django side is utf-8.

Please advise. Thanks.

Laurent

----- Original Message -----
From: Stefan Behnel stefan...@behnel.de
To: python-list@python.org
Sent: Monday, September 7, 2009 11:50:28 PM
Subject: Re: unicode + xml

Laurent Luce wrote:
> Can someone confirm that the issue here is I need to encode the xml data using:
>
>     # encode as UTF-8
>     utf8_string = xml.encode( 'utf-8' )
>
> and then post it to the server.

Well, since you declared it to be UTF-8, it must be UTF-8 encoded.

However, your question seems to imply that you generate the XML manually using string concatenation, which is a rather bad idea. Python has great XML tools like ElementTree that help in generating and serialising XML correctly (besides parsing, searching and other things).

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
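Stefan's suggestion to generate the XML with ElementTree rather than by string concatenation could look roughly like the sketch below. It is written for Python 3, not the 2.x versions used in the thread, and the element names `folders` and `folder` are hypothetical, since the thread never shows the actual document layout.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def build_folder_xml(folder_names):
    """Return UTF-8 encoded XML bytes listing the given folder names.

    The <folders>/<folder> element names are hypothetical.
    """
    root = ET.Element("folders")
    for name in folder_names:
        child = ET.SubElement(root, "folder")
        child.text = name  # special characters are escaped on serialisation
    # With an explicit encoding, tostring() returns encoded bytes.
    # Whether an XML declaration is emitted depends on the Python version;
    # a document without one is read as UTF-8 by default.
    return ET.tostring(root, encoding="utf-8")


payload = build_folder_xml(["スタート メニュー", "R&D <2009>"])

# Receiving side: minidom accepts the encoded bytes directly.
dom = minidom.parseString(payload)
parsed_names = [node.firstChild.data
                for node in dom.getElementsByTagName("folder")]
print(parsed_names == ["スタート メニュー", "R&D <2009>"])
```

Because the serialiser handles both the escaping and the encoding, the payload can be posted as is, without a further `.encode('utf-8')` call.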
Re: unicode + xml
Can someone confirm that the issue here is I need to encode the xml data using:

    # encode as UTF-8
    utf8_string = xml.encode( 'utf-8' )

and then post it to the server.

Laurent

----- Original Message -----
From: Laurent Luce laurentluc...@yahoo.com
To: Mark Tolonen metolone+gm...@gmail.com; python-list@python.org
Sent: Monday, September 7, 2009 10:50:22 PM
Subject: Re: unicode + xml

The xml data is generated on Windows (python 2.6.2) and sent using a post request to a Django server. The django server is running on Ubuntu server with python 2.6.2. The post data is passed to minidom for parsing.

Laurent

----- Original Message -----
From: Mark Tolonen metolone+gm...@gmail.com
To: python-list@python.org
Sent: Monday, September 7, 2009 9:15:15 PM
Subject: Re: unicode + xml

Laurent Luce laurentluc...@yahoo.com wrote in message news:255473.44957...@web54203.mail.re2.yahoo.com...
> Hello,
>
> I am trying to do the following:
> - read list of folders in a specific directory: os.listdir()
> - some folders have Japanese characters
> - post list of folders as xml to a web server: I used content-type 'text/xml' and I use '<?xml version="1.0" encoding="utf-8"?>' to start the xml data.
> - on the server side (Django), I get the data using post_data and I use minidom.parseString() to parse it.
>
> I get an exception because of the following in the xml for one of the folder name:
>
>     '/ufffdX/ufffd^/ufffd[/ufffdg /ufffd/ufffd/ufffdj/ufffd/ufffd/ufffd['
>
> The weird thing is that I see 5 bytes for each unicode character: ie: /ufffdX
>
> Should I format the data differently inside the xml so minidom is happy?

You aren't seeing 5 bytes for each unicode character. You are seeing '\ufffd' (the code point REPLACEMENT_CHARACTER) intermixed with other characters. The wrong encoding was probably used to decode the filename byte strings to Unicode.

We can give more specific help if you specify your operating system and version of Python used.

-Mark
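Mark's diagnosis can be reproduced with a short sketch. It rests on one assumption that the thread does not confirm: that the folder was named "スタート メニュー" ("Start Menu") and that its name was held as cp932 (Shift-JIS) bytes, which is the usual ANSI code page on Japanese Windows. Decoding those bytes as UTF-8 with replacement gives the same pattern as in the post. The sketch is written for Python 3.

```python
# Assumption: the folder name and its cp932 encoding are a guess made for
# illustration; the thread never states them.
folder_name = "スタート メニュー"
raw_bytes = folder_name.encode("cp932")

# Wrong codec: every byte that is invalid in UTF-8 becomes U+FFFD, while
# bytes in the ASCII range survive as ordinary characters.
garbled = raw_bytes.decode("utf-8", "replace")

# Right codec: the original name comes back unchanged.
recovered = raw_bytes.decode("cp932")

# Show the garbled text with escapes so it prints on any console.
escaped = garbled.encode("unicode_escape").decode("ascii")
print(escaped)  # \ufffdX\ufffd^\ufffd[\ufffdg \ufffd\ufffd\ufffdj\ufffd\ufffd\ufffd[
print(recovered == folder_name)  # True
```

The stray `X`, `^`, `[`, `g` and `j` are the second bytes of the two-byte cp932 characters, which happen to fall in the ASCII range.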
unicode + xml
Hello,

I am trying to do the following:
- read list of folders in a specific directory: os.listdir()
- some folders have Japanese characters
- post list of folders as xml to a web server: I used content-type 'text/xml' and I use '<?xml version="1.0" encoding="utf-8"?>' to start the xml data.
- on the server side (Django), I get the data using post_data and I use minidom.parseString() to parse it.

I get an exception because of the following in the xml for one of the folder name:

    '/ufffdX/ufffd^/ufffd[/ufffdg /ufffd/ufffd/ufffdj/ufffd/ufffd/ufffd['

The weird thing is that I see 5 bytes for each unicode character: ie: /ufffdX

Should I format the data differently inside the xml so minidom is happy?

Laurent
Re: unicode + xml
The xml data is generated on Windows (python 2.6.2) and sent using a post request to a Django server. The django server is running on Ubuntu server with python 2.6.2. The post data is passed to minidom for parsing.

Laurent

----- Original Message -----
From: Mark Tolonen metolone+gm...@gmail.com
To: python-list@python.org
Sent: Monday, September 7, 2009 9:15:15 PM
Subject: Re: unicode + xml

Laurent Luce laurentluc...@yahoo.com wrote in message news:255473.44957...@web54203.mail.re2.yahoo.com...
> Hello,
>
> I am trying to do the following:
> - read list of folders in a specific directory: os.listdir()
> - some folders have Japanese characters
> - post list of folders as xml to a web server: I used content-type 'text/xml' and I use '<?xml version="1.0" encoding="utf-8"?>' to start the xml data.
> - on the server side (Django), I get the data using post_data and I use minidom.parseString() to parse it.
>
> I get an exception because of the following in the xml for one of the folder name:
>
>     '/ufffdX/ufffd^/ufffd[/ufffdg /ufffd/ufffd/ufffdj/ufffd/ufffd/ufffd['
>
> The weird thing is that I see 5 bytes for each unicode character: ie: /ufffdX
>
> Should I format the data differently inside the xml so minidom is happy?

You aren't seeing 5 bytes for each unicode character. You are seeing '\ufffd' (the code point REPLACEMENT_CHARACTER) intermixed with other characters. The wrong encoding was probably used to decode the filename byte strings to Unicode.

We can give more specific help if you specify your operating system and version of Python used.

-Mark
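On the receiving side, where the post data is handed to minidom, one option is to check that the body really is UTF-8 before parsing, so that an encoding problem is reported as such and not as a parser exception. This is only a sketch for Python 3: `parse_utf8_xml` is a hypothetical helper, not part of Django or minidom, and the sample documents are invented.

```python
from xml.dom import minidom


def parse_utf8_xml(body):
    """Parse a request body (bytes) that is declared to be UTF-8 XML.

    Hypothetical helper: raises ValueError if the bytes are not valid UTF-8.
    """
    try:
        body.decode("utf-8")  # strict decoding fails on any invalid byte
    except UnicodeDecodeError as exc:
        raise ValueError("request body is not valid UTF-8: %s" % exc)
    return minidom.parseString(body)


document = ('<?xml version="1.0" encoding="utf-8"?>'
            '<folders><folder>スタート</folder></folders>')
good = document.encode("utf-8")  # matches the declaration
bad = document.encode("cp932")   # declared as UTF-8 but encoded otherwise

dom = parse_utf8_xml(good)
print(dom.documentElement.tagName)

try:
    parse_utf8_xml(bad)
except ValueError as exc:
    print("rejected: %s" % exc)
```

A body that fails this check was encoded incorrectly before it was sent, which points back at the client and away from the XML layout.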
windows explorer integration
Hello,

Do you know if it is possible to write a plugin for Windows Explorer using the win32 module? The idea is to modify the way the folders/files are displayed in the Explorer window and also to provide some controls.

Laurent
strip char from list of strings
I have the following list:

    [ 'test\n', 'test2\n', 'test3\n' ]

I want to remove the '\n' from each string in place. What is the most efficient way to do that?

Regards,
Laurent
Re: strip char from list of strings
I had a simple loop stripping each string but I was looking for something concise and efficient. I like the following answer:

    x = [s.rstrip('\n') for s in x]

David Stanek wrote:
> On Mon, May 18, 2009 at 3:30 PM, Laurent Luce laurentluc...@yahoo.com wrote:
>> I have the following list:
>>
>>     [ 'test\n', 'test2\n', 'test3\n' ]
>>
>> I want to remove the '\n' from each string in place, what is the most efficient way to do that?
>
> What have you tried so far?
Re: strip char from list of strings
Thanks Casey. I like your solution.

Casey Webster wrote:
> On May 18, 3:30 pm, Laurent Luce laurentluc...@yahoo.com wrote:
>> I have the following list:
>>
>>     [ 'test\n', 'test2\n', 'test3\n' ]
>>
>> I want to remove the '\n' from each string in place, what is the most efficient way to do that?
>>
>> Regards,
>> Laurent
>
> Do you _really_ need to do this in place? If not, the simplest answer is probably:
>
>     x = ['test\n', 'test2\n', 'test3\n']
>     x = [s.rstrip('\n') for s in x]
>
> And if what you really want to do is strip off all trailing whitespace (tabs, spaces, and newlines), then:
>
>     x = [s.rstrip() for s in x]
>
> A quick test of 1,000,000 strings of length 27 took less than 0.2 seconds on my PC. Efficiency isn't really much of an issue for most data sets.
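If the list object itself has to be updated, for example because other code holds a reference to it, slice assignment is one way to keep the same comprehension. The strings are immutable, so only the list can change in place. A minimal sketch, where `alias` is just an illustrative second reference:

```python
lines = ['test\n', 'test2\n', 'test3\n']
alias = lines  # a second reference to the same list object

# Slice assignment replaces the contents of the existing list object
# and does not rebind the name to a new list.
lines[:] = [s.rstrip('\n') for s in lines]

print(alias)           # ['test', 'test2', 'test3']
print(alias is lines)  # True
```

With a plain `lines = [...]` the name would be rebound and `alias` would still point at the old, unstripped list.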