Re: unicode + xml

2009-09-10 Thread Laurent Luce
It still doesn't work when posting from Python 2.6.2 on Japanese Windows to Django on Python 2.5.2.
It works fine when posting from Python 2.5.2 on Linux to Django on Python 2.5.2.

Here is the flow:

- post xml utf-8 encoded data from Windows client to Django server
- On server: pass raw_post_data to minidom.parseString() --- throws exception

Here is the code I use to post data:

url = mysite
req = urllib2.Request(url)
req.add_header('Content-Type', 'text/xml; charset=utf-8')
opener.open(req, data.encode('utf-8'))

data is the xml data
opener is a urllib2 opener I create when user logs in.

Here is the code I use to receive the data:

dom = minidom.parseString(request.raw_post_data)

The default charset on the Django side is utf-8.

Please advise. Thanks.

Laurent
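
One way to narrow down a failure like this is to check on the server whether the posted bytes really are valid UTF-8 before handing them to minidom. What follows is only a sketch under assumptions: it is written for Python 3 rather than the 2.5/2.6 interpreters used in this thread, parse_utf8_xml is a hypothetical helper name, and raw stands in for the body that Django exposes as request.raw_post_data.

```python
from xml.dom import minidom


def parse_utf8_xml(raw):
    """Parse XML bytes, failing early if they are not valid UTF-8.

    `raw` is a hypothetical stand-in for the raw POST body.
    """
    try:
        raw.decode('utf-8')  # strict check only; the result is discarded
    except UnicodeDecodeError as exc:
        # Report where the first invalid byte sits and what follows it.
        raise ValueError('body is not UTF-8 at byte %d: %r'
                         % (exc.start, raw[exc.start:exc.start + 4]))
    return minidom.parseString(raw)


# A document whose bytes match its declared encoding parses fine.
xml_text = (u'<?xml version="1.0" encoding="utf-8"?>'
            u'<folders><folder>\u30c6\u30b9\u30c8</folder></folders>')
dom = parse_utf8_xml(xml_text.encode('utf-8'))
folder_name = dom.getElementsByTagName('folder')[0].firstChild.data

# The same text encoded as cp932 (a common Japanese Windows code page)
# but still declared as UTF-8 is rejected before minidom sees it.
try:
    parse_utf8_xml(xml_text.encode('cp932'))
    rejected = False
except ValueError:
    rejected = True
```

If the check fails, the byte offset in the error message points at the first byte that was not encoded as UTF-8, which helps tell a client-side encoding problem from a server-side one.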



----- Original Message -----
From: Stefan Behnel stefan...@behnel.de
To: python-list@python.org
Sent: Monday, September 7, 2009 11:50:28 PM
Subject: Re: unicode + xml

Laurent Luce wrote:
> Can someone confirm that the issue here is I need to encode the xml data 
> using:
> # encode as UTF-8
> utf8_string = xml.encode( 'utf-8' )
> and then post it to the server.

Well, since you declared it to be UTF-8, it must be UTF-8 encoded.

However, your question seems to imply that you generate the XML manually
using string concatenation, which is a rather bad idea. Python has great
XML tools like ElementTree that help in generating and serialising XML
correctly (besides parsing, searching and other things).

Stefan
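
As a rough illustration of the ElementTree suggestion above, here is a minimal sketch that builds the folder list with the library and serialises it to UTF-8 bytes. It assumes Python 3; the element names folders and folder and the sample folder names are hypothetical, not taken from the original poster's code.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical folder names; the second one contains Japanese characters
# and an ampersand that has to be escaped in XML.
folder_names = [u'docs', u'\u30c6\u30b9\u30c8 & more']

root = ET.Element('folders')
for name in folder_names:
    ET.SubElement(root, 'folder').text = name

# Serialise to UTF-8 encoded bytes, ready to be used as a POST body.
payload = ET.tostring(root, encoding='utf-8')

# Round trip through minidom, as the server in this thread does.
dom = minidom.parseString(payload)
parsed = [node.firstChild.data for node in dom.getElementsByTagName('folder')]
```

The library takes care of both the escaping and the encoding, so the text that comes out of the parser matches the text that went in.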
-- 
http://mail.python.org/mailman/listinfo/python-list



Re: unicode + xml

2009-09-08 Thread Laurent Luce
Can someone confirm that the issue here is I need to encode the xml data using:
# encode as UTF-8
utf8_string = xml.encode( 'utf-8' )
and then post it to the server.

Laurent



----- Original Message -----
From: Laurent Luce laurentluc...@yahoo.com
To: Mark Tolonen metolone+gm...@gmail.com; python-list@python.org
Sent: Monday, September 7, 2009 10:50:22 PM
Subject: Re: unicode + xml

The xml data is generated on Windows (python 2.6.2) and sent using a post 
request to a Django server. The django server is running on Ubuntu server with 
python 2.6.2. The post data is passed to minidom for parsing.

Laurent



----- Original Message -----
From: Mark Tolonen metolone+gm...@gmail.com
To: python-list@python.org
Sent: Monday, September 7, 2009 9:15:15 PM
Subject: Re: unicode + xml


Laurent Luce laurentluc...@yahoo.com wrote in message 
news:255473.44957...@web54203.mail.re2.yahoo.com...
> Hello,
> 
> I am trying to do the following:
> 
> - read list of folders in a specific directory: os.listdir() - some folders 
> have Japanese characters
> - post list of folders as xml to a web server: I used content-type 'text/xml' 
> and I use '<?xml version="1.0" encoding="utf-8"?>' to start the xml data.
> - on the server side (Django), I get the data using post_data and I use 
> minidom.parseString() to parse it. I get an exception because of the 
> following in the xml for one of the folder names:
> '/ufffdX/ufffd^/ufffd[/ufffdg /ufffd/ufffd/ufffdj/ufffd/ufffd/ufffd['
> 
> The weird thing is that I see 5 bytes for each unicode character, i.e. /ufffdX
> 
> Should I format the data differently inside the xml so minidom is happy?

You aren't seeing 5 bytes for each unicode character.  You are seeing '\ufffd' 
(the code point REPLACEMENT_CHARACTER) intermixed with other characters.  The 
wrong encoding was probably used to decode the filename byte strings to Unicode.

We can give more specific help if you specify your operating system and version 
of Python used.

-Mark
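
To make the point about the replacement character concrete, here is a small sketch of how a wrongly decoded file name turns into that pattern. It assumes Python 3 and assumes the file names were cp932 byte strings, as is common on Japanese Windows; the folder name used is hypothetical, chosen only because it yields the same sequence of characters as the string quoted above.

```python
# Hypothetical folder name; it is one string that yields the pattern
# quoted above, not necessarily the one on the poster's disk.
name = u'\u30b9\u30bf\u30fc\u30c8 \u30e1\u30cb\u30e5\u30fc'

# Encode it the way a Japanese Windows code page (cp932) would store it.
raw = name.encode('cp932')

# Decoding with the wrong codec: each byte that is invalid in UTF-8 is
# replaced by U+FFFD, while ASCII-range trail bytes survive as ordinary
# characters such as 'X', '^', '[' and 'g'.
garbled = raw.decode('utf-8', 'replace')

# Decoding with the codec that produced the bytes recovers the name.
recovered = raw.decode('cp932')
```

Each U+FFFD is a single code point standing in for one undecodable byte, so the apparent "5 bytes" are one replacement character followed by one surviving ASCII character.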




unicode + xml

2009-09-07 Thread Laurent Luce
Hello,

I am trying to do the following:

- read list of folders in a specific directory: os.listdir() - some folders 
have Japanese characters
- post list of folders as xml to a web server: I used content-type 'text/xml' 
and I use '<?xml version="1.0" encoding="utf-8"?>' to start the xml data.
- on the server side (Django), I get the data using post_data and I use 
minidom.parseString() to parse it. I get an exception because of the following 
in the xml for one of the folder names:
'/ufffdX/ufffd^/ufffd[/ufffdg /ufffd/ufffd/ufffdj/ufffd/ufffd/ufffd['

The weird thing is that I see 5 bytes for each unicode character, i.e. /ufffdX

Should I format the data differently inside the xml so minidom is happy?

Laurent



Re: unicode + xml

2009-09-07 Thread Laurent Luce
The xml data is generated on Windows (python 2.6.2) and sent using a post 
request to a Django server. The django server is running on Ubuntu server with 
python 2.6.2. The post data is passed to minidom for parsing.

Laurent



----- Original Message -----
From: Mark Tolonen metolone+gm...@gmail.com
To: python-list@python.org
Sent: Monday, September 7, 2009 9:15:15 PM
Subject: Re: unicode + xml


Laurent Luce laurentluc...@yahoo.com wrote in message 
news:255473.44957...@web54203.mail.re2.yahoo.com...
> Hello,
> 
> I am trying to do the following:
> 
> - read list of folders in a specific directory: os.listdir() - some folders 
> have Japanese characters
> - post list of folders as xml to a web server: I used content-type 'text/xml' 
> and I use '<?xml version="1.0" encoding="utf-8"?>' to start the xml data.
> - on the server side (Django), I get the data using post_data and I use 
> minidom.parseString() to parse it. I get an exception because of the 
> following in the xml for one of the folder names:
> '/ufffdX/ufffd^/ufffd[/ufffdg /ufffd/ufffd/ufffdj/ufffd/ufffd/ufffd['
> 
> The weird thing is that I see 5 bytes for each unicode character, i.e. /ufffdX
> 
> Should I format the data differently inside the xml so minidom is happy?

You aren't seeing 5 bytes for each unicode character.  You are seeing '\ufffd' 
(the code point REPLACEMENT_CHARACTER) intermixed with other characters.  The 
wrong encoding was probably used to decode the filename byte strings to Unicode.

We can give more specific help if you specify your operating system and version 
of Python used.

-Mark




windows explorer integration

2009-07-10 Thread Laurent Luce

Hello,

Do you know if it is possible to write a plugin for Windows Explorer using 
the win32 module?

The idea is to modify the way the folders/files are displayed in the Explorer 
window and also to provide some controls.

Laurent



strip char from list of strings

2009-05-19 Thread Laurent Luce

I have the following list:

['test\n', 'test2\n', 'test3\n']

I want to remove the '\n' from each string in place. What is the most efficient 
way to do that?

Regards,

Laurent


Re: strip char from list of strings

2009-05-19 Thread Laurent Luce
I had a simple loop stripping each string but I was looking for
something concise and efficient. I like the following answer:
x = [s.rstrip('\n') for s in x]
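
Note that the comprehension above rebinds x to a new list. If the original list object really has to be modified in place, two options are slice assignment and an index-based loop. The following is a minimal sketch; alias and y are hypothetical names used only to show that the same list object is updated.

```python
x = ['test\n', 'test2\n', 'test3\n']
alias = x  # a second reference to the same list object

# Slice assignment replaces the contents of the existing list, so every
# reference to it sees the stripped strings.
x[:] = [s.rstrip('\n') for s in x]

# An explicit loop over indexes is the other in-place option.
y = ['a\n', 'b\n']
for i, s in enumerate(y):
    y[i] = s.rstrip('\n')
```

The strings themselves are immutable, so "in place" here can only mean that the list keeps its identity while its items are replaced.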

David Stanek wrote:
> On Mon, May 18, 2009 at 3:30 PM, Laurent Luce laurentluc...@yahoo.com wrote:
>> I have the following list:
>>
>> ['test\n', 'test2\n', 'test3\n']
>>
>> I want to remove the '\n' from each string in place. What is the most 
>> efficient way to do that?
>
> What have you tried so far?


Re: strip char from list of strings

2009-05-19 Thread Laurent Luce
Thanks Casey. I like your solution.

Casey Webster wrote:
> On May 18, 3:30 pm, Laurent Luce laurentluc...@yahoo.com wrote:
>> I have the following list:
>>
>> ['test\n', 'test2\n', 'test3\n']
>>
>> I want to remove the '\n' from each string in place. What is the most 
>> efficient way to do that?
>>
>> Regards,
>>
>> Laurent
>
> Do you _really_ need to do this in place?  If not, the simplest answer
> is probably:
>
> x = ['test\n', 'test2\n', 'test3\n']
> x = [s.rstrip('\n') for s in x]
>
> And if what you really want to do is strip off all trailing whitespace
> (tabs, spaces, and newlines), then:
>
> x = [s.rstrip() for s in x]
>
> A quick test of 1,000,000 strings of length 27 took less than 0.2
> seconds on my PC.  Efficiency isn't really much of an issue for most
> data sets.
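
A timing like the one mentioned above can be reproduced with the standard timeit module. This is only a sketch: the sample string and the helper name strip_all are assumptions, the original test data is not known, and the measured time depends on the machine and Python version.

```python
import timeit

# One million strings of length 27 (26 letters plus a newline),
# mirroring the size mentioned above. The list repeats one string
# object, which is enough for a rough timing.
data = ['abcdefghijklmnopqrstuvwxyz\n'] * 1000000


def strip_all(strings):
    """Return a new list with the trailing newline removed from each item."""
    return [s.rstrip('\n') for s in strings]


# Time a single pass over the whole list.
elapsed = timeit.timeit(lambda: strip_all(data), number=1)
print('stripped %d strings in %.3f s' % (len(data), elapsed))
```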