Graham,

Thanks, I didn't realize I had actually sent this one to the python-dev list by mistake. I will go ahead and get it over to the right list now.

I am also running 3.2.9 with two minor modifications (MySQL Sessions, and a try: except: around a split in util.py to fix a bug with Flash 8 and file uploads, which we must support). Additionally we're using mod_python.psp as the handler.

Mahalo,
earle.


On Aug 11, 2006, at 1:05 PM, Graham Dumpleton wrote:

For future reference, a general question like this is better posted to the mod_python user mailing list than to the developer mailing list, as it isn't related to internal development of mod_python. There are also a lot more people on the user mailing list, with much more diverse knowledge, so you might get a quicker or better answer there.

Anyway, let's see if anyone comes up with anything on the developer
mailing list, but if you don't get an answer in a day or so, you might instead
post it to the more general user mailing list.

The user mailing list is the one mentioned on the mod_python home page.

BTW, changing the default character encoding in Python's site.py is, I believe, not generally seen as a good idea. It would also help in future if you specified which version of mod_python you are using and, in the case of PSP, whether you are triggering PSP directly with mod_python.psp as the handler or manually using PSP objects from a mod_python.publisher handler.
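
A minimal sketch of the alternative to editing site.py: encode text explicitly as UTF-8 at the point where it is written to the response. The names `to_utf8` and `FakeRequest` are hypothetical and used only for illustration; `FakeRequest` merely stands in for mod_python's request object so the snippet runs outside Apache, and nothing here is a claim about how mod_python handles unicode internally.

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_utf8(value):
    """Return UTF-8 bytes for a text value; pass byte strings through unchanged."""
    if isinstance(value, text_type):
        return value.encode("utf-8")
    return value


class FakeRequest(object):
    """Hypothetical stand-in for the request object; it only collects writes."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


req = FakeRequest()
# Encode at the output boundary instead of relying on the default encoding.
req.write(to_utf8(u"\u9577\u5ca1"))
body = b"".join(req.chunks)
```

With this approach the bytes sent to the client no longer depend on what sys.getdefaultencoding() returns in the Apache process.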

Except for those comments, I am not a Unicode person, so I don't know the ins
and outs of using Unicode with mod_python.

Graham

On 12/08/2006, at 8:33 AM, Earle Ady wrote:

Aloha!

I've done some searching online regarding character encoding and UTF-8 support within mod_python, but haven't been able to get the proper functionality out of mod_python.

Here's the situation: I have changed my site.py in Python 2.4.3 to use "utf-8" as the default encoding. I have a database with correct unicode representations in it. I execute routines from the interpreter and get correct unicode objects out of the database. When I run these exact routines from inside of a PSP page, the unicode object has been latin1 decoded. Please note from the examples below that I am using identical MySQLdb connection settings in both cases.

I am still a bit unclear as to where exactly this is happening inside of mod_python, and any advice toward a solution would be greatly appreciated. It's pretty critical that a developer can provide UTF-8 support in order for mod_python to gain traction in enterprise applications.

If this is a user error on my part, I'd greatly appreciate being pointed to a proper solution.

Best,
earle.

------  THIS WORKS FROM WITHIN THE INTERPRETER:
(conn, cursor) = util.DBConnect(MySQLdb.cursors.DictCursor)

cursor.execute("SELECT * from unicode_test")
items = cursor.fetchall()

for item in items:
        print item

# RESULTS:  correct unicode:
# ([EMAIL PROTECTED] 14:55 266) python utest.py
# {'data': u'\u9577\u5ca1', 'id': 35L}
# {'data': u'\u9577\u5ca1', 'id': 36L}


------- THIS DOES NOT WORK FROM .PSP; it produces a latin1-decoded version of the correct unicode object (see below):

<%
req.content_type = 'text/html;charset=UTF-8;';
%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<%@ include file="include/webglobals.psps" %>
<%
(conn, cursor) = util.DBConnect(MySQLdb.cursors.DictCursor)
req.write("MYSQL CONNECTION CHARSET: ")
req.write(conn.character_set_name())
req.write("<p/>")
req.write("SYS.DEFAULTENCODING: ")
req.write(sys.getdefaultencoding())
req.write("<p/>")

res = cursor.execute("SELECT * from unicode_test")
items = cursor.fetchall()

for i in items:
        #
        req.write("DATA: ")
        req.write(i['data'])
        req.write(", item: ")
%>
<%= i %>
<%
        req.write(",  BYTES: ")
        req.write(i['data'].encode('unicode_escape'))

        req.write("<p/>")
        #
# end: items

req.write("SHOULD LOOK LIKE THIS: %s" % ( u'\u9577\u5ca1', ))
%>
</body>
</html>

---- RESULTS:

MYSQL CONNECTION CHARSET: utf8

SYS.DEFAULTENCODING: utf-8

DATA: 長岡, item: {'data': u'\xe9\x95\xb7\xe5\xb2\xa1', 'id': 35L} , BYTES: \xe9\x95\xb7\xe5\xb2\xa1

DATA: 長岡, item: {'data': u'\xe9\x95\xb7\xe5\xb2\xa1', 'id': 36L} , BYTES: \xe9\x95\xb7\xe5\xb2\xa1

SHOULD LOOK LIKE THIS: 長岡


------- Notice that if I latin1-decode the -correct- unicode object, I get the exact
unicode object that is appearing inside of the PSP:

>>> u'\u9577\u5ca1'.decode('latin1')
u'\xe9\x95\xb7\xe5\xb2\xa1'
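
The same relationship can be checked without relying on the modified default encoding. In Python 2, calling .decode() on a unicode object first encodes it implicitly with the default encoding, so the session above works only because site.py sets that to utf-8. The sketch below spells out both steps and also shows how the damage could be reversed, assuming the string really is UTF-8 bytes misread as latin1. The function names are hypothetical.

```python
def misread_utf8_as_latin1(text):
    """Encode text as UTF-8, then decode those bytes as latin1 (the suspected bug)."""
    return text.encode("utf-8").decode("latin-1")


def repair_latin1_mojibake(text):
    """Reverse the misread: recover the original bytes, then decode them as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


correct = u"\u9577\u5ca1"
broken = misread_utf8_as_latin1(correct)   # six code points, one per UTF-8 byte
repaired = repair_latin1_mojibake(broken)  # back to the two original characters
```

If `broken` matches what the PSP page receives, the UTF-8 bytes are probably being decoded as latin1 somewhere between the database driver and the page, rather than in the output step.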







