[sqlalchemy] Re: utf hex instead of utf-8 return
You should also be on MySQLdb 1.2.2. Using the Unicode type in conjunction with charset=utf8&use_unicode=0 and always passing Python unicode (u'') objects is the general recipe for unicode with MySQL. All this means is that SQLA sends utf-8-encoded strings to MySQLdb, and MySQLdb does not try to encode them itself and makes MySQL aware that the data should be treated as utf-8. I'm not sure what version of MySQL you're on or how older versions of it might get in the way.

On Dec 6, 2008, at 1:26 PM, n00b wrote:

> thanks for the quick reply. i kept trying with it and now have reached the utter state of confusion. the specification of Unicode versus String in the table defs coupled with actual str representation has me totally confused.

You received this message because you are subscribed to the Google Groups sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en
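The two driver options in the recipe are separate query parameters on the connection URL, joined by "&". As a minimal sketch using only the standard library, the query string can be built instead of typed by hand. The user, password, host and database name below are placeholders and do not come from the thread.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, as used in this thread

# Placeholder connection details (assumptions); substitute your own.
base_url = 'mysql://user:password@localhost/dbname'

# A list of pairs keeps the parameter order deterministic.
driver_options = [('charset', 'utf8'), ('use_unicode', '0')]

query = urlencode(driver_options)
mysql_url = base_url + '?' + query
print(mysql_url)
```

The resulting string has the same shape as the URL passed to create_engine() elsewhere in this thread.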
[sqlalchemy] Re: utf hex instead of utf-8 return
thanks!! you just confirmed my empirical observations, which puts me very much at ease :) for versions: 1.2.2 mysqldb, and v 5.0.67 and 6.0.7 (alpha) mysql (community ed.) thanks again.

On Dec 7, 8:52 am, Michael Bayer [EMAIL PROTECTED] wrote:

> you should also be on MySQLdb 1.2.2. Using the Unicode type in conjunction with charset=utf8&use_unicode=0 and always passing Python unicode (u'') objects is the general recipe for unicode with MySQL.
[sqlalchemy] Re: utf hex instead of utf-8 return
I'm not sure if that was sarcasm or not... if so, consider the time better spent analyzing the issue. The attached test illustrates a round trip of unicode data containing multibyte codepoints in both directions, using both a raw cursor and a SQLAlchemy engine. Use this as a guide for how to send and receive unicode data.

    # -*- coding: utf-8 -*-
    import os, sys

    from sqlalchemy import *
    from sqlalchemy.orm import *

    engine = create_engine('mysql://scott:[EMAIL PROTECTED]/test?use_unicode=0&charset=utf8', echo=True)

    m = MetaData()
    test_table = Table('encoding_test', m,
        Column('data', Unicode(255))
    )
    m.drop_all(engine)
    m.create_all(engine)

    import MySQLdb
    db = MySQLdb.connect(host='localhost', user='scott', passwd='tiger', db='test',
                         use_unicode=True, charset='utf8')

    thedata = u"""Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »"""

    # insert unicode data with MySQLdb
    cursor = db.cursor()
    cursor.execute("insert into encoding_test values(%s)", thedata)
    cursor.close()
    db.commit()

    # insert unicode data with SQLAlchemy
    engine.execute(test_table.insert(), data=thedata)

    # retrieve both rows with MySQLdb
    cursor = db.cursor()
    cursor.execute("select data from encoding_test")
    back_from_mysql = [x[0] for x in cursor.fetchall()]

    # retrieve both with SQLAlchemy
    back_from_sqla = [x[0] for x in engine.execute(test_table.select()).fetchall()]

    # put all the strings in a set - they are all identical and it therefore has length one
    assert len(set(back_from_mysql + back_from_sqla + [thedata])) == 1

    # MySQL agrees that both rows are identical since COUNT DISTINCT returns one
    cursor = db.cursor()
    cursor.execute("select count(distinct data) from encoding_test")
    assert cursor.fetchone()[0] == 1
    cursor.close()

    for x in [thedata] + back_from_mysql + back_from_sqla:
        print x.encode('utf-8')

On Dec 7, 2008, at 12:00 PM, n00b wrote:

> thanks!! you just confirmed my empirical observations, which puts me very much at ease :) for versions: 1.2.2 mysqldb, and v 5.0.67 and 6.0.7 (alpha) mysql (community ed.) thanks again.
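The round trip described in this thread can be imitated without a database. The sketch below only illustrates the idea of encoding on the way in and decoding on the way out. The helper names to_driver and from_driver are hypothetical and are not SQLAlchemy or MySQLdb functions. The u'' and b'' literals are spelled out so the snippet runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-

def to_driver(value, encoding='utf-8'):
    """Hypothetical helper: encode a unicode object into bytes for the driver."""
    return value.encode(encoding)


def from_driver(raw, encoding='utf-8'):
    """Hypothetical helper: decode bytes coming back from the driver."""
    return raw.decode(encoding)


original = u'\xeb'              # LATIN SMALL LETTER E WITH DIAERESIS
wire = to_driver(original)      # the two bytes C3 AB
restored = from_driver(wire)    # a one-character unicode object again
```

The application only ever handles the one-character unicode object. The two-byte form exists only between the two helpers, which stand in for the boundary with the driver.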
[sqlalchemy] Re: utf hex instead of utf-8 return
thanks for the quick reply. i kept trying with it and now have reached the utter state of confusion. the specification of Unicode versus String in the table defs coupled with actual str representation has me totally confused. here's a quick script; have a look at the mysql table itself to see character display:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import os, sys
    import unicodedata

    from sqlalchemy import *
    from sqlalchemy.orm import *

    # set db
    import MySQLdb
    db = MySQLdb.connect(host='localhost', user='root', passwd='', db='xxx',
                         use_unicode=True, charset='utf8')
    cur = db.cursor()
    cur.execute('SET NAMES utf8')
    cur.execute('SET CHARACTER SET utf8')
    cur.execute('SET character_set_connection=utf8')
    cur.execute('SET character_set_server=utf8')
    cur.execute('''SHOW VARIABLES LIKE 'char%'; ''')
    print cur.fetchall()

    utf_repr = '\xc3\xab'
    hex_repr = '\xeb'

    mysql_url = 'mysql://root:@localhost/xxx'
    connect_args = {'charset':'utf8', 'use_unicode':'0'}
    engine = create_engine(mysql_url, connect_args=connect_args)
    metadata = MetaData()

    test_table = Table('encoding_test', metadata,
        Column(u'id', Integer, primary_key=True),
        Column(u'unicode', Integer),
        Column(u'u_hex', Unicode(10)),
        Column(u'u_utf', Unicode(10)),
        Column(u'u_str', Unicode(10)),
        Column(u's_hex', String(10)),
        Column(u's_utf', String(10)),
        Column(u's_str', String(10))
    )

    class EncodingTest(object):
        pass

    mapper(EncodingTest, test_table)
    metadata.create_all(engine)

    Session = sessionmaker(bind=engine)
    session = Session()

    # first row: every value passed as a unicode object
    et = EncodingTest()
    et.unicode = 1
    et.u_str = u'ë'
    et.u_hex = u'\xeb'
    et.u_utf = u'\xc3\xab'
    et.s_str = u'ë'
    et.s_hex = u'\xeb'
    et.s_utf = u'\xc3\xab'
    session.add(et)
    session.commit()

    # second row: every value passed as a plain byte string
    et = EncodingTest()
    et.unicode = 0
    et.u_str = 'ë'
    et.u_hex = '\xeb'
    et.u_utf = '\xc3\xab'
    et.s_str = 'ë'
    et.s_hex = '\xeb'
    et.s_utf = '\xc3\xab'
    session.add(et)
    session.commit()
    session.close()

    session = Session()
    results = session.query(EncodingTest).all()
    for result in results:
        print result.unicode
        print repr(result.u_hex), repr(result.u_utf), repr(result.u_str)
        print repr(result.s_hex), repr(result.s_utf), repr(result.s_str)
        print

in addition, i don't seem to be able to run the mysql settings (# set db) from SA. any insights are greatly appreciated. btw, use_unicode, either in MySQLdb or SA, doesn't seem to have any effect on results. thx

On Dec 5, 3:25 pm, Michael Bayer [EMAIL PROTECTED] wrote:

> I'm not sure of the mechanics of what you're experiencing, but make sure you use charset=utf8&use_unicode=0 with MySQL.
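One detail in the script is easy to miss. Inside a u'' literal, \xNN names a code point rather than a byte, so u'\xc3\xab' is not "ë" but the two characters "Ã" and "«". The sketch below uses only built-in string methods and the standard unicodedata module to show this. The variable names are illustrative and do not come from the thread.

```python
# -*- coding: utf-8 -*-
import unicodedata

as_code_point = u'\xeb'      # one character: ë
as_two_chars = u'\xc3\xab'   # two characters: Ã and «

# Name each character of the two-character string.
names = [unicodedata.name(ch) for ch in as_two_chars]

# Encoding the two-character string yields four bytes, not two.
double_encoded = as_two_chars.encode('utf-8')

# Reinterpreting the code points as Latin-1 bytes and decoding them as
# UTF-8 recovers the intended single character.
repaired = as_two_chars.encode('latin-1').decode('utf-8')
```

This would suggest that the u_utf and s_utf values in the first row are stored as two characters, while u_hex and u_str carry the single intended character.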
[sqlalchemy] Re: utf hex instead of utf-8 return
I'm not sure of the mechanics of what you're experiencing, but make sure you use charset=utf8&use_unicode=0 with MySQL.

On Dec 5, 2008, at 4:17 PM, n00b wrote:

> greetings,
>
> SA (0.5.0rc1) keeps returning utf hex instead of utf-8 and in the process driving me batty. all the mysql setup is fine, the chars look good and are umlauting to goethe's delight. moreover, insert and select are working perfectly with the MySQLdb api on three different *nix systems, two servers, ... it works.
>
> where things fall apart is on the retrieval side of SA; inserts are fine (using the config_args = {'charset':'utf8'} dict in the create_engine call).
>
> for example, ë, the latin small letter e with diaeresis, is stored in mysql hex as C3 AB; using the MySQLdb client, this is exactly what i get back: '\xc3\xab' (in the # -*- coding: UTF-8 -*- environment), no further codecs work required. SA, on the other hand, hands me back the utf-hex representation, '\xeb'.
>
> there must be some setting that i'm missing that'll give the appropriate utf-8 representation at the SA (api) level. any ideas, suggestions?
>
> thx
>
> yes, i could do '\xeb'.encode('utf8') but it's not an option. we got too much data to deal with and MySQLdb is working perfectly well without the extra step. thx.
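One plausible reading of the '\xeb' versus '\xc3\xab' contrast in the question is that they are the same character seen at two levels: the code point U+00EB inside a unicode object, and its two-byte UTF-8 encoding. The sketch below is only an illustration of that relationship, written to run on both Python 2 and Python 3. It does not reproduce SQLAlchemy's behaviour.

```python
# -*- coding: utf-8 -*-
char = u'\xeb'  # the single character ë as a unicode object

# The code point, i.e. what the escape in the unicode literal refers to.
code_point = 'U+%04X' % ord(char)

# The UTF-8 bytes, i.e. what MySQL stores for a utf8 column.
utf8_bytes = char.encode('utf-8')
utf8_hex = ' '.join('%02X' % b for b in bytearray(utf8_bytes))

print(code_point)
print(utf8_hex)
```

If byte strings are what the application needs, encoding once at the boundary is the step the question refers to as '\xeb'.encode('utf8').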