[sqlalchemy] Re: utf hex instead of utf-8 return

2008-12-07 Thread Michael Bayer

you should also be on MySQLdb 1.2.2.  Using the Unicode type in  
conjunction with charset=utf8use_unicode=0 and always passing Python  
unicode (u'') objects is the general recipe for unicode with MySQL. 
All this means is that SQLA sends utf-8-encoded strings to MySQLdb,  
MySQLdb does not try to encode them itself and makes MySQL aware the  
data should be considered as utf-8.   I'm not sure what version of  
MySQL you're on or how older versions of that might get in the way.

On Dec 6, 2008, at 1:26 PM, n00b wrote:


 thanks for the quick reply. i kept trying with it and no have reached
 the utter state of confusion.
 the specification of Unicode versus String in the table def's coupled
 with actual str representation
 has my totally confused. here's a quick script, have a look at the
 mysql table itself to see character
 display:

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-

 import os, sys
 import unicodedata

 from sqlalchemy import *
 from sqlalchemy.orm import *

 #set db
 import MySQLdb
 db = MySQLdb.connect(host='localhost', user='root', passwd='',
 db='xxx', use_unicode=True, charset='utf8')
 cur = db.cursor()
 cur.execute('SET NAMES utf8')
 cur.execute('SET CHARACTER SET utf8')
 cur.execute('SET character_set_connection=utf8')
 cur.execute('SET character_set_server=utf8')
 cur.execute('''SHOW VARIABLES LIKE 'char%'; ''')
 print cur.fetchall()

 utf_repr = '\xc3\xab'
 hex_repr = '\xeb'

 mysql_url = 'mysql://root:@localhost/xxx'
 connect_args = {'charset':'utf8', 'use_unicode':'0'}
 engine = create_engine(mysql_url, connect_args=connect_args)
 metadata = MetaData()


 test_table = Table('encoding_test', metadata,
Column(u'id', Integer, primary_key=True),
Column(u'unicode', Integer),
Column(u'u_hex', Unicode(10)),
Column(u'u_utf', Unicode(10)),
Column(u'u_str', Unicode(10)),
Column(u's_hex', String(10)),
Column(u's_utf', String(10)),
Column(u's_str', String(10))
 )

 class EncodingTest(object): pass

 mapper(EncodingTest, test_table)

 metadata.create_all(engine)
 Session = sessionmaker(bind=engine)

 session = Session()
 et = EncodingTest()
 et.unicode = 1
 et.u_str = u'ë'
 et.u_hex = u'\xeb'
 et.u_utf = u'\xc3\xab'
 et.s_str = u'ë'
 et.s_hex = u'\xeb'
 et.s_utf = u'\xc3\xab'
 session.add(et)
 session.commit()
 et = EncodingTest()
 et.unicode = 0
 et.u_str = 'ë'
 et.u_hex = '\xeb'
 et.u_utf = '\xc3\xab'
 et.s_str = 'ë'
 et.s_hex = '\xeb'
 et.s_utf = '\xc3\xab'
 session.add(et)
 session.commit()
 session.close()

 session = Session()
 results = session.query(EncodingTest).all()
 for result in results:
print result.unicode
print repr(result.u_hex), repr(result.u_utf), repr(result.u_str)
print repr(result.s_hex), repr(result.s_utf), repr(result.s_str)
print

 in addition, i don't seem to be able to run the mysql settings (# set
 db) from SA.
 any insights are greatly appreciated. btw, the use_unciode, either in
 MySQLdb or SA,
 doesn't seem to have any effect on results.

 thx

 On Dec 5, 3:25 pm, Michael Bayer [EMAIL PROTECTED] wrote:
 I'm not sure of the mechanics of what you're experiencing, but make
 sure you use charset=utf8use_unicode=0 with MySQL.

 On Dec 5, 2008, at 4:17 PM, n00b wrote:



 greetings,

 SA (0.5.0rc1) keeps returning utf hex in stead of utf-8 and in the
 process driving me batty.  all the mysql setup is fine, the chars  
 look
 good and are umlauting to goethe's delight. moreover, insert and
 select are working perfectly with the MySQLdb api on three different
 *nix systems, two servers, ... it works.

 where things fall apart is on the retrieval side of SA; inserts are
 fine (using the config_args = {'charset':'utf8'} dict in the
 create_engine call).

 for example, ë, the latin small letter e with diaeresis, is stored  
 in
 mysql hex as C3 AB; using the MySQldb client, this is exactly what i
 get back: '\xc3\xab' (in the # -*- coding: UTF-8 -*- environment) no
 further codecs work required. SA, on the other hand, hands me back  
 the
 utf-hex representation, '\xeb'.

 there must be some setting that i'm missing that'll give the
 appropriate utf-8 representation at the SA (api) level. any ideas,
 suggestions?

 thx

 yes, i could do  '\xeb'.encode('utf8) but it's not an option. we got
 too much data to deal with and MySQLdb is working perfectly well
 without the extra step. thx.
 


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: utf hex instead of utf-8 return

2008-12-07 Thread n00b

thanks!!
you just confirmed my empirical observations, which puts me very much
at ease :)
for versions, 1.2.2 mysqldb, and v 5.0.67 and 6.0.7 (alpha) mysql
(community ed.)

thank again.

On Dec 7, 8:52 am, Michael Bayer [EMAIL PROTECTED] wrote:
 you should also be on MySQLdb 1.2.2.  Using the Unicode type in  
 conjunction with charset=utf8use_unicode=0 and always passing Python  
 unicode (u'') objects is the general recipe for unicode with MySQL.    
 All this means is that SQLA sends utf-8-encoded strings to MySQLdb,  
 MySQLdb does not try to encode them itself and makes MySQL aware the  
 data should be considered as utf-8.   I'm not sure what version of  
 MySQL you're on or how older versions of that might get in the way.

 On Dec 6, 2008, at 1:26 PM, n00b wrote:



  thanks for the quick reply. i kept trying with it and no have reached
  the utter state of confusion.
  the specification of Unicode versus String in the table def's coupled
  with actual str representation
  has my totally confused. here's a quick script, have a look at the
  mysql table itself to see character
  display:

  #!/usr/bin/env python
  # -*- coding: utf-8 -*-

  import os, sys
  import unicodedata

  from sqlalchemy import *
  from sqlalchemy.orm import *

  #set db
  import MySQLdb
  db = MySQLdb.connect(host='localhost', user='root', passwd='',
  db='xxx', use_unicode=True, charset='utf8')
  cur = db.cursor()
  cur.execute('SET NAMES utf8')
  cur.execute('SET CHARACTER SET utf8')
  cur.execute('SET character_set_connection=utf8')
  cur.execute('SET character_set_server=utf8')
  cur.execute('''SHOW VARIABLES LIKE 'char%'; ''')
  print cur.fetchall()

  utf_repr = '\xc3\xab'
  hex_repr = '\xeb'

  mysql_url = 'mysql://root:@localhost/xxx'
  connect_args = {'charset':'utf8', 'use_unicode':'0'}
  engine = create_engine(mysql_url, connect_args=connect_args)
  metadata = MetaData()

  test_table = Table('encoding_test', metadata,
     Column(u'id', Integer, primary_key=True),
     Column(u'unicode', Integer),
     Column(u'u_hex', Unicode(10)),
     Column(u'u_utf', Unicode(10)),
     Column(u'u_str', Unicode(10)),
     Column(u's_hex', String(10)),
     Column(u's_utf', String(10)),
     Column(u's_str', String(10))
  )

  class EncodingTest(object): pass

  mapper(EncodingTest, test_table)

  metadata.create_all(engine)
  Session = sessionmaker(bind=engine)

  session = Session()
  et = EncodingTest()
  et.unicode = 1
  et.u_str = u'ë'
  et.u_hex = u'\xeb'
  et.u_utf = u'\xc3\xab'
  et.s_str = u'ë'
  et.s_hex = u'\xeb'
  et.s_utf = u'\xc3\xab'
  session.add(et)
  session.commit()
  et = EncodingTest()
  et.unicode = 0
  et.u_str = 'ë'
  et.u_hex = '\xeb'
  et.u_utf = '\xc3\xab'
  et.s_str = 'ë'
  et.s_hex = '\xeb'
  et.s_utf = '\xc3\xab'
  session.add(et)
  session.commit()
  session.close()

  session = Session()
  results = session.query(EncodingTest).all()
  for result in results:
     print result.unicode
     print repr(result.u_hex), repr(result.u_utf), repr(result.u_str)
     print repr(result.s_hex), repr(result.s_utf), repr(result.s_str)
     print

  in addition, i don't seem to be able to run the mysql settings (# set
  db) from SA.
  any insights are greatly appreciated. btw, the use_unciode, either in
  MySQLdb or SA,
  doesn't seem to have any effect on results.

  thx

  On Dec 5, 3:25 pm, Michael Bayer [EMAIL PROTECTED] wrote:
  I'm not sure of the mechanics of what you're experiencing, but make
  sure you use charset=utf8use_unicode=0 with MySQL.

  On Dec 5, 2008, at 4:17 PM, n00b wrote:

  greetings,

  SA (0.5.0rc1) keeps returning utf hex in stead of utf-8 and in the
  process driving me batty.  all the mysql setup is fine, the chars  
  look
  good and are umlauting to goethe's delight. moreover, insert and
  select are working perfectly with the MySQLdb api on three different
  *nix systems, two servers, ... it works.

  where things fall apart is on the retrieval side of SA; inserts are
  fine (using the config_args = {'charset':'utf8'} dict in the
  create_engine call).

  for example, ë, the latin small letter e with diaeresis, is stored  
  in
  mysql hex as C3 AB; using the MySQldb client, this is exactly what i
  get back: '\xc3\xab' (in the # -*- coding: UTF-8 -*- environment) no
  further codecs work required. SA, on the other hand, hands me back  
  the
  utf-hex representation, '\xeb'.

  there must be some setting that i'm missing that'll give the
  appropriate utf-8 representation at the SA (api) level. any ideas,
  suggestions?

  thx

  yes, i could do  '\xeb'.encode('utf8) but it's not an option. we got
  too much data to deal with and MySQLdb is working perfectly well
  without the extra step. thx.
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more 

[sqlalchemy] Re: utf hex instead of utf-8 return

2008-12-07 Thread Michael Bayer
I'm not sure if that was sarcasm or not...if so, consider the time  
better spent analyzing the issue.   The attached test illustrates a  
round trip of unicode data containing multibyte codepoints in both  
directions using both a raw cursor as well as a SQLAlchemy engine.
Use this as a guide with regards to how to send and receive unicode  
data.


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---

# -*- coding: utf-8 -*-

import os, sys

from sqlalchemy import *
from sqlalchemy.orm import *

engine = create_engine('mysql://scott:[EMAIL PROTECTED]/test?use_unicode=0charset=utf8', echo=True)

m = MetaData()
test_table = Table('encoding_test', m,
   Column('data', Unicode(255))
)
m.drop_all(engine)
m.create_all(engine)

import MySQLdb
db = MySQLdb.connect(host='localhost', user='scott', passwd='tiger', db='test', use_unicode=True, charset='utf8')

thedata = uAlors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »

# insert unicode data with MySQLdb
cursor = db.cursor()
cursor.execute(insert into encoding_test values(%s), thedata)
cursor.close()
db.commit()

# insert unicode data with SQLAlchemy
engine.execute(test_table.insert(), data=thedata)

# retrieve both rows with MySQLdb
cursor = db.cursor()
cursor.execute(select data from encoding_test)
back_from_mysql = [x[0] for x in cursor.fetchall()]

# retrieve both with SQLAlchemy
back_from_sqla = [x[0] for x in engine.execute(test_table.select()).fetchall()]

# put all the strings in a set - they are all identical and it therefore has length one
assert len(set(back_from_mysql + back_from_sqla + [thedata])) == 1

# MySQL agrees that both rows are identical since COUNT DISTINCT returns one
cursor = db.cursor()
cursor.execute(select count(distinct data) from encoding_test)
assert cursor.fetchone()[0] == 1
cursor.close()

for x in [thedata] + back_from_mysql + back_from_sqla:
print x.encode('utf-8')


On Dec 7, 2008, at 12:00 PM, n00b wrote:


 thanks!!
 you just confirmed my empirical observations, which puts me very much
 at ease :)
 for versions, 1.2.2 mysqldb, and v 5.0.67 and 6.0.7 (alpha) mysql
 (community ed.)

 thank again.

 On Dec 7, 8:52 am, Michael Bayer [EMAIL PROTECTED] wrote:
 you should also be on MySQLdb 1.2.2.  Using the Unicode type in
 conjunction with charset=utf8use_unicode=0 and always passing Python
 unicode (u'') objects is the general recipe for unicode with MySQL.
 All this means is that SQLA sends utf-8-encoded strings to MySQLdb,
 MySQLdb does not try to encode them itself and makes MySQL aware the
 data should be considered as utf-8.   I'm not sure what version of
 MySQL you're on or how older versions of that might get in the way.

 On Dec 6, 2008, at 1:26 PM, n00b wrote:



 thanks for the quick reply. i kept trying with it and no have  
 reached
 the utter state of confusion.
 the specification of Unicode versus String in the table def's  
 coupled
 with actual str representation
 has my totally confused. here's a quick script, have a look at the
 mysql table itself to see character
 display:

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-

 import os, sys
 import unicodedata

 from sqlalchemy import *
 from sqlalchemy.orm import *

 #set db
 import MySQLdb
 db = MySQLdb.connect(host='localhost', user='root', passwd='',
 db='xxx', use_unicode=True, charset='utf8')
 cur = db.cursor()
 cur.execute('SET NAMES utf8')
 cur.execute('SET CHARACTER SET utf8')
 cur.execute('SET character_set_connection=utf8')
 cur.execute('SET character_set_server=utf8')
 cur.execute('''SHOW VARIABLES LIKE 'char%'; ''')
 print cur.fetchall()

 utf_repr = '\xc3\xab'
 hex_repr = '\xeb'

 mysql_url = 'mysql://root:@localhost/xxx'
 connect_args = {'charset':'utf8', 'use_unicode':'0'}
 engine = create_engine(mysql_url, connect_args=connect_args)
 metadata = MetaData()

 test_table = Table('encoding_test', metadata,
Column(u'id', Integer, primary_key=True),
Column(u'unicode', Integer),
Column(u'u_hex', Unicode(10)),
Column(u'u_utf', Unicode(10)),
Column(u'u_str', Unicode(10)),
Column(u's_hex', String(10)),
Column(u's_utf', String(10)),
Column(u's_str', String(10))
 )

 class EncodingTest(object): pass

 mapper(EncodingTest, test_table)

 metadata.create_all(engine)
 Session = sessionmaker(bind=engine)

 session = Session()
 et = EncodingTest()
 et.unicode = 1
 et.u_str = u'ë'
 et.u_hex = u'\xeb'
 et.u_utf = u'\xc3\xab'
 et.s_str = u'ë'
 et.s_hex = u'\xeb'
 et.s_utf = u'\xc3\xab'
 session.add(et)
 session.commit()
 et = EncodingTest()
 et.unicode = 0
 

[sqlalchemy] Re: utf hex instead of utf-8 return

2008-12-06 Thread n00b

thanks for the quick reply. i kept trying with it and no have reached
the utter state of confusion.
the specification of Unicode versus String in the table def's coupled
with actual str representation
has my totally confused. here's a quick script, have a look at the
mysql table itself to see character
display:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os, sys
import unicodedata

from sqlalchemy import *
from sqlalchemy.orm import *

#set db
import MySQLdb
db = MySQLdb.connect(host='localhost', user='root', passwd='',
db='xxx', use_unicode=True, charset='utf8')
cur = db.cursor()
cur.execute('SET NAMES utf8')
cur.execute('SET CHARACTER SET utf8')
cur.execute('SET character_set_connection=utf8')
cur.execute('SET character_set_server=utf8')
cur.execute('''SHOW VARIABLES LIKE 'char%'; ''')
print cur.fetchall()

utf_repr = '\xc3\xab'
hex_repr = '\xeb'

mysql_url = 'mysql://root:@localhost/xxx'
connect_args = {'charset':'utf8', 'use_unicode':'0'}
engine = create_engine(mysql_url, connect_args=connect_args)
metadata = MetaData()


test_table = Table('encoding_test', metadata,
Column(u'id', Integer, primary_key=True),
Column(u'unicode', Integer),
Column(u'u_hex', Unicode(10)),
Column(u'u_utf', Unicode(10)),
Column(u'u_str', Unicode(10)),
Column(u's_hex', String(10)),
Column(u's_utf', String(10)),
Column(u's_str', String(10))
)

class EncodingTest(object): pass

mapper(EncodingTest, test_table)

metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
et = EncodingTest()
et.unicode = 1
et.u_str = u'ë'
et.u_hex = u'\xeb'
et.u_utf = u'\xc3\xab'
et.s_str = u'ë'
et.s_hex = u'\xeb'
et.s_utf = u'\xc3\xab'
session.add(et)
session.commit()
et = EncodingTest()
et.unicode = 0
et.u_str = 'ë'
et.u_hex = '\xeb'
et.u_utf = '\xc3\xab'
et.s_str = 'ë'
et.s_hex = '\xeb'
et.s_utf = '\xc3\xab'
session.add(et)
session.commit()
session.close()

session = Session()
results = session.query(EncodingTest).all()
for result in results:
print result.unicode
print repr(result.u_hex), repr(result.u_utf), repr(result.u_str)
print repr(result.s_hex), repr(result.s_utf), repr(result.s_str)
print

in addition, i don't seem to be able to run the mysql settings (# set
db) from SA.
any insights are greatly appreciated. btw, the use_unciode, either in
MySQLdb or SA,
doesn't seem to have any effect on results.

thx

On Dec 5, 3:25 pm, Michael Bayer [EMAIL PROTECTED] wrote:
 I'm not sure of the mechanics of what you're experiencing, but make  
 sure you use charset=utf8use_unicode=0 with MySQL.

 On Dec 5, 2008, at 4:17 PM, n00b wrote:



  greetings,

  SA (0.5.0rc1) keeps returning utf hex in stead of utf-8 and in the
  process driving me batty.  all the mysql setup is fine, the chars look
  good and are umlauting to goethe's delight. moreover, insert and
  select are working perfectly with the MySQLdb api on three different
  *nix systems, two servers, ... it works.

  where things fall apart is on the retrieval side of SA; inserts are
  fine (using the config_args = {'charset':'utf8'} dict in the
  create_engine call).

  for example, ë, the latin small letter e with diaeresis, is stored in
  mysql hex as C3 AB; using the MySQldb client, this is exactly what i
  get back: '\xc3\xab' (in the # -*- coding: UTF-8 -*- environment) no
  further codecs work required. SA, on the other hand, hands me back the
  utf-hex representation, '\xeb'.

  there must be some setting that i'm missing that'll give the
  appropriate utf-8 representation at the SA (api) level. any ideas,
  suggestions?

  thx

  yes, i could do  '\xeb'.encode('utf8) but it's not an option. we got
  too much data to deal with and MySQLdb is working perfectly well
  without the extra step. thx.
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: utf hex instead of utf-8 return

2008-12-05 Thread Michael Bayer

I'm not sure of the mechanics of what you're experiencing, but make  
sure you use charset=utf8use_unicode=0 with MySQL.

On Dec 5, 2008, at 4:17 PM, n00b wrote:


 greetings,

 SA (0.5.0rc1) keeps returning utf hex in stead of utf-8 and in the
 process driving me batty.  all the mysql setup is fine, the chars look
 good and are umlauting to goethe's delight. moreover, insert and
 select are working perfectly with the MySQLdb api on three different
 *nix systems, two servers, ... it works.

 where things fall apart is on the retrieval side of SA; inserts are
 fine (using the config_args = {'charset':'utf8'} dict in the
 create_engine call).

 for example, ë, the latin small letter e with diaeresis, is stored in
 mysql hex as C3 AB; using the MySQldb client, this is exactly what i
 get back: '\xc3\xab' (in the # -*- coding: UTF-8 -*- environment) no
 further codecs work required. SA, on the other hand, hands me back the
 utf-hex representation, '\xeb'.

 there must be some setting that i'm missing that'll give the
 appropriate utf-8 representation at the SA (api) level. any ideas,
 suggestions?

 thx

 yes, i could do  '\xeb'.encode('utf8) but it's not an option. we got
 too much data to deal with and MySQLdb is working perfectly well
 without the extra step. thx.

 


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---