Re: [sqlalchemy] Char encoding..

2010-11-30 Thread Michael Bayer
Simple enough Blob is a binary type, not a character type.   You need to 
put binary data in there, so in python 2x that's str, ie encode it.   Direct 
Unicode support in sqla is via the string/Unicode types.

Sent from my iPhone

On Nov 29, 2010, at 8:43 PM, Warwick Prince warwi...@mushroomsys.com wrote:

 Hi Michael
 
 Thanks for your thoughts and comments to date.
 
 I can replicate the problem with ease, so perhaps this will help;
 
 # -*- coding: utf-8 -*-
 e = 
 create_engine('mysql+mysqlconnector://user:passw...@127.0.0.1/testdb?use_unicode=0',
  encoding='utf8', echo=False)
 m = MetaData(e)
 t = Table('test_table', m, autoload=True)
 #test_table is;
 Table('test_table', 
 MetaData(Engine(mysql+mysqlconnector://user:passw...@127.0.0.1/testdb?use_unicode=0)),
  Column(u'ID', INTEGER(display_width=11), table=test_table, 
 primary_key=True, nullable=False), Column(u'SourceType', VARCHAR(length=10), 
 table=test_table), Column(u'SourceID', VARCHAR(length=128), 
 table=test_table), Column(u'Date', DATE(), table=test_table), 
 Column(u'Time', TIME(timezone=False), table=test_table), Column(u'UserID', 
 VARCHAR(length=10), table=test_table), Column(u'Note', BLOB(length=None), 
 table=test_table), Column(u'Division', VARCHAR(length=3), 
 table=test_table), schema=None)
 
 # Set some row  data in a dict
 columns = dict(ID=1, SourceType='TEST', SourceID='WAP', Note=u'Aligot\xe9')  
 # The Note column is set to a unicode value for a French word with accents. 
 Column type is BLOB
 
 # insert it
 t.insert(values=columns).execute()
 
 get this;
 Traceback (most recent call last):
  File interactive input, line 1, in module
  File C:\Python26\lib\site-packages\sqlalchemy\sql\expression.py, line 
 1217, in execute
return e._execute_clauseelement(self, multiparams, params)
  File C:\Python26\lib\site-packages\sqlalchemy\engine\base.py, line 1722, 
 in _execute_clauseelement
return connection._execute_clauseelement(elem, multiparams, params)
  File C:\Python26\lib\site-packages\sqlalchemy\engine\base.py, line 1235, 
 in _execute_clauseelement
parameters=params
  File C:\Python26\lib\site-packages\sqlalchemy\engine\base.py, line 1343, 
 in __create_execution_context
connection=self, **kwargs)
  File C:\Python26\lib\site-packages\sqlalchemy\engine\default.py, line 384, 
 in __init__
self.parameters = self.__convert_compiled_params(self.compiled_parameters)
  File C:\Python26\lib\site-packages\sqlalchemy\engine\default.py, line 513, 
 in __convert_compiled_params
param[key] = processors[key](compiled_params[key])
  File C:\Python26\lib\site-packages\sqlalchemy\types.py, line 1209, in 
 process
return DBAPIBinary(value)
 UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 
 6: ordinal not in range(128)
 
 It appears to be in the processing of the Binary type that something is going 
 wrong. 
 
 Further testing showed something interesting.  I changed around the data 
 above and set the unicode value to the VARCHAR column SourceID.  That 
 worked..  Therefore, the issue is related to storing a unicode value into a 
 BLOB.  Surely I can store anything in a BLOB, or am I missing something?
 
 Cheers
 Warwick
 
 Warwick Prince 
 Managing Director 
 mobile: +61 411 026 992 
 skype: warwickprince  
 
 phone: +61 7 3102 3730 
 fax:  +61 7 3319 6734 
 web: www.mushroomsys.com 
 
 On 30/11/2010, at 1:29 AM, Michael Bayer wrote:
 
 we've got unicode round trips down very well for years now with plenty of 
 tests, so would need a specific series of steps to reproduce what you're 
 doing here.  Note that the recommended connect string for MySQL + Mysqldb 
 looks like mysql://scott:ti...@localhost/test?charset=utf8use_unicode=0 .
 
 On Nov 29, 2010, at 2:37 AM, Warwick Prince wrote:
 
 Hi All
 
 I thought I had Character Encoding licked, but I've hit something I can't 
 work through.  Any help appreciated.
 
 I have a legacy non SQL database that I read legacy data from (using cool 
 Python code that emulates the old ISDB binary comms) and it reads a str 
 which has Foreign language chars in it.  (French for example).  
 
 So, firstly, I have   myStr = ''Aligot\xc3\xa9  which when printed is 
 Aligoté.   So far so good.
 
 I then convert that to unicode by  myUnicode = unicode(myStr, 'utf-8', 
 errors='ignore') and get u'Aligot\xe9'.   This printed is also Aligoté, 
 therefore all is good.
 
 I have a MySQL database, InnoDB table, charset utf-8.
 
 I set up my values in a dict called setValues with all the columns and 
 their respective unicode'd values ready to go
 
 I then do a table.insert(values=setValues).execute() and get this error.
 
 Traceback (most recent call last):
 File C:\Documents and Settings\wprince\Desktop\PY CODE 
 DEVELOPMENT\CESyncSQL\TEST_Sync.py, line 148, in SYNC_IT
  SyncFunction(ceDB, session, meta)
 File C:\Documents and Settings\wprince\Desktop\PY CODE 
 DEVELOPMENT\CESyncSQL\TEST_Sync.py, line 840, in SYNC_VarietiesOUT
  

Re: [sqlalchemy] Char encoding..

2010-11-29 Thread Michael Bayer
we've got unicode round trips down very well for years now with plenty of 
tests, so would need a specific series of steps to reproduce what you're doing 
here.  Note that the recommended connect string for MySQL + Mysqldb looks like 
mysql://scott:ti...@localhost/test?charset=utf8use_unicode=0 .

On Nov 29, 2010, at 2:37 AM, Warwick Prince wrote:

 Hi All
 
 I thought I had Character Encoding licked, but I've hit something I can't 
 work through.  Any help appreciated.
 
 I have a legacy non SQL database that I read legacy data from (using cool 
 Python code that emulates the old ISDB binary comms) and it reads a str which 
 has Foreign language chars in it.  (French for example).  
 
 So, firstly, I have   myStr = ''Aligot\xc3\xa9  which when printed is 
 Aligoté.   So far so good.
 
 I then convert that to unicode by  myUnicode = unicode(myStr, 'utf-8', 
 errors='ignore') and get u'Aligot\xe9'.   This printed is also Aligoté, 
 therefore all is good.
 
 I have a MySQL database, InnoDB table, charset utf-8.
 
 I set up my values in a dict called setValues with all the columns and their 
 respective unicode'd values ready to go
 
 I then do a table.insert(values=setValues).execute() and get this error.
 
 Traceback (most recent call last):
  File C:\Documents and Settings\wprince\Desktop\PY CODE 
 DEVELOPMENT\CESyncSQL\TEST_Sync.py, line 148, in SYNC_IT
SyncFunction(ceDB, session, meta)
  File C:\Documents and Settings\wprince\Desktop\PY CODE 
 DEVELOPMENT\CESyncSQL\TEST_Sync.py, line 840, in SYNC_VarietiesOUT
DAPDB_SetColumns(meta, 'varieties', 
 {'DescriptiveText':self.CEUnicode(tVarieties.ceVarietyText.value), 
 'FlavourText':self.CEUnicode(tVarieties.ceFlavourText.value), 
 'ImageURL':imageURL}, Variety=variety)
  File C:\Python26\lib\DAPDBHelpers.py, line 323, in DAPDB_SetColumns
table.insert(values=setColumns).execute()
  File C:\Python26\lib\site-packages\sqlalchemy\sql\expression.py, line 
 1217, in execute
return e._execute_clauseelement(self, multiparams, params)
  File C:\Python26\lib\site-packages\sqlalchemy\engine\base.py, line 1722, 
 in _execute_clauseelement
return connection._execute_clauseelement(elem, multiparams, params)
 UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 
 4: ordinal not in range(128)
 
 I know what the error means, I just don't know why I'm getting it.  The 
 offending u'\xe9' character is in the DescriptiveText column.  
 DAPDB_SetColumns is a simple wrapper around an update/insert that builds up 
 the table.insert(values=setColumns).execute() you see.
 
 This is what setColumns looks like;
 {'ImageURL': '', 'DescriptiveText': u'Carm\xe9n\xe8re is a red wine grape 
 variety originally from Bordeaux, France. Having lost favor in France, the 
 largest area planted with this variety is in now Chile. It only survived, due 
 to growers believing it was Merlot. The vines were imported into Chil', 
 'FlavourText': u'Carmenere is a full bodied red wine with approachable 
 tannins and a combination of sweet berry fruit, savory pepper, smoke, tar, 
 with a slight leafy character.\n', 'Variety': u'Carmenere'}
 
 'Variety' is the primary key BTW.
 
 What gives?  It feels like SQLA is encoding/decoding somewhere it shouldn't..
 
 Cheers
 Warwick
 
 -- 
 You received this message because you are subscribed to the Google Groups 
 sqlalchemy group.
 To post to this group, send email to sqlalch...@googlegroups.com.
 To unsubscribe from this group, send email to 
 sqlalchemy+unsubscr...@googlegroups.com.
 For more options, visit this group at 
 http://groups.google.com/group/sqlalchemy?hl=en.
 

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] Char encoding..

2010-11-29 Thread Warwick Prince
Hi Michael

Thanks for your thoughts and comments to date.

I can replicate the problem with ease, so perhaps this will help;

# -*- coding: utf-8 -*-
e = 
create_engine('mysql+mysqlconnector://user:passw...@127.0.0.1/testdb?use_unicode=0',
 encoding='utf8', echo=False)
m = MetaData(e)
t = Table('test_table', m, autoload=True)
#test_table is;
Table('test_table', 
MetaData(Engine(mysql+mysqlconnector://user:passw...@127.0.0.1/testdb?use_unicode=0)),
 Column(u'ID', INTEGER(display_width=11), table=test_table, primary_key=True, 
nullable=False), Column(u'SourceType', VARCHAR(length=10), table=test_table), 
Column(u'SourceID', VARCHAR(length=128), table=test_table), Column(u'Date', 
DATE(), table=test_table), Column(u'Time', TIME(timezone=False), 
table=test_table), Column(u'UserID', VARCHAR(length=10), table=test_table), 
Column(u'Note', BLOB(length=None), table=test_table), Column(u'Division', 
VARCHAR(length=3), table=test_table), schema=None)

# Set some row  data in a dict
columns = dict(ID=1, SourceType='TEST', SourceID='WAP', Note=u'Aligot\xe9')  # 
The Note column is set to a unicode value for a French word with accents. 
Column type is BLOB

# insert it
t.insert(values=columns).execute()

get this;
Traceback (most recent call last):
  File interactive input, line 1, in module
  File C:\Python26\lib\site-packages\sqlalchemy\sql\expression.py, line 1217, 
in execute
return e._execute_clauseelement(self, multiparams, params)
  File C:\Python26\lib\site-packages\sqlalchemy\engine\base.py, line 1722, in 
_execute_clauseelement
return connection._execute_clauseelement(elem, multiparams, params)
  File C:\Python26\lib\site-packages\sqlalchemy\engine\base.py, line 1235, in 
_execute_clauseelement
parameters=params
  File C:\Python26\lib\site-packages\sqlalchemy\engine\base.py, line 1343, in 
__create_execution_context
connection=self, **kwargs)
  File C:\Python26\lib\site-packages\sqlalchemy\engine\default.py, line 384, 
in __init__
self.parameters = self.__convert_compiled_params(self.compiled_parameters)
  File C:\Python26\lib\site-packages\sqlalchemy\engine\default.py, line 513, 
in __convert_compiled_params
param[key] = processors[key](compiled_params[key])
  File C:\Python26\lib\site-packages\sqlalchemy\types.py, line 1209, in 
process
return DBAPIBinary(value)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 6: 
ordinal not in range(128)

It appears to be in the processing of the Binary type that something is going 
wrong. 

Further testing showed something interesting.  I changed around the data above 
and set the unicode value to the VARCHAR column SourceID.  That worked..  
Therefore, the issue is related to storing a unicode value into a BLOB.  Surely 
I can store anything in a BLOB, or am I missing something?

Cheers
Warwick

Warwick Prince 
Managing Director 
mobile: +61 411 026 992 
skype: warwickprince  
 
phone: +61 7 3102 3730 
fax:  +61 7 3319 6734 
web: www.mushroomsys.com 

On 30/11/2010, at 1:29 AM, Michael Bayer wrote:

 we've got unicode round trips down very well for years now with plenty of 
 tests, so would need a specific series of steps to reproduce what you're 
 doing here.  Note that the recommended connect string for MySQL + Mysqldb 
 looks like mysql://scott:ti...@localhost/test?charset=utf8use_unicode=0 .
 
 On Nov 29, 2010, at 2:37 AM, Warwick Prince wrote:
 
 Hi All
 
 I thought I had Character Encoding licked, but I've hit something I can't 
 work through.  Any help appreciated.
 
 I have a legacy non SQL database that I read legacy data from (using cool 
 Python code that emulates the old ISDB binary comms) and it reads a str 
 which has Foreign language chars in it.  (French for example).  
 
 So, firstly, I have   myStr = ''Aligot\xc3\xa9  which when printed is 
 Aligoté.   So far so good.
 
 I then convert that to unicode by  myUnicode = unicode(myStr, 'utf-8', 
 errors='ignore') and get u'Aligot\xe9'.   This printed is also Aligoté, 
 therefore all is good.
 
 I have a MySQL database, InnoDB table, charset utf-8.
 
 I set up my values in a dict called setValues with all the columns and their 
 respective unicode'd values ready to go
 
 I then do a table.insert(values=setValues).execute() and get this error.
 
 Traceback (most recent call last):
 File C:\Documents and Settings\wprince\Desktop\PY CODE 
 DEVELOPMENT\CESyncSQL\TEST_Sync.py, line 148, in SYNC_IT
   SyncFunction(ceDB, session, meta)
 File C:\Documents and Settings\wprince\Desktop\PY CODE 
 DEVELOPMENT\CESyncSQL\TEST_Sync.py, line 840, in SYNC_VarietiesOUT
   DAPDB_SetColumns(meta, 'varieties', 
 {'DescriptiveText':self.CEUnicode(tVarieties.ceVarietyText.value), 
 'FlavourText':self.CEUnicode(tVarieties.ceFlavourText.value), 
 'ImageURL':imageURL}, Variety=variety)
 File C:\Python26\lib\DAPDBHelpers.py, line 323, in DAPDB_SetColumns
   table.insert(values=setColumns).execute()
 File C:\Python26\lib\site-packages\sqlalchemy\sql\expression.py, line