[sqlalchemy] Re: Multiple encodings in my database
On Friday, 27 June 2008 01:20, Michael Bayer wrote:
> First of all, the stack trace suggests you have not set the "encoding"
> parameter on create_engine(), as it's still using UTF-8.
>
> If you mean that a single database column may have different encodings
> in different rows, you want to do your own encoding/decoding with
> encoding errors set to something liberal like 'ignore'. You also need
> to use your own custom type, as below:
>
>     from sqlalchemy import types
>
>     class MyEncodedType(types.TypeDecorator):
>         # underlying column type; String lives in the types module
>         impl = types.String
>
>         def process_bind_param(self, value, dialect):
>             # Python 2: values bound to this column must be unicode
>             assert isinstance(value, unicode)
>             return value.encode('latin-1')
>
>         def process_result_value(self, value, dialect):
>             # 'ignore' skips any bytes the codec cannot decode
>             return value.decode('latin-1', 'ignore')
>
> Then use MyEncodedType() as the type for all your columns which contain
> random encoding. No convert_unicode setting should be used on your
> engine, as this type replaces that usage.

Perfect, that works, thanks!

Best Regards,
Hermann

--
[EMAIL PROTECTED]
GPG key ID: 299893C7 (on keyservers)
FP: 0124 2584 8809 EF2A DBF9 4902 64B4 D16B 2998 93C7

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---
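As a rough illustration of "doing your own decoding" with a liberal fallback, here is a minimal Python 3 sketch that uses only the standard library. The helper name decode_leniently and the candidate list are assumptions made up for this example (the list loosely follows the encodings mentioned in the thread); neither is part of SQLAlchemy. A function like this could be called from a custom type's process_result_value().

```python
# Candidate encodings, tried in order. This list is an assumption for
# illustration; adjust it to whatever the data is likely to contain.
CANDIDATE_ENCODINGS = ("ascii", "utf-8", "cp1252")


def decode_leniently(raw, candidates=CANDIDATE_ENCODINGS):
    """Decode bytes of unknown encoding without ever raising."""
    if raw is None or isinstance(raw, str):
        # NULLs and already-decoded text pass through unchanged.
        return raw
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every possible byte to a code point, so this
    # last resort cannot fail (though the text may be wrong).
    return raw.decode("latin-1")


print(decode_leniently(b"caf\xc3\xa9"))      # valid UTF-8
print(decode_leniently(b"caf\xe9"))          # not UTF-8, but valid cp1252
print(repr(decode_leniently(b"\x81")))       # falls through to latin-1
```

The order of the candidates matters: a strict codec such as UTF-8 should come before permissive single-byte codecs, because the latter accept almost any input and would mask it.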
[sqlalchemy] Re: Multiple encodings in my database
If I am using the MySQL-specific Column constructs with the charset option, will things be automatically encoded/decoded by SA using that charset? Or is the charset option only used for CREATE TABLE?
[sqlalchemy] Re: Multiple encodings in my database
My understanding is that MySQL works a little differently here. The column-level character sets are storage encodings only. All data to and from the database is encoded in the database connection's configured encoding. That can either be left as-is or converted to Unicode for you.

Bobby Impollonia wrote:
> If I am using the mysql-specific Column constructs with the charset
> option, will things be automatically encoded/decoded by SA using that
> charset? Or is the charset option only used for Create Table?
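The distinction drawn above between a storage encoding and a connection encoding can be mimicked in a few lines of plain Python 3. This is only a toy model of the behaviour as described in this message, not MySQL's or SQLAlchemy's actual code; the two charsets (latin-1 for the column, UTF-8 for the connection) are assumptions picked for the example.

```python
text = "Gr\u00fc\u00dfe"  # sample value containing non-ASCII characters

# Assumed column charset: how the server keeps the value on disk.
stored = text.encode("latin-1")

# Assumed connection charset: the server transcodes from the storage
# encoding to this one before sending the value to the client.
on_the_wire = stored.decode("latin-1").encode("utf-8")

# The client must decode with the connection charset...
assert on_the_wire.decode("utf-8") == text

# ...because decoding with the column charset yields mojibake.
garbled = on_the_wire.decode("latin-1")
print(repr(garbled))
```

Under this model the column charset never reaches the client at all, which is why it would not be the right value to decode with on the Python side.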
[sqlalchemy] Re: Multiple encodings in my database
First of all, the stack trace suggests you have not set the "encoding" parameter on create_engine(), as it's still using UTF-8.

If you mean that a single database column may have different encodings in different rows, you want to do your own encoding/decoding with encoding errors set to something liberal like 'ignore'. You also need to use your own custom type, as below:

    from sqlalchemy import types

    class MyEncodedType(types.TypeDecorator):
        # underlying column type; String lives in the types module
        impl = types.String

        def process_bind_param(self, value, dialect):
            # Python 2: values bound to this column must be unicode
            assert isinstance(value, unicode)
            return value.encode('latin-1')

        def process_result_value(self, value, dialect):
            # 'ignore' skips any bytes the codec cannot decode
            return value.decode('latin-1', 'ignore')

Then use MyEncodedType() as the type for all your columns which contain random encoding. No convert_unicode setting should be used on your engine, as this type replaces that usage.

On Jun 26, 2008, at 6:55 PM, Hermann Himmelbauer wrote:
> Hi,
> I'm trying to access a database via SA, which contains varchars with
> different, arbitrary encodings. Most of them are ASCII or ISO-8859-2
> encoded; however, many are windows-1252 encoded, and there are also
> some other weird ones.
>
> In my engine setup, I set the encoding to latin1 and set
> convert_unicode to True, as my application requires the database
> values in unicode format.
>
> If SA now tries to retrieve such a key, the following traceback occurs:
>
>   File "/home/dusty/prog/python_modules/sqlalchemy/engine/base.py", line 1605, in _get_col
>     return processor(row[index])
>   File "/home/dusty/prog/python_modules/sqlalchemy/databases/maxdb.py", line 112, in process
>     return value.decode(dialect.encoding)
>   File "/local/home/dusty/python/Python-2.4.4/lib/python2.4/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-6: invalid data
>
> What can I do? It's not so important that all characters are correctly
> displayed, but it's vital that such improper encodings do not crash my
> application. Perhaps there's some universal encoding that is able to
> deal with such problems?
>
> Best Regards,
> Hermann
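On the question of a "universal" encoding: the closest thing in Python's standard codecs is latin-1, which assigns a code point to every one of the 256 byte values and so never fails to decode. The short Python 3 sketch below compares it with strict UTF-8 and with the 'ignore' and 'replace' error handlers. The sample bytes are invented for the example and are not taken from the database discussed here.

```python
raw = b"na\xefve \x80 \x81"  # arbitrary sample bytes, not valid UTF-8

# Strict decoding raises, much like the traceback quoted above.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = raw.decode("utf-8", "ignore")    # undecodable bytes are dropped
replaced = raw.decode("utf-8", "replace")  # each one becomes U+FFFD
latin1 = raw.decode("latin-1")             # one character per byte, never fails

# latin-1 is lossless: encoding again gives back the original bytes.
assert latin1.encode("latin-1") == raw

# Every possible byte value decodes under latin-1.
all_bytes = bytes(range(256)).decode("latin-1")

print(strict_failed, repr(ignored), repr(replaced), repr(latin1))
```

The trade-off is that latin-1 only guarantees the absence of exceptions, not correct characters: bytes written as windows-1252 or ISO-8859-2 will still display wrongly wherever those encodings differ from latin-1.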