Re: [sqlalchemy] Encoding issue with SqlAlchemy + cxOracle

2010-11-18 Thread Guilherme Menezes
An update:

It seems that cx_Oracle always sends data in UTF-16 if WITH_UNICODE is
unset and unicode() objects are passed.

http://sourceforge.net/mailarchive/message.php?msg_name=AANLkTinTVdfPCn15HgXL6zo7cvRdSoE1t0_Z4eqXTGmQ%40mail.gmail.com

Regards,
Guilherme.
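If the driver really re-encodes bind values as UTF-16, the arithmetic matches the VARCHAR2 failure described below: a mostly-ASCII string of roughly 2300 UTF-8 bytes doubles past the 4000-byte limit. A minimal illustration in plain Python (independent of cx_Oracle; the actual wire encoding is decided by OCI):

```python
# -*- coding: utf-8 -*-
# Why a ~2300-byte UTF-8 string can blow a 4000-byte VARCHAR2 limit
# once the client re-encodes it as UTF-16: every BMP character,
# including plain ASCII, costs two bytes.

text = u"x" * 2300  # mostly-ASCII payload, 2300 characters

utf8_len = len(text.encode("utf-8"))       # 1 byte per ASCII char
utf16_len = len(text.encode("utf-16-le"))  # 2 bytes per BMP char

assert utf8_len == 2300
assert utf16_len == 4600  # already over the 4000-byte limit
```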


Re: [sqlalchemy] Encoding issue with SqlAlchemy + cxOracle

2010-11-18 Thread Michael Bayer
SQLAlchemy can do the encoding for you as Anthony suggests, in this case if you
use the Unicode() type with the flag convert_unicode='force'.   Unfortunately
the 'force' value isn't accepted at the create_engine() level (I thought it was;
apparently not).   It's likely that WITH_UNICODE is the easier route for you at
the moment.
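A rough sketch of what the 'force' conversion amounts to (the names below are illustrative, not SQLAlchemy internals): every bind value, unicode or not, is coerced to a single target encoding before it reaches the DBAPI.

```python
# Hypothetical sketch of the coercion step behind
# Unicode(convert_unicode='force').  Not SQLAlchemy code; it only
# shows the idea: nothing but byte strings in one known encoding
# ever reach the driver.

def coerce_bind_param(value, encoding="utf-8"):
    """Return a byte string in the target encoding, or None."""
    if value is None:
        return None
    if isinstance(value, bytes):
        # assume it is already encoded; round-trip to verify
        value.decode(encoding)
        return value
    return value.encode(encoding)

assert coerce_bind_param(u"caf\u00e9") == b"caf\xc3\xa9"
assert coerce_bind_param(b"plain") == b"plain"
assert coerce_bind_param(None) is None
```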


[sqlalchemy] Encoding issue with SqlAlchemy + cxOracle

2010-11-17 Thread Guilherme Menezes
Hi,

I am having an encoding issue using Python 2.6.2, SqlAlchemy 0.6.5 and
cxOracle 5.0.4 to access an Oracle 11g database.

I am using the NLS_LANG=.AL32UTF8 environment variable. My table in
Oracle is correctly configured to accept Unicode.

First, I compiled cxOracle without the WITH_UNICODE flag (as
recommended by SqlAlchemy).  I was passing data to SqlAlchemy as
unicode() objects. Everything worked without errors or warnings.

However, sometimes Oracle would complain that the string I was trying
to insert into a VARCHAR2 field was too big (> 4000), even when the
string size ( len(the_string.encode('utf-8')) ) was about 2300 bytes
(before SqlAlchemy). I used a sniffer to verify that the Oracle client
was sending two bytes for each character (even the ASCII ones),
instead of sending two bytes only for special characters.
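Since VARCHAR2(4000) counts bytes, a pre-flight check along these lines (a sketch, not part of SqlAlchemy or cx_Oracle) measures the encoded length rather than len() of the unicode object:

```python
# Sketch: validate a value against a VARCHAR2 byte limit before
# inserting.  len(value) counts characters; the column limit is
# enforced in bytes of the database encoding (UTF-8 here).

MAX_VARCHAR2_BYTES = 4000

def fits_varchar2(value, encoding="utf-8", limit=MAX_VARCHAR2_BYTES):
    """True if the encoded value fits within the byte limit."""
    return len(value.encode(encoding)) <= limit

ascii_text = u"a" * 3999          # 3999 bytes in UTF-8
accented = u"\u00e9" * 2100       # 2 bytes each -> 4200 bytes

assert fits_varchar2(ascii_text)
assert not fits_varchar2(accented)
```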

I repeated this insertion experiment using cx_Oracle directly and
passing unicode() objects to it. Same thing: no errors or warnings,
and again the sniffer showed that the Oracle client was sending two
bytes per character. It seemed to me that cx_Oracle does not really
support unicode() objects; it accepts them but it does not encode()
them to the correct encoding (as set in the NLS_LANG variable).

However, when reading data from Oracle everything worked as expected
(i.e. someone was converting the UTF-16(?) string to UTF-8, probably
Oracle itself).

I tried two approaches to make the Oracle client send data in UTF-8
(not UTF-16):

1. Passing str() objects to SqlAlchemy. SqlAlchemy would then complain
in a warning:
* UnicodeWarning: Unicode equal comparison failed to convert both
arguments to Unicode - interpreting them as being unequal *
It worked, but it did not seem right, since I was receiving unicode()
objects and sending str() objects to SqlAlchemy. Moreover, I did not
know how things work inside SqlAlchemy, so I tried something else.

2. Compiling cx_Oracle with the WITH_UNICODE flag solved the problem
(utf-8 was sent to Oracle), but another warning was issued:
* SAWarning: cx_Oracle is compiled under Python 2.xx using the
WITH_UNICODE flag.  Consider recompiling cx_Oracle without this flag,
which is in no way necessary for full support of Unicode. Otherwise,
all string-holding bind parameters must be explicitly typed using
SQLAlchemy's String type or one of its subtypes,or otherwise be passed
as Python unicode.  Plain Python strings passed as bind parameters
will be silently corrupted by cx_Oracle. *

Right now I am using cx_Oracle compiled with WITH_UNICODE; everything
works, but this warning is worrying me. I tried to pass a str() object
in UTF-8 to SqlAlchemy, and the execution halted with an encoding
error (UnicodeDecodeError). In other words, cx_Oracle did not silently
corrupt the data (at least when special characters were sent).

I would like to know if anyone has a better (safer) solution to my
problem. Or am I safe enough by using the WITH_UNICODE flag and
passing only unicode() objects?

Thank you in advance.

Guilherme.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] Encoding issue with SqlAlchemy + cxOracle

2010-11-17 Thread Michael Bayer



Regarding the passing of UTF-8 vs. UTF-16 and the differences in WITH_UNICODE
mode, you should really email the cx_oracle list about that, as WITH_UNICODE to
my knowledge only has to do with the Python interpretation of arguments, not
its interaction with OCI.   If you email their list, make sure all code
examples are purely derived from cx_oracle, to eliminate any doubt that each
behavior comes from cx_oracle directly.  WITH_UNICODE was not intended for
general use in Python 2.x.   If your whole application really works with it,
then there's no reason not to use it; it's just that you'll always have to
ensure that no non-unicode Python strings ever get sent to cx_oracle, which can
be fairly tedious.   It sounds like a bug that cx_oracle would be expanding
into a two-byte-per-character stream like that.

It's possible that calling setinputsizes() with cx_Oracle.UNICODE may be the key
to cx_oracle's behavior here.   We call setinputsizes() for every Oracle cursor,
but we currently exclude the string types from that list as it had some
unwanted side effects.

Also note that VARCHAR2(4000) measures itself in bytes, not characters.
For true Unicode support Oracle provides the NVARCHAR2 type, where the length
specifies the number of characters instead of bytes.   Recent versions of
Oracle also support the form VARCHAR2(4000 CHAR), which will similarly measure
the column in terms of characters based on the database's encoding instead of
bytes.   It's worth investigating whether using NVARCHAR2 causes cx_oracle, or
OCI, to change its behavior.
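The byte-vs-character distinction above can be seen from plain Python, assuming an AL32UTF8 (UTF-8) database character set as in this thread:

```python
# 2500 non-ASCII characters occupy 5000 bytes in UTF-8: within a
# 4000-character limit (NVARCHAR2(4000) / VARCHAR2(4000 CHAR)) but
# over the default byte semantics of VARCHAR2(4000).

text = u"\u00fc" * 2500

assert len(text) == 2500                  # what CHAR semantics count
assert len(text.encode("utf-8")) == 5000  # what BYTE semantics count
```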


 

Re: [sqlalchemy] Encoding issue with SqlAlchemy + cxOracle

2010-11-17 Thread Guilherme Menezes
Thank you.

I will report the problem to the cx_Oracle list and see what they have to say.

Regards,

Guilherme.

