Re: ORM, Oracle and UTF-8 encoding problem.

2013-01-10 Thread Jani Tiainen

10.1.2013 8:59, Ian Kelly wrote:

> On Wed, Jan 9, 2013 at 11:40 PM, Jani Tiainen wrote:
> > If we just force using force_unicode everything works except in older
> > versions of cx_Oracle (our server had 5.0.4 or something) connection strings
> > can't be unicode for some reason.
>
> Sure, that's why the check exists in the first place.  Prior to 5.1
> cx_Oracle could be built either with Unicode or without.  If the
> former, it would accept only unicode strings and would raise an
> exception on byte strings.  If the latter, it would be exactly the
> opposite.
>
> Does it work for you using force_bytes with 5.0.4?



That is on my production server, which runs Django 1.3.x. smart_str (which
the detection selects) does not work there.

Using force_unicode works (except for the connection string).

The behaviour also varies with the client that cx_Oracle is compiled
against (OCI client 10.2.0.5 or Instant Client 11.2): 10.2.0.5 doesn't work
with smart_str, while 11.2 does.

Both can take plain unicode (u'') without any problems when using
just cx_Oracle commands.

Note:

If I manually add some unicode to the database, Django can read it without
any problems.


--
Jani Tiainen

- Well planned is half done and a half done has been sufficient before...

--
You received this message because you are subscribed to the Google Groups "Django 
users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: ORM, Oracle and UTF-8 encoding problem.

2013-01-09 Thread Ian Kelly
On Wed, Jan 9, 2013 at 11:40 PM, Jani Tiainen  wrote:
> If we just force using force_unicode everything works except in older
> versions of cx_Oracle (our server had 5.0.4 or something) connection strings
> can't be unicode for some reason.

Sure, that's why the check exists in the first place.  Prior to 5.1
cx_Oracle could be built either with Unicode or without.  If the
former, it would accept only unicode strings and would raise an
exception on byte strings.  If the latter, it would be exactly the
opposite.
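
To make the constraint above concrete, here is a minimal sketch in plain Python 3. It is not cx_Oracle or Django code: `to_text`, `to_bytes` and `convert_for_driver` are hypothetical stand-ins for the idea behind force_text / force_bytes, and UTF-8 is an assumed encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding byte strings (assumed UTF-8)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return value as a byte string, encoding text strings (assumed UTF-8)."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def convert_for_driver(value, driver_wants_unicode):
    """Hypothetical helper: hand the driver the only string type it accepts."""
    if driver_wants_unicode:
        return to_text(value)
    return to_bytes(value)
```

A driver build that accepts only one of the two types would then always receive that type, whatever the caller passed in.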

Does it work for you using force_bytes with 5.0.4?




Re: ORM, Oracle and UTF-8 encoding problem.

2013-01-09 Thread Jani Tiainen

9.1.2013 19:21, Ian Kelly wrote:

> On Wed, Jan 9, 2013 at 3:55 AM, Jani Tiainen wrote:
> > Server is running Oracle Database 10g Release 10.2.0.5.0 - 64bit Production.
> > (EE edition)
> >
> > and charset info:
> > NLS_CHARACTERSET        WE8ISO8859P1
> > NLS_NCHAR_CHARACTERSET  AL16UTF16
>
> Sorry, I meant your web server setup.

Windows 7, development server.

Staging server is Ubuntu something (probably 10.04 LTS), 64bit.

And the symptoms were consistent. For some reason Django does something bad
when it uses smart_str (and whatever that is in 1.5).

If we simply force force_unicode, everything works, except that with older
versions of cx_Oracle (our server had 5.0.4 or something) connection
strings can't be unicode for some reason.


--
Jani Tiainen

- Well planned is half done and a half done has been sufficient before...




Re: ORM, Oracle and UTF-8 encoding problem.

2013-01-09 Thread Ian Kelly
On Wed, Jan 9, 2013 at 3:55 AM, Jani Tiainen  wrote:
> Server is running Oracle Database 10g Release 10.2.0.5.0 - 64bit Production.
> (EE edition)
>
> and charset info:
> NLS_CHARACTERSET        WE8ISO8859P1
> NLS_NCHAR_CHARACTERSET  AL16UTF16

Sorry, I meant your web server setup.




Re: ORM, Oracle and UTF-8 encoding problem.

2013-01-09 Thread Jani Tiainen

9.1.2013 12:28, Ian wrote:

> On Wednesday, January 9, 2013 12:38:28 AM UTC-7, Jani Tiainen wrote:
> > Tested against latest master. Same behaviour.
> >
> > In Oracle backend base.py is following piece of code:
> >
> > # Check whether cx_Oracle was compiled with the WITH_UNICODE option.  This will
> > # also be True in Python 3.0.
> > if int(Database.version.split('.', 1)[0]) >= 5 and not hasattr(Database, 'UNICODE'):
> >     convert_unicode = force_text
> > else:
> >     convert_unicode = force_bytes
> >
> > Which was added in
> >
> > Thing is that my cx_Oracle is version 5.1.2, it has cx_Oracle.UNICODE
> > definition.
>
> That sounds correct.  The cx_Oracle.UNICODE type constant is present
> when cx_Oracle is compiled *without* the WITH_UNICODE option (which no
> longer exists in 5.1 anyway).
>
> > And Django uses smart_str / force_bytes.
> >
> > If I remove that and use convert_unicode as force_text / force_unicode
> > everything works as expected.
>
> Strange, in 5.1 it shouldn't make any difference which is used, as long
> as your NLS_LANG is getting set properly in the backend.  What is your
> server setup?  It seems that sometimes that can get interfered with if
> you have other services using Oracle in the same process.  It shouldn't
> hurt anything though for us to do an additional check for cx_Oracle 5.1+
> and always use force_text in that case.
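
One way to see which character set NLS_LANG names in the running process is sketched below. This is plain Python 3, not Django or cx_Oracle code; `nls_lang_charset` is a hypothetical helper, and it assumes the usual LANGUAGE_TERRITORY.CHARSET layout of the variable, where the part before the dot may be empty.

```python
import os


def nls_lang_charset(environ=None):
    """Return the charset part of NLS_LANG, or None if it is unset or has none."""
    environ = os.environ if environ is None else environ
    value = environ.get("NLS_LANG", "")
    # Everything after the first dot is taken as the client character set.
    _, _, charset = value.partition(".")
    return charset or None


if __name__ == "__main__":
    print("NLS_LANG charset in this process:", nls_lang_charset())
```

Calling it from inside the web process (rather than from a separate shell) matters here, since the concern raised above is that something else in the same process may have changed the setting.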



Server is running Oracle Database 10g Release 10.2.0.5.0 - 64bit
Production. (EE edition)

and charset info:
NLS_CHARACTERSET        WE8ISO8859P1
NLS_NCHAR_CHARACTERSET  AL16UTF16

When cx_Oracle (version 5.1.2) is compiled against the 10.2.0.3 client:

I can insert unicode characters directly using cx_Oracle.
I can't insert unicode characters using the ORM.
I can't insert unicode characters using Django's connection.cursor().

When cx_Oracle is compiled against Instant Client 11.2 (multinational
version), I can do all of the above without problems.



--
Jani Tiainen

- Well planned is half done and a half done has been sufficient before...




Re: ORM, Oracle and UTF-8 encoding problem.

2013-01-09 Thread Ian
On Wednesday, January 9, 2013 12:38:28 AM UTC-7, Jani Tiainen wrote:
>
> Tested against latest master. Same behaviour. 
>
> In Oracle backend base.py is following piece of code: 
>
> # Check whether cx_Oracle was compiled with the WITH_UNICODE option.  This will
> # also be True in Python 3.0.
> if int(Database.version.split('.', 1)[0]) >= 5 and not hasattr(Database, 'UNICODE'):
>     convert_unicode = force_text
> else:
>     convert_unicode = force_bytes
>
> Which was added in  
>
> Thing is that my cx_Oracle is version 5.1.2, it has cx_Oracle.UNICODE 
> definition. 
>

That sounds correct.  The cx_Oracle.UNICODE type constant is present when 
cx_Oracle is compiled *without* the WITH_UNICODE option (which no longer 
exists in 5.1 anyway).

 

> And Django uses smart_str / force_bytes. 
>
> If I remove that and use convert_unicode as force_text / force_unicode 
> everything works as expected. 
>

Strange, in 5.1 it shouldn't make any difference which is used, as long as 
your NLS_LANG is getting set properly in the backend.  What is your server 
setup?  It seems that sometimes that can get interfered with if you have 
other services using Oracle in the same process.  It shouldn't hurt 
anything though for us to do an additional check for cx_Oracle 5.1+ and 
always use force_text in that case. 
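
The additional check suggested above might look roughly like the following. This is only a sketch of the decision logic in plain Python 3, not the actual backend code: `current_choice` restates the condition quoted from base.py, `proposed_choice` and `parse_version` are hypothetical names, and the functions return the converter's name as a string so the example runs without Django or cx_Oracle installed.

```python
def parse_version(version):
    """Turn a string such as '5.1.2' into (5, 1, 2); stop at a non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def current_choice(version, has_unicode_constant):
    """Mirror of the quoted check: major version >= 5 and no UNICODE constant."""
    major = int(version.split(".", 1)[0])
    if major >= 5 and not has_unicode_constant:
        return "force_text"
    return "force_bytes"


def proposed_choice(version, has_unicode_constant):
    """Hypothetical variant: always pick force_text for cx_Oracle 5.1 and later."""
    if parse_version(version) >= (5, 1):
        return "force_text"
    return current_choice(version, has_unicode_constant)
```

With the values reported in this thread (version 5.1.2, UNICODE constant present), `current_choice` selects force_bytes while `proposed_choice` selects force_text; for 5.0.4 the two agree.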




Re: ORM, Oracle and UTF-8 encoding problem.

2013-01-09 Thread Jani Tiainen

OK, I found the source of the problem, but I don't know the solution.

I'm using Oracle client 10.2.0.3.0. It seems that unicode doesn't work
there.

I compiled cx_Oracle against the 11g Instant Client 11.2 and it worked just fine.

So it must be something that Django assumes about Oracle and its unicode
capability.

cx_Oracle.UNICODE, which the code checks for, was always defined in my
builds. I don't really know why.




--
Jani Tiainen

- Well planned is half done and a half done has been sufficient before...




Re: ORM, Oracle and UTF-8 encoding problem.

2013-01-08 Thread Jani Tiainen

Tested against latest master. Same behaviour.

The Oracle backend's base.py contains the following piece of code:

# Check whether cx_Oracle was compiled with the WITH_UNICODE option.  This will
# also be True in Python 3.0.
if int(Database.version.split('.', 1)[0]) >= 5 and not hasattr(Database, 'UNICODE'):
    convert_unicode = force_text
else:
    convert_unicode = force_bytes

Which was added in 

The thing is that my cx_Oracle is version 5.1.2, and it has the
cx_Oracle.UNICODE definition.

So Django uses smart_str / force_bytes.

If I remove that check and set convert_unicode to force_text / force_unicode,
everything works as expected.


--
Jani Tiainen

- Well planned is half done and a half done has been sufficient before...




Re: ORM, Oracle and UTF-8 encoding problem.

2013-01-08 Thread Jani Tiainen

8.1.2013 21:00, akaariai wrote:

> I created the following test case into django's test suite modeltests/
> basic/tests.py:
>     def test_unicode(self):
>         # Note: from __future__ import unicode_literals is in effect...
>         a = Article.objects.create(
>             headline='0 \u0442\u0435\u0441\u0442 test', pub_date=datetime.now())
>         self.assertEqual(
>             Article.objects.get(pk=a.pk).headline, '0 \u0442\u0435\u0441\u0442 test')
>
> This does pass on Oracle when using Django's master branch, both with
> Python 2.7 and 3.3.
>
> Django's backend is doing all sorts of trickery behind the scenes to
> get correct unicode handling. I am not sure where the problem is. What
> Django version are you using?

Sorry about forgetting the version info. I tested with 1.3.1 and 1.4.1 and
both gave the same behaviour.

And I know that there is quite a lot of trickery going on. I'll try to
figure out what causes the problem.




--
Jani Tiainen

- Well planned is half done and a half done has been sufficient before...




Re: ORM, Oracle and UTF-8 encoding problem.

2013-01-08 Thread akaariai
I added the following test case to Django's test suite, in
modeltests/basic/tests.py:

    def test_unicode(self):
        # Note: from __future__ import unicode_literals is in effect...
        a = Article.objects.create(
            headline='0 \u0442\u0435\u0441\u0442 test', pub_date=datetime.now())
        self.assertEqual(
            Article.objects.get(pk=a.pk).headline, '0 \u0442\u0435\u0441\u0442 test')

This does pass on Oracle when using Django's master branch, both with
Python 2.7 and 3.3.

Django's backend is doing all sorts of trickery behind the scenes to
get correct unicode handling. I am not sure where the problem is. What
Django version are you using?

 - Anssi

On 8 Jan, 17:34, Jani Tiainen wrote:
> Hi,
>
> I've been trying to save UTF-8 characters to oracle database without
> success.
>
> I've verified that database is indeed UTF-8 capable.
>
> I can insert UTF-8 characters directly using cx_Oracle.
>
> But when I use ORM it will trash characters.
>
> Model I use:
>
> class MyTest(models.Model):
>      txt = CharField(max_length=128)
>
> s = u'0 \u0442\u0435\u0441\u0442 test'
>
> i = MyTest()
> i.txt = s
> i.save()
>
> i2 = MyTest.objects.get(id=i.id)
> print i2.txt
>
> u'0 \xbf\xbf\xbf\xbf test'
>
> So what happens here? It looks like Django trashes my unicode string at
> some (unknown point).
>
> Additional note:
>
> If I use cursor() from Django connection object strings get broken also.
> So it must be django Oracle backend doing something evil for me.
>
> --
> Jani Tiainen
>
> - Well planned is half done and a half done has been sufficient before...




ORM, Oracle and UTF-8 encoding problem.

2013-01-08 Thread Jani Tiainen

Hi,

I've been trying to save UTF-8 characters to an Oracle database without
success.

I've verified that the database is indeed UTF-8 capable.

I can insert UTF-8 characters directly using cx_Oracle.

But when I use the ORM, it trashes the characters.

The model I use:

from django.db import models

class MyTest(models.Model):
    txt = models.CharField(max_length=128)

s = u'0 \u0442\u0435\u0441\u0442 test'

i = MyTest()
i.txt = s
i.save()

i2 = MyTest.objects.get(id=i.id)
print i2.txt

u'0 \xbf\xbf\xbf\xbf test'

So what happens here? It looks like Django trashes my unicode string at
some (unknown) point.

Additional note:

If I use cursor() from the Django connection object, strings get broken as well.
So it must be the Django Oracle backend doing something evil to me.
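
One possible reading of the `\xbf` characters, offered as an assumption rather than a diagnosis: U+00BF (the inverted question mark) is what a lossy conversion into a single-byte Latin-1 style character set would leave behind if it substituted that character for anything it cannot represent. The sketch below simulates such a substitution in plain Python 3. `to_latin1_lossy` is a hypothetical helper; it does not call Oracle, cx_Oracle or Django, and it says nothing about where in the stack the conversion happens.

```python
def to_latin1_lossy(text, replacement="\xbf"):
    """Replace every character that Latin-1 cannot represent with `replacement`."""
    out = []
    for ch in text:
        try:
            ch.encode("latin-1")
            out.append(ch)
        except UnicodeEncodeError:
            # Cyrillic letters, for example, have no Latin-1 code point.
            out.append(replacement)
    return "".join(out)


original = "0 \u0442\u0435\u0441\u0442 test"
damaged = to_latin1_lossy(original)
```

Here `damaged` has the same shape as the value read back from the ORM: the ASCII characters survive and each of the four Cyrillic letters becomes `\xbf`.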


--
Jani Tiainen

- Well planned is half done and a half done has been sufficient before...
