Re: Unicode Error when Saving Django Model

2010-05-25 Thread vjimw
Thanks. It was actually a combination of issues. The database was
UTF8; I should have added to my original post that I could manually
insert and retrieve UTF8 data.

The data we are pulling (we are migrating one system to a new one built on
Django) is a bit of a nest of encoding issues, so things that look
like UTF8 may not be. I think my attempts to encode this data
as UTF8 are what started the problem.
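Jim's point about re-encoding data can be illustrated with a minimal Python 3 sketch (the thread itself uses Python 2, where the same `encode`/`decode` calls apply to `unicode` objects). The sample string containing U+2019 is an assumption for illustration, not data from the thread: encoding text to UTF-8 and then decoding those bytes with the wrong codec produces mojibake rather than an error.

```python
# Python 3 sketch: encoding text that is already Unicode, then decoding the
# bytes with the wrong codec, silently produces mojibake instead of an error.
original = "HP LaserJet\u2019s"             # U+2019 RIGHT SINGLE QUOTATION MARK

utf8_bytes = original.encode("utf-8")       # b'HP LaserJet\xe2\x80\x99s'

# Latin-1 maps every byte to some code point, so this step never fails,
# which is why the damage can go unnoticed until the text reaches the database.
garbled = utf8_bytes.decode("latin-1")
print(ascii(garbled))                       # 'HP LaserJet\xe2\x80\x99s'

# Undoing the wrong step recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)                 # True
```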

Thanks for the help and the general heads up on encoding and Unicode
with Django. I have read about it, but I understand it better each
time I encounter a problem with it.

--Jim

On May 24, 8:30 am, Karen Tracey wrote:

Re: Unicode Error when Saving Django Model

2010-05-24 Thread Scott Gould
Point taken, three times.

On May 24, 9:40 am, Karen Tracey wrote:

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-us...@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Unicode Error when Saving Django Model

2010-05-24 Thread Karen Tracey
On Mon, May 24, 2010 at 8:27 AM, Scott Gould wrote:

> > My database and all of its tables are UTF8 encoded with UTF8 collation
> > (DEFAULT CHARSET=utf8;)
> > The data I am inputting is unicode
> > (u'Save up to 25% on your online order of select HP LaserJet\x92s')
> > 
> >
> > But when I try to save this data I get an error
> > Incorrect string value: '\\xC2\\x92s' for column 'title' at row 1
> >
> > I assume I am missing something, but not sure what I am missing.
>
> Your string is a unicode string (u'...') but you have UTF-8 encoded
> text inside it.


No, that is just the way Python displays the repr of a unicode object. The
value shown is a valid unicode string with the character \x92 in it. This is
encoded to UTF-8 as \xC2\x92 for storage in the database, and the database is
reporting an error with that UTF-8 encoded value, likely because the table
actually has a non-utf8 charset that has no mapping for Unicode U+0092.
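To make the bytes in the error message concrete, here is a short Python 3 sketch. It assumes the column's charset behaves like Windows-1252, which is how the MySQL manual describes its `latin1`; the codec names below are Python's, not MySQL's.

```python
# Python 3 sketch: the bytes a UTF-8 connection sends for U+0092, and why a
# column charset that behaves like Windows-1252 cannot store that character.
value = "HP LaserJet\x92s"

# UTF-8 needs two bytes for U+0092: C2 92, the bytes quoted in the MySQL error.
print(value.encode("utf-8")[-3:])          # b'\xc2\x92s'

# Strict ISO-8859-1 does have a byte for U+0092 ...
print(value.encode("iso-8859-1")[-2:])     # b'\x92s'

# ... but Windows-1252 spends byte 0x92 on U+2019, leaving U+0092 unmapped.
try:
    value.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 cannot encode", ascii(exc.object[exc.start]))   # cp1252 cannot encode '\x92'
```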


> Unicode is not UTF-8; UTF-8 is a way to represent
> unicode in ASCII. You should be able to fix it by either casting that
> string to str(),


Casting to str() would raise a UnicodeEncodeError, because the unicode
character \x92 cannot be encoded in ASCII:

>>> u
u'LaserJet\x92'
>>> type(u)
<type 'unicode'>
>>> str(u)
Traceback (most recent call last):
  File "", line 1, in 
UnicodeEncodeError: 'ascii' codec can't encode character u'\x92' in position
8: ordinal not in range(128)
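The same failure in Python 3 terms, as a sketch: under Python 2, `str()` on a unicode object encoded it with the default (normally ASCII) codec, so the safer alternative is to name the codec explicitly.

```python
# Python 3 sketch: Python 2's str() on a unicode object implicitly used ASCII.
u = "LaserJet\x92"

try:
    u.encode("ascii")                  # roughly what str(u) did under Python 2
except UnicodeEncodeError as exc:
    print(exc.reason, "at position", exc.start)   # ordinal not in range(128) at position 8

# Naming the codec explicitly succeeds for any code point UTF-8 can represent.
print(u.encode("utf-8"))               # b'LaserJet\xc2\x92'
```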


> or by having "real" unicode inside it (difficult to
> say which is better without knowing how you're obtaining that string
> to begin with).


It is real unicode as it is, though rather odd (U+0092 is a C1 control
character whose Unicode alias is "PRIVATE USE TWO").
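Python's `unicodedata` module can confirm what kind of character this is. A minimal Python 3 sketch; the `lookup()` call relies on the name-alias support added in Python 3.3.

```python
import unicodedata

ch = "\x92"
print(f"U+{ord(ch):04X}")                    # U+0092
# General category "Cc" means a control character.
print(unicodedata.category(ch))              # Cc
# Control characters have no formal Unicode name, so name() returns the default.
print(unicodedata.name(ch, "<no name>"))     # <no name>
# Python 3.3+ lookup() also accepts name aliases, including this one.
print(unicodedata.lookup("PRIVATE USE TWO") == ch)   # True
```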

Karen
-- 
http://tracey.org/kmt/




Re: Unicode Error when Saving Django Model

2010-05-24 Thread Karen Tracey
On Sun, May 23, 2010 at 10:10 PM, vjimw wrote:

> I have been reading up on Unicode with Python and Django and I think I
> have my code set to use UTF8 data when saving or updating an object
> but I get an error on model.save()
>
> My database and all of its tables are UTF8 encoded with UTF8 collation
> (DEFAULT CHARSET=utf8;)
> The data I am inputting is unicode
> (u'Save up to 25% on your online order of select HP LaserJet\x92s')
> 
>
> But when I try to save this data I get an error
> Incorrect string value: '\\xC2\\x92s' for column 'title' at row 1
>


This error implies that your MySQL table is not set up the way you think it
is, with a charset of utf8. Given a table that actually has a utf8 charset:

k...@lbox:~/software/web/playground$ mysql -p Play2
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5852
Server version: 5.0.67-0ubuntu6.1 (Ubuntu)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show create table ttt_tag;
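+---------+-------------------------------------------------------+
| Table   | Create Table                                          |
+---------+-------------------------------------------------------+
| ttt_tag | CREATE TABLE `ttt_tag` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(88) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 |
+---------+-------------------------------------------------------+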
1 row in set (0.00 sec)
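One caveat worth checking: a column-level CHARACTER SET clause overrides the table's DEFAULT CHARSET, so `DEFAULT CHARSET=utf8` alone does not prove that a particular column (such as `title`) is utf8. The hypothetical helper below scans `SHOW CREATE TABLE` output for such overrides; in Django the text could be obtained with `connection.cursor()`, executing `SHOW CREATE TABLE` on your table and reading `cursor.fetchone()[1]`. It is a regex sketch, not a SQL parser, and the `offers` table is invented for illustration.

```python
import re


def declared_charsets(create_table_sql):
    """Return (table_default, {column: charset}) from SHOW CREATE TABLE text.

    Hypothetical helper: a regex scan, not a SQL parser. Columns that carry
    no CHARACTER SET clause inherit the table default.
    """
    default = re.search(r"DEFAULT CHARSET=(\w+)", create_table_sql)
    columns = dict(
        re.findall(r"^\s*`(\w+)`[^\n]*?CHARACTER SET (\w+)",
                   create_table_sql, re.MULTILINE)
    )
    return (default.group(1) if default else None), columns


# The table from the session above: utf8 default, no column overrides.
ttt_tag = """CREATE TABLE `ttt_tag` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(88) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=4 DEFAULT CHARSET=utf8"""
print(declared_charsets(ttt_tag))    # ('utf8', {})

# A hypothetical table whose `title` column overrides the utf8 default.
offers = """CREATE TABLE `offers` (
  `id` int(11) NOT NULL auto_increment,
  `title` varchar(255) CHARACTER SET latin1 NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8"""
print(declared_charsets(offers))     # ('utf8', {'title': 'latin1'})
```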

I can create an object in Django using the odd unicode character your
string includes (though I'm not sure what it is supposed to be; based on
its placement I'd guess it is supposed to be a registered trademark symbol,
but that's not what you actually have):

k...@lbox:~/software/web/playground$ python manage.py shell
Python 2.5.2 (r252:60911, Jan 20 2010, 23:16:55)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from ttt.models import Tag
>>> t = Tag.objects.create(name=u'HP LaserJet\x92s')
>>> print t
HP LaserJet s
>>> quit()

So that works, though the character does not print as anything useful.

If I change the table to have a charset of latin1 (MySQL's default):

mysql> drop table ttt_tag;
Query OK, 0 rows affected (0.00 sec)
mysql> create table ttt_tag (id int(11) not null auto_increment, name
varchar(88) not null, primary key (id)) engine=myisam default charset
latin1;
Query OK, 0 rows affected (0.01 sec)

I can then recreate the error you report:

>>> t = Tag.objects.create(name=u'HP LaserJet\x92s')
Traceback (most recent call last):
  File "", line 1, in 
[snipped]
  File "/usr/lib/python2.5/warnings.py", line 102, in warn_explicit
    raise message
Warning: Incorrect string value: '\xC2\x92s' for column 'name' at row 1
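Rather than waiting for MySQL's warning, you could check a value in Python before saving. This is a sketch under stated assumptions: the charset-to-codec mapping is hypothetical and only approximate (Python's cp1252 leaves five bytes undefined that MySQL's latin1 does map), and `unstorable_chars` is an invented helper name, not a Django API.

```python
# Hypothetical mapping from a few MySQL charset names to Python codecs.
# Approximation: MySQL documents its "latin1" as Windows-1252, and its "utf8"
# (utf8mb3) stores at most three bytes per character, i.e. only U+0000..U+FFFF.
MYSQL_TO_PYTHON = {"latin1": "cp1252", "utf8": "utf-8", "utf8mb4": "utf-8"}


def unstorable_chars(value, mysql_charset):
    """Return the characters of `value` that a column in `mysql_charset` cannot hold."""
    codec = MYSQL_TO_PYTHON[mysql_charset]
    bad = []
    for ch in value:
        if mysql_charset == "utf8" and ord(ch) > 0xFFFF:
            bad.append(ch)             # would need four UTF-8 bytes
            continue
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


title = "Save up to 25% on your online order of select HP LaserJet\x92s"
print(unstorable_chars(title, "latin1"))   # ['\x92']
print(unstorable_chars(title, "utf8"))     # []
```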

So I think one problem is that your table is not actually set up the way you
think it is.

Another may be that your data is not really correct either. What you are
showing that you have in your data is this character:

http://www.fileformat.info/info/unicode/char/0092/index.htm

and I suspect what you really want is either of these:

http://www.fileformat.info/info/unicode/char/2122/index.htm
http://www.fileformat.info/info/unicode/char/00ae/index.htm

Either of these would display better than what you have:

>>> u1 = u'LaserJet\u2122'
>>> print u1
LaserJet(tm)
>>> u2 = u'LaserJet\xae'
>>> print u2
LaserJet(R)
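A third possibility, offered as an unverified guess: stray characters in U+0080 to U+009F often come from Windows-1252 text that was decoded as Latin-1. In Windows-1252, byte 0x92 is a right single quotation mark (and 0x99 is the trademark sign), which would make the intended text "HP LaserJet’s" with a curly apostrophe. A minimal Python 3 sketch of that repair, assuming that origin; `fix_cp1252_c1` is a hypothetical helper:

```python
import re

# U+0080..U+009F are C1 control characters: rarely intended in real text, but
# exactly the byte range where Windows-1252 keeps curly quotes, dashes and (tm).
C1_CONTROLS = re.compile("[\x80-\x9f]")


def fix_cp1252_c1(text):
    """Reinterpret stray C1 controls as the Windows-1252 characters they most
    likely were, assuming cp1252 bytes were mistakenly decoded as Latin-1."""
    def repl(match):
        try:
            return match.group().encode("latin-1").decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in cp1252: leave them.
            return match.group()
    return C1_CONTROLS.sub(repl, text)


fixed = fix_cp1252_c1("Save up to 25% on your online order of select HP LaserJet\x92s")
print(ascii(fixed))   # 'Save up to 25% on your online order of select HP LaserJet\u2019s'
```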

Karen
-- 
http://tracey.org/kmt/




Re: Unicode Error when Saving Django Model

2010-05-24 Thread Scott Gould
> My database and all of its tables are UTF8 encoded with UTF8 collation
> (DEFAULT CHARSET=utf8;)
> The data I am inputting is unicode
> (u'Save up to 25% on your online order of select HP LaserJet\x92s')
> 
>
> But when I try to save this data I get an error
> Incorrect string value: '\\xC2\\x92s' for column 'title' at row 1
>
> I assume I am missing something, but not sure what I am missing.

Your string is a unicode string (u'...') but you have UTF-8 encoded
text inside it. Unicode is not UTF-8; UTF-8 is a way to represent
unicode in ASCII. You should be able to fix it by either casting that
string to str(), or by having "real" unicode inside it (difficult to
say which is better without knowing how you're obtaining that string
to begin with).

Regards
Scott




Unicode Error when Saving Django Model

2010-05-23 Thread vjimw
I have been reading up on Unicode with Python and Django and I think I
have my code set to use UTF8 data when saving or updating an object
but I get an error on model.save()

My database and all of its tables are UTF8 encoded with UTF8 collation
(DEFAULT CHARSET=utf8;)
The data I am inputting is unicode
(u'Save up to 25% on your online order of select HP LaserJet\x92s')


But when I try to save this data I get an error
Incorrect string value: '\\xC2\\x92s' for column 'title' at row 1

I assume I am missing something, but not sure what I am missing.

Thanks!
