On Sat, Feb 7, 2009 at 8:25 PM, redmonkey <michele.mem...@googlemail.com> wrote:

>
> Thank you very much. That solved it, gave me all the information I
> needed to not make the same mistake again, and taught me a quick way
> to check the encoding of strings in python.
>
> As it happens, in this case, the script that generates the external
> file is some commercial software, so I can't touch it. It all seems to
> work though.
>
> Thanks again,
>
> RM
>
> On Feb 8, 12:57 am, Karen Tracey <kmtra...@gmail.com> wrote:
> > On Sat, Feb 7, 2009 at 7:27 PM, redmonkey <michele.mem...@googlemail.com>
> > wrote:
> >
> > > Sure, here's a bit more info.
> >
> > > The external data is generated by a script and it describes a
> > > catalogue of lot items for an auction site I'm building. The format
> > > includes a lot number, a brief description of the lot for sale, and an
> > > estimate for the item. Each lot is separated in the file by a '$' with
> > > some whitespace. Here's a snippet:
> >
> > > $
> > >  292 A collection of wine bottle and trinket boxes
> > >     Est. 30-60
> > > $
> > >  293 A paper maché letter rack with painted foliate decoration and a
> > > C19th papier mache side chair and one other (a/f)
> > >     Est. 20-30
> > > $
> > >  294 A wall mirror with bevelled plate within gilt frame
> > >     Est. 40-60
> >
> > And this file is encoded in...?  It doesn't appear to be utf-8.  It may
> > be iso8859-1.
> >
> >  [snip]
> >
> > > And here's that handle_data_upload function (it's passed the uploaded
> > > file object):
> >
> > > def handle_data_upload(f, cat):
> > >    """
> > >    Creates and Adds lots to catalogue.
> >
> > >    """
> >
> > >    lot = re.compile(r'\s*(?P<lot_number>\d*) (?P<description>.*)
> > > \s*Est. (?P<min_estimate>\d*)-(?P<max_estimate>\d*)')
> > >    iterator = lot.finditer(f.read())
> > >    f.close()
> >
> > >    for item in iterator:
> > >        if not item.group('description') == "end":
> > >            Lot.objects.create(
> > >                lot_number=int(item.group('lot_number')),
> > >                description=item.group('description').strip(),
> >
> > Here you are setting description to a bytestring read from your file.
> > When you don't pass Unicode to Django, Django will convert to unicode
> > assuming a utf-8 encoding, which will cause the error you are getting if
> > the file is not in fact using utf-8 as the encoding.  I suspect your file
> > is encoded in iso8859-1, in which case changing this line to:
> >
> > description=unicode(item.group('description').strip(), 'iso8859-1')
> >
> > will probably fix the problem.  But you should verify that this is the
> > encoding used by whatever is creating the file and, if possible, you might
> > want to change whatever is creating the file to use utf-8 instead
> > (provided these files aren't fed into other processes that might get
> > confused by the change).
> >
> > [snip]
> >
> > > File "/Library/Python/2.5/site-packages/django/utils/encoding.py" in
> > > force_unicode
> > >  70.         raise DjangoUnicodeDecodeError(s, *e.args)
> >
> > > Exception Type: DjangoUnicodeDecodeError at /admin/catalogue/catalogue/
> > > add/
> > > Exception Value: 'utf8' codec can't decode bytes in position 12-14:
> > > invalid data. You passed in 'A paper mach\xe9 letter rack with painted
> > > foliate decoration and a C19th papier mache side chair and one other
> > > (a/f)' (<type 'str'>)
> >
> > This is why I think your file is using iso8859-1:
> >
> > Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
> > [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> s = 'A paper mach\xe9 letter rack'
> > >>> print unicode(s, 'utf-8')
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 12-14:
> > invalid data
> > >>> print unicode(s, 'iso8859-1')
> > A paper maché letter rack
> >
> > The one that causes the error is what Django does when handed a
> > bytestring, and matches what you are seeing.  Using iso8859-1 as the
> > encoding makes the value convert and print properly (plus it's a popular
> > encoding).
> >
> > > I hope that clears a few things up.
> >
> > > Is this an admin thing? (http://www.factory-h.com/blog/?p=56)
> >
> > No, in that blog post the user had a broken __unicode__ method in their
> > model, it wasn't actually an admin problem.
> >
> > Karen
> >
>
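Karen's interpreter check quoted above can be wrapped in a small helper. This is only a sketch: `try_decode` is a hypothetical name, not part of Django or the standard library, and it is written in Python 3 syntax, where `data.decode(enc)` plays the role of `unicode(s, enc)` from the Python 2.5 session. Note that iso8859-1 maps every byte to a character and so never fails to decode. A success only shows the data is not valid utf-8, which is why the producer's real encoding still needs to be verified.

```python
# The byte string from the traceback: 0xE9 is "é" in ISO-8859-1,
# but it does not start a valid UTF-8 sequence here.
raw = b'A paper mach\xe9 letter rack'


def try_decode(data, encodings=('utf-8', 'iso8859-1')):
    """Return (text, encoding) for the first encoding that decodes data.

    Hypothetical helper for illustration only.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError('none of %r could decode the data' % (encodings,))


text, used = try_decode(raw)
print(used)
```

Here utf-8 is tried first and rejected, so the helper falls back to iso8859-1, matching the behaviour seen in the interpreter session.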
FYI, when you want to open a file with a specific encoding but deal with its
contents as unicode, the Python codecs library is great:
http://docs.python.org/library/codecs.html#codecs.open
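A minimal sketch of that approach, assuming the catalogue file really is iso8859-1 (the encoding still needs confirming). The temporary file, the sample lot, and the slightly simplified regex are illustrative stand-ins for the uploaded file and the pattern in handle_data_upload, not the original code.

```python
import codecs
import os
import re
import tempfile

# Write a small ISO-8859-1 sample so the sketch is self-contained.
sample = u'$\n 293 A paper mach\xe9 letter rack\n    Est. 20-30\n'
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(sample.encode('iso8859-1'))

# codecs.open decodes while reading, so read() returns unicode text
# rather than a bytestring.
with codecs.open(path, 'r', encoding='iso8859-1') as f:
    data = f.read()
os.remove(path)

# Simplified variant of the lot pattern from the thread (assumption).
lot = re.compile(
    r'\s*(?P<lot_number>\d+) (?P<description>.*)\n'
    r'\s*Est\. (?P<min_estimate>\d+)-(?P<max_estimate>\d+)'
)
lots = [m.groupdict() for m in lot.finditer(data)]
print(len(lots))
```

Because the text is already unicode when the regex runs, each captured description can be handed to the model without a separate decode step. On Python 3 the built-in open(path, encoding='iso8859-1') does the same job.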

Alex

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." --Voltaire
"The people's good is the highest law."--Cicero

