Re: external non-ascii character is breaking my script

2009-02-07 Thread Alex Gaynor
On Sat, Feb 7, 2009 at 8:25 PM, redmonkey wrote:

>
> Thank you very much. That solved it, gave me all the information I
> needed to not make the same mistake again, and taught me a quick way
> to check the encoding of strings in python.
>
> As it happens, in this case, the script that generates the external
> file is some commercial software, so I can't touch it. It all seems to
> work though.
>
> Thanks again,
>
> RM
>
FYI, when you want to open a file with a specific encoding but work with
its contents as unicode, the Python codecs library is great:
http://docs.python.org/library/codecs.html#codecs.open
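
As a rough illustration of codecs.open, here is a minimal sketch that writes a one-line iso8859-1 file and reads it back as unicode. The file name catalogue.txt and its contents are made up for the example, and it is written for current Python rather than the 2.5 used in this thread (where the with statement needs a __future__ import).

```python
import codecs
import os
import tempfile

# Hypothetical sample file: one lot line stored as ISO-8859-1 bytes,
# where the e-acute is the single byte 0xE9.
path = os.path.join(tempfile.mkdtemp(), 'catalogue.txt')
with open(path, 'wb') as raw:
    raw.write(b' 293 A paper mach\xe9 letter rack\n')

# codecs.open decodes while reading, so read() returns unicode text
# rather than a bytestring.
with codecs.open(path, 'r', encoding='iso8859-1') as f:
    text = f.read()

description = text.strip()
```

Because the text is already unicode by the time it leaves read(), nothing downstream has to guess at the encoding.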

Alex

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." --Voltaire
"The people's good is the highest law."--Cicero

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe 

Re: external non-ascii character is breaking my script

2009-02-07 Thread redmonkey

Thank you very much. That solved it, gave me all the information I
needed to not make the same mistake again, and taught me a quick way
to check the encoding of strings in python.

As it happens, in this case, the script that generates the external
file is some commercial software, so I can't touch it. It all seems to
work though.

Thanks again,

RM




Re: external non-ascii character is breaking my script

2009-02-07 Thread Karen Tracey
On Sat, Feb 7, 2009 at 7:27 PM, redmonkey wrote:

>
> Sure, here's a bit more info.
>
> The external data is generated by a script and it describes a
> catalogue of lot items for an auction site I'm building. The format
> includes a lot number, a brief description of the lot for sale, and an
> estimate for the item. Each lot is separated in the file by a '$' with
> some whitespace. Here's a snippet:
>
> $
>  292 A collection of wine bottle and trinket boxes
> Est. 30-60
> $
>  293 A paper maché letter rack with painted foliate decoration and a
> C19th papier mache side chair and one other (a/f)
> Est. 20-30
> $
>  294 A wall mirror with bevelled plate within gilt frame
> Est. 40-60
>

And this file is encoded in...?  It doesn't appear to be utf-8.  It may be
iso8859-1.

 [snip]

>
> And here's that handle_data_upload function (it's passed the uploaded
> file object):
>
> def handle_data_upload(f, cat):
>     """
>     Creates and Adds lots to catalogue.
>
>     """
>
>     lot = re.compile(r'\s*(?P<lot_number>\d*) (?P<description>.*)'
>                      r'\s*Est. (?P<min_estimate>\d*)-(?P<max_estimate>\d*)')
>     iterator = lot.finditer(f.read())
>     f.close()
>
>     for item in iterator:
>         if not item.group('description') == "end":
>             Lot.objects.create(
>                 lot_number=int(item.group('lot_number')),
>                 description=item.group('description').strip(),


Here you are setting description to a bytestring read from your file.  When
you don't pass Unicode to Django, Django will convert to unicode assuming a
utf-8 encoding, which will cause the error you are getting if the file is
not in fact using utf-8 as the encoding.  I suspect your file is encoded in
iso8859-1, in which case changing this line to:

description=unicode(item.group('description').strip(), 'iso8859-1')

will probably fix the problem.  But you should verify that this is the
encoding used by whatever is creating the file and, if possible, change
whatever is creating the file to use utf-8 instead (provided these files
aren't fed into other processes that might get confused by the change in
encoding).
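
A sketch of one way to apply this, assuming the upload really is iso8859-1: decode the raw bytes once, up front, so every field the regular expression extracts is already unicode. This is a variation on the one-line change above (which decodes only the description), not a replacement for it. The name parse_lots and the sample data are hypothetical, Django is left out so the snippet runs on its own, and it uses bytes.decode() as on current Python instead of the unicode(s, 'iso8859-1') spelling from Python 2.

```python
import re
from decimal import Decimal

# Same pattern as in the thread, split over two lines for readability.
LOT_RE = re.compile(
    r'\s*(?P<lot_number>\d*) (?P<description>.*)'
    r'\s*Est. (?P<min_estimate>\d*)-(?P<max_estimate>\d*)'
)


def parse_lots(raw_bytes, encoding='iso8859-1'):
    """Decode the uploaded bytes, then extract one dict per lot."""
    text = raw_bytes.decode(encoding)  # assumption: caller knows the encoding
    lots = []
    for item in LOT_RE.finditer(text):
        if item.group('description') != "end":
            lots.append({
                'lot_number': int(item.group('lot_number')),
                'description': item.group('description').strip(),
                'min_estimate': Decimal(item.group('min_estimate')),
                'max_estimate': Decimal(item.group('max_estimate')),
            })
    return lots


# Hypothetical sample: each description sits on a single line, and the
# e-acute is the single byte 0xE9, as in the error message.
sample = (
    b"$\n 292 A collection of wine bottle and trinket boxes\n"
    b"     Est. 30-60\n"
    b"$\n 293 A paper mach\xe9 letter rack\n"
    b"     Est. 20-30\n"
)
lots = parse_lots(sample)
```

Each dict could then be passed to Lot.objects.create() along with the catalogue; since the description is already unicode, Django has no bytestring to decode.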

[snip]

> File "/Library/Python/2.5/site-packages/django/utils/encoding.py" in
> force_unicode
>  70. raise DjangoUnicodeDecodeError(s, *e.args)
>
> Exception Type: DjangoUnicodeDecodeError at /admin/catalogue/catalogue/
> add/
> Exception Value: 'utf8' codec can't decode bytes in position 12-14:
> invalid data. You passed in 'A paper mach\xe9 letter rack with painted
> foliate decoration and a C19th papier mache side chair and one other
> (a/f)' ()



This is why I think your file is using iso8859-1:

Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = 'A paper mach\xe9 letter rack'
>>> print unicode(s, 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 12-14:
invalid data
>>> print unicode(s, 'iso8859-1')
A paper maché letter rack
>>>

The first call, the one that raises the error, is what Django does when
handed a bytestring, and matches what you are seeing.  Using iso8859-1 as
the encoding makes the value convert and print properly (plus it's a
popular encoding).
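
The same check can be wrapped in a small helper. This is only a sketch: decode_with_fallback is a hypothetical name and the list of candidate encodings is an assumption. Note that iso8859-1 maps every possible byte to a character, so it never raises; a successful fallback shows the bytes are not valid utf-8, not that iso8859-1 is definitely the right answer.

```python
def decode_with_fallback(raw, encodings=('utf-8', 'iso8859-1')):
    """Return (text, encoding) for the first encoding that decodes raw."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError('none of the candidate encodings could decode the data')


# The bytestring from the error message: 0xE9 followed by a space is not
# a valid utf-8 sequence, so the helper falls through to iso8859-1.
text, used = decode_with_fallback(b'A paper mach\xe9 letter rack')
```

Trying utf-8 first is the safer order, since text containing non-ascii characters in another encoding rarely happens to be valid utf-8.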


>
> I hope that clears a few things up.
>
> Is this an admin thing? (http://www.factory-h.com/blog/?p=56)
>

No, in that blog post the user had a broken __unicode__ method in their
model; it wasn't actually an admin problem.

Karen




Re: external non-ascii character is breaking my script

2009-02-07 Thread redmonkey

Sure, here's a bit more info.

The external data is generated by a script and it describes a
catalogue of lot items for an auction site I'm building. The format
includes a lot number, a brief description of the lot for sale, and an
estimate for the item. Each lot is separated in the file by a '$' with
some whitespace. Here's a snippet:

$
 292 A collection of wine bottle and trinket boxes
 Est. 30-60
$
 293 A paper maché letter rack with painted foliate decoration and a
C19th papier mache side chair and one other (a/f)
 Est. 20-30
$
 294 A wall mirror with bevelled plate within gilt frame
 Est. 40-60

I've got a regular expression to extract out all the bits I need from
the external file:

lot = re.compile(r'\s*(?P<lot_number>\d*) (?P<description>.*)'
                 r'\s*Est. (?P<min_estimate>\d*)-(?P<max_estimate>\d*)')

This information is extracted when this file is submitted to an 'add
new catalogue' form in Django's admin interface:

class CatalogueAdmin(admin.ModelAdmin):
    # ...
    def save_model(self, request, obj, form, change):
        """
        Check for an attached data file, if instance is being created,
        also creates models within data file.
        """
        obj.save()
        if not change and form.cleaned_data['data']:
            # Creating a new catalogue
            handle_data_upload(form.cleaned_data['data'], obj)

And here's that handle_data_upload function (it's passed the uploaded
file object):

def handle_data_upload(f, cat):
    """
    Creates and Adds lots to catalogue.

    """

    lot = re.compile(r'\s*(?P<lot_number>\d*) (?P<description>.*)'
                     r'\s*Est. (?P<min_estimate>\d*)-(?P<max_estimate>\d*)')
    iterator = lot.finditer(f.read())
    f.close()

    for item in iterator:
        if not item.group('description') == "end":
            Lot.objects.create(
                lot_number=int(item.group('lot_number')),
                description=item.group('description').strip(),
                min_estimate=Decimal(item.group('min_estimate')),
                max_estimate=Decimal(item.group('max_estimate')),
                catalogue=cat
            )

Again, this all seems to work fine until Django comes across the "é" in
the external data, at which point it throws this error:

Environment:

Request Method: POST
Request URL: http://192.168.0.2:8000/admin/catalogue/catalogue/add/
Django Version: 1.1 pre-alpha SVN-9646
Python Version: 2.5.1
Installed Applications:
['django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.sites',
 'django.contrib.admin',
 'auction.catalogue',
 'auction.mailouts',
 'auction.users',
 'auction.bidding',
 'django.contrib.flatpages',
 'profiles',
 'registration',
 'django_extensions',
 'tinymce',
 'auction.lot-alerts']
Installed Middleware:
('django.middleware.common.CommonMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.flatpages.middleware.FlatpageFallbackMiddleware')


Traceback:
File "/Library/Python/2.5/site-packages/django/core/handlers/base.py"
in get_response
  86. response = callback(request, *callback_args,
**callback_kwargs)
File "/Library/Python/2.5/site-packages/django/contrib/admin/sites.py"
in root
  157. return self.model_page(request, *url.split('/',
2))
File "/Library/Python/2.5/site-packages/django/views/decorators/
cache.py" in _wrapped_view_func
  44. response = view_func(request, *args, **kwargs)
File "/Library/Python/2.5/site-packages/django/contrib/admin/sites.py"
in model_page
  176. return admin_obj(request, rest_of_url)
File "/Library/Python/2.5/site-packages/django/contrib/admin/
options.py" in __call__
  191. return self.add_view(request)
File "/Library/Python/2.5/site-packages/django/db/transaction.py" in
_commit_on_success
  238. res = func(*args, **kw)
File "/Library/Python/2.5/site-packages/django/contrib/admin/
options.py" in add_view
  494. self.save_model(request, new_object, form,
change=False)
File "/Library/Python/2.5/site-packages/auction/catalogue/admin.py" in
save_model
  34. handle_data_upload(form.cleaned_data['data'], obj)
File "/Library/Python/2.5/site-packages/auction/catalogue/utils.py" in
handle_data_upload
  27. catalogue=cat
File "/Library/Python/2.5/site-packages/django/db/models/manager.py"
in create
  99. return self.get_query_set().create(**kwargs)
File "/Library/Python/2.5/site-packages/django/db/models/query.py" in
create
  319. obj.save(force_insert=True)
File "/Library/Python/2.5/site-packages/auction/catalogue/models.py"
in save
  170. super(Lot, self).save(kwargs)
File "/Library/Python/2.5/site-packages/django/db/models/base.py" in
save
  328. self.save_base(force_insert=force_insert,
force_update=force_update)
File "/Library/Python/2.5/site-packages/django/db/models/base.py" in
save_base
  400. result = manager._insert(values,
return_id=update_pk)
File "/Library/Python/2.5/site-packages/d

Re: external non-ascii character is breaking my script

2009-02-07 Thread Karen Tracey
On Sat, Feb 7, 2009 at 11:56 AM, redmonkey wrote:

>
> Hey everyone,
>
> I'm trying to create django model instances from data stored in a flat
> text file but I get `force_unicode` errors when the script comes
> across one of the data items containing the "é" character.
>
> Can anyone explain to me what this problem is and how I can fix it? I
> can't really get my head around unicode, ascii and UTF-8 stuff.
>
>
If you expect anyone on the list to help, you really need to share some
snippets of the code you are using to read the data from the file and create
Django model instances, plus the full traceback you get when it runs into
trouble.  Django certainly handles unicode data in models, so there's
something specific about what you are doing that is causing a problem.

Karen




external non-ascii character is breaking my script

2009-02-07 Thread redmonkey

Hey everyone,

I'm trying to create django model instances from data stored in a flat
text file but I get `force_unicode` errors when the script comes
across one of the data items containing the "é" character.

Can anyone explain to me what this problem is and how I can fix it? I
can't really get my head around unicode, ascii and UTF-8 stuff.

Thanks,

RedMonkey