Re: external non-ascii character is breaking my script
On Sat, Feb 7, 2009 at 8:25 PM, redmonkey wrote:
>
> Thank you very much. That solved it, gave me all the information I
> needed to not make the same mistake again, and taught me a quick way
> to check the encoding of strings in python.
>
> As it happens, in this case, the script that generates the external
> file is some commercial software, so I can't touch it. It all seems to
> work though.
>
> Thanks again,
>
> RM
>
> [snip]

FYI, when you want to open a file with a specific encoding but deal with it as unicode, the Python codecs library is great: http://docs.python.org/library/codecs.html#codecs.open

Alex

--
"I disapprove of what you say, but I will defend to the death your right to say it." --Voltaire
"The people's good is the highest law." --Cicero

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe
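As a rough illustration of the codecs.open suggestion above, here is a minimal sketch. It is written for Python 3 (the thread itself uses Python 2.5), and the helper name `read_text`, the sample text, and the temporary file are all made up for the example. It assumes the file really is ISO-8859-1, which the thread suspects but does not confirm.

```python
import codecs
import os
import tempfile

# Hypothetical sample: one lot entry containing a non-ASCII character.
SAMPLE = u"293 A paper mach\u00e9 letter rack\nEst. 20-30\n"


def read_text(path, encoding="iso8859-1"):
    """Return the file's contents as Unicode text, decoded with `encoding`.

    The default encoding is an assumption taken from the thread's guess.
    """
    with codecs.open(path, "r", encoding=encoding) as f:
        return f.read()


# Write a throwaway ISO-8859-1 file so the example is self-contained.
fd, path = tempfile.mkstemp()
os.write(fd, SAMPLE.encode("iso8859-1"))
os.close(fd)

text = read_text(path)
os.remove(path)
```

Because the text is decoded at the point where the file is read, everything downstream only ever sees Unicode. In Python 3 the built-in `open(path, encoding=...)` does the same job.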
Re: external non-ascii character is breaking my script
Thank you very much. That solved it, gave me all the information I needed to not make the same mistake again, and taught me a quick way to check the encoding of strings in Python.

As it happens, in this case, the script that generates the external file is some commercial software, so I can't touch it. It all seems to work though.

Thanks again,

RM

On Feb 8, 12:57 am, Karen Tracey wrote:
> [snip]
> I suspect your file is encoded in iso8859-1, in which case changing this line to:
>
>     description=unicode(item.group('description').strip(), 'iso8859-1')
>
> Will probably fix the problem.
> [snip]
Re: external non-ascii character is breaking my script
On Sat, Feb 7, 2009 at 7:27 PM, redmonkey wrote:
>
> Sure, here's a bit more info.
>
> The external data is generated by a script and it describes a
> catalogue of lot items for an auction site I'm building. The format
> includes a lot number, a brief description of the lot for sale, and an
> estimate for the item. Each lot is separated in the file by a '$' with
> some whitespace. Here's a snippet:
>
> $
> 292 A collection of wine bottle and trinket boxes
> Est. 30-60
> $
> 293 A paper maché letter rack with painted foliate decoration and a
> C19th papier mache side chair and one other (a/f)
> Est. 20-30
> $
> 294 A wall mirror with bevelled plate within gilt frame
> Est. 40-60
>

And this file is encoded in...? It doesn't appear to be utf-8. It may be iso8859-1.

[snip]

> And here's that handle_data_upload function (it's passed the uploaded
> file object):
>
> def handle_data_upload(f, cat):
>     """
>     Creates and Adds lots to catalogue.
>
>     """
>
>     lot = re.compile(r'\s*(?P<lot_number>\d*) (?P<description>.*)\s*Est. (?P<min_estimate>\d*)-(?P<max_estimate>\d*)')
>     iterator = lot.finditer(f.read())
>     f.close()
>
>     for item in iterator:
>         if not item.group('description') == "end":
>             Lot.objects.create(
>                 lot_number=int(item.group('lot_number')),
>                 description=item.group('description').strip(),

Here you are setting description to a bytestring read from your file. When you don't pass Unicode to Django, Django converts the value to unicode assuming a utf-8 encoding, which causes the error you are getting if the file is not in fact encoded in utf-8. I suspect your file is encoded in iso8859-1, in which case changing this line to:

    description=unicode(item.group('description').strip(), 'iso8859-1')

will probably fix the problem. But you should verify that that is the encoding used by whatever is creating the file, and if possible you might want to change whatever is creating the file to use utf-8 instead (provided these files aren't fed into other processes that might get confused by the change of encoding).

[snip]

> File "/Library/Python/2.5/site-packages/django/utils/encoding.py" in
> force_unicode
>   70. raise DjangoUnicodeDecodeError(s, *e.args)
>
> Exception Type: DjangoUnicodeDecodeError at /admin/catalogue/catalogue/add/
> Exception Value: 'utf8' codec can't decode bytes in position 12-14:
> invalid data. You passed in 'A paper mach\xe9 letter rack with painted
> foliate decoration and a C19th papier mache side chair and one other
> (a/f)' (<type 'str'>)

This is why I think your file is using iso8859-1:

Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = 'A paper mach\xe9 letter rack'
>>> print unicode(s, 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 12-14: invalid data
>>> print unicode(s, 'iso8859-1')
A paper maché letter rack
>>>

The one that causes the error is what Django does when handed a bytestring, and it matches what you are seeing. Using iso8859-1 as the encoding makes the value convert and print properly (plus it's a popular encoding).

>
> I hope that clears a few things up.
>
> Is this an admin thing? (http://www.factory-h.com/blog/?p=56)
>

No, in that blog post the user had a broken __unicode__ method in their model; it wasn't actually an admin problem.

Karen
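The interpreter check above can be wrapped in a small helper. This is only a sketch, written for Python 3, where `raw.decode(encoding)` plays the role of the thread's `unicode(s, encoding)`. The function name `first_working_encoding` and the candidate list are made up for illustration.

```python
def first_working_encoding(raw, candidates=("utf-8", "iso8859-1")):
    """Return (encoding, text) for the first candidate that decodes `raw`.

    The candidate list is an assumption; order matters, because
    iso8859-1 accepts every possible byte and so can never fail.
    """
    for encoding in candidates:
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# The same bytes as in the interpreter session above.
raw = b"A paper mach\xe9 letter rack"
encoding, text = first_working_encoding(raw)
```

A successful iso8859-1 decode only shows that the bytes are not valid utf-8. It does not prove the file was written as iso8859-1, which is why checking with whatever produces the file is still worthwhile.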
Re: external non-ascii character is breaking my script
Sure, here's a bit more info.

The external data is generated by a script and it describes a catalogue of lot items for an auction site I'm building. The format includes a lot number, a brief description of the lot for sale, and an estimate for the item. Each lot is separated in the file by a '$' with some whitespace. Here's a snippet:

$
292 A collection of wine bottle and trinket boxes
Est. 30-60
$
293 A paper maché letter rack with painted foliate decoration and a C19th papier mache side chair and one other (a/f)
Est. 20-30
$
294 A wall mirror with bevelled plate within gilt frame
Est. 40-60

I've got a regular expression to extract all the bits I need from the external file:

    lot = re.compile(r'\s*(?P<lot_number>\d*) (?P<description>.*)\s*Est. (?P<min_estimate>\d*)-(?P<max_estimate>\d*)')

This information is extracted when this file is submitted to an 'add new catalogue' form in Django's admin interface:

    class CatalogueAdmin(admin.ModelAdmin):
        # ...
        def save_model(self, request, obj, form, change):
            """
            Check for an attached data file, if instance is being created,
            also creates models within data file.
            """
            obj.save()
            if not change and form.cleaned_data['data']:
                # Creating a new catalogue
                handle_data_upload(form.cleaned_data['data'], obj)

And here's that handle_data_upload function (it's passed the uploaded file object):

    def handle_data_upload(f, cat):
        """
        Creates and Adds lots to catalogue.

        """

        lot = re.compile(r'\s*(?P<lot_number>\d*) (?P<description>.*)\s*Est. (?P<min_estimate>\d*)-(?P<max_estimate>\d*)')
        iterator = lot.finditer(f.read())
        f.close()

        for item in iterator:
            if not item.group('description') == "end":
                Lot.objects.create(
                    lot_number=int(item.group('lot_number')),
                    description=item.group('description').strip(),
                    min_estimate=Decimal(item.group('min_estimate')),
                    max_estimate=Decimal(item.group('max_estimate')),
                    catalogue=cat
                )

Again, this all seems to work fine until Django comes across the "é" in the external data, when it decides to throw this error:

Environment:

Request Method: POST
Request URL: http://192.168.0.2:8000/admin/catalogue/catalogue/add/
Django Version: 1.1 pre-alpha SVN-9646
Python Version: 2.5.1
Installed Applications:
['django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.sites',
 'django.contrib.admin',
 'auction.catalogue',
 'auction.mailouts',
 'auction.users',
 'auction.bidding',
 'django.contrib.flatpages',
 'profiles',
 'registration',
 'django_extensions',
 'tinymce',
 'auction.lot-alerts']
Installed Middleware:
('django.middleware.common.CommonMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.flatpages.middleware.FlatpageFallbackMiddleware')

Traceback:
File "/Library/Python/2.5/site-packages/django/core/handlers/base.py" in get_response
  86. response = callback(request, *callback_args, **callback_kwargs)
File "/Library/Python/2.5/site-packages/django/contrib/admin/sites.py" in root
  157. return self.model_page(request, *url.split('/', 2))
File "/Library/Python/2.5/site-packages/django/views/decorators/cache.py" in _wrapped_view_func
  44. response = view_func(request, *args, **kwargs)
File "/Library/Python/2.5/site-packages/django/contrib/admin/sites.py" in model_page
  176. return admin_obj(request, rest_of_url)
File "/Library/Python/2.5/site-packages/django/contrib/admin/options.py" in __call__
  191. return self.add_view(request)
File "/Library/Python/2.5/site-packages/django/db/transaction.py" in _commit_on_success
  238. res = func(*args, **kw)
File "/Library/Python/2.5/site-packages/django/contrib/admin/options.py" in add_view
  494. self.save_model(request, new_object, form, change=False)
File "/Library/Python/2.5/site-packages/auction/catalogue/admin.py" in save_model
  34. handle_data_upload(form.cleaned_data['data'], obj)
File "/Library/Python/2.5/site-packages/auction/catalogue/utils.py" in handle_data_upload
  27. catalogue=cat
File "/Library/Python/2.5/site-packages/django/db/models/manager.py" in create
  99. return self.get_query_set().create(**kwargs)
File "/Library/Python/2.5/site-packages/django/db/models/query.py" in create
  319. obj.save(force_insert=True)
File "/Library/Python/2.5/site-packages/auction/catalogue/models.py" in save
  170. super(Lot, self).save(kwargs)
File "/Library/Python/2.5/site-packages/django/db/models/base.py" in save
  328. self.save_base(force_insert=force_insert, force_update=force_update)
File "/Library/Python/2.5/site-packages/django/db/models/base.py" in save_base
  400. result = manager._insert(values, return_id=update_pk)
File "/Library/Python/2.5/site-packages/d
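For reference, the parsing step described in this message can be tried on its own, without Django. The following is a sketch only, written for Python 3. The names `LOT`, `RAW` and `parse_lots` are hypothetical, plain dicts stand in for `Lot.objects.create(...)`, the pattern is a slightly tightened variant of the one above (`\d+` instead of `\d*`, and an escaped dot after "Est"), and the bytes are decoded as iso8859-1, which is an assumption about the file.

```python
import re
from decimal import Decimal

# Variant of the thread's pattern, applied to decoded text rather than bytes.
LOT = re.compile(
    r"\s*(?P<lot_number>\d+) (?P<description>.*)\s*"
    r"Est\. (?P<min_estimate>\d+)-(?P<max_estimate>\d+)"
)

# Hypothetical upload contents: two lots, encoded as ISO-8859-1 bytes.
RAW = (
    b"$\n292 A collection of wine bottle and trinket boxes\nEst. 30-60\n"
    b"$\n293 A paper mach\xe9 letter rack with painted foliate decoration\n"
    b"Est. 20-30\n"
)


def parse_lots(raw, encoding="iso8859-1"):
    """Decode the uploaded bytes once, then extract one dict per lot."""
    text = raw.decode(encoding)  # assumed encoding, see the note above
    lots = []
    for item in LOT.finditer(text):
        lots.append({
            "lot_number": int(item.group("lot_number")),
            "description": item.group("description").strip(),
            "min_estimate": Decimal(item.group("min_estimate")),
            "max_estimate": Decimal(item.group("max_estimate")),
        })
    return lots


lots = parse_lots(RAW)
```

Decoding before matching means every captured group is already Unicode text, so nothing needs converting field by field.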
Re: external non-ascii character is breaking my script
On Sat, Feb 7, 2009 at 11:56 AM, redmonkey wrote:
>
> Hey everyone,
>
> I'm trying to create django model instances from data stored in a flat
> text file but I get `force_unicode` errors when the script comes
> across one of the data items containing the "é" character.
>
> Can anyone explain to me what this problem is and how I can fix it? I
> can't really get my head around unicode, ascii and UTF-8 stuff.
>

If you expect anyone on the list to help, you really need to share some snippets of the code you are using to read the data from the file and create Django model instances, plus the full traceback you get when it runs into trouble. Django certainly handles unicode data in models, so there's something specific about what you are doing that is causing a problem.

Karen
external non-ascii character is breaking my script
Hey everyone,

I'm trying to create Django model instances from data stored in a flat text file, but I get `force_unicode` errors when the script comes across one of the data items containing the "é" character.

Can anyone explain to me what this problem is and how I can fix it? I can't really get my head around unicode, ascii and UTF-8 stuff.

Thanks,

RedMonkey