#559: BibUpload: Cannot bibupload file containing UTF-8 chars
------------------------+----------------------------
Reporter: grfavre | Owner:
Type: defect | Status: infoneeded_new
Priority: critical | Milestone: v1.0
Component: BibUpload | Version:
Resolution: | Keywords:
------------------------+----------------------------
Changes (by jcaffaro):
* status: new => infoneeded_new
Comment:
The exception really does seem to take place in BibRecord, which lives in ...
BibEdit ;-)
In any case the buggy.xml file works for me, both with:
{{{
$ ./bibupload -ri buggy.xml
}}}
and with:
{{{
>>> from invenio.bibrecord import create_records
>>> my_records = create_records(file('buggy.xml').read())
>>> print my_records[1][0]['245']
[([('a', 'Oxymoron, un tr\xc3\xa9sor de fiches de lecture et un atelier de
mutualisation des savoirs entre apprenants/chercheurs')], ' ', ' ', '',
8)]
}}}
both with 4suite and pyRXP. In Greg's case the BibUpload insertion fails
with both parsers. (We are both on Python 2.6.X. I am running the latest
Git master, while Greg is on RC0 AFAIK, but bibrecord.py does not seem
to have changed in between.)
Greg, what do you get as a result when running the second case above? Can
you retry after downloading the file attached to the ticket (in case the
encoding got changed in some way during the upload)?
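To rule out a changed encoding, one could check whether the downloaded bytes still decode as UTF-8. This is only a minimal sketch: the helper name `is_valid_utf8` is hypothetical and not part of Invenio, and the sample byte strings are made up for illustration.

```python
def is_valid_utf8(data):
    """Return True if the byte string `data` decodes cleanly as UTF-8."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

# "tresor" with an e-acute, once as UTF-8 and once as Latin-1 bytes.
utf8_sample = b'tr\xc3\xa9sor'
latin1_sample = b'tr\xe9sor'

utf8_result = is_valid_utf8(utf8_sample)
latin1_result = is_valid_utf8(latin1_sample)
```

For the real file, the bytes would be read in binary mode, e.g. `open('buggy.xml', 'rb').read()`, before being passed to the helper.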
This reminds me of a behaviour encountered with the minidom parser, which
by default returns unicode strings instead of encoded byte strings when
"printed", resulting in a similar issue.
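The minidom behaviour can be reproduced with a small sketch. The one-element document below is made up for illustration (it is not the ticket's buggy.xml), and whether BibUpload hits exactly this path is an assumption, not something confirmed here.

```python
from xml.dom import minidom

# Hypothetical document; the title contains a UTF-8 encoded e-acute.
xml_bytes = b'<?xml version="1.0" encoding="UTF-8"?><title>tr\xc3\xa9sor</title>'

doc = minidom.parseString(xml_bytes)
# minidom hands back a unicode string, not the original encoded bytes.
text = doc.documentElement.firstChild.data

# Encoding that string as ASCII fails on the non-ASCII character.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly as UTF-8 gives back the original byte sequence.
utf8_bytes = text.encode('utf-8')
```

Under Python 2, mixing such a unicode string with byte strings triggers an implicit conversion using the default encoding, which is where an 'ascii' default can raise this kind of error.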
BTW, what do you get with:
{{{
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
}}}
--
Ticket URL: <http://invenio-software.org/ticket/559#comment:2>
Invenio <http://invenio-software.org>