Hi 

I have been trying to get a script that works on Mint to also work on Windows. 
The key blocker has been UTF-8 errors, most of which I have solved.

For the last remaining error, however, the solution appears to be to use 
.decode('windows-1252') to correct an ASCII error.

I am using lxml to read my content, and decode is not supported there. Are 
there any known ways to read with lxml and fix Unicode faults?

The key part of my script is:

        for content in roots:
            utf8_parser = etree.XMLParser(encoding='utf-8')
            fix_ascii = utf8_parser.decode('windows-1252')
            mytree = etree.fromstring(
                content.read().encode('utf-8'), parser=fix_ascii)

Without the added .decode, my code looks like this:

        for content in roots:
            utf8_parser = etree.XMLParser(encoding='utf-8')
            mytree = etree.fromstring(
                content.read().encode('utf-8'), parser=utf8_parser)

However, doing it this way returns this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I found this SO answer for it, http://stackoverflow.com/a/29217546/461887, but 
cannot seem to implement it with lxml.
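
One possible direction, sketched under assumptions rather than confirmed on these files: byte 0xff at position 0 is often the first byte of a UTF-16 byte order mark, so the input may not be UTF-8 at all. The sketch below assumes the inputs can be opened by path. The helper name parse_xml_file and the sample document are hypothetical and not part of my script. It reads the file in binary mode and hands the raw bytes to etree.fromstring, so that the parser, not Python, works out the encoding. The fallback to the standard library's ElementTree is only there so the sketch runs where lxml is not installed.

```python
import os
import tempfile

try:
    from lxml import etree  # the library used in the script above
except ImportError:
    # Assumption: fall back to the stdlib so the sketch still runs without lxml.
    import xml.etree.ElementTree as etree


def parse_xml_file(path):
    """Hypothetical helper: parse an XML file without decoding it first."""
    # Binary mode means Python never tries to decode the bytes itself;
    # the XML parser detects the encoding from the byte order mark or
    # from the XML declaration.
    with open(path, 'rb') as handle:
        return etree.fromstring(handle.read())


# Illustrative sample only: a small UTF-16 document with a non-ASCII attribute.
xml_text = '<?xml version="1.0" encoding="utf-16"?><meeting venue="Caf\u00e9"/>'
raw = xml_text.encode('utf-16')  # the encoded bytes begin with a byte order mark

with tempfile.NamedTemporaryFile(suffix='.xml', delete=False) as tmp:
    tmp.write(raw)
    sample_path = tmp.name

try:
    root = parse_xml_file(sample_path)
    venue = root.get('venue')
finally:
    os.remove(sample_path)  # clean up the temporary sample file
```

If the real files turn out to be windows-1252 rather than UTF-16, the same binary read should still apply, provided the XML declaration names that encoding.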

Ideas?

Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list
