Sayth Renshaw wrote:

> On Tuesday, 20 December 2016 22:54:03 UTC+11, Sayth Renshaw wrote:
>> Hi
>>
>> I have been trying to get a script to work on Windows that works on Mint.
>> The key blocker has been UTF-8 errors, most of which I have solved.
>>
>> Now, however, for the last error I am trying to overcome, the solution
>> appears to be to use .decode('windows-1252') to correct an ASCII error.
>>
>> I am using lxml to read my content and decode is not supported; are there
>> any known ways to read with lxml and fix Unicode faults?
>>
>> The key part of my script is
>>
>> for content in roots:
>>     utf8_parser = etree.XMLParser(encoding='utf-8')
>>     fix_ascii = utf8_parser.decode('windows-1252')
>>     mytree = etree.fromstring(
>>         content.read().encode('utf-8'), parser=fix_ascii)
>>
>> Without the added .decode my code looks like
>>
>> for content in roots:
>>     utf8_parser = etree.XMLParser(encoding='utf-8')
>>     mytree = etree.fromstring(
>>         content.read().encode('utf-8'), parser=utf8_parser)
>>
>> However, doing it in such a fashion returns this error:
>>
>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0:
>> invalid start byte
>>
>> for which I found this SO answer:
>> http://stackoverflow.com/a/29217546/461887 but I cannot seem to implement
>> it with lxml.
>>
>> Ideas?
>>
>> Sayth
>
> Why is Windows so hard?
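A byte 0xFF at position 0 is worth a closer look before reaching for windows-1252: FF FE is the UTF-16 little-endian byte order mark (BOM). The sketch below assumes, and this is only an assumption since the thread never shows the files' first bytes, that the XML files were saved as UTF-16 with a BOM. It reproduces the reported error with standard-library code only; `sniff_bom`, `sample` and the `<race/>` document are hypothetical names used for illustration.

```python
import codecs

# Hypothetical sample: an XML document saved as UTF-16 little-endian with a BOM
# (an assumption about the real files, which the thread does not show).
text = '<?xml version="1.0" encoding="UTF-16"?><race/>'
sample = codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def sniff_bom(raw: bytes) -> str:
    """Return a codec name based on a leading byte order mark, or "unknown".

    Hypothetical diagnostic helper: it only inspects the BOM and never guesses.
    """
    # UTF-32 must be checked before UTF-16: the UTF-32-LE BOM begins with FF FE.
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, codec_name in boms:
        if raw.startswith(bom):
            return codec_name
    return "unknown"


# Decoding as UTF-8 fails on the very first byte, like the reported error.
try:
    sample.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = str(exc)

# Decoding as windows-1252 does not raise, but interleaves NUL characters.
mojibake = sample.decode("windows-1252")

# Decoding with the codec the BOM points to recovers the text (BOM consumed).
codec = sniff_bom(sample)
recovered = sample.decode(codec)
print(codec, utf8_error, repr(mojibake[:6]), recovered, sep="\n")
```

If the real files do start with FF FE, the windows-1252 route from the SO answer would silently produce garbage rather than fix anything; the better option is to give lxml the raw bytes and let it detect the encoding itself.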
I don't think this has anything to do with the OS. Your xml_data is probably
not what you think it is. Compare:

    $ python3
    Python 3.4.3 (default, Nov 17 2016, 01:08:31)
    [GCC 4.8.4] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> import lxml.etree
    >>> lxml.etree.parse(sys.stdout)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src/lxml/lxml.etree.c:69955)
      File "parser.pxi", line 1769, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:102257)
      File "parser.pxi", line 1789, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:102516)
      File "parser.pxi", line 1684, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:101442)
      File "parser.pxi", line 1134, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:97069)
      File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:91275)
      File "parser.pxi", line 679, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:92426)
      File "lxml.etree.pyx", line 327, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:10196)
      File "parser.pxi", line 373, in lxml.etree._FileReaderContext.copyToBuffer (src/lxml/lxml.etree.c:89083)
    io.UnsupportedOperation: not readable

That looks similar to what you get: sys.stdout is open for writing only, so
lxml fails as soon as it tries to read from it.

> Sort of running out of ideas, tried methods in the
> docs, SO, etc.
>
> Currently
>
> for xml_data in roots:
>     parser_xml = etree.XMLParser()
>     mytree = etree.parse(xml_data, parser_xml)
>
> Returns
>
> C:\Users\Sayth\Anaconda3\envs\race\python.exe
> C:/Users/Sayth/PycharmProjects/bs4race/race.py data/ -e *.xml
> Traceback (most recent call last):
>   File "C:/Users/Sayth/PycharmProjects/bs4race/race.py", line 100, in <module>
>     data_attr(rootObs)
>   File "C:/Users/Sayth/PycharmProjects/bs4race/race.py", line 55, in data_attr
>     mytree = etree.parse(xml_data, parser_xml)
>   File "src/lxml/lxml.etree.pyx", line 3427, in lxml.etree.parse (src\lxml\lxml.etree.c:81110)
>   File "src/lxml/parser.pxi", line 1832, in lxml.etree._parseDocument (src\lxml\lxml.etree.c:118109)
>   File "src/lxml/parser.pxi", line 1852, in lxml.etree._parseFilelikeDocument (src\lxml\lxml.etree.c:118392)
>   File "src/lxml/parser.pxi", line 1747, in lxml.etree._parseDocFromFilelike (src\lxml\lxml.etree.c:117180)
>   File "src/lxml/parser.pxi", line 1162, in lxml.etree._BaseParser._parseDocFromFilelike (src\lxml\lxml.etree.c:111907)
>   File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:105102)
>   File "src/lxml/parser.pxi", line 702, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:106769)
>   File "src/lxml/lxml.etree.pyx", line 324, in lxml.etree._ExceptionContext._raise_if_stored (src\lxml\lxml.etree.c:12074)
>   File "src/lxml/parser.pxi", line 373, in lxml.etree._FileReaderContext.copyToBuffer (src\lxml\lxml.etree.c:102431)
> io.UnsupportedOperation: read
>
> Process finished with exit code 1
>
> Thoughts?
>
> Sayth

-- 
https://mail.python.org/mailman/listinfo/python-list
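`io.UnsupportedOperation` comes from lxml calling `.read()` on an object that is not open for reading, which is exactly what the `sys.stdout` session reproduces. The sketch below assumes that `roots` was meant to be a collection of XML file paths (the thread does not show how it is built). It passes each file to `etree.parse` opened in binary mode (`'rb'`), so the parser sees the raw bytes and picks the encoding from the BOM and XML declaration, with no manual `.decode()`/`.encode('utf-8')` round trip. `parse_all` and the sample file names are hypothetical, and the import falls back to the standard library's `xml.etree.ElementTree` only so the sketch runs where lxml is not installed.

```python
import io
import tempfile
from pathlib import Path

try:
    from lxml import etree  # the library used in the thread
except ImportError:  # fallback so this sketch also runs without lxml
    import xml.etree.ElementTree as etree


def parse_all(paths):
    """Parse each XML file and return {path: root tag}.

    Hypothetical helper: files are opened in binary mode so the parser sees
    the raw bytes and detects UTF-8 or UTF-16 from the BOM / XML declaration.
    """
    results = {}
    for path in paths:
        with open(path, "rb") as xml_data:
            tree = etree.parse(xml_data)
        results[str(path)] = tree.getroot().tag
    return results


caught = None
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # The failure mode: a handle opened for writing cannot be read from.
    with open(folder / "out.xml", "wb") as write_only:
        try:
            write_only.read()
        except io.UnsupportedOperation as exc:
            caught = exc

    # Hypothetical sample files in two encodings; "utf-16" writes a BOM.
    (folder / "utf8.xml").write_bytes(
        b'<?xml version="1.0" encoding="UTF-8"?><race/>'
    )
    (folder / "utf16.xml").write_bytes(
        '<?xml version="1.0" encoding="UTF-16"?><meeting/>'.encode("utf-16")
    )
    parsed = parse_all(sorted(folder.glob("utf*.xml")))

print("not readable:", repr(caught))
print(parsed)
```

Applied to the original loop, this would amount to something like `mytree = etree.parse(path)` for each path (lxml's `parse` also accepts a filename directly), with no `encoding=` argument on the parser. If `roots` holds something else, printing `type(xml_data)` inside the loop is a quick way to see what lxml is actually being handed.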