On Tuesday, 20 December 2016 22:54:03 UTC+11, Sayth Renshaw wrote:
> Hi
>
> I have been trying to get a script to work on windows that works on mint. The
> key blocker has been utf8 errors, most of which I have solved.
>
> Now however the last error I am trying to overcome, the solution appears to
> be to use the .decode('windows-1252') to correct an ascii error.
>
> I am using lxml to read my content and decode is not supported are there any
> known ways to read with lxml and fix unicode faults?
>
> The key part of my script is
>
> for content in roots:
>     utf8_parser = etree.XMLParser(encoding='utf-8')
>     fix_ascii = utf8_parser.decode('windows-1252')
>     mytree = etree.fromstring(
>         content.read().encode('utf-8'), parser=fix_ascii)
>
> Without the added .decode my code looks like
>
> for content in roots:
>     utf8_parser = etree.XMLParser(encoding='utf-8')
>     mytree = etree.fromstring(
>         content.read().encode('utf-8'), parser=utf8_parser)
>
> However doing it in such a fashion returns this error:
>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0:
> invalid start byte
> Which I found this SO for http://stackoverflow.com/a/29217546/461887 but
> cannot seem to implement with lxml.
>
> Ideas?
>
> Sayth

Why is Windows so hard? I am running out of ideas; I have tried the methods in the docs, on SO, etc.

Currently

for xml_data in roots:
    parser_xml = etree.XMLParser()
    mytree = etree.parse(xml_data, parser_xml)

Returns

C:\Users\Sayth\Anaconda3\envs\race\python.exe C:/Users/Sayth/PycharmProjects/bs4race/race.py data/ -e *.xml
Traceback (most recent call last):
  File "C:/Users/Sayth/PycharmProjects/bs4race/race.py", line 100, in <module>
    data_attr(rootObs)
  File "C:/Users/Sayth/PycharmProjects/bs4race/race.py", line 55, in data_attr
    mytree = etree.parse(xml_data, parser_xml)
  File "src/lxml/lxml.etree.pyx", line 3427, in lxml.etree.parse (src\lxml\lxml.etree.c:81110)
  File "src/lxml/parser.pxi", line 1832, in lxml.etree._parseDocument (src\lxml\lxml.etree.c:118109)
  File "src/lxml/parser.pxi", line 1852, in lxml.etree._parseFilelikeDocument (src\lxml\lxml.etree.c:118392)
  File "src/lxml/parser.pxi", line 1747, in lxml.etree._parseDocFromFilelike (src\lxml\lxml.etree.c:117180)
  File "src/lxml/parser.pxi", line 1162, in lxml.etree._BaseParser._parseDocFromFilelike (src\lxml\lxml.etree.c:111907)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:105102)
  File "src/lxml/parser.pxi", line 702, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:106769)
  File "src/lxml/lxml.etree.pyx", line 324, in lxml.etree._ExceptionContext._raise_if_stored (src\lxml\lxml.etree.c:12074)
  File "src/lxml/parser.pxi", line 373, in lxml.etree._FileReaderContext.copyToBuffer (src\lxml\lxml.etree.c:102431)
io.UnsupportedOperation: read

Process finished with exit code 1

Thoughts?

Sayth
--
https://mail.python.org/mailman/listinfo/python-list
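
A possible direction for the two errors above, sketched under assumptions rather than confirmed against these files: a 0xff byte at position 0 is consistent with a UTF-16 little-endian byte-order mark (FF FE), in which case the data was never UTF-8 or windows-1252 to begin with. The sketch below reads the file as raw bytes and hands them to etree.fromstring, so the parser can work out the encoding from the BOM and the XML declaration instead of going through .read().encode('utf-8'). The names sniff_bom, parse_xml_bytes and parse_xml_file are hypothetical helpers, the sample document is made up, and the fallback to xml.etree.ElementTree is only there so the snippet runs where lxml is not installed.

```python
import codecs

try:
    from lxml import etree  # the library used in the script above
except ImportError:
    # Assumption: fall back to the standard library so the sketch still runs.
    import xml.etree.ElementTree as etree


def sniff_bom(raw):
    """Return the encoding implied by a leading byte-order mark, or None."""
    # UTF-32 marks are checked first because the UTF-32 LE mark
    # (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
    candidates = (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    )
    for bom, name in candidates:
        if raw.startswith(bom):
            return name
    return None


def parse_xml_bytes(raw):
    """Parse raw bytes and let the XML parser detect the encoding itself."""
    return etree.fromstring(raw)


def parse_xml_file(path):
    """Hypothetical helper: open in binary mode so no text decoding happens."""
    with open(path, "rb") as handle:
        return parse_xml_bytes(handle.read())


# Made-up sample: a tiny UTF-16 LE document with a BOM, so byte 0 is 0xff.
text = '<?xml version="1.0" encoding="utf-16"?><meeting venue="Randwick"/>'
sample = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(sniff_bom(sample))
print(parse_xml_bytes(sample).get("venue"))
```

Opening the file in binary mode also avoids handing etree.parse an object it cannot read from, which is what the io.UnsupportedOperation: read traceback points to, though the traceback alone does not show what xml_data actually is.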