Control: retitle -1 libhtml-html5-parser-perl: UTF-8 character breaks parse_file
As a consequence of this bug, html2xhtml doesn't work at all when
applied to a file. There is no problem when the HTML document is
provided on standard input, though.

For instance, with test.html as:

  <!DOCTYPE html>
  <html><body><p>Test €</p></body></html>

I get:

  $ html2xhtml test.html
  <?xml version="1.0" encoding="windows-1252"?>
  <html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>
  $ html2xhtml < test.html
  <?xml version="1.0" encoding="utf-8"?>
  <html xmlns="http://www.w3.org/1999/xhtml"><head/><body><p>Test €</p>
  </body></html>

and with test.html as:

  <!DOCTYPE html>
  <html><body><p>Test é</p></body></html>

I get:

  $ html2xhtml test.html
  <?xml version="1.0" encoding="utf-8"?>
  <html xmlns="http://www.w3.org/1999/xhtml"><head/><body><p>Test �</p>
  </body></html>
  $ html2xhtml < test.html
  <?xml version="1.0" encoding="utf-8"?>
  <html xmlns="http://www.w3.org/1999/xhtml"><head/><body><p>Test é</p>
  </body></html>

In each case, the first command (file argument) uses parse_file, as in
my original bug report, and the second command (standard input) uses
parse_string. Thus it seems that it is parse_file that is broken,
hence the retitle.

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
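For anyone who wants to repeat the comparison, here is a minimal sketch of a reproduction script. It assumes the test files are UTF-8 encoded and contain a line break after the doctype; the temporary directory and the file names euro.html and eacute.html are my own choices, not part of the report. The non-ASCII characters are written as octal escapes so that the bytes on disk do not depend on the terminal or editor encoding. html2xhtml is invoked only in the two ways shown in the report, and only if it is installed.

```shell
#!/bin/sh
# Recreate the two test documents as UTF-8, byte for byte.
# \342\202\254 is the UTF-8 encoding of the euro sign (E2 82 AC);
# \303\251 is the UTF-8 encoding of e-acute (C3 A9).
# The directory and file names below are arbitrary (hypothetical).
dir=$(mktemp -d)
printf '<!DOCTYPE html>\n<html><body><p>Test \342\202\254</p></body></html>\n' > "$dir/euro.html"
printf '<!DOCTYPE html>\n<html><body><p>Test \303\251</p></body></html>\n' > "$dir/eacute.html"

for f in "$dir/euro.html" "$dir/eacute.html"; do
    # Run the comparison only when html2xhtml is available.
    if command -v html2xhtml >/dev/null 2>&1; then
        echo "== $f given as a file argument =="
        html2xhtml "$f"
        echo "== $f given on standard input =="
        html2xhtml < "$f"
    fi
done
```

If the two outputs for the same file differ, as in the transcripts above, the difference comes from how the input reaches the parser rather than from the document itself.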