Control: retitle -1 libhtml-html5-parser-perl: UTF-8 character breaks parse_file

As a consequence of this bug, html2xhtml doesn't work at all when
applied to a file. There is no problem when the HTML document is
provided on standard input, though. For instance, with test.html as:

<!DOCTYPE html>
<html><body><p>Test €</p></body></html>

I get:

$ html2xhtml test.html
<?xml version="1.0" encoding="windows-1252"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>

$ html2xhtml < test.html
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head/><body><p>Test €</p>
</body></html>

and with test.html as:

<!DOCTYPE html>
<html><body><p>Test é</p></body></html>

$ html2xhtml test.html
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head/><body><p>Test �</p>
</body></html>

$ html2xhtml < test.html
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head/><body><p>Test é</p>
</body></html>

In each example, parse_file is used by the first command (file name
given as an argument, as in my original bug report), and parse_string
is used by the second command (document read from standard input).
Thus it seems that it is parse_file that is broken. Hence the retitle.

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

