Your message dated Wed, 08 Sep 2021 22:19:10 +0000
with message-id <e1mo5uq-0002al...@fasolo.debian.org>
and subject line Bug#750946: fixed in libhtml-html5-parser-perl 0.992-1
has caused the Debian Bug report #750946,
regarding libhtml-html5-parser-perl: UTF-8 character breaks parse_file
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)


-- 
750946: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=750946
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
--- Begin Message ---
Package: libhtml-html5-parser-perl
Version: 0.301-1
Severity: important

(with possible data loss as a consequence)

Consider the following HTML file:

<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>title</title>
  </head>
  <body>
    <p>↓</p>
  </body>
</html>

On this file, the following script

#!/usr/bin/env perl

use strict;
use HTML::HTML5::Parser;

use utf8;                            # for the characters in the script.
use open ':encoding(UTF-8)';         # for the file arguments.
binmode STDIN, ':encoding(UTF-8)';   # for stdin.
binmode STDOUT, ':encoding(UTF-8)';  # for stdout.

@ARGV == 1 or die "Usage: $0 <file.html>\n";

my $parser = HTML::HTML5::Parser->new;
my $doc = $parser->parse_file($ARGV[0]);
print "Charset: '", $parser->charset($doc), "'\n";
print $doc->toString();

outputs:

Charset: ''
<?xml version="1.0" encoding="windows-1252"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>

If I replace the ↓ (U+2193 DOWNWARDS ARROW) by é (U+00E9 LATIN SMALL
LETTER E WITH ACUTE), then I get:

Charset: 'utf-8'
<?xml version="1.0" encoding="utf-8"?>
<!--?xml version="1.0" encoding="utf-8"?-->
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
    <title>title</title>
  </head>
  <body>
    <p>�</p>
  

</body></html>

which is also incorrect, but at least the charset is correct.
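
To make the difference between the two test characters concrete, here is
a minimal sketch that uses only the core Encode module. It does not
reproduce the parser bug or claim to identify its cause; it only shows
the UTF-8 bytes of each character and what they look like when decoded
as windows-1252, the encoding reported in the first output above. The
helper names utf8_hex and misread_as_cp1252 are made up for this
illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode decode);

# Return the UTF-8 bytes of a character as space-separated hex pairs.
sub utf8_hex {
    my ($char) = @_;
    my $bytes = encode('UTF-8', $char);
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# Decode the UTF-8 bytes of a character as if they were windows-1252.
sub misread_as_cp1252 {
    my ($char) = @_;
    my $bytes = encode('UTF-8', $char);
    return decode('cp1252', $bytes);
}

binmode STDOUT, ':encoding(UTF-8)';

# U+2193 DOWNWARDS ARROW and U+00E9 LATIN SMALL LETTER E WITH ACUTE.
for my $cp (0x2193, 0x00E9) {
    my $char = chr $cp;
    printf "U+%04X: UTF-8 bytes %s, read as windows-1252: '%s'\n",
        $cp, utf8_hex($char), misread_as_cp1252($char);
}
```

The arrow takes three bytes in UTF-8 and é takes two, so a byte-level
mix-up between UTF-8 and windows-1252 affects the two inputs
differently.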

-- System Information:
Debian Release: jessie/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.11-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages libhtml-html5-parser-perl depends on:
ii  libhtml-html5-entities-perl       0.003-2
ii  libio-html-perl                   1.00-1
ii  libtry-tiny-perl                  0.22-1
ii  liburi-perl                       1.60-1
ii  libxml-libxml-perl                2.0116+dfsg-1
ii  perl                              5.18.2-4
ii  perl-modules [libhttp-tiny-perl]  5.18.2-4

libhtml-html5-parser-perl recommends no packages.

Versions of packages libhtml-html5-parser-perl suggests:
pn  libxml-libxml-devel-setlinenumber-perl  <none>

-- no debconf information

--- End Message ---
--- Begin Message ---
Source: libhtml-html5-parser-perl
Source-Version: 0.992-1
Done: Jonas Smedegaard <d...@jones.dk>

We believe that the bug you reported is fixed in the latest version of
libhtml-html5-parser-perl, which is due to be installed in the Debian
FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 750...@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Jonas Smedegaard <d...@jones.dk> (supplier of updated
libhtml-html5-parser-perl package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmas...@ftp-master.debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Format: 1.8
Date: Wed, 08 Sep 2021 23:44:32 +0200
Source: libhtml-html5-parser-perl
Architecture: source
Version: 0.992-1
Distribution: unstable
Urgency: medium
Maintainer: Debian Perl Group <pkg-perl-maintain...@lists.alioth.debian.org>
Changed-By: Jonas Smedegaard <d...@jones.dk>
Closes: 750946
Changes:
 libhtml-html5-parser-perl (0.992-1) unstable; urgency=medium
 .
   [ upstream ]
   * new release(s)
     + fix encoding issues;
       closes: bug#750946
 .
   [ Salvatore Bonaccorso ]
   * update Vcs-* headers for switch to salsa.debian.org
 .
   [ gregor herrmann ]
   * use MetaCPAN URIs (not search.cpan.org or www.cpan.org)
   * use secure GitHub URIs
 .
   [ Debian Janitor ]
   * set upstream metadata fields:
     Bug-Database Repository
 .
   [ Jonas Smedegaard ]
   * simplify rules;
     stop build-depend on dh-buildinfo cdbs
   * annotate test-only build-dependencies
   * use debhelper compatibility level 13 (not 9);
     build-depend on debhelper-compat (not debhelper)
   * set Rules-Requires-Root: no
   * enable autopkgtest
   * update watch file:
     + use file format 4
     + mention gbp --uscan in usage comment
     + use substitution strings
   * simplify source helper script copyright-check
   * update copyright info:
     + stop track no longer embedded code
     + sort License sections alphabetically
     + update coverage
     + list GitHub issue tracker as preferred upstream contact
   * use semantic newlines in long description and copyright fields
   * build-depend on libtest-requires-perl
   * declare compliance with Debian Policy 4.6.0
Checksums-Sha1:
 61f35a7c9053b2736f538b1ea50c3a0e766cf4fd 2456 libhtml-html5-parser-perl_0.992-1.dsc
 dcaafb6cec32bbaf511f5a0a6b6970b6a1532b7e 155983 libhtml-html5-parser-perl_0.992.orig.tar.gz
 c61bb8631b9a4b2ce0955cfd5c727ddd2abb9e07 6844 libhtml-html5-parser-perl_0.992-1.debian.tar.xz
 a116893c701afd35982c73f367eca64bc964f800 7171 libhtml-html5-parser-perl_0.992-1_amd64.buildinfo
Checksums-Sha256:
 c7cb10ae7dc58b2a45dc5b28066653ab26899db75615dd33d2061aa8385ed365 2456 libhtml-html5-parser-perl_0.992-1.dsc
 a184ca241caf97c57fd37f18e0fe686ef79cfe8eede7e31d93f3e636ed011169 155983 libhtml-html5-parser-perl_0.992.orig.tar.gz
 2cc5984377eff31027f9e6152c5ddd5ddac41d311503590ae538b87980792e0b 6844 libhtml-html5-parser-perl_0.992-1.debian.tar.xz
 191262278bdef7ca64baa55c03471beb747d27d969ee06eb3f5de8d0fe21fc64 7171 libhtml-html5-parser-perl_0.992-1_amd64.buildinfo
Files:
 499caa0c0715078ac4a04f412eebf4b9 2456 perl optional libhtml-html5-parser-perl_0.992-1.dsc
 263a8051dae04296e2cb6cac1a0dd247 155983 perl optional libhtml-html5-parser-perl_0.992.orig.tar.gz
 3f2873d4880f12a321ce9b0329a4507f 6844 perl optional libhtml-html5-parser-perl_0.992-1.debian.tar.xz
 58fb21f6cb99915553bea0fec216ea1b 7171 perl optional libhtml-html5-parser-perl_0.992-1_amd64.buildinfo

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEn+Ppw2aRpp/1PMaELHwxRsGgASEFAmE5NBkACgkQLHwxRsGg
ASGKsA/8DHlelAVotvSTBzElFAGIoayIh5nGhYTdTZjgLF/imwdgkwmb9NWUmkXJ
lIEAmIA8kRa9prl9qxDcaTensYlDEPEUe+KY7WkkyizgjuoYuYLJ2rCn2S9HXLr0
eRLwbJM4H1MR8DFOmwe9h2k1IBuaquHAZBFSVjDVpU/fsGAWmoPiw2eIcYfZmG1n
BGyk8uB11hq0TTy22ToTNiDRzvOYvw0ImtaqRkhA4XFYjqCwjfGG5QhDtJ3WmnpC
yoeZVmFN/p1e2GZkTYEDH3vUQ9rrlGB9YbLVJvU6mp6aWaF3+ym1nWRQv+SJtChz
MgNiiw2VskDvREw/dK2F6wCc/n5gunPpVqeLdvBLL3rHCMNjdQn/+SLi4jt12H2X
LQ/KLKliuH9i2jJPhnL/C0HcpufJozZ33gZCkVKkqBMVcn5eDCwyP2zXbYaRUkB9
Oxu4ZBtz09MuTEBgURoancOinzzX/I/YSyPQeYr+1ntTfrDpeES8Z/8TdeXCbXDz
UDbPmsGcfCWpRo3jjSKd8Mb4BRdphVYvs/lsB5hdHU/fI/vmqcXGfMiHkUKQOVk9
X9NY9uACG2/xnKMtYAXibT2ESPI9pVMrN72H2h4D1Q0R6tspouXHoAm5UZqjo7pr
NPEid8639KJ/y4c7Hd/5RClGihQbzz5jfVKz3EpSbbf8L1xvv4U=
=6eP1
-----END PGP SIGNATURE-----

--- End Message ---