Your message dated Fri, 24 Sep 2010 15:47:06 +0000
with message-id <[email protected]>
and subject line Bug#374605: fixed in libhtml-tree-perl 4.0-1
has caused the Debian Bug report #374605,
regarding HTML::TreeBuilder doesn't properly set utf8 flag when parsing from a file
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)
--
374605: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=374605
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Subject: libhtml-format-perl: Problem with UTF8
Package: libhtml-format-perl
Version: 2.04-1
Severity: important
I tried to get the text content of a UTF-8 encoded HTML page
with the following code:
<<
use strict;
use warnings;
require HTML::TreeBuilder;
require HTML::FormatText;
my $tree = HTML::TreeBuilder->new->parse_file("test.html");
my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
print $formatter->format($tree);
>>
Many accented characters were destroyed during this text
conversion.
The culprit is this line (line 191):
$text =~ tr/\xA0\xAD/ /d;
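To make the failure concrete, here is a minimal sketch (not from the original report) that uses only the core Encode module and a made-up sample string. It assumes, as the bug title suggests, that the formatter receives the page as undecoded UTF-8 octets rather than decoded characters, and applies the transliteration quoted above to both forms. The helper name `strip_nbsp_shy` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Made-up sample text, written with \x{..} escapes so the script stays ASCII:
# "deja<NBSP>aqui" with accents (U+00E9, U+00E0, U+00A0, U+00ED).
my $text_chars = "d\x{e9}j\x{e0}\x{a0}aqu\x{ed}";

# The same text as raw UTF-8 octets, i.e. what the formatter sees if the
# file is never decoded: U+00E0 -> C3 A0, U+00A0 -> C2 A0, U+00ED -> C3 AD.
my $text_bytes = encode('UTF-8', $text_chars);

# Hypothetical helper wrapping the transliteration from the report:
# \xA0 (no-break space) becomes a space, \xAD (soft hyphen) is deleted.
sub strip_nbsp_shy {
    my ($s) = @_;
    $s =~ tr/\xA0\xAD/ /d;
    return $s;
}

my $from_bytes = strip_nbsp_shy($text_bytes);
my $from_chars = strip_nbsp_shy($text_chars);

# On octets, the A0/AD continuation bytes inside multibyte sequences are hit
# too, leaving stray lead bytes (C3, C2) that are no longer valid UTF-8.
printf "bytes: %s\n",
    join(' ', map { sprintf('%02X', ord($_)) } split(//, $from_bytes));

# On decoded characters only the real U+00A0 changes; the accents survive.
print "chars: ", encode('UTF-8', $from_chars), "\n";
```

The first line of output is `bytes: 64 C3 A9 6A C3 20 C2 20 61 71 75 C3`: every `0xA0` continuation byte has become a space and the `0xAD` from U+00ED has been deleted, which matches the damaged accents described above. On the decoded string only the genuine no-break space is replaced. Decoding the input before it reaches the formatter, for example by reading the file through an `:encoding(UTF-8)` layer, is one way to get the second behaviour; this sketch does not show how any particular HTML::TreeBuilder version handles that.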
The bug was already reported upstream a year ago:
http://rt.cpan.org/Public/Bug/Display.html?id=9700
but the code is still buggy.
Consequently, this package cannot be used with multibyte charsets.
-- System Information:
Debian Release: testing/unstable
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.9-2-686
Locale: LANG=de_DE, LC_CTYPE=de_DE (charmap=ISO-8859-1)
Versions of packages libhtml-format-perl depends on:
ii  libfont-afm-perl   1.19-1     Font::AFM - Interface to Adobe Fon
ii  libhtml-tree-perl  3.19.01-2  represent and create HTML syntax t
ii  perl               5.8.8-6    Larry Wall's Practical Extraction
libhtml-format-perl recommends no packages.
-- no debconf information
--- End Message ---
--- Begin Message ---
Source: libhtml-tree-perl
Source-Version: 4.0-1
We believe that the bug you reported is fixed in the latest version of
libhtml-tree-perl, which is due to be installed in the Debian FTP archive:
libhtml-tree-perl_4.0-1.debian.tar.gz
to main/libh/libhtml-tree-perl/libhtml-tree-perl_4.0-1.debian.tar.gz
libhtml-tree-perl_4.0-1.dsc
to main/libh/libhtml-tree-perl/libhtml-tree-perl_4.0-1.dsc
libhtml-tree-perl_4.0-1_all.deb
to main/libh/libhtml-tree-perl/libhtml-tree-perl_4.0-1_all.deb
libhtml-tree-perl_4.0.orig.tar.gz
to main/libh/libhtml-tree-perl/libhtml-tree-perl_4.0.orig.tar.gz
A summary of the changes between this version and the previous one is
attached.
Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to [email protected],
and the maintainer will reopen the bug report if appropriate.
Debian distribution maintenance software
pp.
Krzysztof Krzyżaniak (eloy) <[email protected]> (supplier of updated
libhtml-tree-perl package)
(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing [email protected])
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Format: 1.8
Date: Fri, 24 Sep 2010 17:15:47 +0200
Source: libhtml-tree-perl
Binary: libhtml-tree-perl
Architecture: source all
Version: 4.0-1
Distribution: unstable
Urgency: low
Maintainer: Debian Perl Group <[email protected]>
Changed-By: Krzysztof Krzyżaniak (eloy) <[email protected]>
Description:
libhtml-tree-perl - Perl module to represent and create HTML syntax trees
Closes: 374605
Changes:
libhtml-tree-perl (4.0-1) unstable; urgency=low
.
* New upstream release, (closes: #374605)
* Update Standards-Version to 3.9.1 (no changes)
* Removed debian/patches/missing_close_tag.patch and
debian/patches/spelling.patch (fixed by upstream)
Checksums-Sha1:
ec507de5feec8c5437e76dc3caaf7e6818c8ee54 1441 libhtml-tree-perl_4.0-1.dsc
78689f1fd026f03432e886c6b4b49f7d4d89aa8b 129998 libhtml-tree-perl_4.0.orig.tar.gz
202426fddab878c492cc812802af520fa1c99a76 3715 libhtml-tree-perl_4.0-1.debian.tar.gz
7557dbe36366da052a0b7421d4e465fdb9bfa2a7 216092 libhtml-tree-perl_4.0-1_all.deb
Checksums-Sha256:
ef9177620c2863ab8d8954e854237286595dab8197bbe117d50ab26e29d8f56a 1441 libhtml-tree-perl_4.0-1.dsc
5caa72deab5aba4d8bf26a2b557f9ffce615a9af57112c0b551ebdcd4857552a 129998 libhtml-tree-perl_4.0.orig.tar.gz
49003cc7d62480674728f7ebbebd18ace6169ca70a8fbdfaff7f698f2f42c3ee 3715 libhtml-tree-perl_4.0-1.debian.tar.gz
dfe2d7ead314aa1caefef19e8dbc3ddb17dc69dfa4347e8cc52d85448f4a4301 216092 libhtml-tree-perl_4.0-1_all.deb
Files:
914b7f3cdbe9455e6c9eb943b69b0a65 1441 perl optional libhtml-tree-perl_4.0-1.dsc
7ba44995905a117c00f6744350799883 129998 perl optional libhtml-tree-perl_4.0.orig.tar.gz
97c2ef711831f16d5efce762b8a98494 3715 perl optional libhtml-tree-perl_4.0-1.debian.tar.gz
35efd17547ba88c2aeb3ce83d172cd2c 216092 perl optional libhtml-tree-perl_4.0-1_all.deb
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iEYEARECAAYFAkycxGcACgkQy+HP4f7iC8uURQCglk74aH+YqdGxzwTEgMZp1ZXk
G1oAoJVf+cIX01ADkS5Ni6GTSscQz1y2
=QYgx
-----END PGP SIGNATURE-----
--- End Message ---