Package: php-htmlpurifier
Version: 4.3.0+dfsg1-1
Severity: important

We use HTML Purifier to clean up HTML mails from customers before displaying
them. Under certain circumstances an ISO-8859-1 HTML string is cut off in the
middle. The following script reproduces the problem:


<?php
require_once "HTMLPurifier.auto.php";

// 7-byte numeric entity for the Euro sign + 50000 dots = 50007 bytes
$in = "&#8364;".str_repeat(".", 50000);

$cfg = HTMLPurifier_Config::createDefault();
$cfg->set("Core.Encoding", "iso-8859-1");
$purifier = new HTMLPurifier($cfg);
$out = $purifier->purify($in);

echo "in: ".strlen($in)."<br>";
echo "out: ".strlen($out)."<br>";
echo $out;


Output:
in: 50007
out: 8159
................... [...]


Expected Output:
in: 50007
out: 50007
[Euro symbol]............ [...]


The problem does not occur with encoding set to UTF-8. Unfortunately we cannot
just convert the encoding as the encoding is also declared in the HTML header
of the input string.
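
For illustration only, here is a minimal sketch of what such a conversion
would have to do: convert the bytes to UTF-8 and also rewrite the charset
declared in the HTML header. The function name latin1_html_to_utf8 is
hypothetical and not part of HTML Purifier, and the pattern assumes the
declaration is spelled "charset=iso-8859-1" (any case, optional quotes). It
would also rewrite that same text if it appeared in the mail body, so it is
not a general solution.

```php
<?php
// Hypothetical helper, not part of HTML Purifier: convert an ISO-8859-1
// HTML document to UTF-8 and rewrite the charset it declares to match.
function latin1_html_to_utf8($html)
{
    // Every ISO-8859-1 byte has a UTF-8 equivalent, so no //IGNORE or
    // //TRANSLIT suffix is needed for this direction.
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $html);
    if ($utf8 === false) {
        throw new RuntimeException('iconv conversion failed');
    }

    // Replace the declared charset, keeping the text before it
    // ("charset=", optional whitespace, optional opening quote) intact.
    return preg_replace(
        '/(charset\s*=\s*["\']?)iso-8859-1/i',
        '${1}utf-8',
        $utf8
    );
}
```

The result could then be passed to purify() with Core.Encoding left at its
default of UTF-8. Whether that is acceptable depends on whether the rest of
the application can display UTF-8.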


-- System Information:
Debian Release: 6.0.1
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-2-xen-amd64 (SMP w/2 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages php-htmlpurifier depends on:
ii  php5                    5.3.3-7+squeeze1 server-side, HTML-embedded scripti

Versions of packages php-htmlpurifier recommends:
ii  php5-cli                5.3.3-7+squeeze1 command-line interpreter for the p

php-htmlpurifier suggests no packages.

-- no debconf information


