ID:               43896
 Updated by:       [EMAIL PROTECTED]
 Reported By:      arnaud dot lb at gmail dot com
 Status:           Critical
 Bug Type:         Strings related
 Operating System: *
 PHP Version:      5.2CVS, 5.3CVS (2008-07-15)
 New Comment:

I don't even think this is a valid bug in the first place. You passed a
string encoded in ISO-8859-15 to htmlspecialchars() while specifying
UTF-8, which forces the string to be treated as UTF-8. One should never
depend on the past wrong behaviour in which invalid byte sequences
passed through. Besides, you can always work around it by giving
ISO-8859-15 as the third argument.
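
For illustration, a minimal sketch of that workaround. The sample string
and variable names are made up for this example, and the second variant
assumes the mbstring extension is loaded:

```php
<?php
// "café <b>" encoded as ISO-8859-15: the lone byte 0xE9 is not valid UTF-8.
$latin9 = "caf\xE9 <b>";

// Workaround from the comment: declare the real encoding as third argument.
$escaped = htmlspecialchars($latin9, ENT_COMPAT, 'ISO-8859-15');

// Alternative (assumes mbstring): convert to UTF-8 first, then escape as UTF-8.
$utf8 = mb_convert_encoding($latin9, 'UTF-8', 'ISO-8859-15');
$utf8Escaped = htmlspecialchars($utf8, ENT_COMPAT, 'UTF-8');

// Print as hex so the raw bytes are visible regardless of terminal encoding.
echo bin2hex($escaped), "\n";
echo bin2hex($utf8Escaped), "\n";
```

Either way the markup characters are escaped; the two results differ only
in how the "é" is encoded.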






Previous Comments:
------------------------------------------------------------------------

[2008-06-27 17:32:43] sillyxone at yaoo dot com

The non-breaking space, chr(160), is also affected in 5.2, for example:

$str = 'Hello' . chr(160) . 'there';
print(htmlentities($str, ENT_COMPAT, 'UTF-8'));

Instead of printing "Hello there", it prints nothing (empty string).
The same for htmlspecialchars().

Both functions work fine in 5.1
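
A sketch of why this input trips the UTF-8 check, assuming chr(160) was
meant as an ISO-8859-1 non-breaking space (the comment does not say so)
and that the mbstring extension is available:

```php
<?php
$str = 'Hello' . chr(160) . 'there';

// 0xA0 on its own is a stray continuation byte, so this is not valid UTF-8.
$isValidUtf8 = mb_check_encoding($str, 'UTF-8');

// Assumption: the byte was ISO-8859-1. Convert, then escape as UTF-8.
$utf8 = mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
$out  = htmlentities($utf8, ENT_COMPAT, 'UTF-8');

var_dump($isValidUtf8);
echo $out, "\n";
```

After the conversion the string holds U+00A0 as the two bytes 0xC2 0xA0,
which htmlentities() can map to its named entity.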

------------------------------------------------------------------------

[2008-05-05 21:00:37] heurika at gmail dot com

Hi,
I've got the same Bug, posted on #43740.
Please fix it.

Thanks!

------------------------------------------------------------------------

[2008-02-17 13:25:22] andreas dot ravnestad at gmail dot com

This seems to be breaking PEAR::Text_Wiki completely when using UTF-8:
http://pear.php.net/bugs/bug.php?id=13136

------------------------------------------------------------------------

[2008-01-24 20:51:11] tallyce at gmail dot com

See also bugs 43294 and 43549 which seem to be the same thing.

This is really starting to bite now. Could this please be fixed, or
could someone suggest how we can reliably process incoming user data in
UTF-8 given this behaviour change?
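
One possible approach, sketched here rather than taken from this report:
validate incoming data before escaping it. The helper name
escape_utf8_or_null() is hypothetical, and the code assumes mbstring and
PHP 7.1 or later for the type declarations:

```php
<?php
// Hypothetical helper: escape only input that is well-formed UTF-8.
function escape_utf8_or_null(string $input): ?string
{
    if (!mb_check_encoding($input, 'UTF-8')) {
        // Leave the decision (reject, re-encode, log) to the caller.
        return null;
    }
    return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}

$ok  = escape_utf8_or_null("na\xC3\xAFve <i>"); // valid UTF-8
$bad = escape_utf8_or_null("na\xEFve <i>");     // 0xEF without continuation bytes

var_dump($ok, $bad);
```

Returning null instead of an empty string lets the caller tell "invalid
input" apart from "empty input".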

------------------------------------------------------------------------

[2008-01-24 12:29:58] arnaud dot lb at gmail dot com

I made a patch for this bug:

http://s3.amazonaws.com/arnaud.lb/php_htmlentities_utf.patch

The internal get_next_char() function returns a status of FAILURE 
when it encounters an invalid or incomplete sequence, which causes 
the htmlspecialchars and htmlentities functions to return an empty 
string.

This patch modifies the behavior of these functions to skip invalid 
sequences, without discarding the whole string. This involves very 
few changes and makes the behavior of these functions more 
consistent with previous PHP versions.

It also adds a few tests to htmlentities-utf.phpt.
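
The skipping behaviour can be approximated from userland with flags that
later PHP releases provide. This is not the patch itself, only a sketch;
it assumes PHP 5.4 or later, where both ENT_IGNORE and ENT_SUBSTITUTE
exist:

```php
<?php
$str = 'Hello' . chr(160) . 'there <b>';

// ENT_IGNORE drops invalid sequences instead of returning an empty string.
// The PHP manual discourages it because silent removal can be abused.
$ignored = htmlspecialchars($str, ENT_COMPAT | ENT_IGNORE, 'UTF-8');

// ENT_SUBSTITUTE replaces each invalid sequence with U+FFFD instead.
$substituted = htmlspecialchars($str, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');

echo $ignored, "\n";
echo bin2hex($substituted), "\n";
```

ENT_SUBSTITUTE is usually the safer of the two, since the replacement
character leaves a visible trace of where the bad bytes were.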

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/43896
