ID:               43549
 Updated by:       [EMAIL PROTECTED]
 Reported By:      mariusads at helpedia dot com
-Status:           Assigned
+Status:           Wont fix
 Bug Type:         Strings related
 Operating System: Redhat?,  Linux
 PHP Version:      5.2.5
 Assigned To:      stas
 New Comment:

As the function seems to work as intended and there are other ways to
sanitize UTF-8, I'm marking this as wontfix for now, unless any new info
arrives.


Previous Comments:
------------------------------------------------------------------------

[2008-01-29 21:13:16] [EMAIL PROTECTED]

As I commented in that bug, assuming you are passing in that character
properly encoded, it will work.  Nothing in that bug report shows an
actual problem as you don't show the exact byte sequence you are passing
in.
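
One way to show the exact byte sequence is to dump it as hex. The
following is a minimal sketch, not taken from either bug report: it
assumes the dagger character (U+2020) mentioned in this thread, and the
variable names are illustrative only.

```php
<?php
// U+2020 DAGGER encoded as UTF-8 is the three bytes E2 80 A0.
$utf8Dagger = "\xE2\x80\xA0";
// In Windows-1252 the dagger is the single byte 86, which on its own
// is not a valid UTF-8 sequence.
$cp1252Dagger = "\x86";

// bin2hex() prints the raw bytes, independent of any display encoding.
echo bin2hex($utf8Dagger), "\n";
echo bin2hex($cp1252Dagger), "\n";

// preg_match() with the /u modifier returns false (not 0 or 1) when the
// subject is not valid UTF-8, so it can serve as a quick validity check.
$utf8Ok   = preg_match('//u', $utf8Dagger) === 1;
$cp1252Ok = preg_match('//u', $cp1252Dagger) === 1;
var_dump($utf8Ok, $cp1252Ok);
```

If the hex dump of the failing input is not e280a0, the character was
probably submitted in a different encoding than the one passed to
htmlentities().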

------------------------------------------------------------------------

[2008-01-29 14:31:46] tallyce at gmail dot com

Thanks, but see
http://bugs.php.net/43294

which shows that the dagger character (and others) results in the whole
string disappearing, on some installations at least.

I thought the dagger character was a valid UTF8 string, or would a
submission of that character be considered "invalid input"?

------------------------------------------------------------------------

[2008-01-28 23:32:01] [EMAIL PROTECTED]

It comes down to what to do with invalid input.  We can't let invalid
UTF-8 through, because if you do, your site will be insecure.  Before
this fix, your site was actually open to XSS exploits since you were
spitting invalid UTF-8 chars out on a page marked as UTF-8 and that
confuses IE.

I suppose we could change htmlentities to just strip the invalid chars,
but from a security perspective that is typically not the right
approach.  You can strip the invalid utf-8 chars yourself with: 

  $str = iconv('utf-8','utf-8',$str);
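
The behaviour described above can be reproduced with a short script.
This is only a sketch under stated assumptions: the sample strings are
made up for illustration, and the ENT_SUBSTITUTE flag shown at the end
exists only in PHP 5.4 and later, so it is not available on the
versions named in this report.

```php
<?php
// "\xE4" is "ä" in ISO-8859-1; on its own it is not valid UTF-8.
$invalid = "aou_\xE4";
// The same text correctly encoded as UTF-8 ("ä" is the bytes C3 A4).
$valid = "aou_\xC3\xA4";

// Invalid input: htmlentities() returns an empty string instead of
// letting the bad byte through.
$fromInvalid = htmlentities($invalid, ENT_QUOTES, 'UTF-8');

// Valid input is converted as expected.
$fromValid = htmlentities($valid, ENT_QUOTES, 'UTF-8');

// Assumption: PHP 5.4+. ENT_SUBSTITUTE replaces each invalid sequence
// with U+FFFD instead of discarding the whole string.
$substituted = htmlentities($invalid, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($fromInvalid, $fromValid, $substituted);
```

Whether to substitute, strip, or reject invalid input is a policy
decision for the application; the sketch only shows what each call
returns.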


------------------------------------------------------------------------

[2008-01-24 20:54:10] tallyce at gmail dot com

See also bugs 43294 and 43896 which seem to be the same thing.

This is really starting to bite now. Please can this be fixed, or
suggest how we can reliably process incoming user data in UTF8 given
this behaviour change!

I concur this seems to be installation specific and earlier than 5.2.5
as shown in bug 43294.

------------------------------------------------------------------------

[2008-01-14 08:36:21] s-beutel at gmx dot de

Hi,

I confirm the very same issue for PHP 5.2.1/Apache2/RedHat. 

- has nothing to do with the browser encoding or GET'ed/POST'ed
variables, since I simply convert a static string
- seems to be installation specific, since it runs perfectly on my
Windows box (PHP 5.2.0)
- I have the idea - but no evidence yet - that it's an older issue: for
almost one year I tried to fix an issue with a tiny webshop which is an
outcome of this, and which some users have been complaining about every
now and then (obviously, without debugging or narrower information)

Example script: http://sbeutel.sb.ohost.de/trans.php
Plain Text code: http://sbeutel.sb.ohost.de/trans.txt

It simply encodes the string aou_äöü with various settings, and
htmlentities($str,ENT_QUOTES,'utf-8'); returns an empty string as soon
as the string contains non-ASCII characters (German umlauts, in this
case).

Hope this helps. Contact me if I may provide more information.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/43549
