On Mon, 28 Jan 2008 17:26:48 -0800, in php.internals [EMAIL PROTECTED] (Rasmus Lerdorf) wrote:
>> On the other hand utf8_decode() also expects the input to be UTF-8
>> encoded, but it replaces incomplete sequences with the character "?".
>
>utf8_decode() doesn't replace invalid chars with a ?
>
>eg.
>
>php -r '$a="abcd".chr(0xE0);echo
>iconv("utf-8","utf-8",$a)."\n".utf8_decode($a);' | od -t x1
>
>0000000 61 62 63 64 0a 61 62 63 64 03

Yes it does, but not in your case :-) However:

$ php -r '$a="abcd".chr(0xE0)."e"; echo iconv("utf-8","utf-8",$a)."\n".utf8_decode($a);'|hd
00000000 61 62 63 64 0a 61 62 63 64 3f |abcd.abcd?|

$ php -r 'print utf8_decode("Fløde på æblegrød");'
Fl?p?blegr?

>It would be a horrendously bad idea to replace invalid chars with some
>other valid char. Way worse than returning nothing. Think about what
>would happen in a regex, for example, if a user was able to inject a '?'
>by sending an invalid utf-8 sequence that ends up in a regular expression.

I don't disagree with you, and I have thought of the same issue
(although I suppose any sanitation should happen after any given
conversion; charsets other than utf-8 might be able to encode low
(7-bit) characters such as "?" as well, but this is beside the point).

I'm not fond of the "?" feature either, but it is present in
utf8_decode() and in other non-php applications that do utf-8
conversion. My guess is still that some standard recommends this
conversion as a possible fallback for error handling.

--
- Peter Brodersen

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
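To make the two behaviours in this exchange concrete, here is a minimal Python sketch. `lossy_utf8_to_latin1` is a hypothetical model, not PHP's actual utf8_decode() source. It assumes the sequence length is read from the lead byte alone, that continuation bytes are never checked, and that anything truncated or outside ISO-8859-1 becomes "?". It also assumes the Danish string reached PHP as ISO-8859-1 bytes (ø = 0xF8, å = 0xE5, æ = 0xE6). Under those assumptions it reproduces both "?" outputs shown in the reply; it does not attempt to reproduce the 0x03 byte from the quoted test. `strict_utf8` stands in for the "return nothing" behaviour, and the last demo lines show the regex concern from the quoted reply. For contrast, Python's own `errors="replace"` handler substitutes U+FFFD rather than "?".

```python
import re
from typing import Optional


def lossy_utf8_to_latin1(data: bytes) -> bytes:
    """Hypothetical model of a lossy UTF-8 -> ISO-8859-1 decoder.

    Assumptions (not taken from PHP's source): the sequence length is read
    from the lead byte alone, continuation bytes are never validated, and
    any sequence that is truncated or lies outside ISO-8859-1 becomes b"?".
    """
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c < 0x80:            # plain ASCII byte, copied through
            out.append(c)
            i += 1
            continue
        if c >= 0xF0:           # 4-byte lead (0xF8..0xFF lumped in here too)
            length = 4
        elif c >= 0xE0:         # 3-byte lead
            length = 3
        elif c >= 0xC0:         # 2-byte lead
            length = 2
        else:                   # stray continuation byte 0x80..0xBF
            length = 1
        if i + length > n:      # truncated sequence at end of input
            out += b"?"
            break
        if length == 2:
            # The continuation byte is masked, not checked (assumed sloppiness).
            cp = ((c & 0x1F) << 6) | (data[i + 1] & 0x3F)
            out += bytes([cp]) if cp <= 0xFF else b"?"
        else:
            # Stray continuation bytes and 3-/4-byte sequences are emitted as
            # "?": a well-formed 3- or 4-byte sequence is U+0800 or above.
            out += b"?"
        i += length
    return bytes(out)


def strict_utf8(data: bytes) -> Optional[str]:
    """Reject invalid input outright instead of substituting anything."""
    try:
        return data.decode("utf-8")   # Python's default errors="strict"
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(lossy_utf8_to_latin1(b"abcd\xe0e"))                     # b'abcd?'
    print(lossy_utf8_to_latin1(b"Fl\xf8de p\xe5 \xe6blegr\xf8d"))  # b'Fl?p?blegr?'
    print(strict_utf8(b"abcd\xe0e"))                              # None
    print(ascii(b"abcd\xe0e".decode("utf-8", errors="replace")))  # 'abcd\ufffde'

    # The regex hazard: an invalid trailing byte turns into a "?" quantifier.
    pattern = lossy_utf8_to_latin1(b"admin\xe0")                  # b'admin?'
    print(bool(re.fullmatch(pattern, b"admi")))                   # True
    print(bool(re.fullmatch(re.escape(pattern), b"admi")))        # False
```

Escaping the converted text with `re.escape` before splicing it into a pattern removes the injected quantifier whichever conversion produced it, which fits the point in the reply that sanitation belongs after the conversion step.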