Peter Brodersen wrote:
> http://php.net/xml also documents this replacement:
> ==
> If PHP encounters characters in the parsed XML document that can not be
> represented in the chosen target encoding, the problem characters will be
> "demoted". Currently, this means that such characters are replaced by a
> question mark.
> ==

That was back in the expat days. We don't use that xml parser anymore.

> http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt mentions:
> ==
> According to ISO 10646-1:2000, sections D.7 and 2.3c, a device
> receiving UTF-8 shall interpret a "malformed sequence in the same way
> that it interprets a character that is outside the adopted subset" and
> "characters that are not within the adopted subset shall be indicated
> to the user" by a receiving device. A quite commonly used approach in
> UTF-8 decoders is to replace any malformed UTF-8 sequence by a
> replacement character (U+FFFD), which looks a bit like an inverted
> question mark, or a similar symbol. It might be a good idea to
> visually distinguish a malformed UTF-8 sequence from a correctly
> encoded Unicode character that is just not available in the current
> font but otherwise fully legal, even though ISO 10646-1 doesn't
> mandate this. In any case, just ignoring malformed sequences or
> unavailable characters does not conform to ISO 10646, will make
> debugging more difficult, and can lead to user confusion.
> ==

That part is completely different. That's at the display level.
Replacing it in the backend makes no sense to me.

Don't use utf8_decode. Use iconv() so you know what the heck is going on.

-Rasmus

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
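
The advice to prefer iconv() over utf8_decode could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name to_latin1_strict() is hypothetical, and the choice of ISO-8859-1 as the target encoding is an assumption. The point is that iconv() returns false when a conversion fails, so the caller can decide what to do rather than receiving a string with characters silently swapped for "?".

```php
<?php
// Hypothetical helper (not from the thread): strict UTF-8 -> ISO-8859-1
// conversion that reports failure instead of silently substituting '?'.
function to_latin1_strict(string $utf8): ?string
{
    // iconv() raises an E_NOTICE and returns false when conversion fails;
    // the notice is suppressed here and the return value checked instead.
    $converted = @iconv('UTF-8', 'ISO-8859-1', $utf8);

    return $converted === false ? null : $converted;
}

// "é" (U+00E9) exists in ISO-8859-1, so this converts to the single byte 0xE9.
$ok = to_latin1_strict("caf\u{00E9}");

// 0xC3 followed by 0x28 is a malformed UTF-8 sequence, so the conversion
// is rejected and the helper returns null.
$bad = to_latin1_strict("\xC3\x28");

var_dump($ok === "caf\xE9");
var_dump($bad === null);
```

How iconv() treats well-formed characters that simply have no equivalent in the target encoding (for example U+20AC in ISO-8859-1) depends on the underlying iconv implementation, so that case is worth testing on the platform in question.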