Markus Kuhn <[EMAIL PROTECTED]>:

> > > Not much good if you're not converting to UTF-16.
> >
> > Well, it works with UCS-4 as well (but I would use a private area for
> > this kind of stuff until it's generally accepted practice to do such
> > hacks with surrogates).
>
> No, this way, you would lose transparency for private area characters.
> If you do in-band signalling of UTF-8 errors in UCS-4, then you must
> only use characters which are forbidden to be encoded in UTF-8 anyway,
> and these are only the surrogates plus U+FFFE and U+FFFF.

So what should mbtowc(&wc, "\xED\xB2\x80", 3) return? With the libutf8_plug I have here, it returns 3 and sets wc to 0xDC80.

I really don't like the idea of a UTF-8 decoder having to know about surrogates, which have nothing to do with UTF-8. If that sort of thing starts being imposed, I start to wonder whether Unicode really is too complex to be secure ...

Edmund

-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/