On 03/13/2011 01:25 PM, ZY Zhou wrote:
>  but I think that it's completely unreasonable to expect
>  all of the string-based and/or range-based functions to be able to handle
>  invalid unicode.
As I explained in the first mail, if the UTF-8 parser converts every invalid
UTF-8 byte to a low surrogate code point (0x80~0xFF -> 0xDC80~0xDCFF), other
string-related functions will still work fine, and you can also handle these
errors if you want:

string s = "\xa0";
foreach(dchar d; s) {
   if (isValidUnicode(d)) {
     process(d);
   } else {
     handleError(d);
   }
}
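
The mapping described above (0x80~0xFF -> 0xDC80~0xDCFF) can be sketched in today's D roughly as follows. This is only an illustration under assumptions: decodeLenient is a hypothetical helper name, not a Phobos function, and it relies on std.utf.decode throwing UTFException on an invalid sequence.

```d
import std.utf : decode, UTFException;

/// Hypothetical helper (not part of Phobos): decodes UTF-8 and maps each
/// undecodable byte 0x80..0xFF to the lone low surrogate 0xDC80..0xDCFF.
dchar[] decodeLenient(string s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
    {
        immutable start = i;
        try
        {
            // On success, decode advances i past the whole sequence.
            result ~= decode(s, i);
        }
        catch (UTFException)
        {
            // Invalid sequence: escape a single byte, then resume at the
            // byte right after it.
            result ~= cast(dchar)(0xDC00 + cast(ubyte) s[start]);
            i = start + 1;
        }
    }
    return result;
}
```

With such a helper, std.utf.isValidDchar could play the role of isValidUnicode in the loop above, since it rejects surrogate code points.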

PS: You are free to preprocess the source if you like and convert the invalid parts into whatever you want. But instead of surrogates, you'd better use one of the freely usable ranges of values (such as the private use area); or maybe 0 (so that output won't be disturbed); or, better, the code point intended for un-representable input, U+FFFD REPLACEMENT CHARACTER, which most fonts render correctly (usually as a '?' in inverse video).
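
A preprocessing pass of that kind might look like the sketch below. The name sanitize is made up for illustration, and replacing one byte at a time is just one possible policy; it assumes std.utf.decode throws UTFException on invalid input.

```d
import std.utf : decode, UTFException;

/// Hypothetical preprocessing pass (name made up for this example):
/// returns valid UTF-8 in which every undecodable byte of raw has been
/// replaced by U+FFFD REPLACEMENT CHARACTER.
string sanitize(string raw)
{
    enum dchar replacement = '\uFFFD';
    string clean;
    size_t i = 0;
    while (i < raw.length)
    {
        immutable start = i;
        try
        {
            // Appending a dchar to a string re-encodes it as UTF-8.
            clean ~= decode(raw, i);
        }
        catch (UTFException)
        {
            // Substitute the replacement character for one bad byte.
            clean ~= replacement;
            i = start + 1;
        }
    }
    return clean;
}
```

After this pass, the rest of the program only ever sees valid UTF-8, at the cost of losing the original invalid bytes.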

Denis
--
_________________
vita es estrany
spir.wikidot.com
