Re: string need to be robust

spir Sun, 13 Mar 2011 05:56:12 -0700

On 03/13/2011 01:25 PM, ZY Zhou wrote:

> but I think that it's completely unreasonable to expect
> all of the string-based and/or range-based functions to be able to handle
> invalid Unicode.

As I explained in the first mail, if the UTF-8 parser converts all invalid
UTF-8 bytes to low surrogate code points (0x80~0xFF -> 0xDC80~0xDCFF), other
string-related functions will still work fine, and you can also handle these
errors if you want:


string s = "\xa0";
foreach(dchar d; s) {
   if (isValidUnicode(d)) {
     process(d);
   } else {
     handleError(d);
   }
}
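
A minimal sketch of the mapping described above, assuming it is done by hand on top of `std.utf.decode`. The name `decodeLenient` is hypothetical and not part of Phobos; `std.utf.isValidDchar` stands in for the `isValidUnicode` check in the loop above, since it rejects surrogate values.

```d
import std.utf : decode, isValidDchar, UTFException;

/// Hypothetical helper, not part of Phobos: decodes UTF-8 and maps every
/// undecodable byte 0x80..0xFF to the code point 0xDC80..0xDCFF.
dchar[] decodeLenient(const(char)[] s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
    {
        immutable start = i;
        try
        {
            result ~= decode(s, i); // advances i past one valid sequence
        }
        catch (UTFException)
        {
            // Keep the offending byte, shifted into the low surrogate range.
            result ~= cast(dchar)(0xDC00 + cast(ubyte) s[start]);
            i = start + 1; // resume right after the bad byte
        }
    }
    return result;
}

unittest
{
    auto r = decodeLenient("a\xa0b");
    assert(r.length == 3);
    assert(r[0] == 'a' && r[1] == 0xDCA0 && r[2] == 'b');
    assert(!isValidDchar(r[1])); // the marker is detectable afterwards
}
```

Because the original byte survives in the low eight bits, an error handler can still recover it with `cast(ubyte)(d - 0xDC00)`.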

PS: You are free to preprocess the source if you like, and convert invalid parts into whatever you like. But instead of surrogates, you'd rather use one of the freely usable ranges of values; or maybe use 0 (so that output won't be disturbed); or, better, the code point intended for the "un-representable" thingie, which all fonts would correctly interpret (and usually display as an inverse-video '?').
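
For illustration, the code point meant for this purpose is U+FFFD (REPLACEMENT CHARACTER). A small sketch, assuming a reasonably recent Phobos where `std.utf.byDchar` substitutes `replacementDchar` for invalid sequences by default; the wrapper name `decodeWithReplacement` is made up for this example.

```d
import std.array : array;
import std.utf : byDchar, replacementDchar;

/// Hypothetical wrapper: decodes UTF-8 and substitutes U+FFFD for
/// anything that cannot be decoded, instead of throwing.
dchar[] decodeWithReplacement(const(char)[] s)
{
    return s.byDchar.array;
}

unittest
{
    auto r = decodeWithReplacement("a\xa0b");
    assert(r.length == 3);
    assert(r[0] == 'a' && r[1] == replacementDchar && r[2] == 'b');
}
```

Unlike the surrogate mapping, this loses the original byte, but the result contains only valid code points and can be passed to any other string function or printed as-is.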


Denis
--
_________________
vita es estrany
spir.wikidot.com
