Hi,

Eric Kow <[email protected]> writes:
>> No. There's no need to tag.
>>
>> UTF-8 can be detected automatically with 100% certainty in practice. If
>> a string correctly decodes as UTF-8, then it's most certainly UTF-8.
>
> Could you explain in a bit more detail why this is the case?
>
> Are you saying that the probability of funny characters occurring only
> within UTF-8 compatible sequences like 110xxxxx 10xxxxxx is just so
> absurdly low (especially in practice) that we can get away with
> autodetection?
>
> Come to think of it, what's the harm if we mistakenly detect something
> as UTF-8 from time to time? Maybe it's no worse than what we already
> do...

For what it's worth, I have done some UTF-8 detection based on a strict
UTF-8 decoder, and I haven't seen a false positive yet (false negatives
are impossible by the nature of the test, since valid UTF-8 always
decodes). I am mildly in favour of auto-detecting UTF-8, although I
probably haven't done enough research myself to argue the point
strongly.

Yours,
   Petr.
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
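
To make the "strict decoder" test discussed in this thread concrete, here is a minimal sketch in Python. It is only an illustration, not the code anyone on the list actually used; the function name `looks_like_utf8` is made up for the example, and it relies on Python's built-in strict UTF-8 decoder.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly under a strict UTF-8 decoder.

    Hypothetical helper for illustration only. Note that pure ASCII input
    also passes, because ASCII is a subset of UTF-8.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    samples = {
        "UTF-8 text": "naïve".encode("utf-8"),
        "Latin-1 text": "naïve".encode("latin-1"),
        "plain ASCII": b"plain ascii",
        "Latin-1 'Ã©'": "Ã©".encode("latin-1"),
    }
    for label, raw in samples.items():
        print(f"{label}: {looks_like_utf8(raw)}")
```

The Latin-1 encoding of "naïve" is rejected because the byte for "ï" (0xEF) looks like the start of a three-byte sequence but is followed by an ASCII letter rather than a continuation byte. A false positive needs every high byte to fall into a well-formed lead/continuation pattern, as in the Latin-1 text "Ã©", whose two bytes happen to be the UTF-8 encoding of "é".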
