Juliusz Chroboczek writes:

> > The big problem that you face is short sequences of extended Shift
> > JIS, Big 5, and Windows-125x that are mostly ASCII. That sounds a lot
> > like a typical email message with correctly spelled name and/or .sig
> > to me.
>
> Please exhibit a sentence in a natural language encoded in one of these
> encodings that decodes as proper UTF-8, or forever keep your peace.

That may not be possible. The examples I've seen involved multiple
languages, English + something else. I don't have one to hand, and
don't have time to look for or generate one. They're rare -- except
from the point of view of a person who happens to have such a name or
use such a .sig.

In any case, I'm not asking that non-UTF-8 encodings be decoded *at
all* (that's Eric's suggestion, and it would seem code for it is
already available in Darcs), and certainly not that Darcs try to detect
non-UTF-8 encodings that masquerade as UTF-8. I ask only that the user
be warned when something does not decode as proper UTF-8 (including the
"uniquely encoded with the minimum number of octets" condition).

Anything less *is* "punishing the users," and will hurt Darcs.
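
To make the requested behaviour concrete, here is a minimal sketch of
"warn when the input is not proper UTF-8". It is written in Python
purely for brevity and is not Darcs code: the function name
`decode_or_warn`, its `what` parameter, and the escape-the-bad-bytes
fallback are all assumptions made for illustration. It relies on
Python's strict UTF-8 decoder, which rejects non-shortest ("overlong")
forms as well as other malformed sequences.

```python
import warnings


def decode_or_warn(raw: bytes, what: str = "input") -> str:
    """Decode `raw` as strict UTF-8, warning the user if it is malformed.

    Hypothetical helper for illustration only; not part of Darcs.
    """
    try:
        # Strict decoding rejects overlong forms (e.g. C0 AF), stray
        # continuation bytes, truncated sequences, and encoded surrogates.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        warnings.warn(
            f"{what} is not valid UTF-8 "
            f"(byte offset {err.start}: {err.reason})"
        )
        # Assumed fallback policy: do not guess another encoding; keep
        # the offending bytes visible as \xNN escapes instead.
        return raw.decode("utf-8", errors="backslashreplace")


if __name__ == "__main__":
    print(decode_or_warn("Jürgen".encode("utf-8"), "author name"))
    print(decode_or_warn("Jürgen".encode("latin-1"), "author name"))
    print(decode_or_warn(b"\xc0\xaf", "author name"))  # overlong "/"
```

Note that this deliberately does nothing about legacy-encoded text that
happens to be valid UTF-8; it only covers the case where decoding fails.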
