> Our projects contain Unicode encoded files.

If the "Unicode" is stored in a file, then is it UTF-16 in big-endian order, UTF-16 in little-endian order, or UTF-8?
(In principle, I suppose, someone could store it as big- or little-endian 32-bit values, but that seems pretty bogus.)

> Encoding these files in UTF-8 could be possible, but it would have an impact
> on our XML data implementation.

The default encoding for XML _is_ UTF-8, no?

> One of the things that is currently not sounding good is to store text type
> of information in CVS as binary, just because we need to prevent possible
> truncation.

UTF-8 was carefully designed so that it will never contain a zero byte unless you actually use Unicode character zero (U+0000), and I bet you don't do that.

> Is there any solution on how to have unicode text, html and xml files
> checked in as text and not as binary, so that we might be able to use
> revision and merge options as for text,

The CVS cognoscenti should correct me if I'm wrong, but I assume that CVS handles text with 8-bit character sets just fine (as long as you don't use a zero byte) -- no?

If this is true, then you should be able to store all your Unicode text using UTF-8 (which is the default for XML anyway) and CVS should handle it perfectly well as text, not binary.

Thomas Maslen
[EMAIL PROTECTED]
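
The zero-byte point above is easy to check for yourself. Here is a minimal Python sketch, not anything CVS-specific: the sample string and the helper name `has_zero_byte` are made up for illustration. It encodes the same text several ways and reports whether any 0x00 byte appears.

```python
# Illustrative sample: ASCII, accented Latin, CJK, and one character
# outside the Basic Multilingual Plane. None of them is U+0000.
sample = "CVS \u00e9\u00e8 \u65e5\u672c\u8a9e \U0001F600"


def has_zero_byte(text: str, encoding: str) -> bool:
    """Return True if encoding `text` yields at least one 0x00 byte."""
    return b"\x00" in text.encode(encoding)


# Compare UTF-8 against the fixed-width-unit encodings.
for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le"):
    print(enc, has_zero_byte(sample, enc))

# The one exception for UTF-8: U+0000 itself encodes as a single zero byte.
print("U+0000 in utf-8:", has_zero_byte("\u0000", "utf-8"))
```

For this sample only the UTF-8 line should report False; the UTF-16 and UTF-32 forms pad every ASCII character with zero bytes, which is the sort of thing a tool expecting 8-bit text might trip over.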