> You would need an Input Method driver which lets you type
> complex key sequences or combinations to type in a language
> which has more than the usual few dozen chars of alphabet.
Yes. The (keyboard) input and (screen) output appear to be the most
complicated exercise here. DBCS or UTF-8 support inside other programs
would appear less complicated - as far as I know, DOSLFN properly
supports DBCS. (UTF-8 appears to be easier than DBCS, but I didn't
look into the details of the latter.)

> In addition, you get a sort of graceful degradation: Tools
> which are not Unicode-aware would treat the strings as if
> they use some unknown codepage. So such tools would think
> that AndrXX where XX is an encoding for an accented e has 6
> characters but at least you can still see the "Andr" in it.
>
> In the other direction, if you accidentally put in a text
> with Latin1 or codepage 858 / 850 encoding, you get AndrY
> where Y is the codepage style encoding of the accented "e"
> and the Y and possibly one char after it would be shown in
> a broken way by a CON driver which expects UTF8 instead.

Arguably, the UTF-8 "compatibility" is worse here: with the actual
encoding in any code page (not DBCS or UTF-8), displaying the string
in another code page will replace each non-ASCII character with one
random character of the active code page. With UTF-8, non-ASCII
characters are encoded as multi-byte sequences - resulting in several
random characters of the active code page where only one code-point
is actually encoded.

> I do not understand the "codepoints are 24 bit numbers"
> issue. Unicode chars with numbers above 65535 are very
> exotic in everyday languages

That is why I said it's not that important.

> If you mean UTF8,

No. That would not make sense. A code-point is usually written like
"U+0038", with 4 to 6 hexadecimal digits that give you the numeric
value of that code-point. The "character set", Unicode, defines
code-points. The encoding, UTF-8, defines how (almost) arbitrary
numeric values are to be encoded into a stream of bytes. UTF-8 support
easily scales to support all currently reserved code-points which do
not fit into a 16-bit number, if the underlying interface supports
them. (A 21-bit number is large enough for all code-points.)

> I think Mac / Office sometimes might use
> one of the UTF16 encodings but otherwise they are not
> so widespread.

Don't forget FAT's long file names ;-)

Regards,
Christian
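
PS: In case a concrete illustration of the multi-byte point helps,
here is a minimal sketch in C (my own example, not taken from DOSLFN
or any other FreeDOS source; the function name utf8_encode is made
up). It encodes one code-point of up to 21 bits into 1 to 4 UTF-8
bytes - which is also why "Andre" with an accented "e", stored as
UTF-8, is 6 bytes that a code-page-only tool counts as 6 "characters":

    /* Minimal sketch, not from any FreeDOS source: encode one Unicode
     * code-point (U+0000..U+10FFFF, i.e. up to 21 bits) as UTF-8.
     * Returns the number of bytes written (1..4), or 0 on error. */
    #include <stdio.h>

    static int utf8_encode(unsigned long cp, unsigned char *out)
    {
        if (cp <= 0x7FUL) {              /* 7 bits -> 1 byte (ASCII) */
            out[0] = (unsigned char)cp;
            return 1;
        } else if (cp <= 0x7FFUL) {      /* 11 bits -> 2 bytes */
            out[0] = 0xC0 | (unsigned char)(cp >> 6);
            out[1] = 0x80 | (unsigned char)(cp & 0x3F);
            return 2;
        } else if (cp <= 0xFFFFUL) {     /* 16 bits -> 3 bytes */
            out[0] = 0xE0 | (unsigned char)(cp >> 12);
            out[1] = 0x80 | (unsigned char)((cp >> 6) & 0x3F);
            out[2] = 0x80 | (unsigned char)(cp & 0x3F);
            return 3;
        } else if (cp <= 0x10FFFFUL) {   /* 21 bits -> 4 bytes */
            out[0] = 0xF0 | (unsigned char)(cp >> 18);
            out[1] = 0x80 | (unsigned char)((cp >> 12) & 0x3F);
            out[2] = 0x80 | (unsigned char)((cp >> 6) & 0x3F);
            out[3] = 0x80 | (unsigned char)(cp & 0x3F);
            return 4;
        }
        return 0;                        /* beyond U+10FFFF: invalid */
    }

    int main(void)
    {
        unsigned char buf[4];
        int n, i;

        /* U+00E9, LATIN SMALL LETTER E WITH ACUTE: one code-point,
         * two UTF-8 bytes (0xC3 0xA9). A CP850 CON driver shows this
         * pair as two unrelated glyphs - the "worse" degradation
         * mentioned above. */
        n = utf8_encode(0xE9UL, buf);
        printf("U+00E9 -> %d byte(s):", n);
        for (i = 0; i < n; i++)
            printf(" %02X", (unsigned)buf[i]);
        printf("\n");
        return 0;
    }

Running it prints "U+00E9 -> 2 byte(s): C3 A9". Decoding in the other
direction is the same bit-shuffling in reverse, so scaling a UTF-8
implementation to code-points above 65535 costs almost nothing, as
long as the surrounding interface can pass such values through.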