On 2002-11-20, Aitor Santamaría Merino wrote:
> Are Windows charsets 8-bit codepages?
> If this is so, we could prepare (I don't know how difficult
> it would be) those codepages to be used with DISPLAY.
Such solutions already exist, although they are the result of patching existing .CPI files. In fact, I have been proposing this for many years now, not only for Windows Code Pages, but also for Macintosh and ISO Code Pages, so you can view foreign files without having to convert them first...

Actually, Michal and I are working on creating new .CPI files from scratch (to be used under *any* system supporting .CPI files, including DR-DOS, PTS-DOS, MS-DOS OEM issues, Arabic/Hebrew issues of MS-DOS, OS/2 and Windows NT/2000/XP), so you can include and exclude codepages as you like. Switching between the various .CPI file formats will also be just a matter of setting a different conditional define.

This is very slow work, though, as Michal and I are extremely busy with other duties... So volunteers are welcome, for example to create more mapping tables, compare codepages, or collect character shapes. None of this requires any real programming knowledge (a bit of background on NLS issues will help, though), but beware: it is still very time-consuming, detail-oriented, and therefore exhausting work. It requires someone who is not after a quick hack, but a perfect solution. There's no point in having Code Page definitions that are only 99.9% correct. (If we could hand this work over to someone else, we could better concentrate on writing the actual code: Michal his font editor, and me the skeleton macro assembler sources for the CPI files themselves. Much of it is already there.)

> There's something that I would need to know for KEYB to handle
> this easily: which is the highest codepage number known?

May I refer you to the huge KBD.LST file containing all the keyboard-related news for the forthcoming issue of RBIL? I already sent you a copy...
For your convenience, here's one of the tables to be found under INT 21h/AX=AD80h:
[Sorry for the extra long lines in this post]

|KEYB.COM keyboard layout IDs:
|
| ID: Code: Sub: Code pages: Source: Description: Euro:
| --- SG* - (2) [A] -
| 0* SG* 1 850,437 [BCDEGHJLMNO] Switzerland (German) E
| --- CF* - (2) [A] -
| 58* CF* 1 863,850 [BCEHIJKO] Canada (French) E
|     J: old
|     K: CAN/CSA-Z 243.200-92
| 58* CF* 1 850,863 [GLMN] E
| 91* RU* 1 866,437,850,855,852,1251 [D] Russia -
| 91* RU* 1 866,437,850,855,1251 [O] -
| 92* RU 2 866,437,850,855,1251 [D] Russia -
| 93* RU 2 866,437,850,855,1251 [D] Russia "Latin/Cyrillic" -
| 94* UR* 1 866,437,850,855 [D] Russia -
| 95* UR 2 866,437,850,855 [D] Russia -
| 96* UR 2 866,437,850,855 [D] Russia -
| 97* BL* 1 866,437,850,855 [D] -
| 98* BL 2 866,437,850,855 [D] -
| 99* BL 2 866,437,850,855 [D] -
| --- XX* - (5) [A] -
| 103* XX* 1 437,850,860,863,865 [B] -
| 103* XX* 1 437,850,852,860,863,865 [CDEHJ] -
| 103* US/XX* 1 437,850,852,860,863,865,866,855 [FO] -
| 103* XX* 1 437,850,852,855,857,860,861,863,866,869 [G] -
| 103* US/XX* 1 437,850,852,860,863,865,861 [IK] -
| 103* US/XX* 1 437,850,852,855,857,860,861,863,865,866,869 [LMN] 5,E
| 103 UX* 1 850,437 [N] 5,E
| 103* US/XX 1 437,850,852,860,863,865 [P] "US-X" -
| 103* US/XX 1 437,775,850,852,860,863,865,866,855 [R] -
| 118* YC* 1 855,852 [GLMN] -
| 118 YC* 1 855,850 [K] -
| --- BE* - (2) [A] Belgium -
| 120* BE* 1 850,437 [BCDEGHIJLMNO] Belgium E
| 120* FR 2 437,850 [BCEHIJ] France -
| 120* FR 2 850,437 [GLMN] France E
| --- GR* - (2) [A] Germany -
| 129* GR* 1 437,850 [BCDEHIJKO] Germany (DIN 2137 part 2) -
|     J: Shiftlock
|     K: CAPSlock "GR-IBM"
| 129* GR* 1 850,437 [GLMN] Germany E
| --- IT* - (2) [A] -
| 141* IT* 1 437,850 [BCDEHIJKO] Italy -
| 141* IT* 2 850,437 [GLMN] Italy 5
| 142* IT 2 437,850 [BCEHIJKO] Italy "IT142" -
| 142* IT 2 850,437 [GLMN] Italy 5
| --- NL* - (2) [A] -
| 143* NL* 1 437,850 [BCDEHIJO] Netherlands -
| 143* NL* 1 850,437 [GLMN] Netherlands E
| 146 IT Italy "IT146" ?
| --- SF* - (2) [A] -
| 150* SF* 1 850,437 [BCDEGHJLMNO] Switzerland (French) E
| --- SV* - (2) [A] Sweden -
| 153* SV* 1 437,850 [BCDEHIJKO] Sweden -
| 153* SV* 1 850,437 [GLMN] Sweden 5
| 153 SU* 1 850,437 [BCDEGHIJKLMNO] Finland (Suomi) 5,E
| --- NO* - (2) [A] Norway -
| 155* NO* 1 850,865 [BCEHIJKMNO] Norway 5
| 155* NO* 1 850 [GL] 5
| --- DK* - (2) [A] -
| 159* DK* 1 850,865 [BCDEHIJLMNO] Denmark 5
| 159* DK* 1 850 [G] -
| 161* IS* 1 861,850 [IKO] Iceland -
| --- PO* - (2) [A] -
| 163* PO* 1 850,860 [BCEGHIJLMNO] Portugal 5
| --- UK* - (2) [A] -
| 166* UK* 1 437,850 [BCDEHIJKO] United Kingdom -
| 166* UK* 1 850,437 [GLMN] 4
| 168* UK 2 437,850 [BCEHIJK] United Kingdom "UK168" E
| --- LA* - (2) [A] -
| 171* LA* 1 850,437 [BCEGHIJLMNO] Latin America E
| --- SP* - (2) [A] -
| 172* SP* 1 850,437 [BCDEGHIJLMNO] Spain 5,E
| 179* TR 2 857,850 [GLMN] Turkey E
| 179* TR 2 850,857 [IKO] -
| --- FR* - (2) [A] -
| 189* FR* 1 437,850 [BCDEHIJO] France -
| 189* FR* 2 850,437 [GLMN] France E
| 190 TH "TH190"
| 194 JP* 1 932,437 [CEHJ] Japan -
| 194* JP* 1 437,932 [GLMN] -
| 194 JP* 1 437,932 [OP] -
| 194 AX* 1 437,932 [M] -
| 194 J3* 1 437,932 [M] -
| 196 AX* 1 437,932 [P]
| 196 US* 1 437,932 [P]
| 197* IC* 1 850,861 [GLMN] Iceland 5
| 208* HU* 1 850,852 [CDEHJKO] Hungary -
| 208* HU* 1 852,850 [G] -
| 208 HU* 1 850,852 [I] -
| 208* HU* 1 852,912,850 [LMN] -
| 210* LT 770,773,775,437 [Q] Lithuania (QWERTY EDV) 4
| 210* LT 775 [R]
| 211* LT* 770,773,775,437 [Q] Lithuania (ĄŽERTY) E
| 211* LT* 775 [R]
| 212* LT 770,773,775,437 [Q] Lithuania (QWERTY Baltic) E
| 212* LT 775 [R]
| 214* PL* 1 850,852 [CDEHIJKO] Poland -
| 214* PL* 1 852,850 [G] -
| 214* PL* 1 852,912,850 [LMN] 5?
| 220 EL "EL220"
| 234* YU* 1 850,852 [CDEHJKO] Yugoslavia (Latin) -
| 234* YU* 1 852,850 [G] -
| 234 YU* 1 850,852 [I] -
| 234* YU* 1 852,912,850 [LMN] -
| 234 SI* 1 852,912,850 [LMN] Slovenia -
| 234 BL* 1 852,912,850 [LMN] Bosnia/Herzegovina (Latin) -
| 234 HR* 1 852,912,850 [LMN] Croatia -
| 241* BG 2 855,850 [GLMN] Bulgaria -
| 243* CZ* 1 850,852 [CDEHJKO] Czech Republic -
| 243* CZ* 1 852,850 [G] -
| 243 CZ* 1 850,852 [I] -
| 243* CZ* 1 852,850,912 [LMN] -
| 245* SL* 1 850,852 [CDEHJKO] -
| 245* SL* 1 852,850 [G] -
| 245 SL* 1 850,852 [I] -
| 245* SL* 1 852,912,850 [LMN] -
| 258 IS
| 259 EL
| 274* BR* 1 850,437 [CEHJO] Brazil -
| 274* BR 2 850,437 [GLMN] -
| 274* BR* 2 850,437 [IK] -
| 275* BR* 2 850,437 [GLMN] "BR-2" -
| 275* BR 1 850,437 [IK] -
| 319* GK* 1 869,850 [GLMN] Greece -
| 319* GK* 1 869,737 [IO] -
| 319* GK* 1 869,737,850 [K] -
| 333 RO* 1 850,852 [IKO] Romania -
| 341* RU 2 866,437,850,855,1251 [D] Russia (Latin/Cyrillic) -
| 425* ET* 1 775,850 [O] -
| 440* TR* 2 857,850 [GLMN] Turkey "TR440" E
| 440* TR* 1 850,857 [IKO] Turkey -
| 441* RU 3 866,855,850 [GLMN] Russia "RU441" -
| 441 RU* 1 866,850,855 [K] -
| 442* BG* 2 855,850 [G] Bulgaria -
| 442* BG* 2 866,850,855 [K] -
| 442* BG* 2 915,855,850 [LMN] -
| 442* BG* 1 866,850,855 [O] -
| 443 RU Russia (documentation only!) -
| 446* RO* 1 852,850 [G] Romania -
| 446* RO* 1 852,912,850 [LMN] Romania -
| 448* AL* 1 852,850 [G] -
| 449* MK* 1 915,855,852 [LMN] FYR of Macedonia -
| 450 BC* 1 915,855,852 [LMN] Bosnia/Herzegovina (Cyrillic) -
| 450* SB* 1 915,855,852 [LMN] Serbia/Montenegro -
| 452* AL* 1 850,437 [LMN] Albania -
| 453* GR 1 850,437 [LMN] Germany (DIN 2137) "DE"/"DE453" E
| 457 PL "PL457" 5?
| 458 IS Iceland "IS458" 5
| 459 EL "EL459"
| 470 AA "AA470"
| 985* DV* 1 437,850 [FO] Dvorak "USDV" -
| 986* LH* 1 437,850 [F] Left-handed Dvorak "USDVL" -
| 987* RH* 1 437,850 [F] Right-handed Dvorak "USDVR" -
|
| [A] MS-DOS 3.30 KEYBOARD.SYS
| [B] PC DOS 4.00, MS-DOS 4.01 KEYBOARD.SYS
| [C] MS-DOS 5.0 KEYBOARD.SYS
| [D] Russian MS-DOS 5.0 KEYBOARD.SYS
| [E] MS-DOS 6.0 KEYBOARD.SYS
| [F] MS-DOS 6.0, MS-DOS 6.20, MS-DOS 6.22 DVORAK.SYS
| [G] PC DOS 6.1 KEYBOARD.SYS
| [H] MS-DOS 6.20 KEYBOARD.SYS
| [I] MS-DOS 6.20 KEYBRD2.SYS
| [J] MS-DOS 6.22, Chinese MS-DOS 6.22, MS-DOS 7.10 (Windows 95 OPK3, 98,
|     98SE / 98ZA), MS-DOS 8.0 (Windows ME) KEYBOARD.SYS
| [K] MS-DOS 6.22, Chinese MS-DOS 6.22, MS-DOS 7.10 (Windows 95 OPK3, 98,
|     98SE / 98ZA), MS-DOS 8.0 (Windows ME) KEYBRD2.SYS
| [L] PC DOS 7 KEYBOARD.SYS
| [M] PC DOS/V 7 KEYBOARD.SYS
| [N] PC DOS 2000 KEYBOARD.SYS
| [O] Windows 2000 (NT5) KEYBOARD.SYS and KEY01.SYS
| [P] Japanese MS-DOS 6.20 JKEYBRD.SYS
| [Q] KADA W98LT 4.16 KEYBRDL.SYS
| [R] KADA WNLT 4.13 for NT4 KEYBRDL.SYS
|
|Notes: According to the documentation, PC DOS 7 and 2000 no longer support
|       the UK /ID:168 keyboard layout, but the KEYBOARD.SYS files still
|       contain entries for it.
|       Since some layout definitions are only used internally, a star (*)
|       indicates that the layout is actually addressable under this
|       ID or name.
|       The Code Pages are given in the priority they have in the
|       corresponding KEYBOARD.SYS file.

I hope this answers several of your and Henrique's questions.

> ( my wish: below 4000
> my second wish: below 8000
> my last wish: below 16000 :-((()

Country codes, Code Page IDs, and Keyboard Layout IDs are 16-bit values and should be treated as such. Although by far not all of them were or are used by Microsoft and IBM, the highest assignable Code Page number is 65533. 0, 65534 and 65535 are reserved, as they have special meanings for the OS itself (see below).
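To make the 16-bit rule concrete, here is a minimal Python sketch (not code taken from any DOS component). It assumes only what is stated above: IDs are unsigned 16-bit values, 1..65533 are assignable, and 0, 65534 and 65535 are reserved. The helper names are hypothetical; `as_signed16` shows how a signed/unsigned mix-up, such as the DR-DOS MODE display oversight, turns a high ID into a negative number.

```python
# Sketch only: Country Codes, Code Page IDs and Keyboard Layout IDs are
# treated as unsigned 16-bit values (0..65535).
RESERVED_IDS = frozenset({0x0000, 0xFFFE, 0xFFFF})  # 0, 65534, 65535


def is_assignable_id(value: int) -> bool:
    """True if value fits in 16 bits and is not one of the reserved IDs."""
    return 0 <= value <= 0xFFFF and value not in RESERVED_IDS


def as_signed16(value: int) -> int:
    """Reinterpret an unsigned 16-bit value as a signed 16-bit integer.

    This mimics a tool that keeps the ID in a signed variable:
    anything above 32767 comes out negative.
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"{value} does not fit in 16 bits")
    return value - 0x10000 if value >= 0x8000 else value


if __name__ == "__main__":
    for cp in (850, 1251, 58194, 65533, 65534):
        print(cp, is_assignable_id(cp), as_signed16(cp))
```

For example, ID 58194 would be displayed as -7342 by such a tool, although the value itself is perfectly valid.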
AFAIR, Microsoft's MODE does not accept Code Page numbers higher than 999 (only a question of command line parsing, not a technical limitation). DR-DOS MODE displays high Code Page numbers as negative numbers due to a signed/unsigned oversight (only a cosmetic issue). But this does not mean they cannot exist - actually, there are *many* (hundreds!) Code Pages with much higher values. Please have a look at the huge table in CODEPAGE.LST, which I already sent you as well. I wonder a bit why I spent so much time collecting all this info and maintaining these lists, when apparently no one reads them...

> CHCP is an internal program that calls kernel, which calls NLSFUNC.

Indirectly, yes. It calls the DOS kernel, which will usually call down to NLSFUNC, which will then call back into DOS to retrieve the info (for file I/O only). Once the info has been looked up, NLSFUNC returns it to the DOS kernel, which then calls NLSFUNC again in order to switch the codepage. NLSFUNC then asks every character device driver in the system whether it supports codepage switching. Any driver supporting codepage switching (like DISPLAY.SYS or PRINTER.SYS, for example) will then be told to switch the codepage. If a driver returns an error, NLSFUNC will return an error as well. DISPLAY.SYS internally also communicates with ANSI.SYS and KEYB in order to switch the display and keyboard codepages (ANSI.SYS is not called for codepage switching as such, only for communicating display properties).

> If I am not wrong, NLSFUNC would care of all that (including
> DISPLAY), and change all of that in a consistent manner*.
> The problem is that we do not have a NLSFUNC program :-(((

Yes.

> (*) There is a case which, in my opinion, leads to inconsistency.
> DISPLAY.SYS is responsible for changing keyboard codepage too.
> Microsoft's implementation will switch the screen codepage regardless
> if KEYB managed to change codepage or not, which means that it would
> leave screen and keyboard with different codepages if KEYB failed.
> In my opinion, this is a bug.

In my opinion, too, but I would simply add an option to DISPLAY to control the behaviour, so the decision is up to the user. I suggest using /E for this purpose, because it would correspond somewhat to an option supported by my internal issue of DR-DOS NLSFUNC:

|NLSFUNC R4.07 (001014) National Language Support
|Copyright (c) 1988,1998 Caldera, Inc. All rights reserved.
|Copyright (c) 1997,2000 Matthias Paul. All rights reserved.
|
|NLSFUNC [[d:]path] [/Help] [/B][/E][/F] [/MH|/MU|/ML|/L|/NOHMA] [/N][/V][/X]
|
|  d:path       Filespec of local COUNTRY.SYS database (default: system file)
|  /B           Search both, local and system NLS databases for requested data
|  /E           Do not report device driver code page switching errors
|  /F           Override warnings and force NLSFUNC to load or update filespec
|  /MH          Load and relocate NLSFUNC into High Memory (HMA)
|  /MU          Load and relocate NLSFUNC into XMS Upper Memory (XMSUMB)
|  /ML          Load NLSFUNC as classical TSR (Conventional or Upper Memory)
|  /L or /NOHMA Similar to /ML, but prohibit relocation into High Memory (HMA)
|  /N           Do not bypass NLSFUNC on detection of SHIFT+CTRL+ALT hotkey
|  /V           Display verbose messages (default: warnings and errors) (ALT)
|  /X           Always load advanced COUNTRY.SYS file support
|
|Installing NLSFUNC without giving any of the /Mx switches will try to relocate
|it into High Memory or XMS Upper Memory, or to load it as a classical TSR. Use
|of a combination of /Mx switches will override the default. Use of the HILOAD
|NLSFUNC /L syntax will try to load NLSFUNC into Upper Memory (UMB) as a TSR.

On 2002-11-20, Arkady V.Belousov wrote:
> This program is not required.
> It required only if you wish to
> switch codepages "on the fly", but if you work only with one
> codepage, you may (should) initialize it with COUNTRY= statement.
> Of course, in MS-DOS MODE and KEYB without NLSFUNC loaded will
> fail to load fonts/layouts other than pointed in COUNTRY=.

This is correct, but still, a COUNTRY.SYS file parser is needed not only for NLSFUNC, but also for FreeDOS' DOS BIOS. In older issues of DR DOS, NLSFUNC was an integral part of the kernel, and the disk file was only a dummy for programs expecting it to be there. But then the code moved into the DOS BIOS, where it gets discarded after init (the driver is temporarily linked in while the COUNTRY directive is processed), and into the file parsing portion of the external NLSFUNC driver for later use.

On 2002-11-21, Axel C. Frinke wrote:
> I've heard of a proposal about 'user definable codepage IDs' to
> assign IDs above 0xF000 to codepages without official IDs. But
> I don't like to assign such a number to a wide-spread codepage
> like KOI8-R.

According to IBM's Character Data Representation Architecture (CDRA level 2), there are two special areas within the 16-bit Code Page ID space: one for variations of existing Code Pages and one for user- or OEM-definable Code Pages. This is exactly the way to go until IBM assigns an official ID for a new Code Page. Anything else undermines the system, which I think is a bad idea, even though quite a large number of Code Pages have been assigned without first checking with IBM, and there are still many Code Pages that do not have official Code Page IDs yet. That's also why I withdrew the Code Page ID "8501" that I had proposed for the new variant of Code Page 850 with Euro sign (I issued it before I knew about IBM CDRA). This codepage is now officially called CP 858.
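A minimal Python sketch of the two CDRA areas, assuming the ranges quoted from the NECPINW.CPI documentation in this thread: E000h..EFFFh for variations of existing Code Pages and FF00h..FFFEh (better FFFDh, since FFFEh is reserved) for private use. The variant layout is this sketch's reading of the NECPINW.CPI entries (850 becomes E352h, 437 with Euro sign at 9Fh becomes E5B5h): the low 10 bits carry the parent Code Page and bits 10-11 select the variant. That only works for parents below 1024, which is an assumption, not a CDRA rule; all function names are hypothetical.

```python
# Sketch only: ranges taken from the CDRA notes quoted in this thread.
VARIANT_BASE = 0xE000   # E000h..EFFFh: variations of existing Code Pages
VARIANT_END = 0xEFFF
PRIVATE_FIRST = 0xFF00  # FF00h..FFFDh: private use (FFFEh is reserved)
PRIVATE_LAST = 0xFFFD
PARENT_MASK = 0x03FF    # assumption: the parent Code Page fits in 10 bits


def classify_codepage_id(cp: int) -> str:
    """Return 'reserved', 'variant', 'private' or 'other' for a 16-bit ID."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError(f"{cp} does not fit in 16 bits")
    if cp in (0x0000, 0xFFFE, 0xFFFF):
        return "reserved"
    if VARIANT_BASE <= cp <= VARIANT_END:
        return "variant"
    if PRIVATE_FIRST <= cp <= PRIVATE_LAST:
        return "private"
    return "other"  # official IDs and ranges reserved for other purposes


def make_variant(parent: int, index: int = 0) -> int:
    """Build a variant ID whose LSBs still match the parent Code Page.

    index (0..3) goes into bits 10-11; index 1 matches the
    'EURO SIGN at 9Fh' entries in the NECPINW.CPI table.
    """
    if not 0 < parent <= PARENT_MASK:
        raise ValueError("parent Code Page must fit in 10 bits")
    if not 0 <= index <= 3:
        raise ValueError("variant index must be 0..3")
    return VARIANT_BASE | (index << 10) | parent


def parent_of_variant(cp: int) -> int:
    """Recover the parent Code Page from an E000h..EFFFh variant ID."""
    if classify_codepage_id(cp) != "variant":
        raise ValueError(f"{cp} is not in the variant range")
    return cp & PARENT_MASK
```

For instance, `make_variant(850)` gives 58194 (E352h), and `parent_of_variant(59359)` gives 991, matching the NECPINW.CPI entries. Code Pages above 1023 (such as 1125 or 1251) would need a different scheme.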
From my NECPINW.CPI docs:

| My previous proposal for this EURO SIGN-variant of Code Page 850
| was Code Page 8501, while the IBM CDRA level 2 standard reserves
| the range E000h..EFFFh for user definable CCSIDs (that is,
| "Code Pages" here). NECPINW.CPI can still provide the Code Page
| under both IDs (EURO_8501 conditional), but IBM has meanwhile
| assigned ID 858, making my previous proposal obsolete. Hence,
| support for 8501/58194 may vanish, use 858 instead.

and

| The IBM CDRA level 2 standard reserves Code Page IDs FF00h..FFFEh
| for user definable "private use" assignments.

This means that Code Pages in the FF00h..FFFEh (or better FFFDh) range may vary completely from user to user, device to device, and/or manufacturer to manufacturer. So switching to them via CHCP does not necessarily produce reasonable results, depending on circumstances. Switching to them via MODE dev: CODEPAGE SELECT=nnnnn will still work fine, as you can then select different Code Pages for different devices. But if you want to assign something "new" or "special", this is the range to use: you are completely free to use this space as you like, and can even create your own bit patterns within that range. By definition, these assignments are private, so it is no problem if different people assign different Code Pages to identical IDs.

The range E000h..EFFFh is used for variations of existing Code Pages and, if possible, should be assigned so that the LSBs still match the parent Code Page. That's why NECPINW.CPI also supported the new variant of Code Page 850 with Euro sign under ID 58194 (CP 850 = 0352h, CP 58194 = E352h).

On 2002-11-21, Arkady V.Belousov wrote:
> Subject: Re: [fd-dev] ISO-Latin and 4-digit codepages; arabic cp720
>> Let me express it more precisely: it would be handy to have all
>> codepage number below 4098.
>
> As stated by Matthias, DR-DOS assigns for code pages with euro
> sign some very big values.

My NECPINW.CPI does, DR-DOS does not.
However, DR-DOS /does/ support /Country Codes/ much larger than 999 in order to support entries with the ISO 8601 international date format and/or Euro currency. This is a proprietary extension of DR-DOS. Since I have already explained the patterns (a MOD 1000 system) and range definitions for this scheme, I won't go into the details here.

Axel C. Frinke wrote:
> Well, with the assumption that all codepage IDs would not take more
> than 10 bits, there would be 6 remaining bits to denote variations of
> the existing codepages. If I remember correct, the DR-DOS method
> denotes codepage 858 as 20850.

No. DR-DOS does not support codepages with Euro sign (yet), although, somewhat ironically, the symbol is part of the font database.

Nothing new to you, Axel, but maybe still an interesting bit of trivia for the others: the Euro currency support in DR-DOS 7.02+ pre-dates the Euro currency support in IBM PC DOS 2000 by several months (IIRC, I implemented it in 1997-11), and at that time it was still not completely clear where to put the character on the keyboard layout. I had tried to discuss the matter and find a solution with several keyboard vendors beforehand, but back in 1996 - 1997 they all said: we'll wait and see what Microsoft will do... ;->

Code Page 858 "as is" was not defined at that time. Maybe I was uninformed, but I did not hear about this ID before fall 2000. Looking back, the first reference I can now find to it is dated 1998-04-30 (PC DOS 2000 files), and according to my records, the Euro sign was added to the DR-DOS font database on 1998-05-01 on my behalf. I had read about it in magazine articles a few weeks earlier and only learnt about PC DOS 2000 a few months afterwards. ;-) Still, the Euro variant of the codepage, which IBM introduced in PC DOS 2000 (somewhat incompatibly), resides under the ID 850, not 858.
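The MOD 1000 pattern of the DR-DOS extended country codes can be sketched in Python as follows. Which multiple of 1000 stands for the Euro, the ISO 8601 date format, or both is not spelled out in this post, so the sketch only maps a code back to its classic value (code MOD 1000) and lists candidate codes; the function names and the `multiples` parameter are hypothetical.

```python
# Sketch only: DR-DOS extended country codes are assumed to have the form
# classic_code + k * 1000, with classic codes in 1..999.
def classic_country_code(code: int) -> int:
    """Map an extended DR-DOS country code back to its classic value."""
    if not 0 < code <= 0xFFFF:
        raise ValueError(f"{code} is not a valid 16-bit country code")
    return code % 1000


def same_country(a: int, b: int) -> bool:
    """True if two (possibly extended) country codes share a classic code."""
    return classic_country_code(a) == classic_country_code(b)


def candidate_codes(classic: int, multiples=range(4)) -> list:
    """List classic + k * 1000 for each k, dropping values above 65535.

    Not every candidate is actually defined in a COUNTRY.SYS file.
    """
    if not 0 < classic < 1000:
        raise ValueError("classic country codes are 1..999")
    return [classic + 1000 * k for k in multiples
            if classic + 1000 * k <= 0xFFFF]
```

For example, `candidate_codes(49)` lists 49, 1049, 2049 and 3049 for Germany (country code 49); whether each of them exists, and what it selects, is up to the COUNTRY.SYS file in use.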
The Euro currency support in DR-DOS is bound to the use of alternative country codes. I think this was a bit cumbersome but reasonable at the time, because during the transitional phase of the European Monetary Union (EMU) you needed to switch back and forth between the local currency and the forthcoming Euro all the time, so the old and the new country codes could easily be retrieved by adding or subtracting multiples of 1000 from the current value. This gave several possible values, not all of which have actually been used. So there are values with Euro sign, with the international date format according to ISO 8601, and with both, depending on personal preferences or local standards (for example, the corresponding DIN EN 28601 has been mandatory in Germany since 1996-05-01, although most people still use the old 1.5.1996 date format).

If you selected a Euro-enabled country code under DR-DOS 7.02, the currency was still displayed as "(=" (under PC DOS 2000 still "DM", BTW). A few months earlier, I had searched the web and asked several German financial institutions what the official abbreviation for the forthcoming Euro would be, but at that time they still couldn't give me a definite answer (I guess at least some of them already knew what they would use, but didn't want to make a formal statement). So instead of using "EUR" and risking introducing a wrong string in the end, I used "(=". Shortly before the release of DR-DOS 7.02 I received the definite answer that "EUR" would be used, but unfortunately it took some more months before the immediately updated COUNTRY.SYS file became public with DR-DOS 7.03.

Today, this system of doubled country codes is obsolete, and the old entries could be updated, as the old currencies are no longer in use.

> For automatted processing it is much easier to subtract 20000 than
> looking up in an additional lookup table for variants of codepages.
As explained, DR-DOS uses a similar system for Country Codes, but not for Code Pages. Still, IBM CDRA reserves the E000h..EFFFh codepage range for a very similar purpose.

From my NECPINW.CPI docs (just to give an example, not representative, and of course by far not a complete list of Code Pages - not all of them are even defined in CDRA level 2):

| 00D2h   210  Greek
| 016Fh   367  7-bit ISO 646 (US)
| 01B5h   437  International, USA, IBM-2, PC-8, World Trade
| 029Bh   667  Polish (Mazovia) (=CP 991)
| 02E1h   737  Greek
| 0352h   850  Multilingual, Latin I
| 0354h   852  Slavic, Eastern Europe (Latin II)
| 0355h   853  Turkish (Latin II)
| 0357h   855  Cyrillic I
| 0359h   857  Turkish (=CP 58201)
| 035Ah   858  Multilingual, Latin I with EURO SIGN
| 035Ch   860  Portuguese
| 035Fh   863  French Canadian
| 0361h   865  Nordic, Norway II, Danish
| 0362h   866  Russian, Cyrillic II
| 0363h   867  Czech (Kamenicky) (=CP 895)
| 037Fh   895  Czech (Kamenicky) (=CP 867)
| 03DFh   991  Polish (Mazovia) (=CP 667)
|         999  Dummy placeholder for hardware Code Page
| [...]
| 2135h  8501  (Multilingual, Latin I with EURO SIGN)
| E352h 58194  "" (=CP 8501)
| E359h 58201  Turkish with EURO SIGN at D5h (=CP 857)
| [...]
| E5B5h 58805  CP 437 variant with EURO SIGN at 9Fh
| E69Bh 59035  CP 667 variant with EURO SIGN at 9Fh (=CP 59359)
| E752h 59218  CP 850 variant with EURO SIGN at 9Fh
| E75Fh 59231  CP 863 variant with EURO SIGN at 9Fh
| E761h 59233  CP 865 variant with EURO SIGN at 9Fh
| E7DFh 59359  CP 991 variant with EURO SIGN at 9Fh (=CP 59035)
| [...]

Again, 8501 has meanwhile been withdrawn and will no longer be supported in future issues of NECPINW.CPI.

Please note that the names of Microsoft's "Latin" codepages use Roman numerals (I, II, III) rather than Arabic numerals (1, 2, 3) to distinguish them from ISO codepages, which /do/ use Arabic numerals.

On 2002-11-21, Oleg Deribas wrote:
> I don't know is it official or not, but KOI8-R have it's own codepage
> number. In IBM OS/2 it is known as CP878.
Very interesting, as it just fills a gap in my CODEPAGE.LST file: :-)

| Index CCSID CPGID/ ES/ CS/ F/M/S Name & Comments
| CP ESID GCSGID
| (hex) (dec) (dec) (hex) (dec) (dec)
| 0000h - 0 - - - Reduced 7-bit ASCII
|         (cannot be directly accessed by DOS)
| 0000h - 0 - - - (internally reserved by DR DOS)
| 0000h - 00000 reserved for special purposes
| 0000h 00000 - - - - "Inheritance from a higher level"
| [...]
| 036Bh 00875 00875 1100h 00925 M(00184) IO/Group 1a: EBCDIC: Greek
| 00878 ??? [OS/2 Warp 3 FixPak 40]
| 0370h 00880 00880 1100h 00960 F(00190) CM/Group 1a: Cyrillic Multilingual
|         880 Russian (Cyrillic GOST)
|         EBCDIC: Cyrillic
|         Names (RFC1345): "IBM880", "cp880",
|         "EBCDIC-Cyrillic"
|         (SeeAlso: CCSID 04976)
| [...]

For comparison purposes, can you provide a full encoding vector for what IBM implements in CP 878 (preferably in Unicode notation)?

>>> BTW, to complicate case the more, there is another KOI8 -
>>> KOI8-U (Ukrainian KOI8). You may see the differences with KOI8-R
>>> in RFC2319.
>> I will take a look at it by chance. Thanks.
>
> And there is official Ukrainian DOS codepage - CP1125. It is similar
> to Russian CP866, but contains all Ukrainian characters.
> BTW, in Epson printers CP1125 called CP866-Ukr for some reason ;)

| [...]
| 0464h 01124 01124 4100h 01326 F(00190) CM/Group 1a: Cyrillic Ukraine 8-Bit
| 01125 ??? [OS/2 Warp 3 FixPak 40]
| 1129 SBCS: Vietnamese [IBM PC]
| 01131 ??? [OS/2 Warp 3 FixPak 40]
| 01132 EBCDIC: Laotian [IBM, Unicode proposal
|         1998-05]
| 01133 SBCS: ASCII Laotian (ISO-8 based) [IBM,
|         Unicode proposal 1998-05]
| [...]

Yet another match, it seems. Thanks! :-)

Regarding the areas E000h..EFFFh and FF00h..FFFEh, here is another excerpt from the end of CODEPAGE.LST:

| [...]
| C1B5h 49589 00437 3100h 00980 S(00097) CM/Group 1: PC Display; United Kingdom
| C1F4h 49652 00500 1100h 01114 S(00160) CM/Group 1: Belgium
| D1F4h 53748 00500 1100h 00103 S(00094) CM/Group 1: International DP94
| 57344..61439 var. var. var. CCSID: reserved for private/customer use
| 61440..61695 var. var. var. CCSID: reserved for future allocation by CDRA
| 61696..61951 var. var. var. CCSID: reserved for Global Use CCSIDs
| F100h 61696 00500 1100h 00640 S(00081) Global Use: Syntactic CS in SBCS EBCDIC
|         (CP 00500 is used in the CDRA CCSID registry.
|         Any other CP, such as 00037, that has an
|         associated ESID 1100h and respects the
|         invariance for CS 00640, may also be used.)
| F101h 61697 00850 2100h 00640 S(00081) Global Use: Syntactic CS in SBCS PC Data
| F102h 61698 00850 3100h 00640 S(00081) Global Use: Syntactic CS in SBCS PC Display
| F103h 61699 00819 4100h 00640 S(00081) Global Use: Syntactic CS in SBCS ISO-8
| F104h 61700 00367 5100h 00640 S(00081) Global Use: Syntactic CS in SBCS ISO-7
| F10Eh 61710 00819 4100h 01274 S(00073) Global Use: Dual case printable graphics of
|         ASN.1 in SBCS ISO-8; it includes: A to Z,
|         a to z, 0 to 9, and + = ' ( ) , - . / : ?
|         This CCSID corresponds to ASN.1 (ISO 8824)
|         "Printable String" and its encoding in SBCS
|         ISO-7 and ISO-8 codes.
| F10Fh 61711 00500 1100h 01274 S(00073) Global Use: Dual case printable graphics of
|         ASN.1 in SBCS EBCDIC; it includes: A to Z,
|         a to z, 0 to 9, and + = ' ( ) , - . / : ?
|         This CCSID corresponds to ASN.1 (ISO 8824)
|         "Printable String" characters encoded in
|         SBCS EBCDIC codes.
|         (CP 00500 is used in the CDRA CCSID registry.
|         Any other CP, such as 00037, that has an
|         associated ESID 1100h and respects the
|         invariance for CS 01274, may also be used.)
| F110h 61712 00500 1100h 01134 S(00036) Global Use: SNA character set, type AR
|         (A to Z, and 0 to 9).
|         (CP 00500 is used in the CDRA CCSID registry.
|         Any other CP, such as 00037, that has an
|         associated ESID 1100h and respects the
|         invariance for CS 01134, may also be used.)
| 61952..62207 - - - CCSID: reserved for Request for Price
|         Quotation (RPQ)
| 62208..65533 - - - CCSID: reserved for future allocation by CDRA
| 65024..65279 - - - CPGID/CP: reserved for Request for Price
|         Quotation (RPQ)
| 65280..65534 - - - CPGID/CP: reserved for customer use
| - 65400 reserved for Glyphs [IBM OS/2]
| FFFEh - 65534 - - - (internally reserved by DR DOS)
| FFFEh 65534 - - - - "Inheritance from a lower level"
| FFFFh - 65535 - - - (internally reserved by DOS and DR DOS)
| FFFFh - 65535 - - - reserved for special purposes
| FFFFh 65535 - - - - "CCSID not applicable"

Hope it helps,

Matthias

--
http://www.uni-bonn.de/~uzs180/mpdokeng.html; http://mpaul.drdos.org
"Programs are poems for computers."