Unicode Keyboard Input Linux - Naive keymap
Hi All,

This is a utf-8 file produced with Microsoft Word 2000 on Windows 98. Wordpad only does utf-16, and Windows 98 Notepad does not do Unicode at all. I'm going to send you this message by copy-and-paste into the Yahoo webmail form to see what happens. I've attached the original.

Tschüss.
Elvis

NaiveKeymap.xml

A keymap is essentially a list of keys. It translates (or maps) keycodes to charcodes. The keycodes represent keys on the keyboard and must be enumerated somewhere. A keymap is also a list (=set) of other keymaps, each containing a list of keycode-to-charcode translations. The character set can be given for each (key)map; it doesn't have to be Unicode or utf-8. For example:

<keymap>
  <map name="english" key="ShiftAltF1">...</map>
  <map name="russian" key="ShiftAltF2">...</map>
  <map name="greek" key="ShiftAltF3" charset="utf-8">
    <key name="V">&#969;</key>                  <!-- small omega = U+03C9 -->
    <key name="SemiColon">
      <key name="W">&#974;</key>                <!-- small omega with tonos = U+03CE -->
    </key>
    <key name="ShiftV">&#937;</key>             <!-- capital omega = U+03A9 -->
    <key name="SemiColon">
      <key name="ShiftV">&#911;</key>           <!-- capital omega with tonos = U+038F -->
    </key>
    <!-- the string "kalhmera kosme" -->
    <key name="F1">&#922;&#945;&#955;&#951;&#956;&#941;&#961;&#945; &#954;&#972;&#963;&#956;&#949;</key>
    <key name="CtrlAltDelete" action="reboot"></key>
  </map>
</keymap>

I have arbitrarily reserved AltF1, AltF2, AltF3, etc. for VC1, VC2, VC3, etc.; these keys are not part of the VC keymap. You could use ShiftAltFn to switch between maps within a keymap. Characters are written in utf-8, because this is XML, but you could also use the Unicode number format, like this:

<key name="Q" char="U+1FF6"></key>

'Compose' key sequences are just nested character definitions.

enum actions {characters, strings, switching keymaps, booting the machine, etc.};

There is no distinction between characters and strings. String actions are like character actions, just normal content.

Emulating a Graphical Terminal

What tty mode is rawer than RAW? It would have to be bit-mapped mode, or tty=null.
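To make the nested-map idea concrete, here is a minimal C++ sketch, assuming the XML has already been parsed into memory. KeyMap, ComposeState and makeGreekMap are hypothetical names; the two SemiColon entries of the example are merged into one compose prefix, and a key with no entry simply aborts a pending sequence.

```cpp
#include <map>
#include <memory>
#include <string>

// In-memory form of a (key)map: each key name maps either to output text
// (a character or a whole string, since the two are not distinguished)
// or to a nested map that makes the key a 'compose' prefix.
struct KeyMap {
    struct Entry {
        std::string output;              // UTF-8 text emitted by this key
        std::unique_ptr<KeyMap> nested;  // non-null for a compose prefix
    };
    std::map<std::string, Entry> keys;
};

// Walks key presses through the map, remembering where we are inside a
// compose sequence; returns to the root after every emitted string.
class ComposeState {
public:
    explicit ComposeState(const KeyMap& root) : root_(root), current_(&root) {}

    // Returns the emitted text, or an empty string while a sequence is
    // pending or when the key is unmapped (which also aborts the sequence).
    std::string press(const std::string& key) {
        auto it = current_->keys.find(key);
        if (it == current_->keys.end()) {
            current_ = &root_;
            return {};
        }
        if (it->second.nested) {
            current_ = it->second.nested.get();
            return {};
        }
        current_ = &root_;
        return it->second.output;
    }

private:
    const KeyMap& root_;
    const KeyMap* current_;
};

// Builds the "greek" map from the example above.
inline KeyMap makeGreekMap() {
    KeyMap greek;
    greek.keys["V"].output = "\u03C9";        // small omega
    greek.keys["ShiftV"].output = "\u03A9";   // capital omega
    auto tonos = std::make_unique<KeyMap>();
    tonos->keys["W"].output = "\u03CE";       // small omega with tonos
    tonos->keys["ShiftV"].output = "\u038F";  // capital omega with tonos
    greek.keys["SemiColon"].nested = std::move(tonos);
    greek.keys["F1"].output =
        "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1 \u03BA\u03CC\u03C3\u03BC\u03B5";
    return greek;
}
```

Feeding V yields omega directly; SemiColon followed by W yields omega with tonos, and nothing is emitted for the SemiColon itself.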
Just pull out the ldterm altogether and put in a graphical hardware emulator. How much of the X protocol can be moved into a kernel-space stream, so that virtual consoles can share the same graphics interface (including fonts, keyboard events, and mouse) that the X Window System uses? DOS programmers will be familiar with the traditional graphical PC interface, but I am not one of them. For graphical virtual terminals to work, you would only need the graphics interface, not the windows, because the notion of a 'window' is fundamentally different from a VC, which does not share its virtual display with many applications. (That is what X, running in a VC, is for: to multiplex the VC.) Character-mode VCs would be built on top of the graphics emulator.

/* Naïve_Keyboard.h */
// Sorry about the mistake
namespace mod {enum {shift=0x8000, ctrl=0x4000, alt=0x2000}; } // -- Correction
namespace event {enum {make=0x0, brk=0x0080}; }
namespace key {enum {ESC=0x01, k1=0x02, k2=0x03, k3=0x04, k4=0x05, k5=0x06, k6=0x07,
                     k7=0x08, k8=0x09, k9=0x0a, k0=0x0b, minus=0x0c, equal=0x0d,
                     BS=0x0e, tab=0x0f, Q=0x10, W=0x11, E=0x12, R=0x13, T=0x14,
                     Y=0x15, U=0x16, I=0x17};}

const int q_make = key::Q | event::make;
const int q_break = key::Q | event::brk;
const int q_key = q_make;
const int q_shift = mod::shift | q_key;
const int q_ctrl = mod::ctrl | q_key;
const int q_alt = mod::alt | q_key;
const int q_ctrl_alt = mod::ctrl | mod::alt | q_key;

Composite Keys

The Q key can be represented by its make code (=0x10 | 0x00). Shift-Q is a composite key composed of the Shift and Q keys (=0x8000 | 0x0010). Ctrl-Alt-Q = 0x4000 | 0x2000 | 0x0010; <-- better

The keyboard driver translates real keyboard scancodes into integer-valued composite keys, which are more easily mapped to (utf-8) characters. Users can change the keyboard map in midstream (=line) by using an Alt key combination, so you can't replace a keyboard map as a module in a Stream.
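A rough C++17 sketch of that driver step: it watches the raw scancode stream, keeps track of which modifiers are held, and ORs them into every other key. It assumes PC scancode set 1 (Left Shift 0x2A, Right Shift 0x36, Left Ctrl 0x1D, Left Alt 0x38, break = make | 0x80) and reuses the mod/event values of the corrected Naïve_Keyboard.h; CompositeKeyDriver is a hypothetical name, and releasing one Shift while the other is still held is not handled.

```cpp
#include <cstdint>
#include <optional>

// Values from the corrected Naive_Keyboard.h.
namespace mod   { enum : std::uint16_t { shift = 0x8000, ctrl = 0x4000, alt = 0x2000 }; }
namespace event { enum : std::uint16_t { make = 0x0000, brk = 0x0080 }; }

// Folds the currently held modifiers into each non-modifier key event.
class CompositeKeyDriver {
public:
    // Modifier keys only update the held state and produce nothing;
    // every other make or break code becomes a composite key.
    std::optional<std::uint16_t> feed(std::uint8_t scancode) {
        const bool is_break = scancode & event::brk;
        const std::uint8_t key = scancode & 0x7F;
        const std::uint16_t m = modifierFor(key);
        if (m != 0) {
            if (is_break) held_ &= static_cast<std::uint16_t>(~m);
            else          held_ |= m;
            return std::nullopt;
        }
        return static_cast<std::uint16_t>(held_ | key | (is_break ? event::brk : event::make));
    }

private:
    static std::uint16_t modifierFor(std::uint8_t key) {
        switch (key) {
            case 0x2A: case 0x36: return mod::shift;  // left/right Shift
            case 0x1D:            return mod::ctrl;   // left Ctrl
            case 0x38:            return mod::alt;    // left Alt
            default:              return 0;
        }
    }
    std::uint16_t held_ = 0;  // modifier bits currently down
};
```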
A composite keycode would contain bits set corresponding to the state of the keyboard. X Keyboard events (unlike console scan codes) contain not only make, break events, but they also pass along the state of the keyboard determined by the modifier keys down at the time. Applications not interested in break events can ignore them, but they have to look at the entire keycode to determine the value of the key. (Keycode is part of scancode.) Events are translated into characters by a conversion function. This function must look at a sequence of dead keys before outputting an accented character. All keys are dead keys until they emit a character. Traditional terminals send ascii characters to the tty driver in the host, so the keyboard logic (=event processing) is part of the terminal. __ Do you Yahoo!? Yahoo! Mail is new and improved - Check it out! http://promotions.yahoo.com/new_mail
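A minimal C++17 sketch of such a conversion function, assuming dead keys arrive as distinct key values. The constants dead_acute and dead_diaeresis, and their numbers above the Unicode range, are made up for illustration. Nothing is emitted while a dead key is pending; an unknown combination here just drops the accent, which is only one possible policy.

```cpp
#include <map>
#include <optional>
#include <utility>

// Hypothetical key values for dead keys, chosen above U+10FFFF so they
// cannot collide with real characters.
constexpr char32_t dead_acute = 0x110001;
constexpr char32_t dead_diaeresis = 0x110002;

// Turns a stream of key values into characters. A dead key only records
// itself; the next key decides what, if anything, is emitted.
class DeadKeyTranslator {
public:
    DeadKeyTranslator() {
        table_[{dead_acute, U'e'}] = U'\u00E9';       // e with acute
        table_[{dead_diaeresis, U'u'}] = U'\u00FC';   // u with diaeresis
        table_[{dead_acute, U'\u03B1'}] = U'\u03AC';  // Greek alpha with tonos
    }

    // Returns a character once one is complete, std::nullopt while a dead key
    // is pending. A second dead key replaces the first; an unknown combination
    // drops the accent and emits the base key unchanged.
    std::optional<char32_t> feed(char32_t key) {
        if (key == dead_acute || key == dead_diaeresis) {
            pending_ = key;
            return std::nullopt;
        }
        if (pending_) {
            auto it = table_.find({*pending_, key});
            pending_.reset();
            if (it != table_.end()) return it->second;
        }
        return key;
    }

private:
    std::map<std::pair<char32_t, char32_t>, char32_t> table_;
    std::optional<char32_t> pending_;
};
```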
Re: Unicode Keyboard Input Linux
On Sun, Jun 20, 2004 at 11:24:32PM +0200, Denis Barbier wrote: I have another question about UTF-8 and kbd: some keymaps are defined twice, with Unicode notation and numerical (or literal) notation, like mk-utf.map and mk-cp1251.map. Since Unicode notation is needed to input non-ASCII characters, all keymaps will sooner or later provide a -utf variant. But why is Unicode notation handled in a different manner by loadkeys? This distinction could be made depending on the KDGKBMODE ioctl rather than on the input format, so that a single keymap can be used.

Here is a first patch. It has not been fully tested, but it should explain what I am talking about. The same keymap file can be loaded when the keyboard is in UTF-8 or ASCII mode, and this file can contain literal strings, numbers or Unicode codepoints. A charset has to be declared so that conversion between these formats can be performed without trouble.

Denis

diff -ur kbd-1.12.orig/src/analyze.l kbd-1.12/src/analyze.l
--- kbd-1.12.orig/src/analyze.l	2004-01-16 22:51:44.0 +0100
+++ kbd-1.12/src/analyze.l	2004-06-24 21:28:14.0 +0200
@@ -77,7 +77,7 @@
 \-			{return(DASH);}
 \,			{return(COMMA);}
 \+			{return(PLUS);}
-{Unicode}		{yylval=strtol(yytext+1,NULL,16);return(UNUMBER);}
+{Unicode}		{yylval=strtol(yytext+1,NULL,16) ^ 0xf000;return(UNUMBER);}
 {Decimal}|{Octal}|{Hex}	{yylval=strtol(yytext,NULL,0);return(NUMBER);}
 <RVALUE>{Literal}	{return((yylval=ksymtocode(yytext))==-1?ERROR:LITERAL);}
 {Charset}		{return(CHARSET);}
diff -ur kbd-1.12.orig/src/dumpkeys.c kbd-1.12/src/dumpkeys.c
--- kbd-1.12.orig/src/dumpkeys.c	2004-01-16 20:45:31.0 +0100
+++ kbd-1.12/src/dumpkeys.c	2004-06-24 23:50:48.0 +0200
@@ -131,11 +131,10 @@
 	t = KTYP(code);
 	v = KVAL(code);
 	if (t >= syms_size) {
-		code = code ^ 0xf000;
-		if (!numeric && (p = unicodetoksym(code)) != NULL)
+		if (!numeric && (p = codetoksym(code)) != NULL)
 			printf("%-16s", p);
 		else
-			printf("U+%04x ", code);
+			printf("U+%04x ", code ^ 0xf000);
 		return;
 	}
 	if (t == KT_LETTER) {
diff -ur kbd-1.12.orig/src/ksyms.c kbd-1.12/src/ksyms.c
--- kbd-1.12.orig/src/ksyms.c	2004-01-16 20:45:31.0 +0100
+++ kbd-1.12/src/ksyms.c	2004-06-25 01:14:42.0 +0200
@@ -1,7 +1,9 @@
+#include <linux/kd.h>
 #include <linux/keyboard.h>
 #include <stdio.h>
 #include <string.h>
 #include "ksyms.h"
+#include "getfd.h"
 #include "nls.h"
 
 /* Keysyms whose KTYP is KT_LATIN or KT_LETTER and whose KVAL is 0..127. */
@@ -1615,9 +1617,6 @@
 
 /* Functions for both dumpkeys and loadkeys. */
 
-static int prefer_unicode = 0;
-static const char *chosen_charset = NULL;
-
 void
 list_charsets(FILE *f) {
 	int i,j,lth,ct;
@@ -1655,10 +1654,8 @@
 	sym *p;
 	int i;
 
-	if (!strcasecmp(charset, "unicode")) {
-		prefer_unicode = 1;
+	if (!strcasecmp(charset, "unicode"))
 		return 0;
-	}
 
 	for (i = 0; i < sizeof(charsets)/sizeof(charsets[0]); i++) {
 		if (!strcasecmp(charsets[i].charset, charset)) {
@@ -1667,7 +1664,6 @@
 				if(p->name[0])
 					syms[0].table[i] = p->name;
 			}
-			chosen_charset = charset;
 			return 0;
 		}
 	}
@@ -1677,10 +1673,15 @@
 }
 
 const char *
-unicodetoksym(int code) {
+codetoksym(int code) {
 	int i, j;
 	sym *p;
 
+	if (KTYP(code) == KT_META)
+		return NULL;
+	if (KTYP(code) < syms_size)
+		return syms[KTYP(code)].table[KVAL(code)];
+	code = code ^ 0xf000;
 	if (code < 0)
 		return NULL;
 	if (code < 0x80)
@@ -1697,49 +1698,60 @@
 
 /* Functions for loadkeys. */
 
-int unicode_used = 0;
-
 int
 ksymtocode(const char *s) {
 	int i;
-	int j, jmax;
+	int j;
 	int keycode;
+	int fd;
+	int kbd_mode;
+	int syms_start = 0;
 	sym *p;
 
+	if (!s) {
+		fprintf(stderr, "%s\n", _("null symbol found"));
+		return -1;
+	}
+
+	fd = getfd(NULL);
+	ioctl(fd, KDGKBMODE, &kbd_mode);
 	if (!strncmp(s, "Meta_", 5)) {
+		/* Temporarily change kbd_mode to ensure that keycode is
+		   right. */
+		ioctl(fd, KDSKBMODE, K_XLATE);
 		keycode = ksymtocode(s+5);
+		ioctl(fd, KDSKBMODE, kbd_mode);
 		if (KTYP(keycode) == KT_LATIN)
 			return K(KT_META, KVAL(keycode));
 		/* fall through to error printf */
 	}
 
-	for (i = 0; i < syms_size; i++) {
-		jmax = ((i == 0 && prefer_unicode) ? 128 :
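The core of the patch is the XOR with 0xf000. A small C++ sketch of the round trip, assuming the {Unicode} pattern matches text of the form U+03C9 (so yytext+1 is "+03C9"). ktyp, kval and kmake are local stand-ins for the KTYP/KVAL/K macros of <linux/keyboard.h>, and kSymsSize is an assumed stand-in for kbd's syms_size.

```cpp
#include <cstdlib>

// Local stand-ins for the KTYP/KVAL/K macros in <linux/keyboard.h>.
constexpr int ktyp(int code) { return code >> 8; }
constexpr int kval(int code) { return code & 0xff; }
constexpr int kmake(int type, int value) { return (type << 8) | value; }

// Assumption: stand-in for kbd's syms_size (the number of keysym tables);
// the real value depends on the kbd version.
constexpr int kSymsSize = 16;

// Mirrors the patched analyze.l rule: skip the 'U', parse "+03C9" as hex,
// then XOR with 0xf000.
int parseUnumber(const char* text) {
    return static_cast<int>(std::strtol(text + 1, nullptr, 16)) ^ 0xf000;
}

// Mirrors the patched dumpkeys/codetoksym: a KTYP past the last keysym
// table marks a Unicode value, and the same XOR recovers the code point.
constexpr bool isUnicode(int code) { return ktyp(code) >= kSymsSize; }
constexpr int decodeUnicode(int code) { return code ^ 0xf000; }
```

Because a Unicode value ends up with a KTYP well above any keysym table index, dumpkeys can tell it apart from a legacy keysym without a separate flag. Code points in the U+F000..U+FFFF block would map onto small KTYP values, though, and could be misread as legacy keysyms; the sketch does not try to handle that.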
Re: Unicode Keyboard Input Linux
--- Denis Barbier [EMAIL PROTECTED] wrote: I have another question about UTF-8 and kbd: some keymaps are defined twice, with Unicode notation and numerical (or literal) notation, like mk-utf.map and mk-cp1251.map. Since Unicode notation is needed to input non-ASCII characters, all keymaps will sooner or later provide a -utf variant. But why is Unicode notation handled in a different manner by loadkeys? This distinction could be made depending on the KDGKBMODE ioctl rather than on the input format, so that a single keymap can be used.

In reading the keymaps man page, I naturally assumed that these text files are utf-8, and that international characters can be used in keysym (i.e. action) positions, so I don't know exactly what you mean by /unicode/ notation. Is that some kind of ASCII notation used to represent Unicode?

Elvis

I have some questions about keymaps myself:

1) Man page of LOADKEYS, under BUGS: "The keyboard translation table is common for all the virtual consoles, so any changes to the keyboard bindings affect all the virtual consoles." :-( This is not a bug; it is a design defect. Each virtual keyboard needs its own keymap, which translates keycodes (= modifier, key, event) into characters.

2) Man page of DUMPKEYS: my Cygwin does not respond to any of the keymap commands, but I haven't checked my PATH variable (yet).

3) Virtual consoles and xterms coexist!

"You can even run a complete X session (=display?) in each console. The X Window System will use the virtual console 7 by default. So if you start X and then switch to one of the text-based virtual consoles, you can go back again to X by typing Alt-F7." O'Reilly, Running Linux, 3rd ed., 1999, p. 94

"When X is started, it opens the first unused console [unused? /dev/console as a clone driver?]. While X is running, you can use Ctrl-Alt-Fn to switch to VTn. When X finishes, it will return to the original console [huh?]"
http://www.tldp.org/HOWTOP/Keyboard-and-Console-HOWTO-13.html

Therefore,

a) 'Virtual consoles' must emulate a graphical terminal. (xterms emulate VT100s, which are character-mode terminals, so you could not run an instance of X in an xterm, but it looks like you can (and do!) run X in a virtual console.) Hypothesis: you should also be able to run an X server as an X client in an X window, since each window is a graphical terminal emulator; only the interfaces are different. I imagine the second, nested instance of X would open an X window using the Xlib protocol.

b) 'vterms (=virtual consoles)' and 'xterms' must be able to coexist.

notes

Tell me if you like this terminology:

/*naive_keyboard.h*/
// I thought C++ enums introduced a new namespace. This is really ugly:
namespace mod {enum {shift=0x8000, ctrl=0x4000, alt=0x2000}; }  // each modifier owns its own bit
// Too bad I can't use the term /break/ in this context:
namespace event {enum {make=0x0, brk=0x0080}; }
// Then, a list of keycodes...
namespace key {enum {ESC=0x01, k1=0x02, k2=0x03, k3=0x04, k4=0x05, k5=0x06, k6=0x07,
                     k7=0x08, k8=0x09, k9=0x0a, k0=0x0b, minus=0x0c, equal=0x0d,
                     BS=0x0e, tab=0x0f, Q=0x10, W=0x11, E=0x12, R=0x13, T=0x14,
                     Y=0x15, U=0x16, I=0x17};} //etc

//scancodes
const int q_make = key::Q | event::make;
const int q_break = key::Q | event::brk;
// the Q key (code)
const int q_key = q_make;
// Composite keys:
const int q_shift = mod::shift | q_key;
const int q_ctrl = mod::ctrl | q_key;
const int q_alt = mod::alt | q_key;
const int q_ctrl_alt = mod::ctrl | mod::alt | q_key;

Composite Keys

The Q key can be represented by its make code (=0x10 | 0x00). Shift-Q is a /*composite*/ key composed of the Shift and Q keys (=0x8000 | 0x0010). Ctrl-Alt-Q = 0x4000 | 0x2000 | 0x0010. The keyboard driver translates real keyboard scancodes into integer-valued composite keys (=keysyms in some idioms), which are more easily mapped to (utf-8) characters. X and Microsoft Windows do it this way.
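The composite-key scheme only works if the modifier, key and event fields own disjoint bits; if ctrl were 0xc00 and alt 0xe00, for instance, every ctrl bit would also be an alt bit, and Ctrl-Alt-Q could not be told apart from Alt-Q. One way to have the compiler check this, sketched in C++ with the corrected values (shift=0x8000, ctrl=0x4000, alt=0x2000) and an assumed 7-bit key field; the helper names are hypothetical.

```cpp
#include <cstdint>

namespace mod   { enum : std::uint16_t { shift = 0x8000, ctrl = 0x4000, alt = 0x2000 }; }
namespace event { enum : std::uint16_t { make = 0x0000, brk = 0x0080 }; }

// Assumption: key codes occupy the low 7 bits (0x01..0x7F), like PC set-1 make codes.
constexpr std::uint16_t kKeyMask = 0x007F;

constexpr bool disjoint(std::uint16_t a, std::uint16_t b) { return (a & b) == 0; }

// Compilation fails if any two fields share a bit.
static_assert(disjoint(mod::shift, mod::ctrl) && disjoint(mod::shift, mod::alt) &&
              disjoint(mod::ctrl, mod::alt), "modifier bits overlap");
static_assert(disjoint(mod::shift | mod::ctrl | mod::alt, kKeyMask | event::brk),
              "modifier bits overlap key or event bits");

// With disjoint fields, each part of a composite key can be read back independently.
constexpr bool hasMod(std::uint16_t composite, std::uint16_t m) { return (composite & m) == m; }
constexpr std::uint16_t keyOf(std::uint16_t composite) { return composite & kKeyMask; }
constexpr bool isBreak(std::uint16_t composite) { return (composite & event::brk) != 0; }
```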
Users can change the keyboard map in midstream (=line) by using an Alt key combination, so you can't replace a keyboard map as a module in a Stream. A composite keycode would contain bits set corresponding to the state of the keyboard. Careful: the CapsLock key changes the state of each (virtual) keyboard. X keyboard events (unlike console scancodes) contain not only make and break events; they also pass along the state of the keyboard as determined by the modifier keys held down at the time. Applications not interested in break events can ignore them, but they have to look at the entire keycode to determine the value of the key, i.e. they must consider the modifier bits. (The keycode is part of the scancode in PC terminology.) Keyboard events are translated into characters by a conversion function (naturally :). This function looks at a sequence of dead keys before outputting an accented character. Therefore: /*All keys are dead keys until they emit a character.*/ Traditional terminals send (ascii) characters to the tty driver in the host, so the keyboard logic (=event processing) was part of the
Re: Unicode Keyboard Input Linux
I have another question about UTF-8 and kbd: some keymaps are defined twice, with Unicode notation and numerical (or literal) notation, like mk-utf.map and mk-cp1251.map. Since Unicode notation is needed to input non-ASCII characters, all keymaps will sooner or later provide a -utf variant. But why is Unicode notation handled in a different manner by loadkeys? This distinction could be made depending on the KDGKBMODE ioctl rather than on the input format, so that a single keymap can be used. Denis -- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
Re: Unicode Keyboard Input Linux
Thank you. My version of Notepad has a little checkbox in the lower left corner of the 'Save As' dialog box called 'Save as Unicode'. I never saw it. 'Save as Type' has only Text Documents (*.txt) and All files. What about the other options, Unicode Big Endian and UTF-8? There is no entry in the Notepad help. Elvis PS Maybe I don't need a Linux box after all :)

--- Wu Yongwei [EMAIL PROTECTED] wrote: You are wrong. Check the File - Save As menu item of Notepad. You will find the encoding option: ANSI, Unicode, Unicode Big Endian, UTF-8 are supported. You may need to specify a different font if some characters cannot display. By the way, many think it is a good idea to use real names in mailing list correspondence. Best regards, Wu Yongwei Elvis Presley wrote: As far as I remember, Notepad on NT (New Technology ;) systems has been doing Unicode for text files as long as it exists (or at least since NT4, that's the first I saw it on), if we consider so-and-so UCS-2/UTF-16 support as Unicode support. No, I'm sitting at an NT workstation right now, and I see no way to do Unicode in Notepad. In fact, the 'View Source' menu selection on my browser blithely opens Notepad to view html, and everything shows up as boxes but the ascii tags. On Windows 98 I can do utf-16 using Wordpad --it's not so bad-- so you can imagine my surprise when the NT workstation at the library reported, Unicode text file support had been removed from this version of Wordpad. I immediately thought it was a cynical attempt on Microsoft's part to get us to use Word 2000, also installed on the Workstation, but, as I said, it's so fat, I hate using it. Otherwise, I have no idea why they did it. Search your memory. If you did see Unicode in Notepad on NT, I'd be interested. Thanks, Elvis
Re: Unicode Keyboard Input Linux
Thank you. My version of Notepad has a little checkbox in the lower left corner of the 'Save As' dialog box called 'Save as Unicode'. I never saw it. 'Save as Type' has only Text Documents(*.txt) and All files. What about the other options, Unicode Big Endian, UTF-8? I do not quite understand it. I just checked a Windows XP Professional box of a colleague's, and found the dialog just as I have described (same as Windows 2000). And his Wordpad has the type Unicode text file. Aren't you using the Home version? There is no entry in the Notepad help. Elvis For the moment maybe we should talk off the mailing list since it is not about Linux any more. Maybe you should first have a Linux box installed, use it for a while, and then talk again on the list. Best regards, Wu Yongwei
Re: Unicode Keyboard Input Linux
--- Danilo Segan [EMAIL PROTECTED] wrote: Today at 20:13, Elvis Presley wrote: I should have said, unicode text file support. Wordpad still does unicode, but only in Word format, not as a text file, so I can still edit a document in unicode, but I have to copy and paste it into a unicode editor to create a text file. As far as I remember, Notepad on NT (New Technology ;) systems has been doing Unicode for text files as long as it exists (or at least since NT4, that's the first I saw it on), if we consider so-and-so UCS-2/UTF-16 support as Unicode support. No, I'm sitting at an NT workstation right now, and I see no way to do Unicode in Notepad. In fact, the 'View Source' menu selection on my browser blithely opens Notepad to view html, and everything shows up as boxes but the ascii tags. On Windows 98 I can do utf-16 using Wordpad --it's not so bad-- so you can imagine my surprise when the NT workstation at the library reported, Unicode text file support had been removed from this version of Wordpad. I immediately thought it was a cynical attempt on Microsoft's part to get us to use Word 2000, also installed on the Workstation, but, as I said, it's so fat, I hate using it. Otherwise, I have no idea why they did it. Search your memory. If you did see Unicode in Notepad on NT, I'd be interested. Thanks, Elvis Cheers, Danilo
Re: Unicode Keyboard Input Linux
You are wrong. Check the File - Save As menu item of Notepad. You will find the encoding option: ANSI, Unicode, Unicode Big Endian, UTF-8 are supported. You may need to specify a different font if some characters cannot display. By the way, many think it is a good idea to use real names in mailing list correspondence. Best regards, Wu Yongwei Elvis Presley wrote: As far as I remember, Notepad on NT (New Technology ;) systems has been doing Unicode for text files as long as it exists (or at least since NT4, that's the first I saw it on), if we consider so-and-so UCS-2/UTF-16 support as Unicode support. No, I'm sitting at an NT workstation right now, and I see no way to do Unicode in Notepad. In fact, the 'View Source' menu selection on my browser blithely opens Notepad to view html, and everything shows up as boxes but the ascii tags. On Windows 98 I can do utf-16 using Wordpad --it's not so bad-- so you can imagine my surprise when the NT workstation at the library reported, Unicode text file support had been removed from this version of Wordpad. I immediately thought it was a cynical attempt on Microsoft's part to get us to use Word 2000, also installed on the Workstation, but, as I said, it's so fat, I hate using it. Otherwise, I have no idea why they did it. Search your memory. If you did see Unicode in Notepad on NT, I'd be interested. Thanks, Elvis
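The four choices in that dialog differ mainly in the bytes written at the start of the file. A small C++ sketch of telling them apart, assuming each Unicode variant begins with the usual byte order mark (FF FE for Notepad's 'Unicode', i.e. UTF-16 little endian; FE FF for 'Unicode Big Endian'; EF BB BF for 'UTF-8') and that 'ANSI' files have none. detectEncoding is a hypothetical helper; a UTF-32LE file, which also starts with FF FE, would be misreported.

```cpp
#include <cstddef>
#include <string>

// Classifies a file by its leading byte order mark, if any.
std::string detectEncoding(const unsigned char* data, std::size_t size) {
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";  // Notepad's "Unicode"
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";  // Notepad's "Unicode Big Endian"
    return "ANSI (no BOM)"; // system code page; cannot be identified from the bytes alone
}
```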
Re: Unicode Keyboard Input Linux
Hi All, Thanks for your help. I'm still processing your input. They just changed my Yahoo! mail; let's hope you get this. Elvis

PS: User space text-mode only virtual terminals

On Mon, Jun 14, 2004 at 21:38:13 +0200, Pablo Saratxaga wrote: [...] It would be perfectly ok to provide only very minimalistic kernel support (even simpler and lighter than the current one) and have a user space 'vc' loaded early in the boot process.

Or none at all. Just move the VC mux out of the kernel and into user space:

[ASCII diagram, garbled in the archive: the console feeds a user-space VC mux; the VC mux and an IP mux connect over tcp to vterm processes, each of which drives the master side (m) of a pseudo-tty pair (ptty0/tty0, ptty1/tty1, ptty2/tty2) whose slave side (s) runs a shell, login, and getty respectively.]

This is exactly the same situation that applies to xterms, only the VC mux opens the console in character mode. It then forks a fixed number of 'vterms' as child processes. Each vterm holds the character contents of its display as well as the state of its keyboard. Conclusion: vterms and xterms are redundant, so there is no good reason to run them both at the same time. And xterms are more flexible. Still, the keyboards are the same, so both could share the same, better (=X) 'keymaps' fsm. 512 (=2**9) character glyphs in the vterm character buffer would be plenty for my purposes: Latin (French, German, Spanish, Italian), Greek (monotonic and polytonic) and Cyrillic, but I'd have to be able to choose the Unicode characters I want and map them to glyphs in the console font. (You couldn't pull the IP mux out as easily, relying on traditional Unix pipes for IPC...
that's another mailing list.) Unicode [...] is prejudiced against non-speakers. ??? [...] I meant to say utf-8. The irony is that utf-8 also blew up the Latin-1 characters. Now everything (but English) is twice the size. (That's not true; only the accented vowels are.) The Perseus Project does a nice job with Unicode; it has to, because there is no national character set for polytonic Greek (well, there is, sort of: the encoding schemes used in academia, but they are less well known, and the Unicode font support is better). Why do Greek newspapers still use ISO 8859-7? For the same reason that a majority of English-language web sites still use windows-1252, I suppose. I guess we'll have to ask them. It looks like these older character sets will be around for a long time. Yes, but not for that reason (to save space); they are around because there is a lot of *OLD* data in those encodings [...] http://www.dolnet.ta-nea.gr/ is still producing a lot of new material, and their mix is text-oriented. I thought it might be because they were using web authoring tools based on the older, national character set. Wide characters are easily compressed, by the file system or the network. In fact, there is a lot of network
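A toy in-memory model of the user-space VC mux described in this message, in C++: a fixed number of vterms, each holding its own screen contents and keyboard state (here just CapsLock), with Alt-Fn selecting which one receives keystrokes. VcMux and VTerm are hypothetical names; a real mux would read from the console and write to ptty masters rather than to strings.

```cpp
#include <array>
#include <cstddef>
#include <string>

// One virtual terminal: its own display contents and keyboard state.
struct VTerm {
    std::string screen;      // stand-in for the character contents of the display
    bool caps_lock = false;  // per-vterm keyboard state
};

// Hypothetical user-space VC mux with a fixed number of vterms.
template <std::size_t N>
class VcMux {
public:
    // Alt-F1..Alt-FN select a vterm (1-based); other values are ignored.
    void altF(std::size_t n) {
        if (n >= 1 && n <= N) active_ = n - 1;
    }

    // CapsLock only affects the vterm that currently has the keyboard.
    void toggleCapsLock() { vterms_[active_].caps_lock = !vterms_[active_].caps_lock; }

    // Delivers one ASCII keystroke to the active vterm only.
    void key(char c) {
        VTerm& vt = vterms_[active_];
        if (vt.caps_lock && c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
        vt.screen.push_back(c);
    }

    const VTerm& vterm(std::size_t i) const { return vterms_.at(i); }
    std::size_t active() const { return active_; }

private:
    std::array<VTerm, N> vterms_{};
    std::size_t active_ = 0;
};
```

Keeping the keyboard state inside each vterm is what lets CapsLock on one console leave the others alone, unlike a single translation table shared by all consoles.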
Re: Unicode Keyboard Input Linux
Windows Virtual Machines Kalhmera kosme. I'm at the library right now and our NT workstations do not have international keyboard drivers installed. So I have to write Greeklish. Elvis PS On Tue, 15 Jun 2004 at 16:44:48 +0200, Pablo Saratxaga wrote: On Tue, Jun 15, 2004 at 05:55:18AM -0700, Elvis Presley wrote: Conclusion: vterms and xterms are redundant, so there is no good reason to run them both at the same time. And xterms are more flexible. Yes, but there is a big difference: xterms need a running X terminal; vterms don't. Can you help me out? I don't have a Linux PC. Do vterms and xterms run together on a real system, on your system? Still, the keyboards are the same, so both could share the same, better (=X) 'keymaps' fsm. The way the keyboards are handled is quite different (on X11 there is a high hardware abstraction, while the Linux keyboard on the console interacts directly with the kernel). So, it looks like you get a console from the kernel whether you want one or not. I'm thinking of those virtual terminal emulator processes. It's gotta be possible to emulate an xterm in a vterm, then neanderthals like me can use their stone tools. I meant to say utf-8. The irony is that utf-8 also blew up the Latin-1 characters. Now everything (but English) is twice the size. (That's not true; only the accented vowels are.) And some are 3 bytes long, and some others are 4 bytes long,... But who cares? What matters is the ability to type any letter used in any human written language. That is a very huge improvement. I agree. I'm on your side. Why do Greek newspapers still use ISO 8859-7? For the same reason that a majority of English-language web sites still use windows-1252, I suppose. http://www.dolnet.ta-nea.gr/ is still producing a lot of new material, [unknown address] Sorry about that. The url is: http://ta-nea.dolnet.gr/ They don't use the 'www' prefix as an alias, and I keep forgetting their parent company name, 'dolnet'.
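To put numbers on the size argument: a minimal UTF-8 encoder in C++ (no validation of surrogates or out-of-range values; toUtf8 is just an illustrative name). ASCII stays at one byte; Latin-1 accented letters and Greek take two (versus one byte each in ISO 8859-1 or ISO 8859-7); the euro sign takes three; characters outside the Basic Multilingual Plane take four.

```cpp
#include <string>

// Encodes one code point as UTF-8 (no checks for surrogates or values above U+10FFFF).
std::string toUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                  // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {          // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                          // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, toUtf8(U'\u00FC') (u with diaeresis) gives the two bytes C3 BC, and toUtf8(U'\u03C9') (small omega) also gives two bytes.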
(I asked them to register 'tanea.gr' but they haven't.) The Communist Party newspaper is: http://www.rizospastis.gr/ They have a much nicer name, and they also have a 'text-only' link which does not download images, just the text. You can get the entire daily newspaper through http. (Only I don't know if they are using Unicode, but I assume not; it's probably ISO 8859 too. You see? I've become skeptical.) Is there a version of Linux which runs as a Microsoft Window (not cygwin)? ?? What you say doesn't make sense. (You can, on the other hand, run an operating system inside a virtual computer box inside another operating system.) I should have asked, Is there a version of Microsoft Windows which will run a copy of Linux? It doesn't make any more sense the other way either :) Both MS-Windows and Linux are operating systems; you can run one or the other, not one inside the other; they are built to run at the very bottom, in direct interaction with the hardware. They can be run inside an emulated hardware box, but not as normal programs. Microsoft describes Windows as a virtual-machine operating system, and DOS does, indeed, run as an operating system in a window. I've never read of MS-Windows being described as a virtual machine... And what runs in a window is in fact command.com, which is the equivalent (though much less powerful) of /bin/bash. I assume a VxD would map/share the PC hardware, controlled by Windows, to the device drivers in the Linux kernel. No, the Linux kernel needs direct access to the hardware. What you need is to emulate an entire system, like vmware does. I haven't been able to determine exactly what vmware does from their website (too proprietary, too hush-hush), but I assume they write VxDs which map the Linux kernel to the Windows VMM and the real hardware. Someone once told me their product ran on the NT platform but not on Windows 98, and that it was quite expensive. (All hearsay. No personal experience.)
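For what 'emulate an entire system' means at the I/O level, here is a toy C++ sketch: the emulated CPU never touches real ports; each IN or OUT the guest executes is dispatched to a software device model registered for that port. PortBus is a hypothetical name, this is not how bochs or vmware is actually structured, and reading an unmapped port as 0xFF is an assumption of the sketch.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Routes guest port I/O to software device models instead of real hardware.
class PortBus {
public:
    using WriteHandler = std::function<void(std::uint8_t)>;
    using ReadHandler  = std::function<std::uint8_t()>;

    void onWrite(std::uint16_t port, WriteHandler h) { writes_[port] = std::move(h); }
    void onRead(std::uint16_t port, ReadHandler h)   { reads_[port] = std::move(h); }

    // Called by the (hypothetical) CPU emulator when the guest executes OUT;
    // writes to ports with no device model are silently dropped.
    void out(std::uint16_t port, std::uint8_t value) {
        auto it = writes_.find(port);
        if (it != writes_.end()) it->second(value);
    }

    // Called when the guest executes IN; unmapped ports read as 0xFF.
    std::uint8_t in(std::uint16_t port) {
        auto it = reads_.find(port);
        return it != reads_.end() ? it->second() : 0xFF;
    }

private:
    std::map<std::uint16_t, WriteHandler> writes_;
    std::map<std::uint16_t, ReadHandler> reads_;
};
```

A keyboard model, for instance, would register a read handler on port 0x60 (the PC keyboard controller's data port) and hand the guest kernel emulated scancodes from it.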
The heart of the Windows operating system is called the VMM (=Virtual Machine Manager). There are a lot of descriptions out there, like http://win32assembly.online.fr/vxd-tut2.html When the VMM is running an instance of DOS, you get direct access to the DOS INT21 interface. Your program can even write directly into display memory, just like the old days, when your program owned the console. The VMM manages to control access to the real display by remapping the real memory (=the virtual memory address space) used by DOS, which still has that weird 20-bit memory line. Even 32-bit protected-mode programs designed to run under a DOS extension called "extenders" (I forget the jargon) still run in a DOS window. Microsoft has managed to recreate the entire DOS OS, not just command.com. I think you could host Linux, if you had the right VxDs. The VMM remaps the i486 ports used by the hosted OS's device drivers, so when the Linux kernel writes to port addresses, the VMM traps them in a
Re: Unicode Keyboard Input Linux
Windows Virtual Machines Kalhmera kosme. I'm at the library right now and our NT workstations do not have international keyboard drivers installed. So I have to write Greeklish. Elvis PS On Tue, 15 Jun 2004 at 16:44:48 +0200, Pablo Saratxaga wrote: On Tue, Jun 15, 2004 at 05:55:18AM -0700, Elvis Presley wrote: Conclusion: vterms and xterms are redundant, so there is no good reason to run them both at the same time. And xterms are more flexible. Yes, but there is a big difference: xterms need a running X terminal; vterms don't. Can you help me out? I don't have a Linux PC. Do vterms and xterms run together on a real system, on your system? Still, the keyboards are the same, so both could share the same, better(=X) 'keymaps' fsm. The way the keyboards are handled is quite different (on X11 there is a high hardware abstraction; while the linux keyboard on console interacts directly with the kernel. So, it looks like you get a console from the kernel whether you want one or not. I'm thinking of those virtual terminal emulator processes. It's gotta be possible to emulate an xterm in a vterm, then neanderthals like me can use their stone tools. I meant to say utf-8. The irony is that utf-8 also blew up the Latin-1 characters. Now everything (but English) is twice the size. (That's not true, only the accented vowels are.) And some are 3 bytes long, and some other are 4 bytes long,... But who cares? What matters is the ability to type any letter used in any human written language. That is a very huge improvement. I agree. I'm on your side. Why do Greek newspapers still use ISO 8859-7? For the same reason that a majority of English language web sites still use windows-1252, I suppose. http://www.dolnet.ta-nea.gr/ is still producing alot of new material, [unknown adress] Sorry about that. The url is: http://ta-nea.dolnet.gr/ They don't use the 'www' prefix as an alias, and I keep forgetting their parent company name, 'dolnet'. 
(I asked them to register 'tanea.gr' but they haven't.) The Communist Party newspaper is: http://www.rizospastis.gr/ They have a much nicer name, and they also have a 'text-only' link which does not download images, just the text. You can get the entire daily newspaper through http. (Only I don't know if they are using unicode, but I assume not, it's probably ISO 8859 too. You see? I've become skeptical.) Is there a version of Linux which runs as a Microsoft Window (not cygwin)? ?? What you say doesn't make sense. (you can on the other hand run an operating system inside of a virtual computer box inside another operating system) I should have asked, Is there a version of Microsoft Windows which will run a copy of Linux? It doesn't make any more sense in the other way either :) Both MS-Windows and Linux are operating systems, you can run one, or the other, not one inside the other; they are built in order to run at the very bottom in direct interaction with the hardware. They can be run inside an emulated hardware box, but not as normal programs. Microsoft describes Windows as a virtual-machine operating system, and DOS does, indeed, run, as an operating system in a window. I never read of MS-Windows described as a virtual-machine... And what runs in a window is in fact command.com, which is the equivalent (in much less powerful) of /bin/bash I assume a VxD would map/share the PC hardware, controlled by Windows, to the device drivers in the Linux kernel. No, the linux kernel needs direct access to the hardware. What you need is to emulate an entire system, like vmware does. I haven't been able to determine exactly what vmware does from their website, too proprietary, too hush-hush, but I assume they write VxDs which map the Linux kernel to the Windows VMM, and the real hardware. Someone once told me their product ran on the NT platform, but not Windows 98, but it was quite expensive. (All hearsay. No personal experience.) 
The heart of the Windows operating system is called the VMM (=Virtual Machine Manager). There are a lot of descriptions out there, like http://win32assembly.online.fr/vxd-tut2.html When the VMM is running an instance of DOS, you get direct access to the DOS INT 21h interface. Your program can even write directly into display memory, just like the old days, when your program owned the console. The VMM manages to control access to the real display by remapping the real memory (=the virtual memory address space) used by DOS, which still has that weird 20-bit addressing. Even 32-bit protected-mode programs designed to run under DOS 'extenders' (I forget the jargon) still run in a DOS window. Microsoft has managed to recreate the entire DOS OS, not just command.com. I think you could host Linux, if you had the right VxDs. The VMM remaps the i486 I/O ports used by the hosted OS's device drivers, so when the Linux kernel writes to port addresses, the VMM traps them in a
Re: Unicode Keyboard Input Linux
Today at 20:13, Elvis Presley wrote: I haven't been able to determine exactly what vmware does from their website, too proprietary, too hush-hush, but I assume they write VxDs which map the Linux kernel to the Windows VMM, and the real hardware. Someone once told me their product ran on the NT platform, but not Windows 98, but it was quite expensive. (All hearsay. No personal experience.) Look at http://bochs.sf.net/, or at least do a better search of the web. This is not the list for such a discussion (whether Linux can or cannot be emulated on Windows). It's fascinating technology, but you'd need inside information to make it work. Google isn't enough. Given enough time, I'm sure these VxDs will appear out of nowhere, as freeware or shareware or whatever it's called. Or you could go with Free Software[1] such as bochs running on a Free platform, such as GNU/Linux (though I believe it runs even on some proprietary platforms). It does the complete emulation of the Intel architecture, and thus works even across incompatible architectures (it also makes it slower, but you can't have it all). No VxD is needed (and they're not so fascinating; it's just about dumping some code at kernel level, and using the VM86 features of Intel CPUs). [1] http://gnu.org/philosophy/free-sw.html I should have said, unicode text file support. Wordpad still does unicode, but only in Word format, not as a text file, so I can still edit a document in unicode, but I have to copy and paste it into a unicode editor to create a text file. As far as I remember, Notepad on NT (New Technology ;) systems has been doing Unicode for text files for as long as it has existed (or at least since NT4, the first one I saw it on), if we count its so-so UCS-2/UTF-16 support as Unicode support. Cheers, Danilo -- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
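Danilo's "so-so UCS-2/UTF-16 support", and the remark elsewhere in the thread that UTF-16 is unusable on UNIX systems, both come down to the encoding itself. Below is a minimal C++ sketch (function names are ours) of UTF-16LE, the byte order Notepad uses for its "Unicode" files; the byte-order mark (FF FE) that Notepad writes is left out here. Every ASCII character gets a zero byte, which C string functions take as end-of-string, and characters above U+FFFF need a surrogate pair, which plain UCS-2 cannot express.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encode one code point as UTF-16 code units. BMP characters need one unit;
// characters above U+FFFF need a surrogate pair, which UCS-2 cannot express.
std::vector<std::uint16_t> utf16_encode(char32_t cp) {
    if (cp < 0x10000) return {static_cast<std::uint16_t>(cp)};
    char32_t v = cp - 0x10000;                                   // 20 bits remain
    return {static_cast<std::uint16_t>(0xD800 + (v >> 10)),      // high surrogate
            static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF))};   // low surrogate
}

// Serialize text as UTF-16LE (low byte first), without a byte-order mark.
std::vector<unsigned char> to_utf16le(const std::u32string& text) {
    std::vector<unsigned char> bytes;
    for (char32_t cp : text)
        for (std::uint16_t unit : utf16_encode(cp)) {
            bytes.push_back(static_cast<unsigned char>(unit & 0xFF));
            bytes.push_back(static_cast<unsigned char>(unit >> 8));
        }
    return bytes;
}
```

Encoding the single letter "A" gives the bytes 0x41 0x00, so a tool like `grep` or `sort` that works on NUL-terminated byte strings sees the text end after one character.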
Re: Unicode Keyboard Input Linux
On Mon, Jun 14, 2004 at 11:39:44PM +0200, Pablo Saratxaga wrote: Kaixo! On Sat, Jun 12, 2004 at 09:56:52AM -0700, Elvis Presley wrote: ..[snip].. This is about as complicated as it gets in polytonic Greek, three dead keys, two pre-position, one post-position, 'w' representing omega, and an 'i' for iota subscript. No, dead keys cannot be post-position; they must always be typed *before* the key they modify; that is in fact the very definition of a dead key: it modifies the behaviour of what is typed after it. If it is typed after, it is not a dead key, but just a regular key. The ways already defined in the el_GR.UTF-8 X11 Compose file for U1fa2 (ᾢ, omega with psili, varia and ypogegrammeni) are:
Multi_key bar greater grave Greek_omega : U1fa2
Multi_key bar grave greater Greek_omega : U1fa2
Multi_key greater bar grave Greek_omega : U1fa2
Multi_key greater grave bar Greek_omega : U1fa2
Multi_key grave bar greater Greek_omega : U1fa2
Multi_key grave greater bar Greek_omega : U1fa2
dead_iota dead_horn dead_grave Greek_omega : U1fa2
dead_iota dead_grave dead_horn Greek_omega : U1fa2
dead_horn dead_iota dead_grave Greek_omega : U1fa2
dead_horn dead_grave dead_iota Greek_omega : U1fa2
dead_grave dead_iota dead_horn Greek_omega : U1fa2
dead_grave dead_horn dead_iota Greek_omega : U1fa2
That is 6 ways to type it with dead keys (corresponding to the six possible orderings of the three dead keys, with the dead keys always before the letter) and 6 ways to type it with Multi_key (you press Multi_key, then the following keys in the given order). Note that even Multi_key combinations always have the letter last, so that, when a letter arrives, it is certain that the sequence is complete. See my comments below.
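The twelve sequences above are just orderings of the same keys. As a minimal C++ sketch (our own names; this is not how X11 reads Compose files), the six dead-key entries can be generated with `std::next_permutation`, always appending the letter last so that it terminates the sequence:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

using KeySeq = std::vector<std::string>;

// Generate the six dead-key entries for U+1FA2: every ordering of the
// three dead keys, each followed by the letter.
std::map<KeySeq, char32_t> omega_dead_key_entries() {
    std::map<KeySeq, char32_t> table;
    KeySeq dead = {"dead_grave", "dead_horn", "dead_iota"};  // sorted, so all 3! orderings are visited
    do {
        KeySeq seq = dead;
        seq.push_back("Greek_omega");   // the letter comes last and ends the sequence
        table[seq] = 0x1FA2;
    } while (std::next_permutation(dead.begin(), dead.end()));
    return table;
}

// Look up a complete sequence; U+0000 means "no such sequence".
char32_t compose(const std::map<KeySeq, char32_t>& table, const KeySeq& typed) {
    auto it = table.find(typed);
    return it == table.end() ? U'\0' : it->second;
}
```

The same loop, with `Multi_key` prepended and `bar`, `greater`, `grave` as the permuted keys, would produce the other six lines.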
What you would like would be in fact:
dead_horn dead_grave Greek_omega U0345 : U1fa2
dead_grave dead_horn Greek_omega U0345 : U1fa2
(that is, two dead keys, followed by two normal keys: a key sending Greek_omega and a key sending U0345 (COMBINING GREEK YPOGEGRAMMENI)). I haven't tested it, but if it works, it could indeed be added for all the cases, along with a layout with U0345 instead of dead_iota, if that is more intuitive to type. The keyboard map is therefore more than a map, it is a fsm, a stateful-map. That is not supported at all. If you need that, you actually need to develop an input method (like Japanese or Vietnamese use), that is, a program that interprets what you type and produces a different input. Yes, there is something of that in the console (but very limited) and in X11 (more powerful), but it is always linear. (Also, I'm not sure if it is possible to have, for example, both dead_horn dead_grave Greek_omega U0345 and dead_horn dead_grave Greek_omega sequences, that is, sequences where one is a prefix of another.) You can't. The problem with that is that, if you wanted to type the second sequence, the composition engine wouldn't know whether to stop there and emit the symbol, or to wait for another symbol to complete the sequence. So it waits. This could probably be fixed (partly): when a symbol comes that causes the sequence to become invalid, the engine could check the compose sequence just before the arrival of that symbol, and emit the result. But this is not the current behaviour. If I change keyboards in midstream (using alt-a, for example), the fsm would output the components of an unaccepted character individually. How far will keymaps go? You can't. Pressing Alt-A (or any other key) means you broke the sequence; in such a case you simply lose what you typed in the incomplete sequence. Indeed. -- Vasilis Vasaitis A man is well or woe as he thinks himself so.
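The partial fix Vasilis describes (when a symbol makes the pending sequence invalid, emit the result for what was typed before it) can be sketched as a small C++ class. This is our own illustration of that proposal, not the behaviour of X11 or the console: keysyms are plain strings, results are written as "U+XXXX" strings, and an incomplete sequence that matches nothing is discarded, as Pablo says current engines do.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Seq = std::vector<std::string>;

// Sketch of a compose engine that tolerates one sequence being a prefix of another.
class ComposeEngine {
public:
    explicit ComposeEngine(std::map<Seq, std::string> table) : table_(std::move(table)) {}

    void feed(const std::string& sym) {
        pending_.push_back(sym);
        if (canContinue(pending_)) {
            // Emit immediately only if the sequence is complete and cannot grow.
            auto it = table_.find(pending_);
            if (it != table_.end() && !canGrow(pending_)) {
                out_.push_back(it->second);
                pending_.clear();
            }
            return;
        }
        pending_.pop_back();
        if (pending_.empty()) {   // sym starts no sequence: pass it through
            out_.push_back(sym);
            return;
        }
        resolve();                // emit the result for what came before sym...
        feed(sym);                // ...then process sym on its own
    }

    void flush() { resolve(); }   // end of input: resolve whatever is pending
    const std::vector<std::string>& output() const { return out_; }

private:
    static bool startsWith(const Seq& s, const Seq& prefix) {
        return s.size() >= prefix.size() &&
               std::equal(prefix.begin(), prefix.end(), s.begin());
    }
    bool canContinue(const Seq& s) const {   // some sequence starts with (or equals) s
        for (const auto& e : table_) if (startsWith(e.first, s)) return true;
        return false;
    }
    bool canGrow(const Seq& s) const {       // some strictly longer sequence starts with s
        for (const auto& e : table_)
            if (e.first.size() > s.size() && startsWith(e.first, s)) return true;
        return false;
    }
    void resolve() {
        auto it = table_.find(pending_);
        if (it != table_.end()) out_.push_back(it->second);
        // An incomplete sequence with no result is dropped, as in current engines.
        pending_.clear();
    }

    std::map<Seq, std::string> table_;
    Seq pending_;
    std::vector<std::string> out_;
};
```

With both sequences in the table, dead_horn dead_grave Greek_omega U0345 yields U+1FA2, while dead_horn dead_grave Greek_omega followed by any other key yields U+1F62 (omega with psili and varia) and then that key.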
Re: Unicode Keyboard Input Linux
On Mon, Jun 14, 2004 at 08:43:38AM -0700, Elvis Presley wrote: ..[snip].. Comparing characters would be easy, they compare as unsigned integers, but sorting them would be a problem, because you'd want to group all the (accented) vowels together, according to language-specific rules. In Greek, this wouldn't be a problem, because monotonic vowels and polytonic vowels, though occupying different code ranges, are not mixed in the same word: they are essentially different languages. A 'tonos' is not an 'oxia' or a 'varia'. Actually, tonos and oxia are treated as equivalents in Unicode. Nevertheless, sorting wouldn't be a problem indeed, because it is done according to the base letter only; the accents are irrelevant. Why do Greek newspapers still use ISO 8859-7? If it ain't broke, don't fix it. nightmare), but if you're only working in Greek, why not stick with what you know? Exactly. Nothing to do with size issues, and everything to do with that. Plus, a major operating system doesn't really support UTF-8, and instead concentrates on UTF-16, which is unusable in UNIX/GNU systems for most practical purposes. My Microsoft browser (=IE) has problems with ISO Greek and Windows Greek, especially capital Alpha with tonos: it gets confused, and displays a box. Well actually, this particular letter is the only Greek letter placed differently in the two character sets. In ISO-8859-7, this letter occupies the code point (0xB6) that MS Word once had hardcoded as representing the paragraph symbol. So for Windows-1253, Microsoft put the paragraph symbol there and moved capital Alpha with tonos elsewhere (to 0xA2). -- Vasilis Vasaitis A man is well or woe as he thinks himself so.
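Vasilis's explanation can be made concrete with the two byte values involved. Below is a minimal C++ sketch (function names are ours) that decodes single bytes under both code pages. Only ASCII, the two differing bytes (0xA2 and 0xB6) and the shared Greek range are modelled; everything else returns U+FFFD.

```cpp
// Bytes where both code pages agree. From 0xB8 to 0xFE the Greek letters sit
// at code point = byte + 0x2D0 (except 0xBB and 0xBD, which keep their
// Latin-1 values, and 0xD2, which is unassigned in both).
char32_t decode_shared(unsigned char b) {
    if (b < 0x80) return b;                                     // ASCII
    if (b == 0xBB || b == 0xBD) return b;                       // » and ½
    if (b >= 0xB8 && b <= 0xFE && b != 0xD2) return b + 0x2D0;  // Έ ... ώ
    return 0xFFFD;                                              // not modelled here
}

char32_t decode_iso8859_7(unsigned char b) {
    if (b == 0xA2) return 0x2019;   // RIGHT SINGLE QUOTATION MARK
    if (b == 0xB6) return 0x0386;   // GREEK CAPITAL LETTER ALPHA WITH TONOS
    return decode_shared(b);
}

char32_t decode_cp1253(unsigned char b) {
    if (b == 0xA2) return 0x0386;   // Alpha with tonos moved here...
    if (b == 0xB6) return 0x00B6;   // ...because 0xB6 is the pilcrow (paragraph sign)
    return decode_shared(b);
}
```

Reading ISO-8859-7 text as Windows-1253 therefore turns Ά into a pilcrow (¶), and the reverse direction turns it into a quotation mark; a browser that guesses the wrong code page, or lacks a glyph, shows the wrong symbol or a box.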
Re: Unicode Keyboard Input Linux
Unicode Keyboard Input Linux Hello, This attached diagram represents my (naive) understanding of terminal IO under Linux/Unix. Elvis PS The real console is essentially a graphical device, with screen (=display), keyboard and mouse, and whatever else might be considered interesting... Applications do not open the real console directly, but in theory, they could (in DOS they could): the interface could be made public; there would have to be a device special file for the real console, and the virtual consoles too, and the pseudo terminals... Have I forgotten anything? The state of the VC mux is controlled by Alt-Func keys. The VC mux sends all console IO to the current console (except the Alt-Func keys): you wouldn't need to run a VC mux in a vc. A virtual console (vc) holds the state of each unicode terminal: 1) the display contents, 2) the keyboard map, 3) the mouse position (yes, because the vc mux does not do overlapping windows, so the mouse position would be independent for each vc). Therefore the keymap would be part of the virtual console, not the tty driver (as I thought), so you could change keyboards using any Alt-key combination (but Alt-Func keys are already used by the vc mux). You could not really use keymaps in a traditional tty configuration anyway, because the ascii terminal can't display unicode characters, unless you ran a unicode emulator on a PC, then connected to a shell through a traditional start/stop (serial) interface. Each vc is a unicode terminal emulator. They understand utf-8, they can display utf-8 and they can generate utf-8. You could put a keymap module in the stream which translated ascii into utf-8 unicode, but why bother? Of course, the tty module still must understand unicode. I don't think this is a big problem, because the basic repertoire remains the same (=ascii) thanks to the utf-8 encoding, but I'm sure there are a few hidden traps.
The pseudo-terminal driver (i.e. module) has got to be a pretty simple device: it just copies everything it sees to the (traditional) tty driver (=line discipline). In fact, the VC mux could contain each VC, then you'd have a big, multiplexed pseudo-terminal. Anything (module or program) which opens the master side of a pseudo-terminal is called a terminal emulator, therefore a 'vc' and an 'xterm' perform the same function, but in different spaces (kernel space and user space). I wonder how much of the software can be reused. You need vc's in the kernel in the absence of X, to support Linux virtual terminals. If you run X, you don't need vc's. Therefore, the remote telnet is a terminal emulator too. It connects to a pseudo-terminal through its tcp connection. Kermit, too, would be a terminal emulator, even though both, as application programs, might be running in xterms on the remote computers. Kermit has the interesting property that it can download files over its session connection (unlike ftp), which means it would work through the firewall: if you can connect to a telnet server using kermit, you can surely download files, by changing the mode of the emulator, a pretty nice feature. This can get pretty confusing. Is the ftp client a terminal emulator? I think so, because its control connection is going to be made to a pseudo terminal, but ftp doesn't use getty to check the userid, does it? A virtual console (=VC) has a set of abstract qualities which closely resemble the real console. If there were two types of virtual console, graphical and character-mode, then the real console could be shared with X, which opens an instance of a graphical virtual console. There could be more than one instance of X running, why not? Otherwise, you'd have to choose between vterms and xterms. They both do the same thing anyway. Can I use xterm to connect to a remote X server? If I could, the X server would have to validate the user's identity, much like telnet does, using the /etc/passwd file.
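Elvis's definition (whatever opens the master side of a pseudo-terminal acts as a terminal emulator) matches how xterm, telnetd and sshd work on Linux. Below is a minimal C++ sketch using the POSIX calls `posix_openpt`, `grantpt`, `unlockpt` and `ptsname`; the wrapper name `open_pty_master` is ours. The emulator keeps the master descriptor, and the shell (or `login`, in telnetd's case) runs on the slave device whose path is returned.

```cpp
#include <cstdlib>    // posix_openpt, grantpt, unlockpt, ptsname
#include <fcntl.h>    // O_RDWR, O_NOCTTY
#include <string>
#include <unistd.h>   // close

// Open the master side of a new pseudo-terminal. On success returns the
// master file descriptor and stores the slave's path (e.g. /dev/pts/3);
// on failure returns -1.
int open_pty_master(std::string& slave_path) {
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0) return -1;
    if (grantpt(master) != 0 || unlockpt(master) != 0) {
        close(master);
        return -1;
    }
    const char* name = ptsname(master);   // not thread-safe; glibc also offers ptsname_r
    if (name == nullptr) {
        close(master);
        return -1;
    }
    slave_path = name;
    return master;
}
```

Whatever the emulator writes to the master descriptor arrives as keyboard input on the slave, and whatever the shell writes to the slave can be read from the master and drawn on screen; the tty line discipline sits on the slave side in between.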
In the local scenario, xterm starts up shells automatically. In X I've got to use some replacement-for-getty/login program. Unicode C/C++ Application Programming On Linux, wchar_t is a 32-bit wide character. 1) Does zero (=0x0) still represent end-of-string? 2) Does -1(=0x) still represent end-of-stream? Comparing characters would be easy, they compare as unsigned integers, but sorting them would be a problem, because you'd want to group all the (accented) vowels together, according to language-specific rules. In Greek, this wouldn't be a problem, because monotonic vowels and polytonic vowels, though occupying different code ranges, are not mixed in the same word: they are essentially different languages. A 'tonos' is not an 'oxia' or a 'varia'. The editor 'vi' would have to be modified to get/put wchar_t, so I don't understand why you'd need a separate unicode editor, or separate unicode application, whatever it might be. 1) Does 'sort' work on utf-8 input? 2) Does 'grep' (Unix search) work on utf-8 input? 3) Is there a laundry list of Unix filters which need to be changed to support
Re: Unicode Keyboard Input Linux
Kaixo! On Mon, Jun 14, 2004 at 08:43:38AM -0700, Elvis Presley wrote: Unicode Keyboard Input Linux In fact unicode (through utf-8 of course) mostly works on the console. The drawbacks are currently tied to the nature of the console (in the current text mode) and not to the encoding. The main drawbacks are:
- display is limited to at most 512 different glyphs; that is enough for most alphabetic languages, but it is not enough for CJK languages, for example.
- display is limited to the 1 char = 1 glyph = 1 cell paradigm; that means languages like Thai, where a series of chars can have their glyphs stacked one above the other in a single cell, will display horribly; languages needing glyph recomposition, like those using Indic alphabets, are simply impossible. Note that even some languages using the Latin alphabet are hurt, as they use some accented letters that have no precomposed form in Unicode and are encoded as a base letter plus a combining accent.
The difference with xterm-like terminals here is very large; on X11 powerful font functions are available, and there are text terminals that are able to nicely display scripts where 1 char is not necessarily equal to 1 glyph and not necessarily equal to 1 cell; and you are not limited in the number of glyphs, so you can write in Chinese without problem. Plus, the resolution is much better, and the range of available and choosable fonts much, much, much wider. There are also input problems on the console. Typing Unicode chars directly (with 1 keystroke = 1 char) is not a problem at all (it is just tedious to write the keymaps, and if you want to support both utf-8 and one or several old encodings, you have to provide a different keyboard file for each encoding; that is very bad. It would be much better to be able to have a single keyboard description file, in Unicode, and just tell loadkeys which character set is wanted (the default being whatever glibc says is the default for the current locale)).
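Pablo's suggestion of a single Unicode keyboard description, converted to the wanted character set only at load time, could look roughly like this. It is a hypothetical C++ sketch (not how loadkeys works today; the format and names are ours): the keymap maps keycodes to code points, and only ASCII and the Greek letters U+0391 to U+03CE are converted to ISO-8859-7, where they sit at the code point minus 0x2D0. Keys that cannot be represented are reported instead of silently dropped.

```cpp
#include <map>
#include <optional>
#include <vector>

// One keymap, written once in Unicode: keycode -> code point (hypothetical format).
using UnicodeKeymap = std::map<int, char32_t>;

// Encode a code point in ISO-8859-7. Only ASCII and U+0391..U+03CE are
// handled in this sketch; in that range the byte is the code point - 0x2D0.
std::optional<unsigned char> to_iso8859_7(char32_t cp) {
    if (cp < 0x80) return static_cast<unsigned char>(cp);
    if (cp >= 0x0391 && cp <= 0x03CE && cp != 0x03A2)   // U+03A2 is unassigned
        return static_cast<unsigned char>(cp - 0x2D0);
    return std::nullopt;
}

// Produce the 8-bit keymap a legacy-charset console would need, collecting
// the keycodes whose characters do not exist in that charset.
std::map<int, unsigned char> load_as_iso8859_7(const UnicodeKeymap& km,
                                               std::vector<int>& unmapped) {
    std::map<int, unsigned char> out;
    for (const auto& [keycode, cp] : km) {
        if (auto byte = to_iso8859_7(cp)) out[keycode] = *byte;
        else unmapped.push_back(keycode);
    }
    return out;
}
```

For a UTF-8 console the same description would be used as-is, each code point encoded as UTF-8, so one file could serve every character set.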
For composing, however, it's bad; kernel composing tables use char, and so it is not possible to properly use dead keys or the compose key while in unicode mode on the console (if you compose only chars that are also in the iso-8859-1 character set, it more or less works; you just have to type an extra keystroke, which is lost in outer space; but I doubt it will work for other chars, and I suspect the fact it mostly works for some chars is because their iso-8859-1 8-bit code is the same numeric value as their unicode code point). For languages that need the help of an input method, the console is mostly unusable (it would be very nice to be able to have a single input method backend usable on both the console and X11; but so far I know of none that does that and that is usable and widely used). Input works (almost) perfectly on X11 (the problem is due to the input framework of XFree86, which doesn't allow switching input methods, so you cannot type some words in Korean, then switch to Chinese input... but some programs have started to bypass it, and xorg seems to use an input framework that solves that long-standing annoyance). And output works on X11. So it could be a good thing to have the engine of a good xterm-like terminal be used for the console, of course removing any unneeded linking to X11 libs; and it would solve a lot of things. Of course it would only work on screens with graphical capabilities, not on real vt100s, French Minitels or hp48 screens; but nobody is expecting to be able to write in Devanagari on such devices, I think. The real console is essentially a graphical device, Not always. Not on some local screens on old PCs (it has always been a graphical device on local screens for all non-PC ports of Linux; but for the PC itself, the text mode on the local screen as a graphical device is something quite new (you can look at when the framebuffer appeared on the i386 branch of Linux to see the exact date)).
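Pablo's suspicion about why some compositions "mostly work" can be illustrated with a hypothetical C++ model of a compose table whose fields are single bytes (our own names, not the kernel's structures). In Unicode mode the result byte has to be read as a code point, and the only charset-free reading is the byte value itself, which is exactly ISO-8859-1. So Latin-1 results come out right, while a table written for ISO-8859-7 produces the wrong characters.

```cpp
#include <optional>
#include <vector>

// Hypothetical model of a compose table with one-byte fields.
struct ByteCompose {
    unsigned char diacr, base, result;
};

// Find the result byte for a dead key followed by a base character.
std::optional<unsigned char> compose_byte(const std::vector<ByteCompose>& table,
                                          unsigned char diacr, unsigned char base) {
    for (const auto& e : table)
        if (e.diacr == diacr && e.base == base) return e.result;
    return std::nullopt;
}

// In Unicode mode a result byte must become a code point. Taking the byte
// value itself is the same as reading it as ISO-8859-1 (U+0000..U+00FF).
char32_t byte_as_code_point(unsigned char b) { return b; }
```

With the entries `{'\'', 'e', 0xE9}` (é in ISO-8859-1) and `{'\'', 0xF9, 0xFE}` (ω to ώ in ISO-8859-7), the first yields U+00E9 as intended, but the second yields U+00FE (þ) instead of U+03CE.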
Also, you can redirect the console to a device other than a local screen (again, it was there first on non-PC branches, I think the Sun ports were first; on the PC you can redirect the console to a serial port). In fact, whether the console is physically a graphical device or not, for the operating system it is not; it is just text. That doesn't mean there couldn't be a graphical device, nor that such a device couldn't be used for the console, nor that such a graphical console couldn't do nice graphical things with text, as is done on modern xterms on X11. But that is not done through the normal I/O channels; programs see the console just as a text device, and send text flows, with some control codes to place the cursor, change color, etc.; but there is no way to play with individual pixels through the console I/O API, for example. with screen (=display), keyboard and mouse, and whatever else might be considered interesting... Applications do not open the real console directly, but in theory, they could --in DOS they could: the interface could be made public; there would have to be a device special file for the real
Re: Unicode Keyboard Input Linux
Kaixo! On Sat, Jun 12, 2004 at 09:56:52AM -0700, Elvis Presley wrote: I might like to vary the dead-key sequence from {accent, letter} to {letter, accent}. On the console I'm afraid you can't easily do that, as the compose sequence definitions are quite poor. On X11 I would use unicode combining accents instead of dead keys for what you want; then in the Compose file define sequences like: letter U03xx : precomposed letter with accent, e.g.: a U0301 : á On the console you can define combining accent keys too; but you will most likely be stuck with non-canonical text files (e.g., encoded as aU0301 instead of aacute); whether that is a problem for you or not I don't know. I'm particularly interested in polytonic Greek. Once I've selected the keyboard (alt-h), I could type a small_omega_dasia_perispomeni_ypogegrammeni, like this: ascii '`', ascii '~', ascii 'w', ascii 'i' On the console you can't define such long compose sequences (unless the kernel handling has been completely rewritten recently). On X11 it is possible; and indeed a lot of polytonic Greek combinations have been defined, to use with dead keys or with the compose key (called Multi_key on X11). Note also that while on X11 there are a lot of dead keys defined, allowing you to type all the Greek accents, on the console the number of available dead keys is much smaller; I'm not sure it would be enough for all needed accents. This is about as complicated as it gets in polytonic Greek, three dead keys, two pre-position, one post-position, 'w' representing omega, and an 'i' for iota subscript. No, dead keys cannot be post-position; they must always be typed *before* the key they modify; that is in fact the very definition of a dead key: it modifies the behaviour of what is typed after it. If it is typed after, it is not a dead key, but just a regular key.
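The "non-canonical" text Pablo mentions arises because a + U+0301 and the single character á are two spellings of the same letter. Below is a minimal C++ sketch of folding such pairs back into precomposed characters. It uses a tiny hand-made table and our own names; real code would apply full Unicode normalization to NFC, for example via ICU.

```cpp
#include <map>
#include <string>
#include <utility>

// A tiny composition table: (base letter, combining mark) -> precomposed letter.
// These entries are examples only; NFC uses the full Unicode data.
const std::map<std::pair<char32_t, char32_t>, char32_t> kCompose = {
    {{U'a', 0x0301}, 0x00E1},     // a + COMBINING ACUTE ACCENT -> á
    {{U'e', 0x0301}, 0x00E9},     // e + acute -> é
    {{0x03C9, 0x0301}, 0x03CE},   // ω + acute -> ώ (tonos)
};

// Replace each "base + combining mark" pair that has a precomposed form
// by that single character; everything else is copied unchanged.
std::u32string compose_pairs(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        if (!out.empty()) {
            auto it = kCompose.find({out.back(), c});
            if (it != kCompose.end()) {
                out.back() = it->second;
                continue;
            }
        }
        out += c;
    }
    return out;
}
```

So text typed as a, U+0301 on the console could be converted to á after the fact, at the cost of an extra processing step.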
The ways already defined in the el_GR.UTF-8 X11 Compose file for U1fa2 (ᾢ, omega with psili, varia and ypogegrammeni) are:
Multi_key bar greater grave Greek_omega : U1fa2
Multi_key bar grave greater Greek_omega : U1fa2
Multi_key greater bar grave Greek_omega : U1fa2
Multi_key greater grave bar Greek_omega : U1fa2
Multi_key grave bar greater Greek_omega : U1fa2
Multi_key grave greater bar Greek_omega : U1fa2
dead_iota dead_horn dead_grave Greek_omega : U1fa2
dead_iota dead_grave dead_horn Greek_omega : U1fa2
dead_horn dead_iota dead_grave Greek_omega : U1fa2
dead_horn dead_grave dead_iota Greek_omega : U1fa2
dead_grave dead_iota dead_horn Greek_omega : U1fa2
dead_grave dead_horn dead_iota Greek_omega : U1fa2
That is 6 ways to type it with dead keys (corresponding to the six possible orderings of the three dead keys, with the dead keys always before the letter) and 6 ways to type it with Multi_key (you press Multi_key, then the following keys in the given order). What you would like would be in fact:
dead_horn dead_grave Greek_omega U0345 : U1fa2
dead_grave dead_horn Greek_omega U0345 : U1fa2
(that is, two dead keys, followed by two normal keys: a key sending Greek_omega and a key sending U0345 (COMBINING GREEK YPOGEGRAMMENI)). I haven't tested it, but if it works, it could indeed be added for all the cases, along with a layout with U0345 instead of dead_iota, if that is more intuitive to type. The keyboard map is therefore more than a map, it is a fsm, a stateful-map. That is not supported at all. If you need that, you actually need to develop an input method (like Japanese or Vietnamese use), that is, a program that interprets what you type and produces a different input. Yes, there is something of that in the console (but very limited) and in X11 (more powerful), but it is always linear.
(Also, I'm not sure if it is possible to have, for example, both dead_horn dead_grave Greek_omega U0345 and dead_horn dead_grave Greek_omega sequences, that is, sequences where one is a prefix of another.) If I change keyboards in midstream (using alt-a, for example), the fsm would output the components of an unaccepted character individually. How far will keymaps go? You can't. Pressing Alt-A (or any other key) means you broke the sequence; in such a case you simply lose what you typed in the incomplete sequence. The alt key is used like the shift key. What ascii character does it send? None, just as Shift doesn't send any character. Alt, Shift, Ctrl, etc. are interpreted by the keyboard driver, which then decides what to do; on the console those keys decide which one of the many values attached to a given key is to be sent. Those keys don't send any character by themselves; it is the combination of them and one normal key that determines what is sent. -- Ki a vos vye bn, Pablo Saratxaga http://chanae.walon.org/pablo/ PGP Key available, key ID: 0xD9B85466 [you can write me in Walloon, Spanish, French, English, Catalan or Esperanto] [min povas skribi en valona, esperanta, angla aux latinidaj
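Pablo's point, that Alt, Shift and Ctrl only select which of a key's values is sent, can be sketched as a driver that tracks modifier state and looks normal keys up in a table indexed by that state. This is a hypothetical C++ model (names, bit values and the 128-keycode limit are our assumptions), loosely in the spirit of the per-modifier columns of a Linux console keymap.

```cpp
#include <array>
#include <optional>

// Modifier bits (hypothetical values); at most 8 combinations here.
enum Modifier : unsigned { kShift = 1, kCtrl = 2, kAlt = 4 };

class KeyboardDriver {
public:
    // Give a key a value for one combination of modifiers.
    void bind(unsigned mods, int keycode, char32_t value) { map_.at(mods).at(keycode) = value; }

    // Modifier press/release only changes the state; no character is produced.
    void modifier(unsigned bit, bool pressed) {
        state_ = pressed ? (state_ | bit) : (state_ & ~bit);
    }

    // A normal key sends the value bound for the current modifier state, if any.
    std::optional<char32_t> key(int keycode) const {
        char32_t value = map_.at(state_).at(keycode);
        if (value == 0) return std::nullopt;
        return value;
    }

private:
    unsigned state_ = 0;
    std::array<std::array<char32_t, 128>, 8> map_{};   // [modifier state][keycode]
};
```

Binding keycode 0x10 to 'q', Shift+0x10 to 'Q' and Ctrl+0x10 to 0x11 means that pressing Shift sends nothing, and the next press of 0x10 sends 'Q'.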
Unicode Keyboard Input Linux
Re: Unicode Keyboard Input Linux Hello World, I'm interested in using the Linux console as a multi-language keyboard, disregarding graphical X (and xterm) for the moment. 1) How do I switch the keyboard from language to language? I work in English, Greek, Latin (i.e. French, German, Spanish, and Italian), and Russian. I am not interested in right-to-left processing, nor double-column glyphs, yet. Do I use an escape sequence? Do I use an alt-key combination? 2) Can I set up my own keymaps for these languages? Are they defined already? I might like to vary the dead-key sequence from {accent, letter} to {letter, accent}. 3) What about console fonts? How do I get/create them and install them? These fonts won't work on my dot-matrix printer. That's ok, I can print from X. I do not have a Linux PC yet. My computer is Windows 98. I have an older (=2001) version of cygwin installed, but I haven't used it a lot. Maybe I should. I have been googling for this information. The descriptions are plentiful, but they all seem to ignore the obvious. Can you help me? Joe PS I read somewhere yesterday that you can switch between Ukrainian and English keyboards using the RightAlt key, on Debian, I believe. Since no other examples were given, let me make some proposals:
alt-a = ascii
alt-d = German
alt-f = French (i.e. generic French; I don't care about locale yet)
alt-g = monotonic Greek
alt-h = polytonic Greek (h = Homer)
alt-l = Latin = {French, German, Spanish, Italian} (saves typing)
alt-r = Cyrillic Russian
alt-s = Spanish
alt-u = Cyrillic Ukrainian
I realize the locale would specify the keyboard layout with more precision (for French, locale = {Belgium, Canada, France, ...}; for Spanish, locale = {Spain, Mexico, Colombia, ...}), but I don't understand locales yet. I need a locale primer too. The list of keyboards should be configurable, meaning another configuration file, in the user's home directory, I guess.
Each keyboard would have a keymap, but I didn't understand the man page for keymaps. Is 'keymaps' a console abstraction? Is there another 'keymaps' for X? Then there is the problem of the 9-bit, fixed-pitch console fonts (we're ignoring X for the moment). Are there simple tools I can use to roll my own? How do I map unicode (=utf-8) characters to the glyphs in the font set? I'm particularly interested in polytonic Greek. Once I've selected the keyboard (alt-h), I could type a small_omega_dasia_perispomeni_ypogegrammeni, like this: ascii '`', ascii '~', ascii 'w', ascii 'i'
psili = smooth (breathing)
dasia = rough (breathing)
oxia = acute (accent)
varia = grave (accent)
perispomeni = circumflex (accent)
ypogegrammeni = subscript (iota)
prosgegrammeni = adscript (iota)
omega = big-O, the final letter of the Greek alphabet
omicron = small-o, our letter 'o'
small = minuscule, lower-case
capital = majuscule, upper-case
This is about as complicated as it gets in polytonic Greek, three dead keys, two pre-position, one post-position, 'w' representing omega, and an 'i' for iota subscript. The keyboard map is therefore more than a map, it is a fsm, a stateful-map. If I change keyboards in midstream (using alt-a, for example), the fsm would output the components of an unaccepted character individually. How far will keymaps go? The alt key is used like the shift key. What ascii character does it send? (None, so how do I use it for the tty driver? It would be ok for a real keyboard driver, where I have access to keyboard events. I'm thinking the keyboard map should be part of the tty (=ascii) driver, mapping ascii to utf-8, and a teletypewriter only understands ascii...) Escape Sequences Otherwise, I could use an esc sequence to change keyboards, like {esc a, esc g, esc h, etc.} Is there already a standard way of doing this? I know escape sequences have already been defined for other control operations on the terminal, so why not for changing keyboards? What is ISO 2022?
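Pablo's advice to use combining characters rather than dead keys can be applied to the key sequence proposed above. The following hypothetical C++ sketch (the layout, names and key choices are ours, following that proposal) produces the decomposed form ω + U+0314 + U+0342 + U+0345 for ` ~ w i. Normalizing that to NFC, for example with ICU, would give the single character U+1FA7 (ᾧ), the small_omega_dasia_perispomeni_ypogegrammeni of the question.

```cpp
#include <map>
#include <string>

// Hypothetical layout following the proposal: prefix keys for breathing and
// accent, plain letters for base characters, and 'i' typed after a letter
// for the iota subscript. Only a few keys are modelled.
const std::map<char, char32_t> kPrefixMarks = {
    {'`', 0x0314},   // dasia:       COMBINING REVERSED COMMA ABOVE
    {'~', 0x0342},   // perispomeni: COMBINING GREEK PERISPOMENI
};
const std::map<char, char32_t> kLetters = {
    {'a', 0x03B1}, {'h', 0x03B7}, {'w', 0x03C9},   // alpha, eta, omega
};

// Turn keystrokes into base letter + combining marks (decomposed form).
// Prefix marks wait for a letter and keep their typing order, which matters:
// U+0314 U+0342 and U+0342 U+0314 are not canonically equivalent.
// An 'i' right after a letter always becomes U+0345: with a post-position
// key, that position can never hold anything else.
std::u32string type_polytonic(const std::string& keys) {
    std::u32string out, pending;
    bool after_letter = false;
    for (char k : keys) {
        auto mark = kPrefixMarks.find(k);
        auto letter = kLetters.find(k);
        if (mark != kPrefixMarks.end()) {
            pending += mark->second;
            after_letter = false;
        } else if (k == 'i' && after_letter) {
            out += U'\u0345';
            after_letter = false;
        } else if (letter != kLetters.end()) {
            out += letter->second;
            out += pending;               // marks follow the base letter
            pending.clear();
            after_letter = true;
        } else {                          // any other key: interrupted marks are lost
            pending.clear();
            out += static_cast<char32_t>(static_cast<unsigned char>(k));
            after_letter = false;
        }
    }
    return out;
}
```

Dropping pending marks when another key arrives mirrors Pablo's description of what happens to an interrupted dead-key sequence.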
The VT-100 had a whole bunch of escape sequences {blank screen, position cursor, etc.}, then there were the ANSI escape sequences, which mapped a standard set of terminal-control operations to a vendor-specific set of escape sequences. The Ctrl key worked like the Shift key and was used to output C0 control characters to the tty. Some of the commands I remember are:
ctrl-c = break
ctrl-z = end-of-file
ctrl-s = stop scrolling
ctrl-p = print screen?
ctrl-b = backspace?
What is C1-safe, and why is that a problem for utf-8? Since the C1 range is not part of the ascii table, I don't know why a tty would care. How does a traditional tty driver handle C1 control characters? Anyway, this is how I imagine it. Thanks again.
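Two of the questions above have short answers in code. The Ctrl key keeps only the low five bits of a letter's ASCII code, so Ctrl-C is 0x03 and backspace (0x08) is Ctrl-H, not Ctrl-B (0x02). On a Unix tty the line discipline acts on some of these by default: Ctrl-C interrupts, Ctrl-Z suspends (end-of-file is Ctrl-Z on DOS but Ctrl-D on Unix), and Ctrl-S/Ctrl-Q stop and restart output. "C1-safe" refers to the C1 controls at bytes 0x80 to 0x9F. UTF-8 continuation bytes range over 0x80 to 0xBF, so a terminal that acts on raw C1 bytes will misread ordinary UTF-8 text; ω, for instance, is encoded as 0xCF 0x89. Below is a minimal C++ sketch with our own function names.

```cpp
#include <string>

// Ctrl keeps only the low five bits of a letter's ASCII code (either case):
// Ctrl-C -> 0x03, Ctrl-H -> 0x08 (backspace), Ctrl-S -> 0x13, Ctrl-Z -> 0x1A.
constexpr unsigned char ctrl(char letter) {
    return static_cast<unsigned char>(letter) & 0x1F;
}

// C1 control characters occupy bytes 0x80..0x9F. UTF-8 continuation bytes
// are 0x80..0xBF, so ordinary UTF-8 text can contain bytes that a C1-aware
// terminal would take as controls.
bool contains_c1_byte(const std::string& bytes) {
    for (unsigned char b : bytes)
        if (b >= 0x80 && b <= 0x9F) return true;
    return false;
}
```

An 8-bit code like ISO-8859-7 never puts printable characters in 0x80 to 0x9F, which is why it is C1-safe and UTF-8 is not.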
Re: Unicode Keyboard Input Linux
On Sat, Jun 12, 2004 at 09:56:52AM -0700, Elvis Presley wrote: Re: Unicode Keyboard Input Linux I'm interested in using the Linux console as a multi-language keyboard 1) How do I switch the keyboard from language to language? The kernel keyboard driver does not have the concept of language. It has a keymap. You load it with the loadkeys utility. Keymaps are rather powerful. They have 256 possible shift states and any key can be a locking shift, so after pressing one of your chosen key combinations you can use a different part of the keymap. You have a FSM here. I read somewhere yesterday that you can switch between Ukrainian and English keyboards using the RightAlt key, This is not a property of Linux, but a property of that particular keymap. You can do things just as you like. let me make some proposals: Proposals to yourself? 2) Can I set up my own keymaps for these languages? Are they defined already? Yes and yes. I might like to vary the dead-key sequence from {accent, letter} to {letter, accent}. You define pairs of arbitrary symbols, so you can use 'e just as easily as e'. But so far these compose sequences have used pairs of 8-bit characters, not Unicode. Some extremely recent kernels may work. 3) What about console fonts? How do I get/create them and install them? They exist already. But you can make your own, if you want. I do not have a Linux PC yet. I have been googling for this information. The descriptions are plentiful, but they all seem to ignore the obvious. The base documentation is that which comes with the kbd package: the manual pages for loadkeys, setfont and keymaps. These things are tricky and messy, and it is easiest just to leave matters to the distribution. But if you like to fiddle with them yourself, you can. Andries
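Andries's description (one keymap, many shift states, any key able to act as a locking shift) can be modelled by treating each "language" as a latched group. Below is a hypothetical C++ sketch, with group letters taken from Elvis's alt-a/alt-g proposal and a class name of our own. Real console keymaps are written in the keymaps(5) format and loaded with loadkeys, not built like this.

```cpp
#include <map>
#include <utility>

// Hypothetical model: the "language" is a latched group inside one keymap.
// Alt+letter (e.g. a = ascii, g = monotonic Greek) selects a group, which
// then stays in effect until another group is selected.
class LatchingKeymap {
public:
    void define(char group, int keycode, char32_t value) { map_[{group, keycode}] = value; }

    // Alt+letter: latch the group if the keymap defines it; otherwise keep the current one.
    bool select_group(char letter) {
        for (const auto& entry : map_)
            if (entry.first.first == letter) { group_ = letter; return true; }
        return false;
    }

    // A normal key is looked up in the latched group; U+0000 means nothing is bound.
    char32_t press(int keycode) const {
        auto it = map_.find({group_, keycode});
        return it == map_.end() ? U'\0' : it->second;
    }

private:
    char group_ = 'a';                                // start in the ascii group
    std::map<std::pair<char, int>, char32_t> map_;
};
```

Unlike Shift, which changes the result only while it is held, the selected group persists across keystrokes; that persistent state is what makes the keymap a (small) finite-state machine.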