Unicode Keyboard Input Linux - Naive keymap

2004-06-24 Thread Elvis Presley
Hi All,

This is a utf-8 file produced with Microsoft Word 2000 on Windows 98. Wordpad
only does utf-16. Windows 98 Notepad does not do Unicode at all. I'm going to
send you this message by copy-and-paste into the Yahoo webmail form to see what
happens. I've attached the original.

Tschüss.

Elvis

NaiveKeymap.xml

A keymap is essentially a list of keys. It translates (or maps) keycodes to
charcodes. The keycodes represent keys on the keyboard and must be enumerated
somewhere.

A keymap is also a list(=set) of other keymaps, each keymap containing a list
of keycode to charcode translations. The character set can be given for each
(key)map; it doesn’t have to be Unicode, or utf-8.

For example:

<keymap>
  <map name="english" key="ShiftAltF1">...</map>
  <map name="russian" key="ShiftAltF2">...</map>
  <map name="greek"   key="ShiftAltF3" charset="utf-8">
    <key name="V">ω</key>                    <!-- small omega, U+03C9 -->
    <key name="SemiColon">
      <key name="V">ώ</key></key>            <!-- small omega with tonos, U+03CE -->
    <key name="ShiftV">Ω</key>               <!-- capital omega, U+03A9 -->
    <key name="SemiColon">
      <key name="ShiftV">Ώ</key></key>       <!-- capital omega with tonos, U+038F -->
    <key name="F1">Καλημέρα κόσμε</key>      <!-- the string "kalhmera kosme" (good morning, world) -->
    <key name="CtrlAltDelete" action="reboot"></key>
  </map>
</keymap>

I have arbitrarily reserved Alt-F1, Alt-F2, Alt-F3, etc. for VC1, VC2, VC3,
etc.; these keys are not part of the VC keymap. You could use Shift-Alt-Fn to
switch between maps within a keymap.

Characters are written in utf-8, because this is xml, but you could also use
the Unicode number format like this:

<key name="Q" char="U+1FF6"></key>

'compose' key sequences are just nested character definitions.

enum actions {characters, strings, switch_keymap, boot_machine /* etc. */};

There is no distinction between characters and strings. String actions are like
character actions, just normal content.
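A minimal C++ sketch of that action list, assuming std::variant as the representation. The type names (Text, SwitchMap, Reboot, Terminal) and perform() are hypothetical; since a single character is just a one-character string, there is no separate character alternative:

```cpp
#include <string>
#include <variant>

// Hypothetical action types for a key binding.
struct Text      { std::u32string chars; };  // emit one or more characters
struct SwitchMap { int map_index; };          // e.g. Shift-Alt-F3 selects map 2
struct Reboot    {};                          // e.g. Ctrl-Alt-Delete

using Action = std::variant<Text, SwitchMap, Reboot>;

// Hypothetical per-terminal state that actions act on.
struct Terminal {
    std::u32string output;         // characters delivered so far
    int active_map = 0;            // index of the map currently in use
    bool reboot_requested = false;
};

void perform(Terminal& term, const Action& action) {
    if (const auto* t = std::get_if<Text>(&action))
        term.output += t->chars;               // characters and strings alike
    else if (const auto* s = std::get_if<SwitchMap>(&action))
        term.active_map = s->map_index;
    else
        term.reboot_requested = true;          // the only remaining case: Reboot
}
```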

Emulating a Graphical Terminal

What tty mode is rawer than RAW?

It would have to be bit-mapped mode, or tty=null. Just pull out the ldterm
altogether and put in a graphical hardware emulator.

How much of the X protocol can be moved into a kernel-space stream, so that
virtual consoles can share the same graphics interface (including fonts,
keyboard events, and mouse) that the X Window System uses?

DOS programmers will be familiar with the traditional graphical PC interface,
but I am not one of them.

For graphical virtual terminals to work, you would only need the graphics
interface, not the windows, because the notion of a 'window' is fundamentally
different to a VC, which does not share its virtual display with many
applications. (That is what X, running in a VC, is for: multiplexing the VC.)

Character mode VCs would be built on top of the graphics emulator.

/* Naïve_Keyboard.h */

// Sorry about the mistake in the earlier draft (overlapping ctrl/alt values):

namespace mod {enum {shift=0x8000, ctrl=0x4000, alt=0x2000}; } // corrected

namespace event {enum {make=0x0, brk=0x0080}; }

namespace key {enum {ESC=0x01, k1=0x02, k2=0x03, k3=0x04, k4=0x05, k5=0x06,
 k6=0x07, k7=0x08, k8=0x09, k9=0x0a, k0=0x0b, minus=0x0c, equal=0x0d,
 BS=0x0e, tab=0x0f, Q=0x10,  W=0x11,  E=0x12,  R=0x13, T=0x14, Y=0x15,
 U=0x16,  I=0x17};}

const int q_make  = key::Q | event::make;
const int q_break = key::Q | event::brk;

const int q_key   = q_make;

const int q_shift = mod::shift | q_key;
const int q_ctrl  = mod::ctrl  | q_key;
const int q_alt   = mod::alt   | q_key;

const int q_ctrl_alt = mod::ctrl  | mod::alt | q_key;

Composite Keys

The Q key can be represented by its make code (= 0x10 | 0x00). Shift-Q is a
composite key, composed of the Shift and Q keys (= 0x8000 | 0x0010).

Ctrl-Alt-Q = 0x4000 | 0x2000 | 0x0010 (= 0x6010);  <- better
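The arithmetic above can be checked at compile time. This sketch restates only the pieces of the corrected header it needs; the helpers modifiers(), is_break() and key_number() are hypothetical additions that take a composite keycode apart again, assuming the low 7 bits hold the key and 0x0080 is the break flag:

```cpp
// Restated from the corrected naive_keyboard.h (Q only).
namespace mod   { enum { shift = 0x8000, ctrl = 0x4000, alt = 0x2000 }; }
namespace event { enum { make = 0x0000, brk = 0x0080 }; }
namespace key   { enum { Q = 0x10 }; }

// Hypothetical helpers: split a composite keycode into its parts.
constexpr int  modifiers(int code)  { return code & (mod::shift | mod::ctrl | mod::alt); }
constexpr bool is_break(int code)   { return (code & event::brk) != 0; }
constexpr int  key_number(int code) { return code & 0x007F; }

constexpr int ctrl_alt_q = mod::ctrl | mod::alt | key::Q;

static_assert(ctrl_alt_q == 0x6010, "0x4000 | 0x2000 | 0x0010");
static_assert(modifiers(ctrl_alt_q) == (mod::ctrl | mod::alt), "both modifier bits survive");
static_assert(key_number(ctrl_alt_q) == key::Q, "the key is still Q");
static_assert(!is_break(ctrl_alt_q), "a make event");
```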

The keyboard driver translates real keyboard scancodes into integer-valued
composite keys, which are more easily mapped to (utf-8) characters.

Users can change the keyboard map in midstream (= mid-line) by using an Alt key
combination, so you can't replace a keyboard map as a module in a Stream.

A composite keycode would contain bits set corresponding to the state of the
keyboard.
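One way a driver could build those bits is to track the modifier keys as their make and break codes arrive. This is a minimal sketch, assuming PC set-1 scancodes (a key's break code is its make code with 0x80 set; left Shift = 0x2A, left Ctrl = 0x1D, left Alt = 0x38) and the corrected bit values from the header above; right-hand modifiers, CapsLock and 0xE0-prefixed extended keys are left out, and KeyboardState is a hypothetical name:

```cpp
#include <cstdint>

// Minimal sketch: turn PC set-1 scancodes into composite keycodes.
class KeyboardState {
public:
    // Feed one raw scancode; returns modifier bits | break flag | key.
    int feed(std::uint8_t scancode) {
        const int key = scancode & 0x7F;
        const bool released = (scancode & 0x80) != 0;
        if (const int bit = modifier_bit(key)) {
            if (released) mods_ &= ~bit;   // modifier let go
            else          mods_ |= bit;    // modifier held down
        }
        return mods_ | (released ? 0x0080 : 0x0000) | key;
    }
    int modifiers() const { return mods_; }

private:
    static int modifier_bit(int key) {
        switch (key) {
            case 0x2A: return 0x8000;  // left Shift -> mod::shift
            case 0x1D: return 0x4000;  // left Ctrl  -> mod::ctrl
            case 0x38: return 0x2000;  // left Alt   -> mod::alt
            default:   return 0;
        }
    }
    int mods_ = 0;
};
```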

X keyboard events (unlike console scancodes) carry not only make and break
events but also the state of the keyboard, as determined by the modifier keys
held down at the time.

Applications not interested in break events can ignore them, but they have to
look at the entire keycode to determine the value of the key. (Keycode is part
of scancode.)

Events are translated into characters by a conversion function. This function
must look at a sequence of dead keys before outputting an accented character.

All keys are dead keys until they emit a character.
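A minimal sketch of such a conversion function for the Greek omega keys (on the Greek layout, omega is on the V key and the tonos dead key is on SemiColon). It assumes PC set-1 key numbers (SemiColon = 0x27, V = 0x2F), the corrected Shift bit 0x8000, the 0x0080 break flag and hard-coded tables; GreekConverter is a hypothetical name. SemiColon emits nothing on its own and only changes which table the next key is looked up in:

```cpp
#include <map>
#include <optional>

class GreekConverter {
public:
    // Returns a character, or std::nullopt while a dead key is pending,
    // for break events, and for undefined keys or sequences.
    std::optional<char32_t> feed(int keycode) {
        if (keycode & 0x0080) return std::nullopt;      // ignore break events
        if (!tonos_pending_ && keycode == kSemiColon) {
            tonos_pending_ = true;                      // dead key: emit nothing yet
            return std::nullopt;
        }
        const auto& table = tonos_pending_ ? with_tonos_ : plain_;
        tonos_pending_ = false;                         // the sequence ends here
        auto it = table.find(keycode);
        if (it == table.end()) return std::nullopt;
        return it->second;
    }

private:
    static constexpr int kSemiColon = 0x27, kV = 0x2F, kShift = 0x8000;
    bool tonos_pending_ = false;
    std::map<int, char32_t> plain_{{kV, U'\u03C9'}, {kShift | kV, U'\u03A9'}};
    std::map<int, char32_t> with_tonos_{{kV, U'\u03CE'}, {kShift | kV, U'\u038F'}};
};
```

A real converter would read its tables from the keymap instead of hard-coding them.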

Traditional terminals send ascii characters to the tty driver in the host, so
the keyboard logic (=event processing) is part of the terminal.





__
Do you Yahoo!?
Yahoo! Mail is new and improved - Check it out!
http://promotions.yahoo.com/new_mail

Re: Unicode Keyboard Input Linux

2004-06-24 Thread Denis Barbier
On Sun, Jun 20, 2004 at 11:24:32PM +0200, Denis Barbier wrote:
 I have another question about UTF-8 and kbd: some keymaps are defined
 twice, with Unicode notation and numerical (or literal) notation, like
 mk-utf.map and mk-cp1251.map.  Since Unicode notation is needed to
 input non-ASCII characters, all keymaps will sooner or later provide
 a -utf variant.  But why is Unicode notation handled in a different
 manner by loadkeys?  This distinction could be made depending on
 KDGKBMODE ioctl but not on input format so that a single keymap can
 be used.

Here is a first patch.  It has not been fully tested, but it should
explain what I am talking about.  The same keymap file can be loaded
when the keyboard is in UTF-8 or ASCII mode, and this file can contain
literal strings, numbers or Unicode codepoints.  A charset has to be
declared so that conversion between these formats can be performed
without trouble.

Denis
diff -ur kbd-1.12.orig/src/analyze.l kbd-1.12/src/analyze.l
--- kbd-1.12.orig/src/analyze.l	2004-01-16 22:51:44.0 +0100
+++ kbd-1.12/src/analyze.l	2004-06-24 21:28:14.0 +0200
@@ -77,7 +77,7 @@
 \- {return(DASH);}
 \, {return(COMMA);}
 \+ {return(PLUS);}
-{Unicode}  {yylval=strtol(yytext+1,NULL,16);return(UNUMBER);}
+{Unicode}  {yylval=strtol(yytext+1,NULL,16) ^ 0xf000;return(UNUMBER);}
 {Decimal}|{Octal}|{Hex}	{yylval=strtol(yytext,NULL,0);return(NUMBER);}
 <RVALUE>{Literal}  {return((yylval=ksymtocode(yytext))==-1?ERROR:LITERAL);}
 {Charset}  {return(CHARSET);}
diff -ur kbd-1.12.orig/src/dumpkeys.c kbd-1.12/src/dumpkeys.c
--- kbd-1.12.orig/src/dumpkeys.c	2004-01-16 20:45:31.0 +0100
+++ kbd-1.12/src/dumpkeys.c	2004-06-24 23:50:48.0 +0200
@@ -131,11 +131,10 @@
 	t = KTYP(code);
 	v = KVAL(code);
 	if (t >= syms_size) {
-		code = code ^ 0xf000;
-		if (!numeric && (p = unicodetoksym(code)) != NULL)
+		if (!numeric && (p = codetoksym(code)) != NULL)
 			printf("%-16s", p);
 		else
-			printf("U+%04x  ", code);
+			printf("U+%04x  ", code ^ 0xf000);
 		return;
 	}
 	if (t == KT_LETTER) {
diff -ur kbd-1.12.orig/src/ksyms.c kbd-1.12/src/ksyms.c
--- kbd-1.12.orig/src/ksyms.c	2004-01-16 20:45:31.0 +0100
+++ kbd-1.12/src/ksyms.c	2004-06-25 01:14:42.0 +0200
@@ -1,7 +1,9 @@
+#include <linux/kd.h>
 #include <linux/keyboard.h>
 #include <stdio.h>
 #include <string.h>
 #include "ksyms.h"
+#include "getfd.h"
 #include "nls.h"
 
 /* Keysyms whose KTYP is KT_LATIN or KT_LETTER and whose KVAL is 0..127. */
@@ -1615,9 +1617,6 @@
 
 /* Functions for both dumpkeys and loadkeys. */
 
-static int prefer_unicode = 0;
-static const char *chosen_charset = NULL;
-
 void
 list_charsets(FILE *f) {
 	int i,j,lth,ct;
@@ -1655,10 +1654,8 @@
 	sym *p;
 	int i;
 
-	if (!strcasecmp(charset, "unicode")) {
-		prefer_unicode = 1;
+	if (!strcasecmp(charset, "unicode"))
 		return 0;
-	}
 
 	for (i = 0; i < sizeof(charsets)/sizeof(charsets[0]); i++) {
 		if (!strcasecmp(charsets[i].charset, charset)) {
@@ -1667,7 +1664,6 @@
 				if(p->name[0])
 					syms[0].table[i] = p->name;
 			}
-			chosen_charset = charset;
 			return 0;
 		}
 	}
@@ -1677,10 +1673,15 @@
 }
 
 const char *
-unicodetoksym(int code) {
+codetoksym(int code) {
 	int i, j;
 	sym *p;
 
+	if (KTYP(code) == KT_META)
+		return NULL;
+	if (KTYP(code) < syms_size)
+		return syms[KTYP(code)].table[KVAL(code)];
+	code = code ^ 0xf000;
 	if (code < 0)
 		return NULL;
 	if (code < 0x80)
@@ -1697,49 +1698,60 @@
 
 /* Functions for loadkeys. */
 
-int unicode_used = 0;
-
 int
 ksymtocode(const char *s) {
 	int i;
-	int j, jmax;
+	int j;
 	int keycode;
+	int fd;
+	int kbd_mode;
+	int syms_start = 0;
 	sym *p;
 
+	if (!s) {
+		fprintf(stderr, "%s\n", _("null symbol found"));
+		return -1;
+	}
+
+	fd = getfd(NULL);
+	ioctl(fd, KDGKBMODE, &kbd_mode);
 	if (!strncmp(s, "Meta_", 5)) {
+		/* Temporarily change kbd_mode to ensure that keycode is
+		   right. */
+		ioctl(fd, KDSKBMODE, K_XLATE);
 		keycode = ksymtocode(s+5);
+		ioctl(fd, KDSKBMODE, kbd_mode);
 		if (KTYP(keycode) == KT_LATIN)
 			return K(KT_META, KVAL(keycode));
 		/* fall through to error printf */
 	}
 
-	for (i = 0; i < syms_size; i++) {
-		jmax = ((i == 0 && prefer_unicode) ? 128 : 

Re: Unicode Keyboard Input Linux

2004-06-23 Thread Elvis Presley
--- Denis Barbier [EMAIL PROTECTED] wrote:
 I have another question about UTF-8 and kbd: some keymaps are defined
 twice, with Unicode notation and numerical (or literal) notation, like
 mk-utf.map and mk-cp1251.map.  Since Unicode notation is needed to
 input non-ASCII characters, all keymaps will sooner or later provide
 a -utf variant.  But why is Unicode notation handled in a different
 manner by loadkeys?  This distinction could be made depending on
 KDGKBMODE ioctl but not on input format so that a single keymap can
 be used.

In reading the keymaps man pages, I naturally assumed that these text files are
utf-8 and that international characters can be used in keysym (i.e. action)
positions, so I don't know exactly what you mean by /unicode/ notation. Is that
some kind of ASCII notation used to represent Unicode?
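For reference, kbd keymaps name a Unicode keysym as "U+" followed by hexadecimal digits (for example U+03C9 for small omega); the patch earlier in this thread parses that token in its {Unicode} rule by skipping the "U" and letting strtol() read "+03C9" as hex. A minimal sketch of the same parsing, plus the ^ 0xf000 step the patch applies so that Unicode values stay apart from the K(type, value) keysyms; the function names are hypothetical:

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Parse kbd's "U+XXXX" keysym notation into a code point.
std::optional<long> parse_unicode_keysym(const std::string& s) {
    if (s.size() < 3 || s[0] != 'U' || s[1] != '+') return std::nullopt;
    char* end = nullptr;
    const long cp = std::strtol(s.c_str() + 1, &end, 16);    // reads "+03C9"
    if (end == s.c_str() + 1 || *end != '\0') return std::nullopt;  // no digits, or junk
    return cp;
}

// XOR with 0xf000 is its own inverse: the same call encodes (as in analyze.l)
// and decodes (as in dumpkeys.c).
constexpr long kbd_unicode_value(long v) { return v ^ 0xf000; }
```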

Elvis

I have some questions about keymaps myself :

1) Man page of LOADKEYS

BUGS

The keyboard translation table is common for all the virtual consoles, so any
changes to the keyboard bindings affect all the virtual consoles. :-(

This is not a bug; it is a design defect. Each virtual keyboard needs its own
keymap, which translates keycodes (= modifier, key, event) into characters.

2) Man page of DUMPKEYS

My Cygwin installation does not respond to any of the keymap commands, but I
haven't checked my PATH variable (yet).

3) Virtual Consoles and xterms coexist!

You can even run a complete X session(=display?) in each console. The X Window
System will use the virtual console 7 by default. So if you start X and then
switch to one of the text-based virtual consoles, you can go back again to X by
typing Alt-F7.
O'Reilly, Running Linux, 3rd ed. 1999, pg. 94

When X is started, it opens the first unused console [unused? /dev/console as
a clone driver?]. While X is running, you can use Ctrl-Alt-Fn to switch to VTn.
When X finishes, it will return to the original console [huh?]
http://www.tldp.org/HOWTO/Keyboard-and-Console-HOWTO-13.html

Therefore,

a) 'virtual consoles' must emulate a graphical terminal. (xterms emulate
VT100s, which are character-mode terminals, so you could not run an instance
of X in an xterm, but it looks like you can (and do!) run X in a virtual
console.)

Hypothesis: you should also be able to run an X server as an X client in an X
window, since each window is a graphical terminal emulator. Only the interfaces
are different. I imagine the second, nested version of X would open an X
window using the xlib protocol.

b) 'vterms(=virtual consoles)' and 'xterms' must be able to coexist.

notes

Tell me if you like this terminology:

/*naive_keyboard.h*/

// I thought C++ enums introduced a new namespace. This is really ugly:

namespace mod {enum {shift=0x8000, ctrl=0xc00, alt=0xe00}; }

// Too bad I can't use the term /break/ in this context:

namespace event {enum {make=0x0, brk=0x0080}; }

// Then, a list of keycodes...

namespace key {enum {ESC=0x01, k1=0x02, k2=0x03, k3=0x04, k4=0x05, k5=0x06,
 k6=0x07, k7=0x08, k8=0x09, k9=0x0a, k0=0x0b, minus=0x0c, equal=0x0d,
 BS=0x0e, tab=0x0f, Q=0x10,  W=0x11,  E=0x12,  R=0x13, T=0x14, Y=0x15,
 U=0x16,  I=0x17};} //etc

//scancodes

const int q_make  = key::Q | event::make;
const int q_break = key::Q | event::brk;

// the Q key (code)

const int q_key   = q_make;

// Composite keys:

const int q_shift = mod::shift | q_key;
const int q_ctrl  = mod::ctrl  | q_key;
const int q_alt   = mod::alt   | q_key;

const int q_ctrl_alt = mod::ctrl  | mod::alt | q_key;

Composite Keys

The Q key can be represented by its make code (= 0x10 | 0x00). Shift-Q is a
/*composite*/ key, composed of the Shift and Q keys (= 0x8000 | 0x0010).

Ctrl-Alt-Q = 0xc00 | 0xe00 | 0x | 0x0010;
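Why these draft values needed correcting: 0xc00 is a bit subset of 0xe00, so with them Ctrl-Alt-Q produces the same code as Alt-Q. A compile-time comparison of this draft and the corrected values (ctrl = 0x4000, alt = 0x2000); distinct() is a hypothetical helper that asks whether Ctrl-Alt-key differs from both Ctrl-key and Alt-key:

```cpp
// Draft values (above) and the corrected ones from the later header.
constexpr int old_ctrl = 0xc00,  old_alt = 0xe00;
constexpr int new_ctrl = 0x4000, new_alt = 0x2000;
constexpr int Q = 0x10;

// True if Ctrl-Alt-key can be told apart from both Ctrl-key and Alt-key.
constexpr bool distinct(int ctrl, int alt, int key) {
    return (ctrl | alt | key) != (alt | key) && (ctrl | alt | key) != (ctrl | key);
}

static_assert((old_ctrl & old_alt) == old_ctrl, "0xc00 is contained in 0xe00");
static_assert((old_ctrl | old_alt | Q) == 0xe10, "draft Ctrl-Alt-Q == draft Alt-Q");
static_assert(!distinct(old_ctrl, old_alt, Q), "draft encoding is ambiguous");
static_assert((new_ctrl & new_alt) == 0, "corrected bits are disjoint");
static_assert(distinct(new_ctrl, new_alt, Q), "corrected encoding is unambiguous");
```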

The keyboard driver translates real keyboard scancodes into integer-valued
composite keys (=keysyms in some idioms), which are more easily mapped to
(utf-8) characters. X and Microsoft Windows do it this way.

Users can change the keyboard map in midstream (= mid-line) by using an Alt key
combination, so you can't replace a keyboard map as a module in a Stream.

A composite keycode would contain bits set corresponding to the state of the
keyboard. Careful, the CapsLock key changes the state of each (virtual)
keyboard.

X keyboard events (unlike console scancodes) carry not only make and break
events but also the state of the keyboard, as determined by the modifier keys
held down at the time.

Applications not interested in break events can ignore them, but they have to
look at the entire keycode to determine the value of the key i.e. they must
consider the modifier bits. (keycode is part of scancode in PC terminology.)

Keyboard events are translated into characters by a conversion function.
(naturally :) This function looks at a sequence of dead keys before
outputting an accented character.

Therefore: /*All keys are dead keys until they emit a character.*/

Traditional terminals send (ascii) characters to the tty driver in the host, so
the keyboard logic (=event processing) was part of the 

Re: Unicode Keyboard Input Linux

2004-06-20 Thread Denis Barbier
I have another question about UTF-8 and kbd: some keymaps are defined
twice, with Unicode notation and numerical (or literal) notation, like
mk-utf.map and mk-cp1251.map.  Since Unicode notation is needed to
input non-ASCII characters, all keymaps will sooner or later provide
a -utf variant.  But why is Unicode notation handled in a different
manner by loadkeys?  This distinction could be made depending on
KDGKBMODE ioctl but not on input format so that a single keymap can
be used.

Denis

--
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/



Re: Unicode Keyboard Input Linux

2004-06-17 Thread Elvis Presley
Thank you.

My version of Notepad has a little checkbox in the lower left corner of the
'Save As' dialog box called 'Save as Unicode'.

I never saw it.

'Save as Type' has only Text Documents(*.txt) and All files.

What about the other options, Unicode Big Endian, UTF-8?

There is no entry in the Notepad help.

Elvis

PS

Maybe I don't need a Linux box after all
:)

--- Wu Yongwei [EMAIL PROTECTED] wrote:
 You are wrong.  Check the File - Save As menu item of Notepad.  You will 
 find the encoding option: ANSI, Unicode, Unicode Big Endian, UTF-8 are 
 supported.  You may need to specify a different font if some characters 
 cannot display.
 
 By the way, many think it is a good idea to use real names in mailing 
 list correspondence.
 
 Best regards,
 
 Wu Yongwei
 
 Elvis Presley wrote:
 
   As far as I remember, Notepad on NT (New Technology ;) systems has
   been doing Unicode for text files as long as it exists (or at least
   since NT4, that's the first I saw it on), if we consider so-and-so
   UCS-2/UTF-16 support as Unicode support.
 
 No, I'm sitting at an NT workstation right now, and I see no way to do 
 Unicode in Notepad. In fact, the 'View Source' menu selection on my 
 browser blithely opens Notepad to view html, and everything shows up as 
 boxes but the ascii tags.
 
 On Windows 98 I can do utf-16 using Wordpad --it's not so bad-- so you 
 can imagine my surprise when the NT workstation at the library reported, 
 Unicode text file support had been removed from this version of Wordpad.
 
 I immediately thought it was a cynical attempt on Microsoft's part to 
 get us to use Word 2000, also installed on the Workstation, but, as I 
 said, it's so fat, I hate using it.
 
 Otherwise, I have no idea why they did it.
 
 Search your memory. If you did see Unicode in Notepad on NT, I'd be 
 interested.
 
 Thanks,
 
 Elvis
 
 
 
 







Re: Unicode Keyboard Input Linux

2004-06-17 Thread Wu Yongwei
Thank you.
My version of Notepad has a little checkbox in the lower left corner 
of the 'Save As' dialog box called 'Save as Unicode'.

I never saw it.
'Save as Type' has only Text Documents(*.txt) and All files.
What about the other options, Unicode Big Endian, UTF-8?
I do not quite understand it.  I just checked a Windows XP Professional 
box of a colleague's, and found the dialog just as I have described 
(same as Windows 2000).  And his Wordpad has the type Unicode text 
file.  Aren't you using the Home version?

There is no entry in the Notepad help.
Elvis
For the moment maybe we should talk off the mailing list since it is not 
about Linux any more.  Maybe you should first have a Linux box 
installed, use it for a while, and then talk again on the list.

Best regards,
Wu Yongwei



Re: Unicode Keyboard Input Linux

2004-06-16 Thread Elvis Presley
--- Danilo Segan [EMAIL PROTECTED] wrote:
 Today at 20:13, Elvis Presley wrote:
  I should have said, unicode text file support. Wordpad still does unicode,
  but only in Word format, not as a text file, so I can still edit a document
  in unicode, but I have to copy and paste it into a unicode editor to create
  a text file.
 
 As far as I remember, Notepad on NT (New Technology ;) systems has
 been doing Unicode for text files as long as it exists (or at least
 since NT4, that's the first I saw it on), if we consider so-and-so
 UCS-2/UTF-16 support as Unicode support.

No, I'm sitting at an NT workstation right now, and I see no way to do Unicode
in Notepad. In fact, the 'View Source' menu selection on my browser blithely
opens Notepad to view html, and everything shows up as boxes but the ascii
tags.

On Windows 98 I can do utf-16 using Wordpad --it's not so bad-- so you can
imagine my surprise when the NT workstation at the library reported, Unicode
text file support had been removed from this version of Wordpad.

I immediately thought it was a cynical attempt on Microsoft's part to get us to
use Word 2000, also installed on the Workstation, but, as I said, it's so fat,
I hate using it.

Otherwise, I have no idea why they did it.

Search your memory. If you did see Unicode in Notepad on NT, I'd be interested.

Thanks,

Elvis

 
 Cheers,
 Danilo
 
 
 







Re: Unicode Keyboard Input Linux

2004-06-16 Thread Wu Yongwei
You are wrong.  Check the File - Save As menu item of Notepad.  You will 
find the encoding option: ANSI, Unicode, Unicode Big Endian, UTF-8 are 
supported.  You may need to specify a different font if some characters 
cannot display.

By the way, many think it is a good idea to use real names in mailing 
list correspondence.

Best regards,
Wu Yongwei
Elvis Presley wrote:
 As far as I remember, Notepad on NT (New Technology ;) systems has
 been doing Unicode for text files as long as it exists (or at least
 since NT4, that's the first I saw it on), if we consider so-and-so
 UCS-2/UTF-16 support as Unicode support.
No, I'm sitting at an NT workstation right now, and I see no way to do 
Unicode in Notepad. In fact, the 'View Source' menu selection on my 
browser blithely opens Notepad to view html, and everything shows up as 
boxes but the ascii tags.

On Windows 98 I can do utf-16 using Wordpad --it's not so bad-- so you 
can imagine my surprise when the NT workstation at the library reported, 
Unicode text file support had been removed from this version of Wordpad.

I immediately thought it was a cynical attempt on Microsoft's part to 
get us to use Word 2000, also installed on the Workstation, but, as I 
said, it's so fat, I hate using it.

Otherwise, I have no idea why they did it.
Search your memory. If you did see Unicode in Notepad on NT, I'd be 
interested.

Thanks,
Elvis


Re: Unicode Keyboard Input Linux

2004-06-15 Thread Elvis Presley
Hi All,

Thanks for your help. I'm still processing your input.

They just changed my Yahoo! mail; let's hope you get this.

Elvis

PS

User space text-mode only virtual terminals

On Mon, Jun 14, 2004 at 21:38:13 +0200, Pablo Saratxaga wrote:

 [...] It would be perfectly ok to provide only very minimalistic kernel
support (even simpler and lighter than the current one) and have a user space
'vc' loaded early in the boot process. 

Or none at all. Just move the VC mux out of the kernel and into user space:

==
(The box layout of this diagram did not survive in the archive. Two ':'
columns divide it into three regions: the middle region holds the console,
the tcp boxes, the IP mux and the ptty/tty pairs; the right-hand region holds
the VC mux, the vterms and shell/login/getty. The connections were:)

  ... -- console -- VC mux
  VC mux -- tcp -- IP mux
  IP mux -- tcp -- vterm -- ptty0 (m = master, s = slave) -- tty0 -- shell
  IP mux -- tcp -- vterm -- ptty1 (m = master, s = slave) -- tty1 -- login
  IP mux -- tcp -- vterm -- ptty2 (m = master, s = slave) -- tty2 -- getty
  ... -- IP mux   (other connections)
==

This is exactly the same situation which applies to xterms, only the VC mux
opens the console in character mode. It then forks a fixed number of 'vterms'
as child processes. Each vterm holds the character contents of its display as
well as the state of its keyboard.
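The switching part of such a VC mux can be sketched in a few lines. This is a hypothetical model only: the tcp and pty plumbing from the diagram is left out, keystrokes arrive as composite keycodes with the corrected Alt bit (0x2000), and Alt-F1..Alt-F10 are assumed to be PC set-1 make codes 0x3B..0x44. Each vterm keeps its own display contents, so switching never loses text:

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct VTerm {
    std::u32string screen;  // the character contents of this vterm's display
};

class VcMux {
public:
    explicit VcMux(std::size_t count) : vterms_(count) {}

    // Route one keystroke: Alt-Fn switches vterms, anything else goes to the
    // active vterm. `ch` is the character the keymap produced (0 if none).
    void input(int keycode, char32_t ch) {
        if (keycode & 0x0080) return;                         // ignore break events
        const int key = keycode & 0x007F;
        if ((keycode & 0x2000) && key >= 0x3B && key <= 0x44) {
            const auto n = static_cast<std::size_t>(key - 0x3B);
            if (n < vterms_.size()) active_ = n;               // Alt-F(n+1)
            return;
        }
        if (ch != 0 && !vterms_.empty()) vterms_[active_].screen += ch;
    }

    std::size_t active() const { return active_; }
    const VTerm& vterm(std::size_t i) const { return vterms_.at(i); }

private:
    std::vector<VTerm> vterms_;
    std::size_t active_ = 0;
};
```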

Conclusion: vterms and xterms are redundant, so there is no good reason to run
them both at the same time. And xterms are more flexible.

Still, the keyboards are the same, so both could share the same, better(=X)
'keymaps' fsm.

512 (= 2**9) character glyphs in the vterm character buffer would be plenty for
my purposes: Latin (French, German, Spanish, Italian), Greek (mono- and
polytonic) and Cyrillic, but I'd have to be able to choose the Unicode
characters I want and map them to glyphs in the console font.
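A minimal sketch of that mapping, assuming each vterm keeps its own table from Unicode code points to glyph slots in a 512-glyph console font (glyph indices 0..511 fit in 9 bits). GlyphMap and its first-come, first-served slot policy are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>

class GlyphMap {
public:
    static constexpr std::uint16_t kMaxGlyphs = 512;

    // Give `cp` a glyph slot. Returns the slot, or std::nullopt once all
    // 512 slots are taken.
    std::optional<std::uint16_t> add(char32_t cp) {
        if (auto it = slots_.find(cp); it != slots_.end()) return it->second;
        if (next_ >= kMaxGlyphs) return std::nullopt;
        slots_.emplace(cp, next_);
        return next_++;
    }

    // Look up the glyph slot for a code point, if it has one.
    std::optional<std::uint16_t> glyph(char32_t cp) const {
        auto it = slots_.find(cp);
        if (it == slots_.end()) return std::nullopt;
        return it->second;
    }

    std::size_t size() const { return slots_.size(); }

private:
    std::unordered_map<char32_t, std::uint16_t> slots_;
    std::uint16_t next_ = 0;
};
```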

(You couldn't pull the IP mux out as easily, relying on traditional Unix pipes
for IPC... that's another mailing list.)

  Unicode [...] is prejudiced against non-speakers.

 ??? [...]

I meant to say utf-8. The irony is that utf-8 also blew up the Latin-1
characters. Now everything (but English) is twice the size. (That's not true,
only the accented vowels are.)
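The size claim is easy to check: UTF-8 uses 1 byte below U+0080, 2 bytes up to U+07FF (which covers the accented Latin-1 letters, Greek and Cyrillic), 3 bytes up to U+FFFF and 4 above. A small C++17 sketch of the byte count; utf8_length() and utf8_size() are illustrative names:

```cpp
#include <cstddef>
#include <string_view>

// Number of bytes UTF-8 needs for one code point (ranges from RFC 3629).
constexpr std::size_t utf8_length(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Encoded size of a whole string of code points.
constexpr std::size_t utf8_size(std::u32string_view s) {
    std::size_t n = 0;
    for (char32_t c : s) n += utf8_length(c);
    return n;
}

static_assert(utf8_size(U"Tsch\u00FCss") == 8, "7 letters, only the u-umlaut takes 2 bytes");
static_assert(utf8_size(U"\u03C9") == 2, "Greek small omega");
static_assert(utf8_length(U'\u20AC') == 3, "euro sign");
static_assert(utf8_length(U'\U0001D11E') == 4, "outside the BMP");
```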

The Perseus Project does a nice job with Unicode; it has to, because there is
no national character set for polytonic Greek (well, there is, sort of: the
encoding schemes used in academia, but they are less well known, and the
Unicode font support is better).

  Why do Greek newspapers still use ISO 8859-7?

 For the same reason that a majority of English language web sites still use
windows-1252, I suppose.

I guess we'll have to ask them.

  it looks like these older character sets will be around for a long time.

 Yes, but not for that reason (to save space); they are around because there
is a lot of *OLD* data in those encodings [...]

http://www.dolnet.ta-nea.gr/ is still producing a lot of new material, and their
mix is text-oriented.

I thought it might be because they were using web authoring tools based on the
older, national character set.

Wide characters are easily compressed by the file system or the network. In
fact, there is a lot of network 

Re: Unicode Keyboard Input Linux

2004-06-15 Thread Elvis Presley
Windows Virtual Machines

Kalhmera kosme. (Good morning, world.)

I'm at the library right now and our NT workstations do not have international
keyboard drivers installed.

So I have to write Greeklish.

Elvis

PS

 On Tue, 15 Jun 2004 at 16:44:48 +0200, Pablo Saratxaga wrote: 

  On Tue, Jun 15, 2004 at 05:55:18AM -0700, Elvis Presley wrote:

  Conclusion: vterms and xterms are redundant, so there is no good reason to
run them both at the same time. And xterms are more flexible.

 Yes, but there is a big difference: xterms need a running X terminal; vterms
don't.

Can you help me out? I don't have a Linux PC. Do vterms and xterms run together
on a real system, on your system?
 
  Still, the keyboards are the same, so both could share the same, better(=X)
'keymaps' fsm.

 The way the keyboards are handled is quite different (on X11 there is a high
hardware abstraction, while the linux keyboard on console interacts directly
with the kernel).

So, it looks like you get a console from the kernel whether you want one or
not. I'm thinking of those virtual terminal emulator processes. It's gotta be
possible to emulate an xterm in a vterm, then neanderthals like me can use
their stone tools.
 
  I meant to say utf-8. The irony is that utf-8 also blew up the Latin-1
characters. Now everything (but English) is twice the size. (That's not true,
only the accented vowels are.)

 And some are 3 bytes long, and some others are 4 bytes long... But who cares?
What matters is the ability to type any letter used in any human written
language. That is a very huge improvement.

I agree. I'm on your side.

Why do Greek newspapers still use ISO 8859-7?
 
   For the same reason that a majority of English language web sites still
use windows-1252, I suppose.
 
  http://www.dolnet.ta-nea.gr/ is still producing a lot of new material,

 [unknown address]

Sorry about that. The url is:

http://ta-nea.dolnet.gr/

They don't use the 'www' prefix as an alias, and I keep forgetting their parent
company name, 'dolnet'. (I asked them to register 'tanea.gr' but they haven't.)

The Communist Party newspaper is:

http://www.rizospastis.gr/

They have a much nicer name, and they also have a 'text-only' link which does
not download images, just the text. You can get the entire daily newspaper
through http. (Only I don't know if they are using unicode, but I assume not,
it's probably ISO 8859 too. You see? I've become skeptical.)

Is there a version of Linux which runs as a Microsoft Window (not
cygwin)?
 
   ?? What you say doesn't make sense. (you can on the other hand run an
 operating system inside of a virtual computer box inside another operating
 system)
 
  I should have asked, Is there a version of Microsoft Windows which will
run a copy of Linux?

 It doesn't make any more sense in the other way either :)

 Both MS-Windows and Linux are operating systems, you can run one, or the
other, not one inside the other; they are built in order to run at the very
bottom in direct interaction with the hardware.

 They can be run inside an emulated hardware box, but not as normal
programs.

  Microsoft describes Windows as a virtual-machine operating system, and
DOS does, indeed, run, as an operating system in a window.

 I never read of MS-Windows described as a virtual-machine... And what runs
in a window is in fact command.com, which is the equivalent (a much less
powerful one) of /bin/bash

  I assume a VxD would map/share the PC hardware, controlled by Windows, to
the device drivers in the Linux kernel.

 No, the linux kernel needs direct access to the hardware. What you need is to
emulate an entire system, like vmware does.

I haven't been able to determine exactly what vmware does from their website,
too proprietary, too hush-hush, but I assume they write VxDs which map the
Linux kernel to the Windows VMM, and the real hardware. Someone once told me
their product ran on the NT platform, but not Windows 98, but it was quite
expensive. (All hearsay. No personal experience.)

The heart of the Windows operating system is called the VMM(=Virtual Machine
Manager). There are a lot of descriptions out there, like

http://win32assembly.online.fr/vxd-tut2.html

When the VMM is running an instance of DOS, you get direct access to the DOS
INT21 interface. Your program can even write directly into display memory, just
like the old days, when your program owned the console. The VMM manages to
control access to the real display by remapping the real memory (= the virtual
memory address space) used by DOS, which still has that weird 20-bit memory
line.
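The 20-bit addressing can be made concrete. In real mode (and in the virtual-8086 mode the VMM uses for DOS boxes) an address is segment * 16 + offset, which reaches just past 1 MiB; with the A20 line disabled, the carry into bit 20 is dropped and the address wraps to the bottom of memory. A small sketch; real_mode_address() is an illustrative name:

```cpp
#include <cstdint>

// Real-mode (8086) address translation: physical = segment * 16 + offset.
// With A20 disabled, the carry into bit 20 is dropped, so addresses wrap at 1 MiB.
constexpr std::uint32_t real_mode_address(std::uint16_t segment, std::uint16_t offset,
                                          bool a20_enabled) {
    const std::uint32_t linear = (std::uint32_t{segment} << 4) + offset;
    return a20_enabled ? linear : (linear & 0xFFFFFu);
}

static_assert(real_mode_address(0xB800, 0x0000, false) == 0xB8000, "colour text-mode screen memory");
static_assert(real_mode_address(0xFFFF, 0x0010, true) == 0x100000, "just past 1 MiB");
static_assert(real_mode_address(0xFFFF, 0x0010, false) == 0x00000, "wraps to 0 with A20 off");
```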

Even 32-bit protected-mode programs designed to run under a DOS extension
called "extenders" (I forget the jargon) still run in a DOS window.
Microsoft has managed to recreate the entire DOS OS, not just command.com.

I think you could host Linux, if you had the right VxDs.

The VMM remaps the i486 ports used by the hosted OS's device drivers, so when
the Linux kernel writes to port addresses, the VMM traps them in a 

Re: Unicode Keyboard Input Linux

2004-06-15 Thread Elvis Presley
Windows Virtual Machines

Kalhmera kosme.

I'm at the library right now and our NT workstations do not have international
keyboard drivers installed.

So I have to write Greeklish.

Elvis

PS

 On Tue, 15 Jun 2004 at 16:44:48 +0200, Pablo Saratxaga wrote: 

  On Tue, Jun 15, 2004 at 05:55:18AM -0700, Elvis Presley wrote:

  Conclusion: vterms and xterms are redundant, so there is no good reason to
run them both at the same time. And xterms are more flexible.

 Yes, but there is a big difference: xterms need a running X terminal; vterms
don't.

Can you help me out? I don't have a Linux PC. Do vterms and xterms run together
on a real system, on your system?
 
  Still, the keyboards are the same, so both could share the same, better(=X)
'keymaps' fsm.

 The way the keyboards are handled is quite different (on X11 there is a high
hardware abstraction; while the linux keyboard on console interacts directly
with the kernel.

So, it looks like you get a console from the kernel whether you want one or
not. I'm thinking of those virtual terminal emulator processes. It's gotta be
possible to emulate an xterm in a vterm, then neanderthals like me can use
their stone tools.
 
  I meant to say utf-8. The irony is that utf-8 also blew up the Latin-1
characters. Now everything (but English) is twice the size. (That's not true,
only the accented vowels are.)

 And some are 3 bytes long, and some other are 4 bytes long,... But who cares?
What matters is the ability to type any letter used in any human written
language. That is a very huge improvement.

I agree. I'm on your side.

Why do Greek newspapers still use ISO 8859-7?
 
   For the same reason that a majority of English language web sites still
use windows-1252, I suppose.
 
  http://www.dolnet.ta-nea.gr/ is still producing alot of new material,

 [unknown adress]

Sorry about that. The url is:

http://ta-nea.dolnet.gr/

They don't use the 'www' prefix as an alias, and I keep forgetting their parent
company name, 'dolnet'. (I asked them to register 'tanea.gr' but they haven't.)

The Communist Party newspaper is:

http://www.rizospastis.gr/

They have a much nicer name, and they also have a 'text-only' link which does
not download images, just the text. You can get the entire daily newspaper
through http. (Only I don't know if they are using unicode, but I assume not,
it's probably ISO 8859 too. You see? I've become skeptical.)

Is there a version of Linux which runs as a Microsoft Window (not
cygwin)?
 
   ?? What you say doesn't make sense. (you can on the other hand run an
 operating system inside of a virtual computer box inside another operating
 system)
 
  I should have asked, Is there a version of Microsoft Windows which will
run a copy of Linux?

 It doesn't make any more sense in the other way either :)

 Both MS-Windows and Linux are operating systems, you can run one, or the
other, not one inside the other; they are built in order to run at the very
bottom in direct interaction with the hardware.

 They can be run inside an emulated hardware box, but not as normal
programs.

  Microsoft describes Windows as a virtual-machine operating system, and
DOS does, indeed, run, as an operating system in a window.

 I never read of MS-Windows being described as a virtual-machine OS... And
what runs in a window is in fact command.com, which is the (much less
powerful) equivalent of /bin/bash.

  I assume a VxD would map/share the PC hardware, controlled by Windows, to
the device drivers in the Linux kernel.

 No, the linux kernel needs direct access to the hardware. What you need is to
emulate an entire system, like vmware does.

I haven't been able to determine exactly what vmware does from their website,
too proprietary, too hush-hush, but I assume they write VxDs which map the
Linux kernel to the Windows VMM, and the real hardware. Someone once told me
their product ran on the NT platform, but not Windows 98, but it was quite
expensive. (All hearsay. No personal experience.)

The heart of the Windows operating system is called the VMM(=Virtual Machine
Manager). There are a lot of descriptions out there, like

http://win32assembly.online.fr/vxd-tut2.html

When the VMM is running an instance of DOS, you get direct access to the DOS
INT21 interface. Your program can even write directly into display memory, just
like the old days, when your program owned the console. The VMM manages to
control access to the real display, by remapping the real memory(=the virtual
memory address space) used by DOS, which still has that weird 20-bit memory
line.
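The "20-bit" limit comes from real-mode addressing: a segment:offset pair becomes the physical address segment × 16 + offset, so only about 1 MiB is addressable (a little more when the A20 address line is enabled). Direct writes to display memory in colour text mode traditionally go to segment 0xB800, two bytes (character, attribute) per cell. A minimal Python sketch of that arithmetic, for illustration only; it does not model what the VMM does when it remaps these addresses.

```python
def real_mode_physical(segment: int, offset: int, a20_enabled: bool = True) -> int:
    """Physical address of a real-mode segment:offset pair (segment * 16 + offset)."""
    address = (segment << 4) + offset
    if not a20_enabled:
        address &= 0xFFFFF            # only 20 address lines: wrap around at 1 MiB
    return address


def text_cell_address(row: int, column: int, columns: int = 80) -> int:
    """Address of a colour text-mode cell: two bytes (character, attribute) per cell."""
    return real_mode_physical(0xB800, (row * columns + column) * 2)
```

For example, the first cell of the second row of an 80-column screen is at 0xB80A0, and FFFF:0010 lands just past 1 MiB (0x100000) with A20 enabled but wraps to address 0 without it.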

Even 32-bit protected-mode programs designed to run under a DOS extension
called 'extenders' --I forget the jargon-- still run in a DOS Window.
Microsoft has managed to recreate the entire DOS OS, not just command.com.

I think you could host Linux, if you had the right VxDs.

The VMM remaps the i486 ports used by the hosted OS's device drivers, so when
the Linux kernel writes to port addresses, the VMM traps them in a 

Re: Unicode Keyboard Input Linux

2004-06-15 Thread Danilo Segan
Today at 20:13, Elvis Presley wrote:

 I haven't been able to determine exactly what vmware does from their website,
 too proprietary, too hush-hush, but I assume they write VxDs which map the
 Linux kernel to the Windows VMM, and the real hardware. Someone once told me
 their product ran on the NT platform, but not Windows 98, but it was quite
 expensive. (All hearsay. No personal experience.)

Look at http://bochs.sf.net/, or at least do a better search of the
web.  This is not the list for such a discussion (whether Linux can
or cannot be emulated on Windows). 

 It's fascinating technology, but you'd need inside information to make it work.
 Google isn't enough. Given enough time, I'm sure these VxDs will appear out of
 nowhere, as freeware or shareware or whatever it's called.

Or you could go with Free Software[1] such as bochs running on a Free
platform, such as GNU/Linux (though I believe it runs even on some
proprietary platforms).  It does the complete emulation of Intel
architecture, and thus works even across incompatible architectures
(it also makes it slower, but you can't have it all). No VxD is
needed (and they're not so fascinating, it's just about dumping some
code in kernel-level, and using VM86 features of Intel CPUs).

[1] http://gnu.org/philosophy/free-sw.html

 I should have said, unicode text file support. Wordpad still does unicode,
 but only in Word format, not as a text file, so I can still edit a document in
 unicode, but I have to copy and paste it into a unicode editor to create a text
 file.

As far as I remember, Notepad on NT (New Technology ;) systems has
been doing Unicode for text files as long as it exists (or at least
since NT4, that's the first I saw it on), if we consider so-and-so
UCS-2/UTF-16 support as Unicode support.

Cheers,
Danilo

--
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/



Re: Unicode Keyboard Input Linux

2004-06-15 Thread Vasilis Vasaitis
On Mon, Jun 14, 2004 at 11:39:44PM +0200, Pablo Saratxaga wrote:
 Kaixo! (Hello!)
 
 On Sat, Jun 12, 2004 at 09:56:52AM -0700, Elvis Presley wrote:

..[snip]..

  This is about as complicated as it gets in polytonic
  Greek, three dead keys, two pre-position, one
  post-position, 'w' representing omega, and an 'i' for
  iota subscript. 
 
 No, dead keys cannot be post-position; they must always be typed
 *before* the key they modify; that is in fact the very definition
 of a dead_key: they modify the behaviour of what is typed after them.
 If it is typed after it is not a dead key, but just a regular key.
 
 The ways already defined in el_GR.UTF-8 X11 Compose file for U1fa2
 (ᾢ, omega with psili, varia and ypogegrammeni) are:
 
 Multi_key bar greater grave Greek_omega   :   U1fa2
 Multi_key bar grave greater Greek_omega   :   U1fa2
 Multi_key greater bar grave Greek_omega   :   U1fa2
 Multi_key greater grave bar Greek_omega   :   U1fa2
 Multi_key grave bar greater Greek_omega   :   U1fa2
 Multi_key grave greater bar Greek_omega   :   U1fa2
 dead_iota dead_horn dead_grave Greek_omega  :   U1fa2
 dead_iota dead_grave dead_horn Greek_omega  :   U1fa2
 dead_horn dead_iota dead_grave Greek_omega  :   U1fa2
 dead_horn dead_grave dead_iota Greek_omega  :   U1fa2
 dead_grave dead_iota dead_horn Greek_omega  :   U1fa2
 dead_grave dead_horn dead_iota Greek_omega  :   U1fa2
 
 6 ways to type it with dead keys (corresponding to the six
 possible combinations of the three dead keys; but dead keys
 always before the letter)
 and 6 ways to type it with Multi_key (you press Multi_key, then
 the following keys in the given order).

  Note that even Multi_key combinations always have the letter last,
so that, when a letter arrives, it is certain that the sequence is
complete. See my comments below.
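Because the base letter always comes last, its arrival is what completes the sequence, and the six permutations in the table above all mean the same thing. One way to get that without listing every ordering is an engine that collects pending dead keys as an unordered set and resolves them when a non-dead key arrives. This is a minimal Python sketch; the table entries and the fallback for unknown combinations (emit the bare letter) are assumptions, and X11 itself lists each permutation explicitly rather than working this way.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical table: (set of dead keys, base letter) -> precomposed character.
DEAD_KEY_TABLE: Dict[Tuple[FrozenSet[str], str], str] = {
    (frozenset({"dead_horn", "dead_grave"}), "\u03c9"): "\u1f62",               # omega + psili + varia
    (frozenset({"dead_iota", "dead_horn", "dead_grave"}), "\u03c9"): "\u1fa2",  # ... + ypogegrammeni
    (frozenset({"dead_acute"}), "\u03b1"): "\u03ac",                            # alpha with tonos
}


def type_keys(keys: Iterable[str]) -> str:
    out = []
    pending = set()
    for key in keys:
        if key.startswith("dead_"):
            pending.add(key)          # a dead key emits nothing; it only changes state
            continue
        # A base letter always ends the sequence, so it can be resolved now.
        result = DEAD_KEY_TABLE.get((frozenset(pending), key))
        out.append(result if result is not None else key)
        pending.clear()
    # Dead keys still pending at the end are simply dropped.
    return "".join(out)


# All six orderings of the three dead keys give the same character.
DEAD = ["dead_iota", "dead_horn", "dead_grave"]
assert all(type_keys(list(p) + ["\u03c9"]) == "\u1fa2" for p in permutations(DEAD))
```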

 What you would like would be in fact:
 
 dead_horn dead_grave Greek_omega U0345 :   U1fa2
 dead_grave dead_horn Greek_omega U0345 :   U1fa2
 
 (that is, two dead keys, followed by two normal keys; a key sending
 Greek_omega and a key sending U0345 (COMBINING GREEK YPOGEGRAMMENI)
 
 I haven't tested it but if it works, it could indeed be added for
 all the cases and a layout with U0345 instead of dead_iota, if
 that is more intuitive to type.
 
  The keyboard map is therefore more than a map, it is a
  fsm, a stateful-map.
 
 That is not supported at all.
 If you need that, you need to develop an input method actually
 (like Japanese or Vietnamese use), that is, a program that interprets
 what you type and produces a different input.
 
 Yes there is something of that in console (but very limited) and
 in X11 (more powerful), but it is always linear.
 
 (also, I'm not sure if it is possible to have, for example,
 dead_horn dead_grave Greek_omega U0345 and
 dead_horn dead_grave Greek_omega sequences (that is, sequences
 that one is subset of another))

  You can't. The problem with that is that, if you wanted to type the
second sequence, the composition engine wouldn't know whether to stop
there and emit the symbol, or to wait for another symbol to complete
the sequence. So it waits. This could probably be fixed (partly): when
a symbol comes that causes the sequence to become invalid, the engine
could check the compose sequence just before the arrival of that
symbol, and emit the result. But this is not the current behaviour.
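The partial fix described here, falling back to the sequence as it stood just before the breaking symbol, can be sketched in Python. The two-entry compose table is hypothetical (it mirrors the example of one sequence being a prefix of another); an incomplete sequence is still discarded, matching the current behaviour described in the thread. Treating end-of-input as a final flush is a simplification: a live terminal cannot know that no more keys are coming.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical compose table in which one sequence is a prefix of another.
COMPOSE: Dict[Tuple[str, ...], str] = {
    ("dead_horn", "dead_grave", "Greek_omega"): "\u1f62",           # omega with psili and varia
    ("dead_horn", "dead_grave", "Greek_omega", "U0345"): "\u1fa2",  # ... and ypogegrammeni
}
# Every non-empty prefix of every sequence, so the engine knows when to keep waiting.
PREFIXES = {seq[:i] for seq in COMPOSE for i in range(1, len(seq) + 1)}


def compose(keys: Iterable[str]) -> str:
    out = []
    pending: Tuple[str, ...] = ()
    for key in keys:
        candidate = pending + (key,)
        if candidate in PREFIXES:
            pending = candidate          # still a possible sequence: wait
            continue
        # `key` makes the sequence invalid. Emit the result of the sequence as it
        # stood before `key`, if that was complete; an incomplete one is dropped.
        if pending in COMPOSE:
            out.append(COMPOSE[pending])
        pending = ()
        if (key,) in PREFIXES:
            pending = (key,)             # `key` may start a new sequence
        else:
            out.append(key)              # ordinary key, passed through
    if pending in COMPOSE:               # end of input acts as a final flush
        out.append(COMPOSE[pending])
    return "".join(out)
```

With this rule, `dead_horn dead_grave Greek_omega x` yields the omega with psili and varia followed by `x`, while adding `U0345` before anything else yields the version with ypogegrammeni.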

  If I change keyboards in
  midstream (using alt-a, for example), the fsm would
  output the components of an unaccepted character
  individually. How far will keymaps go?
 
 You can't.
 Pressing Alt-A (or any other key) means you broke the sequence.
 In such a case you simply lose what you typed in the incomplete sequence.

  Indeed.


-- 
Vasilis Vasaitis
A man is well or woe as he thinks himself so.






Re: Unicode Keyboard Input Linux

2004-06-15 Thread Vasilis Vasaitis
On Mon, Jun 14, 2004 at 08:43:38AM -0700, Elvis Presley wrote:

..[snip]..

 Comparing characters would be easy, they compare as
 unsigned integers, but sorting them would be a
 problem, because you'd want to group all the
 (accented) vowels together, according to language
 specific rules. In Greek, this wouldn't be a problem,
 because monotonic vowels and polytonic vowels, though
 occupying different code ranges, are not mixed in the
 same word: they are essentially different languages. A
 'tonos' is not an 'oxia' or a 'varia'.

  Actually, tonos and oxia are canonically equivalent in Unicode.
Nevertheless, sorting wouldn't be a problem indeed, because it is done
according to the base letter only; the accents are irrelevant.
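Both points can be checked with Python's standard `unicodedata` module: U+1F71 (alpha with oxia) normalizes to U+03AC (alpha with tonos), and a sort key that decomposes each word (NFD) and drops the nonspacing marks groups words by base letter. This accent-blind key is a simplification; a full collation such as the Unicode Collation Algorithm still uses the accents as a secondary tie-breaker.

```python
import unicodedata


def base_letter_key(word: str) -> str:
    """Sort key keeping only base letters: accents, breathings and iota subscript dropped."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn").casefold()


# Oxia and tonos are canonically equivalent: U+1F71 normalizes to U+03AC.
assert unicodedata.normalize("NFC", "\u1f71") == "\u03ac"

WORDS = [
    "\u1f64\u03c1\u03b1",                   # ὤρα (polytonic)
    "\u03c9\u03bc\u03ad\u03b3\u03b1",       # ωμέγα (monotonic)
    "\u03ac\u03bb\u03c6\u03b1",             # άλφα
    "\u03b1\u03b2",                         # αβ
]
ordered = sorted(WORDS, key=base_letter_key)
```

The result is αβ, άλφα, ωμέγα, ὤρα: monotonic and polytonic spellings interleave purely by their base letters.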

 Why do Greek newspapers still use ISO 8859-7?

  If it ain't broke, don't fix it.

 nightmare), but if you're only working in Greek, why
 not stick with what you know?

  Exactly. Nothing to do with size issues, and everything to do with
that. Plus, a major operating system doesn't really support UTF-8, and
instead concentrates on UTF-16, which is unusable in UNIX/GNU systems
for most practical purposes.
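One concrete reason UTF-16 is awkward on Unix: every ASCII character becomes two bytes, one of them 0x00, and C's NUL-terminated strings (file names, argv, most of libc) stop at the first 0x00. UTF-8 leaves ASCII bytes unchanged and never emits 0x00 for any character other than U+0000. A small Python illustration; `c_string_view` is a hypothetical helper that mimics what a C string function would see.

```python
def c_string_view(data: bytes) -> bytes:
    """What C string functions see: everything up to the first NUL byte."""
    return data.split(b"\x00", 1)[0]


command = "grep -i \u03c9\u03bc\u03ad\u03b3\u03b1 notes.txt\n"   # grep -i ωμέγα notes.txt
utf8 = command.encode("utf-8")
utf16 = command.encode("utf-16-le")

assert b"\x00" not in utf8           # UTF-8: no NUL bytes at all here
assert c_string_view(utf8) == utf8   # so a NUL-terminated API sees the whole string
assert c_string_view(utf16) == b"g"  # UTF-16LE: 'g' is followed by a 0x00 byte
```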

 My Microsoft browser(=IE) has problems with ISO Greek
 and Windows Greek, especially capital Alpha with
 tonos: it gets confused, and displays a box.

  Well actually, this particular letter is the only incompatibility
between the two character sets. In ISO-8859-7, this letter occupies
the code point that MS Word once had hardcoded as representing the
paragraph symbol. So for Windows-1253, Microsoft put the paragraph
symbol there and moved capital Alpha with tonos elsewhere.
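The byte values make the symptom easy to see. Capital Alpha with tonos (U+0386) is 0xB6 in ISO-8859-7 and 0xA2 in Windows-1253, and in Windows-1253 the byte 0xB6 is the pilcrow ¶. A browser that guesses the wrong one of the two code pages therefore shows a pilcrow or a missing glyph instead of Ά. Python's codecs reproduce this; the check below is not an exhaustive comparison of the two code pages.

```python
CAPITAL_ALPHA_TONOS = "\u0386"            # Ά

iso = CAPITAL_ALPHA_TONOS.encode("iso8859_7")
win = CAPITAL_ALPHA_TONOS.encode("cp1253")
assert iso == b"\xb6"
assert win == b"\xa2"

# Reading ISO-8859-7 bytes as if they were Windows-1253 turns Ά into a pilcrow.
assert iso.decode("cp1253") == "\u00b6"   # ¶

# Ordinary Greek letters are encoded identically in both code pages.
word = "\u03b1\u03bb\u03c6\u03b1"         # αλφα
assert word.encode("iso8859_7") == word.encode("cp1253")
```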



-- 
Vasilis Vasaitis
A man is well or woe as he thinks himself so.






Re: Unicode Keyboard Input Linux

2004-06-14 Thread Elvis Presley
Unicode Keyboard Input Linux

Hello,

This attached diagram represents my (naive)
understanding of terminal IO under Linux/Unix.

Elvis

PS

The real console is essentially a graphical device,
with screen(=display), keyboard and mouse, and
whatever else might be considered interesting...
Applications do not open the real console directly,
but in theory, they could --in DOS they could: the
interface could be made public; there would have to be
a device special file for the real console, and the
virtual consoles too, and the pseudo terminals... Have
I forgotten anything?

The state of the VC mux is controlled by Alt-Func
keys. The VC mux sends all console IO to the current
console (except the Alt-Func keys): you wouldn't need
to run a VC mux in a vc.

A virtual console(vc) holds the state of each
unicode terminal: 1) the display contents, 2) the
keyboard map, 3) the mouse position (yes, because the
vc mux does not do overlapping windows, so mouse
position would be independent of each vc), therefore
the keymap would be part of the virtual console, not
the tty driver (as I thought), so you could change
keyboards using any Alt-key combination (but Alt-Func
keys are already used by the vc mux).
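The VC multiplexer described above can be written down as a toy model. This is a Python sketch of the proposal, not of the Linux kernel: the loadkeys(1) manual notes that the keyboard translation table is common to all virtual consoles, whereas here each console carries its own keymap. The class names and the "Alt-Fn" key names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtualConsole:
    keymap: Dict[str, str]                       # per-console keymap, as proposed
    screen: List[str] = field(default_factory=list)

    def key(self, name: str) -> None:
        # Unmapped keys pass through unchanged.
        self.screen.append(self.keymap.get(name, name))


class VCMux:
    def __init__(self, consoles: List[VirtualConsole]) -> None:
        self.consoles = consoles
        self.current = 0

    def key(self, name: str) -> None:
        if name.startswith("Alt-F") and name[5:].isdigit():
            n = int(name[5:]) - 1
            if 0 <= n < len(self.consoles):
                self.current = n                 # Alt-Fn is consumed by the mux
            return
        self.consoles[self.current].key(name)    # everything else goes to the current VC
```

Typing `w`, Alt-F2, `w`, Alt-F1, `x` into a mux holding a Latin console and a Greek console leaves "w", "x" on the first and "ω" on the second.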

You could not really use keymaps in a traditional tty
configuration anyway, because the ascii terminal can't
display unicode characters, unless you ran a unicode
emulator on a PC, then connected to a shell through a
traditional Start/Stop interface. Each vc is a unicode
terminal emulator. They understand utf-8, they can
display utf-8 and they can generate utf-8.

You could put a keymap module in the stream which
translated ascii into utf-8 unicode, but why bother?

Of course, the tty module still must understand
Unicode. I don't think this is a big problem, because
the basic repertoire remains the same (=ascii) thanks
to the utf-8 encoding, but I'm sure there are a few hidden
traps.
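Most of those traps are places where tty code counts bytes instead of characters. Every byte of a multibyte UTF-8 sequence is 0x80 or above, so ASCII controls (newline, ^C, ^D) are still recognized byte by byte; but erasing "one character" with Backspace must remove a whole sequence. A Python sketch of character-aware erase (the Linux IUTF8 termios flag asks the line discipline for roughly this behaviour); `utf8_erase_last_char` is a hypothetical helper.

```python
def utf8_erase_last_char(buf: bytes) -> bytes:
    """Erase one UTF-8 character (not one byte) from the end of an input line."""
    if not buf:
        return buf
    i = len(buf) - 1
    # Continuation bytes have the form 10xxxxxx; step back to the lead byte.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return buf[:i]


line = "\u03ba\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1".encode("utf-8")  # καλημέρα

# Multibyte characters never contain bytes below 0x80.
assert all(b >= 0x80 for b in "\u03b1".encode("utf-8"))

# The trap: a byte-oriented erase leaves half a character behind.
try:
    line[:-1].decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False
assert not naive_ok
```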

The pseudo-terminal driver (i.e. module) has got to be
a pretty simple device: it just copies everything it
sees to the (traditional) tty driver(=line
discipline). In fact, the VC mux could contain each
VC, then you'd have a big, multiplexed
pseudo-terminal.

Anything (module or program) which opens the master
side of a pseudo-terminal is called a terminal
emulator, therefore a 'vc' and an 'xterm' perform the
same function, but in different spaces. I wonder how
much of the software can be reused. You need vc's in
the kernel in the absence of X, to support Linux
virtual terminals. If you run X, you don't need vc's.

Therefore, the remote telnet is a terminal emulator
too. It connects to a pseudo-terminal through its tcp
connection. Kermit, too, would be a terminal emulator,
even though both, as application programs, might be
running in xterms on the remote computers. Kermit has
the interesting property that it can download files
over its session connection (unlike ftp) which means
it would work through the firewall: if you can connect
to a telnet server using kermit, you can surely
download files, by changing the mode of the emulator,
a pretty nice feature. This can get pretty confusing.

Is the ftp client a terminal emulator? I think so,
because it's control connection is going to be made to
a pseudo terminal, but ftp doesn't use getty to check
the userid, does it? 

A virtual console(=VC) has a set of abstract qualities
which closely resemble the real console. If there were
two types of virtual console, graphical and
character-mode, then the real console could be shared
with X which opens an instance of a graphical virtual
console. There could be more than one instance of X
running, why not? Otherwise, you'd have to choose
between vterms and xterms. They both do the same thing
anyway.

Can I use xterm to connect to a remote X server? If I
could, the X server would have to validate the user's
identity, much like telnet does, using the /etc/passwd
file. In the local scenario, xterm starts up shells
automatically. In X I've got to use some
replacement-for-getty/login program.

Unicode C/C++ Application Programming

wchar_t is a 32-bit wide character (with glibc on Linux; on Windows it is 16 bits).

1) Does zero(=0x0) still represent end-of-string?

2) Does -1(=0xFFFFFFFF) still represent end-of-stream?

Comparing characters would be easy, they compare as
unsigned integers, but sorting them would be a
problem, because you'd want to group all the
(accented) vowels together, according to language
specific rules. In Greek, this wouldn't be a problem,
because monotonic vowels and polytonic vowels, though
occupying different code ranges, are not mixed in the
same word: they are essentially different languages. A
'tonos' is not an 'oxia' or a 'varia'.

The editor 'vi' would have to be modified to get/put
wchar_t, so I don't understand why you'd need a
separate unicode editor, or separate unicode
application, whatever it might be.

1) Does 'sort' work on utf-8 input?

2) Does 'grep' (Unix search) work on utf-8 input?

3) Is there a laundry list of Unix filters which need
to be changed to support

Re: Unicode Keyboard Input Linux

2004-06-14 Thread Pablo Saratxaga
Kaixo! (Hello!)

On Mon, Jun 14, 2004 at 08:43:38AM -0700, Elvis Presley wrote:

 Unicode Keyboard Input Linux

In fact Unicode (through utf-8 of course) mostly works on the console.
The drawbacks are currently tied to the nature of the console (in
the current text mode) and not to the encoding.

The main drawbacks are:
- display is limited to up to 512 different glyphs; it is enough for
  most alphabetic languages; but it is not enough for CJK languages,
  for example.
- display is limited to the 1 char = 1 glyph = 1 cell paradigm; that means
  languages like Thai, where a sequence of chars can have their glyphs
  stacked one on top of another in a single cell, will display horribly;
  languages needing glyph recomposition, like those using Indic alphabets,
  are simply impossible.
  Note that even some languages using the Latin alphabet are hurt, as they
  use some accented letters that have no precomposed form in Unicode and
  are encoded as a base letter plus a combining accent.
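The gap between counting characters and counting cells can be made concrete. There is no precomposed "q with acute", so normalization cannot help; the text stays two code points, which a 1 char = 1 cell console draws as two cells, while a renderer that stacks nonspacing marks needs one. This Python sketch uses the Unicode general category (Mn/Me) to decide which characters share the previous cell; it is a simplification that ignores double-width CJK and control characters.

```python
import unicodedata


def console_cells(text: str) -> int:
    """Cells needed if every nonspacing or enclosing mark shares the previous cell."""
    return sum(1 for ch in text if unicodedata.category(ch) not in ("Mn", "Me"))


q_acute = "q\u0301"                                      # q + COMBINING ACUTE ACCENT
assert unicodedata.normalize("NFC", q_acute) == q_acute  # no precomposed form exists
assert len(q_acute) == 2                                 # 1 char = 1 cell: two cells
assert console_cells(q_acute) == 1                       # stacked rendering: one cell

thai = "\u0e01\u0e35"                                    # KO KAI + SARA II (vowel above)
assert console_cells(thai) == 1
```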

The difference with xterm-like terminals here is very large; on X11
powerful font functions are available, and there are text terminals
that are able to nicely display scripts where 1 char is not necessarily
equal to 1 glyph and not necessarily equal to 1 cell; and you are not
limited in the number of glyphs, so you can write in Chinese without
problem.
Plus, the resolution is much better, and the range of available and
choosable fonts much, much, much wider.

There are also input problems in console.
Typing Unicode chars directly (with 1 keystroke = 1 char) is not a
problem at all (it is just tedious to write the keymaps, and if you want
to support both utf-8 and one or several old encodings, you have to
provide a different keyboard file for each encoding; that is very bad,
it would be much better to be able to have a single keyboard description
file, in Unicode, and just tell loadkeys the character set wanted
(the default being whatever glibc says is the default for the
current locale)).

For composing, however, it's bad; kernel composing tables use char,
and so it is not possible to properly use dead keys or the compose key
while in Unicode mode on the console (if you compose only chars also
in the iso-8859-1 character set, it more or less works, you just have to
type an extra keystroke, which is lost in outer space; but I doubt it
will work for other chars; I suspect the fact that it mostly works for
some chars is because their iso-8859-1 8-bit code is the same numeric
value as their Unicode code point).
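The suspicion is easy to illustrate: ISO-8859-1 byte values coincide with the Unicode code points U+0000 to U+00FF, so an 8-bit compose result that is reused directly as a code point happens to be right for Latin-1 and wrong for everything else. This Python sketch assumes that reuse as the failure mechanism (it is not taken from the kernel source); `byte_as_code_point` is a hypothetical helper.

```python
def byte_as_code_point(byte_value: int) -> str:
    """An 8-bit compose result reused directly as a Unicode code point."""
    return chr(byte_value)


# ISO-8859-1 byte values equal the Unicode code points U+0000..U+00FF.
assert all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))

e_acute = "\u00e9".encode("latin-1")[0]             # é is 0xE9 in Latin-1
assert byte_as_code_point(e_acute) == "\u00e9"      # right character

alpha_tonos = "\u03ac".encode("iso8859_7")[0]       # ά is 0xDC in ISO-8859-7
assert alpha_tonos == 0xDC
assert byte_as_code_point(alpha_tonos) == "\u00dc"  # Ü, not ά
```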

For languages needing the help of an input method, the console is mostly
unusable (it would be very nice to be able to have a single input method
backend usable on both the console and X11; but so far I know of none
that does that and that is usable and widely used).

Input works (almost) perfectly on X11 (the problem is due to the input
framework of XFree86 that doesn't allow to switch input methods; so
you cannot type some words in korean, then switch to chinese input...
but some programs have started to bypass it, and xorg seems to use
an input framework that solves that long standing annoyance).

And output works on X11.

So it could be a good thing to have the engine of a good xterm-like
terminal be used for the console, of course removing any unneeded
linking to X11 libs; and it would solve a lot of things.
Of course it would only work on screens with graphical capabilities,
not on real vt100, French minitels or hp48 screens; but nobody is
expecting to be able to write in Devanagari on such devices, I think.

 The real console is essentially a graphical device,

Not always.
Not on some local screens on old PCs (it has always been a graphical
device on local screens for all non-PC ports of Linux; but for the PC
itself, text mode on the local screen as a graphical device is
something quite new (you can look at when the framebuffer appeared
on the i386 branch of Linux to see the exact date)).
Also, you can redirect the console to another device than a local screen
(again, it was there first on non-PC branches, I think the SUN ports
were first; on PC you can redirect the console to a serial port).

In fact, whether the console is physically a graphical device or not,
for the operating system it is not; it is just text.
That doesn't mean there couldn't be a graphical device, nor that
such a device couldn't be used for the console, nor that such a graphical
console couldn't do nice graphical things with text, like it is done
on modern xterms on X11.
But that is not done through the normal I/O channels; programs see the
console just as a text device, and send text flows, with some control
codes to place cursor, change color, etc; but there is no way to
play with individual pixels at the console I/O API for example.

 with screen(=display), keyboard and mouse, and
 whatever else might be considered interesting...
 Applications do not open the real console directly,
 but in theory, they could --in DOS they could: the
 interface could be made public; there would have to be
 a device special file for the real

Re: Unicode Keyboard Input Linux

2004-06-14 Thread Pablo Saratxaga
Kaixo! (Hello!)

On Sat, Jun 12, 2004 at 09:56:52AM -0700, Elvis Presley wrote:

 I might like to vary the dead-key sequence from
 {accent, letter} to {letter, accent}. 

On console I'm afraid you can't easily do that,
as the compose sequences definition are quite poor.

On X11 I would use unicode combining accents instead
of dead keys for what you want; then on the Compose file
define sequences like:

letter U03xx : precomposed letter with accent

eg:

a U0301 : á

On the console you can define combining accent keys too;
but you will most likely be stuck with un-canonical
text files (e.g. encoded as aU0301 instead of aacute);
whether that is a problem for you or not I don't know.
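If un-canonical files are the concern, one option is to repair them after the fact with Unicode normalization (NFC) rather than in the keymap. A Python sketch with the standard `unicodedata` module; whether your editor or pipeline can run such a step is an assumption. It also covers the polytonic case from this thread: omega, dasia, perispomeni and a combining ypogegrammeni typed last.

```python
import unicodedata

# Text typed with combining-accent keys is valid but not precomposed.
typed = "a\u0301"                                        # a + COMBINING ACUTE ACCENT
assert unicodedata.normalize("NFC", typed) == "\u00e1"   # á (aacute)

# omega + dasia + perispomeni + ypogegrammeni composes to a single character.
omega_keys = "\u03c9\u0314\u0342\u0345"
assert unicodedata.normalize("NFC", omega_keys) == "\u1fa7"

# Canonical reordering moves the iota subscript (combining class 240) after the
# accents (class 230), so typing it earlier gives the same result.
assert unicodedata.normalize("NFC", "\u03c9\u0345\u0314\u0342") == "\u1fa7"
```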

 I'm particularly interested in polytonic Greek.
 
 Once I've selected the keyboard (alt-h), I could type
 a small_omega_dasia_perispomeni_ypogegrammeni, like
 this: 
 
 ascii '`', ascii '~', ascii 'w', ascii 'i' 

On the console you can't define such long compose sequences (unless
the kernel handling has been completely rewritten recently);
on X11 it is possible; and indeed a lot of polytonic greek
combinations have been defined, to use with dead keys or
with compose key (called Multi_key on X11).

Note also that while on X11 there are a lot of dead keys defined,
allowing you to type all the greek accents; on console the
number of available dead keys is much smaller; I'm not sure
it would be enough for all needed accents.

 This is about as complicated as it gets in polytonic
 Greek, three dead keys, two pre-position, one
 post-position, 'w' representing omega, and an 'i' for
 iota subscript. 

No, dead keys cannot be post-position; they must always be typed
*before* the key they modify; that is in fact the very definition
of a dead_key: they modify the behaviour of what is typed after them.
If it is typed after it is not a dead key, but just a regular key.

The ways already defined in el_GR.UTF-8 X11 Compose file for U1fa2
(ᾢ, omega with psili, varia and ypogegrammeni) are:

Multi_key bar greater grave Greek_omega   :   U1fa2
Multi_key bar grave greater Greek_omega   :   U1fa2
Multi_key greater bar grave Greek_omega   :   U1fa2
Multi_key greater grave bar Greek_omega   :   U1fa2
Multi_key grave bar greater Greek_omega   :   U1fa2
Multi_key grave greater bar Greek_omega   :   U1fa2
dead_iota dead_horn dead_grave Greek_omega  :   U1fa2
dead_iota dead_grave dead_horn Greek_omega  :   U1fa2
dead_horn dead_iota dead_grave Greek_omega  :   U1fa2
dead_horn dead_grave dead_iota Greek_omega  :   U1fa2
dead_grave dead_iota dead_horn Greek_omega  :   U1fa2
dead_grave dead_horn dead_iota Greek_omega  :   U1fa2

6 ways to type it with dead keys (corresponding to the six
possible combinations of the three dead keys; but dead keys
always before the letter)
and 6 ways to type it with Multi_key (you press Multi_key, then
the following keys in the given order).

What you would like would be in fact:

dead_horn dead_grave Greek_omega U0345 :   U1fa2
dead_grave dead_horn Greek_omega U0345 :   U1fa2

(that is, two dead keys, followed by two normal keys; a key sending
Greek_omega and a key sending U0345 (COMBINING GREEK YPOGEGRAMMENI)

I haven't tested it but if it works, it could indeed be added for
all the cases and a layout with U0345 instead of dead_iota, if
that is more intuitive to type.

 The keyboard map is therefore more than a map, it is a
 fsm, a stateful-map.

That is not supported at all.
If you need that, you need to develop an input method actually
(like Japanese or Vietnamese use), that is, a program that interprets
what you type and produces a different input.

Yes there is something of that in console (but very limited) and
in X11 (more powerful), but it is always linear.

(also, I'm not sure if it is possible to have, for example,
dead_horn dead_grave Greek_omega U0345 and
dead_horn dead_grave Greek_omega sequences (that is, sequences
that one is subset of another))

 If I change keyboards in
 midstream (using alt-a, for example), the fsm would
 output the components of an unaccepted character
 individually. How far will keymaps go?

You can't.
Pressing Alt-A (or any other key) means you broke the sequence.
In such a case you simply lose what you typed in the incomplete sequence.

 The alt key is used like the shift key. What ascii
 character does it send? 

None.
Just as Shift doesn't send any character either.

Alt, Shift, Ctrl, etc. are interpreted by the keyboard driver;
then the keyboard driver decides what to do;
on console those keys decide which one of the many values attached to
a given key is to be sent.
Those keys don't send any character by themselves; it is the
combination of them and one normal key that determines what is
sent.
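A toy model of this: modifier keys only update a shift state, and the shift state selects which of the values attached to a key is sent. The weights below (Shift 1, AltGr 2, Control 4, Alt 8) are the ones listed in keymaps(5), and the keycodes are standard Linux input keycodes; the keymap fragment itself is hypothetical, and this Python sketch is loosely modelled on the idea rather than on the kernel's code.

```python
from typing import Dict, Iterable, Tuple

# Modifier weights as listed in keymaps(5); the keymap column is their sum.
SHIFT, ALTGR, CONTROL, ALT = 1, 2, 4, 8

# Linux keycodes: left/right Shift, right Alt (AltGr), left Ctrl, left Alt.
MODIFIER_KEYS: Dict[int, int] = {42: SHIFT, 54: SHIFT, 100: ALTGR, 29: CONTROL, 56: ALT}

# Hypothetical keymap fragment: keycode 30 is the 'A' key.
KEYMAP: Dict[int, Dict[int, str]] = {
    30: {0: "a", SHIFT: "A", ALTGR: "\u03b1", ALTGR | SHIFT: "\u0391", CONTROL: "\x01"},
}


def translate(events: Iterable[Tuple[int, bool]]) -> str:
    """events: (keycode, pressed) pairs, as a keyboard driver would see them."""
    state = 0
    out = []
    for code, pressed in events:
        if code in MODIFIER_KEYS:
            # Modifiers only change the state; they produce no character themselves.
            # (Simplification: releasing either Shift clears SHIFT even if the other is held.)
            if pressed:
                state |= MODIFIER_KEYS[code]
            else:
                state &= ~MODIFIER_KEYS[code]
            continue
        if pressed:
            ch = KEYMAP.get(code, {}).get(state)
            if ch is not None:
                out.append(ch)
    return "".join(out)
```

Pressing and releasing Alt on its own produces nothing; Shift+A followed by A produces "Aa"; AltGr+A produces α.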


-- 
Ki a vos vye bn,
Pablo Saratxaga

http://chanae.walon.org/pablo/  PGP Key available, key ID: 0xD9B85466
[you can write me in Walloon, Spanish, French, English, Catalan or Esperanto]
[min povas skribi en valona, esperanta, angla aux latinidaj 

Unicode Keyboard Input Linux

2004-06-12 Thread Elvis Presley
To: [EMAIL PROTECTED] 
 
Re: Unicode Keyboard Input Linux 

Hello World, 

I'm interested in using the Linux console as a
multi-language keyboard, disregarding graphical X (and
xterm) for the moment. 

1) How do I switch the keyboard from language to
language?

I work in English, Greek, Latin (i.e. French, German,
Spanish, and Italian), and Russian. I am not
interested in right-to-left processing, nor
double-column glyphs, yet. 

Do I use an escape sequence? 

Do I use an alt-key combination? 

2) Can I set up my own keymaps for these languages?
Are they defined already?

I might like to vary the dead-key sequence from
{accent, letter} to {letter, accent}. 

3) What about console fonts? How do I get/create them
and install them? These fonts won't work on my
dot-matrix printer. That's ok, I can print from X. 

I do not have a Linux PC yet. My computer is Windows
98. I have an older(=2001) version of cygwin
installed, but I haven't used it a lot. Maybe I should.
I have been googling for this information. The
descriptions are plentiful, but they all seem to
ignore the obvious. 

Can you help me? 

Joe 

PS 

I read somewhere yesterday that you can switch between
Ukrainian and English keyboards using the RightAlt key,
on Debian, I believe. Since no other examples were
given, let me make some proposals: 

alt-a = ascii 
alt-d = German 
alt-f = French i.e. generic french, I don't care about
locale yet. 
alt-g = monotonic Greek 
alt-h = polytonic Greek (h=homer) 
alt-l = Latin = {French, German, Spanish, Italian}
saves typing 
alt-r = Cyrillic Russian
alt-s = Spanish
alt-u = Cyrillic Ukrainian

I realize the locale would specify the keyboard layout
with more precision --for French, locale = {Belgium,
Canada, France, ...}, for Spanish, locale = {Spain,
Mexico, Colombia,...} -- but I don't understand
locales yet. I need a locale primer too. 

The list of keyboards should be configurable, meaning
another configuration file, in the user's home
directory, I guess. Each keyboard would have a keymap,
but I didn't understand the man page for keymaps. Is
'keymaps' a console abstraction? Is there another
'keymaps' for X?

Then there is the problem of the 9-bit, fixed pitch
console fonts (we're ignoring X for the moment). Are
there simple tools I can use to roll my own? How do I
map unicode(=utf-8) characters to the glyph in the
font set?

I'm particularly interested in polytonic Greek.

Once I've selected the keyboard (alt-h), I could type
a small_omega_dasia_perispomeni_ypogegrammeni, like
this: 

ascii '`', ascii '~', ascii 'w', ascii 'i' 

psili          = smooth (breathing)
dasia          = rough  (breathing)
oxia           = acute  (accent)
varia          = grave  (accent)
perispomeni    = circumflex (accent)
ypogegrammeni  = subscript (iota)
prosgegrammeni = adscript (iota)

omega   = big-O, the final letter of the Greek
alphabet 
omicron = small-o, our letter 'o' 

small   = minuscule, lower-case
capital = majuscule, upper-case 

This is about as complicated as it gets in polytonic
Greek, three dead keys, two pre-position, one
post-position, 'w' representing omega, and an 'i' for
iota subscript. 

The keyboard map is therefore more than a map, it is a
fsm, a stateful-map. If I change keyboards in
midstream (using alt-a, for example), the fsm would
output the components of an unaccepted character
individually. How far will keymaps go?

The alt key is used like the shift key. What ascii
character does it send? 

(None, so how do I use it for the tty driver? It would
be ok for a real keyboard driver, where I have access
to keyboard events. I'm thinking the keyboard map
should be part of the tty(=ascii) driver, mapping
ascii to utf-8, and a teletypewriter only understands
ascii...) 

Escape Sequences 

Otherwise, I could use an esc sequence to change
keyboards, like { esc a, esc g, esc h etc.} 

Is there already a standard way of doing this? 

I know escape sequences have already been defined for
other control operations on the terminal, why not
changing keyboards? 

What is ISO 2022? 

The VT-100 had a whole bunch of escape sequences,
{blank screen, position cursor, etc.}; then there were
the ANSI (X3.64) escape sequences, which standardized
a common set of terminal-control sequences so that
programs did not depend on vendor-specific ones.
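ISO 2022 is the framework behind this kind of switching: character sets are designated and invoked by escape sequences carried in the byte stream itself, much like the keyboard-switching escapes proposed here (though ISO 2022 switches character sets, not keymaps). The Linux console uses sequences in the same family, e.g. ESC % G to select UTF-8, per console_codes(4). Python's `iso2022_jp` codec shows the mechanism; Japanese is used only because the codec is readily available.

```python
# ISO 2022 switches character sets in-band with escape sequences.
# ESC $ B designates JIS X 0208 (two bytes per character); ESC ( B returns to ASCII.
ESC = b"\x1b"

encoded = "\u3042".encode("iso2022_jp")       # HIRAGANA LETTER A
assert encoded.startswith(ESC + b"$B")
assert encoded.endswith(ESC + b"(B")
assert encoded.decode("iso2022_jp") == "\u3042"

# Plain ASCII needs no escape sequences at all.
assert "abc".encode("iso2022_jp") == b"abc"
```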

The Ctrl Key worked like the Shift key and was used to
output C0 control characters to the tty. Some of the
commands I remember are: 

ctrl-c = break (interrupt)
ctrl-z = end-of-file (in DOS; on Unix, ctrl-d is end-of-file and ctrl-z suspends)
ctrl-s = stop scrolling
ctrl-p = echo to printer (in DOS)?
ctrl-h = backspace
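On an ASCII terminal, Ctrl keeps only the low five bits of the letter's code, which is why every Ctrl-letter lands in the C0 range (0x00 to 0x1F). What a given code does (interrupt, end-of-file, flow control) is decided by the tty driver's settings, not by the code itself. A short Python sketch; `ctrl` is a hypothetical helper.

```python
def ctrl(letter: str) -> int:
    """Code sent for Ctrl+letter on an ASCII terminal: keep the low five bits."""
    return ord(letter.upper()) & 0x1F


assert ctrl("c") == 0x03   # ETX: interrupt on Unix
assert ctrl("d") == 0x04   # EOT: end-of-file on Unix
assert ctrl("h") == 0x08   # BS: backspace
assert ctrl("s") == 0x13   # DC3 (XOFF): stop output
assert ctrl("q") == 0x11   # DC1 (XON): resume output
assert ctrl("z") == 0x1A   # SUB: suspend on Unix, end-of-file in DOS
```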

What is C1-safe, and why is that a problem for utf-8?
Since the C1 range is not part of the ascii table, I
don't know why a tty would care. How does a
traditional tty driver handle C1 control characters? 
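The C1 problem is visible at the byte level: UTF-8 continuation bytes run from 0x80 to 0xBF, and 0x80 to 0x9F is exactly the 8-bit C1 control range. Software that acts on C1 bytes before decoding UTF-8 will misread ordinary letters. The Cyrillic capital Л (U+041B), for example, is D0 9B in UTF-8, and 0x9B read as a C1 control is CSI (Control Sequence Introducer). A Python check; `c1_range_bytes` is a hypothetical helper.

```python
from typing import List

C1_CONTROLS = range(0x80, 0xA0)    # the 8-bit C1 control range, 0x80..0x9F


def c1_range_bytes(text: str) -> List[int]:
    """Bytes of the UTF-8 encoding of text that fall in the C1 control range."""
    return [b for b in text.encode("utf-8") if b in C1_CONTROLS]


el = "\u041b"                                  # CYRILLIC CAPITAL LETTER EL (Л)
assert el.encode("utf-8") == b"\xd0\x9b"
assert c1_range_bytes(el) == [0x9B]            # 0x9B is CSI when read as a C1 control
assert c1_range_bytes("\u0391") == [0x91]      # GREEK CAPITAL LETTER ALPHA: CE 91
assert c1_range_bytes("plain ascii") == []
```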

Anyway, this is how I imagine it. 

Thanks again. 







Re: Unicode Keyboard Input Linux

2004-06-12 Thread Andries Brouwer
On Sat, Jun 12, 2004 at 09:56:52AM -0700, Elvis Presley wrote:

 Re: Unicode Keyboard Input Linux 
 
 I'm interested in using the Linux console as a
 multi-language keyboard
 
 1) How do I switch the keyboard from language to
 language?

The kernel keyboard driver does not have the concept of language.
It has a keymap. You load it with the loadkeys utility.
Keymaps are rather powerful. They have 256 possible shift states
and any key can be a locking shift, so after pressing one of your
chosen key combinations you can use a different part of the keymap.
You have a FSM here.
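A locking shift can be shown with a toy model: a lock key toggles one modifier bit that stays set between keystrokes, and that bit selects the keymap column, so one key press switches "language" until it is pressed again. This Python sketch is loosely modelled on keymaps(5), which provides lock keysyms such as AltGr_Lock; the two-layer layout and the choice of CapsLock (Linux keycode 58) as the toggle are hypothetical.

```python
from typing import Dict, Iterable

ALTGR = 2          # AltGr weight from keymaps(5)
LOCK_KEY = 58      # hypothetical: CapsLock repurposed as an AltGr lock

# Hypothetical two-layer layout: column 0 = Latin, column ALTGR = Greek.
LAYOUT: Dict[int, Dict[int, str]] = {
    30: {0: "a", ALTGR: "\u03b1"},   # A key
    48: {0: "b", ALTGR: "\u03b2"},   # B key
}


def type_keys(keycodes: Iterable[int]) -> str:
    locked = 0
    out = []
    for code in keycodes:
        if code == LOCK_KEY:
            locked ^= ALTGR          # toggle: the state persists until pressed again
            continue
        ch = LAYOUT.get(code, {}).get(locked)
        if ch is not None:
            out.append(ch)
    return "".join(out)
```

The key sequence A, lock, A, B, lock, B types "aαβb".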

 I read somewhere yesterday that you can switch between
 Ukrainian and English keyboards using the RightAlt key,

This is not a property of Linux, but a property of that particular keymap.
You can do things just as you like.

 let me make some proposals:

Proposals to yourself?


 2) Can I set up my own keymaps for these languages?
 Are they defined already?

Yes and yes.

 I might like to vary the dead-key sequence from
 {accent, letter} to {letter, accent}. 

You define pairs of arbitrary symbols, so you can use 'e just
as easily as e'. But so far these compose sequences used
pairs of 8-bit characters, not Unicode.
Some extremely recent kernels may work.

 3) What about console fonts? How do I get/create them
 and install them?

They exist already. But you can make your own, if you want.

 I do not have a Linux PC yet.

 I have been googling for this information. The
 descriptions are plentiful, but they all seem to
 ignore the obvious. 

The base documentation is that which comes with the kbd package.
Manual pages for loadkeys, setfont, keymaps.

These things are tricky and messy, and it is easiest just to
leave matters to the distribution. But if you like to fiddle
with them yourself, you can.

Andries
