--- Elvis Presley <[EMAIL PROTECTED]> wrote:
> Request for Comments
>
> Single-Byte Input Method, so called, I presume, because the IM only
> produces single-byte encodings: English and one other language, such as
> Greek or Russian.
Then why not "Multi-Byte Input Method"? :)
Intuitively, an input method converts keyboard events into multi-byte
characters, guided by a set of keymaps, one for each writing system. Since the
keymaps drive the conversion, I'll simply refer to this set of keymaps as the
input method.
A deterministic IM is one that can determine the exact 'character' from the
sequence of keysyms alone.
A non-deterministic IM may follow a sequence of keysyms down to more than one
possible character. At that point, the human operator is prompted with a list
of possibilities to choose from.
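To make the distinction concrete, here is a minimal sketch that models an IM, purely as an assumption, as a table from keysym sequences to lists of candidate characters. The names `lookup`, `is_deterministic` and `choose` are hypothetical; `choose` only stands in for prompting the operator. The keysyms 30 (alpha) and 23 (the iota key) come from the attached keymap; the "ambiguous" table is made up.

```python
from typing import Dict, List, Tuple

# Hypothetical model: keysym sequence -> every character it may stand for.
Table = Dict[Tuple[int, ...], List[str]]

def lookup(table: Table, seq: Tuple[int, ...]) -> List[str]:
    """Return every candidate character for a keysym sequence."""
    return table.get(seq, [])

def is_deterministic(table: Table) -> bool:
    """Deterministic: no sequence leads to more than one character."""
    return all(len(candidates) <= 1 for candidates in table.values())

def choose(candidates: List[str], pick: int) -> str:
    """Stand-in for prompting the operator; `pick` is the operator's choice."""
    if not candidates:
        raise KeyError("no character for this keysym sequence")
    return candidates[pick]

# alpha (30), and alpha followed by the iota key (23), as in the attached keymap
greek: Table = {(30,): ["\u03b1"], (30, 23): ["\u1fb3"]}
# a made-up sequence with two possible readings
ambiguous: Table = {(1, 2): ["X", "Y"]}

print(is_deterministic(greek), is_deterministic(ambiguous))  # True False
```

In a deterministic IM `choose` is never needed, because every list has at most one entry.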
Why the difference?
I guess there are cultural reasons: the 'character' chosen by the operator may
depend on the pragmatics of the language. Maybe there is more than one way to
write a given character. Chinese characters are pretty complex, and the
operator would choose the appropriate symbol based on the context. This doesn't
occur in the Western languages; well, not in Greek, anyway.
An input method is essentially a filter which translates sequences of keysyms
into characters.
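The filter view can be sketched as a generator that consumes keysyms and yields characters. This is only one possible policy, assumed here for illustration: keysyms are held back while they could still grow into a longer entry, then the longest complete entry is emitted (a longest-match rule). The names `im_filter` and `Keymap` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

Keymap = Dict[Tuple[int, ...], str]

def _longest(pending: Tuple[int, ...], keymap: Keymap) -> int:
    """Length of the longest complete entry at the front of `pending` (0 if none)."""
    return max((n for n in range(1, len(pending) + 1) if pending[:n] in keymap),
               default=0)

def im_filter(keysyms: Iterable[int], keymap: Keymap) -> Iterator[str]:
    """Translate a stream of keysyms into a stream of characters.

    Keysyms are held back while they are a proper prefix of a longer entry;
    once they cannot grow any more, the longest complete entry is emitted and
    the rest is rescanned. Keysyms that start no entry are dropped.
    """
    prefixes = {seq[:n] for seq in keymap for n in range(1, len(seq))}
    pending: Tuple[int, ...] = ()

    def drain(final: bool) -> Iterator[str]:
        nonlocal pending
        while pending and (final or pending not in prefixes):
            n = _longest(pending, keymap)
            if n:
                yield keymap[pending[:n]]
            pending = pending[max(n, 1):]   # consume the match, or drop one keysym

    for k in keysyms:
        pending += (k,)
        yield from drain(final=False)
    yield from drain(final=True)            # end of stream: flush what is left
```

For example, with `{(30,): "\u03b1", (30, 23): "\u1fb3", (35,): "\u03b7"}` the input `[30, 23, 35, 30]` yields alpha with ypogegrammeni, eta, and finally a plain alpha when the stream ends.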
An input method as a server?
This scenario would imply that the terminal emulator has explicit knowledge of
the existence of the keymapping function, because the emulator must ask the
server to do the translation. I don't quite get this scenario. I understand an
IM giving visual feedback to the operator, to show the stages of character
composition or to prompt for advice... but the filter could do that too.
Wouldn't an IM be part of the stream connecting the keyboard driver and the
terminal emulator?
In the attached grammar, an assignment statement associates a simple
arithmetic expression with a name. It is a list (enclosed in parentheses) whose
head is the literal 'setq'; the rest is an optional list of symbol definitions,
i.e. name/expression pairs.
For demonstration purposes, a name need only represent a very simple,
integer-valued, numeric expression involving addition.
Numbers can be given in decimal or hexadecimal.
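A minimal evaluator for the assignment statements might look like the sketch below. It assumes Lisp-style `;` comments (as in the attached file) and handles only `( setq ... )` statements, not keymap lists. The names `tokenize` and `evaluate` are hypothetical. Because '+' is a token of its own, unspaced sums such as `Alt+Shift+F1` split correctly.

```python
import re
from typing import Dict, List, Optional

# One token per match: '(', ')', '+', a hex or decimal <number>, or a <name>.
# Whitespace and ';' comments are matched but not kept.
TOKEN = re.compile(r"\s+|;[^\n]*|(?P<tok>[()+]|0x[0-9a-fA-F]+|[1-9][0-9]*|0"
                   r"|[A-Za-z][A-Za-z0-9_]*)")

def tokenize(src: str) -> List[str]:
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected {src[pos]!r} at offset {pos}")
        if m.group("tok"):
            tokens.append(m.group("tok"))
        pos = m.end()
    return tokens

def evaluate(src: str, env: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Evaluate a sequence of ( setq <name> <expr> ... ) statements."""
    env = {} if env is None else env
    toks = tokenize(src)
    i = 0

    def atom() -> int:                  # <name> | <number>
        nonlocal i
        tok = toks[i]
        i += 1
        if tok.startswith("0x"):
            return int(tok, 16)
        if tok.isdigit():
            return int(tok)
        if tok in env:
            return env[tok]
        raise NameError(f"undefined name {tok!r}")

    def expr() -> int:                  # <atom> { '+' <atom> }
        nonlocal i
        value = atom()
        while i < len(toks) and toks[i] == "+":
            i += 1
            value += atom()
        return value

    while i < len(toks):
        if toks[i:i + 2] != ["(", "setq"]:
            raise SyntaxError("expected '(setq'")
        i += 2
        while toks[i] != ")":           # zero or more <symbol def>
            name = toks[i]
            if not name[0].isalpha():
                raise SyntaxError(f"expected a name, got {name!r}")
            i += 1
            env[name] = expr()
        i += 1
    return env
```

Fed the first few statements of the attached keymap, this gives `Alpha` = 286 and `latin` = 827.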
An input method is a set of keymaps. Each keymap is named by the keysym used to
activate it; this keysym is the first element of the keymap definition.
Pressing this keysym causes the IM scanner to output any unfinished characters
using the old keymap, reset the state of the scanner, and switch to the new
keymap.
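The three steps on a keymap switch (flush with the old keymap, reset, change keymap) can be sketched as follows. `Scanner` is a hypothetical class, and how an unfinished sequence is flushed is an assumption: here each pending keysym is emitted on its own if the old keymap maps it. The activating keysyms 827 and 828 are `Alt+Shift+F1` and `Alt+Shift+F2` from the attached keymap.

```python
from typing import Dict, List, Tuple

Keymap = Dict[Tuple[int, ...], str]

class Scanner:
    """Hypothetical IM scanner: one keymap per writing system, each keymap
    keyed by the keysym that activates it."""

    def __init__(self, keymaps: Dict[int, Keymap], initial: int) -> None:
        self.keymaps = keymaps
        self.current = initial
        self.pending: Tuple[int, ...] = ()

    def flush(self) -> List[str]:
        """Output unfinished input with the current keymap, then reset.
        Assumption: each pending keysym is emitted on its own, if mapped."""
        keymap = self.keymaps[self.current]
        out = [keymap[(k,)] for k in self.pending if (k,) in keymap]
        self.pending = ()
        return out

    def feed(self, keysym: int) -> List[str]:
        if keysym in self.keymaps:      # a keymap-activating keysym
            out = self.flush()          # flush with the OLD keymap and reset
            self.current = keysym       # then change keymap
            return out
        self.pending += (keysym,)
        keymap = self.keymaps[self.current]
        n = len(self.pending)
        if any(len(seq) > n and seq[:n] == self.pending for seq in keymap):
            return []                   # could still grow into a longer sequence
        if self.pending in keymap:
            char = keymap[self.pending]
            self.pending = ()
            return [char]
        return self.flush()             # dead end: emit what we can

LATIN, GREEK = 827, 828   # Alt+Shift+F1 and Alt+Shift+F2 in the attached keymap
scanner = Scanner({LATIN: {(30,): "a"},
                   GREEK: {(30,): "\u03b1", (30, 23): "\u1fb3"}}, initial=GREEK)
print(scanner.feed(30))      # []  (alpha may still become alpha with ypogegrammeni)
print(scanner.feed(LATIN))   # ['α']  (flushed with the Greek keymap)
print(scanner.feed(30))      # ['a']
```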
A keysym represents a key combination, i.e. a "composite" key, "composed" by
holding down the shift (modifier) keys, such as Shift, Alt or Ctrl, together
with the key.
A keycode is the unshifted (i.e. normal) keysym. Every keycode is a keysym, but
a keysym is not necessarily a keycode.
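The attached keymap encodes this relationship arithmetically: a keysym is the keycode plus the weights of the held modifiers (Shift = 256, Alt = 512, Ctrl = 1024). The sketch below assumes, as that file does, that keycodes stay below 256; the function names are hypothetical.

```python
# Modifier weights as defined by the (setq ...) statements in the attached keymap.
SHIFT, ALT, CTRL = 256, 512, 1024

def keysym(keycode: int, shift: bool = False, alt: bool = False,
           ctrl: bool = False) -> int:
    """Combine an unshifted keycode with the held modifiers."""
    if not 0 <= keycode < SHIFT:
        raise ValueError("a keycode must be below the Shift weight (256)")
    return keycode + SHIFT * shift + ALT * alt + CTRL * ctrl

def keycode_of(sym: int) -> int:
    """Strip the modifiers, recovering the unshifted keycode."""
    return sym % SHIFT

def is_keycode(sym: int) -> bool:
    """Every keycode is a keysym, but only an unmodified keysym is a keycode."""
    return 0 <= sym < SHIFT

print(keysym(30, shift=True))              # 286, i.e. Alpha = Shift+alpha
print(keysym(59, shift=True, alt=True))    # 827, i.e. latin = Alt+Shift+F1
```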
There are two kinds of keymap statements: the 'compose' statement, which
defines the 'action' (i.e. value) of a sequence of keysyms, and the 'keycode'
statement, which defines all the (shift) values (i.e. keysyms) of a keycode.
A compose sequence is terminated by a keycode statement.
Confusing?
Yes. Intuition is better. But I need the grammar to write the parser.
Notice there are no keywords for 'keycode' and 'compose' statements. The
intended meaning can be extracted from the context: the last (or only) element
of a compose sequence is, naturally, a 'keycode' definition.
A new left parenthesis (LP, i.e. '(') indicates the next keysym in the sequence.
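One way to recover the meaning from context, sketched here under the assumption that the nested lists have already been read into Python lists: a form whose tail is all atoms is a keycode statement, and a form whose tail is all sub-lists is a compose node. `classify` and `sequences` are hypothetical helpers; `sequences` walks down the new '(' levels and reports each complete keysym sequence with its actions.

```python
from typing import Iterator, List, Tuple, Union

Atom = Union[int, str]

def classify(form: list) -> str:
    """Tell a keycode stmt from a compose node without keywords.

    ( head action+ )   -> 'keycode'  (the end of a compose sequence)
    ( head (..) (..) ) -> 'compose'  (each new '(' is the next keysym)
    """
    head, *rest = form
    if rest and not any(isinstance(x, list) for x in rest):
        return "keycode"
    if all(isinstance(x, list) for x in rest):
        return "compose"
    raise SyntaxError(f"actions and sub-lists mixed under {head!r}")

def sequences(form: list, prefix: Tuple[Atom, ...] = ()
              ) -> Iterator[Tuple[Tuple[Atom, ...], List[Atom]]]:
    """Yield (keysym sequence, actions) for every keycode stmt under `form`."""
    path = prefix + (form[0],)
    if classify(form) == "keycode":
        yield path, form[1:]
    else:
        for sub in form[1:]:
            yield from sequences(sub, path)
```

Applied to a fragment of the `oxia` list, this yields `("oxia", "alpha")` with its two actions and `("oxia", "ypogegrammeni", "omega")` with one.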
The nested list structure would seem more intuitive for building keysyms from
keycodes and keyboard events (i.e. make and break) than for compose sequences.
Well, it is; that is its real, natural meaning: a simultaneous keycode event,
in which the keycode events are nested in time.
(Oh yeah, a keycode event is also called a "scancode". A scancode combines the
keycode with the event {make, break}. Keysyms are devoid of events: there are
no {make, break} keysyms. Keysyms represent "key combinations", like Shift+A
or Ctrl+Alt+Delete.
So why do we need scancodes?
To combine keycodes, making keysyms.
And why do we need keysyms?
To create characters.)
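Combining scancodes into keysyms might be sketched like this. The event representation, a `(keycode, is_make)` tuple, is an assumption for illustration and not a real driver format; the modifier keycodes (shift 42, alt 56, ctrl 29) and their weights come from the attached keymap, which lists only one keycode per modifier.

```python
from typing import Iterable, Iterator, Set, Tuple

# Modifier keycodes and their weights, from the attached keymap:
# shift = 42 -> Shift = 256, alt = 56 -> Alt = 512, ctrl = 29 -> Ctrl = 1024.
MODIFIER_WEIGHT = {42: 256, 56: 512, 29: 1024}

def keysyms(events: Iterable[Tuple[int, bool]]) -> Iterator[int]:
    """Combine (keycode, is_make) scancode events into keysyms.

    A modifier's make adds its weight while the key is held and its break
    removes it; the make of any other key yields keycode + held weights.
    Breaks of ordinary keys yield nothing: keysyms carry no make/break.
    """
    held: Set[int] = set()
    for keycode, make in events:
        if keycode in MODIFIER_WEIGHT:
            if make:
                held.add(keycode)
            else:
                held.discard(keycode)
        elif make:
            yield keycode + sum(MODIFIER_WEIGHT[k] for k in held)

# Shift down, A (keycode 30) down/up, Shift up, then A down/up again.
events = [(42, True), (30, True), (30, False), (42, False), (30, True), (30, False)]
print(list(keysyms(events)))   # [286, 30]
```

The nesting in time is visible in the event list: the A make/break pair sits inside the Shift make/break pair, and that nesting is what turns keycode 30 into keysym 286.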
But the nested list works for sequences of keysyms too. The resulting list
looks just like the IM state table, only you don't have to keep track of the
states: they are represented "lexically" in the list. The nested list is a
logical representation of the compose sequence.
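The correspondence with the state table can be sketched by flattening a nested keymap into rows of the form (state, keysym, output, next state), as in the comments of the attached file. Assumptions: the i-th action of a keycode statement belongs to the keysym plus 256*i (which matches the 30, 286, 542, ... rows in the attached table), each compose node gets a fresh state number, and -1 marks "no output yet". `state_table` is a hypothetical name.

```python
from itertools import count
from typing import List, Tuple

Row = Tuple[int, int, int, int]   # (state, keysym, output, next state)

def state_table(keymap: list, first_state: int = 960) -> List[Row]:
    """Flatten a nested-list keymap into state-table rows.

    keymap[0] is the activating keysym and is skipped. A form whose tail is
    all atoms is a keycode stmt: its i-th action is the output for the keysym
    plus modifier weight 256*i, and the scanner returns to state 0. Any other
    form is a compose node and gets a fresh state; -1 means "no output yet".
    """
    rows: List[Row] = []
    fresh = count(first_state)

    def walk(forms: list, state: int) -> None:
        for head, *rest in forms:
            if rest and not any(isinstance(x, list) for x in rest):
                for i, action in enumerate(rest):
                    rows.append((state, head + 256 * i, action, 0))
            else:
                nxt = next(fresh)
                rows.append((state, head, -1, nxt))
                walk(rest, nxt)

    walk(keymap[1:], 0)
    return rows

# ypogegrammeni (299) as a dead key, in a keymap activated by 828 (Alt+Shift+F2)
greek = [828, [299, [30, 0x1fb3, 0x1fbc], [35, 0x1fc3, 0x1fcc]]]
for state, sym, out, nxt in state_table(greek):
    print(f"{state}, {sym}, {hex(out) if out >= 0 else out}, {nxt}")
```

This prints `0, 299, -1, 960` followed by rows such as `960, 286, 0x1fbc, 0`, the same shape as the hand-written table. Unlike that table, the sketch emits no tentative character (such as '|') on the dead key, and it does not merge two forms that start with the same keysym at the same level; such duplicates would produce conflicting rows.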
To be frank, I have no idea how to parse the traditional KEYMAPS statement
compose <keysym> <keysym> <keysym> to <character>
It looks intuitive, but each compose statement destroys the logical state of
the scanner. You'd have to do some magic, and I'm too old for tricks. "You
can't teach an old dog new tricks."
Elvis
Multi-Byte Input Method
GRAMMAR
<input method> ::= {<assignment stmt> | <keymap>}*
<assignment stmt> ::= ( setq <symbol def>* )
<symbol def> ::= <name> <expr>
<name> ::= <letter> {<letter> | <digit> | <underscore>}*
<expr> ::= <name> | <number> | <expr> + <expr>
<number> ::= <decimal number> | <hex number> | <zero>
<hex number> ::= '0' 'x' <hex digit>+
<decimal number> ::= {'1'..'9'} <decimal digit>*
<zero> ::= '0'
<keymap> ::= ( <keysym> <compose>* )
<keysym> ::= <expr>
<compose> ::= ( <keysym> <compose>* )
| <keycode stmt>
<keycode stmt> ::= ( <keycode> {<action>}+ )
<keycode> ::= <keysym>
<action> ::= <character> | <string> | <procedure>
EXAMPLE
<attached>
;;; The Rosetta Stone
;;; Console Keymaps
;;; SIM Simulator Polytonic Greek Keymap
;;; SIM Simulator State Table
;;;
;;; ordinary symbols representing arbitrary constants
(setq Normal 0)
(setq Shift 256) ;; Shift = 256
(setq Alt 512)
(setq Ctrl 1024)
;;; ordinary symbols representing numeric keysyms
(setq enter 28 ctrl 29 shift 42 alt 56)
(setq F1 59 F2 60 F3 61 F4 62 F5 63 F6 64 F7 65 F8 66)
;; keysyms which represent keymaps
(setq latin Alt+Shift+F1)
(setq greek Alt+Shift+F2)
(setq cyrillic Alt+Shift+F3)
(setq modern_greek Alt+Shift+F4)
(setq thai Alt+Shift+F5)
;; more keysyms
(setq alpha 30) ;; alpha = 30
(setq Alpha Shift+alpha) ;; Alpha = 286
(setq eta 35) ;; eta = 35
(setq omega 47)
(setq oxia 53)
(setq varia 43)
(setq perispomeni Shift+41) ;; perispomeni = 297
(setq ennea 10)
(setq mhden 11)
(setq dasia Shift+ennea) ;; dasia = 266
(setq psili Shift+mhden) ;; psili = 267
(setq ypogegrammeni Shift+varia) ;; ypogegrammeni = 299
;;; ordinary symbols representing Unicode characters
(setq small_alpha 0x3b1
capital_alpha 0x391)
(setq small_alpha_ypogegrammeni 0x1fb3
capital_alpha_prosgegrammeni 0x1fbc)
(setq small_eta 0x3b7
capital_eta 0x397)
(setq small_eta_ypogegrammeni 0x1fc3
capital_eta_prosgegrammeni 0x1fcc)
(setq small_omega 0x3c9
capital_omega 0x3a9)
(setq small_omega_ypogegrammeni 0x1ff3
capital_omega_prosgegrammeni 0x1ffc)
(setq small_omega_varia 0x1f7c
capital_omega_varia 0x1ffa)
(setq small_omega_varia_ypogegrammeni 0x1ff2) ;;; no cap
(setq small_omega_oxia 0x1f7d
capital_omega_oxia 0x1ffb)
(setq small_omega_oxia_ypogegrammeni 0x1ff4) ;;; no cap
(setq small_omega_perispomeni 0x1ff6) ;;; no cap
(setq small_omega_perispomeni_ypogegrammeni 0x1ff7) ;; no cap
(setq small_omega_psili 0x1f60
capital_omega_psili 0x1f68)
(setq small_omega_psili_ypogegrammeni 0x1fa0
capital_omega_psili_prosgegrammeni 0x1fa8)
(setq small_omega_psili_varia 0x1f62
capital_omega_psili_varia 0x1f6a)
(setq small_omega_psili_varia_ypogegrammeni 0x1fa2
capital_omega_psili_varia_prosgegrammeni 0x1faa)
(setq small_omega_psili_oxia 0x1f64
capital_omega_psili_oxia 0x1f6c)
(setq small_omega_psili_oxia_ypogegrammeni 0x1fa4
capital_omega_psili_oxia_prosgegrammeni 0x1fac)
(setq small_omega_psili_perispomeni 0x1f66
capital_omega_psili_perispomeni 0x1f6e)
(setq small_omega_psili_perispomeni_ypogegrammeni 0x1fa6
capital_omega_psili_perispomeni_prosgegrammeni 0x1fae)
(latin ... )
(greek
;; keycode 30 = 0x3b1 0x391 ...
(alpha small_alpha capital_alpha
escape_alpha escape_Alpha
ctrl_alpha ctrl_Alpha)
;; 0, 30, 0x3b1, 0
;; 0, 286, 0x391, 0
;; 0, 542, -1, 0
;; 0, 798, -1, 0
;; 0, 1054, -1, 0
;; 0, 1310, -1, 0
;; keycode 35 = 0x3b7 0x397 ...
(eta small_eta capital_eta
escape_eta escape_Eta
ctrl_eta ctrl_Eta)
;; 0, 35, 0x3b7, 0
;; 0, 291, 0x397, 0
;; keycode 47 = 0x3c9 0x3a9 ...
(omega small_omega
capital_omega
escape_omega
escape_Omega
ctrl_omega
ctrl_Omega)
;; 0, 47, 0x3c9, 0
;; 0, 303, 0x3a9, 0
;; compose 30 23 to 0x1fb3
;; compose 35 23 to 0x1fc3
;; omega as a dead key? Why not!
;; compose 47 23 to 0x1ff3
(alpha (iota small_alpha_ypogegrammeni))
(eta (iota small_eta_ypogegrammeni))
(omega (iota small_omega_ypogegrammeni))
;; 0, 30, 0x03b1, 960
;; 960, 30, 0x1fb3, 0 ᾳ
;; 0, 35, 0x03b7, 961
;; 961, 35, 0x1fc3, 0 ῃ
;; 0, 47, 0x03c9, 962
;; 962, 47, 0x1ff3, 0 ῳ
;; compose 286 23 to 0x1fbc
;; compose 291 23 to 0x1fcc
;; compose 303 23 to 0x1ffc
(Shift+alpha(iota capital_alpha_prosgegrammeni))
(Shift+eta (iota capital_eta_prosgegrammeni))
(Shift+omega(iota capital_omega_prosgegrammeni))
;; 963, 256+30, 0x1fbc, 0
;; 964, 256+35, 0x1fcc, 0
;; 965, 256+47, 0x1ffc, 0
;; canonical, i.e. dead key sequence
;; compose 299 30 to 0x1fb3
;; compose 299 286 to 0x1fbc
;; compose 299 35 to 0x1fc3
;; compose 299 291 to 0x1fcc
;; compose 299 47 to 0x1ff3
;; compose 299 303 to 0x1ffc
(ypogegrammeni
(alpha small_alpha_ypogegrammeni
capital_alpha_prosgegrammeni)
(eta small_eta_ypogegrammeni
capital_eta_prosgegrammeni)
(omega small_omega_ypogegrammeni
capital_omega_prosgegrammeni))
;; 0, 299, '|', 966
;; 966, 30, 0x1fb3, 0
;; 966, 256+30, 0x1fbc, 0
;; 966, 35, 0x1fc3, 0
;; 966, 256+35, 0x1fcc, 0
;; 966, 47, 0x1ff3, 0
;; 966, 256+47, 0x1ffc, 0
;; compose 53 299 30 to -1
;; compose 53 299 35 to -1
;; compose 53 299 47 to 0x1ff4
;; compose 53 30 to -1
;; compose 53 286 to -1
;; compose 53 35 to -1
;; compose 53 291 to -1
;; compose 53 47 to 0x1f7d
;; compose 53 303 to 0x1ffb
(oxia
(ypogegrammeni
(alpha small_alpha_oxia_ypogegrammeni
capital_alpha_oxia_prosgegrammeni
alt_alpha_oxia_ypogegrammeni
alt_shift_alpha_oxia_ypogegrammeni
ctrl_alpha_oxia_ypogegrammeni
ctrl_shift_alpha_oxia_ypogegrammeni
ctrl_alt_alpha_oxia_ypogegrammeni
ctrl_alt_shift_alpha_oxia_ypogegrammeni)
(eta small_eta_oxia_ypogegrammeni)
(omega small_omega_oxia_ypogegrammeni))
(alpha small_alpha_oxia
capital_alpha_oxia)
(eta small_eta_oxia
capital_eta_oxia)
(omega small_omega_oxia
capital_omega_oxia))
;; 0, 53, '/', 967
;; 967, 299, -1, 968
;; 968, 30, 0x , 0
;; 968, 286, 0x , 0
;; 968, 542, 0x , 0
;; 968, 798, 0x , 0
;; 968, 1054, 0x , 0
;; 968, 1310, 0x , 0
;; 968, 1566, 0x , 0
;; 968, 1822, 0x , 0
;; 968, 35, 0x , 0
;; 968, 47, 0x1ff4, 0
;; 967, 30, 0x , 0
;; 967, 286, 0x , 0
;; 967, 35, 0x , 0
;; 967, 291, 0x , 0
;; 967, 47, 0x1f7d, 0
;; 967, 303, 0x1ffb, 0
(psili
(alpha small_alpha_psili capital_alpha_psili)
(eta small_eta_psili capital_eta_psili)
(omega small_omega_psili capital_omega_psili)
(ypogegrammeni
(alpha small_alpha_psili_ypogegrammeni)
(eta small_eta_psili_ypogegrammeni)
(omega small_omega_psili_ypogegrammeni))
(oxia
(alpha small_alpha_psili_oxia)
(eta small_eta_psili_oxia)
(omega small_omega_psili_oxia)
(ypogegrammeni
(alpha small_alpha_psili_oxia_ypogegrammeni)
(eta small_eta_psili_oxia_ypogegrammeni)
(omega small_omega_psili_oxia_ypogegrammeni)))
(varia
(alpha small_alpha_psili_varia)
(eta small_eta_psili_varia)
(omega small_omega_psili_varia)
(ypogegrammeni
(alpha small_alpha_psili_varia_ypogegrammeni)
(eta small_eta_psili_varia_ypogegrammeni)
(omega small_omega_psili_varia_ypogegrammeni)))
(perispomeni
(alpha small_alpha_psili_perispomeni)
(eta small_eta_psili_perispomeni)
(omega small_omega_psili_perispomeni)
(ypogegrammeni
(alpha small_alpha_psili_perispomeni_ypogegrammeni)
(eta small_eta_psili_perispomeni_ypogegrammeni)
(omega small_omega_psili_perispomeni_ypogegrammeni))))
;; 0, 267, ')', 969
;; 969, 30, 'ἀ', 0
;; 969, 286, 'Ἀ', 0
;; 969, 35, 'ἠ', 0
;; 969, 291, 'Ἠ', 0
;; 969, 47, 'ὠ', 0
;; 969, 303, 'Ὠ', 0
;; 969, 299, -1, 970
;; 970, 30, 'ᾀ', 0
;; 970, 35, 'ᾐ', 0
;; 970, 47, 'ᾠ', 0
;; 969, 53, -1, 971
;; 971, 30, 'ἄ', 0
;; 971, 35, 'ἤ', 0
;; 971, 47, 'ὤ', 0
;; 971, 299, -1, 972
;; 972, 30, 'ᾄ', 0
;; 972, 35, 'ᾔ', 0
;; 972, 47, 'ᾤ', 0
;; 969, 43, -1, 973
;; 973, 30, 'ἂ', 0
;; 973, 35, 'ἢ', 0
;; 973, 47, 'ὢ', 0
;; 973, 299, -1, 974
;; 974, 30, 'ᾂ', 0
;; 974, 35, 'ᾒ', 0
;; 974, 47, 'ᾢ', 0
;; 969, 297, -1, 975
;; 975, 30, 'ἆ', 0
;; 975, 35, 'ἦ', 0
;; 975, 47, 'ὦ', 0
;; 975, 299, -1, 976
;; 976, 30, 'ᾆ', 0
;; 976, 35, 'ᾖ', 0
;; 976, 47, 'ᾦ', 0
) ;; End of greek
(cyrillic ... )
(modern_greek ...)
(thai ...)
;; End of Keymap