Re: Experiments with classical Greek keyboard input
Since I'm still cc'ed here...

On Tue, May 09, 2006 at 09:04:52PM +0300, Joe Schaffner wrote:
..[snip]..
> is on the semi-colon key and is on the same
> key, shifted, the colon key.
>
> is on the single-quote key and is on the
> double-quote key.
>
> That's a pretty good layout. I like it.
>
> Why not name these keysyms and ?

Because the list of keysyms is fixed, as defined in
/usr/include/X11/keysymdef.h. At the time, using arbitrary existing
keysyms made more sense than petitioning for "correctly-named" new
ones. It works, after all. But OK, now maybe it's time to ask for a few
new names if people are annoyed by the current state of affairs.

> Anyway, I activate the gr keymap like this:
>
> setxkbmap "us,gr(polytonic)" -option "grp:alt_shift_toggle"
>
> The command syntax is troublesome. There seem to be other ways of
> doing it. Maybe I'm wrong, but it seems to work.

The canonical invocation would be:

  setxkbmap -layout us,gr -variant ,polytonic \
            -option grp:alt_shift_toggle

> Yes, the keymap is there, I can see it on the task bar. To switch to
> another group, I can use the alt_shift combination (another meta
> symbol? Where are all these symbols defined?).

In /etc/X11/xkb, rules/xorg transforms grp:alt_shift_toggle to
group(alt_shift_toggle). So you can look at the relevant section in
symbols/group to see how this implements the layout switching. It all
boils down to the generation of the ISO_Next_Group and ISO_Prev_Group
keysyms.

> Yes, I can enter greek characters. The seems to work, but
> I am not sure if it is outputting a tonos or an acute. It's probably a
> tonos.
>
> None of the other dead keys seem to work.
>
> Any ideas?

For the others to work, you need to have at least
LC_CTYPE=el_GR.UTF-8. On my system, with LANG=el_GR.UTF-8, everything
is working as it should. Keep in mind that for GTK+ applications you
also need GTK_IM_MODULE=xim defined (or else you have to right-click on
each textbox, and select Input Methods -> X Input Method).
--
Vasilis Vasaitis
"A man is well or woe as he thinks himself so."

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
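A note on the canonical setxkbmap invocation in the message above: the -layout and -variant lists are matched up by position, which is why the variant list starts with an empty entry for "us". Here is a minimal Python sketch of that pairing. The helper name setxkbmap_args is made up for illustration, and the function only builds the argument list; it does not run setxkbmap.

```python
def setxkbmap_args(layouts, option=None):
    """Build a setxkbmap argument list.

    layouts is a list of (layout, variant) pairs; use '' for "no variant".
    The two comma-separated lists stay aligned position by position.
    """
    args = [
        "setxkbmap",
        "-layout", ",".join(name for name, _ in layouts),
        "-variant", ",".join(variant for _, variant in layouts),
    ]
    if option:
        args += ["-option", option]
    return args


# The invocation from the message: plain "us" plus polytonic "gr".
ARGS = setxkbmap_args([("us", ""), ("gr", "polytonic")], "grp:alt_shift_toggle")
```

Passing ARGS to subprocess.run would execute the command on a system that has setxkbmap installed; that step is left out here on purpose.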
Re: Text printing with Openoffice
On Mon, Nov 14, 2005 at 09:51:27AM +0100, Koblinger Egmont wrote:
> On Mon, Nov 14, 2005 at 12:52:23AM +0800, Abel Cheung wrote:
>
> > On 11/13/05, Koblinger Egmont <[EMAIL PROTECTED]> wrote:
> > > fancy, just the good old fixed-width fonts with 80 columns, but the
> > > accented (NFC) letters are okay.
> >
> > ... While all multibyte characters become junk. (since 2001)
>
> What do you mean by multibyte characters? Of course all the accented letters
> are multibyte characters in UTF-8. I created several simple text files in
> UTF-8 encoding, containing standard accented letters that are also part of
> latin-1 or latin-2 (e.g. e with acute grave, e with acute accent, o with
> double acute) as well as euro symbol, low-99 and high-99 quote marks etc.,
> sent them to the printer with "lpr filename" (with LANG=hu_HU.UTF-8 and no
> other LC_* variables) and they all got printed correctly.

I tried printing a simple UTF-8 text file with Greek text, and the
result was quite inadequate. It managed to get the simple letters from
the Symbol font (I assume), but the accented letters did not get
printed out at all. The result is both ugly and unreadable for the most
part. The OOo method, on the other hand, handled it fine.

> What I didn't test is double-width (cjk) characters, combining symbols,
> non-printable characters, invalid UTF-8 sequences and other similar more
> tricky files. It's easily possible that OOo is better in this respect.

--
Vasilis Vasaitis
"A man is well or woe as he thinks himself so."
Re: Unicode Keyboard Input Linux
On Mon, Jun 14, 2004 at 08:43:38AM -0700, Elvis Presley wrote:
..[snip]..
> Comparing characters would be easy, they compare as
> unsigned integers, but sorting them would be a
> problem, because you'd want to group all the
> (accented) vowels together, according to language
> specific rules. In Greek, this wouldn't be a problem,
> because monotonic vowels and polytonic vowels, though
> occupying different code ranges, are not mixed in the
> same word: they are essentially different languages. A
> 'tonos' is not a 'oxia' or a 'varia'.

Actually, "tonos" and "oxia" are treated as equivalents in Unicode.
Nevertheless, sorting indeed wouldn't be a problem, because it is done
according to the base letter only; the diacritics are irrelevant.

> Why do Greek newspapers still use ISO 8859-7?

"If it ain't broke, don't fix it."

> nightmare), but if you're only working in Greek, why
> not stick with what you know?

Exactly. Nothing to do with size issues, and everything to do with
that. Plus, a major operating system doesn't really support UTF-8, and
instead concentrates on UTF-16, which is unusable in UNIX/GNU systems
for most practical purposes.

> My Microsoft browser(=IE) has problems with ISO Greek
> and Windows Greek, especially capital Alpha with
> tonos: it gets confused, and displays a box.

Well actually, this particular letter is the only incompatibility
between the two character sets. In ISO-8859-7, this letter occupies the
code point that MS Word once had hardcoded as representing the
paragraph symbol. So for Windows-1253, Microsoft put the paragraph
symbol there and moved capital Alpha with tonos elsewhere.

--
Vasilis Vasaitis
"A man is well or woe as he thinks himself so."
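For anyone who wants to check the two points made in the message above (capital Alpha with tonos moving between ISO-8859-7 and Windows-1253, and tonos/oxia being equivalent in Unicode), here is a small Python sketch. It uses only the standard codecs and unicodedata; the variable names are just for illustration.

```python
import unicodedata

ALPHA_TONOS = "\u0386"  # GREEK CAPITAL LETTER ALPHA WITH TONOS

# Where the letter lives in each 8-bit character set.
iso_byte = ALPHA_TONOS.encode("iso8859_7")
win_byte = ALPHA_TONOS.encode("cp1253")

# What Windows-1253 has at the position ISO-8859-7 uses for the letter.
win_at_iso_position = iso_byte.decode("cp1253")

# Small alpha with oxia (polytonic block) vs. small alpha with tonos.
alpha_oxia = "\u1f71"
alpha_tonos = "\u03ac"
oxia_normalizes_to_tonos = unicodedata.normalize("NFC", alpha_oxia) == alpha_tonos
```

The letter encodes to 0xB6 in ISO-8859-7 and to 0xA2 in Windows-1253, while 0xB6 in Windows-1253 decodes to the pilcrow (paragraph sign). Normalizing the oxia form to NFC yields the tonos form, which is the sense in which Unicode treats the two as equivalent.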
Re: Unicode Keyboard Input Linux
On Mon, Jun 14, 2004 at 11:39:44PM +0200, Pablo Saratxaga wrote:
> Kaixo!
>
> On Sat, Jun 12, 2004 at 09:56:52AM -0700, Elvis Presley wrote:
..[snip]..
> > This is about as complicated as it gets in polytonic
> > Greek, three dead keys, two pre-position, one
> > post-position, 'w' representing omega, and an 'i' for
> > iota subscript.
>
> No, dead keys cannot be post-position; they must always be typed
> *before* the key they modify; that is in fact the very definition
> of a dead_key: they modify the behaviour of what is typed after them.
> If it is typed after it is not a dead key, but just a regular key.
>
> The ways already defined in el_GR.UTF-8 X11 Compose file for U1fa2
> (ᾢ, omega with psili varia and ypogrammeni) are:
>
> : "ᾢ" U1fa2
> : "ᾢ" U1fa2
> : "ᾢ" U1fa2
> : "ᾢ" U1fa2
> : "ᾢ" U1fa2
> : "ᾢ" U1fa2
> : "ᾢ" U1fa2
> : "ᾢ" U1fa2
> : "ᾢ" U1fa2
> : "ᾢ" U1fa2
> : "ᾢ" U1fa2
> : "ᾢ" U1fa2
>
> 6 ways to type it with dead keys (corresponding to the six
> possible combinations of the three dead keys; but dead keys
> always before the letter)
> and 6 ways to type it with Multi_key (you press "Multi_key", then
> the following keys in the given order).

Note that even Multi_key combinations always have the letter last, so
that, when a letter arrives, it is certain that the sequence is
complete. See my comments below.

> What you would like would be in fact:
>
> : "ᾢ" U1fa2
> : "ᾢ" U1fa2
>
> (that is, two dead keys, followed by two normal keys; a key sending
> Greek_omega and a key sending U0345 (COMBINING GREEK YPOGEGRAMMENI)
>
> I haven't tested it but if it works, it could indeed be added for
> all the cases and a layout with instead of , if
> that is more intuitive to type.
>
> > The keyboard map is therefore more than a map, it is a
> > fsm, a stateful-map.
>
> That is not supported at all.
> If you need that, you need to develop an input method actually
> (like japanese or vietnamese use), that is, a program that interprets
> what you type and produces a different input.
>
> Yes there is something of that in console (but very limited) and
> in X11 (more powerful), but it is always linear.
>
> (also, I'm not sure if it is possible to have, for example,
> " " and
> "dead_horn> " sequences (that is, sequences
> that one is subset of another))

You can't. The problem with that is that, if you wanted to type the
second sequence, the composition engine wouldn't know whether to stop
there and emit the symbol, or to wait for another symbol to complete
the sequence. So it waits. This could probably be fixed (partly): when
a symbol comes that causes the sequence to become invalid, the engine
could check the compose sequence just before the arrival of that
symbol, and emit the result. But this is not the current behaviour.

> > If I change keyboards in
> > midstream (using alt-a, for example), the fsm would
> > output the components of an unaccepted character
> > individually. How far will keymaps go?
>
> You can't.
> pressing Alt-A means (or any other key) means you broke the sequence.
> in such case you simply lost what you typed in the incomplete sequence.

Indeed.

--
Vasilis Vasaitis
"A man is well or woe as he thinks himself so."
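To make the prefix problem from the reply above concrete, here is a toy model in Python. It is only a sketch of the behaviour as described in the message (wait while a longer sequence is still possible, drop everything when the sequence becomes invalid). It is not the actual X11 compose implementation, and the key names and outputs in the tables are invented.

```python
def feed(table, keys):
    """Simulate a naive compose engine; return the list of emitted strings.

    table maps tuples of key names to the composed output.
    """
    emitted = []
    pending = ()
    for key in keys:
        pending += (key,)
        # Is some longer sequence still reachable from what was typed so far?
        longer_possible = any(
            len(seq) > len(pending) and seq[: len(pending)] == pending
            for seq in table
        )
        if longer_possible:
            continue  # the engine cannot decide yet, so it waits
        if pending in table:
            emitted.append(table[pending])  # complete and unambiguous
        # otherwise the sequence is invalid and the typed keys are lost
        pending = ()
    return emitted


# Hypothetical tables: in AMBIGUOUS the short sequence is a prefix of the long one.
AMBIGUOUS = {("dead_x", "a"): "short", ("dead_x", "a", "b"): "long"}
UNAMBIGUOUS = {("dead_x", "a"): "short"}
```

With UNAMBIGUOUS, typing dead_x then a emits "short" at once. With AMBIGUOUS, the same two keys emit nothing, because the engine is still waiting for a possible b; if some other key arrives instead, the whole sequence is dropped, so "short" can never be produced. The partial fix suggested in the message would amount to emitting the result for the keys typed before the invalidating key instead of discarding them.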
Re: JOE editor has just added UTF-8 support
On Mon, May 03, 2004 at 10:27:10PM +0900, Derek Martin wrote:
..[snip]..
> in Gaim. =8^) Now if only Mutt will work properly with UTF-8...

Err... I'm reading these messages inside mutt, which in turn runs under
a UTF-8 enabled xterm (uxterm), with the el_GR.UTF-8 locale. And let me
tell you, it works great, and in fact it's been supporting UTF-8 for a
long time now. Make sure that you have a fairly recent version of mutt,
and that it's compiled against ncursesw, not plain ncurses or slang,
and you should be set. The Debian unstable package is what I'm using,
BTW.

--
Vasilis Vasaitis
"A man is well or woe as he thinks himself so."
Re: JOE editor has just added UTF-8 support
On Mon, May 03, 2004 at 07:32:24PM +0200, Jan Willem Stumpel wrote:
> Thanks very much, this clears up a lot. A few more questions:
>
> 1. Most GTK+ programs allow right-clicking in text boxes to change the
> input method, but Mozilla, unfortunately, does not. But it *is* affected
> by the GTK_IM_MODULE=xim environment variable, so it appears to be a
> GTK+ program all right. In fact, starting Mozilla (from the command
> line) with
>
> GTK_IM_MODULE=im-ja mozilla
>
> works, resulting in a Mozilla which accepts Japanese input -- without
> using kinput2!
>
> Now it would be extremely nice if this could also be done (somehow)
> dynamically, "on the fly". GTK+ programs seem to be able to do this, so
> (I think) Mozilla should be able to do it also. Is there a way to
> achieve this?

Mozilla twists and turns GTK+ to its whims, so the result is very
different from that of usual GTK+ applications. AFAIK, there's no
equivalent way to switch the input method; if it's important to you,
you could try filing a bug report in their bugzilla, asking for the
Input Methods submenu to be added to the input box context menu.

> 2. In programs which *do* allow right-clicking for input method
> selection, the "default" input method is (apparently) less useful than
> the "xim" input method, because of the less-than-perfect Compose
> implementation in "default". Is there a way to make xim the "default"?
> Anyway, what exactly is the "default" in the input methods menu? Where
> is it defined?

It's already been stated that you can specify GTK_IM_MODULE=xim in your
environment, which makes all applications use that by default. Isn't
that good enough for you?

--
Vasilis Vasaitis
"A man is well or woe as he thinks himself so."
Re: JOE editor has just added UTF-8 support
On Mon, May 03, 2004 at 02:39:22AM +0900, Derek Martin wrote:
> On Sun, May 02, 2004 at 05:10:42PM +0300, Vasilis Vasaitis wrote:
> > Programs that use the GTK+ library use GTK+'s own composition
> > mechanism by default, instead of the one supplied by X.
>
> Do you know why that is?

GTK+ strives to be portable over many platforms (X, Win32, linux
framebuffer, etc.). As such, it has been decided that it cannot rely on
the input methods provided by each platform, so instead it is shipped
with its own ones. I guess the quest for consistency has led to those
being the default.

> > You can switch that temporarily for a text box, by right clicking on
> > it and selecting Input Methods -> X Input Method. Or, you can switch
> > it permanently for all applications, by setting GTM_IM_MODULE=xim
> > among your environment variables.
>
> Thanks for posting that... I'd been trying to figure out a way to do
> that for some time. I never could find anything useful about it
> though. Is there some place where it's documented, should I want to
> refer to or quote it in the future?

I can't find anything relevant, sorry. But from now on you could always
use the appropriate link to the linux-utf8 web archives. ;^)

Oh, and of course it's GTK_IM_MODULE, as mentioned in another message.
Also, the file /etc/gtk-2.0/gtk.immodules (the directory might be
different on other systems) has a similar role.

--
Vasilis Vasaitis
"A man is well or woe as he thinks himself so."
Re: JOE editor has just added UTF-8 support
On Sun, May 02, 2004 at 02:09:30PM +0200, Jan Willem Stumpel wrote:
..[snip]..
> I was looking at the Compose file because I am trying to understand
> keyboard input. My keyboard is set for us_intl, with a lot of "dead
> keys". All of the combinations (I think) in the Compose file, including
> some very complicated ones, work in joe (on an xterm with Unicode font)
> and in Openoffice. However, in other programs (like Mozilla, bluefish,
> the input window of gucharmap) only a small subset of the combinations
> work. For instance, o with macron (AltGr-minus-o) works in all programs,
> while o with breve (AltGr-left parenthesis-o) works only in joe and
> Openoffice. In other programs you hear a beep, and no character appears.
>
> Why do the programs behave differently with respect to keyboard input?
> Is there a "cure"?

Programs that use the GTK+ library use GTK+'s own composition mechanism
by default, instead of the one supplied by X. You can switch that
temporarily for a text box, by right clicking on it and selecting
Input Methods -> X Input Method. Or, you can switch it permanently for
all applications, by setting GTK_IM_MODULE=xim among your environment
variables.

--
Vasilis Vasaitis
"A man is well or woe as he thinks himself so."
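If you only want the X input method for one program instead of your whole session, one option is to start that program with a modified environment. Below is a minimal Python sketch of that idea. The helper name env_with_xim is made up, and the program launched in the commented usage is only an example.

```python
import os


def env_with_xim(base_env=None):
    """Return a copy of the environment with GTK_IM_MODULE set to xim.

    The caller's mapping (or os.environ) is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GTK_IM_MODULE"] = "xim"
    return env


# Usage sketch (not executed here): start one GTK+ program with XIM input.
# import subprocess
# subprocess.Popen(["gucharmap"], env=env_with_xim())
```

Setting the variable in your shell startup files, as described in the message, has the same effect for every program started from that session.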
Re: mp3-tags, zip-archives, tool to convert filenames to UTF
On Fri, Feb 14, 2003 at 07:01:56PM +0100, Helge Hielscher wrote:
> success. Is there a way to convert all ID3-Tags to Unicode? How does
> ogg-Vorbis handle this issue?

Ogg Vorbis comment values are encoded in UTF-8 by default, see:

  http://www.xiph.org/ogg/vorbis/doc/v-comment.html

> 3) Do .zip Files store the encoding of the filenames somewhere and will
> unzip convert the encodings to utf8? How about utf8 and the other
> packers/archivers (tar,ace,rar)? Are there any known problems?

I would expect these to behave pretty much like filesystems themselves
do.

--
Vasilis Vasaitis
[EMAIL PROTECTED]
+306976604701
Emacs automatic UTF-8 setup
Hello,

Lately, I've started to slowly migrate my environment to UTF-8. Since I
don't feel ready to do a complete switch yet, as the environment
doesn't seem to be mature enough, I like to have a "transition period",
when I can run applications in either a UTF-8 or a non-UTF-8 locale, as
needed. More specifically, I want to be able to constantly switch back
and forth between el_GR.ISO8859-7 and el_GR.UTF-8.

Most programs don't need any particular setup for this. Emacs [0],
however, is a notable exception. In its default setup, it doesn't
completely support a UTF-8 environment, the main problem being that it
doesn't recognise UTF-8 keyboard input. So I set out to discover the
minimum configuration possible, so that it would fully support the
UTF-8 locale, without creating any problems in the ISO8859-7 locale at
the same time. In addition, it would have to work both in X11 and
terminal mode, and in the latter, both on the Linux console and inside
an xterm. The result isn't the most obvious setup, so I thought I'd
post it here, in the hope that others find it useful as well (esp.
Emacs developers).

First of all, I wanted to make sure that Emacs automatically sets the
language environment to "Greek" in all cases, without actually
configuring it to be the default. This is accomplished with the
following line in .emacs:

  (setq locale-language-names (cdr locale-language-names))

The variable locale-language-names is a list of patterns that match
locale names to names of language environments. In my version of Emacs,
the first entry inhibits all UTF-8 locales from setting any language
environment. In my case, this seems to cause more harm than good, so I
eliminate that entry with the above command.

In addition, I want to set the various coding systems for each locale
to sane values. This is achieved with the following piece of code:

  (setq locale-preferred-coding-systems
        (cons (cons ".*\\.utf-8" 'utf-8)
              locale-preferred-coding-systems))
  ((lambda (cs)
     (set-keyboard-coding-system cs)
     (if cs (set-terminal-coding-system cs)))
   (set-locale-environment nil))

This makes UTF-8 the preferred coding system for UTF-8 locales, and
sets the various coding systems according to the current locale
settings.

Now Emacs behaves just like most other applications: it assumes an
8-bit, ISO8859-7 environment under the el_GR.ISO8859-7 locale, and a
multi-byte, UTF-8 environment when run under el_GR.UTF-8.

[0] I use GNU Emacs 21.2-5, the latest version in Debian unstable.

--
Vasilis Vasaitis
[EMAIL PROTECTED]
+30976604701
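The reason for consing the new entry onto the front of locale-preferred-coding-systems can be illustrated with a toy lookup in Python. This is a sketch under two assumptions that are not stated in the message: that the list is scanned front to back with the first matching pattern winning, and that matching ignores case. The second table entry is invented for the example.

```python
import re


def preferred_coding(locale_name, table):
    """Return the coding system of the first pattern matching locale_name."""
    for pattern, coding in table:
        if re.match(pattern, locale_name, re.IGNORECASE):
            return coding
    return None


# The first entry mirrors the one added in .emacs; the second is hypothetical.
TABLE = [
    (r".*\.utf-8", "utf-8"),
    (r"el_GR", "iso-8859-7"),
]
```

Under these assumptions el_GR.UTF-8 picks up utf-8 from the first entry, while el_GR.ISO8859-7 falls through to the later one, which matches the behaviour described in the message.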
Re: [Devel] Re: Linux Console in UTF-8 - current state
On Sun, Oct 06, 2002 at 07:35:53PM +0400, Vadim Plessky wrote:
>
> Let's also clarify a few things here.
> Do you speak about CJK fonts, or about Latin+Greek+Cyrillic fonts, right?
> There are some rather good fonts available *for free* for Latin+Greek+Cyrillic

Not in my experience. There seem to be a lot of Latin+Cyrillic fonts
around, and I would guess that at least some of them have quite good
quality, but for Greek fonts this is not the case. OTOH, there is a
serious shortage of Greek fonts that are (a) indisputably free, and (b)
of adequate quality so as not to pop my eyeballs out of their sockets.
No, come to think of it, there is even a shortage of Greek fonts that
only satisfy condition (a).

> alphabet. And I am working on set of my own fonts (Latin+Cyrillic, Greek can
> follow, as it's not extremely difficult to add if you have already Cyrillic
> glyphs)

Well, it's not that easy either. Capital letters are almost ready if
you have capital Latin & Cyrillic ones, but a lot of the small letters
are *very* different from anything else in the other scripts.

Anyway, I don't intend to criticise anyone, just wanted to clarify some
things...

--
Vasilis Vasaitis
[EMAIL PROTECTED]
Re: UTF-8 file to ASCII file converter
On Fri, Apr 12, 2002 at 11:11:41AM -0400, [EMAIL PROTECTED] wrote:
> On Fri, 12 Apr 2002, Vasilis Vasaitis wrote:
>
> > Just like in the case of the opposite conversion, this conversion can also
> > be easily achieved with an one-liner. The following seems to be able to do
> > the job:
> >
> > perl -ne 'for (unpack "U*", $_) { printf $_ > 255 ? "U+%04X" : "%c", $_ }'
>
> Unless you regard ISO-8859-1 as a synonym to US-ASCII, '255' has to
> be '127' :-)

Er, right. That's what I meant, actually, but I guess I wasn't thinking
much at that moment :^). And since I only tested this with an iconv'ed
ISO-8859-7 text to UTF-8, I didn't even notice...

Cheers,

--
Vasilis Vasaitis
[EMAIL PROTECTED]
"Don't do drugs. Santa Claus is watching." -- winamp.com
Re: UTF-8 file to ASCII file converter
On Thu, Apr 11, 2002 at 12:04:18AM -0700, Pedro Ferreira wrote:
> I already have a perl script (thanks to Oyvind A.
> Holm) that converts an ascii file with U+ unicode
> codes to an utf-8 file.
> Now I would like to do the oposite, convert an utf-8
> file to an ascii file, each utf-8 character would be
> encoded back to U+. Many thanks in advance for any
> help!

Just like in the case of the opposite conversion, this conversion can
also be easily achieved with a one-liner. The following seems to be
able to do the job:

  perl -ne 'for (unpack "U*", $_) { printf $_ > 127 ? "U+%04X" : "%c", $_ }'

--
Vasilis Vasaitis
[EMAIL PROTECTED]
"Don't do drugs. Santa Claus is watching." -- winamp.com
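For comparison, here is the same UTF-8 to ASCII idea as a short Python sketch. It works on already-decoded text and treats 127 as the upper limit of US-ASCII; the function name to_ascii is made up for this example.

```python
def to_ascii(text):
    """Replace every character above US-ASCII (code point > 127) with U+XXXX."""
    return "".join(
        ch if ord(ch) <= 127 else "U+%04X" % ord(ch)
        for ch in text
    )


# Usage sketch for a file (file names are placeholders):
# with open("in.txt", encoding="utf-8") as src, open("out.txt", "w", encoding="ascii") as dst:
#     for line in src:
#         dst.write(to_ascii(line))
```

Note that %04X pads to at least four hex digits, so characters beyond U+FFFF come out with five or six digits.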
Re: ASCII file to UTF-8 file converter
On Tue, Mar 26, 2002 at 06:58:58AM -0800, Pedro Ferreira wrote:
> Please, what is the best tool to convert an ascii file
> with unicode character codes like this:
> U+3400
> U+3405
> to another UTF-8 file with the corresponding unicode
> characters?
> Many thanks!
> Pedro Ferreira

Perl, of course! Try something like this:

  perl -pe 's/U\+([0-9A-Fa-f]{4})/pack "U", hex $1/ge'

No wonder they call it the "UNIX Swiss Army chainsaw"...

--
Vasilis Vasaitis
[EMAIL PROTECTED]
"Don't do drugs. Santa Claus is watching." -- winamp.com
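The substitution in the one-liner above translates fairly directly to other languages. Here is a rough Python equivalent, using the same pattern (exactly four hex digits after "U+"); the function name from_ascii is invented for the example.

```python
import re

# Same pattern as the Perl one-liner: "U+" followed by exactly four hex digits.
U_PLUS = re.compile(r"U\+([0-9A-Fa-f]{4})")


def from_ascii(text):
    """Replace each U+XXXX escape with the character it names."""
    return U_PLUS.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Writing the result to a file opened with encoding="utf-8" gives the UTF-8 output that was asked for. Like the Perl version, this only handles four-digit escapes.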
Re: SI/SO G0/G1 in linux console
On Thu, Feb 28, 2002 at 03:26:38PM +0100, Erika Pacholleck wrote:
> When doing some experimenting with the acm/screen maps I discovered
> some strange things, as to not-working and vice-versa-working. And
> I would need an advice where to hook in for fixing, please.
>
> system information:
> - running linux-2.4.10, compiled with/against 2.4.5
> - kbd-1.06, ncurses-5.2
> state after system start:
> - with matrox fb
> - with keymap de-latin-nodeadkeys
> - without font loading/changing
> - and $TERM=linux
> - and dumpkeys saying 0x000f=control_o 0x000e=control_n
> - and inputrc allowing 8-bits
> - and locale charmap as ISO-8859-1

So far so good.

> expected behaviour from this:
> - G0 set to default latin1 and G1 set to VT100 graphics
> - typing [Ctrl]+[o] sending Control_o (switch to G0)
> - typing [Ctrl]+[n] sending Control_n (switch to G1)

Expected behaviour where? You can't expect terminal applications (I
assume you were using the shell here) to just echo every character they
receive to the terminal. That defeats the whole point of keyboard
handling. For example, a lot of programs, upon receiving ^L, will
redraw the screen and not send that character to the terminal. A
notable exception to this is cat(1), which will just echo to its output
whatever it receives on its input. In fact, cat is very useful for
tests like this.

> here is what happens in reality:
>
> docs       hit keys     echo -e orders
> SI=G0=^O   [Ctrl]+[o]   \\033o   \\x1bo   \\x0e   \\016
> SO=G1=^N   [Ctrl]+[n]   \\033n   \\x1bn   \\x0f   \\017
> results    negative     neg.     neg.     both switched
>
> Ctrl+o looks like CR is sent; Ctrl+n beeps

As I said, just because you press ^N or ^O at the bash prompt, doesn't
mean that it will echo them to the terminal. Use cat and it will do
what you expect.

> \\033o \\x1bo and according n's don't show any change

These sequences are ESCn and ESCo, completely different from the
control characters you are talking about.

> \\x0e \\016 look like switched to G1 VT100 graphics
> \\x0f \\017 look like switched to G0 latin1
>
> So only echoing the hex/oct values seem to get at least the G0/G1 maps
> -- only that they now are just the other way round than it should be.

Actually, the values you are using should be the other way round, as ^N
is ASCII 14 and ^O is ASCII 15. As a rule of thumb, the ASCII value for
^X, where X is any letter of the English alphabet, equals the position
of that letter in the alphabet.

> And there seems to be no program to report the status (like say, tty2:
> using G0, currently set to latin1) so I can only judge from what I see.

Indeed, if there is a way I'm not aware of it.

> I guess the keymap will need change to make the key-hitting work the
> way it should be, but what do I check for the vice-versa 0e/0f signal?
>
> Thanks for your comments.

To summarize, remember that ^N and ^O, like most other codes, change
the state of the terminal when they are output to it, not when they are
input from it.

--
Vasilis Vasaitis
[EMAIL PROTECTED]
"Don't do drugs. Santa Claus is watching." -- winamp.com
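The rule of thumb from the reply above (the ASCII value of ^X is the position of X in the alphabet) is easy to check with a few lines of Python. The function name ctrl_code is just for illustration.

```python
def ctrl_code(letter):
    """ASCII value of Ctrl+<letter>: the letter's position in the alphabet."""
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("expected a single letter A-Z")
    return ord(letter.upper()) - ord("A") + 1


SO = ctrl_code("N")  # shift out, selects G1
SI = ctrl_code("O")  # shift in, selects G0
```

So ^N is 14 (0x0e, octal 016) and ^O is 15 (0x0f, octal 017), which is why the \\x0e and \\x0f results in the quoted table looked swapped. As the message says, the switch only happens when such a byte is written to the terminal, for example with sys.stdout.write(chr(SO)), not when the key is typed at a prompt.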