Re: Input under RH8
On Fri, 6 Dec 2002, Maiorana, Jason wrote:

> First, thanks to Jungshik Shin and Mike FABIAN for your replies.

You're welcome :-)

> I surmise that the current state of RH8 is that it is not yet suitable
> for entry of all languages simultaneously (flaws in XIM itself being
> part of the problem).

You're right. You can't do MS Windows/MacOS style IME switching, yet, in all applications.

> I can probably set up some scripts to pop up a gedit in a given mode,
> but, with the exception of VIQR and Korean, I cannot yet graphically
> switch around to any input method with the version of gtk2 that comes
> with rh8.

Gtk2 as shipped in RH8 has Thai (broken?), Tamil, Cyrillic (transliterated), Inuktitut, IPA, Tigrigna-Ethiopian, Tigrigna-Eritrean, and Amharic input modules in addition to the XIM, Vietnamese, and *broken* Korean (KSC5601) input modules.

For Korean, you'd better install the 'imhangul' input module from http://imhangul.kldp.net. You can download the source by clicking 'download' in red and install it by following the instructions in the gray box below the download link. If this is the first time you install 'imhangul', you have to run 'make install' twice (this is due to a bug that is yet to be fixed).

You can also make use of Xkb. With its support for multiple levels, you can add yet another 'input method' to your repertoire of input methods accessible in gedit (a gtk2 application). As for Xkb, refer to the XFree86 I18N archive.

Hopefully, in the near future, RH will ship all utf-8 locales by default, and gtk2 will have an XIM wrapper that allows access to any input method on the system from any language locale. Alternatively, a 'meta XIM server' (as implemented at the client level by Yudit and mlterm) that lets users switch between multiple XIMs would be handy. It could then be used for non-gtk2 applications as well as gtk2 applications.

BTW, has anybody heard of gtk2 input modules for Chinese and Japanese? A quick googling didn't turn up anything.
Jungshik -- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
Re: UTF-8 wakeup call
On Fri, Dec 06, 2002 at 11:39:43AM -0500, Henry Spencer wrote:
> On Fri, 6 Dec 2002, Keld Jørn Simonsen wrote:
>> Actually it is funny that you call it Unicode. UTF-8 clearly comes from
>> the 10646 side of UCS, Unicode did not invent it at all...
>
> It did not come from 10646 either; it came from the *Unix* side of the
> house, specifically from X/Open. And my understanding is that it was
> originally specifically an encoding for Unicode (although the distinction
> quickly became academic because of the conversion of 10646 into a Unicode
> clone).

UTF-8 then came through the 10646 side. Unicode was strictly 16-bit from the outset. Pre-merger 10646 was 32 bit with an 8-bit encoding made possible. After the merger 10646 had an 8-bit encoding called UTF-1.

> Nobody except some standards zombies cared about encoding 10646, or indeed
> about any aspect of 10646; Unicode was the standard that the real world
> was clearly signing up for. Which is why the 10646 committee, seeing the
> writing on the wall, abandoned its own efforts and aligned its standard
> with Unicode.

Hmm, what is implemented in Linux is really not Unicode, but 10646, and it was that from the outset, e.g. the 32-bit wchar_t. But then you may call Linux people standards zombies, that is your call, and yes, we use standards like POSIX and ISO C etc. Anyway, I was writing about Linux and UTF-8.

> Some of the Unicode standards guys were dead-set against any encoding
> except plain 16-bit (but which byte order? :-)), but potential *users* of
> Unicode were much more pragmatic. UTF-8 originally came out of the desire
> for a backward-compatible encoding for use in Unix filenames.

Yes, true. And that is then why I find it funny that the people that were dead-set against anything other than 16 bit now get all the glory for the stuff they fought so hard against. The irony of history :-)

> In any case, Unicode is much the more widely-known name, and much the more
> readily available standard, and (as others have noted) also comes with a
> lot of relevant supplementary information that 10646 lacks.

The supplementary information is largely covered in 14651 and 14652, and the specifications in Linux are then built on these standards, not on Unicode tables.

>> The way 10646 is coming to Linux is also much with the support from
>> the ISO 14651 sorting standard and the ISO TR 14652 locale standard.
>
> My understanding is that an ISO TR, by definition, is not a standard.

The ISO definitions of what standards are in general encompass ISO TRs as standards. It is not an ISO standard, but it is a standard in the generic sense of the word.

>> I think the proper way to characterize what we do now in Linux is to
>> say ISO 10646, and probably mention Unicode in parenthesis the first
>> time it appears.
>
> The pragmatic, and historically correct, way is the reverse. ISO 10646
> delivers the ISO stamp (stomp? :-)) of approval for Unicode... but the
> standard you will find on the shelves of the people who do the work is
> labelled Unicode.

I then think you are mislabelling the use of UTF-8 in Linux, as the Unicode standards are not adhered to. Linux UTF-8 is not Unicode conformant, but follows ISO 10646, 14651 and TR 14652.

Kind regards
keld
RE: UTF-8 wakeup call
Frank T. Pohlmann:

> Actually, I tried to get people to realize the scale of the coming changes.
> http://www.linuxuser.co.uk/articles/issue22/lu22-All_you_need_to_know_about-Unicode.pdf
>
> -Frank Pohlmann

Apart from the very overpretentious title, that article contains a number of errors. I will mention just one: the notion of implementation levels (a 10646 thing; Unicode does not formalise that) has been scarily confused with the notion of planes.

Kind regards
/kent k
Re: UTF-8 gnroff mangles up syntax samples
On Fri, 6 Dec 2002, Larry Wall wrote:

> ...Of course, if you make a way to translate the old format to something
> resembling the new format, the transition can happen more quickly.

Also, for a quick hack that's likely to give good results: if the man macros merely render all explicitly-requested boldface as the verbatim font with verbatim processing, that will go a long way toward doing the right thing. Bold does not see much other use in traditional manpages.

Henry Spencer [EMAIL PROTECTED]
Re: UTF-8 wakeup call
On Fri, 6 Dec 2002, Antoine Leca wrote:

>> [UTF-8] did not come from 10646 either; it came from the *Unix* side of
>> the house, specifically from X/Open.
>
> I thought it came from Plan 9 (Rune) then passed to X-Open (FSS-UTF?).
> Did I miss something? Note I was not there at this time.

Markus has helpfully explained this (especially helpful since I didn't have much detail on the earliest history): Plan 9 was the earliest major implementation but didn't actually originate it. Plan 9 in fact started with a different Unicode encoding, but switched when UTF-8 appeared.

Henry Spencer [EMAIL PROTECTED]
RE: UTF-8 wakeup call
Keld,

> Maybe there are flaws in 14651, but it is ISO 14651 which is used in Linux.

That is a problem, not a feature. While UAX 10 conforms to 14651, it does specify a number of requirements in addition to 14651, specifically for Thai, Lao, and combining character support.

> and the ISO TR 14652 locale standard.

14652 is NOT a standard. It is also very unlikely to ever develop into one. Keld, please stop promoting it as a standard, when you very well know that it is NOT a standard.

> It is as much a standard as Unicode in the generic sense of the word
> standard, but it is not an ISO standard. Please understand that.

It's an ISO TR that became a TR because it FAILED to become an ISO standard. Please understand that.

...

> The mappings used are at least also from the RFC 1345 (recode uses that)
> or the IS 15897 which uses many of the same names and mappings.
> Specifically I have seen that Linux is *not* using the Unicode data
> because of copyright issues.

Hmmm. From http://www.unicode.org/Public/UNIDATA/UnicodeCharacterDatabase.html:

    Limitations on Rights to Redistribute This Data

    Recipient is granted the right to make copies in any form for internal
    distribution and to freely use the information supplied in the creation
    of products supporting the Unicode(TM) Standard. The files in the
    Unicode Character Database can be redistributed to third parties or
    other organizations (whether for profit or not) as long as this notice
    and the disclaimer notice are retained. Information can be extracted
    from these files and used in documentation or programs, as long as
    there is an accompanying notice indicating the source.

I don't see this as restrictive for use in Linux. I'm sure the Unicode Consortium would like to see its data being used in open source projects like Linux as well. Note that IBM has its own open source project for Unicode support (ICU).

Kind regards
/kent k
Re: Input under RH8
Jungshik Shin [EMAIL PROTECTED] writes:

> Hopefully, in the near future, RH will ship all utf-8 locales by default,
> and gtk2 will have a XIM wrapper that allows access to any input method
> on the system from any language locale. Alternatively, 'meta XIM server'
> (as implemented at the client level by Yudit and mlterm) that lets users
> switch between multiple XIMs will be handy. Then, it can be used for
> non-gtk2 applications as well as gtk2 applications.
>
> BTW, has anybody heard of gtk2 input modules for Chinese and Japanese?
> A quick googling didn't turn up anything.

Off the top of my head...

Japanese: http://bonobo.gnome.gr.jp/~nakai/immodule/
Chinese: http://sourceforge.net/projects/wenju/
  [ Actually, a generic table based method, but contains tables for various methods of inputting Chinese ]

There may possibly be others.

Regards,
Owen
RE: UTF-8 wakeup call
On Sat, 7 Dec 2002, Kent Karlsson wrote:

>> The mappings used are at least also from the RFC 1345 (recode uses that)
>> or the IS 15897 which uses many of the same names and mappings.
>> Specifically I have seen that Linux is *not* using the Unicode data
>> because of copyright issues.
>
> Hmmm. From http://www.unicode.org/Public/UNIDATA/UnicodeCharacterDatabase.html:
>
> Limitations on Rights to Redistribute This Data
> Recipient is granted the right to make copies in any form for internal
> distribution and to freely use the
>
> I don't see this as restrictive for use in Linux. I'm sure Unicode
> consortium would like to see its data being used also in open source

glibc 2.x may not use them yet. However, glib (and other libraries built on top of it) does make extensive use of the Unicode data files. So do Perl, Yudit, Mozilla and other free/open-source programs and projects that run on Linux.

Jungshik
Emacs automatic UTF-8 setup
Hello,

Lately, I've started to slowly migrate my environment to UTF-8. Since I don't feel ready to do a complete switch yet, as the environment doesn't seem to be mature enough, I would like to have a transition period during which I can run applications in either a UTF-8 or a non-UTF-8 locale, as needed. More specifically, I want to be able to switch back and forth between el_GR.ISO8859-7 and el_GR.UTF-8 at any time.

Most programs don't need any particular setup for this. Emacs [0], however, is a notable exception. In its default setup, it doesn't completely support a UTF-8 environment, the main problem being that it doesn't recognise UTF-8 keyboard input. So I set out to discover the minimum configuration possible that would fully support the UTF-8 locale without creating any problems in the ISO8859-7 locale at the same time. In addition, it would have to work both in X11 and in terminal mode, and in the latter, both on the Linux console and inside an xterm. The result isn't the most obvious setup, so I thought I'd post it here, in the hope that others find it useful as well (esp. Emacs developers).

First of all, I wanted to make sure that Emacs automatically sets the language environment to Greek in all cases, without actually configuring it to be the default. This is accomplished with the following line in .emacs:

    (setq locale-language-names (cdr locale-language-names))

The variable locale-language-names is a list of patterns that match locale names to names of language environments. In my version of Emacs, the first entry inhibits all UTF-8 locales from setting any language environment. In my case, this seems to cause more harm than good, so I eliminate that entry with the above command.

In addition, I want to set the various coding systems for each locale to sane values. This is achieved with the following piece of code:

    ;; Prefer utf-8 for every locale whose name ends in ".utf-8".
    (setq locale-preferred-coding-systems
          (cons (cons ".*\\.utf-8" 'utf-8)
                locale-preferred-coding-systems))

    ;; Apply the current locale, then use the resulting coding system
    ;; for keyboard input and (if there is one) terminal output.
    ((lambda (cs)
       (set-keyboard-coding-system cs)
       (if cs (set-terminal-coding-system cs)))
     (set-locale-environment nil))

This makes UTF-8 the preferred coding system for UTF-8 locales, and sets the various coding systems according to the current locale settings. Now Emacs behaves just like most other applications: it assumes an 8-bit, ISO8859-7 environment under the el_GR.ISO8859-7 locale, and a multi-byte, UTF-8 environment when run under el_GR.UTF-8.

[0] I use GNU Emacs 21.2-5, the latest version in Debian unstable.

--
Vasilis Vasaitis [EMAIL PROTECTED] +30976604701
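For readers who want the two steps from the message above in one place, here is a minimal sketch of the same .emacs fragment. It is not the poster's exact code: instead of relying on the return value of `set-locale-environment`, it reads the variable `locale-coding-system` afterwards, on the assumption that Emacs 21 sets that variable while processing the locale. It also skips `set-keyboard-coding-system` when no coding system was found, which the original does not. The assumption that the first entry of `locale-language-names` is the UTF-8 catch-all comes from the message itself and may not hold in other Emacs versions.

```elisp
;; Sketch for a GNU Emacs 21-era ~/.emacs; names are those used in the
;; message above plus `locale-coding-system' (an assumption, see lead-in).

;; Drop the first entry of `locale-language-names', which (in the poster's
;; Emacs version) stops UTF-8 locales from selecting a language environment.
(setq locale-language-names (cdr locale-language-names))

;; Prefer utf-8 for every locale whose name ends in ".utf-8".
(setq locale-preferred-coding-systems
      (cons (cons ".*\\.utf-8" 'utf-8)
            locale-preferred-coding-systems))

;; Re-derive the language environment from LC_ALL / LC_CTYPE / LANG.
(set-locale-environment nil)

;; Use the locale's coding system for keyboard input and terminal output,
;; but only if the locale actually mapped to a coding system.
(let ((cs locale-coding-system))
  (when cs
    (set-keyboard-coding-system cs)
    (set-terminal-coding-system cs)))
```

To try it, start Emacs once under each locale (for example with LANG set to el_GR.ISO8859-7 and then to el_GR.UTF-8) and inspect the values with `C-h v locale-coding-system` and `C-h v keyboard-coding-system`.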
Re: UTF-8 wakeup call
On Sat, Dec 07, 2002 at 03:21:44PM +0100, Keld Jørn Simonsen wrote:
: Yes, true. And that is then why I find it funny that the people
: that were dead-set against anything other than 16 bit now get all
: the glory for the stuff they fought so hard against. The irony of history :-)

Yes, it's ironic, but the reason they get the glory has very little to do with history--except the part of history in which they were clever enough to pick the snappier name. All other things being equal, had the 10646 and Unicode folks swapped names from the start, it would still be called Unicode today, because that's the right name for it, culturally speaking. Most people don't give a rip about history, but they do care about sounding cool.

Larry