Re: Please do not use en_US.UTF-8 outside the US

2002-05-01 Thread Michael B . Allen

On Tue, 30 Apr 2002 21:27:13 -0500
David Starner [EMAIL PROTECTED] wrote:

 On Tue, Apr 30, 2002 at 09:30:46PM -0400, Jungshik Shin wrote:
   Debian should support LC_PAPER locale category instead of
  relying on /etc/papersize. psutils (psresize, psnup, etc.)
  relies on it to pick the default paper size. If it's set to
  en_US.xxx, it uses letter *by default*. Otherwise, it uses
  A4 *by default*.
 
 And Debian psutils uses /etc/papersize (actually libpaperg, a wrapper
 around that one line file), since that's the Debian way. The
 locale method doesn't make sense, as what size paper the printer has is
 not usually a user setting. Also, the locale system has two
 alternatives, letter or A4, ignoring the possibility that someone might
 have, say, legal size paper loaded.

If /etc/papersize is specific to Debian, how would developers
consistently detect something like paper size? I think using LC_PAPER
should be enough to satisfy most programs. The rest should be handled
by an external program like qtcups. That can consult /etc/papersize in
Debian or whatever the LSB is recommending.
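
A minimal sketch of how a program might combine the two sources discussed here. The function names and the lookup order (site file first, then the locale name) are assumptions for illustration only; the thread does not settle which source should win, and only the "en_US means letter, anything else means A4" rule comes from the description of psutils above.

```python
# Hypothetical helpers; none of these names come from psutils or libpaper.
LETTER_TERRITORIES = {"US"}  # assumption: only the US defaults to letter


def paper_from_locale_name(locale_name):
    """Guess 'letter' or 'a4' from a locale name such as 'en_US.UTF-8'."""
    base = locale_name.split(".")[0].split("@")[0]  # drop codeset and modifier
    territory = base.split("_")[1] if "_" in base else ""
    return "letter" if territory in LETTER_TERRITORIES else "a4"


def default_paper_size(env, papersize_file="/etc/papersize"):
    """Return a paper name from the one-line site file, else from the locale."""
    try:
        with open(papersize_file) as fh:
            name = fh.readline().strip()
            if name:
                return name
    except OSError:
        pass  # no site-wide file: fall through to the locale name
    locale_name = env.get("LC_ALL") or env.get("LC_PAPER") or env.get("LANG") or "C"
    return paper_from_locale_name(locale_name)
```

A caller would pass os.environ as env; a command line option would still override whatever this returns.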

Mike

-- 
May The Source be with you.

--
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/




Re: Please do not use en_US.UTF-8 outside the US

2002-05-01 Thread Lars Engebretsen

Yann Dirson [EMAIL PROTECTED] writes:

 The problem here is satisfying users, not programs. Papersize is a
 setting that is specific to available printers, not to any locale
 one may use.

What should the default paper size be in an application that
creates PDF documents?

Or, for another example: What if one has both a letter tray and an A4
tray in the available printer?

/Lars

-- 
Lars Engebretsen, PhD, [EMAIL PROTECTED]




LC_PAPER vs /etc/papersize (was..Re: Please do not use en_US.UTF-8..)

2002-05-01 Thread Jungshik Shin




On Tue, 30 Apr 2002, David Starner wrote:

 On Tue, Apr 30, 2002 at 11:09:55PM -0400, Jungshik Shin wrote:
  However, to me overriding the default at the command line is a perfectly
  good solution.

 Every time you use a program?
 Stuff like that gets real tiring, real fast
 to me.

  What are shell scripts/aliases for ;-) ? What if your site has
multiple printers with different sizes of paper loaded by default?
How about printers with multiple trays?  Whichever method you use to
set the default, you have to use a command line option or other means
to override the default. However, I have to admit that you clearly have
a point.  It's not ideal for programs to derive the default paper
size from the locale *name* assigned to LC_PAPER. It's certainly true that
if programs rely on /etc/papersize instead of mapping the locale *name*
to the default papersize, it's easier to change the default paper size.

 What has to be done is to use the actual *value* stored in LC_PAPER
instead of 'guessing' the default paper size from the locale *name*
provided that LC_PAPER is  a standard locale category. It's not, yet.

  I was wrong to say that LC_PAPER is defined in ISO 14652
(draft).  It's not there. SUS V3 doesn't have it, either. So,
it's not a standard locale category but at least it's available
where glibc 2.2.x is used (i.e. all Linux distributions
including Debian). Even there, nl_langinfo(PAPER_HEIGHT) and
nl_langinfo(PAPER_WIDTH) don't work yet. langinfo.h in glibc 2.2.x has
_NL_PAPER_HEIGHT and _NL_PAPER_WIDTH. Therefore, programmers might
use nl_langinfo(_NL_PAPER_WIDTH) and nl_langinfo(_NL_PAPER_HEIGHT).
However, that's not very portable (both across platforms and over
time) because I believe the '_' at the beginning of _NL_PAPER_* indicates
their non-standard nature.  What follows is based not on what is widely
available (or standard) but on what may be available in the future.

hypothetic situation
  How often do you (think people) use a paper size other than US letter
(or A4 outside the US)? If the answer is most of the time, you can
build your own locale with LC_PAPER defined for the most frequently used
paper size at your site (say, en_US.UTF-8@legal). Then, you can have

  LC_PAPER=en_US.UTF-8@legal
  LANG=en_US.UTF-8

And a French person living in the US may have

  LC_PAPER=en_US.UTF-8@legal
  LANG=fr_FR.UTF-8

  What difference is there between setting /etc/papersize and building
and installing a new locale for your favorite size? Sure, editing one line
is easier than building a new locale. However, it's not as flexible as
you think.  With en_US.UTF-8@legal built and installed, different users
with different choices of the default paper size (because their offices
have different printers with the primary tray for different papersize)
can happily *share* a *single* system. They don't have to fight over
which paper size goes into /etc/papersize.  Those who mainly use US letter
can just set LANG to en_US.UTF-8 and leave LC_PAPER alone (or they can
specify that to en_US.UTF-8 if they want to). Others who mainly use legal
paper can set LC_PAPER to en_US.UTF-8@legal with LANG set to en_US.UTF-8.
/hypothetic situation
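
The per-user settings above work because of the usual POSIX precedence among locale variables: LC_ALL overrides everything, then the category's own variable, then LANG. A small sketch of that lookup follows; the function name is made up for illustration, and en_US.UTF-8@legal is the hypothetical locale from this message, not one that ships with glibc.

```python
def effective_locale(category, env):
    """Return the locale name in effect for a category such as 'LC_PAPER'."""
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:  # unset or empty variables are skipped
            return value
    return "C"     # nothing set: the default "C" locale applies


# The French user living in the US from the example above.
env = {"LANG": "fr_FR.UTF-8", "LC_PAPER": "en_US.UTF-8@legal"}
print(effective_locale("LC_PAPER", env))     # en_US.UTF-8@legal
print(effective_locale("LC_MESSAGES", env))  # fr_FR.UTF-8
```

Messages stay French while only the paper category points at the site-specific locale, which is the sharing scenario described above.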



   Jungshik Shin


(1) LC_PAPER definition for US letter goes like this (the unit is mm.)
LC_PAPER
height   279
width    216
END LC_PAPER

You can change height and width to whatever value you want.
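
For illustration, a rough sketch of reading the two values back out of such a definition. The parser below is a hypothetical helper and handles only the simple "keyword value" lines shown above, not the full locale source syntax (comments, copy directives and so on).

```python
def parse_lc_paper(source):
    """Extract height and width (in mm) from an LC_PAPER section."""
    fields = {}
    inside = False
    for line in source.splitlines():
        words = line.split()
        if not words:
            continue
        if words == ["LC_PAPER"]:
            inside = True
        elif words == ["END", "LC_PAPER"]:
            inside = False
        elif inside and len(words) == 2 and words[0] in ("height", "width"):
            fields[words[0]] = int(words[1])
    return fields


US_LETTER = """LC_PAPER
height   279
width    216
END LC_PAPER
"""
print(parse_lc_paper(US_LETTER))  # {'height': 279, 'width': 216}
```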






Re: Please do not use en_US.UTF-8 outside the US

2002-05-01 Thread Yann Dirson

On Wed, May 01, 2002 at 02:27:27PM +0200, Lars Engebretsen wrote:
 Yann Dirson [EMAIL PROTECTED] writes:
 
  The problem here is satisfying users, not programs. Papersize is a
  setting that is specific to available printers, not to any locale
  one may use.
 
 What should the default paper size be in an application that
 creates PDF documents?

Maybe both /etc/papersize and LC_PAPER have use.  But their field of
application has to be clearly defined for programmers to do the right
thing...


 Or, for another example: What if one has both a letter tray and an A4
 tray in the available printer?

Not sure.  One is probably default printer anyway.

-- 
Yann Dirson[EMAIL PROTECTED] |Why make M$-Bill richer  richer ?
Debian-related: [EMAIL PROTECTED] |   Support Debian GNU/Linux:
Pro:[EMAIL PROTECTED] |  Freedom, Power, Stability, Gratuity
 http://ydirson.free.fr/| Check http://www.debian.org/




Re: Switching to UTF-8

2002-05-01 Thread Florian Weimer

Markus Kuhn [EMAIL PROTECTED] writes:

   c) Emacs - Current Emacs UTF-8 support is still a bit too provisional
  for my comfort. In particular, I don't like that the UTF-8 mode is not
  binary transparent. Work on turning Emacs completely into a UTF-8
  editor is under way, and I'd be very curious to hear about the
  current status and whether there is anything to test already.
  Anyone?

AFAIK, there is some activity on the Emacs 22 branch.  XEmacs is in
the process of switching to UCS for its internal character set, too.




Re: Please do not use en_US.UTF-8 outside the US

2002-05-01 Thread H. Peter Anvin

Followup to:  [EMAIL PROTECTED]
By author:Lars Engebretsen [EMAIL PROTECTED]
In newsgroup: linux.utf8
 
 What should the default paper size be in an application that
 creates PDF documents?
 
 Or, for another example: What if one has both a letter tray and an A4
 tray in the available printer?
 

In the latter case you can use either -- most printers will pick the
appropriate tray depending on the input.  Use whichever one is more
appropriate for your locale (letter in the U.S., A4 in Sweden, for
example.)

In the former case, I would like to propose a worldwide compromise
page size -- 210 x 279 mm.  Such a page can be printed, cleanly, on
either A4 (210 x 297 mm) or US-letter (216 x 279 mm) by expanding
either the horizontal (US-letter) or vertical (A4) margin.
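
The arithmetic behind the proposal, as a small sketch (names are illustrative): the compromise page is the smaller of the two widths by the smaller of the two heights, and the leftover on each real sheet becomes extra margin.

```python
A4 = (210, 297)      # (width, height) in mm
LETTER = (216, 279)


def common_page(a, b):
    """Largest page that fits on both sheets without scaling."""
    return (min(a[0], b[0]), min(a[1], b[1]))


def extra_margin(sheet, page):
    """Unused (horizontal, vertical) space in mm when page is printed on sheet."""
    return (sheet[0] - page[0], sheet[1] - page[1])


page = common_page(A4, LETTER)
print(page)                        # (210, 279)
print(extra_margin(A4, page))      # (0, 18)
print(extra_margin(LETTER, page))  # (6, 0)
```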

-hpa
-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt[EMAIL PROTECTED]




Re: Paper size and locale

2002-05-01 Thread Markus Kuhn

H. Peter Anvin wrote on 2002-05-01 21:08 UTC:
 In the former case, I would like to propose a worldwide compromise
 page size -- 210 x 279 mm.  Such a page can be printed, cleanly, on
 either A4 (210 x 297 mm) or US-letter (216 x 279 mm) by expanding
 either the horizontal (US-letter) or vertical (A4) margin.

I do like using the so-called PA4 format (210x280 mm, see
http://www.cl.cam.ac.uk/~mgk25/iso-paper.html for its
history) for producing presentation slide PDF files. Primarily,
because it has exactly 4:3 aspect ratio (and therefore will fill all
pixels of a data projector / monitor), and also fits without scaling
onto both A4 and P4. (P4 being the Canadian version of US Letter with
215x280 mm; the difference is within the tolerance interval anyway.)
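
A quick check of the numbers above, using the sizes quoted in this thread (helper names are illustrative). Note that PA4 is 1 mm taller than nominal US Letter, which is the small difference treated as within tolerance.

```python
from fractions import Fraction

PA4 = (210, 280)     # (width, height) in mm
A4 = (210, 297)
P4 = (215, 280)      # Canadian size as quoted above
LETTER = (216, 279)


def aspect(size):
    """Exact long-side to short-side ratio of a portrait sheet."""
    return Fraction(size[1], size[0])


def fits(page, sheet):
    """True if page can be printed on sheet without scaling."""
    return page[0] <= sheet[0] and page[1] <= sheet[1]


print(aspect(PA4) == Fraction(4, 3))  # True
print(fits(PA4, A4), fits(PA4, P4))   # True True
print(fits(PA4, LETTER))              # False (280 mm vs 279 mm)
```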

As for the actual physical paper format (as opposed to PDF document
layout), I'd like to warmly encourage people in North America to start
using A4 paper. It's widely supported by software and printing
equipment, well established, and allows far more pleasant magnification/
reduction on photocopiers thanks to its sqrt(2) aspect ratio than the
ugly ad-hoc other system that was the result of the turf fight between
two early 20th century U.S. paper industry committees.

Some U.S. stationery retailers who sell A4 are listed on

  http://www.cl.cam.ac.uk/~mgk25/iso-paper.html

Additions welcome!

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/





Re: Please do not use en_US.UTF-8 outside the US

2002-05-01 Thread Florian Weimer

Markus Kuhn [EMAIL PROTECTED] writes:

 As we are talking about en_US.UTF-8:

 General warning: Please do not use the locale name en_US.UTF-8 anywhere
 outside North America.

Why can't you use it for LC_CTYPE and LC_MESSAGES, say?

Determining paper size by locale is rather strange.  What's next?
Keyboard layout?  Mouse orientation?  Monitor size?




Re: Please do not use en_US.UTF-8 outside the US

2002-05-01 Thread Jungshik Shin

On Thu, 2 May 2002, Keld Jørn Simonsen wrote:

 The nice thing about LC_PAPER is that it is set either on installation,
 or as part of the normal setup. I think most people know how to set the
 locale, while some, maybe many, would not know that there is an
 /etc/papersize file.

  Yes, I've been bitten more than once by these 'hidden' files
lurking around in /etc that affect the way programs work.

 LC_PAPER was in 14652 at some time but was taken out, because some
 people thought that it was not useful :-(

  So, my memory was not telling me a lie. I was almost sure I had
seen it in ISO 14652 when I wrote that LC_PAPER is in ISO 14652.
Later when I checked it, it's not there, which led me to believe
that my memory didn't serve me right once more.

  Anyway, what's the plan of ISO/IEC JTC1/SC22/WG20 on this?

  Jungshik Shin





Re: Switching to UTF-8

2002-05-01 Thread Gaspar Sinai

On Wed, 1 May 2002, Florian Weimer wrote:
 Markus Kuhn [EMAIL PROTECTED] writes:

c) Emacs - Current Emacs UTF-8 support is still a bit too provisional
   for my comfort. In particular, I don't like that the UTF-8 mode is not
   binary transparent. Work on turning Emacs completely into a UTF-8
   editor is under way, and I'd be very curious to hear about the
   current status and whether there is anything to test already.
   Anyone?

 AFAIK, there is some activity on the Emacs 22 branch.  XEmacs is in
 the process of switching to UCS for its internal character set, too.

I am not much of an Emacs guy but if I were I would probably
use QEmacs, which looks pretty decent to me:

   http://fabrice.bellard.free.fr/qemacs/

As I don't use Emacs, I cannot really tell the difference;
it might not have all the functionality that Emacs has. But
I have a feeling that the functionality you can expect from a
text editor is there.

I like that Qemacs has a much smaller memory and binary size
than “mainstream” Emacs.

Open Source is funny: you probably will never hear Microsoft
praising Java ☺

Gáspár・ガーシュパール・Гашьпар・갓팔・Γασπαρ
ᏱᎦᏊ ᎣᏌᏂᏳ ᎠᏓᏅᏙ ᎠᏓᏙᎵᎩ ᏂᎪᎯᎸᎢ ᎾᏍᏋ 
ᎤᏠᏯᏍᏗ ᏂᎯ.





Re: Paper size

2002-05-01 Thread Markus Kuhn

David Starner wrote on 2002-05-01 22:17 UTC:
 On Wed, May 01, 2002 at 11:06:08PM +0100, Markus Kuhn wrote:
  As for the actual physical paper format (as opposed to PDF document
  layout), I'd like to warmly encourage people in North America to start
  using A4 paper. 
 
 Why would we?

For the exact same reason you should switch to the metric system: The
entire civilised world has agreed half a century ago to use this single
system, to make life significantly easier for everyone on this planet.
Only the Americans still make fools of themselves with their ridiculous
Flintstone units (how many cubic feet to the gallon again? :) and paper
formats (two different aspect ratios, due to a lack of mathematical
knowledge in a 1920s committee). I think, it's a disgrace and we should
rub it in until you see the light.

 All our printers and copiers take and are loaded with letter paper.

And? Most of them are produced by companies who sell the exact same
models in the rest of the world. Some merely need a replacement paper
tray.

A 1992 US government study claimed that apart from the operators of some
types of large rotary print presses, the migration could be done at
quite moderate cost:

  http://www.srcf.ucam.org/~mgk25/gpo-report.pdf

There's this 1972 Canadian study that highly recommended already then a
switchover to ISO paper formats (which the Ontario government attempted
to do in 1974, but failed because of ignorance in the US):

  http://www.cl.cam.ac.uk/~mgk25/volatile/dunn-papersizes.pdf

A detailed discussion of the practical advantages of ISO paper formats:

  http://www.cl.cam.ac.uk/~mgk25/iso-paper.html

I agree that it is not very practical for an individual at the moment to
start such a switchover (though some multinational companies in the US
do have A4 in the second paper tray, both Xerox and Staples sell it in
the US). Standardization of office paper formats requires a major
political decision and takes around a decade to complete. To prepare
that, a sufficient number of people have to get knowledgeable and
enthusiastic about the idea first.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/





Re: Paper size and locale

2002-05-01 Thread Edmund GRIMLEY EVANS

  As for the actual physical paper format (as opposed to PDF document
  layout), I'd like to warmly encourage people in North America to start
  using A4 paper. 
 
 Why would we?

Because you will eventually, so you might as well do it now to
minimise suffering. Well, I don't know how true that is for A4 paper,
but that's a generic reason for accepting a good standard.

I have heard of a US company using A4 for compatibility with its own
offices in other countries, but I don't suppose it happens very often
yet.

I can still remember the old foolscap paper that preceded A4 in
Britain. I'm certainly glad they replaced it.

Sorry, I'm now totally off topic ...

Edmund




Re: Switching to UTF-8

2002-05-01 Thread Tomohiro KUBOTA

Hi,

At Wed, 01 May 2002 20:02:57 +0100,
Markus Kuhn wrote:

 I have for some time now been using UTF-8 more frequently than
 ISO 8859-1. The three critical milestones that still keep me from
 moving entirely to UTF-8 are

How about bash?  Do you know any improvement?

Please note that tcsh already supports East Asian EUC-like
multibyte encodings.  I don't know whether it also supports UTF-8.

How about zsh?


For Japanese, character width problems and mapping table problems
should be solved before we can even _start_ migrating to UTF-8.  (This is
why Japanese localization patches exist for several UTF-8-based
programs such as Mutt.  We should find ways to make such
localization patches unnecessary.)

Also, I want people who develop UTF-8-based software to make
a habit of specifying the range of their UTF-8 support.  For example,

 * range of codepoints
U+ - U+2fff?  all BMP? SMP/SIP?

 * special processing
combining characters?  bidi?  Arabic shaping?  Indic scripts?
Mongolian (which needs vertical direction)?  How about wcwidth()?

 * input methods
Any way to input complex languages which cannot be supported
by xkb mechanism (i.e., CJK) ?  XIM? IIIMP? (How about Gnome2?)
Or, any software-specific input methods (like Emacs or Yudit)?

 * font availability
   Though individual programs are not responsible for this, "this software
   is designed to require the Times font" means that it cannot display
   non-Latin/Greek/Cyrillic characters.

Though people in the ISO-8859-1/2/15 regions don't have to care
about these items, other people can easily trust software that claims
UTF-8 support and then be disappointed when they use it.  They will
then come to distrust software that claims UTF-8 support.  We should
keep this from happening to many people.
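
To make two items of the checklist above concrete (code point range and wcwidth()), here is a rough sketch using only Python's standard unicodedata module. The helper names are made up, and column_width() is only an approximation of what a real wcwidth() does: it ignores control characters and other special cases.

```python
import unicodedata

PLANE_NAMES = {0: "BMP", 1: "SMP", 2: "SIP"}


def plane_name(ch):
    """Name the Unicode plane a character belongs to."""
    plane = ord(ch) >> 16
    return PLANE_NAMES.get(plane, "plane %d" % plane)


def column_width(ch):
    """Approximate terminal columns: 0 combining, 2 wide/fullwidth, else 1."""
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


print(plane_name("a"), plane_name("\U00020000"))  # BMP SIP
print([column_width(c) for c in "a\u65e5\u0301"])   # [1, 2, 0]
```

A program that documents its support as "BMP only, no combining characters" is saying that it handles only the first plane and assumes every character has width 1 or 2.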

---
Tomohiro KUBOTA [EMAIL PROTECTED]
http://www.debian.or.jp/~kubota/
Introduction to I18N  http://www.debian.org/doc/manuals/intro-i18n/




Off-topic Re: Paper size and locale

2002-05-01 Thread David Starner

On Wed, May 01, 2002 at 11:32:35PM +0100, Edmund GRIMLEY EVANS wrote:
   As for the actual physical paper format (as opposed to PDF document
   layout), I'd like to warmly encourage people in North America to start
   using A4 paper. 
  
  Why would we?
 
 Because you will eventually, so you might as well do it now to
 minimise suffering. Well, I don't know how true that is for A4 paper,
 but that's a generic reason for accepting a good standard.

But the current standard is good enough that there's no incentive to
switch. It's like the metric system; Markus asks how many cubic feet to
the gallon, and my answer is who cares? Anybody who really cares how
many cubic feet to the gallon there are, already uses metric. Soda comes
in mixed Imperial/metric units (8 oz., 16 oz., 1 l, 2 l, 3 l), a
situation worse than just using either to which someone like Markus
somewhere is to blame, and I have no problem. I use three sets of
Imperial measurements on a regular basis: time, length and liquid
volume, and one of them metric doesn't change. To go from
inches/feet/miles and ounces/gallons to meters and liters doesn't save
me enough to worry about it, and if I need the metric system for a
problem, I know where to find it.

The paper situation is even worse. I see no benefits accruing to me from
switching paper sizes; and I don't want to switch out all my notebooks
or end up mixing letter sized and A4 sized paper.

-- 
David Starner - [EMAIL PROTECTED]
It's not a habit; it's cool; I feel alive. 
If you don't have it you're on the other side. 
- K's Choice (probably referring to the Internet)




Re: Switching to UTF-8

2002-05-01 Thread Glenn Maynard

On Thu, May 02, 2002 at 11:38:38AM +0900, Tomohiro KUBOTA wrote:
  * input methods
 Any way to input complex languages which cannot be supported
 by xkb mechanism (i.e., CJK) ?  XIM? IIIMP? (How about Gnome2?)
 Or, any software-specific input methods (like Emacs or Yudit)?

How much extra work do X apps currently need to do to support input
methods?

In Windows, you do need to do a little--there's a small API to tell the
input method the cursor position (for when it opens a character selection
box) and to receive characters.  (The former can be omitted and it'll
still be usable, if annoying--the dialog will be at 0x0.  The latter can
be omitted for Unicode-based programs, or if the system codepage happens
to match the characters.)

It's little enough to add it easily to programs, but the fact that it
exists at all means that I can't enter CJK into most programs.  Since
the regular 8-bit character message is in the system codepage, it's
impossible to send CJK through.

How does this compare with the situation in X?

  * fonts availability
Though each software is not responsible for this, This software
is designed to require Times font means that it cannot use
non-Latin/Greek/Cyrillic characters.

I can't think of ever using an (untranslated, English) X program and having
it display anything but Latin characters.  When is this actually a problem?

-- 
Glenn Maynard




Re: Switching to UTF-8

2002-05-01 Thread Tomohiro KUBOTA

Hi,

At Thu, 2 May 2002 00:16:25 -0400,
Glenn Maynard wrote:

   * input methods
  Any way to input complex languages which cannot be supported
  by xkb mechanism (i.e., CJK) ?  XIM? IIIMP? (How about Gnome2?)
  Or, any software-specific input methods (like Emacs or Yudit)?
 
 How much extra work do X apps currently need to do to support input
 methods?

A lot of work.  I think this is one of the problems with XIM: it is
why very few programs (those developed by the few developers who
know XIM) can accept input in CJK languages.

The X.org distribution (and the XFree86 distribution) includes a
specification of the XIM protocol.  However, it is difficult.  (At least I
could not understand it.)  So, for practical use by developers,
http://www.ainet.or.jp/~inoue/im/index-e.html
would be useful for developing XIM clients.  I have not found a good
introductory article on developing XIM servers.

I think that low-level APIs should integrate support for XIM (or other
input method protocols) so that developers who know nothing about XIM
(well, almost all developers in the world) can use it and no longer
inconvenience CJK users.  Gnome2 seems to take this approach.  However, I
wonder why Xlib doesn't have wrapper functions that hide the
difficulties of XIM programming.


 It's little enough to add it easily to programs, but the fact that it
 exists at all means that I can't enter CJK into most programs.  Since
 the regular 8-bit character message is in the system codepage, it's
 impossible to send CJK through.

Well, I am talking about Unicode-based software.  More and more
developers in the world are starting to understand that 8 bits are not
enough for Unicode, because that is a universal fact.  I am optimistic in
this area; many developers will regard 8-bit characters as a bad idea in
the near future.  However, it is unlikely that many developers will
recognize the need for XIM (or other input method) support in the near
future, because it is needed only for CJK languages.  My concern is how
to get these developers, who know nothing about XIM, to develop software
that supports CJK.


 How does this compare with the situation in X?

Though I don't know about Windows programming, I often use Windows
for my work.  Imported software usually cannot handle Japanese
because of font problems.  However, the input method (IME?) seems to be
invoked even in such imported software.


   * fonts availability
 Though each software is not responsible for this, This software
 is designed to require Times font means that it cannot use
 non-Latin/Greek/Cyrillic characters.
 
 I can't think of ever using an (untranslated, English) X program and having
 it display anything but Latin characters.  When is this actually a problem?

For example, XCreateFontSet(-*-times-*) cannot display Japanese
because there are no Japanese fonts that match the name.  (Instead,
mincho and gothic are popular Japanese typefaces.)  This kind of
implementation is often seen in window managers and their
theme files.

---
Tomohiro KUBOTA [EMAIL PROTECTED]
http://www.debian.or.jp/~kubota/
Introduction to I18N  http://www.debian.org/doc/manuals/intro-i18n/




Re: Switching to UTF-8

2002-05-01 Thread Jungshik Shin




On Thu, 2 May 2002, Glenn Maynard wrote:

 On Thu, May 02, 2002 at 11:38:38AM +0900, Tomohiro KUBOTA wrote:
   * input methods
  Any way to input complex languages which cannot be supported
  by xkb mechanism (i.e., CJK) ?  XIM? IIIMP? (How about Gnome2?)
  Or, any software-specific input methods (like Emacs or Yudit)?

 How much extra work do X apps currently need to do to support input
 methods?

 In Windows, you do need to do a little--there's a small API to tell the
 input method the cursor position (for when it opens a character selection
...
 How does this compare with the situation in X?


  I know very little about Win32 APIs, but according to what little
I learned from the Mozilla source code, it doesn't seem to be as simple
in Windows as you wrote, either.  Actually, my impression is that the
Windows IME APIs are almost parallel (concept-wise) to the XIM APIs.  (btw,
MS Windows XP introduced enhanced IM-related APIs called TSF?.) In
both cases, you have to determine what type of preediting support
(in XIM terms, over-the-spot, on-the-spot, off-the-spot and none?)
is shared by clients and the IM server. Depending on the preediting type,
the amount of work to be done by clients varies.


  I'm afraid your impression that Windows IME clients have very little
to do to get keyboard input comes from your not having written programs
that can accept input from CJK IMEs (input method editors), as appears
to be confirmed by what I'm quoting below.

  It just occurred to me that Mozilla.org has an excellent summary
of input method supports on three major platforms (Unix/X11, MacOS,
MS-Windows). See

  http://www.mozilla.org/projects/intl/input-method-spec.html.

 It's little enough to add it easily to programs, but the fact that it
 exists at all means that I can't enter CJK into most programs.  Since
 the regular 8-bit character message is in the system codepage, it's
 impossible to send CJK through.

  Even in English or any SBCS-based Windows 9x/ME, you
can write programs that can accept CJK characters from CJK (global)
IMEs. Mozilla, MS IE, MS Word, and MS OE are good examples.

   Jungshik Shin






Re: Switching to UTF-8

2002-05-01 Thread Jungshik Shin




On Thu, 2 May 2002, Tomohiro KUBOTA wrote:

 At Wed, 01 May 2002 20:02:57 +0100,
 Markus Kuhn wrote:

  I have for some time now been using UTF-8 more frequently than
  ISO 8859-1. The three critical milestones that still keep me from
  moving entirely to UTF-8 are

 How about bash?  Do you know any improvement?

 Please note that tcsh already supports East Asian EUC-like
 multibyte encodings.  I don't know whether it also supports UTF-8.

  It doesn't seem to support UTF-8 locales as of tcsh 6.10.0
(2000-11-19). I can't find anything about UTF-8 at http://www.tcsh.org.
The newest release is 6.11.0. The same is true of zsh
(http://www.zsh.org).

 combining characters?  bidi?  Arab shaping?  Indic scripts?
   and Hangul :-)
 Mongol (which needs vertical direction)?  How about wcwidth()?

  Pango and ST should certainly help, here

  * input methods
 Any way to input complex languages which cannot be supported
 by xkb mechanism (i.e., CJK) ?  XIM? IIIMP? (How about Gnome2?)

  You mean IIIMF, don't you? If there's any actual implementation,
I'd love to try it out. We need a Windows 2k/XP or MacOS 9/X
style keyboard/IM switching mechanism/UI so that keyboard/IM modules
targeted at/customized for each language can coexist and be brought up as
necessary. It appears that IIIMF is the only way unless somebody
writes a gigantic one-size-fits-all XIM server for UTF-8 locale(s).

  How about just running your favorite XIM under ja_JP.EUC-JP while
all other applications are launched under ja_JP.UTF-8? As you know well,
it just works fine although the character repertoire you can enter
is limited to that of EUC-JP. Of course, this is not full-blown UTF-8
support, but at least it should give you the same degree of Japanese
input support under ja_JP.UTF-8 as under ja_JP.EUC-JP. Well, then
you would ask what the point of moving to UTF-8 is. You can at least
display more characters  under UTF-8 than under EUC-JP, can't you? :-)
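
The repertoire limit mentioned above is easy to see with the codecs that ship with Python (used here only as a convenient test bench; the helper name is made up). Text an EUC-JP input method can produce must be encodable in EUC-JP, while a UTF-8 application can still display text that is not.

```python
def encodable(text, encoding):
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(encodable("\u65e5\u672c\u8a9e", "euc_jp"))  # True:  Japanese text
print(encodable("\ud55c\uae00", "euc_jp"))        # False: Hangul is outside EUC-JP
print(encodable("\ud55c\uae00", "utf-8"))         # True
```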

  In the Korean case, as I wrote a couple of days ago, I had to
modify Ami (a popular Korean XIM) to make it run under ko_KR.UTF-8
because otherwise, even though my applications are running under and
fully aware of UTF-8 (e.g. vim under UTF-8 xterm), I couldn't enter
the over 8,000 Hangul syllables that are in UTF-8 but not in EUC-KR.  Moreover,
under ko_KR.UTF-8, Xterm-16x and Vim 6.1 with a single-line patch work
almost flawlessly with U+1100 Hangul Jamos. Markus, can you update your
UTF-8 FAQ on this issue?  Xterm has been supporting Thai script and that
certainly brought in almost automagically Middle Korean support as
a by-product.
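
Some background arithmetic for the numbers above, as a sketch. Unicode defines 19 x 21 x 28 = 11,172 precomposed modern Hangul syllables starting at U+AC00, of which KS X 1001 (the basis of EUC-KR) includes 2,350, which leaves 8,822. Each modern syllable can also be written as a sequence of U+1100 conjoining jamos; Middle Korean syllables have no precomposed form and exist only as such sequences. The function name below is illustrative.

```python
import unicodedata

S_BASE = 0xAC00                         # first precomposed syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # leading, vowel, trailing (incl. none)


def compose(l_index, v_index, t_index=0):
    """Compose a modern Hangul syllable from jamo indices."""
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


print(L_COUNT * V_COUNT * T_COUNT)         # 11172
print(L_COUNT * V_COUNT * T_COUNT - 2350)  # 8822
# U+1112 (index 18) + U+1161 (index 0) + U+11AB (index 4) give U+D55C
print(compose(18, 0, 4) == unicodedata.normalize("NFC", "\u1112\u1161\u11ab"))  # True
```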

  BTW, Xkb may work for Korean Hangul, too and we don't need
XIM  if we use 'three-set keyboard' instead of 'two-set keyboard' and can
live without Hanjas.  I have to know more about Xkb to be certain, though.

 Or, any software-specific input methods (like Emacs or Yudit)?

  Yudit supports Indic, Thai, Arabic pretty well as far as I know.
And, judging from what Gaspar wrote to me, Middle Korean support with
U+1100 jamo is not so far away. Most of what's necessary is firmly in
place because Gaspar has written very generic complex script support
routines which hopefully can be used for Middle Korean without much
effort.

  Jungshik Shin
