Re: [I18n] some misuse in XLC_LOCALE file

Ivan Pascal Mon, 17 May 2004 03:50:06 -0700

  Hi,

> As to the idea of rewriting Xlib using iconv, is there a real working plan,
> or is it just an idea?


I would say there are some thoughts about what can be done and how, but not a
single line of code yet.

First of all, I would like to keep the existing i18n modules as a fallback (they
are separate loadable modules anyway and can coexist with a new iconv-based
module). That means changes to the rest of Xlib's code should be kept as small
as possible.

The thing is that Xlib's converters differ from iconv and can't simply be
replaced; instead we need to wrap iconv calls inside Xlib's converters.
Xlib's converters operate on impersonal WideChar, MultiByte, etc., whose
meaning depends on the current locale, whereas iconv operates on exact
encoding names.  But MultiByte everywhere in Xlib means 'encoding_name', which
can be obtained from the locale name or from a simple table like locale.dir and
used when the converter is created.  For WideChar we can take UCS2 or UCS4.  So
this is the simplest problem.
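To make the idea concrete, here is a minimal sketch of a MultiByte to WideChar converter built on iconv. It assumes the encoding name has already been looked up for the locale (for example from a locale.dir-like table), that WideChar is UCS-4, and that iconv accepts glibc-style names such as "KOI8-R" and "UTF-32BE". The function name mb_to_ucs4 is hypothetical, not an existing Xlib entry point.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a MultiByte string in `encoding_name`
 * (e.g. "KOI8-R", taken from the locale) into UCS-4 code points.
 * Returns the number of code points written to `out`, or -1 if the
 * encoding is unknown, the input is invalid/incomplete, or `out` is full. */
long mb_to_ucs4(const char *encoding_name, const char *mb, size_t mb_len,
                uint32_t *out, size_t out_max)
{
    /* UTF-32BE has a fixed byte order and no BOM, so unpacking is portable. */
    iconv_t cd = iconv_open("UTF-32BE", encoding_name);
    if (cd == (iconv_t)-1)
        return -1;

    unsigned char buf[4 * 256];
    char *in = (char *)mb;
    size_t in_left = mb_len;
    long count = 0;

    while (in_left > 0) {
        char *outp = (char *)buf;
        size_t out_left = sizeof buf;
        size_t rc = iconv(cd, &in, &in_left, &outp, &out_left);
        size_t produced = (sizeof buf - out_left) / 4;

        /* Unpack each big-endian 32-bit unit into a code point. */
        for (size_t i = 0; i < produced; i++) {
            if ((size_t)count >= out_max) {
                iconv_close(cd);
                return -1;
            }
            const unsigned char *p = buf + 4 * i;
            out[count++] = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
        }
        /* E2BIG only means our buffer filled up: loop and continue.
         * EILSEQ/EINVAL mean the MultiByte input itself is bad. */
        if (rc == (size_t)-1 && errno != E2BIG) {
            iconv_close(cd);
            return -1;
        }
    }
    iconv_close(cd);
    return count;
}
```

An Xlib converter wrapper would open the iconv descriptor once at converter creation instead of on every call; opening it per call here just keeps the sketch short.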

The second problem is the CompoundText conversion.  Standard iconv doesn't
support iso2022 for all charsets.  But actually we have three cases for
non-Unicode encodings.  Non-standard encodings (charsets) are the simplest
case.  Their strings are packed into 'extended segments' without any
changes, but we need to know a multibyte length or a complete esc-sequence for
them.  The second case is the set of standard single-byte encodings that are
covered by one (strictly speaking, two) charset.  They don't need any
changes in strings but require a table of designators (esc-sequences).  And
the third case is CJK encodings, which are represented in CTEXT by a few
different charsets.  Fortunately, for them iconv has iso2022* encodings that
are almost the same as CTEXT and can easily be converted (or rather corrected)
to/from CTEXT.
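For the first case (non-standard encodings), the Compound Text extended segment has the layout ESC % / F M L, then the encoding name, STX, and the unchanged text. F is '0' for a variable number of octets per character or '1' to '4' for a fixed width, and M and L carry the length of everything after L as (M - 128) * 128 + (L - 128). The sketch below builds such a segment; the function name is hypothetical, and the encoding name passed in (e.g. "koi8-r") is just an example.

```c
#include <stddef.h>
#include <string.h>

/* Build a Compound Text extended segment:
 *   ESC % / F M L <encoding-name> STX <text>
 * Returns the segment length in bytes, or 0 if F is out of range, the
 * body exceeds the 14-bit length field, or `out` is too small. */
size_t ct_extended_segment(char octets_per_char, const char *encoding_name,
                           const unsigned char *text, size_t text_len,
                           unsigned char *out, size_t out_max)
{
    size_t name_len = strlen(encoding_name);
    size_t body = name_len + 1 + text_len;      /* name + STX + text */

    if (octets_per_char < '0' || octets_per_char > '4')
        return 0;
    if (body > 0x3FFF || 6 + body > out_max)
        return 0;

    unsigned char *p = out;
    *p++ = 0x1B;                                 /* ESC */
    *p++ = '%';
    *p++ = '/';
    *p++ = (unsigned char)octets_per_char;       /* F */
    *p++ = (unsigned char)(0x80 | (body >> 7));  /* M: high 7 bits of length */
    *p++ = (unsigned char)(0x80 | (body & 0x7F));/* L: low 7 bits of length */
    memcpy(p, encoding_name, name_len);
    p += name_len;
    *p++ = 0x02;                                 /* STX */
    memcpy(p, text, text_len);                   /* text is copied unchanged */
    p += text_len;
    return (size_t)(p - out);
}
```

Note that the converter only needs F (the multibyte length) and the name; the string bytes themselves pass through untouched, which is why this case is the easy one.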

But the worst case is the conversion from Unicode to CTEXT.  The same Unicode
codepoint can often be converted into many different charsets, but whether
such CTEXT will be accepted by a Unicode-unaware application depends on the
locale used in that application.  It isn't only a CJK problem.  For example,
Russians still use four different charsets for the same alphabet (the ISO
standard for Cyrillic isn't widely used in Russia except on commercial Unixes;
on free Unixes the Cyrillic standard is 'koi8', which came from Soviet
standards, but there is also 'microsoft codepage 1251', which is often used on
Unixes too).  Therefore for this case we need an ordered list of 'preferred
charsets' (or encodings that can be converted into CTEXT) that should be
configurable per locale.
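A sketch of how such an ordered list could be used: try each preferred charset in turn and take the first one that represents the whole string exactly. It assumes iconv reports unconvertible characters either as an EILSEQ error (glibc) or as a nonzero count of non-identical conversions (POSIX allows substitution), so both are treated as failure. The function name and the example lists are hypothetical.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Return the first charset in `preferred` (tried in order) into which the
 * whole UTF-8 string converts exactly, or NULL if none can represent it.
 * On success the converted bytes are in `out` and their count in *out_len. */
const char *pick_charset(const char *const *preferred, size_t n_preferred,
                         const char *utf8, char *out, size_t out_max,
                         size_t *out_len)
{
    for (size_t i = 0; i < n_preferred; i++) {
        iconv_t cd = iconv_open(preferred[i], "UTF-8");
        if (cd == (iconv_t)-1)
            continue;                       /* charset unknown to this iconv */

        char *in = (char *)utf8;
        size_t in_left = strlen(utf8);
        char *o = out;
        size_t o_left = out_max;

        size_t rc = iconv(cd, &in, &in_left, &o, &o_left);
        /* rc == (size_t)-1: error (e.g. EILSEQ); rc > 0: lossy substitution. */
        int ok = (rc == 0 && in_left == 0);
        /* Flush any pending shift state for stateful target encodings. */
        if (ok && iconv(cd, NULL, NULL, &o, &o_left) == (size_t)-1)
            ok = 0;
        iconv_close(cd);

        if (ok) {
            *out_len = out_max - o_left;
            return preferred[i];
        }
    }
    return NULL;
}
```

With a list like {"KOI8-R", "ISO-8859-5", "CP1251"}, plain Russian text goes out as KOI8-R, while the Ukrainian letter U+0491, which is missing from both KOI8-R and ISO-8859-5, falls through to CP1251. The resulting bytes would then be wrapped in CTEXT with the designator or extended segment that belongs to the chosen charset.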

So in any case we need some 'locale object', not the same as XLCD, that
keeps the 'encoding_name', an esc-sequence for non-standard encodings, the
name of the iso2022* iconv code for locales that need it, and the list of
preferred charsets for UTF-8 locales.
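One way such a 'locale object' might look, as a hypothetical C record with a lookup by locale name. The field names and table rows are illustrative assumptions only; in practice the data would come from a configuration file rather than be compiled in. "\033-L" is the standard designator of the ISO 8859-5 right half into G1 (the second case above), and "\033%/1" is the extended-segment prefix for a one-octet-per-character non-standard encoding (the first case).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-locale record holding what iconv-based converters need,
 * without constructing an XLCD. */
typedef struct {
    const char *locale;            /* locale name, e.g. "ru_RU.KOI8-R" */
    const char *encoding_name;     /* iconv name used for MultiByte */
    const char *ct_gr_designator;  /* designator for standard 8-bit sets, or NULL */
    const char *ct_ext_prefix;     /* extended-segment prefix, or NULL */
    const char *iso2022_code;      /* iconv ISO-2022 code for CJK, or NULL */
    const char *const *preferred;  /* ordered charsets for UTF-8 locales, or NULL */
} LocaleInfo;

/* Illustrative preference order; it should be configurable per locale. */
static const char *const ru_utf8_preferred[] = { "KOI8-R", "CP1251", "ISO-8859-5", NULL };

static const LocaleInfo locale_table[] = {
    { "ru_RU.ISO8859-5", "ISO-8859-5", "\033-L", NULL,       NULL,          NULL },
    { "ru_RU.KOI8-R",    "KOI8-R",     NULL,     "\033%/1",  NULL,          NULL },
    { "ja_JP.eucJP",     "EUC-JP",     NULL,     NULL,       "ISO-2022-JP", NULL },
    { "ru_RU.UTF-8",     "UTF-8",      NULL,     NULL,       NULL,          ru_utf8_preferred },
};

/* Find the record for a locale name; NULL means "fall back to XLCD". */
const LocaleInfo *find_locale_info(const char *locale)
{
    for (size_t i = 0; i < sizeof locale_table / sizeof locale_table[0]; i++)
        if (strcmp(locale_table[i].locale, locale) == 0)
            return &locale_table[i];
    return NULL;
}
```

Returning NULL for unknown locales fits the plan of keeping the existing i18n modules as a fallback.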

> What I want to do with the conversion part is to make the related functions get
> rid of the XLCD binding. It's easy to achieve either by using iconv or by
> reusing code from lcCT.c and lcUTF8.c.

I don't see how lcCT.c and lcUTF8.c could help here.  The lcCT.c functions
are very much oriented toward existing XLCD data, and they have an unpleasant
restriction: they can convert CTEXT to multibyte only if the charset used in
the CTEXT is present in the current locale's XLCD.  If there are two
applications using Cyrillic, but one of them uses 'koi8' whereas the other
uses 'microsoft-cp1251', then for lcCT they are speaking completely different
languages.  The lcUTF8.c module is actually iconv, but with a reduced set of
supported encodings.


> From CTEXT conversion has no problem.
> To CTEXT conversion need charset selection, which can be solve by either
> always use ISO10646-1
I don't think that is the way.  If an application is Unicode-aware, it should
use UTF8_STRING for inter-client communication.  If some application doesn't
know UTF8_STRING, it most likely doesn't accept ISO10646 encapsulated into
CTEXT either.
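In selection terms, this argument amounts to a target preference order when requesting text: ask for UTF8_STRING if the owner offers it, and otherwise fall back to COMPOUND_TEXT in the legacy charsets, not to ISO10646 inside CTEXT. The hypothetical helper below picks the target from the names an owner lists in TARGETS. A real client would compare Atoms obtained with XInternAtom rather than strings; strings keep the sketch standalone.

```c
#include <stddef.h>
#include <string.h>

/* Choose the text target to request from a selection owner, given the
 * target names it advertises: UTF8_STRING first, then COMPOUND_TEXT,
 * then STRING.  Returns NULL if none of them is offered. */
const char *choose_text_target(const char *const *targets, size_t n)
{
    static const char *const order[] = { "UTF8_STRING", "COMPOUND_TEXT", "STRING" };

    for (size_t k = 0; k < sizeof order / sizeof order[0]; k++)
        for (size_t i = 0; i < n; i++)
            if (strcmp(targets[i], order[k]) == 0)
                return order[k];
    return NULL;
}
```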

> or read the preferred charset from localeDB without
> triggering construction of an XLCD object.

Right.  But somewhere we have to have such a list of preferred charsets per
locale.

> The input related part is the most complicated part. Even considering only the
> XIM part, the protocol, imdkit and the client side library should all be
> enhanced. But this makes things get out of control. Changing only the client
> side library and cheating the IM server at some point may be a temporary
> resolution. Maybe IIIMF should become mainstream, but I think more research
> should be done on this point.

Agree.

> In fact, I want more here. I think three kind of input methods, namely
> keyboard mapping, composing and IM server, should have a common point
> to manage. A consistent switch method among different input methods should
> be offered, like Windows does.

I have never used complex input methods on Windows.  Where can I read about
them, or what do I have to install on a Windows box to see this 'consistent
switch method'?

> I noticed the recent discussion on the composing
> method on this mailing list. What Kent Karlsson proposed obviously comes
> from Windows. But his suggestion can't be fulfilled within the current mechanism.
> I remember you worked on making keyboard mapping and composing be synthesized
> on the X server side some time ago.

No.  Probably you mean that I said somewhere it would be good if Compose rules
were part of the keyboard map and kept on the server with the other XKB
tables. :)  But I didn't mean that all mapping and composing should be
performed on the server side.  And I didn't work on it, because it requires
protocol changes that I don't dare to make yet.

> But I think the right place should be on the client
> side. The X server does too much on keyboard mapping. Mapping info and group
> switching process should be put on the client side, and be loaded and set per
> client.

What do you mean by "too much"?  The mapping itself is already performed on
the client side, let alone the composing.  The server sends notifications
(events) about key press/release, but reports only a keycode (scan-code) and
the modifier state, not keysyms or Unicode chars as the result of mapping.
The symbols map kept in the server is not used by the server itself, but an
application can obtain it from this 'centralized repository'.  In theory any
application is free to use any other symbols map obtained from any other
source.  But Xlib doesn't provide an API for that.  It silently loads the map
from the server on the first call of XLookupString.  And what is worse, if
some application wants another symbols map, there is no way except to load
the new map into the server, and then all other applications get a
notification and reload their own maps too.
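To illustrate that the mapping is just a client-side table lookup from (keycode, modifier state), here is a hypothetical sketch of the kind of work XLookupString does after it has fetched the map. It assumes an XKB-style state in which the group is held in bits 13 and 14 (as the XkbGroupForCoreState macro reads it) and Shift is bit 0. The one-entry table is an example only: keycode 38 is the "A" key under the evdev rules, and 0x6c6/0x6e6 are the Cyrillic_ef/Cyrillic_EF keysyms.

```c
#include <stddef.h>

#define N_GROUPS 2
#define N_LEVELS 2
#define SHIFT_MASK 0x0001u            /* same bit as Xlib's ShiftMask */

/* Example symbols-map entry: keysyms indexed by [group][shift level]. */
typedef struct {
    unsigned keycode;
    unsigned long syms[N_GROUPS][N_LEVELS];
} KeyEntry;

static const KeyEntry keymap[] = {
    { 38, { { 0x61, 0x41 }, { 0x6c6, 0x6e6 } } },  /* a A / Cyrillic ef EF */
};

/* Map a keycode and event state to a keysym, entirely on the client. */
unsigned long lookup_keysym(unsigned keycode, unsigned state)
{
    unsigned group = (state >> 13) & 0x3;   /* XKB group bits of the state */
    unsigned level = (state & SHIFT_MASK) ? 1 : 0;

    group %= N_GROUPS;                      /* wrap out-of-range groups */
    for (size_t i = 0; i < sizeof keymap / sizeof keymap[0]; i++)
        if (keymap[i].keycode == keycode)
            return keymap[i].syms[group][level];
    return 0;                               /* NoSymbol */
}
```

Nothing in this lookup requires the table to come from the server; the limitation described above is that Xlib offers no API to supply a different table.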

And anyway, the side (client or server) where the mapping is processed is
only an implementation detail that affects the protocol but not the API.
The more important thing is what we want from this API.  I guess you want
flexibility that allows different applications to have different symbols
maps and use different 'input language engines'.  But if a server could keep
as many different maps as needed, remember which map is used for which client
connection or window, and switch them internally on focus changes, I think it
would be no worse than client side mapping.

And don't forget about the opposite side.  Many people would like to choose
a language (or a country) once at system installation and never think
about different keymaps, locales, etc., even if they run client applications
from different machines.  I remember someone complained that when he ran
clients from other computers they used the same keymap, but the compose rules
were different or didn't work at all for some clients.  I suppose that is a
serious reason to have a 'centralized repository' for keymaps and compose
rules (and IMEs if possible).  On the other hand, it doesn't mean applications
should not be able to load keymaps or connect IMEs from other sources.

As for group switching, I think there are different opinions here too.  I
wrote a small utility that is a keyboard group switcher and indicator.  One of
its main features is that it can remember the XKB group for each window
separately and change it automatically as the focus moves.  This feature is
switched on by default.  But some users asked me how to switch it off, because
they don't like the group being remembered per window; they easily remember
the current 'global' group state and want only a simple indicator. :)
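The per-window behaviour described above, with its 'global' alternative, can be sketched as a small piece of bookkeeping. This is a hypothetical illustration, not the utility's actual code: window IDs are plain unsigned longs (as an X11 Window is), and where this sketch returns the group to apply, a real switcher would call XkbLockGroup(dpy, XkbUseCoreKbd, group) on a focus change.

```c
#include <stddef.h>

#define MAX_WINDOWS 64

/* Hypothetical group memory for a keyboard group switcher. */
typedef struct {
    int per_window;                 /* 1: remember per window, 0: one global group */
    int global_group;               /* last group the user selected anywhere */
    size_t n;                       /* number of remembered windows */
    unsigned long win[MAX_WINDOWS];
    int group[MAX_WINDOWS];
} GroupMemory;

/* Record the group the user selected while window `w` had the focus. */
void remember_group(GroupMemory *m, unsigned long w, int g)
{
    m->global_group = g;
    if (!m->per_window)
        return;
    for (size_t i = 0; i < m->n; i++)
        if (m->win[i] == w) {
            m->group[i] = g;
            return;
        }
    if (m->n < MAX_WINDOWS) {       /* silently stop remembering when full */
        m->win[m->n] = w;
        m->group[m->n] = g;
        m->n++;
    }
}

/* On a focus change to window `w`, return the group that should be locked. */
int group_for_focus(const GroupMemory *m, unsigned long w)
{
    if (!m->per_window)
        return m->global_group;     /* 'global' mode: group never changes */
    for (size_t i = 0; i < m->n; i++)
        if (m->win[i] == w)
            return m->group[i];
    return 0;                       /* unseen windows start in the first group */
}
```

Switching per_window off gives exactly the 'simple indicator' behaviour some users asked for.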

> In my ideal situation, an application written with the new API only (I mean i18n
> related) will use Unicode internally, and won't have an XLCD object created.
> However, old applications will work just as they used to.

As I said, if a group of applications uses Unicode internally, UTF8_STRING for
communication, Xft for output, and doesn't use complex input methods (IM
servers), they don't need any additional i18n API (except, maybe, a
keysym-to-UCS table).
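Such a keysym-to-UCS table is small in structure. Latin-1 keysyms already equal their code points, keysyms from 0x01000100 to 0x0110FFFF encode Unicode directly as 0x01000000 plus the code point, and only the legacy keysym blocks need a real table. The sketch below shows that structure with just three Cyrillic entries; a complete table would cover every legacy keysym, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A few legacy (non-Unicode) keysyms and their code points. */
static const struct { unsigned long keysym; uint32_t ucs; } legacy[] = {
    { 0x6c1, 0x0430 },   /* Cyrillic_a  */
    { 0x6c6, 0x0444 },   /* Cyrillic_ef */
    { 0x6e6, 0x0424 },   /* Cyrillic_EF */
};

/* Convert a keysym to a UCS code point; 0 means "no character". */
uint32_t keysym_to_ucs(unsigned long ks)
{
    /* Latin-1 printable keysyms equal their code points. */
    if ((ks >= 0x20 && ks <= 0x7e) || (ks >= 0xa0 && ks <= 0xff))
        return (uint32_t)ks;
    /* Unicode keysyms: 0x01000000 + code point. */
    if (ks >= 0x01000100 && ks <= 0x0110ffff)
        return (uint32_t)(ks - 0x01000000);
    /* Everything else needs the table. */
    for (size_t i = 0; i < sizeof legacy / sizeof legacy[0]; i++)
        if (legacy[i].keysym == ks)
            return legacy[i].ucs;
    return 0;   /* function keys such as Return have no character */
}
```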

-- 
 Ivan U. Pascal         |   e-mail: [EMAIL PROTECTED]
   Administrator of     |   Tomsk State University
     University Network |       Tomsk, Russia
_______________________________________________
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n
