Re: One key stroke --> two code-points

2008-06-15 Thread Nguyen Thai Ngoc Duy

On 6/15/08, Clytie Siddall <[EMAIL PROTECTED]> wrote:
> > However, if there is a need for decomposed forms anyway, it is good to
> > know about it.
>
> I don't think there's much of a need, but there are definitely still
> decomposed layouts and old input versions around, and especially old fonts.
> We quite often get "bug" reports because people are still using pre-Unicode
> fonts.

That dates from many years ago, when people dropped custom charsets (VNI,
TCVN...) in favor of Unicode. There were two groups: one favored the
precomposed form, the other preferred the decomposed form. There was no
standardization, so people just used what they liked. Everyone is
using the precomposed form now, I believe.
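
To make the two forms concrete, here is a minimal Python sketch using the standard `unicodedata` module. It is not part of the original message, and it assumes the second form mentioned above is the decomposed one (a base letter followed by combining marks). The sample letter and the helper name `code_points` are chosen only for illustration.

```python
import unicodedata

# Vietnamese letter "e with circumflex and acute", chosen for illustration.
precomposed = "\u1ebf"

# NFD splits it into a base letter plus combining marks.
decomposed = unicodedata.normalize("NFD", precomposed)

# NFC recombines the sequence into the single precomposed code point.
recomposed = unicodedata.normalize("NFC", decomposed)


def code_points(text):
    """Return the code points of text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(precomposed))   # one code point
print(code_points(decomposed))    # base letter followed by two combining marks
print(recomposed == precomposed)  # the round trip is lossless
```

Both strings render identically with a capable font, which is why a user may not notice which form a keyboard layout produces.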

> > For Vietnamese, it is important to look at the xkeyboard-config project
> > and check what the default layout does, and that it is a reasonable choice.
>
> OK. I'll try to chase that up. However, Duy is probably the best person to
> do that, because he has been involved with input software. Duy, do you have
> time to check this? (The original discussion is pasted below, for
> reference.)

It maps alphanumeric keys to precomposed letters. But I don't
think this layout really matters in real life. Vietnamese input
methods have a long tradition of doing things their own way: US layout, no
dead keys, no preedit window, usually emitting backspaces to modify
some letters. These input methods operate at the "word" level instead of
the letter level, which resembles how Vietnamese text is written.
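
The backspace technique mentioned above can be sketched roughly as follows. This is a hypothetical Python illustration, not the code of any real Vietnamese input method: the function name `keystrokes_for_rewrite` and the example word are assumptions, and it assumes one backspace deletes one precomposed character. Given the word as it currently appears in the application and the word the input method wants, it computes how many backspaces to emit and which text to type afterwards.

```python
def keystrokes_for_rewrite(current, wanted):
    """Return (backspaces, text) that turn `current` into `wanted`.

    Hypothetical helper: keep the longest common prefix, erase the rest
    with backspaces, then type the new tail.
    """
    prefix = 0
    for a, b in zip(current, wanted):
        if a != b:
            break
        prefix += 1
    return len(current) - prefix, wanted[prefix:]


# The user has typed "viet" and a tone key changes the whole word:
backspaces, text = keystrokes_for_rewrite("viet", "vi\u1ec7t")
print(backspaces, text)
```

Working on the whole word lets the input method change a vowel in the middle of the word after later letters have already been typed.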
-- 
Duy
___
gnome-i18n mailing list
gnome-i18n@gnome.org
http://mail.gnome.org/mailman/listinfo/gnome-i18n

Re: One key stroke --> two code-points

2008-06-15 Thread Clytie Siddall


Thanks for your prompt and helpful reply, Simos. :)

On 15/06/2008, at 12:55 AM, Simos Xenitellis wrote:

> Clytie Siddall wrote:
> > Just checking: so this problem does not affect languages using
> > precomposed Unicode?
> >
> > Vietnamese users _should_ be using precomposed forms for our added
> > and combined diacritics. But I wonder if we should be ready for the
> > fact that they might not. I was using a keyboard layout for a while
> > which was decomposed, and I didn't know it. That could happen to
> > others, too.
>
> With precomposed characters, the compose sequences look like
>
> --->  single codepoint
>
> Producing a single codepoint is well defined, and has been available
> from the start.
>
> When no precomposed forms exist, then
>
> --->  codepointA, codepointB
>
> This was not used in the X.Org Compose file (the Khmer compose
> sequences, the first such sequences, were added to X.Org just a few
> days back).
>
> One thing I do not know about the Vietnamese written language is:
> are there characters (with combined diacritics) for which no
> corresponding precomposed forms exist?
> That is, do characters exist that you cannot type using the
> typical dead keys?

No, despite the fact that our glyphs are scattered all over the
Unicode plane, there are precomposed forms for all our characters.

> However, if there is a need for decomposed forms anyway, it is good
> to know about it.

I don't think there's much of a need, but there are definitely still
decomposed layouts and old input versions around, and especially old
fonts. We quite often get "bug" reports because people are still using
pre-Unicode fonts.

> For Vietnamese, it is important to look at the xkeyboard-config
> project and check what the default layout does, and that it is a
> reasonable choice.

OK. I'll try to chase that up. However, Duy is probably the best
person to do that, because he has been involved with input software.
Duy, do you have time to check this? (The original discussion is
pasted below, for reference.)


from Clytie

Vietnamese Free Software Translation Team
http://vnoss.net/dokuwiki/doku.php?id=projects:l10n



Re: One key stroke --> two code-points

2008-06-14 Thread Simos Xenitellis


Clytie Siddall wrote:
> Just checking: so this problem does not affect languages using
> precomposed Unicode?
>
> Vietnamese users _should_ be using precomposed forms for our added and
> combined diacritics. But I wonder if we should be ready for the fact
> that they might not. I was using a keyboard layout for a while which
> was decomposed, and I didn't know it. That could happen to others, too.

With precomposed characters, the compose sequences look like

--->  single codepoint

Producing a single codepoint is well defined, and has been available
from the start.

When no precomposed forms exist, then

--->  codepointA, codepointB

This was not used in the X.Org Compose file (the Khmer compose
sequences, the first such sequences, were added to X.Org just a few
days back).

One thing I do not know about the Vietnamese written language is:
are there characters (with combined diacritics) for which no corresponding
precomposed forms exist?
That is, do characters exist that you cannot type using the typical
dead keys?

However, if there is a need for decomposed forms anyway, it is good to
know about it.

For Vietnamese, it is important to look at the xkeyboard-config project
and check what the default layout does, and that it is a reasonable choice.

Simos


Re: One key stroke --> two code-points

2008-06-14 Thread Clytie Siddall

Just checking: so this problem does not affect languages using  
precomposed Unicode?


Vietnamese users _should_ be using precomposed forms for our added and  
combined diacritics. But I wonder if we should be ready for the fact  
that they might not. I was using a keyboard layout for a while which  
was decomposed, and I didn't know it. That could happen to others, too.
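
A small Python sketch of how one might check which form a piece of text uses, for illustration only. The helper name `uses_decomposed_forms` is an assumption; it relies on the standard `unicodedata` module and simply tests whether NFC normalization would change the text.

```python
import unicodedata


def uses_decomposed_forms(text):
    """Return True if NFC normalization would change `text`.

    For Vietnamese text this indicates base letters followed by
    combining marks where a precomposed letter is available.
    """
    return unicodedata.normalize("NFC", text) != text


# Same word, typed with a decomposed and a precomposed layout.
decomposed_word = "Vie\u0323\u0302t"   # e + dot below + circumflex
precomposed_word = "Vi\u1ec7t"         # single precomposed letter

print(uses_decomposed_forms(decomposed_word))
print(uses_decomposed_forms(precomposed_word))
```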


Clytie

Re: One key stroke --> two code-points

2008-06-09 Thread Anousak Souphavanh

Thanks, Simos, for your kindness and time.

Much appreciation to Javier for bringing up a good solution indeed.

The Lao input method needs a similar solution. Javier, please post your
solution (where and how to define a new table for Khmer) so I can
define these code points for Lao.

Thanks,
Anousak
The Lao team

Re: One key stroke --> two code-points

2008-06-09 Thread Jens Herden

Hi Simos,

> Checking at
> http://gitweb.freedesktop.org/?p=xorg/lib/libX11.git;a=tree;f=nls/en_US.UTF-8
> does not show these lines at the end. It is possible that these compose
> sequences were added as a patch to the distribution package.

Yes indeed, this addition was made by Suse for their package. I do not know
whether any other distro has picked this up too.
But I have filed a bug report for Xorg about this, which is not resolved yet:
https://bugs.freedesktop.org/show_bug.cgi?id=5706

Jens

Re: One key stroke --> two code-points

2008-06-09 Thread Simos Xenitellis


Javier SOLA wrote:
> Thanks Simos !!
>
> Actually, we have had these additions for a while in X11.

Hi Javier,

Checking at
http://gitweb.freedesktop.org/?p=xorg/lib/libX11.git;a=tree;f=nls/en_US.UTF-8
does not show these lines at the end. It is possible that these compose
sequences were added as a patch to the distribution package.

> We will file an issue for GTK+, and use the variable in the meantime.
>
> What file is it in GTK+? I have not been able to find it.

In GTK+ (HEAD), the relevant file is
http://svn.gnome.org/viewvc/gtk%2B/trunk/gtk/gtkimcontextsimple.c?view=markup

However, your compose sequences differ from the existing
compose sequences, which result in a single codepoint (you need to
produce two codepoints).

Therefore, the type of support you are looking for is similar to compose
sequences that result in letter+diacritic mark. Several languages have
characters for which no precomposed letters exist, so the compose sequence
produces letter+diacritic marks (more than one codepoint). Such support
is missing, and there are already bug reports for it.

Bug 341341 – Compose mechanism in simple input method doesn't support
decomposed forms
http://bugzilla.gnome.org/show_bug.cgi?id=341341

Bug 345254 – dead accents should at least produce combining characters
http://bugzilla.gnome.org/show_bug.cgi?id=345254

There is a shortcut for solving the above cases of compose
sequences, so I expect the solution to be different from the one for the
Khmer compose sequences.
Specifically, for the Latin compose sequences, such as (it's a made-up
example)

  : "t́" # LETTER T WITH ACUTE

one could convert to something like [dead_acute, 't', 0].
We would put 0 for the resulting codepoint because we can deduce for
this category of compose sequences that the actual codepoints are 't'
and 'acute' (the resulting codepoints match the body of the compose
sequence).

However, for the case of Khmer, the compose sequences look independent
from the resulting code points. Therefore, a new table would be required.


To cut the story short, I have filed a bug report for this,
Bug 537457 – Support compose sequences that produce two+ codepoints
http://bugzilla.gnome.org/show_bug.cgi?id=537457
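
A rough Python sketch of the two approaches discussed in this message, for illustration only. It is not GTK+ code: the table layouts, the names `DEAD_KEY_MARKS`, `LATIN_TABLE`, `MULTI_TABLE` and `compose`, and the key name `khmer_key_1` are all assumptions. The first table uses 0 as a marker meaning "derive the output from the sequence itself"; the second maps a sequence directly to an arbitrary string of code points, which is what the Khmer case needs.

```python
# Hypothetical mapping from dead keys to their combining marks.
DEAD_KEY_MARKS = {"dead_acute": "\u0301"}  # COMBINING ACUTE ACCENT

# Latin-style entries: (dead key, letter) -> resulting code point.
# 0 means "no precomposed form; emit letter + combining mark".
LATIN_TABLE = {
    ("dead_acute", "e"): 0x00E9,  # a precomposed letter exists
    ("dead_acute", "t"): 0,       # the made-up example from the message
}

# Separate table for sequences whose output is unrelated to the keys pressed.
MULTI_TABLE = {
    # KHMER VOWEL SIGN U + KHMER SIGN REAHMUK (two code points)
    ("khmer_key_1",): "\u17bb\u17c7",
}


def compose(sequence):
    """Return the string produced by a key sequence, or None if unknown."""
    sequence = tuple(sequence)
    if sequence in MULTI_TABLE:
        return MULTI_TABLE[sequence]
    if sequence in LATIN_TABLE:
        result = LATIN_TABLE[sequence]
        if result != 0:
            return chr(result)
        dead_key, letter = sequence
        return letter + DEAD_KEY_MARKS[dead_key]
    return None


print(len(compose(["dead_acute", "t"])))  # letter plus combining mark
print(len(compose(["khmer_key_1"])))      # two Khmer code points
```

The 0 marker keeps the Latin table compact because the output can be rebuilt from the keys, while the Khmer entries have to store their output explicitly.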

Simos



Re: One key stroke --> two code-points

2008-06-09 Thread Javier SOLA


Thanks Simos !!

Actually, we have had these additions for a while in X11.

We will file an issue for GTK+, and use the variable in the meantime.

What file is it in GTK+? I have not been able to find it.

Thanks,

Javier


Re: One key stroke --> two code-points

2008-06-09 Thread Simos Xenitellis


Javier SOLA wrote:

Hi,

I am working on Khmer localization (KhmerOS project).

In Khmer, some of the basic vowels (which we include in the keyboard) 
require two code-points, so one keystroke must generate two code points.


It used to be that we could do the conversion in XKB by generating a 
fictitious code-point (Pablo Saratxaga explained this to us a few years 
ago), which was later translated to two real code-points by putting the 
conversion in the en-US locale file. It did work at the time.


But now this seems to have stopped working. Does anybody know how we 
can fix this? 

These additions (pressing a single key and producing two codepoints) 
are located at

/usr/share/X11/locale/en_US.UTF-8/Compose

The specific lines appear to be

# Khmer digraphs
# A keystroke has to generate several characters, so they are defined
# in this file

:   "ុះ"
:   "ុំ"
:   "េះ"
:   "ោះ"
:   "ាំ"
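
As a quick check that each of the five results above really is a pair of code points, here is a minimal Python sketch. The strings are written as escapes so that the code points are explicit; the helper name `describe` is only for illustration and is not part of any Compose tooling.

```python
import unicodedata

# The five result strings from the Compose excerpt, in the same order,
# spelled out as escapes so each code point is visible.
KHMER_DIGRAPHS = [
    "\u17bb\u17c7",
    "\u17bb\u17c6",
    "\u17c1\u17c7",
    "\u17c4\u17c7",
    "\u17b6\u17c6",
]


def describe(text):
    """Return a (code point, Unicode name) pair for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


for digraph in KHMER_DIGRAPHS:
    # Each entry is one keystroke's worth of output but two code points long.
    print(digraph, len(digraph), describe(digraph))
```

Every entry has length 2, which is why a single keysym-to-character mapping is not enough and a compose rule with a string result is used instead.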

GTK+ based applications duplicate the Compose file in the gtk+ library, 
and currently the version of the Compose file that exists in gtk+ does 
not include those specific compose sequences.

I think these are a recent addition.
Technically, it is possible for gtk+ to include compose sequences that 
produce more than one code point (this requires a small change in the 
code); however, these recent Khmer digraphs are the only compose 
sequences using the facility now.


To cut a long story short, for now you can bypass the GTK+ version of 
the Compose file and use the Compose file that comes with X.Org (shown 
above) by setting the environment variable GTK_IM_MODULE to "xim".

This should not have any adverse effect on the OLPC software.
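
The workaround can be applied per application rather than globally. A rough Python sketch follows; the only thing taken from the discussion is the variable GTK_IM_MODULE and its value "xim". The function names and the application name in the comment are placeholders of my own.

```python
import os
import subprocess


def xim_environment(base_env=None):
    """Copy an environment and force GTK+ to use the XIM input module."""
    env = dict(os.environ if base_env is None else base_env)
    env["GTK_IM_MODULE"] = "xim"
    return env


def launch_with_xim(command):
    """Start a GTK+ application with the modified environment."""
    return subprocess.Popen(command, env=xim_environment())


# Example (not run here; "gedit" is just a stand-in for any GTK+ program):
# launch_with_xim(["gedit"])
```

Copying the environment keeps the change local to the launched program, so other applications keep their usual input method.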

If other keyboard layouts (such as Serbian) also require compose 
sequences that produce two or more codepoints, it is important to add 
them to the X.Org Compose file. In the next update of GTK+, all these 
compose sequences can then make it in.
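
To find out which entries in a Compose file already produce two or more code points, something like the following Python sketch could be used. It assumes the usual line shape of key names in angle brackets, a colon, then a quoted result string; the regular expression is a simplification that ignores escaped quotes, and `<placeholder_key>` in the sample is a made-up key name, not one from the real file.

```python
import re

# Matches lines such as:  <Multi_key> <a> <e> : "..." keysym  # comment
# Simplification: escaped quotes inside the result string are not handled.
COMPOSE_LINE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"([^"]*)"')


def multi_codepoint_entries(compose_text):
    """Return (key sequence, result) pairs whose result has 2+ code points."""
    found = []
    for line in compose_text.splitlines():
        match = COMPOSE_LINE.match(line)
        if match is None:
            continue  # comment, blank line, or a line this sketch cannot parse
        keys = " ".join(match.group(1).split())
        result = match.group(2)
        if len(result) >= 2:
            found.append((keys, result))
    return found


# Tiny sample; <placeholder_key> is hypothetical and used only for this sketch.
SAMPLE = '''
# Khmer digraphs
<Multi_key> <a> <e> : "\u00e6" ae
<placeholder_key> : "\u17b6\u17c6"
'''

for keys, result in multi_codepoint_entries(SAMPLE):
    print(keys, "->", [f"U+{ord(ch):04X}" for ch in result])
```

To scan a real file, read its contents (for example the Compose path given above) and pass them to `multi_codepoint_entries`.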


Simos



One key stroke --> two code-points

2008-06-08 Thread Javier SOLA


Hi,

I am working on Khmer localization (KhmerOS project).

In Khmer, some of the basic vowels (which we include in the keyboard) 
require two code-points, so one keystroke must generate two code points.


It used to be that we could do the conversion in XKB by generating a 
fictitious code-point (Pablo Saratxaga explained this to us a few years 
ago), which was later translated to two real code-points by putting the 
conversion in the en-US locale file. It did work at the time.


But now this seems to have stopped working. Does anybody know how we 
can fix this?


Thanks,

Javier
