Re: Static compose table in gtkimcontextsimple.c

2008-01-31 Thread Simos Xenitellis

On Tue, 2007-12-04 at 08:31 +0200, Tor Lillqvist wrote:
> > GDK_dead_circumflex, GDK_C, 0, 0, 0, 0x0108, /* LATIN_CAPITAL_LETTER_C_WITH_CIRCUMFLEX */
> > [...]
> > GDK_dead_circumflex, GDK_c, 0, 0, 0, 0x0109, /* LATIN_SMALL_LETTER_C_WITH_CIRCUMFLEX */
> > [...]
> 
> The sequences you list are exactly of the straightforward kind that in
> my opinion can and should be handled algorithmically. I.e. a "dead"
> accent followed by a letter can be mapped to the corresponding
> precomposed character without an explicit table. I have a patch in bug
> #321896 that implements such an algorithm (and which would handle your
> cases, too.) Basically it's waiting for a second opinion from the GTK+
> maintainers.

I made two small changes to the patch (now at #321896):
1. If diacritic marks belong to the same combining class, normalisation
does not reorder them, so we need to try all permutations and attempt to
normalise again for each one.
2. Added a check for overlong compose sequences; otherwise one can type
too many dead keys and overflow the buffer.
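
The two changes can be sketched roughly as follows. This is a minimal,
self-contained illustration and not the code from the patch:
MAX_COMPOSE_LEN, ComposeBuffer, compose_buffer_push, try_permutations and
the ComposeCheck callback are hypothetical names, and the callback stands
in for whatever normalisation-based test the patch really performs.
demo_check is a toy that knows a single ordering only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_COMPOSE_LEN 7   /* hypothetical limit on keys per sequence */

typedef struct {
    uint32_t keys[MAX_COMPOSE_LEN];
    size_t   len;
} ComposeBuffer;

/* Change 2: refuse to append once the buffer is full instead of
 * writing past the end of the array. */
static bool compose_buffer_push(ComposeBuffer *buf, uint32_t key)
{
    assert(buf != NULL);
    if (buf->len >= MAX_COMPOSE_LEN)
        return false;       /* overlong sequence: caller should reset */
    buf->keys[buf->len++] = key;
    return true;
}

/* Stand-in for "normalise base + marks and see whether the result is a
 * single precomposed character". Returns true and sets *out on success. */
typedef bool (*ComposeCheck)(uint32_t base, const uint32_t *marks,
                             size_t n_marks, uint32_t *out);

static void swap_marks(uint32_t *a, uint32_t *b)
{
    uint32_t t = *a;
    *a = *b;
    *b = t;
}

/* Change 1: try every ordering of marks[k..n-1] by plain recursive
 * permutation, stopping at the first ordering that composes. The
 * caller's array is restored to its original order before returning. */
static bool try_permutations(uint32_t base, uint32_t *marks, size_t k,
                             size_t n, ComposeCheck check, uint32_t *out)
{
    if (k == n)
        return check(base, marks, n, out);
    for (size_t i = k; i < n; i++) {
        swap_marks(&marks[k], &marks[i]);
        bool ok = try_permutations(base, marks, k + 1, n, check, out);
        swap_marks(&marks[k], &marks[i]);   /* restore order */
        if (ok)
            return true;
    }
    return false;
}

/* Toy check for the demo: it only recognises u + U+0308 + U+0304,
 * which is the canonical decomposition of U+01D6. */
static bool demo_check(uint32_t base, const uint32_t *marks,
                       size_t n_marks, uint32_t *out)
{
    if (base == 0x0075 && n_marks == 2 &&
        marks[0] == 0x0308 && marks[1] == 0x0304) {
        *out = 0x01D6;
        return true;
    }
    return false;
}
```

With marks given as { U+0304, U+0308 }, the first ordering fails the toy
check and the second one succeeds, which is the situation change 1 is
meant to cover.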

I added a script at #321896 as well that parses UnicodeData.txt, and
checks and counts all characters that can be handled by the algorithmic
function. There are about 1000 of them.

Simos

___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: Static compose table in gtkimcontextsimple.c

2007-12-06 Thread Tor Lillqvist
I wrote:
> > a "dead"
> > accent followed by a letter can be mapped to the corresponding
> > precomposed character without an explicit table.

On 06/12/2007, Paul LeoNerd Evans <[EMAIL PROTECTED]> wrote:
> Really..? Last time I checked, the precomposed letters weren't in any
> particularly easy-to-find locations;

Well, obviously there have to be some tables somewhere (in GLib's case
I guess they are in the generated header files like gunicomp.h), but I
meant that the information shouldn't have to be effectively duplicated
in gtk.

--tml


Re: Static compose table in gtkimcontextsimple.c

2007-12-06 Thread Owen Taylor

On Thu, 2007-12-06 at 17:30 +, Paul LeoNerd Evans wrote:
> On Thu, 06 Dec 2007 12:12:39 -0500
> Owen Taylor <[EMAIL PROTECTED]> wrote:
> 
> > Note also that loading /usr/share/X11/locale/en_US.UTF-8/Compose
> 
> That's not quite what I meant.
> 
> What I meant was, I thought that the X11 server did some of this work
> for us? So can we not ask it to do that?
> 
> Or have I misunderstood how it works, and that this is really a
> clientside thing done by Xlib?

The latter.

- Owen





Re: Static compose table in gtkimcontextsimple.c

2007-12-06 Thread Paul LeoNerd Evans
On Thu, 06 Dec 2007 12:12:39 -0500
Owen Taylor <[EMAIL PROTECTED]> wrote:

> Note also that loading /usr/share/X11/locale/en_US.UTF-8/Compose

That's not quite what I meant.

What I meant was, I thought that the X11 server did some of this work
for us? So can we not ask it to do that?

Or have I misunderstood how it works, and that this is really a
clientside thing done by Xlib?

-- 
Paul "LeoNerd" Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Static compose table in gtkimcontextsimple.c

2007-12-06 Thread Owen Taylor
On Thu, 2007-12-06 at 12:28 +, Paul LeoNerd Evans wrote:
> On Tue, 04 Dec 2007 05:38:56 +
> Simos Xenitellis <[EMAIL PROTECTED]> wrote:
> 
> > If you would like to help with bug 321896 it would be great. The current 
> > state is on how to make the table much smaller, even with the addition of
> > more keysyms. There is a script that converts en_US.UTF-8/Compose into a
> > series of arrays that should be easy for GTK+ to work on. 
> 
> OK, I've had a good read through that bug, and now I'm confused again.
> 
> Can someone explain why GTK has to have this large table compiled into
> it..? I thought X itself provided ways to perform input composition into
> Unicode strings. Otherwise, why do I have a file
> 
>   /usr/share/X11/locale/en_US.UTF-8/Compose
> 
> Can we just use that?

Note also that loading /usr/share/X11/locale/en_US.UTF-8/Compose causes
a large amount of per-process memory to be allocated, and quite a bit of
time spent parsing it. While the GTK+ table is "large", it is mapped
read-only so shared between all GTK+ applications. (*) (**)

I don't have any exact or recent numbers here; the Compose table was a
significant fraction of the per-process overhead when I measured it
before writing gtkimcontextsimple.c, and the current UTF-8 table is much
bigger than anything I measured. On the other hand, it's possible that
optimization has been done within Xlib in the subsequent 5-6 years.

The original motivations in order of priority:

 1. Reliable compose sequences in non-UTF-8 locales
 2. Efficiency
 3. Cross-platform portability
 
Item 1 is luckily no longer an issue, but the other two still apply.

- Owen

(*) The Xlib problem could obviously be fixed by precompiling and
  mem-mapping the Compose tables, as we do for similar things.

(**) The one thing to be careful about when modifying
gtkimcontextsimple.c is not to save "size" by introducing relocations.
Arrays that include pointers to other arrays cannot be mapped read-only.
Other than that, go for it!
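
The relocation point in (**) can be illustrated with a small sketch. It
is not taken from gtkimcontextsimple.c: compose_data, row_offset and
compose_row are hypothetical names, and the rows are a made-up excerpt.
The idea shown is that a const array holding only integers needs no
load-time fix-ups, whereas a const array of pointers in a shared library
typically does, because the addresses are only known once the library is
loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Relocation-prone layout, shown for contrast only: every element is an
 * address that the dynamic linker has to patch in each process.
 *
 *   static const uint16_t *const rows_by_pointer[] = { row_a, row_b };
 */

/* Relocation-free layout: one flat array of integers ... */
static const uint16_t compose_data[] = {
    /* dead_circumflex, C -> U+0108 */ 0xFE52, 0x0043, 0x0108,
    /* dead_circumflex, c -> U+0109 */ 0xFE52, 0x0063, 0x0109,
    /* dead_caron,      U -> U+01D3 */ 0xFE5A, 0x0055, 0x01D3,
};

/* ... plus integer offsets into it instead of pointers. */
static const uint16_t row_offset[] = { 0, 3, 6 };

#define N_ROWS (sizeof row_offset / sizeof row_offset[0])

/* The pointer is computed at run time, so nothing in the const data
 * itself depends on where the library was loaded. */
static const uint16_t *compose_row(size_t i)
{
    assert(i < N_ROWS);
    return compose_data + row_offset[i];
}
```

With fixed-width rows the offset array could be replaced by a
multiplication; it is kept here because it is the direct substitute for
an array of pointers when rows have different lengths.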



Re: Static compose table in gtkimcontextsimple.c

2007-12-06 Thread Paul LeoNerd Evans
On Tue, 04 Dec 2007 05:38:56 +
Simos Xenitellis <[EMAIL PROTECTED]> wrote:

> If you would like to help with bug 321896 it would be great. The current 
> state is on how to make the table much smaller, even with the addition of
> more keysyms. There is a script that converts en_US.UTF-8/Compose into a
> series of arrays that should be easy for GTK+ to work on. 

OK, I've had a good read through that bug, and now I'm confused again.

Can someone explain why GTK has to have this large table compiled into
it..? I thought X itself provided ways to perform input composition into
Unicode strings. Otherwise, why do I have a file

  /usr/share/X11/locale/en_US.UTF-8/Compose

Can we just use that?

-- 
Paul "LeoNerd" Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Static compose table in gtkimcontextsimple.c

2007-12-06 Thread Paul LeoNerd Evans
On Tue, 4 Dec 2007 08:31:30 +0200
"Tor Lillqvist" <[EMAIL PROTECTED]> wrote:

> The sequences you list are exactly of the straightforward kind that in
> my opinion can and should be handled algorithmically. I.e. a "dead"
> accent followed by a letter can be mapped to the corresponding
> precomposed character without an explicit table.

Really..? Last time I checked, the precomposed letters weren't in any
particularly easy-to-find locations; I looked them up by typing them in
xterm and seeing what unicode sequences were generated.

> I have a patch in bug #321896 that implements such an algorithm (and
> which would handle your cases, too.) Basically it's waiting for a
> second opinion from the GTK+ maintainers.

Perhaps we could subtly poke them here then to remind them? :)

-- 
Paul "LeoNerd" Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Static compose table in gtkimcontextsimple.c

2007-12-06 Thread Matthias Clasen
On Dec 6, 2007 8:22 AM, Simos Xenitellis <[EMAIL PROTECTED]> wrote:

> I just compiled Tor's working patch which actually eliminates most of
> the compose sequences and it is amazing in the way it simplifies the work.
> I think it is the way to go once the small issues are resolved.

Thanks for staying on this issue for so long, Simos. It will be good
to have this finally resolved.


Re: Static compose table in gtkimcontextsimple.c

2007-12-06 Thread Simos Xenitellis

On Thu, 2007-12-06 at 12:28 +, Paul LeoNerd Evans wrote:
> On Tue, 04 Dec 2007 05:38:56 +
> Simos Xenitellis <[EMAIL PROTECTED]> wrote:
> 
> > If you would like to help with bug 321896 it would be great. The current 
> > state is on how to make the table much smaller, even with the addition of
> > more keysyms. There is a script that converts en_US.UTF-8/Compose into a
> > series of arrays that should be easy for GTK+ to work on. 
> 
> OK, I've had a good read through that bug, and now I'm confused again.
> 
> Can someone explain why GTK has to have this large table compiled into
> it..? I thought X itself provided ways to perform input composition into
> Unicode strings. Otherwise, why do I have a file
> 
>   /usr/share/X11/locale/en_US.UTF-8/Compose
> 
> Can we just use that?

GTK+ also runs on other platforms, and that requires it to carry a
separate copy.
Having the file contents in the library as static data is good for
performance and memory use.

I just compiled Tor's working patch which actually eliminates most of 
the compose sequences and it is amazing in the way it simplifies the work.
I think it is the way to go once the small issues are resolved.

Simos



Re: Static compose table in gtkimcontextsimple.c

2007-12-03 Thread Tor Lillqvist
> GDK_dead_circumflex, GDK_C, 0, 0, 0, 0x0108, /* LATIN_CAPITAL_LETTER_C_WITH_CIRCUMFLEX */
> [...]
> GDK_dead_circumflex, GDK_c, 0, 0, 0, 0x0109, /* LATIN_SMALL_LETTER_C_WITH_CIRCUMFLEX */
> [...]

The sequences you list are exactly of the straightforward kind that in
my opinion can and should be handled algorithmically. I.e. a "dead"
accent followed by a letter can be mapped to the corresponding
precomposed character without an explicit table. I have a patch in bug
#321896 that implements such an algorithm (and which would handle your
cases, too.) Basically it's waiting for a second opinion from the GTK+
maintainers.
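
A minimal sketch of the first half of such an approach is below. It is
an assumption about the general shape and not the patch from the bug:
the KS_* constants, dead_key_to_combining and build_decomposed are
hypothetical names, and only four dead keys are covered. The sketch maps
each dead keysym to its combining mark and builds the decomposed
sequence; turning that into a precomposed character is left to the
Unicode library's canonical composition (NFC normalisation), so no
per-letter table is needed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* X11 keysym values for a few dead keys. */
enum {
    KS_DEAD_GRAVE      = 0xFE50,
    KS_DEAD_ACUTE      = 0xFE51,
    KS_DEAD_CIRCUMFLEX = 0xFE52,
    KS_DEAD_CARON      = 0xFE5A
};

/* Map a dead keysym to the matching combining mark, or 0 if unknown. */
static uint32_t dead_key_to_combining(uint32_t keysym)
{
    switch (keysym) {
    case KS_DEAD_GRAVE:      return 0x0300; /* COMBINING GRAVE ACCENT */
    case KS_DEAD_ACUTE:      return 0x0301; /* COMBINING ACUTE ACCENT */
    case KS_DEAD_CIRCUMFLEX: return 0x0302; /* COMBINING CIRCUMFLEX ACCENT */
    case KS_DEAD_CARON:      return 0x030C; /* COMBINING CARON */
    default:                 return 0;
    }
}

/* Write the base letter followed by one combining mark per dead key,
 * in the order the dead keys were given. Returns the number of code
 * points written, or 0 if a key is not a known dead key or the output
 * array is too small. */
static size_t build_decomposed(const uint32_t *dead, size_t n_dead,
                               uint32_t base, uint32_t *out, size_t out_cap)
{
    assert(out != NULL);
    if (n_dead + 1 > out_cap)
        return 0;
    out[0] = base;
    for (size_t i = 0; i < n_dead; i++) {
        uint32_t mark = dead_key_to_combining(dead[i]);
        if (mark == 0)
            return 0;
        out[i + 1] = mark;
    }
    return n_dead + 1;
}
```

For dead_circumflex followed by C this yields U+0043 U+0302, whose
canonical composition is U+0108.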

--tml


Re: Static compose table in gtkimcontextsimple.c

2007-12-03 Thread Simos Xenitellis

On Mon, 2007-12-03 at 17:08 +, Paul LeoNerd Evans wrote:
> I notice there's a large table of compose sequences in
> gtkimcontextsimple.c. Is there any particular logic to the exact
> sequences listed here, or would it be acceptable to add some more?

The table should be in sync with the one from Xorg, 
/usr/share/X11/locale/en_US.UTF-8/Compose

There is a bug report on this, 
"Synch gdkkeysyms.h/gtkimcontextsimple.c with X.org 6.9/7.0"
http://bugzilla.gnome.org/show_bug.cgi?id=321896

> I'd quite like to have some mappings of Esperanto characters added;
> namely:
> 
> GDK_dead_circumflex, GDK_C, 0, 0, 0, 0x0108, /* LATIN_CAPITAL_LETTER_C_WITH_CIRCUMFLEX */
> GDK_dead_circumflex, GDK_G, 0, 0, 0, 0x011C, /* LATIN_CAPITAL_LETTER_G_WITH_CIRCUMFLEX */
> GDK_dead_circumflex, GDK_H, 0, 0, 0, 0x0124, /* LATIN_CAPITAL_LETTER_H_WITH_CIRCUMFLEX */
> GDK_dead_circumflex, GDK_J, 0, 0, 0, 0x0134, /* LATIN_CAPITAL_LETTER_J_WITH_CIRCUMFLEX */
> GDK_dead_circumflex, GDK_S, 0, 0, 0, 0x015C, /* LATIN_CAPITAL_LETTER_S_WITH_CIRCUMFLEX */
> 
> GDK_dead_circumflex, GDK_c, 0, 0, 0, 0x0109, /* LATIN_SMALL_LETTER_C_WITH_CIRCUMFLEX */
> GDK_dead_circumflex, GDK_g, 0, 0, 0, 0x011D, /* LATIN_SMALL_LETTER_G_WITH_CIRCUMFLEX */
> GDK_dead_circumflex, GDK_h, 0, 0, 0, 0x0125, /* LATIN_SMALL_LETTER_H_WITH_CIRCUMFLEX */
> GDK_dead_circumflex, GDK_j, 0, 0, 0, 0x0135, /* LATIN_SMALL_LETTER_J_WITH_CIRCUMFLEX */
> GDK_dead_circumflex, GDK_s, 0, 0, 0, 0x015D, /* LATIN_SMALL_LETTER_S_WITH_CIRCUMFLEX */
> 
> GDK_dead_caron,  GDK_U, 0, 0, 0, 0x01D3, /* LATIN_CAPITAL_LETTER_U_WITH_CARON */
> 
> GDK_dead_caron,  GDK_u, 0, 0, 0, 0x01D4, /* LATIN_SMALL_LETTER_U_WITH_CARON */
> 
> Should I submit a patch?

A quick glance at the compose file of Xorg shows that these sequences exist 
there, which is good.

If you would like to help with bug 321896 it would be great. The current focus 
is on how to make the table much smaller, even with the addition of more 
keysyms. There is a script that converts en_US.UTF-8/Compose into a series of 
arrays that should be easy for GTK+ to work on. 
Regarding Greek polytonic, there is an optimisation suggested by Tor to reduce 
the sequences (currently about 1000 sequences out of 5000).

Simos




Static compose table in gtkimcontextsimple.c

2007-12-03 Thread Paul LeoNerd Evans
I notice there's a large table of compose sequences in
gtkimcontextsimple.c. Is there any particular logic to the exact
sequences listed here, or would it be acceptable to add some more?

I'd quite like to have some mappings of Esperanto characters added;
namely:

GDK_dead_circumflex, GDK_C, 0, 0, 0, 0x0108, /* LATIN_CAPITAL_LETTER_C_WITH_CIRCUMFLEX */
GDK_dead_circumflex, GDK_G, 0, 0, 0, 0x011C, /* LATIN_CAPITAL_LETTER_G_WITH_CIRCUMFLEX */
GDK_dead_circumflex, GDK_H, 0, 0, 0, 0x0124, /* LATIN_CAPITAL_LETTER_H_WITH_CIRCUMFLEX */
GDK_dead_circumflex, GDK_J, 0, 0, 0, 0x0134, /* LATIN_CAPITAL_LETTER_J_WITH_CIRCUMFLEX */
GDK_dead_circumflex, GDK_S, 0, 0, 0, 0x015C, /* LATIN_CAPITAL_LETTER_S_WITH_CIRCUMFLEX */

GDK_dead_circumflex, GDK_c, 0, 0, 0, 0x0109, /* LATIN_SMALL_LETTER_C_WITH_CIRCUMFLEX */
GDK_dead_circumflex, GDK_g, 0, 0, 0, 0x011D, /* LATIN_SMALL_LETTER_G_WITH_CIRCUMFLEX */
GDK_dead_circumflex, GDK_h, 0, 0, 0, 0x0125, /* LATIN_SMALL_LETTER_H_WITH_CIRCUMFLEX */
GDK_dead_circumflex, GDK_j, 0, 0, 0, 0x0135, /* LATIN_SMALL_LETTER_J_WITH_CIRCUMFLEX */
GDK_dead_circumflex, GDK_s, 0, 0, 0, 0x015D, /* LATIN_SMALL_LETTER_S_WITH_CIRCUMFLEX */

GDK_dead_caron,  GDK_U, 0, 0, 0, 0x01D3, /* LATIN_CAPITAL_LETTER_U_WITH_CARON */

GDK_dead_caron,  GDK_u, 0, 0, 0, 0x01D4, /* LATIN_SMALL_LETTER_U_WITH_CARON */

Should I submit a patch?
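
For readers unfamiliar with the row layout used above (up to five
keysyms, zero-padded, followed by the resulting code point), a lookup
over such rows might be sketched as below. This is an illustration under
assumptions, not the code in gtkimcontextsimple.c: demo_table,
compare_seq and lookup_compose are hypothetical names, the keysyms are
written as raw X11 values instead of GDK_* macros, and the rows are
assumed to be sorted so that a binary search can be used.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SEQ 5                 /* keysyms per row */
#define ROW_LEN (MAX_SEQ + 1)     /* keysyms plus the result */

/* Rows sorted by keysym sequence; 0xFE52 is dead_circumflex and
 * 0xFE5A is dead_caron. */
static const uint16_t demo_table[] = {
    0xFE52, 0x0043, 0, 0, 0, 0x0108,   /* C with circumflex */
    0xFE52, 0x0063, 0, 0, 0, 0x0109,   /* c with circumflex */
    0xFE5A, 0x0055, 0, 0, 0, 0x01D3,   /* U with caron */
    0xFE5A, 0x0075, 0, 0, 0, 0x01D4,   /* u with caron */
};

#define N_ROWS (sizeof demo_table / sizeof demo_table[0] / ROW_LEN)

/* Compare a zero-padded key sequence against the first MAX_SEQ
 * entries of a row. */
static int compare_seq(const void *key, const void *row)
{
    const uint16_t *k = key;
    const uint16_t *r = row;
    for (int i = 0; i < MAX_SEQ; i++) {
        if (k[i] != r[i])
            return k[i] < r[i] ? -1 : 1;
    }
    return 0;
}

/* Return the composed code point for the typed keys, or 0 if the
 * sequence is not in the table. */
static uint16_t lookup_compose(const uint16_t *keys, size_t n_keys)
{
    uint16_t padded[MAX_SEQ] = { 0 };
    const uint16_t *row;

    assert(keys != NULL);
    if (n_keys == 0 || n_keys > MAX_SEQ)
        return 0;
    memcpy(padded, keys, n_keys * sizeof keys[0]);
    row = bsearch(padded, demo_table, N_ROWS,
                  ROW_LEN * sizeof demo_table[0], compare_seq);
    return row ? row[MAX_SEQ] : 0;
}
```

Because every row has the same width and holds only integers, the whole
table can stay a single const array.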

-- 
Paul "LeoNerd" Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/

