Re: [l10n-dev] gettext again

Danilo Šegan Mon, 05 Sep 2005 09:46:38 -0700

Hi Eike,

Today at 16:37, Eike Rathke wrote:


> Also OOo string resources already may have comment info: the language
> "x-comment" is reserved for adding comments to string resource entries.
> However, in first place it has to be used by the developers who
> add/modify the resource strings, and second be supported by tools that
> extract the strings and/or convert them to .po or other translation
> systems.

I am really used to the level of support translators get in Gnome:
when we have an unclear message, we report it as a bug, and the module
maintainer rewords it, adds a comment, or both.

It's nice to know that OOo already supports that in one way or
another, of course.

>> The big question is: how much effort would be needed to port OOo to
>> use gettext?
>
> An even bigger question is: would gettext be able to handle it? How does
> it scale? Would it be worth the effort?

It's used for Gnome (30+k translatable messages for the core, over
80k with a bunch of apps like Gimp, Gnumeric,...), KDE (I think it's
around the same for the core, grows again to over 80k with apps),
XFCE and many other applications and environments.

FWIW, Gnumeric (5-6k messages) is a hell of a lot faster and more
responsive (personal feeling, not a real measure) than OO Calc on my
2.3 GHz Celeron system with 636 MB of RAM, so I am confident that
gettext is not going to be a bottleneck.  And yeah, it also combines some
translatable messages from other sources, such as Gtk+ stock items
(menus and buttons), libgnomeprint dialogs and stuff, etc.

I can also describe some of the implementation details: MO files are
mmap()-able and contain alphabetically sorted strings (allowing
O(log N) search), and they also have a hash table (allowing O(1)
string matching; alas, this is an undocumented implementation detail
on GNU systems, and I don't know about Solaris or other gettext
implementations).
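
To make the sorted-table lookup concrete, here is a minimal Python
sketch.  It assumes the little-endian GNU MO layout (a header of seven
32-bit integers followed by two tables of length/offset pairs) and
writes no hash table, so only the O(log N) path is shown.  The helper
names build_mo and lookup are made up for illustration and are not
part of any gettext API.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number of a little-endian GNU MO file


def build_mo(catalog):
    """Serialize {msgid: msgstr} into a minimal MO blob (no hash table)."""
    items = sorted((k.encode("utf-8"), v.encode("utf-8"))
                   for k, v in catalog.items())
    n = len(items)
    orig_table = 28                   # header is 7 * 4 bytes
    trans_table = orig_table + 8 * n  # each table entry is (length, offset)
    data_start = trans_table + 8 * n
    blob, entries = b"", []
    for column in (0, 1):             # all msgids first, then all msgstrs
        for pair in items:
            entries.append((len(pair[column]), data_start + len(blob)))
            blob += pair[column] + b"\0"
    out = struct.pack("<7I", MO_MAGIC, 0, n, orig_table, trans_table, 0, 0)
    for length, offset in entries:
        out += struct.pack("<2I", length, offset)
    return out + blob


def lookup(mo, msgid):
    """Binary search over the sorted msgid table; works on bytes or mmap."""
    magic, _rev, n, orig_table, trans_table = struct.unpack_from("<5I", mo, 0)
    if magic != MO_MAGIC:
        raise ValueError("not a little-endian MO file")
    key = msgid.encode("utf-8")
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        length, offset = struct.unpack_from("<2I", mo, orig_table + 8 * mid)
        candidate = mo[offset:offset + length]
        if candidate == key:
            tlen, toff = struct.unpack_from("<2I", mo, trans_table + 8 * mid)
            return mo[toff:toff + tlen].decode("utf-8")
        if candidate < key:
            lo = mid + 1
        else:
            hi = mid
    return msgid  # gettext convention: fall back to the original string


mo = build_mo({"File": "Datei", "Edit": "Bearbeiten", "Quit": "Beenden"})
print(lookup(mo, "File"))  # Datei
print(lookup(mo, "Save"))  # Save (no translation, original returned)
```

Because lookup only reads through offsets, the same function could be
handed an mmap object over an MO file on disk instead of a bytes blob.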

Yes, in my opinion, it would be worth the effort.

> If it still applies, the first obstacle that always came to my mind with
> any gettext implementation: the original string is in the source code.
> If that needs to be changed you need to recompile and link. _And_ you
> have to change the string in the corresponding gettext resource.

Yes, original strings are in the source code.  But if you wish, you
can still treat them simply as "keys" when you don't want to change a
string in the source code (such as during string freeze periods): you
can provide an "en_US" translation or something like that to
introduce typo fixes, etc.

GNU gettext tools provide excellent programs to handle the
"corresponding gettext resource" (i.e. MO and PO files), so you get
fuzzy matching and translation reuse, all done automatically without
programmers caring too much.
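
The idea behind fuzzy merging can be sketched in a few lines of
Python.  This is not msgmerge's actual algorithm: difflib from the
standard library stands in for its string comparison, and the merge
function, the 0.6 cutoff and the sample strings are assumptions made
up for the example.

```python
import difflib


def merge(old_po, new_msgids, cutoff=0.6):
    """Carry old translations over to a new list of msgids.

    Returns {msgid: (msgstr, is_fuzzy)}.  Exact matches are reused as
    they are, near matches are reused but flagged fuzzy so a translator
    reviews them, and everything else starts out untranslated.
    """
    merged = {}
    for msgid in new_msgids:
        if msgid in old_po:
            merged[msgid] = (old_po[msgid], False)
            continue
        close = difflib.get_close_matches(msgid, list(old_po), n=1,
                                          cutoff=cutoff)
        if close:
            merged[msgid] = (old_po[close[0]], True)
        else:
            merged[msgid] = ("", False)
    return merged


old = {"Open file": "Datei öffnen"}
result = merge(old, ["Open file...", "Quit"])
print(result["Open file..."])  # ('Datei öffnen', True)
print(result["Quit"])          # ('', False)
```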


Also, I don't really see the value of this argument, since with the
current OOo system, AFAIU, both original and translated strings end up
in the "source code".  Having only the original ones there is clearly
better, no?


PO files and gettext performance allow us to set up statistics systems
updated every couple of hours DIRECTLY FROM CVS/SVN such as:

  http://l10n-status.gnome.org/
  http://i18n.kde.org/stats/gui/stable/index.php

Why don't we get those for OOo?  These are among the most valuable
things translators have when working with PO files.  It's not really
about PO files themselves, but about how simple it is to create such
web pages that work almost in real time.  (Yeah, we are currently
testing another version for Gnome where updates will happen on CVS
commits instead!)

Note that both of these are larger code bases than OOo.  They are
regenerated with the simple fuzzy matching algorithm from GNU msgmerge
[so it's not the fastest available method], and it takes two or three
hours for the complete Gnome pages to be regenerated for ~100 languages.
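
To give an idea of how little code the counting behind such status
pages needs, here is a simplified Python sketch.  It assumes entries
are separated by blank lines and ignores plural forms, msgctxt and
obsolete entries; po_stats and the sample catalog are made up for the
example.  In practice, msgfmt --statistics reports the same three
numbers for a PO file.

```python
def po_stats(po_text):
    """Count translated, fuzzy and untranslated entries in PO text."""
    counts = {"translated": 0, "fuzzy": 0, "untranslated": 0}
    for block in po_text.strip().split("\n\n"):
        fuzzy = False
        msgid_parts, msgstr_parts, current = [], [], None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#,"):
                fuzzy = fuzzy or "fuzzy" in line   # flag comment
            elif line.startswith("#"):
                continue                           # other comments
            elif line.startswith("msgid "):
                current = msgid_parts
                current.append(line[6:].strip()[1:-1])
            elif line.startswith("msgstr "):
                current = msgstr_parts
                current.append(line[7:].strip()[1:-1])
            elif line.startswith('"') and current is not None:
                current.append(line[1:-1])         # continuation line
        msgid, msgstr = "".join(msgid_parts), "".join(msgstr_parts)
        if not msgid:
            continue  # header entry (empty msgid) or comment-only block
        if not msgstr:
            counts["untranslated"] += 1
        elif fuzzy:
            counts["fuzzy"] += 1
        else:
            counts["translated"] += 1
    return counts


SAMPLE = '''msgid ""
msgstr "Content-Type: text/plain; charset=UTF-8\\n"

msgid "File"
msgstr "Datei"

#, fuzzy
msgid "Edit"
msgstr "Bearbeitet"

msgid "View"
msgstr ""
'''

print(po_stats(SAMPLE))
```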

> In the past we already had quite some discussion about gettext and came
> to no conclusion where it was feasible to switch to gettext, did
> anything change in that system?

That depends on what you consider feasible.  I understand that there
is some value in having all strings in resource files, but it
inevitably leads to many problems for both programmers and translators
(has a programmer ever displayed the wrong string? or a translator
matched a translation with the wrong string because the original had
been updated?)

Of course, to be serious, I am definitely biased toward the format (I
have my own implementations of MO file parsers for PHP, C#, even Perl
for intltool), but only because I find it so natural as both a
programmer and translator!

> Nowadays where tools like oo2po exist I wouldn't say that switching to
> a gettext based approach would be necessary from this view. However,
> there are some features of gettext that are worth taking a look at, like
> language dependant plurals, but much more important for lowering the
> barrier would it be to separate all localization effort from the source
> tree, and not having to run a build in the entire source tree just to
> (re)assemble some strings and bitmaps.. this _could_ be done using
> gettext, but for text only, and at what cost?

As I said, I can't judge the cost, because I don't know what it would
take for OOo to switch.  I only want to add that the only cost there
will be is the cost of porting the code over, and there won't be any
penalty in performance or value, IMHO.

> People tend to only see the mere translation phase and simplicity of
> language packs regarding text only, and in these of course using gettext
> is much easier. For handling localized icons and such you'd still need
> another system, or did that change? 

No, gettext is a locale system for handling text.  There is nothing
in it to support other kinds of data, but the problem it solves, it
solves so well that I don't see this as a counter-argument.  I mean,
it doesn't handle localised sounds or videos either, but does OOo
currently handle that?  Or localised document templates *without*
another system?

> And, at least years back when I took
> a look at it, gettext to me left an impression of a glued-together bunch
> of something working only under specific circumstances. This may have
> changed of course.

Ugh, do you want to hear my impression of the OOo system? I.e. a "make
it harder for everybody" system, which even then "hardly" works?
The number of assumptions in the OOo translation system is much higher
than the number in gettext.  With gettext you assume (i.e. as a
compile-time setting) the MO file base path and the "domain" name: two
hard-coded options in total (and you can override both with the
documented LD_PRELOAD mechanism!).
Everything else is there for programmers and translators to play with.

I have translated north of 50k "messages" (ranging from single-word
entries to complete paragraphs), I have written hundreds of KLOC in
tens of languages, and I have found the gettext format to be the
nicest to me as a programmer, and the nicest to me as a translator.
Of course, this is just a personal opinion, and it may be only because
I never tried a better "way".  Indeed, even though I am not reluctant
to learn a new way, I have found OOo localisation much harder.


After all, Solaris also has a full gettext implementation, allowing
such experiments as those Tim Foster is doing using the LD_PRELOAD
mechanism (available on all GNU systems as well).  And I think Tim
would probably recommend XLIFF over PO files for doing
translations, but he still makes great use of the gettext library!  Just
check out his blog about it:

  http://blogs.sun.com/roller/page/timf/Weblog?catname=%2FTranslation%2C+language+and+tools

Now, I am not saying gettext is perfect: far from it.  It's only
better :)

Cheers,
Danilo
