Re: Merging translations from Ubuntu, keeping fuzzy strings

2012-05-13 Thread Chusslove Illich
 [: Åsmund Skjæveland :]
 I've received some Ubuntu translations and I'm wondering how to best merge
 them. The up-to-date translated strings look fine, but fuzzy strings in
 the gnome PO file are untranslated in the Ubuntu PO file, and also in the
 merged PO file. [...] Is there some merging trick that preserves fuzzies?

Maybe this would be sufficient:

  $ msgmerge ubuntu.po gnome.pot -C gnome.po --previous -o updated.po

If both the Ubuntu PO and the Gnome PO have a message translated, but
differently, the Ubuntu translation will be taken. The header will also be
taken from the Ubuntu PO. If gnome.pot is not available, gnome.po can be
used in its place, provided all obsolete messages are removed from it.
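
The two cases can be sketched in Python as follows. The helper name
merge_commands and the intermediate file name are assumptions made for
illustration only; msgattrib --no-obsolete is the Gettext way of dropping
obsolete messages. The function only builds the command lines, it does not
run them.

```python
def merge_commands(ubuntu_po, gnome_po, gnome_pot=None, out_po="updated.po"):
    """Return the Gettext command lines for the merge described above."""
    commands = []
    if gnome_pot is None:
        # No template at hand: use gnome.po itself, stripped of obsolete
        # messages. The intermediate file name is hypothetical.
        gnome_pot = gnome_po + ".noobsolete"
        commands.append(["msgattrib", "--no-obsolete", gnome_po,
                         "-o", gnome_pot])
    commands.append(["msgmerge", ubuntu_po, gnome_pot, "-C", gnome_po,
                     "--previous", "-o", out_po])
    return commands


if __name__ == "__main__":
    for cmd in merge_commands("ubuntu.po", "gnome.po"):
        print(" ".join(cmd))
```

Passing each list to subprocess.check_call would run the tools for real.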

-- 
Chusslove Illich (Часлав Илић)


signature.asc
Description: This is a digitally signed message part.
___
gnome-i18n mailing list
gnome-i18n@gnome.org
http://mail.gnome.org/mailman/listinfo/gnome-i18n


Re: Mistakes in doc translations

2012-04-19 Thread Chusslove Illich
 the build (less common with
 recent itstool improvements), then my choices are to fix them or to
 disable the translation. I'm not in the habit of building daily, so if I
 have to disable a translation, it probably means it's disabled in the
 release. And that would be a real shame.

The other way to look at this problem (if the answer to my initial question
is no) is how to efficiently find the problems and notify translators
about them, so that necessity for fixing/disabling from maintainer's side is
minimized.

For large projects such as Gnome, I thought it would be feasible to have and
maintain a project-specific verification and notification tool. It would
know types of PO files in the project (e.g. that one is for C code, that one
is for Mallard doc) and check all technical issues one can think of, above
msgfmt -c, down to particular messages in particular PO files which have
special constraints on translation.

So, within Pology, I made a couple of project-specific checkers, one of
which is for KDE translations. For example, it knows which PO files are
extracted from Docbook, and checks not only XML well-formedness in
translation, but also whether element names are known (cannot do full
validation for obvious reason), so that the translator is somewhat covered
even if he improves a bit on markup of the original. It can be run
standalone from Pology installation, and it is also run on servers to
produce such a weekly report:

http://l10n.kde.org/check-kde-tp-results/trunk/

Here's the punchline: consider the amount of errors reported here, together
with the fact that the amount of false positives is exactly zero (hard
requirement on the checker).
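
To give an idea of the kind of check meant above, here is a minimal sketch
in Python. It is not the Pology checker; the function name and the short
list of element names are made up for the example, and entities other than
the predefined XML ones would need extra handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of element names; a real checker would carry the
# full list for the markup in question.
KNOWN_TAGS = {"para", "emphasis", "filename", "guilabel", "command"}


def check_markup(msgstr, known_tags=KNOWN_TAGS):
    """Return a list of problems found in the markup of one translation."""
    try:
        # Wrap in a dummy root so that mixed text and elements parse.
        root = ET.fromstring("<root>" + msgstr + "</root>")
    except ET.ParseError as err:
        return ["not well-formed: %s" % err]
    return ["unknown element: <%s>" % el.tag
            for el in root.iter()
            if el is not root and el.tag not in known_tags]
```

An empty list means the translation passed; anything reported is a hard
error, which is what keeps false positives at zero.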

-- 
Chusslove Illich (Часлав Илић)




Re: Mistakes in doc translations

2012-04-18 Thread Chusslove Illich
 [: Shaun McCance :]
 The answer is plainly yes, if you use version control correctly. PO files
 might have some characteristics that make some things harder, but they're
 not so special that they're outside the realm of git.

But PO files are the furthest outwards in the realm of Git (version control
in general). I'm looking for ways to close them in.

 PO files are more line-oriented than XML files. Will you get diff noise
 from rewraps? Sure.

Documentation XML files may be slightly more special than program code, but
for the single reason you mention, text wrapping. And I've heard that there
are powerful diff tools that can work around it (Emacs I think). Also, I
personally never word-wrap text in XML files, so in my uses XML files are
exactly the same as source code.

Wrapping in PO files causes much more noise because most translators use
dedicated PO editors, which usually rewrap all messages when saving a PO
file; there can be almost total line-level diff for one actual message
changed. Then, there are unfuzzied messages, where half a message becomes a
diff, even if only one word was changed. There are source reference
comments, which change in all subsequent messages when source lines in front
are moved. There is ordering of messages, which can change either due to
source perturbations or messages being obsoleted and shifted to end.
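
The wrapping noise in particular disappears once messages are compared as
units rather than as lines. A minimal sketch, assuming simple entries only
(no plurals, no obsolete messages) and using Python's own string literal
parser as a stand-in for PO string unescaping; parse_po is a made-up name:

```python
import ast


def parse_po(text):
    """Parse PO text into {(msgctxt, msgid): msgstr}, joining wrapped strings."""
    entries, fields, current = {}, {}, None

    def flush():
        # Store the entry collected so far, if any.
        if "msgid" in fields:
            key = (fields.get("msgctxt"), fields["msgid"])
            entries[key] = fields.get("msgstr", "")

    for line in text.splitlines() + [""]:
        line = line.strip()
        if not line:
            flush()
            fields, current = {}, None
        elif line.startswith("#"):
            continue  # comments do not take part in the comparison
        elif line.startswith('"') and current:
            fields[current] += ast.literal_eval(line)  # continuation line
        else:
            current, _, rest = line.partition(" ")
            fields[current] = ast.literal_eval(rest)
    return entries


one_line = 'msgid "A long message that may get wrapped"\nmsgstr "Un message long"\n'
wrapped = ('msgid ""\n"A long message "\n"that may get wrapped"\n'
           'msgstr ""\n"Un message "\n"long"\n')
print(parse_po(one_line) == parse_po(wrapped))
```

The two texts differ in every line, yet compare equal as messages.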

Here is a typical scenario. Translator works for some time on a PO file
obtained from somewhere (from repository incl. intltool-update, from DL),
and completes the translation. Some time afterwards, that PO file is
received by the committer (through email, through DL). The received PO file
is now arbitrarily different from the PO file in the repository, with the
baseline unknown. What is the committer supposed to do? If he doesn't want
to review the translation, he will just copy the received PO over the
current repository PO, run intltool-update and msgfmt -c, and commit. Here
maintainer's fix will be lost outright. If the committer does want to
review, he may run intltool-update over repository PO and over received PO
and diff that (or something to that effect, e. g. rely on DL). Here, given a
lot of garbage in line-level diff, it will require good concentration not to
miss maintainer's fix -- how many committers do this regularly?

With code (or documentation XML) the diff is much more meaningful and the
baseline is normally known, so the version control system (or a standalone
tool) can perform an effective 3-way merge and automatically bring up the
real conflicts. Something in the spirit of this would be needed for truly
non-locking PO workflow. But it would not be sufficient on its own:

 About a dozen people regularly commit to the same Mallard page files in
 gnome-user-docs. Not a single one of the files belongs to only one person.
 I regularly commit to files written by someone else. It does work, as long
 as you use version control correctly.

What is the difference between what is done for Mallard page files and what
programmers do with the code? Looking through the Git log, I don't see any.
For PO files, it goes like this.

By far the most frequent modification to PO files is translation update
after merging. This update will usually happen sometime near to release. For
n PO files and m active translators, most of the n * m file-translator
combinations are viable. If two translators update the same file at the same
time, there will be a lot of conflicts. These conflicts will be such that
one translator's work will simply have to be discarded. The net result is
that translators practically never rely on version control for work
synchronization, but almost always establish some sort of locking workflow
on the organizational level. This can be informal, e.g. through who will
now update what on a mailing list, or more formal, e.g. through web
assignment interfaces (like DL's reservations).

With code it is much rarer that two persons will work on the same code
at the same time. They may work on the same file, but at different parts of
it. For n source files and m active programmers, only a small subset of
n * m file-programmer combinations is viable. The result is that clean
merges are possible most of the time, very little work is lost due to
overlapping, and hence version control can be relied upon for work
synchronization. Organizational locking is extremely rare.
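
For what it is worth, the message-level 3-way merge alluded to earlier is
easy to sketch once the baseline is known (which, as said, it usually is
not). This is only an illustration with catalogs reduced to msgid-to-msgstr
dictionaries; merge3 is a made-up name, not something Git or Pology
provides.

```python
def merge3(base, ours, theirs):
    """Three-way merge of {msgid: msgstr} dicts; returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for msgid in sorted(set(ours) | set(theirs)):
        b, o, t = base.get(msgid), ours.get(msgid), theirs.get(msgid)
        if o == t or t == b:
            value = o              # same on both sides, or only we changed it
        elif o == b:
            value = t              # only they changed it
        else:
            conflicts.append(msgid)  # both changed it, differently
            value = o                # keep ours, but report the conflict
        if value is not None:
            merged[msgid] = value
    return merged, conflicts
```

When two translators update the same file near release, nearly every
message lands in the last branch, which is the organizational problem
described above.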

-- 
Chusslove Illich (Часлав Илић)




Re: Mistakes in doc translations

2012-04-17 Thread Chusslove Illich
 [: bruno :]
 Translators, proofreaders and commiters do hopefully not have to be in
 computer science domain. I am not a programmer and not used to git. If i
 have to read the git manual before committing, it would be very
 discouraging.

Source version control (with Git or any other tool) is here only
circumstantial to the actual issue, which has nothing to do with
programming. And this issue can be described with the following question:
can two people work on the same PO file at the same time, and if yes, how? I
consider this question as still being open.

If the answer is no, then authors should not touch PO files. They should
only disable PO files when invalid (whatever that means in the particular
context) and notify the translator in charge. For both of these actions,
means as automatic as possible should be sought. (And on translators' side,
locking mechanisms, either technical or organizational, should be
established.)

If the answer is yes, then the how must be answered. It is not
sufficient to relegate how to standard version control procedures,
treating PO files as any other source file. Technically, because a PO file
is not a pure source file, but half-derived half-source, and has only weak
line-level semantics; together this precludes line-level diffing, which is
the core of version control procedures. Organizationally, because true
source files are typically small and rarely under immediate interest of more
than one author, whereas many translators can meaningfully modify a given PO
file (unless it covers a topic which requires a specialized translator);
this makes PO file conflicts much more likely.

-- 
Chusslove Illich (Часлав Илић)




Re: itstool improvements

2012-03-29 Thread Chusslove Illich
 [: Shaun McCance :]
 So let's say we have this:

   <page><p its:locNote="A different comment">...</p>
   <page><p its:locNote="A common comment">...</p>
   <note><p its:locNote="A common comment">...</p>

 You'll see this:

   #. ## Message for page/p
   #. ## Message for note/p
   #.
   #. ## Comment for page/p
   #. A different comment
   #.
   #. ## Comment for page/p
   #. ## Comment for note/p
   #. A common comment

It would be nice if the element path were prefixed with something
keyword-like, so that tools too (e.g. validation) could more easily take it
into account. Also, I don't see the need for explicitly marking the comment
itself, or separating them with empty lines. So, it would come out as e.g.:

  #. tag-path: page/p
  #. tag-path: note/p
  #. A different comment
  #. A common comment

The ordering (interleaving) should not really matter. Where it would matter,
it would be a clear sign that messages need to be separated (msgctxt). Which
would be done when first confused translator reports in.

-- 
Chusslove Illich (Часлав Илић)


Re: Diffing POT files

2012-03-22 Thread Chusslove Illich
 [: Chris Leonard :]
 Does anyone know of such a tool? It would ideally be aware of PO file
 structure to treat string subunits of a PO file as a single chunk as
 opposed to a simple *nix diff which would be line-by-line.

You could try Pology, http://pology.nedohodnik.net/. It contains an actual
PO diffing script, poediff, where what PO diff means is documented in the
user guide. But it will not produce what you described as the ideal output;
this is very simple, and can be produced by the attached script using
Pology library.

-- 
Chusslove Illich (Часлав Илић)
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

# Split two POT files into three: messages found only in the old one,
# messages found only in the new one, and messages common to both.
# Usage: <script> old.pot new.pot

import sys

from pology.catalog import Catalog


old_pot, new_pot = sys.argv[1:]

old_cat = Catalog(old_pot)
new_cat = Catalog(new_pot)

# Messages present only in the old POT.
old_uniq_pot = old_cat.name + "-unique.pot"
old_uniq_cat = Catalog(old_uniq_pot, create=True, truncate=True)
for msg in old_cat:
    if msg not in new_cat:
        old_uniq_cat.add_last(msg)
old_uniq_cat.sync()

# Messages present only in the new POT.
new_uniq_pot = new_cat.name + "-unique.pot"
new_uniq_cat = Catalog(new_uniq_pot, create=True, truncate=True)
for msg in new_cat:
    if msg not in old_cat:
        new_uniq_cat.add_last(msg)
new_uniq_cat.sync()

# Messages present in both POTs.
common_pot = old_cat.name + "-" + new_cat.name + "-common.pot"
common_cat = Catalog(common_pot, create=True, truncate=True)
for msg in old_cat:
    if msg in new_cat:
        common_cat.add_last(msg)
common_cat.sync()





Re: Bad translations of key names

2012-03-15 Thread Chusslove Illich
 [: Bastien Nocera :]
 For example:
 msgctxt "keyboard label"
 msgid "Page_Down"
 msgstr "Page_Down"

 When we clearly mention to:
 - translate the strings
 - Remove the underscores from the key names

In spite of providing the comment, you are still breaking semantics of the
Gettext-based translation: msgid must contain that which the C/POSIX locale
user will see; if that is not sufficient as the message key, anything else
should be put in msgctxt. Hence, semantically proper messages could look
like this:

  msgctxt "keyboard label: Page_Down"
  msgid "Page Down"
  msgstr ""
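
The point about the C/POSIX locale can be seen directly in code. A small
sketch using Python's gettext module (pgettext is available from Python
3.8); the empty catalog stands in for "no translation installed":

```python
import gettext

# With no catalog installed, NullTranslations hands back msgid unchanged,
# which is what a C/POSIX locale user would see.
catalog = gettext.NullTranslations()

# The key name lives in the context, the user-visible text in msgid.
label = catalog.pgettext("keyboard label: Page_Down", "Page Down")
print(label)
```

With msgid "Page_Down" the untranslated user would instead get the
underscore on the key label.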

-- 
Chusslove Illich (Часлав Илић)




Re: avoid 'fixes' in translations

2012-02-09 Thread Chusslove Illich
 [: Kenneth Nielsen :]
 I just want to say that I wholeheartedly agree with you in this matter.
 And actually, without being condescending, I don't really think that it is
 a matter of an opinion, but simply the only way that it makes sense.
 Authors _author_, translators _translate_ not _interpret_. I often make
 comments along the lines you mention above when proofreading the
 translations.

I cannot say I disagree with what you said, but I likely disagree with what
you (and Ryan) actually meant :) In the general claim, and especially in the
instance that triggered the general claim.

When properly analyzed, "translation" is seen to be a fluid term. In order
to underline this, to be able to discuss it without preconceptions, in some
literature they call the process "recreation" instead. It depends on many
things what method will be used to recreate the textual content from the
source language into the target language.

For software user interfaces and documentation, the only hard rule should be
not to change the fact through translation; the fact which was established
by the author of the text in the source language. If the fact was wrongly
established by the first author of the text, then all of your points for
changing this fact through translation apply (can change immediately before
release, make bug report, add translator comment, etc.) But if the stated
fact in the source language is true and that fact is preserved in the target
language, everything else is free game. How it will be handled depends on
the linguistic, cultural, and whatever other context of the target
language and target environment.

(I will give one example when this "free game" should very much be
exercised, but I observed many times it's not: the "please add context --
verb or noun?" request. This is a bad context request. What the translator
needs to know is what the text means and in what place it is used, and not
what grammar form is employed. The translator should pick the appropriate
grammar form for the target language, for that meaning and that place of use
of the text. In my language, for example, we will frequently use a noun
where in English a verb was used; hence it is not directly relevant to me if
it is a noun or verb in English.)

Coming to the actual trigger. Various standards are one thing; specific
projects' policies are another thing; the fact is that in *English* software
user interfaces, the acronym MB and pronunciation /megabyte/ is still
frequently (if not predominantly) used to denominate 2^20 bytes, alongside
the acronym MiB and pronunciation /mebibyte/ for this, and SI MB-/megabyte/
for 10^6 bytes. The only obligation upon the translator is to find out what
the original text meant by MB; when that is established, there is no longer
any constraint on how to represent it in the target language. In particular,
it is wholly possible that a small language-environment will be a lot
snappier at rooting out the misuse of SI prefixes than the vast English
will be. In short, in my opinion, the translator is wholly in line by
translating an MB that represents 2^20 bytes as MiB.

Now comes the practical problem: what if the original text switches not MB
to MiB in the text, but MB to 10^6 bytes in the meaning, thus not fuzzying
any PO messages? The answer is very simple. The blame rests only on the
programmer, because he did something that no efficient translation workflow
can handle: the meaning -- the fact -- was changed while the text was not.
This is a breach of Gettext-based translation conventions. The breach would
have been avoided had the programmer done exactly what the bug reporter
requested. Given the known ambiguity of MB in English, *every* use of MB
(and other units of the sort) should be accompanied by a disambiguation
context (C_("MB = 10^6", "...")). If at one point the meaning is changed, the
text does not change but the context does, making the message fuzzy and
stating the new meaning.

A possible practical issue with this solution: "what, we should add context
to every message with a byte amount??". In short, yes. If there are a lot of
messages with byte amounts, centralize formatting of byte amounts. So that
instead of *printf(_("... %.2f MB ..."), nbytes*1e-6) one can write
*printf(_("... %s ..."), bytes_to_mb(nbytes)). This is a win for everyone:
programmers make sure they remain consistent with the policy, and can switch
it later on; translators get to translate only a few byte-unit messages,
properly equipped with contexts; users may even be allowed to choose whether
they want to see byte amounts in SI MB or MiB.
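
A sketch of such centralized formatting, written in Python rather than C
only to keep it short. The function name, the context strings and the empty
stand-in catalog are all assumptions for the example; the point is that the
unit messages, with their contexts, live in one place.

```python
import gettext

# Stand-in for the program's real catalog; returns msgid untranslated.
_catalog = gettext.NullTranslations()


def format_megabytes(nbytes, binary=False):
    """Format a byte amount; the only place where unit messages live."""
    if binary:
        # The context states the meaning, so a change of policy changes
        # the context and fuzzies the message.
        template = _catalog.pgettext("MiB = 2^20 bytes", "%.2f MiB")
        return template % (nbytes / 2.0**20)
    template = _catalog.pgettext("MB = 10^6 bytes", "%.2f MB")
    return template % (nbytes * 1e-6)


print(format_megabytes(1500000))
print(format_megabytes(1048576, binary=True))
```

The binary switch is where a user preference could be plugged in.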

-- 
Chusslove Illich (Часлав Илић)




Re: PO files headers and DL

2012-01-02 Thread Chusslove Illich
 [: Daniel Mustieles García :]
 I've been talking with Claude about the possibility to add a header
 (maybe «X-Location»?) to PO files (both GUI and doc ones) containing the
 folder in which the PO file is located [...]

 [: Kenneth Nielsen :]
 [...]
 On another subject remember that it is also the plan to at some point make
 it possible to commit translations directly from damned lies, in which
 case this problem will become a lot smaller, so that probably makes it
 even less likely that we should solve it by imposing location rules on a
 lot of projects.

This topic is a recurring motif, e.g:

http://lists.gnu.org/archive/html/bug-gnu-utils/2008-11/msg00030.html

which I argued to go beyond the needs of a particular translation project
and therefore deserve full-fledged support from xgettext/msgmerge:

http://sourceforge.net/mailarchive/message.php?msg_id=6099532

Failing that, I would make translation-side automatization (Damned Lies in
this case) introduce an X-Source-Root custom header field, with some sort of
URL to which the source references in the PO file are relative. Where
possible, this would be a web-accessible URL, for example:

  "X-Source-Root: http://git.gnome.org/browse/gnome-games/plain/po\n"

This would enable any kind of scripting which needs to connect to the
location of the sources while working on an isolated, out-of-tree PO file.
This includes automatic committing desired by Daniel, but it does not stop
there. For example, a PO editor could use it to show source on demand for a
given message, with a single shortcut, reliably and without any preparation.
Hence it would be useful regardless of whether translation-side
automatization can automatically commit or not; and it would not impose any
restrictions on source trees of particular projects.
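
As an illustration of the scripting this would enable, here is a sketch
that turns a source reference into a URL. X-Source-Root remains a proposed
field, and the function name and the source reference used below are
hypothetical. The header is expected as decoded text, one field per line,
and the reference in the usual file:line form.

```python
from urllib.parse import urljoin


def source_url(header, reference):
    """Resolve a 'file:line' source reference against X-Source-Root."""
    root = None
    for line in header.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "X-Source-Root":
            root = value.strip()
    if root is None:
        return None
    path, _, lineno = reference.rpartition(":")
    # A trailing slash makes urljoin treat the root as a directory.
    return urljoin(root.rstrip("/") + "/", path), int(lineno)


header = ("Project-Id-Version: gnome-games\n"
          "X-Source-Root: http://git.gnome.org/browse/gnome-games/plain/po\n")
print(source_url(header, "../glines/glines.c:120"))
```

A PO editor could feed the resulting URL and line number to whatever it
uses for showing sources.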

-- 
Chusslove Illich (Часлав Илић)




Re: Q3 GNOME quarterly report

2011-11-27 Thread Chusslove Illich
 [: Christian Kirbach :]
 [...] very little conclusions can be drawn from looking at the numbers of
 commits [...] Is it not just the plain number of words in strings
 translated that matter? Or less precise, the number of strings translated?

Notwithstanding that this particular thread is concerned with giving numbers
for the sake of giving numbers (which gives me the creeps :), certainly very
little can be seen from the number of commits. But, on the other hand, a
meaningful metric (number of words translated) is not straightforward to
define either. Given an older and a newer PO file, one must distinguish
between completely new messages, modified messages (what is a modified
message?), and messages in which only the translation was modified (review
modification). For example, if a paragraph-length message got only one or
two words changed in the original, it should not be counted as translated
from scratch. Then, more basically, how to define a word, given that
messages contain various constructive elements (formatting directives,
accelerator markers, markup), which are not words proper but do require
attention and hence add to translation effort?
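
To make the last question concrete, here is one possible, deliberately
crude way of counting words proper. It is a sketch under stated
assumptions, not what Pology does: the name plain_words is made up, markup
is assumed to be XML-like, directives printf-like, and accelerator markers
either _ or &.

```python
import re

_MARKUP = re.compile(r"<[^>]+>")
_DIRECTIVE = re.compile(r"%(?:\d+\$)?[-+#0]*\d*(?:\.\d+)?[a-zA-Z]")
_ACCEL = re.compile(r"[_&](?=\w)")
_WORD = re.compile(r"[^\W\d_]+")


def plain_words(text):
    """Return the words of text, leaving out constructive elements."""
    text = _MARKUP.sub(" ", text)      # tags are not words
    text = _DIRECTIVE.sub(" ", text)   # neither are formatting directives
    text = _ACCEL.sub("", text)        # accelerator markers sit inside words
    return _WORD.findall(text)


print(plain_words("Open _File %s <b>now</b>"))
```

Whether the dropped elements should count for nothing, or for some
fraction of a word each, is exactly the kind of decision a real measure
has to make.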

At any rate, I've tried to define a measure of nominal number of newly
translated words (NNTW) between paired PO messages (and also how to pair
messages from two PO files). Algorithmic details are somewhat convoluted, but
the general idea is that NNTW represents equivalent effort of translating
clean text from scratch. The tool which calculates NNTW is part of Pology (
http://pology.nedohodnik.net/ ), a PO diffing script run in special mode on
two PO files or directories of PO files:

  $ poediff -U foofile-older.po foofile-newer.po
  $ poediff -U foodir-older/ foodir-newer/

It can also run directly on a Git repository:

  $ cd gnome-games
  $ rev1=$(git rev-list -n 1 --before='2011-07-01' master)
  $ rev2=$(git rev-list -n 1 --before='2011-10-01' master)
  $ poediff -U -c git -r $rev1:$rev2 po/
  nominal newly translated words: 10030

How PO messages are paired can be checked in the PO diffing section of
Pology user manual, and how NNTW is computed on pairs in
cats_update_effort() function in poediffpatch.py file in Pology sources.

-- 
Chusslove Illich (Часлав Илић)




Re: Improve translations in Mallard

2011-10-24 Thread Chusslove Illich
 [: Chusslove Illich :]
 While intltool does this too, it shouldn't be done. PO processing tools,
 be it PO editors or whatever else, have no obligation to consider and
 parse such comments.

 [: Shaun McCance :]
 Are there any processing tools that actually have a problem with this?

For one, Gettext tools which have -F option (sort messages by source
reference) do not work properly. Oddly, they do not crash or output any
warning, only output messages in wrong order.

But quite obviously, anything that actually expects to find a file:number
there, as stated in Gettext manual, can work properly only by fluke.

 [: Chusslove Illich :]
 The problem I see with that is that the tag path is no longer tied to the
 file that defined it. That's fine when strings only appear once, but how
 about this:

   #: C/some-topic.page:42
   #: C/another-topic.page:17
   #. tag-path: title/gui
   #. tag-path: td/p

 Which is which? Maybe it doesn't matter that much. I don't know. I'm not a
 translator. I'm just trying to make things easier.

In this particular case, I cannot see how it would matter which is which. In
general, if pairing is important, that is why source references are there in
the first place -- to go into the source and check for oneself.

(On a side note, if two tag paths with different final element (here .../gui
and .../p) have same text in English, it is not unlikely that they may need
different translation. This is why I am generally in favor of Gil's request
to put tag paths into msgctxt instead.)

 On the other other hand, gettext defines and treats PO files in a way
 that's not really nice to third-party tools. I'd love to put some info in
 #, comments (e.g. marking a message transliteration-only), but the fact
 that it's a controlled vocabulary prevents me.

But there is a good reason why #, comments are a controlled vocabulary (so
to say). Firstly, processing tools need to know what different flags mean,
so it would be bad if each extraction tool could add its own arbitrary
flags. An arbitrary flag could later conflict with a Gettext flag. Secondly,
the semantics of flags is that they are purely technical, telling something
about the structure of the message. Processing tools can use them to
validate the
syntax or to recognize formatting (e.g. when collecting statistics).

 I'd rather not put more and more stuff in #. comments, because it's going
 to get in the way of actual comments to translators at some point.

A manual author-to-translator #. comment also contains a keyword,
TRANSLATORS: by default. Other types of #. comments can follow the same
convention. That way any arbitrary property of the message can be set. For
example:

  #. TRANSLATORS: Blah, blah, blah,
  #. blah, blah, blah, blah, blah.
  #. TAG-PATH: td/p
  #. TRANSLITERATE-ONLY

(On another side note, I don't think transliterate-only deserves such a
semi-formal treatment. Normally it should be entirely upon translators
whether to transliterate or do whatever else. If the author wants to make
that choice instead, that should be explained by an ordinary TRANSLATORS:
comment.)
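
A tool could then pick the properties out of the #. comments along these
lines. This is a sketch of the convention as proposed here, nothing that
Gettext itself defines; the function name is made up, and a continuation
line consisting of a single all-caps word would be misread as a keyword.

```python
def comment_properties(lines):
    """Group '#.' comment lines by their leading all-caps keyword."""
    props, keyword = {}, None
    for line in lines:
        text = line[2:].strip() if line.startswith("#.") else line.strip()
        head, _, rest = text.partition(":")
        if head.isupper() and " " not in head:
            # A new keyword, with or without a value after the colon.
            keyword = head
            values = props.setdefault(keyword, [])
            if rest.strip():
                values.append(rest.strip())
        elif keyword is not None:
            # Continuation of the previous keyword's text.
            values = props[keyword]
            if values:
                values[-1] += " " + text
            else:
                values.append(text)
    return props


props = comment_properties([
    "#. TRANSLATORS: Blah, blah, blah,",
    "#. blah, blah, blah, blah, blah.",
    "#. TAG-PATH: td/p",
    "#. TRANSLITERATE-ONLY",
])
print(props)
```

Several TAG-PATH lines on one message simply accumulate in the list.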

-- 
Chusslove Illich (Часлав Илић)




Re: Improve translations in Mallard

2011-10-24 Thread Chusslove Illich
 [: Shaun McCance :]
 I don't currently auto-prefix comments with TRANSLATORS:. Should I? I
 mean, I assume you know the comment is addressed to you, and don't need to
 be shouted at for each message.

Well, a prefix is conventional... frequently it is also translators: or
Translators: instead of TRANSLATORS:, but it is always there (i.e. when POT
is produced by xgettext). If there were nothing else in #. comments,
then we could argue if this prefix is useful or not. But, given the rest of
this very discussion (tag path, transliterate-only mark...), I would say
definitely put it.

 [: Chusslove Illich :]
 [...] I don't think transliterate-only deserves such a semi-formal
 treatment.

 [: Shaun McCance :]
 The impetus for this is people's names. They're common in PO files for
 documents because we put our names in XML elements for credits. [...]

Right, people's names. That is something for which translators surely do not
need a transliterate-only mark/note/info, as they already have conventions
for people's names.

 But, for example, the German team might decide they never want to
 translate those, and they could run a script that just sets msgstr to
 msgid for messages tagged as transliterate-only.

There are two aspects here.

First, names are peanuts compared to the total text to translate.
Translation editors have a shortcut for copying over into translation the
whole text of the original. Hence, automating this is a negligible efficiency
gain, for the trouble of yet another thingy in the workflow.

Second, what is needed on translators' side is to know that the text is a
person's name; whether on manual translation or automatic processing. This
is one piece of information that may make sense to provide (e.g. as in
Docbook which has an element for that), rather than providing information on
what to do with the text.

-- 
Chusslove Illich (Часлав Илић)




Re: Translating labels with inserted widgets

2011-08-22 Thread Chusslove Illich
 [: Andre Klapper :]
 Some dialogs are based on an English sentence structure, consisting of
 several dropdown widgets.
 [...]
 I am not aware of a way to change the order of the dropdown widgets
 depending on the language/locale.

 A friend came up with this basic idea of a framework, however I wonder if
 anybody knows how other systems handle this problem?

No other systems handle this problem.

There are actually two distinct problems, and both were implied in the said
bug report. The first is the problem Priscilla reported, of widget and label
ordering. The second problem is grammar congruence between labels and
widgets, as Friedel noted.

The problem of widget and label ordering could be handled as follows. Each
widgeted sentence can be split into N widgets and N + 1 labels. One or
more labels might be empty in English, but all of them should be exposed for
translation, with proper comments and contexts. Additionally there should be
a meta-message per widget, asking the translator to indicate the widget to
fit at that position. These label and widget selection messages should be
interleaved in proper order. For example, if the widgeted sentence is:

  Repeat each [Monday|Tuesday|...] in [January|February...]

the PO file would contain:

  #. TRANSLATORS: Assembly of widgeted sentence
  #.   Repeat each [weekday] in [month]
  #. This is the label before the first widget.
  msgctxt "ws-repeateach-label-1"
  msgid "Repeat each"
  msgstr ""

  #. TRANSLATORS: Assembly of widgeted sentence
  #.   Repeat each [weekday] in [month]
  #. This is the first widget, translate as 1 for [weekday] or 2 for [month].
  msgctxt "ws-repeateach-widget-1"
  msgid "1"
  msgstr ""

  #. TRANSLATORS: Assembly of widgeted sentence
  #.   Repeat each [weekday] in [month]
  #. This is the label between the first and the second widget.
  msgctxt "ws-repeateach-label-2"
  msgid "in"
  msgstr ""

  #. TRANSLATORS: Assembly of widgeted sentence
  #.   Repeat each [weekday] in [month]
  #. This is the second widget, translate as 1 for [weekday] or 2 for [month].
  msgctxt "ws-repeateach-widget-2"
  msgid "2"
  msgstr ""

  #. TRANSLATORS: Assembly of widgeted sentence
  #.   Repeat each [weekday] in [month]
  #. This is the label after the second widget.
  msgctxt "ws-repeateach-label-3"
  msgid " "
  msgstr ""

It would be nicer if widget selection messages would ask for literal
[weekday] and [month] instead of numbers, but then unwary translators
would likely screw it up. In this way, there is little to screw up for the
unwary. The empty label message needs one space, or else it would record as
untranslated in statistics.
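
On the program side, assembling the sentence from these messages could go
roughly as follows. This is only a sketch: the function name is made up,
translations are looked up by msgctxt alone, and plain strings stand in for
the actual widgets.

```python
def assemble(translate, widgets, name="ws-repeateach"):
    """Return labels and widgets in the order chosen by the translation."""
    parts = []
    for i in range(1, len(widgets) + 1):
        parts.append(translate("%s-label-%d" % (name, i)).strip())
        # The widget selection message translates to a 1-based index.
        choice = int(translate("%s-widget-%d" % (name, i)))
        parts.append(widgets[choice - 1])
    parts.append(translate("%s-label-%d" % (name, len(widgets) + 1)).strip())
    return [p for p in parts if p]  # drop labels left empty


# A made-up translation that wants the month first.
translation = {
    "ws-repeateach-label-1": "In",
    "ws-repeateach-widget-1": "2",
    "ws-repeateach-label-2": "repeat each",
    "ws-repeateach-widget-2": "1",
    "ws-repeateach-label-3": " ",
}
print(assemble(translation.get, ["[weekday]", "[month]"]))
```

A real implementation would also have to guard against a translation that
selects the same widget twice.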

The problem of grammar congruence between widgets and labels is the harder
one. In the above example, in my language the first label (Repeat each)
would have to change as the first widget selection ([weekday]) changes,
because the pronoun each has to conform to gender of the week day (some
are masculine, other are feminine). To handle this generally for all
languages, heavy artillery is required. Here comes the shameless plug, for
which I'm writing this reply in the first place:

http://sourceforge.net/mailarchive/message.php?msg_id=27932895

Specifically pertinent to widgeted sentences is the bit about runtime
contexts.

-- 
Chusslove Illich (Часлав Илић)




Consistency of formatting directives

2011-07-27 Thread Chusslove Illich
Hi,

I used the latest Gnome UI translations to examine how frequently translators
modify formatting directives. There are only about 300 modifications in all
languages taken together, but over 90% of them are likely unintentional
(typos or unfuzzying glitches), so I thought I would report them. The list
of all messages with modified formatting directives is attached. To check
whether any messages from your language are in there and need correction,
just search for -yourlang-.

Note that some modifications lead to loss of information or
comprehensibility. For example, in this message the quantity is too rounded
to be of use:

  #. TRANSLATOR: This is pressure in atmospheres
  #: ../libgweather/weather.c:981
  #, c-format
  msgid "%.3f atm"
  msgstr "%.1f atmosfera"

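and in this one the arguments were not reordered as was intended, because
%2s and %1s set a field width rather than select an argument (that would
be %2$s and %1$s):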

  #. Translators: ...
  #: ../src/ui/theme-parser.c:202
  #, c-format
  msgid "No \"%s\" attribute on element %s"
  msgstr "Elemendil %2s pole \"%1s\" atribuuti"
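
A check of this kind can be sketched in a few lines of Python. This is not
the script that produced the attached list, just a minimal illustration:
the regular expression covers common C printf directives only, and the
function names are made up for the example.

```python
import re

# Common C printf directives: optional "N$" position, flags, width,
# precision, length modifier and conversion character.
DIRECTIVE_RE = re.compile(
    r"%(?:(?P<pos>\d+)\$)?"
    r"(?P<spec>[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*))?"
    r"(?:hh|h|ll|l|L|j|z|t)?[diouxXeEfFgGaAcspn%])"
)


def directive_map(text):
    """Map argument number to its directive, with any "N$" part removed."""
    mapping = {}
    next_index = 1
    for match in DIRECTIVE_RE.finditer(text):
        spec = match.group("spec")
        if spec.endswith("%"):
            continue  # "%%" is a literal percent sign, not an argument
        if match.group("pos"):
            index = int(match.group("pos"))
        else:
            index = next_index
            next_index += 1
        mapping[index] = "%" + spec
    return mapping


def directives_modified(msgid, msgstr):
    """True if the translation formats some argument differently."""
    return directive_map(msgid) != directive_map(msgstr)


rounded = directives_modified("%.3f atm", "%.1f atmosfera")
width_not_position = directives_modified(
    'No "%s" attribute on element %s',
    'Elemendil %2s pole "%1s" atribuuti',
)
proper_reorder = directives_modified(
    'No "%s" attribute on element %s',
    'Elemendil %2$s pole "%1$s" atribuuti',
)
```

Both messages above are flagged, while a translation that reorders with
%2$s and %1$s passes, since each argument keeps its original directive.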

(Not subscribed, CC if necessary.)

-- 
Chusslove Illich (Часлав Илић)


modfmtdirs.out.gz
Description: GNU Zip compressed data




Re: Statistics for each GNOME translator's work

2010-07-20 Thread Chusslove Illich
:
          if msgstr_old != msgstr_new:
              num_eqwords = len(proper_words(msgid_new))
          else:
              num_eqwords = 1
      # Modified translation only.
      elif msgstr_old != msgstr_new:
          words_add = proper_words(word_ediff_to_add(msg.msgstr[0]))
          words_rem = proper_words(word_ediff_to_rem(msg.msgstr[0]))
          num_eqwords = max(len(words_add), len(words_rem))

      # No changed proper words in translation.
      if num_eqwords == 0:
          if msgid_old != msgid_new:
              segs_add = word_ediff_to_add(msg.msgid, sep=None)
              segs_rem = word_ediff_to_rem(msg.msgid, sep=None)
              num_eqwords = max(len(segs_add), len(segs_rem))
          else:
              # Some other change (could be in translator comments).
              num_eqwords = 1

      num_eqwords_total += num_eqwords

  return num_eqwords_total

Unfortunately, Pology too is not yet a released piece of software, and
poediff is a bit on the slow side.
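
For readers without Pology at hand, the core idea of counting the larger of
the added and removed word sets can be approximated with the standard
library. This is a rough stand-in, not Pology's algorithm: simple_words and
changed_word_count are names invented here, and difflib's word matching may
well differ from what poediff produces.

```python
import difflib
import re


def simple_words(text):
    """Split text into runs of letters (a crude notion of a "proper word")."""
    return re.findall(r"[^\W\d_]+", text)


def changed_word_count(old_text, new_text):
    """Return max(number of words added, number of words removed)."""
    old_words = simple_words(old_text)
    new_words = simple_words(new_text)
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return max(added, removed)


edited = changed_word_count("Open the file", "Open a new file")
untouched = changed_word_count("Open the file", "Open the file")
```

In the first call one word is removed and two are added, so the change
counts as two words; identical texts count as zero.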

-- 
Chusslove Illich (Часлав Илић)




Re: Bug 569118 - Use C_() instead of Q_() with context (orca)

2009-01-29 Thread Chusslove Illich
 [: Leonardo F. Fontenelle :]
 This might be more to the point:

   for file in po/*.po; do
       sed -i "s+msgid \"\([^|]*\)|+msgctxt \"\1\"\nmsgid \"+" $file
   done

One could also make use of the as-of-yet-unreleased Pology package:

  $ posieve normctxt-sep -ssep:'|' po/*.po

Pology can be obtained and readied for use by:

  $ svn co svn://anonsvn.kde.org/home/kde/trunk/l10n-support/pology
  $ export PATH=$PWD/pology/scripts:$PATH

The crux of the conversion is in sieve/normctxt_sep.py, method process (line
66). I think it's rather self-explanatory and can be tuned for more
cleverness (e.g. to take into account extracted comments and/or msgstr). The
API of the message object can be found at pology/doc/html/index.html, module
pology.file.message, class Message_base.
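
For illustration only, the basic transformation that both the sed command
and the sieve perform can be sketched in plain Python. This is neither the
sieve's code nor a full PO parser: it assumes the context and the text sit
in a single-line msgid separated by "|", and the names are made up here.

```python
import re

# Assumption: "msgid" starts the line and the context contains no quote,
# no "|" and no line break.
LEGACY_CONTEXT_RE = re.compile(r'^msgid "([^|"\n]*)\|', re.MULTILINE)


def split_legacy_context(po_text):
    """Turn msgid "context|text" into msgctxt "context" + msgid "text"."""
    return LEGACY_CONTEXT_RE.sub(r'msgctxt "\1"\nmsgid "', po_text)


before = 'msgid "menu|Open"\nmsgstr ""\n'
after = split_legacy_context(before)
plain = split_legacy_context('msgid "Open"\nmsgstr ""\n')
```

Like the sed one-liner, this leaves msgstr alone and treats every "|" in a
msgid as a context separator, which is where a smarter tool earns its keep.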

-- 
Chusslove Illich (Часлав Илић)




Re: Bug 569118 - Use C_() instead of Q_() with context (orca)

2009-01-27 Thread Chusslove Illich
 [: Willie Walker :]
 The current Orca code does the equivalent of this:

In MO files contexts are separated by U+0004 from the message proper, so I
used something equivalent to this in some code I work on:

  def C_ (c, s):
      s = _(c + "\x04" + s)
      p = s.find("\x04")
      if p >= 0: # satisfied only if no translation found
          s = s[p + 1:]
      return s
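
To try the idea without an MO file, here is a self-contained sketch. The
DictTranslations class and the sample catalog entry are stand-ins invented
for this example; in real code the lookup function would be the gettext
method of a loaded gettext.GNUTranslations object.

```python
import gettext

CONTEXT_SEP = "\x04"  # separates msgctxt from msgid inside MO files


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a real MO file."""

    def __init__(self, catalog):
        super().__init__()
        self._demo_catalog = catalog

    def gettext(self, message):
        # Like real gettext, return the lookup key itself if nothing is found.
        return self._demo_catalog.get(message, message)


def make_context_lookup(translate):
    """Build a C_(context, text) function on top of a plain lookup function."""

    def C_(context, text):
        result = translate(context + CONTEXT_SEP + text)
        pos = result.find(CONTEXT_SEP)
        if pos >= 0:  # the key came back unchanged: no translation found
            result = result[pos + 1:]
        return result

    return C_


# Sample data: one message translated in the "verb" context only.
catalog = DictTranslations({"verb" + CONTEXT_SEP + "Open": "Otvori"})
C_ = make_context_lookup(catalog.gettext)

translated = C_("verb", "Open")
fallback = C_("adjective", "Open")
```

Python 3.8 and later also provide gettext.pgettext, which does this lookup
natively, so a wrapper like this is mostly of interest for older code.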

-- 
Chusslove Illich (Часлав Илић)




Re: switch from context to msgctxt?

2008-01-25 Thread Chusslove Illich
Hi Kenneth, folks,

 [: Kenneth Nielsen :]
 [...] I recently proofread a KDE po-file and discovered something
 disconcerting. In this file it _seemed_ that these contexts had been
 automatically generated and the context provided was simply the name of
 the kind of widget or GUI-element they were in, (they could off course
 also have been made by hand in this way, which makes it even more
 strange). [...] Obviously we want to make sure that none of our developers
 get this kind of idea. [...]

Those context markers were indeed added by hand, and there are ideas to add
them automatically where possible (e.g. in Qt's Designer XML, a counterpart
to Glade). As the original perpetrator of this crud :) I'll add a few
pointers for completeness.

The guide for KDE programmers on this topic is here:

http://techbase.kde.org/Development/Tutorials/Localization/i18n_Semantics

and the discussions which forged its current form:

http://lists.kde.org/?t=11777056302&r=1&w=2

http://lists.kde.org/?t=11820906764&r=1&w=2

An example of a POT almost fully equipped with context markers:

http://websvn.kde.org/*checkout*/trunk/l10n-kde4/templates/messages/kdebase/dolphin.pot?revision=HEAD

For the ~2,500 messages retrofitted with context markers in the meantime,
the duplication is as follows:

  due to context marker only: +7%
  due to context at all (i.e. non-empty when marker removed): +17%

Exploiting the --previous option of msgmerge, there is a script to unfuzzy
messages that became fuzzy only due to a change or addition of msgctxt, or
just of the context marker within it, at the translator's leisure (possibly
adding an unreviewed-context manual comment).
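
The logic of such a script can be sketched as follows. This is not the
actual script: the Message class is a made-up minimal record (a real tool
would read these fields from the PO file, where msgmerge --previous stores
the old values in "#| msgctxt" and "#| msgid" comments), and the sample
strings are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical minimal view of one PO message."""
    msgid: str
    msgstr: str
    msgctxt: Optional[str] = None
    fuzzy: bool = False
    prev_msgid: Optional[str] = None    # from "#| msgid"
    prev_msgctxt: Optional[str] = None  # from "#| msgctxt"
    comments: List[str] = field(default_factory=list)


def unfuzzy_context_only(msg, note="unreviewed-context"):
    """Clear the fuzzy flag if only msgctxt differs from the previous state.

    Returns True if the message was unfuzzied.
    """
    if not msg.fuzzy or msg.prev_msgid is None:
        return False  # not fuzzy, or no previous fields to compare with
    if msg.prev_msgid != msg.msgid:
        return False  # the text itself changed, needs a real review
    if msg.prev_msgctxt == msg.msgctxt:
        return False  # context unchanged, fuzzy for some other reason
    msg.fuzzy = False
    msg.prev_msgid = None
    msg.prev_msgctxt = None
    if note:
        msg.comments.append(note)
    return True


context_added = Message(msgid="Open", msgstr="Otvori",
                        msgctxt="@action:button", fuzzy=True,
                        prev_msgid="Open")
text_changed = Message(msgid="Open file", msgstr="Otvori",
                       msgctxt="@action:button", fuzzy=True,
                       prev_msgid="Open")

first_result = unfuzzy_context_only(context_added)
second_result = unfuzzy_context_only(text_changed)
```

The added comment lets the translator find these messages later and check
whether the new context calls for a different translation after all.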

-- 
Chusslove Illich (Часлав Илић)




Re: l10n/i18n architecture proposal, RFC, presentation at FOSDEM

2007-02-25 Thread Chusslove Illich
 [: Axel Hecht :]
 While looking at our status quo at Mozilla, and looking at other
 attempts, I'm seeing limitations in both what we can do and what
 others can do, and came up with an alternative proposal [...]
 [...] a solution for plain strings of course, but plurals and
 declensions, too. [...]
 This proposal is done with Mozilla on my mind, but it is in no way
 limited to Mozilla [...]

I've been brewing a similar concept for some time, to be used in KDE4,
albeit on a less ambitious scale: on top of the existing gettext universe,
with as little disturbance as possible, bolt on a system which provides
message arguments along with the text itself, and a known scripting language
to operate on them at runtime.

Here is a somewhat outdated proposal, but the concept is still the same:

http://caslav.gmxhome.de/writings/ktranscript.html

Section "Using The Scripting System" is oriented towards Gettext
translators, while the others are more about the KDE implementation (though
"Performance considerations" might also be interesting). However, nothing
in the system is essentially tied to KDE; it could be made into an
independent library (i.e. dependent only on Gettext and a scripting engine)
with language bindings.

In the meantime, as opposed to the document above, the scripting engine 
will likely switch from Guile to KJS, a JavaScript implementation already 
built into kdelibs. And the new KDE4 i18n API, which can support this 
system under the hood, can be seen at:

http://api.kde.org/cvs-api/kdelibs-apidocs/kdecore/html/classKLocalizedString.html

From the programmer's point of view, the API is no more complicated than
ordinary gettext (section "General Usage"); note that the QStrings in the
examples are run-of-the-mill Qt/KDE strings, they have nothing to do with
the i18n system. The more complicated calls (section "Specialized Usage")
are there to handle some rare cases described therein.

Also note that there is no mention of translation scripting in the i18n API
itself -- the two are now sufficiently decoupled that any changes to the
translation-side system can be made without breaking binary compatibility
of the KDE library. KDE4 can thus field-test the viability of these new,
"live" I would call them, translation concepts.

 All of this is new enough to take Localization from version 1.0 to 2.0
 (yeah, I'm all web 2.0), so I took the freedom to codename this l20n.
 Pronounced l-twenty, I drop the 'n'.

Aren't we all into codenames -- I call mine Transcript :)

(On a side note, I too started with the idea of replacing the venerable PO
format itself, but was wisely advised to take a cohabitative path; this was
my first take, from late 2003:
http://caslav.gmxhome.de/writings/cotras-intro.html )

-- 
Chusslove Illich (Часлав Илић)

