On Jul 18, 2009, at 5:06 PM, Hannes Magnusson wrote:
On Sat, Jul 18, 2009 at 12:44, pedram salehpoor wrote:
Hi
I wanted to know how the changes for PO files are progressing and
if there
is anything that I can do to ease the change?
There was a discussion about this recently. It seemed that
people had great fears of needing to read over every single snippet
(thousands, probably hundreds of thousands) and verify their
correctness.
I didn't sense this fear, or likely ignored it. The conversion default
is fuzzy but we can easily change that.
A larger fear is lost content because the conversion is not perfect. A
rough example (which could easily be way off) shows Japanese at 78%
translated via PO files post test conversion... so about a 20% loss.
Assuming this is correct, it's a real problem but is a one time deal
and likely could be better with improved conversion methods. If
something is moderately up to date and follows the same structure as
en/ (like, the same number of <para>'s), then it should
[theoretically] convert fine. This conversion deserves better testing/
debugging.
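
As a rough illustration of the "same structure" condition, a pre-conversion check could compare the number of <para> elements in the en/ file and in its translation. This is only a sketch: the function names are hypothetical, and counting tags with a regex is an assumption standing in for whatever the real conversion tools check.

```python
import re

# Matches opening <para> tags only, not <parameter>, <paramdef>, etc.
PARA_OPEN = re.compile(r"<para(?=[\s>])")


def count_paras(docbook_text: str) -> int:
    """Count opening <para> tags in a DocBook fragment."""
    return len(PARA_OPEN.findall(docbook_text))


def same_para_structure(en_text: str, translated_text: str) -> bool:
    """True when both texts contain the same number of <para> elements."""
    return count_paras(en_text) == count_paras(translated_text)
```

Files that fail a check like this would be the ones worth inspecting by hand before trusting the converted PO output.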
I don't know what the exact situation is, but I think we should
probably cast a vote on this.. do people want to keep using Docbook
XML for translations (and continue with all the problems that has;
broken builds being the most annoying part) or do people want to switch
to po (with the biggest disadvantage of being non-contextual)?
How about more time before choosing any method. Yesterday I chatted
with the transifex folks, and the possibility of them taking charge of
this design came up. They live and breathe this sort of thing so it
seems natural. However, their service aspect is a business so they
seek some sponsorship to properly promise dedicated time towards this.
What do people think about this idea? It does not appear we have
people in-house who want to lead such a charge, and I'm certainly not
ideal for designing this. And thanks to Nilgün, current DocBook
translations with SVN seem to be working fine now so we're not in a
huge rush. I don't think this discussion should stop translators from
translating today.
If we vote for po, I would recommend to try scripting the conversion
to mark "up2date translations" as OK - i.e. err on the side of "OK" rather
than "fuzzy" to ease the pain of needing to sanity-check way too much text.
Sounds reasonable.
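
To make the idea concrete: in a PO file, a fuzzy entry carries a "#, fuzzy" flag line, so the conversion script only needs to decide whether to emit that line. A minimal sketch follows. The function names are hypothetical, and how `up_to_date` gets decided (for example from revision tracking) is left as an assumption for the caller.

```python
def po_escape(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_po_entry(msgid: str, msgstr: str, up_to_date: bool) -> str:
    """Render one PO entry; only out-of-date translations get '#, fuzzy'."""
    lines = []
    if not up_to_date:
        # Flag the entry so translators know it needs review.
        lines.append("#, fuzzy")
    lines.append(f'msgid "{po_escape(msgid)}"')
    lines.append(f'msgstr "{po_escape(msgstr)}"')
    return "\n".join(lines) + "\n"
```

With this shape, translations known to be current come out unflagged, and only the rest need a manual pass.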
To be honest, I don't really understand how the build process will
work with .po files. Will the "core files" (english) be automatically
generated? Will those files be in SVN? Does PhD need changes? ...
Building EN will not change, but building translations is a different
story. I believe it goes like:
foreach (english docbook file as enfile) {
    if (po file exists for enfile) {
        build_file = turn_po_into_docbook(enfile);
    } else {
        build_file = enfile;
    }
    use_this_for_build(build_file);
}
Where turning a PO file into DocBook is our main change, and is done
by external tools (like po4a or po2xml). Of course there are other
considerations like dealing with entities but the above is a
simplified flow.
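
The same flow as a runnable sketch, for anyone who wants to experiment. Everything here is an assumption for illustration: the directory layout (a po/ tree mirroring en/ with .po suffixes), the function name, and the `po_to_docbook` callable, which stands in for an external converter such as po4a or po2xml and is not their actual interface.

```python
from pathlib import Path
from typing import Callable, Iterator


def pick_build_sources(
    en_dir: Path,
    po_dir: Path,
    po_to_docbook: Callable[[Path, Path], Path],
) -> Iterator[Path]:
    """Yield the file to build for each English DocBook file.

    po_to_docbook(en_file, po_file) is a placeholder for an external
    converter; it should return the path of the translated XML it wrote.
    """
    for en_file in sorted(en_dir.rglob("*.xml")):
        relative = en_file.relative_to(en_dir)
        po_file = (po_dir / relative).with_suffix(".po")
        if po_file.exists():
            yield po_to_docbook(en_file, po_file)
        else:
            # No translation yet: fall back to the English source.
            yield en_file
```

Entities and other special cases are ignored here, just as in the simplified flow above.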
I'm unsure how exactly POT files come into play here, which are
basically English only PO files (templates). They are most useful for
starting a new translation for a file or determining if translations
are outdated (en/ strings are compared).
But as we progress I reckon we'll figure out these finer details
because I imagine it'll be important for us to track which files
changed since the last build, so we won't have to convert every PO
file to XML on every run. I don't know if we want updated POT files in
SVN because that's a pain but maybe we do. We could explore magic
where these POT commits are automagically done for us, but that seems
odd. We could also add QA checks that do full uncached builds about every week.
I imagine us avoiding too much magic.
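
One low-magic way to skip unchanged files would be a timestamp comparison between each PO file and the XML generated from it, with a switch to force the full uncached build. This is a sketch under assumptions: the function name is hypothetical, and modification times may not be reliable after a fresh checkout, so a content hash or revision number could be a better signal in practice.

```python
from pathlib import Path


def needs_conversion(po_file: Path, xml_file: Path, force: bool = False) -> bool:
    """True when the generated XML is missing or older than its PO file.

    force=True models a full uncached build, such as a weekly QA run.
    """
    if force or not xml_file.exists():
        return True
    return po_file.stat().st_mtime > xml_file.stat().st_mtime
```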
As far as I have gathered there is some work going on by a 3rd party
called "transferex" (or something similar) that offers a web based system
for translation work. Currently phpdoc is larger than their system is
capable of - but there exists a good chunk of desktop applications
that are used by others to translate these files...
A few systems come to mind and are being tested, which are:
- Pootle : Offers online editing, and various statistics
--- http://translate.php.net/
--- For the most part working now (kudos to Michael) but it seems buggy
- Transifex : Offers various statistics, and collaboration with other
projects
--- http://www.transifex.net/projects/php/
--- Will also offer online editing soon
- Our online editor (beta): Knows our docbook files
--- http://doc.php.net/editor
--- Can add po related tools in the future
Transifex is hosted offsite and essentially a place that many projects
gather. Unfortunately we are too large for them today but they are
looking into it. Transifex is also Open Source software that we could
host ourselves but I think it's better that we live on their server
and live with other projects there (which hopefully means additional
translators). I consider transifex.net to be an optional addition to
our translation process, and certainly not a requirement for anyone to
use. We may or may not allow transifex to commit to our repository on
behalf of translators there.
Whatever the path, I'm hopeful we can make useful translation related
tools available to us. This includes TM (translation memory), CAT
(computer assisted translation), and other such tools.
Also, translators will choose to edit online or download/translate/
commit PO files themselves, depending on the situation or desire.
I am not a translator, and have never really looked into gettext and
po in any serious way, so I really lack the experience on the topic.
I, and without a doubt Philip too (who has been looking into this the
most), would greatly appreciate feedback from all translators here
(especially people like Masahiro and Nilgün!)...
This is true, and all thoughts and feedback are welcome and
needed.
Regards,
Philip