[GENERAL] Initial ugly reverse-translator

Craig Ringer Sat, 19 Apr 2008 07:53:15 -0700

Hi all

I've chucked together a quick and very ugly script to read the .po files from the backend and produce a simple database to map translations back to the original strings and their source locations. It's a very dirty .po reader that doesn't try to parse the format properly, but it does the job. There's no search interface yet; this is just intended to get to the point where useful queries can be run on the data and the most effective queries can be figured out.
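For illustration only, here is a minimal sketch of what a deliberately "dirty" .po reader of this kind might look like. It is not the code from poread.py; the function name `parse_po` and the dictionary keys are hypothetical. It assumes the usual layout of `#:` location comments, an optional `#, c-format` flag, and `msgid`/`msgstr` lines with quoted continuation lines. It leaves backslash escapes untouched and ignores plural forms and msgctxt.

```python
import re

# Grab everything between the first and last double quote on a line.
# Escapes such as \" are deliberately left as-is (this is a "dirty" reader).
_QUOTED = re.compile(r'"(.*)"\s*$')


def _unquote(line):
    match = _QUOTED.search(line)
    return match.group(1) if match else ""


def _new_entry():
    return {"locations": [], "is_format": False, "msgid": "", "msgstr": ""}


def parse_po(text):
    """Return a list of dicts, one per translated message (hypothetical helper)."""
    entries = []
    current = _new_entry()
    field = None  # which string ("msgid" or "msgstr") continuation lines extend

    for raw in text.splitlines():
        line = raw.strip()
        # A comment or a new msgid after a msgstr means the previous entry is done.
        if field == "msgstr" and (line.startswith("#") or line.startswith("msgid ")):
            entries.append(current)
            current = _new_entry()
            field = None

        if line.startswith("#:"):
            # Location comment: one or more "file:line" references.
            for ref in line[2:].split():
                srcfile, _, srcline = ref.rpartition(":")
                if srcfile and srcline.isdigit():
                    current["locations"].append((srcfile, int(srcline)))
        elif line.startswith("#,"):
            current["is_format"] = "c-format" in line and "no-c-format" not in line
        elif line.startswith("msgid "):
            field = "msgid"
            current["msgid"] = _unquote(line)
        elif line.startswith("msgstr "):
            field = "msgstr"
            current["msgstr"] = _unquote(line)
        elif line.startswith('"') and field is not None:
            current[field] += _unquote(line)

    if field == "msgstr":
        entries.append(current)
    # The .po header is stored under an empty msgid; drop it.
    return [entry for entry in entries if entry["msgid"]]
```

Each returned entry would then map naturally onto rows in tables like po_message, po_translation and po_location, though the actual schema is the one defined inline in the script.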

Right now queries against errors without format-string substitutions work OK, if not great, with pg_trgm based lookups, e.g.:


test=# SELECT message_id, is_format, message, translation
test-# FROM po_translation INNER JOIN po_message ON po_translation.message_id = po_message.id
test-# WHERE 'el valor de array debe comenzar con «{» o información de dimensión' % translation
test-# ORDER BY similarity('el valor de array debe comenzar con «{» o información de dimensión', translation) DESC;

 message_id | is_format |                          message                           |                             translation
------------+-----------+------------------------------------------------------------+---------------------------------------------------------------------
       4470 | f         | array value must start with \"{\" or dimension information | el valor de array debe comenzar con «{» o información de dimensión"
       4437 | f         | argument must be empty or one-dimensional array            | el argumento debe ser vacío o un array unidimensional"
(2 rows)

test=# SELECT DISTINCT srcfile, srcline FROM po_location WHERE message_id = 4437;

                          srcfile                           | srcline
-------------------------------------------------------------+---------
/a/pgsql/HEAD/pgtst/src/backend/utils/adt/array_userfuncs.c |     121
utils/adt/array_userfuncs.c                                 |      99
utils/adt/array_userfuncs.c                                 |     121
utils/adt/array_userfuncs.c                                 |     124
(4 rows)
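To make it clearer why the trigram lookup tolerates small differences between the reported error text and the stored translation, here is a rough Python approximation of trigram similarity. It assumes the commonly documented behaviour of the contrib module (lower-casing, splitting into alphanumeric words, padding each word with two leading spaces and one trailing space, then comparing the sets of three-character sequences). It is a sketch for intuition, not a reimplementation, and the real module may differ in details such as multibyte handling.

```python
import re


def trigrams(text):
    """Return the set of trigrams for a string (approximation, see lead-in)."""
    grams = set()
    # Words are runs of alphanumeric characters; everything else separates words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by total distinct trigrams, from 0.0 to 1.0."""
    grams_a, grams_b = trigrams(a), trigrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

Because punctuation such as «{» contributes no trigrams under this scheme, two strings that differ only in quoting or spacing still score as highly similar, which is what makes this approach workable for messages without substitutions.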

It's also useful for format-string based messages, but more thought is needed on how best to handle them. A LIKE query using the format-string message as the pattern (after converting the pattern syntax to SQL style) would be (a) slow and (b) very sensitive to formatting and other variation. I haven't spent any time on that bit yet, but if anybody has any ideas I'd be glad to hear them.
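As a sketch of the pattern conversion mentioned above, the following hypothetical helper turns a printf-style message into a SQL LIKE pattern: each conversion such as %s or %d becomes the LIKE wildcard %, while literal %, _ and backslash characters are escaped. The name `format_to_like` and the set of recognised conversions are assumptions, and the sketch assumes backslash as the LIKE escape character. It does nothing about the speed or sensitivity problems noted above.

```python
import re

# printf-style conversions (optionally positional, e.g. %1$s) or a literal "%%".
_CONVERSION = re.compile(
    r"%(?:\d+\$)?[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|z)?[diouxXeEfgGcsm]|%%"
)


def _escape_literal(text, escape):
    """Escape characters that are special in a LIKE pattern."""
    return "".join(escape + ch if ch in ("%", "_", escape) else ch for ch in text)


def format_to_like(message, escape="\\"):
    """Convert a format-string message into a LIKE pattern (hypothetical helper)."""
    parts = []
    pos = 0
    for match in _CONVERSION.finditer(message):
        parts.append(_escape_literal(message[pos:match.start()], escape))
        if match.group() == "%%":
            parts.append(escape + "%")  # literal percent sign
        else:
            parts.append("%")           # wildcard for the substituted value
        pos = match.end()
    parts.append(_escape_literal(message[pos:], escape))
    return "".join(parts)
```

A lookup would then test the reported error text against each converted pattern, i.e. the error text on the left of LIKE and the pattern on the right, which is exactly the kind of scan that is hard to index.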


Anyway, the initial version of the script can be found at:

http://www.postnewspapers.com.au/~craig/poread.py

Consider running it in a new database as it's extremely poorly tested, written very quickly and dirtily, and contains DDL commands. The schema can be found inline in the script. The psycopg2 Python module is required, and the pg_trgm contrib module must be loaded in the database you use the script with.

Once I'm happy with the queries for translation lookups I'll bang together a quick web interface for the script and clean it up. At that point it might start being useful to people here.


--
Craig Ringer

--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
