About multilingual models

Marc Garcia Mon, 17 Aug 2009 11:54:39 -0700

Hi folks,

In the end I had no time to start coding on multilingual models as part 
of my GSoC project. I did some more analysis of the problem and its 
possible solutions; let me share it with you.


Basically, I came to the conclusion that there are two different 
approaches, both valid, each more suitable depending on the 
website. Let me name these methods "model based" and "gettext like".

In summary, the model based idea is to define in every model the 
structure for translating the necessary fields. The gettext like method 
would implement a catalog, and the translations would be decoupled from 
the models.

Let's explain both methods in more detail:

model based method
---------------------------------

This method is especially interesting for websites where all 
translations are provided at the same time. The idea is that there is no 
main language, and we don't want to fall back to another language if a 
string has no value for the current language. Imagine you have a virtual 
shop built in Django, and you sell products to the US and China. I don't 
think it's useful to display Chinese texts to Americans, or English 
texts to Chinese users. The person entering data in Django will probably 
have the product name and description in both languages on paper, in 
Excel, or in some other medium, so it makes more sense to fill in all 
the data (in all languages) at the same time than to enter the product 
in English and then translate it somewhere else.

In this case the admin should allow filling all translations at the same 
time, and if a field is required, it should be required for all languages.
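
As a rough illustration of the "required for all languages" rule, here 
is a minimal sketch in plain Python. The function name 
missing_translations and the dictionary layout are only assumptions for 
the example; nothing like this exists in Django.

```python
def missing_translations(values, languages, required_fields):
    """Return (field, language) pairs that are required but empty.

    `values` maps a field name to a {language_code: text} dictionary.
    """
    missing = []
    for field in required_fields:
        per_language = values.get(field, {})
        for lang in languages:
            # Treat a missing key, None, and whitespace-only text as empty.
            if not (per_language.get(lang) or "").strip():
                missing.append((field, lang))
    return missing


# Hypothetical product submitted from an admin form.
product = {
    "name": {"en": "Green tea", "zh": "绿茶"},
    "description": {"en": "Loose leaf, 100 g", "zh": ""},
}
problems = missing_translations(product, ["en", "zh"], ["name", "description"])
print(problems)
```

An admin form following this idea would refuse to save the product 
until the list of problems is empty.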

In this case I would specify this syntax to let Django know that we want 
this field translated:

class MyModel(models.Model):
    my_i18n_field = models.CharField(max_length=32, translate=True)

The main advantage of this method is that the translate property sits 
together with the field definition. This makes it easy to know whether 
a field will be translated once the models are coded.

From the database point of view, I would create an extra table for every 
model, with the following structure:

* id
* main_table_id
* language_code
* field1
* field2
* ...

So, to get the data it would be necessary to join both tables, filtering 
by the current language code. That would make it easy to filter, sort or 
search by any of the translated fields.
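
To make the join more concrete, here is a small sketch using Python's 
sqlite3 module instead of the Django ORM. The table names product and 
product_translation, and the columns price and name, are made up for 
the example; only id, main_table_id and language_code come from the 
structure above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        price REAL NOT NULL
    );
    -- Extra table holding the translated fields of product.
    CREATE TABLE product_translation (
        id INTEGER PRIMARY KEY,
        main_table_id INTEGER NOT NULL REFERENCES product(id),
        language_code TEXT NOT NULL,
        name TEXT NOT NULL,
        UNIQUE (main_table_id, language_code)
    );
    INSERT INTO product (id, price) VALUES (1, 9.5), (2, 4.0);
    INSERT INTO product_translation (main_table_id, language_code, name) VALUES
        (1, 'en', 'Tea'), (1, 'zh', '茶'),
        (2, 'en', 'Coffee');
""")


def products_for(language_code):
    """Join both tables, keep one language, sort by the translated name."""
    return conn.execute(
        """
        SELECT p.id, t.name, p.price
        FROM product AS p
        JOIN product_translation AS t ON t.main_table_id = p.id
        WHERE t.language_code = ?
        ORDER BY t.name
        """,
        (language_code,),
    ).fetchall()


english = products_for("en")
chinese = products_for("zh")
```

Because this is an inner join, product 2 simply does not appear in the 
Chinese listing, which matches the idea of not falling back to another 
language.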

gettext like method
-------------------------------

This method would be more suitable for websites where we provide 
content in one language and then want to offer it in as many languages 
as possible. Imagine a kind of wiki. We write articles in English, and 
then we allow users, or hire translators, to make these articles 
available in other languages.

In this case we pretty much emulate the way gettext works. We provide 
the content in the main language (in the admin, for example), and then 
translators access that content to provide translations. In some cases 
it won't be strictly like gettext, where you usually don't care much 
what the text is used for. It would be great to be able to show a link 
on every article saying "translate it to your language" if it is not 
yet available in that language.

While the syntax of the other method would also work for marking fields 
as translatable, in this case I would choose something more decoupled 
from the models. I would use a syntax closer to the admin one: 
specifying outside the models which ones we want to translate, and 
which fields. The main advantage of this syntax is that we can translate 
fields from existing applications without modifying them.

class MyModelTranslation(multilingual.Translation):
    translate = ('my_i18n_field',)

multilingual.register(MyModel, MyModelTranslation)
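
Just to show that the registration idea needs very little machinery, 
here is a plain Python sketch of what could sit behind it. The names 
Translation, register and translate are the ones proposed above; the 
registry dictionary, the translatable_fields helper and the stand-in 
model with a fields tuple are assumptions made for the example, not 
existing code.

```python
class Translation:
    """Base class for translation options; subclasses set `translate`."""
    translate = ()


# Hypothetical module-level registry: model class -> translatable field names.
_registry = {}


def register(model, options):
    """Remember which fields of `model` should be translatable."""
    unknown = [name for name in options.translate if name not in model.fields]
    if unknown:
        raise ValueError(f"unknown fields on {model.__name__}: {unknown}")
    _registry[model] = tuple(options.translate)


def translatable_fields(model):
    """Return the registered field names, or an empty tuple."""
    return _registry.get(model, ())


class MyModel:
    # Stand-in for a Django model; a real one would expose its fields differently.
    fields = ("title", "my_i18n_field")


class MyModelTranslation(Translation):
    translate = ("my_i18n_field",)


register(MyModel, MyModelTranslation)
print(translatable_fields(MyModel))
```

The model class itself is never touched, which is the whole point of 
this approach.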

A database structure to support this functionality could be just 
a table named "catalog" where all translations are stored. It would be 
like a .po file:

* language_code
* msgid
* msgstr

It would also be interesting to provide information about the places 
where each string is located:

* msgid
* model/field/id

There are two important problems with this structure. The first is that 
filtering or sorting by translatable fields will be almost impossible. 
Searching would be possible (but slow). The second problem is that we 
would have to store all values as strings, or only allow translating 
strings, because the same field would be used to store all the 
translations in the system.
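
A small sketch of the catalog lookup, again with sqlite3 and made-up 
data. The translate function and its fallback to the original string 
(the way gettext behaves when a msgid has no translation) are 
assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalog (
        language_code TEXT NOT NULL,
        msgid TEXT NOT NULL,
        msgstr TEXT NOT NULL,
        PRIMARY KEY (language_code, msgid)
    );
    INSERT INTO catalog (language_code, msgid, msgstr) VALUES
        ('es', 'Apple', 'Manzana'),
        ('es', 'Pear', 'Pera');
""")


def translate(msgid, language_code):
    """Look up a translation; fall back to the original string if missing."""
    row = conn.execute(
        "SELECT msgstr FROM catalog WHERE language_code = ? AND msgid = ?",
        (language_code, msgid),
    ).fetchone()
    return row[0] if row else msgid


# Values as they would be stored in the (untranslated) model field.
titles = ["Pear", "Apple", "Plum"]

# Sorting by the translated value happens here in Python, after one
# catalog lookup per row, not in the query that fetched the titles.
translated = sorted(translate(title, "es") for title in titles)
```

In this sketch the ordering is done outside the query that loads the 
objects, with one lookup per string, which gives an idea of why 
filtering and sorting get awkward with this structure.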

The main advantage of this method is that it is quite easy to decouple 
the whole translation engine from Django. An existing application could 
be set up to allow translating database content in minutes, without 
modifying the existing code.

----------------------------------------

These are my thoughts about that. Both ideas still need more discussion 
and improvements.

Regards,
  Marc

