Hi Marc, hi Gonzalo, hi django-dev,

I'm writing this email because I will need some kind of future-proof
model translation in Django in the near future. I tried many solutions
and came up with two of my own (not released anywhere), but nothing
seems to fit all the needs translations might impose.

So I thought about how model translations in Django could work, and how
they should be used by the developer. I'm sending this email directly to
Marc, because he is currently working on some i18n stuff (GSoC). Gonzalo
seems to be interested in an official solution
(http://groups.google.com/group/django-developers/msg/0be7de2c154aa49d?utoken=lffASzEAAABAWiol0apP9nFxQklB-_Jkc1-9iUSAb45YspdDeB6HhPWDe8H4lvYuxY4dyvcAHxqMKN2o9YZfQ4UFyks_AAUi),
so you might be interested in some discussion about this. Perhaps
Malcolm should be asked about how my proposal fits into the django ORM,
as he has (re)written most of the ORM (-> queryset-refactor).

I know Django 1.1 is on its way and developers might not have the time
to really think about this now, but I needed to write it down. If
someone answers and we can start working on this for post-1.1, that
would be great, of course. I, personally, don't need to wait. Anyway,
according to the bugtracker there's not much delaying Django 1.1:
http://code.djangoproject.com/query?status=new&status=assigned&status=reopened&component=!Translations&component=!Documentation&milestone=1.1&order=priority

First to summarize the needs. It has to ...
a) be fast, no heavy database-load should be needed
b) work for third-party apps, as third party apps get unusable if you
need i18n and they don't support it (this also means the existing
third-party code must still work for translated models)
c) be transparent for the user, you shouldn't need to think where your
fields come from
d) support missing translations without skipping the whole database-row,
if you for example JOIN two tables (optional)
e) be searchable through the normal ORM, translated fields should not be
hidden in some serialized format
f) be extendable, new fields shouldn't be a big problem
g) convert existing data without a hassle (like sync_transmeta_db)
h) keep context, if I fetch a german model from the database all
relations should fetch german models, too (default-context should come
from request and/or settings)
i) be optional, not every project needs model translations, this is
especially important for third-party-apps, supporting translations
should not mean you have to use it
j) be generic, different people have different needs
k) support translating central fields (like slug), this goes
hand-in-hand with easy searchability of the translated fields
l) support getting all translations for every object in the database
m) integrate well into the admin

All existing solutions fail at some of these points; my own two
solutions failed many of them, too. So first, let's take a look at what
exists:

1. Put every translation in extra fields inside one big table:
   I'm not a fan of this, tends to get messy, especially for big sites
with many translations. Developers need to think where their fields come
from (Book.objects.filter(title_de=...)). Does not support third party apps.

2. Put all translated fields into their own model, use the normal ORM
for access.
   If you look at this from a database perspective this is the best
solution. But the Django ORM does not really support you here. Access to
the fields needs an extra query for every object. Even worse: if you use
some translated fields in your query, you have to do this extra query
anyway (with Book.objects.filter(translations__slug='foobar',
translations__language='de') you have to execute an extra query to get
the German translation, although the JOIN was already done).

3. Use serialization to save all fields into one big BLOB.
   Do I need to say anything here? Not searchable, no way to really work
with the translations on the DB-layer.

4. Special case: Create an own model for translations (like 2), but use
the original table for some kind of DEFAULT_LANGUAGE. (pluggable-model-i18n)
   I kind of like this, not only because you are able to enhance
third-party apps. Independent of that, I think translations.register()
is a nice idea. But(!) this complicates things too much: for every field
access/query you have to choose whether you want the field from the
DEFAULT_LANGUAGE or not.

5. gettext-like approach.
   Save a translation for every possible string in the database, fall
back to gettext. The DB might explode here, as the DB load would be
heavy, really heavy.

So, what to make of the ideas people put into their code here? All
projects have some advantages, while bringing some disadvantages. From a
DB perspective the second solution seems to be best, from the usability
perspective the fourth solution looks great. The first solution really
shows its advantages when you look at queries
(Book.objects.filter(name_en='this is easy')).

First I want to throw in one of my own solutions, as I think it is
interesting when looking at the admin. What I did was reverse the way
we currently look at solution two, meaning the translation itself
becomes the model you work with and the rest is put into some "common"
model:

class BookCommon(models.Model):
        some_field = models.CharField(...)

class Book(models.Model):
        common = models.ForeignKey(BookCommon)
        language = models.CharField(...)
        translated_field = ...

What's interesting here are two things:
 * We can optimize DB access using select_related():
   Book.objects.select_related('common')
   ...so we can cache the common attributes, a great benefit
 * The admin lists every translation of every object; common fields
need to be "copied" to BookForm. New translations can be created by just
changing the language and hitting "save as".

Anyway, this solution did not work well, as relations between objects
become a mess (you normally have a ForeignKey to/from the Common model,
so you are left with all the problems of solution two when accessing
related objects). Additionally you cannot access objects that have no
translation in your current language, as
Book.objects.filter(language='de') filters out all rows that don't
have a translation. But I think this shows how the second solution,
which is good from a DB perspective, can be enhanced to be more usable.
It is also the solution that brought me to my current idea, even if not
directly.

So, we have this nice solution that your database likes, but we have no
way of using it in a way we can benefit from. Making your database
happy just isn't enough, but how could we use solution two the right
way, when just dealing with SQL? The answer is pretty easy, use a JOIN:
SELECT ... FROM book INNER JOIN book_trans ON
(book.id=book_trans.book_id AND book_trans.language='en') WHERE ...

Let me explain this, although most of it is quite obvious. Why put the
language inside the JOIN, could we not just use WHERE language='en'?
We could, if we want to force our result to have a translation in the
selected language. If you don't need this (for example because the
fields are optional, or the whole translation is optional) you cannot
use WHERE. But you can just change the query above to use a LEFT OUTER
JOIN:
SELECT ... FROM book LEFT OUTER JOIN book_trans ON
(book.id=book_trans.book_id AND book_trans.language='en') WHERE ...
Now you get a result even if there is no row for the translation in
book_trans. I think this is something model translations need to
support; solution two kind of supports this, too.
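
The difference between the join variants can be tried out with plain
SQL. The following is only a minimal sketch using Python's sqlite3
module; the table layout and the isbn column are assumptions made up for
the illustration, not part of any existing app:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, isbn TEXT);
    CREATE TABLE book_trans (
        book_id INTEGER REFERENCES book(id),
        language TEXT,
        title TEXT,
        UNIQUE (book_id, language)
    );
    INSERT INTO book VALUES (1, 'isbn-1'), (2, 'isbn-2');
    -- book 1 has an English and a German title, book 2 only a German one
    INSERT INTO book_trans VALUES
        (1, 'en', 'A Book'), (1, 'de', 'Ein Buch'), (2, 'de', 'Noch ein Buch');
""")

QUERY = """
    SELECT book.id, book_trans.title
    FROM book {join} book_trans
        ON (book.id = book_trans.book_id AND book_trans.language = ?)
    ORDER BY book.id
"""


def titles(join, language):
    # Only the join keyword differs between the two variants.
    return conn.execute(QUERY.format(join=join), (language,)).fetchall()


inner_rows = titles("INNER JOIN", "en")       # book 2 is skipped
left_rows = titles("LEFT OUTER JOIN", "en")   # book 2 comes back with title None

# Moving the language test from ON to WHERE throws book 2 away again,
# because its NULL-extended row cannot match language = 'en'.
where_rows = conn.execute("""
    SELECT book.id, book_trans.title
    FROM book LEFT OUTER JOIN book_trans ON (book.id = book_trans.book_id)
    WHERE book_trans.language = ?
    ORDER BY book.id
""", ("en",)).fetchall()
```

Only the LEFT OUTER JOIN with the language condition inside ON keeps the
untranslated book in the result.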

Now let's dig into what pluggable-model-i18n does, as we want to support
third-party apps, right? Why not just _remove_ the fields from the
original model and dynamically create a new translation model? This way
you could change your database layout if and only if needed, while
keeping model definitions easy. I am thinking of some kind of
translation.register() like pluggable-model-i18n uses. One example:

class SomeObj(models.Model):
        foo = models.CharField(...)
        bar = models.CharField(...)
class SomeObjTranslation(translation.ModelTranslation):
        class Meta:
                fields = ('foo',)
translation.register(SomeObj, SomeObjTranslation)

could be converted to:
class SomeObj(models.Model):
        bar = models.CharField(...)
class SomeObjTranslation(models.Model):
        language = models.CharField(max_length=5,
                choices=settings.LANGUAGES,
                default=settings.LANGUAGE_CODE)
        object = models.ForeignKey(SomeObj, related_name='translations')
        foo = models.CharField(...)
        class Meta:
                # only one translation per language and object
                unique_together = (('language', 'object'),)

This way you can change third-party apps without needing the developers
to even _care_ about translations, while still keeping the new models
pretty easy. Of course you have to provide some kind of conversion
script that copies all values from the old table to the new one, but
that should not be a big problem.
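
Such a conversion script could look roughly like the following. This is
only a sketch on the SQL level, again using sqlite3; the table names
someobj and someobj_translation and the convert() helper are assumptions
for the illustration (Django would prefix table names with the app
label), and removing the old column is left out:

```python
import sqlite3


def convert(conn, default_language="en"):
    """Copy someobj.foo into someobj_translation for the default language."""
    with conn:  # commits on success, rolls back on error
        conn.execute("""
            CREATE TABLE IF NOT EXISTS someobj_translation (
                id INTEGER PRIMARY KEY,
                object_id INTEGER NOT NULL REFERENCES someobj(id),
                language TEXT NOT NULL,
                foo TEXT,
                UNIQUE (language, object_id)
            )
        """)
        # OR IGNORE plus the UNIQUE constraint makes re-running harmless.
        conn.execute("""
            INSERT OR IGNORE INTO someobj_translation (object_id, language, foo)
            SELECT id, ?, foo FROM someobj
        """, (default_language,))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE someobj (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT);
    INSERT INTO someobj (foo, bar) VALUES ('hello', 'x'), ('world', 'y');
""")
convert(conn)
convert(conn)  # a second run does not duplicate rows
converted = conn.execute(
    "SELECT object_id, language, foo FROM someobj_translation ORDER BY object_id"
).fetchall()
```

Existing values end up as the translation for the default language, so
nothing is lost when the field is later removed from the original table.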

Now let's look at what usage should look like. As we want third-party
apps to keep working, no fields should be "renamed" ('field' ->
'translations__field'). In addition I think a query should normally get
one translation of your object; this could be some DEFAULT_LANGUAGE or
the request language by default. The model object itself needs to allow
transparent access to the fields, too. As obj.title will be translated
you could say your object represents one language at a time. I would
suggest adding obj.switch_language('en') to load and transparently
replace all attributes (while keeping the old ones in some cache).
Saving the object must of course save all translations, too.
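
To make the switch_language() idea a bit more concrete, here is a toy
sketch in plain Python. It is not Django code: TranslatedObject, the
loader callback and pending_translations() are hypothetical names, and
the loader simply stands in for the database query:

```python
class TranslatedObject:
    """Toy stand-in for a model instance with translated attributes."""

    translated_fields = ("title",)

    def __init__(self, language, loader):
        # loader(language) returns a dict of translated values, or None
        self._loader = loader
        self._cache = {}
        self.language = None
        self.switch_language(language)

    def switch_language(self, language):
        # Already loaded translations are kept, so edits survive a switch.
        if language not in self._cache:
            values = self._loader(language) or {}
            self._cache[language] = {
                name: values.get(name) for name in self.translated_fields
            }
        self.language = language

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for translated fields.
        if name in type(self).translated_fields:
            return self._cache[self.language][name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in type(self).translated_fields:
            self._cache[self.language][name] = value
        else:
            super().__setattr__(name, value)

    def pending_translations(self):
        # What save() would have to write: one row per loaded language.
        return dict(self._cache)


store = {"de": {"title": "Ein Buch"}, "en": {"title": "A Book"}}
loaded = []


def loader(language):
    loaded.append(language)
    return store.get(language)


obj = TranslatedObject("de", loader)
obj.title = "Das Buch"        # edits the German translation only
obj.switch_language("en")     # loads 'en', keeps 'de' in the cache
english_title = obj.title
obj.switch_language("de")     # served from the cache, no second load
german_title = obj.title
```

The object represents one language at a time, while the cache remembers
every translation that was touched, which is what save() would need.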

Now you might notice that this sounds familiar. Yes, model inheritance
works similarly. We have fields that are transparently rewritten to the
right table on queries, save() UPDATEs multiple tables and attributes
just live in one object (obj.parent_field, instead of
obj.parent.parent_field). If you look at this the right way, you will
see that the proposed translations are something like models using
"reverse inheritance", meaning the behavior is like with inheritance,
but the semantics are reversed. The biggest difference is the changed
JOIN, but Django should already provide most of the techniques for this,
even if they need to be enhanced.
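
The "fields get rewritten to the right table" part can be sketched
without any ORM. The build_select() helper below is hypothetical and
only handles exact-match lookups; it assumes the table layout from the
examples above and that table and field names come from trusted code,
since they are interpolated into the SQL string:

```python
def build_select(table, trans_table, translated_fields, language,
                 lookups, allow_empty=True):
    """Return (sql, params) for exact-match lookups on a translated model."""
    join = "LEFT OUTER JOIN" if allow_empty else "INNER JOIN"
    sql = (
        f"SELECT * FROM {table} {join} {trans_table} "
        f"ON ({table}.id = {trans_table}.object_id "
        f"AND {trans_table}.language = ?)"
    )
    params = [language]
    conditions = []
    for field, value in lookups.items():
        # Translated fields live in the translation table, the rest stay put.
        owner = trans_table if field in translated_fields else table
        conditions.append(f"{owner}.{field} = ?")
        params.append(value)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


sql, params = build_select(
    "someobj", "someobjtranslation", {"foo"}, "en",
    {"foo": "bar", "bar": "baz"},
)
```

The caller keeps writing foo='bar' and never has to know that foo moved
to another table, which is the same trick multi-table inheritance uses.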

So, what about the other stuff django model translation should provide:
a) fast: Only one JOIN involved, as you only need one language most of
the time. Otherwise it's like solution two.
b) third-party-apps: Work like a charm, no fields changed and - because
of some default language in the query - only return objects that are in
the current site language.
c) transparent: If done like inheritance, this should be like
inheritance, so perfectly transparent.
d) missing translation: Supported by using LEFT JOIN.
e) searchable: Like inheritance.
f) extendable: Like normal models, south, django-evolution or similar
perhaps needed.
g) convert: Script needed for this, like sync_transmeta_db does.
h) keep context: Relations need to mind obj.language, should definitely
be possible.
i) optional: If you don't use translation.register() no translation is
done, not even the table is created.
j) generic: If you have some use-case I'm missing tell me.
k) central fields: slug can be translated, access is as simple as when
using inheritance.
l) all translations: Just leave the "AND xxx.language='yy'" out of the
JOIN and you get every translation. Similar to using Book.objects.all()
with my approach.
m) admin: Like solution two, I think people have come up with something
here. I still like the idea of viewing every possible translation and
being able to edit it like one distinct object. But there might be
better solutions.
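
For point l), leaving the language out of the JOIN condition returns one
row per object and translation, which then has to be grouped per object.
A minimal sketch with sqlite3, using made-up book/book_trans tables as
an assumption:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY);
    CREATE TABLE book_trans (book_id INTEGER, language TEXT, title TEXT);
    INSERT INTO book VALUES (1), (2), (3);
    INSERT INTO book_trans VALUES
        (1, 'en', 'A Book'), (1, 'de', 'Ein Buch'), (2, 'de', 'Noch ein Buch');
""")

# No "AND book_trans.language = ..." here, so every translation is returned.
rows = conn.execute("""
    SELECT book.id, book_trans.language, book_trans.title
    FROM book LEFT OUTER JOIN book_trans ON (book.id = book_trans.book_id)
    ORDER BY book.id, book_trans.language
""")

grouped = defaultdict(dict)
for book_id, language, title in rows:
    per_book = grouped[book_id]   # creates an entry even without translations
    if language is not None:
        per_book[language] = title
all_translations = dict(grouped)
```

Book 3 has no translation at all and still shows up, with an empty
mapping, because of the LEFT OUTER JOIN.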

I have attached a sample usage example, perhaps this gives you some
more detail on the API I suggest.

I have looked into the code and think implementing this should be
possible, but it needs some changes in the Django ORM itself. It should
be possible to implement this by creating some TranslationQuery object,
but you would have to copy a lot of code to keep its behavior in sync
with the normal Query.

If you read down until here, thank you. I know this is a lot of text
(hey, it only took me about 2 hours to write this down, after thinking
about a solution for the last weeks). I would like to get some input on
this topic, about what you think model translations could look like.

Marc, I don't know if you have some proposal of your own. Perhaps we can
share ideas and even start implementing this together. I am willing to
spend some time on this topic, because I need a solution flexible
enough (aka "fits my needs") for a client. Additionally I think Django
would benefit very much from an official solution on this topic.

Greetings, David Danier



=> Just a basic idea
=> Requires rewriting things all the way down to the Django Query objects,
   so implementing this is non-trivial

>>> class SomeObj(models.Model):
>>>     foo = models.CharField(...)
>>>     bar = models.CharField(...)
>>> 
>>> class SomeObjTranslation(translation.ModelTranslation):
>>>     class Meta:
>>>             fields = ('foo',)
>>>             # use INNER or LEFT JOIN (LEFT JOIN means you get None back,
>>>             # if selected translation does not exist, INNER JOIN means
>>>             # objects without a translation get skipped)
>>>             allow_empty = True
>>> 
>>> translation.register(SomeObj, SomeObjTranslation)

=> Removes field 'foo' from SomeObj
=> Creates a new class with this field (+ language + object)
=> Sets up some useful methods, see below
=> Better: Change field 'foo' to point to SomeObjTranslation, like
   model inheritance

--> Gets rewritten to:

>>> class SomeObj(models.Model):
>>>     @property # has setter, too of course
>>>     def foo(self):
>>>             return self.translations....foo
>>>     @property
>>>     def language(self): ...
>>>     bar = models.CharField(...)
>>> 
>>> class SomeObjTranslation(models.Model):
>>>     language = models.CharField(max_length=5, choices=settings.LANGUAGES,
>>>             default=settings.LANGUAGE_CODE)
>>>     object = models.ForeignKey(SomeObj, related_name='translations')
>>>     foo = models.CharField(...)
>>>     class Meta:
>>>             # only one translation per language and object
>>>             unique_together = (('language', 'object'),)

-->

>>> SomeObj.objects.get(pk=1)
=> SELECT ... FROM someobj LEFT JOIN someobjtranslation ON
                (someobj.id = someobjtranslation.object_id AND
                 someobjtranslation.language={{ DEFAULT_LANGUAGE }})
                WHERE ...
=> Cache for SomeObjTranslation is filled
=> DEFAULT_LANGUAGE = request language or settings.LANGUAGE_CODE
=> LEFT JOIN as allow_empty is True

>>> SomeObj.objects.language('en').get(pk=1)
=> SELECT ... FROM someobj LEFT JOIN someobjtranslation ON
                (someobj.id = someobjtranslation.object_id AND
                 someobjtranslation.language='en')
                WHERE ...
=> You can easily use some other language without needing to think about the SQL
=> Cache for SomeObjTranslation is filled

-->

>>> class SomeOtherObj(models.Model):
>>>     someobj = models.ForeignKey(SomeObj, related_name='others')
>>>     foo = models.CharField(...)
>>> 
>>> class SomeOtherObjTranslation(translation.ModelTranslation):
>>>     def some_method(self): ...
>>>     class Meta:
>>>             fields = ('foo',)
>>>             # see above (SomeObjTranslation)
>>>             allow_empty = False
>>> 
>>> translation.register(SomeOtherObj, SomeOtherObjTranslation)

--> Gets rewritten to:

>>> class SomeOtherObj(models.Model):
>>>     someobj = models.ForeignKey(SomeObj, related_name='others')
>>>     @property # has setter, too of course
>>>     def foo(self):
>>>             return self.translations....foo
>>>     @property
>>>     def language(self): ...
>>>     def some_method(self): # all attributes of translation get mirrored here
>>>             return self.translations....some_method()
>>> 
>>> class SomeOtherObjTranslation(models.Model):
>>>     language = models.CharField(max_length=5, choices=settings.LANGUAGES,
>>>             default=settings.LANGUAGE_CODE)
>>>     object = models.ForeignKey(SomeOtherObj, related_name='translations')
>>>     foo = models.CharField(...)
>>>     def some_method(self): ... # like when you define this on normal models
>>>     class Meta:
>>>             # only one translation per language and object
>>>             unique_together = (('language', 'object'),)

-->

>>> obj = SomeObj.objects.get(pk=1)
>>> obj.others
=> SELECT ... FROM someotherobj INNER JOIN someotherobjtranslation ON
                (someotherobj.id = someotherobjtranslation.object_id AND
                 someotherobjtranslation.language={{ obj.language }})
                WHERE ...
=> Language "context" of obj is used
=> INNER JOIN as allow_empty is False, so this query may raise
   SomeOtherObj.DoesNotExist on get(), even if this works for other languages

>>> obj = SomeObj.objects.get(pk=1)
>>> obj.others.language('en')
=> SELECT ... FROM someotherobj INNER JOIN someotherobjtranslation ON
                (someotherobj.id = someotherobjtranslation.object_id AND
                 someotherobjtranslation.language='en')
                WHERE ...
=> You can overwrite the language used on related managers, too

>>> obj.switch_language('en')
=> SELECT ... FROM someobjtranslation WHERE
                someobjtranslation.object_id={{ obj.pk }} AND
                someobjtranslation.language='en'
=> Fill translation cache with 'en' (an already loaded translation does _not_
   get overwritten)
>>> obj.others
=> SELECT ... FROM someotherobj INNER JOIN someotherobjtranslation ON
                (someotherobj.id = someotherobjtranslation.object_id AND
                 someotherobjtranslation.language={{ obj.language }})
                WHERE ...
=> context is used again, but changed to 'en' (query construction is the same)

-->

>>> SomeObj.objects.get(foo='bar')
=> SELECT ... FROM someobj LEFT JOIN someobjtranslation ON
                (someobj.id = someobjtranslation.object_id AND
                 someobjtranslation.language={{ DEFAULT_LANGUAGE }})
                WHERE ... AND
                someobjtranslation.foo='bar'
=> Selects the right table without you needing to worry about where the field
   comes from

>>> SomeObj.objects.language('en').get(foo='bar')
=> SELECT ... FROM someobj LEFT JOIN someobjtranslation ON
                (someobj.id = someobjtranslation.object_id AND
                 someobjtranslation.language='en')
                WHERE ... AND
                someobjtranslation.foo='bar'
=> You can use normal queries without needing to worry about anything, even
   language switching works

-->

>>> obj = SomeObj.objects.get(pk=1)
>>> obj.save()
=> Saves SomeObj _and_ SomeObjTranslation

-->

>>> class BrokenObj(models.Model):
>>>     foo = models.CharField(...)
>>>     bar = models.CharField(...)
>>>     class Meta:
>>>             unique_together = (('foo', 'bar'),)
>>> 
>>> class BrokenObjTranslation(translation.ModelTranslation):
>>>     class Meta:
>>>             fields = ('foo',)
>>> 
>>> translation.register(BrokenObj, BrokenObjTranslation)
=> Fails as unique_together will not work anymore
=> There may be other places where splitting up the model produces side effects
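
A register() implementation could refuse such models up front. The check
below is only a sketch in plain Python; TranslationConfigError and
check_unique_together() are hypothetical names, and it assumes that a
constraint consisting only of translated fields could simply move to the
translation table as a whole:

```python
class TranslationConfigError(Exception):
    """Raised when a model cannot be split safely (hypothetical name)."""


def check_unique_together(unique_together, translated_fields):
    """Reject constraints that would span the base and translation tables."""
    translated = set(translated_fields)
    for constraint in unique_together:
        fields = set(constraint)
        moved = fields & translated
        # Fine if no field moves or if all of them move together.
        if moved and moved != fields:
            raise TranslationConfigError(
                "unique_together %r mixes translated fields %r with "
                "untranslated ones" % (tuple(constraint), sorted(moved))
            )


# The BrokenObj case from above: 'foo' moves, 'bar' stays.
try:
    check_unique_together((("foo", "bar"),), ("foo",))
    broken_rejected = False
except TranslationConfigError:
    broken_rejected = True

# A constraint on untranslated fields only is left alone.
check_unique_together((("bar", "baz"),), ("foo",))
```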
