Re: Model translation

David Danier Sat, 07 Aug 2010 13:34:26 -0700

Hi all,

sorry if this gets very long, but I will try to write down my current
opinion on and experience with so-called "model translation". I have built
multiple sites using translatable content and I have written some apps
to help me do so (none of which are public so far, as I hated my
first approaches after a few days/weeks and I'm not yet sure about the
current apps either). Anyway, I tried to focus on a kind of 80% solution,
which only works for some (most) cases but needs only 20% of the
work (http://en.wikipedia.org/wiki/Pareto_principle). I'll write a
little about these solutions at the end, but they are not the focus of
this email.


Currently I think there are two completely different approaches for
doing model translations:

1. You have an object for each language. This object contains a
language attribute, which makes it easy to filter stuff. Admin and views
are no-brainers.

Of course this solution is only suitable for some special cases. News,
for example, might only exist in one language, so it's perfectly fine (and
even preferred) to have different content for each language.

Language switching might become a problem if you want to link to the
current object in a different language. This can be fixed by adding some
kind of group model that collects all the translations of one object
into one translation group. This can even be done using generic foreign
keys, which makes it an easy and reusable solution. I created one generic
app for this, but so far it is not public and I'm pretty sure it misses
many things.
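A minimal, framework-free sketch of such a translation group, assuming each per-language object carries a `language` attribute and a reference to a shared group object. The names `TranslationGroup`, `NewsEntry` and `get_translation()` are hypothetical stand-ins for Django models; the generic-foreign-key variant is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TranslationGroup:
    """Hypothetical stand-in for a model that ties translations together."""
    members: Dict[str, "NewsEntry"] = field(default_factory=dict)

    def add(self, entry: "NewsEntry") -> None:
        # At most one object per language in a group.
        if entry.language in self.members:
            raise ValueError(f"group already has a {entry.language!r} translation")
        self.members[entry.language] = entry
        entry.group = self


@dataclass
class NewsEntry:
    """Hypothetical per-language object with its own language attribute."""
    title: str
    language: str
    # Excluded from repr/eq so they do not recurse through the group's members.
    group: Optional[TranslationGroup] = field(default=None, repr=False, compare=False)

    def get_translation(self, language: str) -> Optional["NewsEntry"]:
        # For language-switch links: the same item in `language`, or None
        # if it was never translated (a site could link to a start page then).
        if self.group is None:
            return None
        return self.group.members.get(language)


group = TranslationGroup()
en = NewsEntry("Release notes", "en")
de = NewsEntry("Versionshinweise", "de")
group.add(en)
group.add(de)
print(en.get_translation("de").title)  # Versionshinweise
print(en.get_translation("fr"))        # None
```

Each translation stays a complete, independent object, so filtering by language remains trivial; the group only adds the cross-language link.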

Of course having one object for each language becomes nasty if you
need common data to be the same for all translations of an object. You
could sync this, but...

2. Common data that needs to be the same for every translation should
really be handled inside the database. Many solutions exist for this;
they all fix some problems and create new ones. The three most common
solutions use language suffixes, put translations into separate models,
or use some dict/pickle approach.

First I will try to write down what I think is important for providing a
solid solution.

a) Getting out of the user's way

If I want to fetch some object I don't want to care about translations.
This is even true if I need to filter/order by some translated
attribute. I don't want to write stuff like (where cur_lang =
translation.get_language()):
Entry.objects.get(**{'slug_%s' % cur_lang: 'something'})
or:
Entry.objects.get(common__language=cur_lang, common__slug='something')
What I want is the plain old:
Entry.objects.get(slug='something')
or perhaps:
Entry.objects.localize().get(slug='something')

The same is true for accessing the attributes, but most approaches
solve this, so there's no need to bring it up.

btw, dict/pickle solutions fail to provide access to the data in
queries, regardless of how hard you try. So they fail big here.

b) There should not be too much overhead involved

Currently you can choose between loading all translations, needing an
additional SQL query, or unpickling some data. None of this is ideal. I
personally think a JOIN could be acceptable, but of course this also adds
some overhead.

c) It should allow (not necessarily support) special cases

Sometimes you need strange things, like a field that is optional in one
language while being required in another. There might even be fields
that do not need to exist at all for one out of 10 languages. The needs
might of course be much simpler. As these cases are somewhat esoteric
they should not be a show-stopper for model translations. But having
heard about them might keep a solution from being locked down too
tightly.

This, btw, is one thing I don't like about the "put all common data into
its own model and JOIN away" approach: all common data needs to follow the
same rules, which may not be possible in all cases.

d) Managing languages should be easy

I don't think this needs to be the huge problem everybody likes to call
it. For me, South solves this pretty well. If something like South gets
into core or becomes the so-called "official solution", managing changes
in translations becomes easy.

e) One might add that it should be possible to add translations to
third-party apps, or to make your own apps translatable without changing
their underlying code. I think this is only partly true.

As this adds a ton of new dependencies and side effects, I personally
think it is acceptable to have to change something to use translatable
models. Of course the changes you need should be as minimal as possible.

Third-party apps are a special case, as you probably cannot maintain
your own copy. But I think really thinking about third-party apps should
wait until a solid solution is ready. Trying to solve all problems at
once just drives you crazy. (Of course, keeping third-party apps in mind
is preferred.)

f) Model relations should not become too nasty. Creating a translatable
ManyToMany/ForeignKey inside your translations will get ugly with most
solutions. I think currently this is only easy when adding all fields to
one model (suffixes). But reverse relations will suck this way. Use case:
every translation needs different tags.

=> So this leaves only adding fields with language suffixes, at least if
we look at what Django currently provides (I'll write down some thoughts
about this later). It is easy, fast enough and, yes, it will eat your
RAM.

So, about the other solutions: dict/pickle fails when using the fields
inside queries, so for me it is out of the question. I think many people
use it (I like to call this the "gettext" approach), but I certainly do
not see any point in it. Of course it might be handy when translating
legacy apps.

What about "putting common data into its own model"? I like this
solution; I even like it so much that I tried to implement it several
times. BUT you cannot get it to produce a nice query. Most of the time
you will need to fetch the translation in a separate query, as
select_related() cannot fetch the translation even if the JOIN is unique
(qs.filter(common__language='xx') will create a unique JOIN). This
certainly could be improved.

Then there's the issue of "getting in my way", which every current
implementation using multiple models has. I don't think we should need to
think about the different models here. Actually model inheritance solves
this, so perhaps the best approach might be to make model inheritance
more generic, so it could be used for other things, too. Allowing users
to define their own JOINs while keeping all attributes on the same object,
without needing to do anything special inside queries, would definitely be
a nice feature (and there might be more use cases for this, versioning
models for example).

SQL could be something like:
    SELECT ... FROM entry
    LEFT OUTER JOIN entry_trans
        ON (entry.id = entry_trans.entry_id AND entry_trans.language = 'en')
    WHERE ...
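Why the language test belongs in the ON clause can be checked with Python's built-in sqlite3 module. This is a sketch against hypothetical `entry`/`entry_trans` tables shaped like the query above: an entry without an English translation still comes back (with a NULL title), and a UNIQUE (entry_id, language) constraint keeps the JOIN from duplicating rows.

```python
import sqlite3


def build_demo_db():
    """In-memory database with hypothetical entry/entry_trans tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entry (id INTEGER PRIMARY KEY, slug TEXT NOT NULL);
        CREATE TABLE entry_trans (
            entry_id INTEGER NOT NULL REFERENCES entry(id),
            language TEXT NOT NULL,
            title TEXT NOT NULL,
            UNIQUE (entry_id, language)  -- at most one row per entry and language
        );
        INSERT INTO entry (id, slug) VALUES (1, 'hello'), (2, 'only-german');
        INSERT INTO entry_trans VALUES
            (1, 'en', 'Hello'), (1, 'de', 'Hallo'), (2, 'de', 'Nur Deutsch');
        """
    )
    return conn


def fetch_localized(conn, language):
    # The language condition is part of the ON clause, so entries lacking a
    # translation in `language` are kept (title is NULL) instead of dropped.
    return conn.execute(
        """
        SELECT entry.id, entry_trans.title
        FROM entry
        LEFT OUTER JOIN entry_trans
          ON entry.id = entry_trans.entry_id AND entry_trans.language = ?
        ORDER BY entry.id
        """,
        (language,),
    ).fetchall()


conn = build_demo_db()
print(fetch_localized(conn, "en"))  # [(1, 'Hello'), (2, None)]
```

Moving `entry_trans.language = ?` into the WHERE clause instead would discard the NULL rows and effectively turn this into an inner join, so untranslated entries would disappear from the result.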

I don't know the Django internals well enough, but if this could be
done, external model translation apps should be possible without much
hassle.


Other Django enhancements:

Add a LanguageField! Why not add a language field? This should be
pretty easy. Currently I use a field in every project that basically is
only a CharField with a predefined max_length. This would certainly make
things easier and allow multiple (third-party) apps to share some
generic code.

Virtual fields? Adding support for some kind of virtual fields might
improve things. This just came to mind, so it might be wrong.

Extensible QuerySets: I prefer to put new filters into QuerySets (and
add a Manager for each new method), so I can choose between
Entry.localized.all() and Entry.objects.localize() as I like. Adding
these methods to the QuerySet also allows using them with related
managers (user.entries.localize()), which is really great. But having a
Manager for every possible QuerySet while allowing QuerySets to be
stacked gets complicated fast. This is probably only true if you need to
add parameters to your Manager that get passed on to the QuerySet.


Further problems:

Language selection: this is about how Django detects the user's language
and how the user is able to select a language. Django could provide more
defaults here, such as detecting the language based on the request path,
the request domain or some other practical information. I currently use
the request path for translations.

Only having the option to change the language by cookie is bad in most
cases. Every public site needs to provide different URLs, so people can
link to one translation, search engines can crawl all translations, ...
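Detecting the language from the request path, and building the link to the same page in another language, might look like this plain-Python sketch. `SUPPORTED_LANGUAGES`, `DEFAULT_LANGUAGE` and both functions are hypothetical; in Django the values could be derived from settings.LANGUAGES and settings.LANGUAGE_CODE, and treating unprefixed paths as the default language is only one possible choice.

```python
# Hypothetical site configuration.
SUPPORTED_LANGUAGES = ("en", "de", "fr")
DEFAULT_LANGUAGE = "en"


def split_language_prefix(path):
    """Split '/de/news/' into ('de', '/news/').

    Paths without a known language prefix are treated as the default
    language and returned unchanged.
    """
    first, _, rest = path.lstrip("/").partition("/")
    if first in SUPPORTED_LANGUAGES:
        return first, "/" + rest
    return DEFAULT_LANGUAGE, path


def path_for_language(path, language):
    """URL of the same page in `language`, e.g. for language-switch links."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    _, rest = split_language_prefix(path)
    return f"/{language}{rest}"


print(split_language_prefix("/de/news/"))         # ('de', '/news/')
print(path_for_language("/de/news/2010/", "fr"))  # /fr/news/2010/
```

Because every translation has its own URL, both people and search engines can reach each language directly, without depending on a cookie.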

...which brings me to i18n URLs. I currently have a urls.py for every
translation and use something like {% url foo_bar language=... %}. This
could certainly be improved, I think.


My solution(s):

Currently I have two apps that help me with translations.

The first one allows me to group translations together; this is only
useful when having different content for each language (->
language attribute). The app itself is pretty simple, but helps me keep
translations organized (admin integration) and improves the user
experience (language links go directly to translations).

The second app is my solution for adding language-suffix fields to a
model. It is as simple as it gets: it does not provide any help adding
these fields, so you have to define all fields yourself (which is useful,
as the fields may have different options and even different types). The
app provides a class that implements the "access the right attribute"
glue:
    name_en = CharField(...)
    name = I18NAttribute()

In addition I have developed a QuerySet that provides a localize()
method, which does the following:
 * If the model has a language field, it just returns
   filter(language=cur_lang)
 * If you have I18NAttributes inside your model, it will rewrite
   calls to filter()/order_by() to use the right field:
   filter(name='...') -> filter(name_xx='...')
   filter(name__contains='...') -> filter(name_xx__contains='...')
   order_by('name') -> order_by('name_xx')
   (models.Q filters do not work, of course)
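The rewriting step of such a localize() can be sketched as two plain functions applied to the arguments before they reach the real filter()/order_by(). The function names and the `i18n_fields` set (the names of the model's I18NAttributes) are hypothetical; like the original, this handles only keyword lookups, not Q objects, and only top-level field names.

```python
# Django separates field names and lookup types with "__".
LOOKUP_SEP = "__"


def localize_lookups(lookups, i18n_fields, language):
    """Rewrite filter() keyword arguments for the given language.

    {'name__contains': 'x'} -> {'name_de__contains': 'x'} when 'name' is
    an I18NAttribute; other keys are passed through unchanged.
    """
    rewritten = {}
    for key, value in lookups.items():
        field_name, sep, rest = key.partition(LOOKUP_SEP)
        if field_name in i18n_fields:
            key = f"{field_name}_{language}{sep}{rest}"
        rewritten[key] = value
    return rewritten


def localize_ordering(field_names, i18n_fields, language):
    """Rewrite order_by() arguments, keeping a leading '-' (descending)."""
    rewritten = []
    for name in field_names:
        prefix = "-" if name.startswith("-") else ""
        field_name, sep, rest = name[len(prefix):].partition(LOOKUP_SEP)
        if field_name in i18n_fields:
            name = f"{prefix}{field_name}_{language}{sep}{rest}"
        rewritten.append(name)
    return rewritten


print(localize_lookups({"name__contains": "foo"}, {"name"}, "de"))
# {'name_de__contains': 'foo'}
print(localize_ordering(["-name", "pub_date"], {"name"}, "de"))
# ['-name_de', 'pub_date']
```

A QuerySet subclass could then apply these in overridden methods, for example by calling `super().filter(*args, **localize_lookups(kwargs, ...))` with the language taken from translation.get_language().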

These apps are as simple as I could make them, but they both helped me
a lot more than any full-blown solution. This is why I think we should
create better tools for doing such things inside Django instead of trying
to provide one solution that solves everything.

I hope I haven't missed anything essential. Model translation really
touches most parts of Django (urls.py, QuerySet, views and of course
models). I have intentionally left out some aspects because they are not
relevant to most users (for example translated content and full-text
search (haystack)).


Thanks for reading this far,
David Danier

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-develop...@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.
