Re: Model translation

JK Laiho Thu, 05 Aug 2010 03:22:22 -0700

Hi all,

Having popped my head in to the previous model translation thread in
December, I'll do so here as well. I apologize for the length of this
post, but the issue is complex, so it can't really be helped.


Last time around I mentioned having some ideas on how to maybe do
model translation in a different way than the currently available
alternatives. In the intervening time, I've started hacking on a proof-
of-concept type project, tentatively named django-modelinguistic, but
it's only partially functional and nowhere near a releasable state.

I'd like to present some general considerations here for public
scrutiny, as well as describe the approach django-modelinguistic is
currently taking. The project, while having started promisingly, has
been stuck for a good while due to my limited understanding of Django
internals.

First, here's an incomplete list of things a theoretically optimal model
translation approach should achieve (with the assumption that it's a
reusable app instead of a Django core component, in line with what
Jacob said):

1. It should Just Work as a drop-in component in any existing project,
no matter what apps that project is composed of, with minimal
configuration. It must not be mandatory to build your app from scratch
with model translation in mind. You need to be able to translate the
models of translation-unaware third party reusable apps as well as
your own.

2. It must not require changes to existing models. No extra fields,
nothing. One obvious approach is the admin-style (and django-
modeltranslation-style) registration of models, where translation
functionality is added dynamically to live alongside the untranslated
bits in some way.

3. Reads need to be transparent by default. Fetching the data of a
translated model field should return the language version
corresponding to the active language. In case a model instance doesn't
have translated data for a field in the active language, it must
gracefully fall back to the default language. Of course, sometimes
you'll want to retrieve a specific language version regardless of the
active language, so that must also be possible.

4. Writes need to be intuitive by default. Creating new model
instances and updating existing ones must work sensibly and without
breaking translation-unaware apps.

5. It must work well with schema migration tools, which in practice
means South.

6. It needs to integrate well with contrib.admin.

Some specific issues and examples follow.

Regarding point 1: it's unlikely that any translation solution could
really work with all existing projects and combinations of third-party
apps, especially those that do some funky model-level hackery
themselves. I have a feeling that the best one can do is to attempt an
80/20 solution that works in the common case. For example, the use of
raw SQL is one thing that a translation solution based around the ORM
really can't work around in any way that I can see.


Regarding point 2: crucially, you don't want to start tweaking the
model classes of third-party apps that you've probably installed into
a virtual environment with pip and have no desire to fork. You need to
be able to translate them, but altering their models is not the way to
go. Keeping your translation-related model changes in sync with
upstream changes would be horrible.


Regarding point 3: some examples are in order. Say we have a model
class called Animal with a "name" CharField, and the default language
is English. The instance with a PK of 1 is a dog, thus "name" equals
"Dog" in the default language.

The "name" field of Animal is then marked for translation into Swedish
and Finnish, and the dog instance is updated with new language
versions using whatever mechanism is appropriate (TBD).

After this, if you activate Swedish, Animal.objects.get(pk=1).name
will return "Hund". Activate Finnish, and it'll return "Koira".

In the case of filtering, if the active language is Finnish,
Animal.objects.filter(name="Koira") should return the correct Animal
instance. This probably means that .filter(name="Dog") will return an
empty set when the active language is not English (workarounds to get
the correct object through any language version may be possible).

Should you want a specific language version instead of the active one,
that can be done with a custom manager that the translation app can
provide for registered models. An example of this follows later.
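
As a rough illustration of the read behaviour described for point 3,
here is a minimal sketch in plain Python (no Django involved). The
names get_translated(), filter_by_name() and the "rows" dictionary are
made up for this example, and activate() is only a stand-in for
Django's own function; none of this is django-modelinguistic code.

```python
# Hypothetical sketch: plain Python, no Django required.
DEFAULT_LANGUAGE = "en"
_active_language = DEFAULT_LANGUAGE


def activate(language):
    """Stand-in for django.utils.translation.activate()."""
    global _active_language
    _active_language = language


def get_translated(translations, language=None):
    """Return the value for `language` (or the active language),
    falling back to the default language when it is missing."""
    language = language or _active_language
    if language in translations:
        return translations[language]
    return translations[DEFAULT_LANGUAGE]


def filter_by_name(rows, value):
    """Mimic Animal.objects.filter(name=value): match against the
    active-language version of each row, with fallback."""
    return [pk for pk, names in rows.items()
            if get_translated(names) == value]


# Two Animal rows keyed by PK: the dog has three language versions of
# "name", the cat has no translations at all.
rows = {
    1: {"en": "Dog", "sv": "Hund", "fi": "Koira"},
    2: {"en": "Cat"},
}
```

With Finnish active, get_translated(rows[1]) gives "Koira",
get_translated(rows[2]) falls back to "Cat", and
filter_by_name(rows, "Dog") comes back empty, which is the filtering
caveat mentioned above. Passing an explicit language, as in
get_translated(rows[1], "sv"), ignores the active one.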


Regarding point 4: this is TBD as far as my forays into the topic and
django-modelinguistic go. I haven't yet thought through the
relationship of the active language and what gets written where.


Regarding point 5: I had a discussion about this with Andrew Godwin on
the South Users mailing list. I'll summarize the main points here. At
work, we've used django-modeltranslation on a few sites that use the
same internally developed apps, but different project-level language
configurations. South migrations are app-level, and if you know django-
modeltranslation, you may guess where this is going.

Two of the sites (call them A and B) use Finnish and English, and one
of them (C) only uses Finnish. A is the master site against which the
main development is done, including migrations. The same migrations
apply cleanly on B, but fail on C.

The reason? Imagine a model called Product with a CharField called
"name" that is marked for translation. With django-modeltranslation's
dynamic field generation approach, Product has the fields "name",
"name_fi" and "name_en" on A and B, but just "name" and "name_fi" on
C. The migrations are done on A and therefore refer to "name_en",
which doesn't exist on C. South quite obviously doesn't like this, and
porting new stuff from A to C always means nasty hackery.

In our case, we could just have django-modeltranslation also create
"name_en" on C and just leave it empty for all model instances, but
that's beside the point: the problem is that with django-
modeltranslation, project-level language settings affect app-level
table schemas and therefore South migrations. This is bad for reusable
apps in general, and a proper model translation approach can't do
this. For the Product model, the translation data simply cannot live
as dynamically generated name_* fields in the appname_product database
table.
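
The schema divergence can be shown with a few lines of plain Python.
This is only an illustration of the column naming described above, not
django-modeltranslation's actual code; generated_columns() is a
made-up helper.

```python
def generated_columns(field_name, languages):
    """Columns a per-language field generator would create for one
    translated field: the original plus one suffixed column per
    configured language."""
    return [field_name] + ["%s_%s" % (field_name, code)
                           for code in languages]


# Sites A and B are configured with Finnish and English, site C with
# Finnish only.
site_a = generated_columns("name", ["fi", "en"])
site_c = generated_columns("name", ["fi"])

# Columns that a migration written on site A may reference but that
# do not exist on site C.
missing_on_c = sorted(set(site_a) - set(site_c))
```

Here missing_on_c ends up as ["name_en"]: the same app yields a
different table schema depending on project-level settings, and that
is what trips up app-level migrations.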


Regarding point 6: this is really hard. Good translation interfaces
are not trivial to create. One of django-modeltranslation's advantages
is that the translated fields are visible to the add/change view of a
model instance: "name_fi" and "name_en" are right there along with
"name". We've hacked a DOM-altering active language switching UI into
the change view using custom admin JS/CSS so that only one name field
is visible at a time, and it works OK. But if the translation data is
to live outside the main model table, a completely different approach
is needed. If Django is to be modified in a way to make translation
apps feasible, some sort of admin hooks for translation interfaces may
be necessary.

So that's the ideal, theoretical solution. More requirements for such
a beast probably exist, but those are the ones I could think of right
now.

The long-dormant django-modelinguistic is not anywhere near that. In
its current larval stage it achieves parts of goals 1, 2, and 3. This
post is already too long, but I'll describe the current approach and
an alternative that seems interesting but which I don't know how to
do.

Modelinguistic relies on an admin-like registration approach. It
creates language-specific copies of all the registered model classes
and replaces their managers (custom ones, too) with descriptors that
can retrieve correct language versions transparently. It also adds a
"callable descriptor" (a wrapper around a "manager factory" callable,
really), used like this: Animal.translated_objects('fi').get(...),
which gets you a Finnish Animal object regardless of what the active
language is. Animal.objects.get() would get you the active language
version transparently, as would Animal.my_custom_manager.get().

The translated model class copies and the original managers live in a
global translation registry dictionary keyed by the original model
class. Thanks to ModelBase metaclass magic, the type() invocations that
create the class copies register the new models in Django's app cache,
through which they can be seen by South, syncdb, sqlall etc.

In the database, the model copies live as suffixed extra tables.
animals_animal is the default English table, animals_animal_fi its
Finnish version that may or may not have translated data in it. All
the fields are copied, not just the translated ones, which is
wasteful, unfortunately.

So, if you do Animal.objects.get(pk=1) with Finnish active, you
actually get an Animal_fi instance, with all the untranslated field
data the same as in the Animal instance, but the translated field
data, well, translated. Yes, you need not even mention the problems of
writing and updating data across these table copies. I know.
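
To make the registry idea a bit more concrete, here is a heavily
simplified sketch in plain Python. It leaves out Django entirely (no
ModelBase, no managers, no app cache), and the names registry,
register(), translated_class() and the db_table attribute are
assumptions made for the example rather than the real
django-modelinguistic code.

```python
# Hypothetical registry: original class -> {language code: copy}.
registry = {}


def register(model, languages):
    """Create one suffixed class per language with type() and store
    the copies in the registry, keyed by the original class."""
    copies = {}
    for code in languages:
        attrs = {
            "language": code,
            "db_table": "%s_%s" % (model.db_table, code),
        }
        # Subclassing keeps the sketch short; with real Django models
        # a subclass would mean multi-table inheritance, which is not
        # what the copies described above are.
        copies[code] = type("%s_%s" % (model.__name__, code),
                            (model,), attrs)
    registry[model] = copies


def translated_class(model, language):
    """Return the copy for `language`, or the original class when no
    copy is registered for that language."""
    return registry.get(model, {}).get(language, model)


class Animal(object):
    db_table = "animals_animal"


register(Animal, ["fi", "sv"])
```

After this, translated_class(Animal, "fi") is a class named Animal_fi
whose db_table is "animals_animal_fi", while
translated_class(Animal, "en") simply hands back Animal itself.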

That's django-modelinguistic right now. It's got a bunch of TDD
developed code that works in a very limited set of read-only
circumstances. I hate how hacky it is, and I hate not being capable of
making it better. I probably won't ever complete it, but if someone is
interested in the approach, I can publish the code somewhere for what
little it's worth as a jumping-off point. The good part is that it can
be dropped in with existing code and won't require model changes.


But.

Jacob mentioned the possibility of making changes to Django to make
model translation apps feasible. One thing that could *possibly*
enable a more elegant translation solution would be the ability of
inherited models to shadow the fields of their parents.

OneToOneField is almost there. I'd try and subclass it to allow for
shadowing, but the code of related fields is too complex and I don't
understand it. But I love how the OneToOne relation between, say,
auth.User and a Customer model that inherits from it enables
transparent access to User fields through a Customer instance.

Assuming the shadowing-enabled subclass of OneToOneField was called
ShadowingOneToOneField, something like this could happen:

--

>>> class Animal(models.Model):
...    name = models.CharField(max_length=255)
...    trinomial_name = models.CharField(max_length=255)

>>> class AnimalTranslationOptions(TranslationOptions):
...    translated_fields = ('name',)

>>> register(Animal, AnimalTranslationOptions)

# The register() function living in the hypothetical translation app
# would create an in-memory model in the app cache that corresponds to
# a model like this, represented in the database as the
# animals_animal_fi table:
#
# class Animal_fi(models.Model):
#     name = models.ShadowingOneToOneField(Animal)

>>> animal = Animal.objects.create(name='Dog',
...    trinomial_name="Canis lupus familiaris")

# ... time passes, the Animal instance gets a Finnish and Swedish
# translation for the "name" field, perhaps through a custom admin
# interface ...

>>> activate('en-us')
>>> animal = Animal.objects.get(name='Dog')
>>> animal.name
"Dog"
>>> activate('fi')
>>> animal.name
"Koira"
>>> activate('sv')
>>> animal.name
"Hund"
>>> animal.trinomial_name # not marked for translation, so not in Swedish here
"Canis lupus familiaris"
>>> from django.ponies import pony; pony.fly()
"Whee!"
--

There would need to be a lot of descriptor action or something going
on there so that "name" would resolve to either Animal.name,
Animal_fi.name or Animal_sv.name depending on the active language.
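
The kind of descriptor action meant here could look roughly like the
following plain-Python sketch. It has nothing to do with
OneToOneField or any real Django API: TranslatedField, the
_translations dictionary and the local activate() stand-in are all
invented for illustration, and the translated values live in memory
instead of in Animal_fi or Animal_sv tables.

```python
_active = "en-us"


def activate(language):
    """Stand-in for django.utils.translation.activate()."""
    global _active
    _active = language


class TranslatedField(object):
    """Data descriptor that resolves an attribute to its
    active-language version, falling back to the stored default."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        versions = instance._translations.get(self.name, {})
        # Use the untranslated value when the active language has no
        # version of this field.
        return versions.get(_active, instance.__dict__[self.name])

    def __set__(self, instance, value):
        # Writes only touch the default-language value in this sketch.
        instance.__dict__[self.name] = value


class Animal(object):
    name = TranslatedField("name")

    def __init__(self, name, trinomial_name):
        self._translations = {}
        self.name = name
        self.trinomial_name = trinomial_name


animal = Animal("Dog", "Canis lupus familiaris")
animal._translations["name"] = {"fi": "Koira", "sv": "Hund"}
```

With this in place, activate('fi') makes animal.name resolve to
"Koira", activate('sv') to "Hund", and any language without a
translation falls back to "Dog". Writes are the hard part and are
deliberately dodged here.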

Sadly, I'm not sure if the South migration problem described earlier
is solvable with this approach, either.

Anyway, no need to pile on me calling me stupid for all the
shortcomings that my ideas inevitably have :-). Just throwing things
out there, maybe someone smarter will be inspired to create something
that actually works.

In a perfect world, databases wouldn't suck this much as a means of
holding a variable number of translated versions of a column's data.
Instead, a TRANSLATED_VARCHAR(255) column called "name" could have any
number of translations stored along with the default language, all of
which could be 255 characters long, and you could access them with
standard syntax: "SELECT `name` IN 'fi' FROM animals_animal WHERE
id=1;" or something, and the ORM could just work with that. One can
dream. Perhaps NoSQL databases and their Django backends will make
something like this possible one day.

- JK Laiho

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-develop...@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.
