Hi all,

Having popped my head into the previous model translation thread in December, I'll do so here as well. I apologize for the length of this post, but the issue is complex, so it can't really be helped.

Last time around I mentioned having some ideas on how to maybe do model translation in a different way than the currently available alternatives. In the intervening time, I've started hacking on a proof-of-concept type project, tentatively named django-modelinguistic, but it's only partially functional and nowhere near a releasable state. I'd like to present some general considerations here for public scrutiny, as well as describe the approach django-modelinguistic is currently taking. The project, while having started promisingly, has been stuck for a good while due to my limited understanding of Django internals.

First, here's an incomplete list of things a theoretically optimal model translation approach should achieve (with the assumption that it's a reusable app instead of a Django core component, in line with what Jacob said):

1. It should Just Work as a drop-in component in any existing project, no matter what apps that project is composed of, with minimal configuration. It must not be mandatory to build your app from scratch with model translation in mind. You need to be able to translate the models of translation-unaware third-party reusable apps as well as your own.

2. It must not require changes to existing models. No extra fields, nothing. One obvious approach is the admin-style (and django-modeltranslation-style) registration of models, where translation functionality is added dynamically to live alongside the untranslated bits in some way.

3. Reads need to be transparent by default. Fetching the data of a translated model field should return the language version corresponding to the active language. In case a model instance doesn't have translated data for a field in the active language, it must gracefully fall back to the default language. Of course, sometimes you'll want to retrieve a specific language version regardless of the active language, so that must also be possible.

4. Writes need to be intuitive by default.
Creating new model instances and updating existing ones must work sensibly and without breaking translation-unaware apps.

5. It must work well with schema migration tools, which in practice means South.

6. It needs to integrate well with contrib.admin.

Some specific issues and examples follow.

Regarding point 1: it's unlikely that any translation solution could really work with all existing projects and combinations of third-party apps, especially those that do some funky model-level hackery themselves. I have a feeling that the best one can do is to attempt an 80/20 solution that works in the common case. For example, the use of raw SQL is one thing that a translation solution based around the ORM really can't work around in any way that I can see.

Regarding point 2: crucially, you don't want to start tweaking the model classes of third-party apps that you've probably installed into a virtual environment with pip and have no desire to fork. You need to be able to translate them, but altering their models is not the way to go. Maintaining your translation-related model changes with upstream changes would be horrible.

Regarding point 3: some examples are in order. Say we have a model class called Animal with a "name" CharField, and the default language is English. The instance with a PK of 1 is a dog, thus "name" equals "Dog" in the default language. The "name" field of Animal is then marked for translation into Swedish and Finnish, and the dog instance is updated with new language versions using whatever mechanism is appropriate (TBD). After this, if you activate Swedish, Animal.objects.get(pk=1).name will return "Hund". Activate Finnish, and it'll return "Koira". In the case of filtering, if the active language is Finnish, Animal.objects.filter(name="Koira") should return the correct Animal instance.
This probably means that .filter(name="Dog") will return an empty set when the active language is not English (workarounds to get the correct object through any language version may be possible). Should you want a specific language version instead of the active one, that can be done with a custom manager that the translation app can provide for registered models. An example of this follows later.

Regarding point 4: this is TBD as far as my forays into the topic and django-modelinguistic go. I haven't yet thought through the relationship of the active language and what gets written where.

Regarding point 5: I had a discussion about this with Andrew Godwin on the South Users mailing list. I'll summarize the main points here. At work, we've used django-modeltranslation on a few sites that use the same internally developed apps, but different project-level language configurations. South migrations are app-level, and if you know django-modeltranslation, you may guess where this is going. Two of the sites (call them A and B) use Finnish and English, and one of them (C) only uses Finnish. A is the master site against which the main development is done, including migrations. The same migrations apply cleanly on B, but fail on C. The reason? Imagine a model called Product with a CharField called "name" that is marked for translation. With django-modeltranslation's dynamic field generation approach, Product has the fields "name", "name_fi" and "name_en" on A and B, but just "name" and "name_fi" on C. The migrations are done on A and therefore refer to "name_en", which doesn't exist on C. South quite obviously doesn't like this, and porting new stuff from A to C always means nasty hackery.
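
To make the mismatch concrete, here is a minimal sketch in plain Python, with no Django or South involved. The SITE_LANGUAGES mapping and the translated_field_names() helper are made up for illustration; they only mimic the name_<lang> naming described above and are not django-modeltranslation's actual code.

```python
# Hypothetical per-project language settings for the three sites.
SITE_LANGUAGES = {
    "A": ["fi", "en"],
    "B": ["fi", "en"],
    "C": ["fi"],
}


def translated_field_names(field, languages):
    """Return the column names a per-language field scheme would create."""
    return [field] + ["%s_%s" % (field, lang) for lang in languages]


# Columns the Product table ends up with on each site.
columns = {
    site: translated_field_names("name", langs)
    for site, langs in SITE_LANGUAGES.items()
}

# A migration written on site A refers to every column that exists there.
migration_columns = set(columns["A"])

# Columns the migration expects but a given site does not have.
missing = {
    site: sorted(migration_columns - set(cols))
    for site, cols in columns.items()
}

for site in sorted(missing):
    print(site, columns[site], "missing:", missing[site])
```

Only site C ends up with a non-empty "missing" list (name_en), which is exactly the column the migration from A trips over.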
In our case, we could just have django-modeltranslation also create "name_en" on C and just leave it empty for all model instances, but that's beside the point: the problem is that with django-modeltranslation, project-level language settings affect app-level table schemas and therefore South migrations. This is bad for reusable apps in general, and a proper model translation approach can't do this. For the Product model, the translation data simply cannot live as dynamically generated name_* fields in the appname_product database table.

Regarding point 6: this is really hard. Good translation interfaces are not trivial to create. One of django-modeltranslation's advantages is that the translated fields are visible to the add/change view of a model instance: "name_fi" and "name_en" are right there along with "name". We've hacked a DOM-altering active language switching UI into the change view using custom admin JS/CSS so that only one name field is visible at a time, and it works OK. But if the translation data is to live outside the main model table, a completely different approach is needed. If Django is to be modified in a way to make translation apps feasible, some sort of admin hooks for translation interfaces may be necessary.

So that's the ideal, theoretical solution. More requirements for such a beast probably exist, but those are the ones I could think of right now. The long-dormant django-modelinguistic is not anywhere near that. In its current larval stage it achieves parts of goals 1, 2, and 3. This post is already too long, but I'll describe the current approach and an alternative that seems interesting but which I don't know how to do.

Modelinguistic relies on an admin-like registration approach. It creates language-specific copies of all the registered model classes and replaces their managers (custom ones, too) with descriptors that can retrieve correct language versions transparently.
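
As a rough, Django-free illustration of that registration idea: the sketch below uses a toy Manager class, a module-level REGISTRY dict and a simplistic activate(), all of which are stand-ins invented for this example rather than modelinguistic's or Django's real code. It only shows the shape of "copy the class per language, then swap the manager for a descriptor".

```python
DEFAULT_LANGUAGE = "en"
_active = {"language": DEFAULT_LANGUAGE}

# Hypothetical registry: original class -> {language code: manager}.
REGISTRY = {}


def activate(language):
    """Toy stand-in for activating a language."""
    _active["language"] = language


class Manager(object):
    """Toy stand-in for a model manager bound to one class."""

    def __init__(self, model):
        self.model = model

    def create(self, **fields):
        obj = self.model()
        obj.__dict__.update(fields)
        return obj


class LanguageAwareManager(object):
    """Descriptor that hands out the manager for the active language."""

    def __init__(self, original):
        self.original = original

    def __get__(self, instance, owner=None):
        managers = REGISTRY[self.original]
        # Fall back to the default-language manager for unknown languages.
        return managers.get(_active["language"], managers[DEFAULT_LANGUAGE])


def register(model, languages):
    """Create one class copy per language and replace the manager."""
    managers = {DEFAULT_LANGUAGE: Manager(model)}
    for lang in languages:
        copy = type("%s_%s" % (model.__name__, lang), (model,), {})
        managers[lang] = Manager(copy)
    REGISTRY[model] = managers
    model.objects = LanguageAwareManager(model)


class Animal(object):
    pass


register(Animal, ["fi", "sv"])

activate("fi")
finnish = Animal.objects.create(name="Koira")
print(type(finnish).__name__, finnish.name)  # Animal_fi Koira
```

Here the copies are subclasses created with type() for brevity; real model copies would have to go through Django's model machinery instead.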
It also adds a "callable descriptor" (a wrapper around a "manager factory" callable, really), used like this: Animal.translated_objects('fi').get(...), which gets you a Finnish Animal object regardless of what the active language is. Animal.objects.get() would get you the active language version transparently, as would Animal.my_custom_manager.get().

The translated model class copies and the original managers live in a global translation registry dictionary keyed by the original model class. Thanks to ModelBase metaclass magic, the type() invocations that create the class copies register the new models in Django's app cache, through which they can be seen by South, syncdb, sqlall etc.

In the database, the model copies live as suffixed extra tables. animals_animal is the default English table, animals_animal_fi its Finnish version that may or may not have translated data in it. All the fields are copied, not just the translated ones, which is wasteful, unfortunately. So, if you do Animal.objects.get(pk=1) with Finnish active, you actually get an Animal_fi instance, with all the untranslated field data the same as in the Animal instance, but the translated field data, well, translated.

Yes, you need not even mention the problems of writing and updating data across these table copies. I know.

That's django-modelinguistic right now. It's got a bunch of TDD-developed code that works in a very limited set of read-only circumstances. I hate how hacky it is, and I hate not being capable of making it better. I probably won't ever complete it, but if someone is interested in the approach, I can publish the code somewhere for what little it's worth as a jumping-off point. The good part is that it can be dropped in with existing code and won't require model changes. But.

Jacob mentioned the possibility of making changes to Django to make model translation apps feasible.
One thing that could *possibly* enable a more elegant translation solution would be the ability of inherited models to shadow the fields of their parents. OneToOneField is almost there. I'd try and subclass it to allow for shadowing, but the code of related fields is too complex and I don't understand it. But I love how the OneToOne relation between, say, auth.User and a Customer model that inherits from it enables transparent access to User fields through a Customer instance.

Assuming the shadowing-enabled subclass of OneToOneField was called ShadowingOneToOneField, something like this could happen:

--
>>> class Animal(models.Model):
...     name = models.CharField(max_length=255)
...     trinomial_name = models.CharField(max_length=255)

>>> class AnimalTranslationOptions(TranslationOptions):
...     translated_fields = ('name',)

>>> register(Animal, AnimalTranslationOptions)

# The register() function living in the hypothetical translation app
# would create an in-memory model in the app cache that corresponds to a model
# like this, represented in the database as the animals_animal_fi table:
#
# class Animal_fi(models.Model):
#     name = models.ShadowingOneToOneField(Animal)

>>> animal = Animal.objects.create(name='Dog',
...     trinomial_name="Canis lupus familiaris")

# ... time passes, the Animal instance gets a Finnish and Swedish translation
# for the "name" field, perhaps through a custom admin interface ...

>>> activate('en-us')
>>> animal = Animal.objects.get(name='Dog')
>>> animal.name
"Dog"
>>> activate('fi')
>>> animal.name
"Koira"
>>> activate('sv')
>>> animal.name
"Hund"
>>> animal.trinomial_name # not marked for translation, so not in Swedish here
"Canis lupus familiaris"

>>> from django.ponies import pony; pony.fly()
"Whee!"
--

There would need to be a lot of descriptor action or something going on there so that "name" would resolve to either Animal.name, Animal_fi.name or Animal_sv.name depending on the active language.
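
For what it's worth, the "descriptor action" part on its own can be sketched in plain Python. This is not Django code and says nothing about how the shadowing relation or the database side would work: TranslatedField, set_translation() and the in-memory _translations dict are all invented for this example, and the activate() here is a toy stand-in. Writes through the attribute are deliberately refused, since write semantics are still TBD.

```python
DEFAULT_LANGUAGE = "en"
_active = {"language": DEFAULT_LANGUAGE}


def activate(language):
    """Toy stand-in for activating a language."""
    _active["language"] = language


class TranslatedField(object):
    """Descriptor that resolves a field value by the active language."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        values = instance._translations[self.name]
        # Fall back to the default language when no translation exists.
        return values.get(_active["language"], values[DEFAULT_LANGUAGE])

    def __set__(self, instance, value):
        # Write semantics are left open here on purpose.
        raise AttributeError("use set_translation() in this sketch")


class Animal(object):
    name = TranslatedField("name")

    def __init__(self, name, trinomial_name):
        # Translated values live outside the normal attribute storage.
        self._translations = {"name": {DEFAULT_LANGUAGE: name}}
        self.trinomial_name = trinomial_name

    def set_translation(self, field, language, value):
        self._translations[field][language] = value


animal = Animal("Dog", "Canis lupus familiaris")
animal.set_translation("name", "fi", "Koira")
animal.set_translation("name", "sv", "Hund")

for language in ("en", "fi", "sv", "de"):
    activate(language)
    print(language, animal.name, animal.trinomial_name)
```

With "de" active, which has no translation in this sketch, the name falls back to the default-language "Dog", matching the fallback behaviour asked for in point 3.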
Sadly, I'm not sure if the South migration problem described earlier is solvable with this approach, either.

Anyway, no need to pile on me calling me stupid for all the shortcomings that my ideas inevitably have :-). Just throwing things out there, maybe someone smarter will be inspired to create something that actually works.

In a perfect world, databases wouldn't suck this much as a means of holding a variable number of translated versions of a column's data. Instead, a TRANSLATED_VARCHAR(255) column called "name" could have any number of translations stored along with the default language, all of which could be 255 characters long, and you could access them with standard syntax: "SELECT `name` IN 'fi' FROM animals_animal WHERE id=1;" or something, and the ORM could just work with that. One can dream. Perhaps NoSQL databases and their Django backends will make something like this possible one day.

- JK Laiho