Hi Marc, hi Gonzalo, hi django-dev,

I'm writing this email because I need some kind of future-proof model translation in Django in the near future. I have tried many solutions and came up with two of my own (not released anywhere), but nothing seems to fit all the needs translations might impose.
So I thought about how model translations in Django could work, and how they should be used by the developer. I am sending this email directly to Marc, because he is currently working on some i18n stuff (GSoC). Gonzalo seems to be interested in an official solution (http://groups.google.com/group/django-developers/msg/0be7de2c154aa49d?utoken=lffASzEAAABAWiol0apP9nFxQklB-_Jkc1-9iUSAb45YspdDeB6HhPWDe8H4lvYuxY4dyvcAHxqMKN2o9YZfQ4UFyks_AAUi), so you might be interested in some discussion about this. Perhaps Malcolm should be asked how my proposal fits into the Django ORM, as he has (re)written most of the ORM (-> queryset-refactor).

I know Django 1.1 is on its way and developers might not have the time to really think about this now, but I needed to write it down. If someone answers and we can start working on this post-1.1, that would be great, of course. I, personally, don't need to wait. Anyway, according to the bug tracker there's not much delaying Django 1.1: http://code.djangoproject.com/query?status=new&status=assigned&status=reopened&component=!Translations&component=!Documentation&milestone=1.1&order=priority

First, to summarize the needs. It has to:
a) be fast, no heavy database load should be needed
b) work for third-party apps, as third-party apps become unusable if you need i18n and they don't support it (this also means existing third-party code must still work for translated models)
c) be transparent for the user, you shouldn't need to think about where your fields come from
d) support missing translations without skipping the whole database row, for example if you JOIN two tables (optional)
e) be searchable through the normal ORM, translated fields should not be hidden in some serialized format
f) be extendable, new fields shouldn't be a big problem
g) convert existing data without a hassle (like sync_transmeta_db)
h) keep context, if I fetch a German model from the database, all relations should fetch German models, too (the default context should come from the request and/or settings)
i) be optional, not every project needs model translations; this is especially important for third-party apps, supporting translations should not mean you have to use them
j) be generic, different people have different needs
k) support translating central fields (like slug), this goes hand in hand with easy searchability of the translated fields
l) support getting all translations for every object in the database
m) integrate well into the admin

All existing solutions fail at some of these points; my own two solutions failed many of these points, too. So first, let's take a look at what exists:

1. Put every translation in extra fields inside one big table: I'm not a fan of this, it tends to get messy, especially for big sites with many translations. Developers need to think about where their fields come from (Book.objects.filter(title_de=...)). Does not support third-party apps.

2. Put all translated fields into their own model and use the normal ORM for access. If you look at this from a database perspective, this is the best solution. But the Django ORM does not really support you here. Access to the fields needs an extra query for every object.
Even worse: if you use some translated fields in your query, you have to do this extra query anyway (Book.objects.filter(translations__slug='foobar', translations__language='de')); to get the German translation you have to execute an extra query, although the JOIN was already done.

3. Use serialization to save all fields into one big BLOB. Do I need to say anything here? Not searchable, no way to really work with the translations on the DB layer.

4. Special case: create a separate model for translations (like 2), but use the original table for some kind of DEFAULT_LANGUAGE (pluggable-model-i18n). I kind of like this, and not only because you are able to enhance third-party apps. Independent of that, I think translations.register() seems to be a nice idea. But(!) this complicates things too much: you have to choose whether you want a field out of the DEFAULT_LANGUAGE or not for every field access/query.

5. gettext-like approach: save a translation for every possible string in the database, fall back to gettext. The DB might explode here, as the DB load would be heavy, really heavy.

So, what to make of these ideas people have put into their code? All projects have some advantages, while delivering some disadvantages. From a DB perspective the second solution seems to be best; from the usability perspective the fourth solution looks great. The first solution really shows its advantages when you look at queries (Book.objects.filter(name_en='this is easy')).

First I want to throw in one of my own solutions, as I think it is interesting when looking at the admin. What I did was reverse the way we currently look at solution two, meaning the translation itself becomes the model you work with and the rest is put into some "common" model:

    class BookCommon(models.Model):
        some_field = models.CharField(...)

    class Book(models.Model):
        common = models.ForeignKey(BookCommon)
        language = models.CharField(...)
        translated_field = ...
What's interesting here are two things:

* We can optimize DB access using select_related():

      Book.objects.select_related('common')

  ...so we can cache the common attributes, a great benefit.
* The admin lists every translation of every object; common fields need to be "copied" to BookForm. New translations can be created by just changing the language and hitting "save as".

Anyway, this solution did not work well, as relations between objects get messy now (you normally have a ForeignKey to/from the common model, so you are left with all the problems of solution two when accessing related objects). Additionally, you cannot access objects that have no translation in your current language, as Book.objects.filter(language='de') filters out all rows that don't have a translation. But I think this shows how the second solution, which is good from a DB perspective, can be enhanced to be more usable. It is also the solution that brought me to my current idea, even if not directly.

So, we have this nice solution that your database likes, but no way of using it so that we actually benefit from it. Making your database happy just isn't enough. How would we use solution two the right way if we were just dealing with SQL? The answer is pretty easy, use a JOIN:

    SELECT ... FROM book
    INNER JOIN book_trans ON (book.id=book_trans.book_id AND book_trans.language='en')
    WHERE ...

Let's explain this; most of it is quite obvious. But why put the language inside the JOIN, could we not just use WHERE language='en'? That's true, if we want to force our result to have a translation in the selected language. If you don't need this (for example because the fields are optional, or the whole translation is optional), you cannot use WHERE. But you can simply change the query above to use a LEFT OUTER JOIN:

    SELECT ... FROM book
    LEFT OUTER JOIN book_trans ON (book.id=book_trans.book_id AND book_trans.language='en')
    WHERE ...
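To see the difference between the two JOIN types, and why the language test belongs in the ON clause, here is a small sketch that uses Python's built-in sqlite3 module in place of the Django ORM. The book/book_trans tables follow the queries above; the isbn and title columns, the sample rows and the helper names (books_in, allow_missing, and so on) are made up for illustration.

```python
import sqlite3

# book holds the untranslated data, book_trans one row per (book, language).
SCHEMA = """
CREATE TABLE book (id INTEGER PRIMARY KEY, isbn TEXT);
CREATE TABLE book_trans (
    id INTEGER PRIMARY KEY,
    book_id INTEGER NOT NULL REFERENCES book (id),
    language TEXT NOT NULL,
    title TEXT,
    UNIQUE (book_id, language)
);
-- made-up sample data: book 1 has 'en' and 'de', book 2 only 'de'
INSERT INTO book (id, isbn) VALUES (1, '111'), (2, '222');
INSERT INTO book_trans (book_id, language, title) VALUES
    (1, 'en', 'Diving'), (1, 'de', 'Tauchen'), (2, 'de', 'Nur Deutsch');
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def books_in(conn, language, allow_missing):
    # The language condition lives in the ON clause, so the join type alone
    # decides whether books without that translation are kept (title NULL)
    # or skipped. The join keyword comes from a fixed choice, never user input.
    join = "LEFT OUTER JOIN" if allow_missing else "INNER JOIN"
    sql = (
        f"SELECT book.id, book_trans.title FROM book {join} book_trans "
        "ON (book.id = book_trans.book_id AND book_trans.language = ?) "
        "ORDER BY book.id"
    )
    return conn.execute(sql, (language,)).fetchall()


def books_filtered_in_where(conn, language):
    # Same LEFT OUTER JOIN, but the language test is in WHERE: rows with a
    # NULL language fail it, so this behaves like the INNER JOIN again.
    sql = (
        "SELECT book.id, book_trans.title FROM book "
        "LEFT OUTER JOIN book_trans ON (book.id = book_trans.book_id) "
        "WHERE book_trans.language = ? ORDER BY book.id"
    )
    return conn.execute(sql, (language,)).fetchall()


def all_translations(conn):
    # Requirement l): drop the language condition to get every translation.
    sql = (
        "SELECT book.id, book_trans.language, book_trans.title FROM book "
        "LEFT OUTER JOIN book_trans ON (book.id = book_trans.book_id) "
        "ORDER BY book.id, book_trans.language"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(books_in(conn, "en", allow_missing=False))  # [(1, 'Diving')]
    print(books_in(conn, "en", allow_missing=True))   # [(1, 'Diving'), (2, None)]
    print(books_filtered_in_where(conn, "en"))        # [(1, 'Diving')]
```

Only the LEFT OUTER JOIN with the language inside ON keeps book 2 while still returning at most one translation per book.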
Now you get a result even if there is no row for the translation in book_trans. I think this is something model translations need to support; solution two kind of supports this, too.

Now let's dig into what pluggable-model-i18n does, as we want to support third-party apps, right? Why not just _remove_ the fields from the original model and dynamically create a new translation model? This way you could change your database layout if and only if needed, while keeping model definitions easy. I think of some kind of translation.register() like pluggable-model-i18n uses. One example:

    class SomeObj(models.Model):
        foo = models.CharField(...)
        bar = models.CharField(...)

    class SomeObjTranslation(translation.ModelTranslation):
        class Meta:
            fields = ('foo',)

    translation.register(SomeObj, SomeObjTranslation)

could be converted to:

    class SomeObj(models.Model):
        bar = models.CharField(...)

    class SomeObjTranslation(models.Model):
        language = models.CharField(max_length=5,
                                    choices=settings.LANGUAGES,
                                    default=settings.LANGUAGE_CODE)
        object = models.ForeignKey(SomeObj, related_name='translations')
        foo = models.CharField(...)

        class Meta:
            # only one translation per language and object
            unique_together = (('language', 'object'),)

This way you can change third-party apps without the developers even needing to _care_ about translations, while still keeping the new models pretty simple. Of course you have to provide some kind of conversion script that copies all values from the old table to the new one, but that should not be a big problem.

Now let's look at what usage should look like. As we want third-party apps to keep working, no fields should be "renamed" ('field' -> 'translations__field'). In addition, I think a query should normally get one translation of your object; by default this could be some DEFAULT_LANGUAGE or the request language. The model object itself needs to allow transparent access to the fields, too.
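The transparent field access described above can be illustrated without Django. This is a minimal plain-Python sketch, assuming the translations were already loaded (for example by the JOIN) into a dict keyed by language code; the class and attribute names (TranslatableObject, translated_fields, _translations) are hypothetical and not part of any existing API.

```python
class TranslatableObject:
    """Hypothetical stand-in for a model instance after translation.register().

    Untranslated fields (bar) live on the object itself; translated fields
    (foo) are read from and written to the translation of the current language.
    """

    # fields that were moved into the translation model
    translated_fields = ("foo",)

    def __init__(self, bar, translations, language):
        self.bar = bar
        # language code -> {field name: value}, e.g. filled from the JOIN
        self._translations = translations
        # the language "context" of this object
        self.language = language

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for the
        # translated fields, which are never stored on the instance.
        if name in type(self).translated_fields:
            translation = self._translations.get(self.language)
            # No row for this language (the LEFT JOIN case): return None.
            return None if translation is None else translation.get(name)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in type(self).translated_fields:
            # Write into the current language's translation, creating it if
            # needed, so that a later save() could store it.
            self._translations.setdefault(self.language, {})[name] = value
        else:
            super().__setattr__(name, value)


obj = TranslatableObject("b", {"en": {"foo": "hello"}, "de": {"foo": "hallo"}}, "de")
print(obj.foo)  # hallo
print(obj.bar)  # b
```

Code that only knows about obj.foo, such as a third-party app, never has to know that foo lives in another table, which is what requirement c) asks for.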
As obj.title will be translated, you could say your object represents one language at a time. I would suggest adding obj.switch_language('en') to load and transparently replace all attributes (while keeping the old ones in some cache). Saving the object must of course save all translations, too.

Now you might notice that this sounds familiar. Yes, model inheritance works similarly. We have fields that are transparently rewritten to the right table on queries, save() UPDATEs many tables, and attributes just live in one object (obj.parent_field instead of obj.parent.parent_field). If you look at it this way, you will see that the proposed translations are something like models using "reverse inheritance", meaning the behavior is like inheritance, but the semantics are reversed. The biggest difference is the changed JOIN, but Django should already provide most of the techniques for this, even if they need to be enhanced.

So, what about the other things Django model translation should provide:

a) fast: Only one JOIN involved, as you only need one language most of the time. Otherwise it's like solution two.
b) third-party apps: Work like a charm, no fields are changed, and because of some default language in the query only objects in the current site language are returned.
c) transparent: If done like inheritance, this should behave like inheritance, so perfectly transparent.
d) missing translations: Supported by using LEFT JOIN.
e) searchable: Like inheritance.
f) extendable: Like normal models; South, django-evolution or similar is perhaps needed.
g) convert: A script is needed for this, like sync_transmeta_db.
h) keep context: Relations need to mind obj.language, which should definitely be possible.
i) optional: If you don't use translation.register(), no translation is done; not even the table is created.
j) generic: If you have some use case I'm missing, tell me.
k) central fields: slug can be translated, access is as simple as when using inheritance.
l) all translations: Just leave the "AND xxx.language='yy'" out of the JOIN and you get every translation. Similar to using Book.objects.all() with my approach.
m) admin: Like solution two, I think people have come up with something here. I still like the idea of viewing every possible translation and being able to edit it like one distinct object. But there might be better solutions.

I have attached a sample usage example; perhaps it gives you some more detail on the API I suggest. I have looked into the code and think implementing this should be possible, but it needs some changes in the Django ORM itself. It should be possible to implement this by creating some TranslationQuery object, but you would have to copy a lot of code to keep behavior in sync with the normal Query.

If you read down until here, thank you. I know this is a lot of text (hey, it only took me about 2 hours to write this down, after thinking about a solution for the last weeks). I would like to get some input on this topic, about what you think model translations could look like. Marc, I don't know if you have a proposal of your own. Perhaps we can share ideas and even start implementing this together. I am willing to spend some time on this topic, because I need a solution flexible enough (aka "fits my needs") for a client. Additionally, I think Django would very much benefit from an official solution on this topic.

Greetings,
David Danier

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---
=> Just some basic idea
=> Needs changes all the way down to Django's Query objects, so implementing this is non-trivial

>>> class SomeObj(models.Model):
>>>     foo = models.CharField(...)
>>>     bar = models.CharField(...)
>>>
>>> class SomeObjTranslation(translation.ModelTranslation):
>>>     class Meta:
>>>         fields = ('foo',)
>>>         # use INNER or LEFT JOIN (LEFT JOIN means you get None back
>>>         # if the selected translation does not exist, INNER JOIN means
>>>         # objects without a translation get skipped)
>>>         allow_empty = True
>>>
>>> translation.register(SomeObj, SomeObjTranslation)

=> Removes field 'foo' from SomeObj
=> Creates a new class with this field (+ language + object)
=> Sets up some useful methods, see below
=> Better: change field 'foo' to point to SomeObjTranslation, like model inheritance

--> Gets rewritten to:

>>> class SomeObj(models.Model):
>>>     @property # has a setter, too, of course
>>>     def foo(self):
>>>         return self.translations....foo
>>>     @property
>>>     def language(self): ...
>>>     bar = models.CharField(...)
>>>
>>> class SomeObjTranslation(models.Model):
>>>     language = models.CharField(max_length=5, choices=settings.LANGUAGES,
>>>                                 default=settings.LANGUAGE_CODE)
>>>     object = models.ForeignKey(SomeObj, related_name='translations')
>>>     foo = models.CharField(...)
>>>     class Meta:
>>>         # only one translation per language and object
>>>         unique_together = (('language', 'object'),)

-->

>>> SomeObj.objects.get(pk=1)
=> SELECT ... FROM someobj LEFT JOIN someobjtranslation ON (someobj.id = someobjtranslation.object_id AND someobjtranslation.language={{ DEFAULT_LANGUAGE }}) WHERE ...
=> Cache for SomeObjTranslation is filled
=> DEFAULT_LANGUAGE = request language or settings.LANGUAGE_CODE
=> LEFT JOIN as allow_empty is True

>>> SomeObj.objects.language('en').get(pk=1)
=> SELECT ... FROM someobj LEFT JOIN someobjtranslation ON (someobj.id = someobjtranslation.object_id AND someobjtranslation.language='en') WHERE ...
=> You can easily use some other language without needing to think about the SQL
=> Cache for SomeObjTranslation is filled

-->

>>> class SomeOtherObj(models.Model):
>>>     someobj = models.ForeignKey(SomeObj, related_name='others')
>>>     foo = models.CharField(...)
>>>
>>> class SomeOtherObjTranslation(translation.ModelTranslation):
>>>     def some_method(self): ...
>>>     class Meta:
>>>         fields = ('foo',)
>>>         # see above (SomeObjTranslation)
>>>         allow_empty = False
>>>
>>> translation.register(SomeOtherObj, SomeOtherObjTranslation)

--> Gets rewritten to:

>>> class SomeOtherObj(models.Model):
>>>     someobj = models.ForeignKey(SomeObj, related_name='others')
>>>     @property # has a setter, too, of course
>>>     def foo(self):
>>>         return self.translations....foo
>>>     @property
>>>     def language(self): ...
>>>     def some_method(self): # all attributes of the translation get mirrored here
>>>         return self.translations....some_method()
>>>
>>> class SomeOtherObjTranslation(models.Model):
>>>     language = models.CharField(max_length=5, choices=settings.LANGUAGES,
>>>                                 default=settings.LANGUAGE_CODE)
>>>     object = models.ForeignKey(SomeOtherObj, related_name='translations')
>>>     foo = models.CharField(...)
>>>     def some_method(self): ... # like when you define this on normal models
>>>     class Meta:
>>>         # only one translation per language and object
>>>         unique_together = (('language', 'object'),)

-->

>>> obj = SomeObj.objects.get(pk=1)
>>> obj.others
=> SELECT ... FROM someotherobj INNER JOIN someotherobjtranslation ON (someotherobj.id = someotherobjtranslation.object_id AND someotherobjtranslation.language={{ obj.language }}) WHERE ...
=> Language "context" of obj is used
=> INNER JOIN as allow_empty is False, so this query may raise SomeOtherObj.DoesNotExist on get(), even if this works for other languages

>>> obj = SomeObj.objects.get(pk=1)
>>> obj.others.language('en')
=> SELECT ...
   FROM someotherobj INNER JOIN someotherobjtranslation ON (someotherobj.id = someotherobjtranslation.object_id AND someotherobjtranslation.language='en') WHERE ...
=> You can override the language used on related managers, too

>>> obj.switch_language('en')
=> SELECT ... FROM someobjtranslation WHERE someobjtranslation.object_id={{ obj.pk }} AND someobjtranslation.language='en'
=> Fill translation cache with 'en' (an already loaded translation is _not_ overwritten)

>>> obj.others
=> SELECT ... FROM someotherobj INNER JOIN someotherobjtranslation ON (someotherobj.id = someotherobjtranslation.object_id AND someotherobjtranslation.language={{ obj.language }}) WHERE ...
=> Context is used again, but changed to 'en' (query construction is the same)

-->

>>> SomeObj.objects.get(foo='bar')
=> SELECT ... FROM someobj LEFT JOIN someobjtranslation ON (someobj.id = someobjtranslation.object_id AND someobjtranslation.language={{ DEFAULT_LANGUAGE }}) WHERE ... AND someobjtranslation.foo='bar'
=> Selects the right table without needing to worry about where the field comes from

>>> SomeObj.objects.language('en').get(foo='bar')
=> SELECT ... FROM someobj LEFT JOIN someobjtranslation ON (someobj.id = someobjtranslation.object_id AND someobjtranslation.language='en') WHERE ... AND someobjtranslation.foo='bar'
=> You can use normal queries without needing to worry about anything, even language switching works

-->

>>> obj = SomeObj.objects.get(pk=1)
>>> obj.save()
=> Saves SomeObj _and_ SomeObjTranslation

-->

>>> class BrokenObj(models.Model):
>>>     foo = models.CharField(...)
>>>     bar = models.CharField(...)
>>>     class Meta:
>>>         unique_together = (('foo', 'bar'),)
>>>
>>> class BrokenObjTranslation(translation.ModelTranslation):
>>>     class Meta:
>>>         fields = ('foo',)
>>>
>>> translation.register(BrokenObj, BrokenObjTranslation)
=> Fails as unique_together will not work anymore
=> There may be other places where splitting up the model produces side effects
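The BrokenObj case could at least be detected up front. As a sketch, assume a hypothetical check that translation.register() would run before changing anything. The names mixed_unique_constraints and check_register are made up, and raising ValueError is an arbitrary choice. In plain Python:

```python
def mixed_unique_constraints(unique_together, translated_fields):
    """Return the unique_together entries that would span both tables.

    After the split, an entry that mixes translated and untranslated
    fields refers to columns in two different tables, so a UNIQUE index
    on a single table can no longer enforce it.
    """
    translated = set(translated_fields)
    broken = []
    for fields in unique_together:
        fields = tuple(fields)
        moved = translated.intersection(fields)
        # some, but not all, of the fields would move to the translation table
        if moved and len(moved) < len(set(fields)):
            broken.append(fields)
    return broken


def check_register(model_name, unique_together, translated_fields):
    """Refuse the split up front instead of silently dropping a constraint."""
    broken = mixed_unique_constraints(unique_together, translated_fields)
    if broken:
        raise ValueError(
            "%s: unique_together %r mixes translated and untranslated fields"
            % (model_name, broken)
        )


# The BrokenObj example: ('foo', 'bar') with only 'foo' translated.
try:
    check_register("BrokenObj", (("foo", "bar"),), ("foo",))
except ValueError as exc:
    print(exc)
```

Entries made up only of translated fields are deliberately not flagged. They could live on the translation table, but whether they should then be unique per language or across all languages is a separate design question.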