Hello all
After reading more mails and writing some comments myself, I think it is time for a newer, hopefully better proposal. Not everything in it is new, but I have added some new thoughts and some points taken from the comments.

This proposal combines decentralized translations with a central repository, and it does so without adding any delay on the path from translator to user.

Not every part is set in stone. I need comments and decisions on some parts. Maybe you can help.

One quote from a mail by Raphael Hertzog:

  I find that having translations is far better that having not a single one and refusing to add them because we can't have the perfect solution right now.


Add Translations of the Package Description in the Debian Distribution
(c) Michael Bramer <[EMAIL PROTECTED]>

1.) Use _gettext_ everywhere!

Why should we use gettext to add translated descriptions to the Debian distribution? Because gettext is *the* technique for translations. Everybody knows it and uses it: you need not teach the maintainers, and you need not teach the users (an important point). A user who already runs a system with a locale environment will simply get translated descriptions in future. gettext does all the work, and it is well tested (it is used in many programs).

With this approach we need only some small patches. (We have shown a -9/+30 patch for dselect/dpkg, and hopefully an apt patch will not be much bigger.) gettext never shows an outdated translation (a big point): the original English text is the lookup key, so a changed description simply no longer matches its old translation. It also has other nice features (see below). Maybe the release manager will allow this patch into woody, but that is another story.

If apt and dpkg are patched and the user has a suitable .mo file in /usr/share/desc-trans/<locale>/, the output of _all_ package management programs is translated. (dpkg and APT need a patch; other programs such as deity use APT.)

gettext already supports fallback languages. See [1] for more information.
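To make the fallback idea concrete, here is a minimal sketch in Python. It uses Python's own gettext module rather than the C library that dpkg and apt would call, so it only illustrates the lookup order. The DictCatalog class and the bracketed placeholder strings are assumptions invented for this example (no real .mo files or real translations are involved); only gettext.NullTranslations and its add_fallback() method come from the standard library.

```python
import gettext


class DictCatalog(gettext.NullTranslations):
    """Hypothetical stand-in for one language's .mo file, backed by a dict."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        # NullTranslations.gettext() asks the fallback catalog if one is
        # set, and otherwise returns the untranslated message.
        return super().gettext(message)


def chain(*catalogs):
    """Link catalogs so that each one falls back to the next."""
    first = catalogs[0]
    for catalog in catalogs[1:]:
        first.add_fallback(catalog)
    return first


# Placeholder "translations"; the language tags in brackets are only labels.
hungarian = DictCatalog({"text editor": "[hu] text editor"})
slovak = DictCatalog({"text editor": "[sk] text editor",
                      "mail client": "[sk] mail client"})
czech = DictCatalog({"web server": "[cs] web server"})

catalog = chain(hungarian, slovak, czech)

for description in ("text editor", "mail client", "web server", "compiler"):
    print(description, "->", catalog.gettext(description))
```

Each message is looked up on its own: one description can come from the first catalog while the next comes from the second or third, and a description that nobody has translated is shown in English.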
If I understand the gettext source code correctly, the fallback works per message and not per .mo file. So someone can set LANGUAGE=hu:sk:cs and get a Hungarian -> Slovak -> Czech -> English fallback path. (If a description is translated into Slovak but not into Hungarian, the user will see the Slovak description.)

This is all nice, and we have only one problem: how will the user get a suitable .mo file?

First, one comment on this question: you have this problem with the descriptions all the time anyway. You must download the descriptions and the translations first. Only after that can you see them and install the real programs/packages. For the normal (English) descriptions we use the Packages files (with apt or the old dselect methods). We must use something like this for the translations too...

2.) Get the .po/.mo files onto the system

If we use gettext, we must get one .mo file onto the system. The .mo file is generated from a .po file and is itself a binary data file.

If you have several sources (like ftp.debian.de and a local mirror with your own packages), you will have several translations and several .mo/.po files. The best way is to download the .po files, merge them with a tool into one big .po file, and build a .mo file from that. (Maybe a simple 'cat *.po > master.po' is enough. I have not tested this yet, but this is only a technical question.)

I propose the directory /usr/share/desc-trans/<locale>/desc-trans.d/ to store all .po files.

If you run apt-get update (or a similar function in deity and co), you (maybe) get new and changed descriptions in the apt database, and then you need a newer, better .po file. Because of this, I propose that apt downloads the .po-like file (see below) during the update process.

What is the size of all this? We now have about 7000 packages in sid/main/i386 (see [2]), and the descriptions of all these packages add up to 2660993 bytes. That gives an average description size of 384 bytes per package.
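The average can be reproduced from the figures in [2], and the compressed size per description could be measured instead of guessed. The following Python sketch does both. The helper name average_sizes is an assumption made for this example, and zlib is used as an approximation of gzip (gzip adds a small fixed header on top of the same compression).

```python
import zlib

# Figures taken from footnote [2] (sid/main/binary-i386).
TOTAL_DESCRIPTION_BYTES = 2660993
DESCRIPTION_COUNT = 6922

average = TOTAL_DESCRIPTION_BYTES / DESCRIPTION_COUNT
print(f"average description size: {average:.1f} bytes")


def average_sizes(descriptions):
    """Return (raw, compressed) average bytes per description.

    The descriptions are compressed together as one stream, the way a
    compressed Packages-like file would be, and the result is divided
    by the number of descriptions.
    """
    raw = "\n".join(descriptions).encode("utf-8")
    packed = zlib.compress(raw, 9)
    count = len(descriptions)
    return len(raw) / count, len(packed) / count
```

Feeding average_sizes() the descriptions extracted from a real Packages file would show how close the compressed figure really is to the estimate.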
With gzip we will get (maybe) 130 bytes. So the size on the system is comparable to the Packages files from apt. If you have several sources, you will have some (5-20) megabytes in /usr/share/desc-trans/<locale>/desc-trans.d/ plus one collected .mo file per language. But the admin of the system must pay this price if he wants to see translated descriptions. (And it does not matter whether we use gettext or another technique; with gettext we only have the extra .mo file.)

But which file should apt download?

The first thought is maybe a translated Packages-XX file. But the first thought is not always the best one. We have _now_ 316 Packages* files (see [3]) on ftp-master with a total size of 141 MByte. If we translate all of them into (only) 10 languages, we need 1.4 GByte, and with more packages and more languages it grows further. OK, hard disks are cheap, but not free. This is not the right way.

A Packages file does not contain only the Description. As you know, it includes all the other tags from the control file. If we drop these tags, put only the Description into a file and make Descriptions-XX files, we save 50% of the size. And if we store one Descriptions-XX file per dist and not per arch, we save more. Then we need only 30 Descriptions files per language [4]. This should be only 14 MByte per language (if all descriptions are translated). These files contain only the package name and the translated Description (and maybe the Version).

The APT process can generate .po files from the normal Packages file and all downloaded Descriptions files. If we do not want to run this process on the client every time, we can produce Descriptions-XX.po files instead; the client then only has to download this file and save it in the right directory. But this file includes the original description, so it has double the size and download time.

With the Descriptions-XX[.po] file the admin has to download only the languages he needs, not all languages.
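As an illustration of the client-side step, here is a rough Python sketch that pairs the original descriptions from a Packages file with the translations from a Descriptions-XX file and writes .po entries. The Descriptions-XX layout used here (Package and Description fields only), the function names and the sample data are all assumptions for this example; the real file format is still to be decided.

```python
def parse_stanzas(text):
    """Map package name -> Description from blank-line separated stanzas."""
    result = {}
    for stanza in text.strip().split("\n\n"):
        fields, current = {}, None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and current is not None:
                # Continuation line of a multi-line field.
                fields[current] += "\n" + line.strip()
            elif ":" in line:
                current, value = line.split(":", 1)
                fields[current] = value.strip()
        if "Package" in fields and "Description" in fields:
            result[fields["Package"]] = fields["Description"]
    return result


def po_string(text):
    """Format text as a multi-line PO string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    lines = escaped.split("\n")
    quoted = ['""']
    for index, line in enumerate(lines):
        newline = "\\n" if index < len(lines) - 1 else ""
        quoted.append('"' + line + newline + '"')
    return "\n".join(quoted)


def build_po(packages_text, descriptions_text, charset="UTF-8"):
    """Return .po text with one entry per translated description."""
    originals = parse_stanzas(packages_text)
    translations = parse_stanzas(descriptions_text)
    # Header entry; the charset is an assumption of this sketch.
    entries = ['msgid ""\nmsgstr ""\n'
               '"Content-Type: text/plain; charset=' + charset + '\\n"']
    seen = set()
    for package in sorted(translations):
        original = originals.get(package)
        # Skip unknown packages and duplicate msgids (two packages with
        # the same description must share one entry).
        if original is None or original in seen:
            continue
        seen.add(original)
        entries.append("#. " + package + "\n"
                       "msgid " + po_string(original) + "\n"
                       "msgstr " + po_string(translations[package]))
    return "\n\n".join(entries) + "\n"


# Made-up sample data, not taken from the archive.
PACKAGES = """\
Package: hello
Version: 1.0
Description: example greeting program
 Prints a friendly greeting.

Package: bye
Version: 2.0
Description: example farewell program
 Prints a farewell.
"""

DESCRIPTIONS_DE = """\
Package: hello
Description: Beispielprogramm
 Gibt einen Gruss aus.
"""

print(build_po(PACKAGES, DESCRIPTIONS_DE))
```

Because the msgid is the full original description, a translation made for an older description text would not match after the description changes, which is the behaviour wanted above. Packages without a translation produce no entry, so their English description is shown.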
As a first step (and a little hack), we can produce a desc-trans-XX.deb that contains only the .po file. A user can download this package, install it, and have translated descriptions. Once katie etc. are patched and the Descriptions files are on all the mirrors, we no longer need this deb and can remove it from the archive.

4.) How does katie (or the desc-trans-XX.deb) get the translations?

Katie gets the translation from the deb package itself (see next point) or from an override file as fallback.

The ddts (Debian Description Translation Server) can produce the override file. Normally the translator gets the untranslated description from this server and sends the translation back to it. The server does all the work. If a description changes, it sends mail with the new description to the translator of that description, and sends notifications to the maintainer.

The maintainer has a veto and can remove a translation from the ddts db. He can send improvements to the translator, etc. He is not out of the loop. He only outsources the translations to the ddtp.

If a maintainer does not like the ddtp, he can translate the description himself, find his own translators, etc. This is not a real problem. The ddtp is only a service for the maintainer and saves work on his side.

5.) Translated descriptions in the package

Now, this is the difficult part. We need a way to add the translated description to the normal package. In the last mails we saw some proposals.

For private packages, or if the maintainer knows some languages and does the translation himself, it is a good way to include the translation in the package.

I'm not convinced that this is OK in the normal Debian archive. I see only one problem: the size.

We now have 80446 .deb packages and 7643 source packages in the Debian archive on ftp-master (see [5]). If we include the translation in the deb, we must store it in the source and in every deb package.
Check this calculation:

If every source package contains only one description with 130 (gzipped) bytes, we get 1 MByte per language. If we use po files in the source (see below), we get 2 MBytes per language.

And if every deb package has only one description with 130 (gzipped) bytes, this makes 10 MByte per language. If we store the description as a po file, we use 20 MByte per language.

That is 11/22 MByte per language; with only 10 languages we get 110/220 MBytes. With more packages, ports and languages this will grow. All these bytes must be downloaded, uploaded and synced over time.

And on the local system, the descriptions and the translations in all languages from each package will be stored on the local hard disk (without gzip). Count: with 10 languages, 1000 installed packages and 380 bytes per description and per translation, you get an additional 4/8 MBytes on the local disk.

Is all this useful in a 'normal' deb package from the Debian project? Maybe yes. We must decide this. (Personally I do not see a real advantage in it. But we can add it, and I do not have a real problem with it. I see only the size problem, and that is not a big problem.)

In all cases I propose: store the description in the source as a .po file in the /debian/ dir (one per language). This is the only really good way to store the translations (no encoding problem, no outdated text, no debconf-mergetemplate hack, ...).

But how does the maintainer get the translation? We have several cases:
 - The maintainer translates the description himself
 - He finds his own translators (like now with debconf)
 - He uses the ddtp
 - He can ask the ddts and get all translations of the package
 - He can use the override file of katie
 - He uses the notification mails from the ddts (In future the server will use the agreed format in this mail, so the maintainer only has to copy this file into the source.)
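The size estimate from earlier in this section can be reproduced with a few lines of Python. The package counts come from [5] and the byte sizes from the text; the function names are made up for this sketch, and 1 MByte is taken as 1,000,000 bytes.

```python
SOURCE_PACKAGES = 7643    # source packages on ftp-master, from [5]
BINARY_PACKAGES = 80446   # .deb packages on ftp-master, from [5]
GZIPPED_BYTES = 130       # estimated gzipped size of one description
RAW_BYTES = 380           # estimated uncompressed size of one description
MBYTE = 1_000_000


def archive_mbytes(languages=1, as_po=False):
    """Extra archive size if every source and deb carries one translation."""
    factor = 2 if as_po else 1  # a po file also holds the original text
    per_language = (SOURCE_PACKAGES + BINARY_PACKAGES) * GZIPPED_BYTES * factor
    return per_language * languages / MBYTE


def local_mbytes(languages, installed, as_po=False):
    """Extra uncompressed size on a system with the given installed packages."""
    factor = 2 if as_po else 1
    return languages * installed * RAW_BYTES * factor / MBYTE


print(f"archive, 1 language:   {archive_mbytes():.1f} / "
      f"{archive_mbytes(as_po=True):.1f} MByte")
print(f"archive, 10 languages: {archive_mbytes(10):.1f} / "
      f"{archive_mbytes(10, as_po=True):.1f} MByte")
print(f"local disk:            {local_mbytes(10, 1000):.1f} / "
      f"{local_mbytes(10, 1000, as_po=True):.1f} MByte")
```

The exact values come out slightly above the figures in the text, because the text rounds each term before adding them up.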
Now the technical part:

The proposal with the biggest patch is 'put the translation into its own element in the deb ar'. Maybe this is nice and feasible. But it is not a fast way. Because of this I propose several solutions:

 1.) (very fast) Put the translation as a normal .po file in the /usr/share/desc-trans/<locale>/desc-trans.d/ dir. Finished. This needs no extra work on dpkg etc.

 2.) Put the translation in the control.tar.gz of the deb, maybe as desc-trans.tar.gz with all translations. We can store it as a real po file or as a description file (without the original description). dpkg --info can use this and show all included translated descriptions. If the package includes only the translated description (and no po file), a gettext-like process must ensure that no outdated translation is included in the package! During package installation dpkg should move these files to the /usr/share/desc-trans/<locale>/desc-trans.d/ dir. (If the translation is not in po file format, dpkg generates a po file from the translation and the original description.)

 3.) (the long way, if possible) Add the desc-trans.tar.gz to the deb ar as a new element of its own. The other points are as in 2.). But this has the big feature that some process on ftp-master can edit the .deb on the fly and change and/or add translations. Maybe this has some other problems.

In every case we should use a dh_* script. With this we can start with 1.) and switch to 2.) or 3.) later. And maybe this script can fetch the translation from some source itself.

6.) Transition to a Debian with translations

 - We have the first translations, and the first step is a newer, patched dpkg and apt. Could we please have the opinion of Wichert and Jason, for dpkg and apt, on the use of gettext for the translation of the descriptions?!
 - The next step is a decision on the format in the deb file.
 - The last step is the download of the translated descriptions with apt during the update process, and the patch of katie to produce the Descriptions or Descriptions.po files.

Maybe we get the first step into woody and the others into woody+1.


Appendix

[1] from the ABOUT-NLS in the gettext source:

  ...
  Not all programs have translations for all languages. By default, an English message is shown in place of a nonexistent translation. If you understand other languages, you can set up a priority list of languages. This is done through a different environment variable, called `LANGUAGE'. GNU `gettext' gives preference to `LANGUAGE' over `LANG' for the purpose of message handling, but you still need to have `LANG' set to the primary language; this is required by other parts of the system libraries. For example, some Swedish users who would rather read translations in German than English for when Swedish is not available, set `LANGUAGE' to `sv:de' while leaving `LANG' to `sv_SE'.

  In the `LANGUAGE' environment variable, but not in the `LANG' environment variable, `LL_CC' combinations can be abbreviated as `LL' to denote the language's main dialect. For example, `de' is equivalent to `de_DE' (German as spoken in Germany), and `pt' to `pt_PT' (Portuguese as spoken in Portugal) in this context.
  ...
[2]
[EMAIL PROTECTED]:/org/ftp-master.debian.org/ftp/dists/sid/main/binary-i386$ grep-available -s Description "" Packages|grep ^Descrip|wc
   6922   48709  372777
[EMAIL PROTECTED]:/org/ftp-master.debian.org/ftp/dists/sid/main/binary-i386$ grep-available -s Description "" Packages|wc
  50806  406596 2660993
$ bc -l
2660993/6922
384.42545507078878936723

[3]
[EMAIL PROTECTED]:/org/ftp-master.debian.org/ftp$ find -name "Package*"|wc
    316     316   15774
[EMAIL PROTECTED]:/org/ftp-master.debian.org/ftp$ find -name "Package*"|xargs cat|wc
3266668 14826546 148475135

[4]
unstable/main /contrib /non-free
frozen/main /contrib /non-free
frozen-proposed-updates/main /contrib /non-free
stable/main /contrib /non-free
stable-proposed-updates/main /contrib /non-free

[5]
[EMAIL PROTECTED]:/org/ftp-master.debian.org/ftp$ find -name "*deb" -type f|wc
  80446   80446 4308241
[EMAIL PROTECTED]:/org/ftp-master.debian.org/ftp$ find -name "*tar.gz" -type f|wc
   7643    7643  414031

Regards
Grisu
-- 
Michael Bramer - a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux
"Like sex in high school, everyone's talking about Linux, but is anyone doing it?" -- Computer Currents