Hi Martin,

On Monday, 2009-08-17 22:06:07 +0200, Martin Rosenau wrote:

> Many developers are adding new languages to OpenOffice.

Who does so? Addition of new languages should be coordinated, i.e. by
me, to prevent possible mis-assignments, especially if languages/locales
are involved that have no LCID/LangID assigned by Microsoft. In that
case we have to assign one from the user space, following a certain
schema, and identical values must not be assigned to different locales.
This is important in case documents are saved in MS file formats.

> Last week I submitted bug #104249: By opening an .odt file containing
> such a language with an "unpatched" OpenOffice version language
> information gets lost.

See my comment there.

> To solve this problem I suggest the following behavior:
>
> I didn't have a look at the code, yet, but I assume it looks like this:
>
> const struct _languages {
>     const char *iso_code;    /* ISO language code */
>     int microsoft_code;      /* Microsoft Office language code */
>     const char *name;        /* User visible name */
> } languages[]={
>     ...
>     {"de-DE",0x407,"Deutsch"},
>     ...
> };

Well, theoretically yes, but practically no. There exists a mapping
between ISO codes and LangIDs, yes, but the UI visible names are mapped
in from the localized string resources. See
i18npool/source/isolang/isolang.cxx and
svtools/source/misc/langtab.cxx

> I propose to use a variable-size array that is initialized from a
> fixed-size array containing the "built-in" languages. Additional
> languages may be loaded from a configuration file.
>
> This makes adding new languages that are used only by few people easier
> because only the configuration file must be modified (which may be done
> by a macro or a GUI).

How should these people choose the proper MS-LangID?

> If you open an .odt file that contains unknown languages code (e.g.
> "ay-PE" and "quz-PE" that I used for testing) an invalid Microsoft
> language ID (e.g. 0x7FF, 0x7FE, ...)
> should be chosen and
> the following entries should be added temporarily (until OpenOffice is
> closed) to the list:
> {"ay-PE",0x7FF,"Unknown (ay-PE)"}
> {"quz-PE",0x7FE,"Unknown (quz-PE)"}

This does not work, because the core needs unique LangIDs to work with.
We could temporarily assign unused IDs of the user space though, and
generate the UI string for the language list from the ISO codes. Yet
that would not allow permanently storing the ID in an MS file format
document: the assigned values would have to be stored in the
configuration as well, must not be changed by the user, and would have
to be exchanged with other users who want to open such files. I could
imagine some extension though that would handle the configuration part.

> By opening an MS Office document containing an unknown ID (e.g. 0xABCD)
> an invalid ISO code (e.g. "x01-XX") should be chosen and following entry
> should be created:
> {"x01-XX",0xABCD,"Unknown (0xABCD)"}

You would not be able to save the document as a proper ODF file then,
because in ODF only valid ISO codes are allowed. Plus, in future as of
ODF 1.2, valid RFC4646 language tags. Anyway, this may be handled semi
automatically, but the user would have to provide the correct ISO codes.

> This should also be done when the configuration file contains only the
> ISO code or only the MS code for one language.

As said, automatically it would work only as long as the document isn't
saved in a different format.

> - Then language information is not lost when opening a document
>   containing unknown languages,
> - in the "Format -> Character -> Language" field the language is
>   displayed ("unknown xxx" instead of "none"),
> - any user can add a rare minority language as long as he knows the ISO
>   code by changing the configuration file.

The problem with this approach is that entries should not be changed
after documents using the data were written. And configuration must be
kept in sync between different users.
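
To make the constraints mentioned so far concrete, here is a minimal
sketch, not taken from the OOo sources, of a registry that hands out
temporary LangIDs and refuses to change a pair once it is assigned. The
class name TempLangRegistry, its methods and the ID range passed to the
constructor are made up for illustration; the actual user space schema
is not reproduced here. The "Unknown (...)" placeholder string follows
the quoted proposal.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch: maps ISO codes to LangIDs and assigns temporary
// IDs from an ASSUMED block of otherwise unused values.
class TempLangRegistry {
public:
    // firstId..lastId is an illustrative range, not the real schema.
    TempLangRegistry(std::uint16_t firstId, std::uint16_t lastId)
        : next_(firstId), last_(lastId) {
        if (firstId > lastId)
            throw std::invalid_argument("empty temporary ID range");
    }

    // Register a known pair (built-in or read from configuration).
    // A pair may be repeated, but never rebound to a different value.
    void addKnown(const std::string& iso, std::uint16_t id) {
        auto byIso = isoToId_.find(iso);
        if (byIso != isoToId_.end() && byIso->second != id)
            throw std::invalid_argument("ISO code already bound to another ID");
        auto byId = idToIso_.find(id);
        if (byId != idToIso_.end() && byId->second != iso)
            throw std::invalid_argument("ID already bound to another ISO code");
        isoToId_[iso] = id;
        idToIso_[id] = iso;
    }

    // Return the ID for an ISO code; unknown codes get the next free
    // temporary ID, which then stays fixed for this registry's lifetime.
    std::uint16_t idFor(const std::string& iso) {
        auto known = isoToId_.find(iso);
        if (known != isoToId_.end())
            return known->second;
        // Skip values already taken by known pairs.
        while (next_ <= last_ &&
               idToIso_.count(static_cast<std::uint16_t>(next_)))
            ++next_;
        if (next_ > last_)
            throw std::runtime_error("no free temporary LangID left");
        const auto id = static_cast<std::uint16_t>(next_++);
        isoToId_[iso] = id;
        idToIso_[id] = iso;
        assert(isoToId_.size() == idToIso_.size());  // maps stay in sync
        return id;
    }

    // Placeholder UI string generated from the ISO code; only meant for
    // entries that have no localized name. Throws std::out_of_range for
    // an unregistered ID.
    std::string placeholderName(std::uint16_t id) const {
        return "Unknown (" + idToIso_.at(id) + ")";
    }

private:
    std::uint32_t next_;  // wider than 16 bits so ++ cannot wrap
    std::uint32_t last_;
    std::map<std::string, std::uint16_t> isoToId_;
    std::map<std::uint16_t, std::string> idToIso_;
};
```

With a range of 0x7F0 to 0x7F1, idFor("ay-PE") yields 0x7F0 and a later
addKnown("ay-PE", 0x123) throws. Note that this only guards one running
instance. It does nothing about keeping assignments identical between
different users, which is the harder part.
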
Users would not be allowed to assign ISO/ID pairs with different values,
which in the case of automatically assigned LangIDs is more a problem
for MS file formats than for the ISO codes in ODF files, which at least
have a defined value. Another obstacle comes with updates of built-in
known ISO/ID pairs that clash with user assigned entries. For MS file
formats two different IDs would then be in the wild; likewise, if the
user made an error when assigning ISO codes, two codes would designate
the same language.

Currently the proper way to handle this is to create an RFE issue for
addition of the new language; usually it will be available with the next
minor release. Exceptions are dialects and variants that can't be
supported yet. See
http://wiki.services.openoffice.org/wiki/Adding_a_new_language_or_locale

For the future I could imagine some web service to request new ISO/ID
assignments and update new known entries.

> - However by converting MS Office or RTF to OpenOffice and vice versa
>   you'll get bad language codes. (Maybe the export filters could be
>   adapted to write "no language" in the case of conversion.)

ISO 639 code 'zxx' should already be written for language "None". But
this is no solution to the problem.

  Eike

-- 
 OOo/SO Calc core developer. Number formatter stricken i18n transpositionizer.
 SunSign 0x87F8D412 : 2F58 5236 DB02 F335 8304 7D6C 65C9 F9B5 87F8 D412
 OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
 Please don't send personal mail to the e...@sun.com account, which I use for
 mailing lists only and don't read from outside Sun. Use er...@sun.com Thanks.