Hi Martin,

On Monday, 2009-08-17 22:06:07 +0200, Martin Rosenau wrote:

> Many developers are adding new languages to OpenOffice.

Who does so? Addition of new languages should be coordinated, i.e. by
me, to prevent possible mis-assignments, especially if languages/locales
are involved that have no LCID/LangID assigned by Microsoft. In that case
we have to assign one from the user space, following a certain schema,
and identical values must not be assigned to different locales. This is
important in case documents are saved in MS file formats.

> Last week I submitted bug #104249: By opening an .odt file containing
> such a language with an "unpatched" OpenOffice version language
> information gets lost.

See my comment there.


> To solve this problem I suggest the following behavior:
>
> I didn't have a look at the code, yet, but I assume it looks like this:
>
> const struct _languages {
>   const char *iso_code; /* ISO language code */
>   int microsoft_code; /* Microsoft Office language code */
>   const char *name; /* User visible name */
> } languages[]={
>   ...
>   {"de-DE",0x407,"Deutsch"},
>   ...
> };

Well, theoretically yes, but practically no. There exists a mapping
between ISO codes and LangIDs, yes, but the UI visible names are mapped
in from the localized string resources. See
i18npool/source/isolang/isolang.cxx and svtools/source/misc/langtab.cxx

> I propose to use a variable-size array that is initialized from a  
> fixed-size array containing the "built-in" languages. Additional  
> languages may be loaded from a configuration file.
>
> This makes adding new languages that are used only by few people easier  
> because only the configuration file must be modified (which may be done  
> by a macro or a GUI).

How should these people choose the proper MS-LangID?

> If you open an .odt file that contains unknown languages code (e.g.
> "ay-PE" and "quz-PE" that I used for testing) an invalid Microsoft
> language ID (e.g. 0x7FF, 0x7FE, ...) should be chosen and
> the following entries should be added temporarily (until OpenOffice is  
> closed) to the list:
>    {"ay-PE",0x7FF,"Unknown (ay-PE)"}
>    {"quz-PE",0x7FE,"Unknown (quz-PE)"}

This does not work, because the core needs unique LangIDs to work with.
We could temporarily assign unused IDs of the user space though, and
generate the UI string for the language list from the ISO codes. Yet
that would not allow permanently storing the ID in an MS file format
document. The assigned values would have to be stored in the
configuration as well, must not be changed by the user, and would have to
be exchanged with other users who want to open such files. I could
imagine some extension though that would handle the configuration part.
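
A minimal sketch of such a temporary assignment, for illustration only.
It hands out session-local LangIDs for unknown ISO codes and builds the
UI string from the ISO code. The range 0xF000..0xF0FF, the sentinel
kNoId and the class TempLangRegistry are hypothetical; they are not the
actual user-space schema and not existing OOo code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical block of "user space" LangIDs used only for this sketch.
constexpr std::uint16_t kTempFirst = 0xF000;
constexpr std::uint16_t kTempLast = 0xF0FF;
// Sentinel of this sketch meaning "no temporary ID left".
constexpr std::uint16_t kNoId = 0;

// Session-local registry: IDs are valid only until the process exits
// and must never be written to an MS file format document.
class TempLangRegistry {
public:
    // Returns the temporary LangID for an ISO code, allocating a new
    // unique one on first use. Returns kNoId when the range is used up.
    std::uint16_t idFor(const std::string& iso) {
        assert(!iso.empty());
        auto it = byIso_.find(iso);
        if (it != byIso_.end())
            return it->second;
        if (next_ > kTempLast)
            return kNoId;
        const std::uint16_t id = static_cast<std::uint16_t>(next_++);
        byIso_[iso] = id;
        byId_[id] = iso;
        return id;
    }

    // UI string for the language list, generated from the ISO code.
    std::string displayName(std::uint16_t id) const {
        auto it = byId_.find(id);
        if (it == byId_.end())
            return "Unknown";
        return "Unknown (" + it->second + ")";
    }

private:
    std::uint32_t next_ = kTempFirst;  // wider type so the range check cannot wrap
    std::map<std::string, std::uint16_t> byIso_;
    std::map<std::uint16_t, std::string> byId_;
};
```

With this, asking twice for "ay-PE" yields the same ID within one
session, and "quz-PE" gets a different one. Nothing here addresses
persistence or exchange between users, which is the hard part.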

> By opening an MS Office document containing an unknown ID (e.g. 0xABCD)
> an invalid ISO code (e.g. "x01-XX") should be chosen and following entry  
> should be created:
>    {"x01-XX",0xABCD,"Unknown (0xABCD)"}

You would not be able to save the document as a proper ODF file then,
because in ODF only valid ISO codes are allowed, plus, in future as of
ODF 1.2, valid RFC 4646 language tags.

Anyway, this may be handled semi-automatically, but the user would have
to provide the correct ISO codes.

> This should also be done when the configuration file contains only the  
> ISO code or only the MS code for one language.

As said, automatically it would work only as long as the document isn't
saved in a different format.

> - Then language information is not lost when opening a document  
> containing unknown languages,
> - in the "Format -> Character -> Language" field the language is  
> displayed ("unknown xxx" instead of "none"),
> - any user can add a rare minority language as long as he knows the ISO  
> code by changing the configuration file.

The problem with this approach is that entries should not be changed
after documents using the data were written, and the configuration must
be kept in sync between different users. Users must not assign ISO/ID
pairs with different values. With automatically assigned LangIDs that is
more of a problem for MS file formats than for ODF files, because the
ISO codes at least have a defined value.

Another obstacle comes with updates of built-in known ISO/ID pairs that
clash with user-assigned entries. For MS file formats two different IDs
would then be in the wild. Likewise, if the user made an error when
assigning ISO codes, two codes would designate the same language.
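
To illustrate the two kinds of clash, here is a small sketch that
compares a built-in table against user-assigned entries. LangEntry,
Clash, Conflict and findClashes are hypothetical names for this example
and do not exist in isolang.cxx; the user-side IDs in the usage note are
made up.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One ISO code / LangID pair, as in the mapping discussed above.
struct LangEntry {
    std::string iso;
    std::uint16_t id;
};

enum class Clash {
    SameIsoDifferentId,  // two IDs for one language end up in MS format files
    SameIdDifferentIso   // one ID would designate two different locales
};

struct Conflict {
    LangEntry builtin;
    LangEntry user;
    Clash kind;
};

// Reports every user entry that contradicts a built-in entry.
// Identical pairs are not conflicts and are skipped.
std::vector<Conflict> findClashes(const std::vector<LangEntry>& builtin,
                                  const std::vector<LangEntry>& user) {
    std::vector<Conflict> out;
    for (const auto& u : user) {
        for (const auto& b : builtin) {
            if (u.iso == b.iso && u.id != b.id)
                out.push_back({b, u, Clash::SameIsoDifferentId});
            else if (u.id == b.id && u.iso != b.iso)
                out.push_back({b, u, Clash::SameIdDifferentIso});
        }
    }
    return out;
}
```

For example, with a built-in {"de-DE", 0x0407}, a user entry
{"de-DE", 0xF000} is reported as SameIsoDifferentId and a user entry
{"ay-PE", 0x0407} as SameIdDifferentIso. Detecting the clash is the easy
part; documents already written with the user-assigned value remain a
problem.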

Currently the proper way to handle this is to create an RFE issue for
addition of the new language; usually it will be available with the next
minor release. Exceptions are dialects and variants that can't be
supported yet. See
http://wiki.services.openoffice.org/wiki/Adding_a_new_language_or_locale

For the future I could imagine some web service to request new
ISO/ID assignments and update new known entries.

> - However by converting MS Office or RTF to OpenOffice and vice versa  
> you'll get bad language codes. (Maybe the export filters could be  
> adapted to write "no language" in the case of conversion.)

ISO 639 code 'zxx' should already be written for language "None".
But this is no solution to the problem.

  Eike

-- 
 OOo/SO Calc core developer. Number formatter stricken i18n transpositionizer.
 SunSign   0x87F8D412 : 2F58 5236 DB02 F335 8304  7D6C 65C9 F9B5 87F8 D412
 OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
 Please don't send personal mail to the e...@sun.com account, which I use for
 mailing lists only and don't read from outside Sun. Use er...@sun.com Thanks.
