Hi Stephan,

On Friday, 2010-04-23 12:05:21 +0200, Stephan Bergmann wrote:

> On 04/22/10 17:03, Eike Rathke wrote:
>> As a quick solution I'd come up with:
>
> This scares me somewhat. ;) Pragmatic implications that apparently
> lead to this proposal aside, why reserve a Variant subfield for "Unix
> locale string" (whatever that is; e.g., SUSv3 is extremely vague on how
> exactly those strings would look; appears what is meant here is how
> glibc uses those strings), but not for any other system (other OSs,
> Java, ...) that has its own locale identification concept?

Apparently those are needed by the osl process locale stuff, or why
would that want to append the Variant field as it is currently used?
I also don't know in which cases an rtl_Locale passed down would
actually have content in the Variant field, other than when it was
parsed in osl context. If we guarantee that a cssl.Locale never gets
down to rtl_Locale to be used in _compose_locale(), then I'm fine with
dropping that Unix variant stuff from cssl.Locale.Variant.

> Also, com.sun.star.Locale and rtl_Locale both require Language to be
> two-letter ISO-639 codes.

rtl_Locale currently technically does (to be changed, i111019);
cssl.Locale does so only in the semantics given in its IDL description.
There's no technical reason not to allow RFC 3066 content. Btw, we have
been violating that IDL description's semantics for years, as we allow
ISO 639-2/3 three-letter codes. Time to update the IDL comment.

> Why, violate this by allowing it to be
> "x-bcp47", instead of violating it by allowing it to contain full BCP 47
> <language> syntax (which would be more direct)?

Using Language for anything other than something that looks like an
RFC 3066 Primary-subtag calls for trouble. The original Java Locale was
designed after RFC 1766, and "x-bcp47" follows RFC 1766. However,
instead of "x-bcp47" we could also use, for example, "qbp", which is an
ISO 639-2/3 code reserved for local use, for the shortest and maybe
cleanest possible form.

> Statements like "using such a Locale with Java might lead to
> unpredictable results"

I added that because I don't know. Java is picky in many areas. Having
a BCP47 string in the Language field will lead to trouble with a 90%
chance, I guess.

> and "there is one combination of language of 'az'
> and country of 'cyrillic' which has escaped out into the file format as
> fo:language='az' fo:country='cyrillic', so az-cyrillic would have to be
> accepted in addition though it's not valid BCP-47"

Accepted while reading documents, which doesn't mean we wouldn't remap
it to a sane value.

> for me show the crux of the problem: Fixing the semantics of
> com.sun.star.Locale and rtl_Locale (and fixing them is obviously
> necessary), you obviously cannot directly use them in areas that use
> other concepts (ODF, Java, glibc, ...). There need to be translation
> steps.

Which for ODF and ICU I'd address with a Bcp47 class. For glibc we may
get away with the old rtl_Locale approach (enhanced to allow 3 character
ISO 639 codes) if we get to know the usages. But there is no translation
step to Java or any other language or existing external extension that
uses UNO. That's the only reason I try to stay as compatible as
possible. Otherwise I'd say to completely remove cssl.Locale from the
API and use a cssu.Locale that follows the OpenJDK design or some such.

> The most sane
> decision to fix the semantics of com.sun.star.Locale and rtl_Locale
> indeed appears to be to specify that they encode BCP 47 <Language-Tag>s.
> If this makes certain translation steps lossy, then maybe its
> acceptable that they are lossy?

I can't say what impact a lossy _compose_locale() would have. For
services that do not expect the new usage it would be lossy anyway: even
if Language and Country are preserved, the script and variant would not
be considered.

> Trying to make them non-lossy by adding
> to the semantics of com.sun.star.Locale and rtl_Locale in ad-hoc ways
> (see the "Unix locale string" Variant subfield) IMO leads down a
> slippery slope.

I'm fine with specifying that Language contains BCP47 up to and
including the script subtag, Country contains the region, and Variant
contains the rest. Would ease things a lot.

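To make the Language/Country/Variant split concrete, here is a rough
Python sketch of how a BCP 47 tag might be divided under that
convention. The function name split_bcp47 is hypothetical and not part
of any OOo API, and the sketch deliberately ignores extlang subtags,
grandfathered tags and private-use tags; anything it does not recognize
simply ends up in the variant part.

```python
import re

def split_bcp47(tag):
    """Split a BCP 47 tag into (language, country, variant).

    Hypothetical helper for illustration only: language keeps the
    primary subtag plus an optional script subtag, country holds the
    region, and variant holds whatever is left.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    rest = subtags[1:]

    # Script subtag: exactly four letters, directly after the language.
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        language += "-" + rest.pop(0).title()

    # Region subtag: two letters or three digits.
    country = ""
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        country = rest.pop(0).upper()

    # Everything else (variants, extensions, ...) goes into variant.
    variant = "-".join(rest)
    return language, country, variant
```

For example, split_bcp47("az-Cyrl-AZ") keeps the script with the
language and puts "AZ" into the country part, while a tag such as
"ca-ES-valencia" leaves "valencia" for the variant part.
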
That split assumes you tell me we drop interoperability with Java and
existing extensions for the cases where it matters, if it matters at
all; maybe we'll never know. Or we drop cssl.Locale and start fresh, my
favorite.

  Eike

--
 OOo/SO Calc core developer. Number formatter stricken i18n transpositionizer.
 SunSign 0x87F8D412 : 2F58 5236 DB02 F335 8304 7D6C 65C9 F9B5 87F8 D412
 OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
 Please don't send personal mail to the [email protected] account, which I use for
 mailing lists only and don't read from outside Sun. Use [email protected] Thanks.
