On 04/22/10 17:03, Eike Rathke wrote:
As a quick solution I'd come up with:

* Divide Variant into three subfields, separated by ':' colon.
* First subfield is either a 4 letter script code followed by '-', or only '-' to indicate absence of script.
* This enables the extraction with rVariant.getToken(0, '-', nIndex).
* Second subfield is a full BCP47 string in case Language is "x-bcp47" or a BCP47 variant is involved, otherwise empty.
* Third subfield is the _full_ Unix locale string, or empty.
* _compose_locale() could extract this with rVariant.getToken(2, ':', nIndex)
* Variant can be empty.
* Extraction of script code still delivers a null string.
* _compose_locale() in this case will have to concatenate Language-Country as it currently does.
* If only a BCP47 variant is involved, with or without script, we could add the variant to the first subfield, having '-' [script] '-' [variant] for easier extraction with rVariant.getToken(1, '-', nIndex).
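To make the quoted layout concrete, here is a minimal sketch of how the three subfields might be pulled apart. It uses std::string rather than the OUString getToken calls mentioned in the quote, and the names VariantFields and parseVariant are hypothetical, not part of any existing API. It covers only the basic three-subfield form, not the optional '-' [script] '-' [variant] extension.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical holder for the three subfields of the proposed Variant layout.
struct VariantFields {
    std::string script;      // 4-letter script code, empty if absent
    std::string bcp47;       // full BCP 47 string, or empty
    std::string unixLocale;  // full Unix locale string, or empty
};

// Splits "<script>-:<bcp47>:<unix locale>" at the first two ':' characters.
// An empty Variant yields three empty fields.
VariantFields parseVariant(const std::string& variant) {
    std::array<std::string, 3> parts;
    std::size_t start = 0;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        // The last subfield takes the remainder of the string.
        const std::size_t end = (i + 1 < parts.size())
                                    ? variant.find(':', start)
                                    : std::string::npos;
        if (end == std::string::npos) {
            parts[i] = variant.substr(start);
            break;
        }
        parts[i] = variant.substr(start, end - start);
        start = end + 1;
    }

    VariantFields fields;
    // The script code is whatever precedes the first '-' of the first subfield;
    // a lone "-" therefore gives an empty script.
    fields.script = parts[0].substr(0, parts[0].find('-'));
    fields.bcp47 = parts[1];
    fields.unixLocale = parts[2];
    return fields;
}
```

For example, parseVariant("Cyrl-:sr-Cyrl-RS:sr_RS.UTF-8@cyrillic") separates the script code, the BCP 47 string, and the glibc-style locale string, while parseVariant("") leaves all three fields empty.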
This scares me somewhat. ;) Leaving aside the pragmatic considerations that apparently led to this proposal, why reserve a Variant subfield for "Unix locale string" (whatever that is; e.g., SUSv3 is extremely vague on how exactly those strings would look; it appears that what is meant here is how glibc uses those strings), but not for any other system (other OSs, Java, ...) that has its own locale identification concept?
Also, com.sun.star.Locale and rtl_Locale both require Language to be a two-letter ISO 639 code. Why violate this by allowing it to be "x-bcp47", instead of violating it by allowing it to contain full BCP 47 <language> syntax (which would be more direct)?
Statements like "using such a Locale with Java might lead to unpredictable results" and "there is one combination of language of 'az' and country of 'cyrillic' which has escaped out into the file format as fo:language='az' fo:country='cyrillic', so az-cyrillic would have to be accepted in addition though it's not valid BCP-47" for me show the crux of the problem: Once the semantics of com.sun.star.Locale and rtl_Locale are fixed (and fixing them is obviously necessary), you cannot directly use them in areas that use other concepts (ODF, Java, glibc, ...). There need to be translation steps. The most sane way to fix the semantics of com.sun.star.Locale and rtl_Locale indeed appears to be to specify that they encode BCP 47 <Language-Tag>s. If this makes certain translation steps lossy, then maybe it's acceptable that they are lossy? Trying to make them non-lossy by adding to the semantics of com.sun.star.Locale and rtl_Locale in ad-hoc ways (see the "Unix locale string" Variant subfield) IMO leads down a slippery slope.
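As a rough illustration of such a translation step, the sketch below turns a simple BCP 47 tag into a glibc-style "language_TERRITORY" name and reports whether anything had to be dropped. The names PosixLocale and toPosixLocale are hypothetical, and the function assumes a well-formed tag. It keeps only the primary language subtag and the first two-letter subtag (treated as the region) and discards every other subtag, so it is deliberately lossy and not a complete BCP 47 parser.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the translated name plus a flag saying whether
// any subtag of the original tag was dropped on the way.
struct PosixLocale {
    std::string name;  // e.g. "sr_RS"
    bool lossy;        // true if information from the tag was lost
};

// Translates a well-formed, simple BCP 47 tag such as "sr-Cyrl-RS".
// Assumption: the first two-letter alphabetic subtag after the language is
// the region; everything else (script, variants, extensions) is dropped.
PosixLocale toPosixLocale(const std::string& tag) {
    std::vector<std::string> subtags;
    std::istringstream in(tag);
    for (std::string s; std::getline(in, s, '-');) {
        subtags.push_back(s);
    }

    PosixLocale out{"", false};
    if (subtags.empty()) {
        return out;
    }
    out.name = subtags[0];

    for (std::size_t i = 1; i < subtags.size(); ++i) {
        const std::string& s = subtags[i];
        const bool looksLikeRegion =
            s.size() == 2 &&
            std::isalpha(static_cast<unsigned char>(s[0])) &&
            std::isalpha(static_cast<unsigned char>(s[1]));
        if (looksLikeRegion && out.name.find('_') == std::string::npos) {
            out.name += '_';
            for (char c : s) {
                out.name += static_cast<char>(
                    std::toupper(static_cast<unsigned char>(c)));
            }
        } else {
            out.lossy = true;  // subtag has no place in the target format
        }
    }
    return out;
}
```

With this sketch, "de-DE" maps to "de_DE" without loss, while "sr-Cyrl-RS" maps to "sr_RS" with the lossy flag set because the script subtag is dropped. The caller can then decide whether that loss is acceptable, instead of the extra information being carried inside the Locale itself.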
-Stephan
