On Thu, 2010-04-22 at 17:03 +0200, Eike Rathke wrote:
> Actually the aImplIsoNoneStdLangEntries can never be the result of
> a conversion as valid ISO code combinations exist for all LangIDs in
> aImplIsoLangEntries. The corresponding code in both
> MsLangId::convertLanguageToIsoNames() methods is moot. I don't recall if
> it was ever used that way since we write XML, but I doubt it.
>
> Accepting makes of course sense, but a conversion will always result
> in the corresponding ISO codes.

Set some text to Azeri (Cyrillic) in Writer with 3.2, save it as .odt,
and the result is

<style:text-properties fo:language="az" fo:country="cyrillic"/>

> > The need for conversion from a Unix locale string in rtl to a
> > rtl_Locale and back again is what bothers me, e.g. de_BE.Euctw
> > which currently will give
> > rtl_Locale of...
> > Language = de
> > Country = BE
> > Variant = Euctw
> >
> > If a future iso-15924 adds Euctw as a script code, then there's a
> > problem.
>
> They should not, ISO 15924 alpha is defined to be a 4 letter code.
> Anyway, a script code in the BCP47 context would have to be registered
> with IANA, and they certainly (hopefully..) would reject a non-4-letter
> code.

Whoops, right, I used an invalid 5-letter example. Among the 4-letter
encoding names that could plausibly show up in a Unix locale,
LANG=ja_JP.Sjis makes a better example.

> > The other consideration is that if you enforce a script code as the
> > first tag in a Variant, it becomes trivial to pull out the script tag
> > from a Variant string with a two liner without any other processing,
> > e.g.
> >
> > sal_Int32 nIndex = 0;
> > rtl::OUString aScriptSubtag = rVariant.getToken(0, '-', nIndex);
>
> That's indeed neat. But again, see my previous mail, not all BCP47 tags
> would fulfill this requirement if they contained extlang subtags.

I had sort of imagined that something like zh-cmn-Latn-CN would appear as

Language = zh-cmn
Country = CN
Variant = Latn

> As a quick solution I'd come up with:
>
> * Divide Variant into three subfields, separated by ':' colon.
> * First subfield is either a 4 letter script code followed by '-', or
>   only '-' to indicate absence of script.
> * This enables the extraction with rVariant.getToken(0, '-', nIndex).
> * Second subfield is a full BCP47 string in case Language is "x-bcp47"
>   or a BCP47 variant is involved, otherwise empty.
> * Third subfield is the _full_ Unix locale string, or empty.
> * _compose_locale() could extract this with
>   rVariant.getToken(2, ':', nIndex)
> * Variant can be empty.
> * Extraction of script code still delivers a null string.
> * _compose_locale() in this case will have to concatenate
>   Language-Country as it currently does.

Sounds good.

> * If only a BCP47 variant is involved, with or without script, we could
>   add the variant to the first subfield, having
>   '-' [script] '-' [variant]
>   for easier extraction with rVariant.getToken(1, '-', nIndex).

Sounds like gilding the lily. Do we really need to extract that easily?
And anyway, can't there be multiple BCP47 variant subtags, as opposed to
only one script subtag?

> And, maybe, using such a Locale with Java might lead to unpredictable
> results, I don't know.

It would definitely help if anyone knew what on earth the Java Locale
Variant field ever gets used for.

C.
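As a rough illustration of the Variant layout Eike proposes in the quoted list (a script subfield ending in '-', a full BCP47 subfield, and a full Unix locale subfield, joined by ':'), here is a minimal sketch using std::string. The getToken() helper is a hypothetical stand-in that assumes rtl::OUString::getToken(nToken, cTok, rIndex) semantics; composeVariant(), scriptOf() and unixLocaleOf() are illustrative names only, not existing rtl or sal API.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for rtl::OUString::getToken(nToken, cTok, rIndex),
// written against std::string. Starting at rIndex, it skips nToken tokens
// separated by cTok and returns the next token. rIndex is advanced to just
// past that token's delimiter, or set to -1 when no delimiter follows.
std::string getToken(const std::string& s, int nToken, char cTok, int& rIndex)
{
    if (rIndex < 0 || static_cast<std::string::size_type>(rIndex) > s.size())
    {
        rIndex = -1;
        return std::string();
    }
    std::string::size_type start = static_cast<std::string::size_type>(rIndex);
    for (int i = 0; i < nToken; ++i)
    {
        std::string::size_type pos = s.find(cTok, start);
        if (pos == std::string::npos)
        {
            rIndex = -1;          // fewer tokens than requested
            return std::string();
        }
        start = pos + 1;
    }
    std::string::size_type end = s.find(cTok, start);
    if (end == std::string::npos)
    {
        rIndex = -1;              // this was the last token
        return s.substr(start);
    }
    rIndex = static_cast<int>(end + 1);
    return s.substr(start, end - start);
}

// Hypothetical encoder for the proposed layout:
//   [script] '-' ':' [full BCP47 tag] ':' [full Unix locale]
// Assumption of this sketch: all-empty input yields an empty Variant.
std::string composeVariant(const std::string& script,
                           const std::string& bcp47,
                           const std::string& unixLocale)
{
    // The proposal only allows no script or a 4-letter ISO 15924 code.
    assert(script.empty() || script.size() == 4);
    if (script.empty() && bcp47.empty() && unixLocale.empty())
        return std::string();
    return script + "-:" + bcp47 + ":" + unixLocale;
}

// First '-'-delimited token: the script code, or "" if there is none.
std::string scriptOf(const std::string& variant)
{
    int nIndex = 0;
    return getToken(variant, 0, '-', nIndex);
}

// Third ':'-delimited token: the full Unix locale string, if any.
std::string unixLocaleOf(const std::string& variant)
{
    int nIndex = 0;
    return getToken(variant, 2, ':', nIndex);
}
```

Under this layout, ja_JP.Sjis would be stored as "-::ja_JP.Sjis", so scriptOf() yields an empty string, and an x-bcp47 tag such as zh-cmn-Latn-CN would carry "Latn-:zh-cmn-Latn-CN:". By contrast, today's plain Variant "Sjis" would be read back by the same getToken(0, '-', nIndex) call as if it were a script code.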