Hi Mikhail
I totally agree with Eike: UTF-8 is the way to go.
As far as I know, most or all other platforms (e.g. Solaris, glibc) use
UTF-8 for all locales.
A good example of the advantage of this is the Euro switchover.
ISO8859-1 doesn't have the Euro symbol, so any
platform using this charset had to change to one which did.
UTF-8 future-proofs us; it also makes maintenance easier, as Eike said.
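
To make the Euro point concrete, here is a small Python sketch (not part of
OOo; the helper name can_encode is made up for illustration) that checks
which charsets can represent the Euro sign at all:

```python
# Check which charsets can represent the Euro sign (U+20AC).
# can_encode is a hypothetical helper, written only for this illustration.
EURO = "\u20ac"

def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

charsets = ("iso-8859-1", "iso-8859-15", "utf-8")
results = {cs: can_encode(EURO, cs) for cs in charsets}

for cs in charsets:
    print(cs, "can encode the Euro sign:", results[cs])
```

ISO-8859-15 was the 8-bit replacement that added the Euro sign; a file already
in UTF-8 needed no charset change for it.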
Peter
Eike Rathke wrote:
Hi Mikhail,
On Wed, Feb 02, 2005 at 12:26:15 -0500, Mikhail Teterin wrote:
[Dear developers! This is my conversation with Eike regarding the encoding
used for the translation files in OOo.
To clarify: this was about i18npool's *.xml locale data files, not
resource files.
I'm advocating the use of 8-bit native
charsets, while Eike insists on using UTF-8 for all. Eike suggested I take
this to your list.]
So, the same for computers, but harder for people. Sounds like my way is
better. UTF-8 only makes sense, when charsets need to be mixed -- not in
this case.
Changing encodings would also make use of ref=... references harder,
one would always have to check that encodings match, and changing
encoding of one file might affect others, which is not a desirable
situation.
Sorry, I don't understand this. Can you explain?
The locale data files use a ref=... mechanism to refer to data of other
locales; for example, gl_ES.xml contains
<LC_CTYPE ref="es_ES"/>
<LC_COLLATION ref="en_US"/>
<LC_SEARCH ref="en_US"/>
<LC_INDEX ref="es_ES"/>
<LC_CURRENCY ref="es_ES"/>
<LC_TRANSLITERATION ref="en_US"/>
<LC_NumberingLevel ref="en_US"/>
<LC_OutLineNumberingLevel ref="en_US"/>
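
As a rough illustration of what these references imply for a maintainer, the
following Python sketch lists which other locale files a given file depends
on. The <Locale> wrapper element, the shortened sample and the function name
referenced_locales are assumptions made for this example, not taken from the
real i18npool tooling.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for gl_ES.xml; the <Locale> root
# element is an assumption made so that the snippet is well-formed XML.
SAMPLE = """<Locale>
  <LC_CTYPE ref="es_ES"/>
  <LC_COLLATION ref="en_US"/>
  <LC_INDEX ref="es_ES"/>
</Locale>"""

def referenced_locales(xml_text):
    """Map each section name to the locale its ref attribute points at."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.attrib["ref"]
            for child in root if "ref" in child.attrib}

refs = referenced_locales(SAMPLE)
# Every locale named here is a file whose content this file relies on.
depends_on = sorted(set(refs.values()))
print(depends_on)
```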
Now if gl_ES.xml and en_US.xml or es_ES.xml used different encodings
this might not work anymore if also replaceTo=... was used (it isn't
in the case of gl_ES) and the maintainer copied an encoded
replaceFrom=... value from the referred file without noticing it was
a different encoding. This may sound hypothetical but it is possible and
can be prevented by sticking to one encoding only.
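
The copy-and-paste hazard described above can be sketched in Python. The
value "ñ" and the choice of ISO-8859-1 and KOI8-R are made up for the
illustration; the point is only that identical bytes read under another
charset no longer mean the same character.

```python
# A hypothetical replaceFrom value as typed by the maintainer of a
# file saved in ISO-8859-1.
value = "\u00f1"  # LATIN SMALL LETTER N WITH TILDE

# The bytes as they sit on disk in the ISO-8859-1 file.
raw_latin1 = value.encode("iso-8859-1")

# The same bytes copied into a file that is read as KOI8-R now stand
# for a different (Cyrillic) character, with no error raised.
misread = raw_latin1.decode("koi8-r")

# Copied into a file read as UTF-8, the lone byte is not even valid.
try:
    raw_latin1.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# With one encoding everywhere, copying bytes between files is lossless.
roundtrip = value.encode("utf-8").decode("utf-8")

print(repr(value), repr(misread), valid_utf8, roundtrip == value)
```

The silent KOI8-R case is the dangerous one: nothing fails, the data is
simply wrong.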
The uniformity here is hardly advantageous -- these files are, by their
very nature, maintained by different people,
which in itself, viewed in context of ref=... uses, almost forbids any
other encoding than UTF-8
Why? Western Europeans will use iso8859-1, Eastern -- some KOI8 derivative,
etc. They will almost never need to cooperate -- within one file
Yes, almost never. Which makes it a perfect candidate for always being
prepared for it.
Installation of the GNU recode package should always be possible, even
on the oldest machine.
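
For readers without recode at hand, the conversion it performs can be
approximated in a few lines of Python. This is only a sketch of the idea, not
a replacement for recode; the function name to_utf8 and the sample text are
invented for the example.

```python
def to_utf8(data, source_charset):
    """Re-encode bytes from source_charset into UTF-8 (hypothetical helper)."""
    return data.decode(source_charset).encode("utf-8")

# Sample text as it would be stored in a KOI8-R encoded file.
text = "\u041f\u0440\u0438\u0432\u0435\u0442"  # Russian "Privet"
koi8_bytes = text.encode("koi8-r")

utf8_bytes = to_utf8(koi8_bytes, "koi8-r")

# Same text, different byte representation: one byte per letter in
# KOI8-R, two bytes per Cyrillic letter in UTF-8.
print(len(koi8_bytes), len(utf8_bytes))
```

The caller has to know the source charset; converting with the wrong one
succeeds silently for most 8-bit charsets and produces wrong text.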
Everything is possible, of course. I maintain that gratuitous use of UTF is
inelegant -- if the file format allows sticking to 8-bit encodings, using a
multibyte one is wrong.
Now please take a look at my situation as a maintainer of all these
files: if I had to switch back and forth between encodings for
each and every file I edit, it would soon annoy me.
If I can not `vi' it, it ain't a text-file :-)
Use vim, that handles utf-8 ;-)
Eike
P.S.: Please consider subscribing to the mailing lists you're posting to.
By doing so you won't miss replies that are directed to the list only.
Please reply only to the list, not to my personal account. Thanks.
--
Peter Nugent,
Software Engineer,
Sun Microsystems Ireland Ltd,
Hamilton House,
East Point Business Park,
Dublin 3,
Ireland.
Tel +353.1.8199522
Email: [EMAIL PROTECTED]