Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

Thorsten Glaser Wed, 08 Apr 2009 10:54:38 -0700

Giacomo A. Catenazzi dixit:

> The locale C is already a UTF-8 compatible locale.


It is UTF-8 transparent, but that is both its pro and its con.
It does not tell the system that UTF-8 encoding is to be used;
it basically says the encoding is none/unknown.
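To see the difference on a glibc system, one can ask the C library which codeset a locale announces. The following is a minimal sketch, assuming a Unix-like system where Python's `locale.nl_langinfo` is available; the helper name `codeset_for` is hypothetical. Under glibc the C locale typically reports `ANSI_X3.4-1968` (i.e. ASCII) rather than `UTF-8`, even though bytes with the high bit set pass through it untouched.

```python
import locale
from typing import Optional


def codeset_for(name: str) -> Optional[str]:
    """Return nl_langinfo(CODESET) with LC_CTYPE set to `name`.

    Returns None if the C library rejects the locale (e.g. it is not
    installed). The previous LC_CTYPE setting is restored afterwards.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return None  # a failed setlocale leaves the old setting in place
    try:
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for name in ("C", "POSIX", "C.UTF-8"):
        print(f"{name!r:10} -> {codeset_for(name)!r}")
```

On a system without a C.UTF-8 locale the last line prints `None`, which is exactly the gap this bug report is about.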


> Why build need to depend to a locale?
[...]
> For testing? So why not test various locales (UTF-8, but also other non
> ascii based encodings)

> to go to the point: what is the problem in mksh?
> At which level it fails?
[...]
> But if mksh don't work on "C", I'm very worried.
> The problems are on inputs or on outputs?

I think you misunderstand the mksh part of the problem.

mksh has two modes: a legacy mode, in which it makes no
assumptions about charsets or encodings and is 8-bit clean and
mostly 8-bit transparent, save for a few (mostly past) bugs and
implementation shortcomings; and a Unicode mode, in which it assumes
its input is UTF-8 (although, with ^V, you can still enter non-UTF-8
sequences, and tab-complete filenames in legacy encodings as well).
The Unicode mode is enabled with "mksh -U" or "set -U".
However, mksh has a feature which automatically enables Unicode
mode if
- the current CODESET is UTF-8 (or, in some cases, the locale name
  ends in .utf8, .UTF-8 or something similar), or
- the input begins with a UTF-8 BOM.
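The two conditions can be restated as a small predicate. This is a sketch under assumptions, not mksh's actual code (mksh is written in C); the function name and the exact set of accepted spellings are hypothetical and only approximate the "or something similar" cases.

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def _names_utf8(s: str) -> bool:
    """True for spellings such as 'UTF-8', 'utf8' or 'UTF_8'."""
    return s.replace("-", "").replace("_", "").lower() == "utf8"


def wants_unicode_mode(codeset: Optional[str],
                       locale_name: Optional[str],
                       first_bytes: bytes) -> bool:
    """Hypothetical restatement of the auto-detection rules above."""
    # Rule 1: the C library reports a UTF-8 CODESET.
    if codeset and _names_utf8(codeset):
        return True
    # Rule 1, fallback: the locale *name* carries a UTF-8 suffix,
    # e.g. 'en_GB.UTF-8' or 'de_DE.utf8@euro' (modifier stripped first).
    if locale_name:
        base = locale_name.split("@", 1)[0]
        if "." in base and _names_utf8(base.rsplit(".", 1)[1]):
            return True
    # Rule 2: the input starts with a UTF-8 byte order mark.
    return first_bytes.startswith(UTF8_BOM)


if __name__ == "__main__":
    print(wants_unicode_mode("ANSI_X3.4-1968", "C", b"echo hi\n"))             # False
    print(wants_unicode_mode("ANSI_X3.4-1968", "C", UTF8_BOM + b"echo hi\n"))  # True
```

Note that under a plain C locale only the BOM rule can fire, which is why a test of the CODESET rule needs some other locale to be installed.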

The regression test suite merely checks for this feature. To do
so, it needs a way to set the CODESET of the mksh process under test
to UTF-8, which is only possible by setting a non-C/POSIX locale.
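Without a guaranteed UTF-8 locale, a test harness can only probe for one and skip the test when none is installed. A minimal sketch of such probing; the helper name and the candidate list are hypothetical, not taken from the mksh test suite.

```python
import locale
from typing import Iterable, Optional


def find_utf8_locale(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose LC_CTYPE codeset is UTF-8, else None."""
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not installed on this system
            cs = locale.nl_langinfo(locale.CODESET)
            if cs.replace("-", "").upper() == "UTF8":
                return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore


# Hypothetical candidate list; only C.UTF-8 would be guaranteed to
# exist if the proposal in this bug were adopted.
CANDIDATES = ("C.UTF-8", "C.utf8", "en_US.UTF-8", "en_GB.UTF-8")

if __name__ == "__main__":
    name = find_utf8_locale(CANDIDATES)
    if name is None:
        print("SKIP: no UTF-8 locale installed; cannot test auto-detection")
    else:
        print(f"running UTF-8 auto-detection test with LC_CTYPE={name}")
```

The skip branch is the part a mandated C.UTF-8 would make unnecessary.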


Andrew McMillan dixit:

>The proposal, at this stage is only that the C.UTF-8 locale is
>*installed* and *available* by default.  Not that it *be* the default,
>but that it *be there* as a default.

This is about what I was going to propose, indeed.


Andrew McMillan dixit:

>Once this minimum step is made, and we've all calmed down, we can think
>further on radical and dramatic changes over coming years where more
>significant shifts are made, like:
>
>* The default locale at installation is C.UTF-8 rather than C.

That would be nice.

>* If a locale is set which doesn't specify an encoding, the system
>defaults to assuming UTF-8.


Andrew McMillan dixit:

>[...] and indeed Steve
>Langasek has already suggested a seemingly reasonable workaround for the
>immediate problem which was, funnily enough, that mksh wants to have a
>UTF-8 locale *available* in order for it to *test the build*...

Yes, his suggestion, and the search for someone to actually use it
(Daniel Jacobowitz does), helped with that part of the problem. However,
the mksh regression test suite is only one of the manifestations.
Even as a mere user, I'd like to have (see above) a UTF-8 locale
available and, if possible, as the default. Well, maybe not a UTF-8
locale, just UTF-8 encoding (especially when I ssh from a MirBSD system
to a Debian system, since on MirBSD there is *only* UTF-8¹), but glibc
defines encodings exclusively via locales. That is why I'm in favour
of C.UTF-8 for myself, although setting only LC_CTYPE has the same
effect (and I often set LC_MESSAGES to en_GB.UTF-8 for gcc's benefit).
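Why setting only LC_CTYPE suffices follows from the POSIX rules for resolving locale categories. The sketch below (a hypothetical helper) shows which environment variable ends up deciding the encoding; it ignores implementation-specific fallbacks such as LANGUAGE, which only affects message translation.

```python
from typing import Mapping


def effective_lc_ctype(env: Mapping[str, str]) -> str:
    """Locale name that governs LC_CTYPE (the character encoding).

    POSIX precedence: a non-empty LC_ALL overrides everything, then
    LC_CTYPE, then LANG; with none of them set the C locale applies.
    LC_MESSAGES plays no part in choosing the encoding.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:  # unset and empty are treated alike
            return value
    return "C"


if __name__ == "__main__":
    env = {"LANG": "C", "LC_CTYPE": "C.UTF-8", "LC_MESSAGES": "en_GB.UTF-8"}
    print(effective_lc_ctype(env))  # C.UTF-8
```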


Giacomo A. Catenazzi dixit:

> "Debian will use as default unicode, encoded according to UTF-8", but
> not *assume*.  It is again portability.

I agree too. You cannot simply assume things.

> Let (old) programs to works
> also on the future Debian.

Such programs already need to export LC_ALL=C, since you have been
able to choose a locale in d-i for a while, so nothing changes there.
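For a script that calls a program relying on byte-oriented behaviour or on untranslated output, pinning the locale for the child process looks roughly like this. A minimal sketch; the helper name is hypothetical.

```python
import os
import subprocess
import sys
from typing import Sequence


def run_in_c_locale(argv: Sequence[str]) -> str:
    """Run argv with LC_ALL=C so its behaviour ignores the user's locale."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"  # overrides LANG and every LC_* category
    result = subprocess.run(argv, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Show what the child process actually sees.
    print(run_in_c_locale([sys.executable, "-c",
                           "import os; print(os.environ['LC_ALL'])"]), end="")
```

Because LC_ALL takes precedence over everything else, this keeps working whatever default locale the installer set up.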


bye,
//mirabilos
-- 
23:22⎜«mikap:#grml» mirabilos: and your bootloader is awesome :)
23:29⎜«mikap:#grml» and I think it's totally awesome that I have a BSD to boot
     ⎜  with grml, I have to put that on a USB stick right away
-- Michael Prokop of grml.org about MirGRML and MirOS bsd4grml
