Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2010-09-03 Thread Thorsten Glaser
Russ Allbery dixit:

I agree with others in this thread that having a UTF-8 locale without the
collation changes implied by en_US is very useful for various software
packages such as automated test suites that want reproducible results and
were originally written for the C locale.

Same for testsuites that are written for UTF-8 but don’t care about
anything other than LC_CTYPE. And for people to whom en_US.UTF-8 is
too fat or “politically incorrect” (though the latter is usually
fixed by en_GB.UTF-8, which has metric and ISO A4 paper) and others,
like apparently Hurd.

To me, strictly speaking, it doesn’t matter which one as long as there
is one, for the mksh testsuite, but as a user, being able to run a
command with 'env LC_ALL=C.UTF-8 foo' on a “hostile” system (e.g.
my cow-orkers insist on installing systems in German *shudder*)
simply rocks.
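
As a rough illustration of the 'env LC_ALL=C.UTF-8 foo' idea from a script,
here is a minimal Python sketch. The helper name run_with_locale and the
child command are made up for the example, and it assumes the C.UTF-8 locale
name is meaningful on the target system (the child below only echoes the
variable, it does not prove the locale is installed).

```python
import os
import subprocess
import sys


def run_with_locale(argv, locale_name="C.UTF-8"):
    """Run argv with LC_ALL forced to locale_name; return its stdout as text."""
    env = dict(os.environ)
    env["LC_ALL"] = locale_name  # LC_ALL overrides LANG and every LC_* variable
    result = subprocess.run(argv, env=env, capture_output=True, check=True)
    return result.stdout.decode("utf-8")


# Hypothetical child process: it just echoes back the LC_ALL value it received.
child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
print(run_with_locale(child))
```

Copying os.environ rather than building a fresh dict keeps PATH and friends
intact, so only the locale selection changes for the child.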

If nobody beats me, I’ll digest-and-write-a-proposal as suggested.

bye,
//mirabilos
-- 
I believe no one can invent an algorithm. One just happens to hit upon it
when God enlightens him. Or only God invents algorithms, we merely copy them.
If you don't believe in God, just consider God as Nature if you won't deny
existence.  -- Coywolf Qi Hunt



--
To UNSUBSCRIBE, email to debian-policy-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/pine.bsm.4.64l.1009031259050.1...@herc.mirbsd.org



Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2010-09-03 Thread Giacomo A. Catenazzi

On 03.09.2010 01:46, Russ Allbery wrote:

Samuel Thibault sthiba...@debian.org writes:

Well, it's mostly

- some people saying it's useless,
- while other people saying I need it,

and also

- en_US.UTF-8 is just fine vs.
- en_US.UTF-8 sucks, we really need C.UTF-8 instead

without any convergence.

I think the way to get past that is to make a specific proposal.

With my Lintian maintainer hat on, I need a UTF-8 locale that's guaranteed
to always be available.  Right now, we're doing something complicated and
annoying (and fragile on Ubuntu) to generate one on the fly (en_US.UTF-8
just because it's probably always there), and we would love to stop doing
that.

I agree with others in this thread that having a UTF-8 locale without the
collation changes implied by en_US is very useful for various software
packages such as automated test suites that want reproducible results and
were originally written for the C locale.
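
To make the collation point concrete, here is a small Python sketch (not
taken from Lintian or any package in this thread). The word list and the
helper name locale_order are invented for the example. Code-point ordering
matches what the C locale does; the en_US.UTF-8 part only runs if that
locale happens to be installed and returns None otherwise.

```python
import locale

words = ["b", "A", "a", "B"]

# Code-point order, which is how the C locale collates: uppercase ASCII
# sorts before lowercase, independent of any installed locale.
codepoint_order = sorted(words)
print(codepoint_order)  # ['A', 'B', 'a', 'b']


def locale_order(items, name):
    """Sort items with the collation of locale `name`; None if setlocale rejects it."""
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sorted(items, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the previous collation


# The result depends on the system, so no expected output is shown here.
print(locale_order(words, "en_US.UTF-8"))
```

A test suite written against the first ordering can get different results
under the second, which is the reproducibility concern described above.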


BTW I think we should wait some more time. Last week there was a bug on the
debian-glibc list: printf fails if it finds an invalid UTF-8
character (when the locale uses UTF-8). Note that this is allowed by POSIX,
which distinguishes raw strings from parts which use locale definitions.
So I don't think a C.UTF-8 is safe.
But a good release goal for squeeze+1.
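
The following Python sketch is only an analogy for the behaviour described
above, not a reproduction of the glibc printf bug: it shows that data which
is fine as raw bytes can be rejected as soon as it is interpreted as UTF-8
text. The sample bytes and the helper name decode_strict are made up for
the example.

```python
raw = b"caf\xe9\n"  # Latin-1 bytes; 0xE9 followed by a newline is not valid UTF-8


def decode_strict(data):
    """Decode data as UTF-8, reporting where strict decoding gives up."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"invalid UTF-8 at byte offset {exc.start}"


# Handled as raw bytes, the data passes through untouched.
print(len(raw))

# Handled as UTF-8 text, the Latin-1 input is rejected and the valid input is not.
print(decode_strict(raw))
print(decode_strict("café\n".encode("utf-8")))
```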

ciao
cate



Archive: http://lists.debian.org/4c80f797.5050...@debian.org



Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2010-09-03 Thread Samuel Thibault
Thorsten Glaser, on Fri 03 Sep 2010 13:02:31 +, wrote:
 Russ Allbery dixit:
 I agree with others in this thread that having a UTF-8 locale without the
 collation changes implied by en_US is very useful for various software
 packages such as automated test suites that want reproducible results and
 were originally written for the C locale.
 
 Same for testsuites that are written for UTF-8 but don’t care about
 anything other than LC_CTYPE.

A sequence of remarks here: one could think that it'd be just enough to
unset LC_ALL and set LC_CTYPE to achieve the same.  However, even
LC_CTYPE has differences between locales, transliterations notably.  For
the transliterations alone we'd probably better go with a stable C.UTF-8
which doesn't depend on transliteration fixes in whichever locale would
be chosen to provide a UTF-8 variant.

 If nobody beats me, I’ll digest-and-write-a-proposal as suggested.

I'd say go on :)
(of course we'll need to wait for libc to provide the locale
(post-squeeze I guess) before changing the policy).

Samuel



Archive: http://lists.debian.org/20100903134313.gl5...@const.bordeaux.inria.fr



Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2010-09-03 Thread Samuel Thibault
Giacomo A. Catenazzi, on Fri 03 Sep 2010 15:26:47 +0200, wrote:
 BTW I think we should wait some more time. Last week there was a bug on the
 debian-glibc list: printf fails if it finds an invalid UTF-8
 character (when the locale uses UTF-8). Note that this is allowed by POSIX,
 which distinguishes raw strings from parts which use locale definitions.
 So I don't think a C.UTF-8 is safe.

It's not safe as a system default, yes.  But we're not talking about
making the system default a UTF-8 locale.  We're talking about providing
one for those packages which need it.  Such packages should know what
they are doing already, and should probably actually prefer to get such
an error properly.

 But a good release goal for squeeze+1.

I wasn't planning to push it for Squeeze actually, unless glibc people
think it's ok to add it.

Samuel



Archive: http://lists.debian.org/20100903134543.gm5...@const.bordeaux.inria.fr



Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2010-09-03 Thread Roger Leigh
On Fri, Sep 03, 2010 at 01:37:24AM +0200, Samuel Thibault wrote:
 Russ Allbery, on Thu 02 Sep 2010 16:24:56 -0700, wrote:
  Generally what that means is that someone needs to digest the discussion
  in the thread
 
 Well, it's mostly
 
 - some people saying it's useless,
 - while other people saying I need it,
 
 and also
 
 - en_US.UTF-8 is just fine vs.
 - en_US.UTF-8 sucks, we really need C.UTF-8 instead
 
 without any convergence.

I think reading back through the entire log, people who were initially
rather opposed to the proposal did come around once they appreciated
exactly what the changes would be, and why they were needed.  The
conversation was mostly constructive bar some initial misunderstandings
about what the changes actually meant--it did flesh out some of the
issues WRT standards conformance and what might break if the default
was changed, but this bug isn't really about the default, it's about
having a standard UTF-8 locale available.

Andrew Macmillan's message in
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776#167
is a rather good summary of the issues and the
big picture behind the motives for changing.

Introducing a C.UTF-8 is a trivial change to make and does not
impact any existing software.  It doesn't mandate a specific
national locale, nor does it alter the existing C locale.  To quote:

The proposal, at this stage is only that the C.UTF-8 locale is
*installed* and *available* by default.  Not that it *be* the default,
but that it *be there* as a default. People will naturally continue to
be free to uninstall it, or to leave their locale to 'C'.

There were no objections to having a UTF-8 locale installed and
available by default, just to it *being* the default.  Taking this
first small step is IMO important to do, preferably for squeeze if
possible.  Since it's a tiny one-liner change, this should be no
trouble in getting this done.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.




Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2010-09-03 Thread Thorsten Glaser
Samuel Thibault dixit:

believe that's something that shouldn't break Squeeze at all.

I also believe it cannot possibly do that.

bye,
//mirabilos
-- 
“It is inappropriate to require that a time represented as
 seconds since the Epoch precisely represent the number of
 seconds between the referenced time and the Epoch.”
-- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2



Archive: http://lists.debian.org/pine.bsm.4.64l.1009031621090.1...@herc.mirbsd.org



Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2010-09-03 Thread Aurelien Jarno
On Fri, Sep 03, 2010 at 04:20:27PM +0200, Samuel Thibault wrote:
 Roger Leigh, on Fri 03 Sep 2010 14:52:39 +0100, wrote:
  On Fri, Sep 03, 2010 at 01:37:24AM +0200, Samuel Thibault wrote:
   without any convergence.
  
  I think reading back through the entire log,
 
 Thanks for having done it!
 
  people who were initially
  rather opposed to the proposal did come around once they appreciated
  exactly what the changes would be, and why they were needed.
 
 Ok.  There was still a question of en_US.UTF-8 vs C.UTF-8, but I believe
 the "en_US.UTF-8 is fine enough" argument doesn't hold any more since
 some other people say that it isn't for them.
 
  There were no objections to having a UTF-8 locale installed and
  available by default, just to it *being* the default.  Taking this
  first small step is IMO important to do, preferably for squeeze if
  possible.  Since it's a tiny one-liner change, this should be no
  trouble in getting this done.
 
 I believe so too, I just didn't want to push it too much, but yes, I
 believe that's something that shouldn't break Squeeze at all.
 

That's not something allowed anymore at this period of the freeze, you
will have to get an exception from the release team first.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



Archive: http://lists.debian.org/20100903171640.gb26...@hall.aurel32.net



Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2010-09-03 Thread Russ Allbery
Ben Finney ben+deb...@benfinney.id.au writes:

 Would a less confusing way to make this distinction be to say something
 like: “The minimal Debian installation must have a locale available that
 uses the UTF-8 character encoding.”?

The other angle here is that it can't just be any UTF-8 locale, since that
isn't very helpful to software that needs to choose a UTF-8 locale on an
automated basis.  Lintian, for example, just needs *some* locale that's
UTF-8, but I don't want to have to try en_US.UTF-8 and then fr.UTF-8 and
then pt_BR.UTF-8 and then

I think we need to explicitly require a *specific* UTF-8 locale be
available.  C.UTF-8 has a lot of appeal since it's the minimal UTF-8
locale and it doesn't get into issues of favoring one particular language
and its corresponding collation rules, etc.
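
To show what that guesswork looks like in practice, here is a minimal
Python sketch. It is not Lintian's actual code; the helper name
first_available and the candidate list are assumptions made up for the
example. It probes locale names one by one and restores the process locale
afterwards.

```python
import locale


def first_available(candidates, category=locale.LC_CTYPE):
    """Return the first locale name in candidates that setlocale accepts, else None."""
    saved = locale.setlocale(category)
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
            except locale.Error:
                continue  # not installed on this system, try the next one
            return name
        return None
    finally:
        locale.setlocale(category, saved)  # leave the process locale as we found it


# Hypothetical candidate list; with a mandated locale it shrinks to one entry.
guesswork = ["en_US.UTF-8", "fr_FR.UTF-8", "pt_BR.UTF-8"]
print(first_available(["C.UTF-8"]) or first_available(guesswork))
```

With a single guaranteed name, the loop and the fallback list both become
unnecessary, which is the point of requiring one specific locale.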

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/



Archive: http://lists.debian.org/87tym6wv4d@windlord.stanford.edu



Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2010-09-03 Thread Samuel Thibault
Aurelien Jarno, on Fri 03 Sep 2010 19:16:40 +0200, wrote:
 On Fri, Sep 03, 2010 at 04:20:27PM +0200, Samuel Thibault wrote:
  Roger Leigh, on Fri 03 Sep 2010 14:52:39 +0100, wrote:
   There were no objections to having a UTF-8 locale installed and
   available by default, just to it *being* the default.  Taking this
   first small step is IMO important to do, preferably for squeeze if
   possible.  Since it's a tiny one-liner change, this should be no
   trouble in getting this done.
  
  I believe so too, I just didn't want to push it too much, but yes, I
  believe that's something that shouldn't break Squeeze at all.
 
 That's not something allowed anymore at this period of the freeze, you
 will have to get an exception from the release team first.

Ok.  I don't feel any urgency so I won't ask for it myself.

Samuel



Archive: http://lists.debian.org/20100903223707.ga5...@const