Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale
Russ Allbery dixit:
> I agree with others in this thread that having a UTF-8 locale without the collation changes implied by en_US is very useful for various software packages such as automated test suites that want reproducible results and were originally written for the C locale.

Same for testsuites that are written for UTF-8 but don’t care about anything other than LC_CTYPE. And for people to whom en_US.UTF-8 is too fat or “politically incorrect” (though the latter is usually fixed by en_GB.UTF-8, which has metric and ISO A4 paper) and others, like apparently Hurd.

To me, strictly speaking, it doesn’t matter which one as long as there is one, for the mksh testsuite, but as a user, being able to run a command with 'env LC_ALL=C.UTF-8 foo' on a “hostile” system (e.g. my cow-orkers insist on installing systems in German *shudder*) simply rocks.

If nobody beats me, I’ll digest-and-write-a-proposal as suggested.

bye,
//mirabilos
-- 
I believe no one can invent an algorithm. One just happens to hit upon it when God enlightens him. Or only God invents algorithms, we merely copy them. If you don't believe in God, just consider God as Nature if you won't deny existence.
    -- Coywolf Qi Hunt

-- 
To UNSUBSCRIBE, email to debian-policy-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/pine.bsm.4.64l.1009031259050.1...@herc.mirbsd.org
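The one-off override described above, 'env LC_ALL=C.UTF-8 foo', can also be sketched from a script. This is only an illustration, assuming Python is at hand; the helper names `with_locale` and `run_with_locale` are made up for this sketch, and whether C.UTF-8 is actually installed on a given system is precisely what this bug is about.

```python
import os
import subprocess
import sys


def with_locale(base_env, locale_name="C.UTF-8"):
    """Return a copy of base_env with LC_ALL forced to locale_name.

    Hypothetical helper: it mirrors what 'env LC_ALL=C.UTF-8 foo' does,
    leaving the caller's own environment untouched.
    """
    env = dict(base_env)
    env["LC_ALL"] = locale_name
    return env


def run_with_locale(argv, locale_name="C.UTF-8"):
    """Run argv with LC_ALL overridden and return its standard output."""
    result = subprocess.run(
        argv,
        env=with_locale(os.environ, locale_name),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The child only echoes the variable it received; this does not prove
    # that the locale is installed on the system.
    child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
    print(run_with_locale(child), end="")
```

Because the override is applied to a copy of the environment, the calling process keeps whatever locale the administrator configured.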
Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale
On 03.09.2010 01:46, Russ Allbery wrote:
> Samuel Thibault <sthiba...@debian.org> writes:
>> Well, it's mostly
>> - some people saying it's useless,
>> - while other people saying I need it,
>> and also
>> - en_US.UTF-8 is just fine
>> vs.
>> - en_US.UTF-8 sucks, we really need C.UTF-8 instead
>> without any convergence.
>
> I think the way to get past that is to make a specific proposal.
>
> With my Lintian maintainer hat on, I need a UTF-8 locale that's guaranteed to always be available. Right now, we're doing something complicated and annoying (and fragile on Ubuntu) to generate one on the fly (en_US.UTF-8 just because it's probably always there), and we would love to stop doing that.
>
> I agree with others in this thread that having a UTF-8 locale without the collation changes implied by en_US is very useful for various software packages such as automated test suites that want reproducible results and were originally written for the C locale.

BTW I think we should wait some more time. Last week I saw a bug on the debian-glibc list: printf fails if it finds an invalid UTF-8 character (when the locale uses UTF-8). Note that this is allowed by POSIX, which distinguishes raw strings from parts that use locale definitions.

So I don't think a C.UTF-8 is safe. But a good release goal for squeeze+1.

ciao
    cate
Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale
Thorsten Glaser, on Fri 03 Sep 2010 13:02:31, wrote:
> Russ Allbery dixit:
>> I agree with others in this thread that having a UTF-8 locale without the collation changes implied by en_US is very useful for various software packages such as automated test suites that want reproducible results and were originally written for the C locale.
>
> Same for testsuites that are written for UTF-8 but don’t care about anything other than LC_CTYPE.

A remark here: one could think that it would be enough to unset LC_ALL and set LC_CTYPE to achieve the same. However, even LC_CTYPE differs between locales, notably in transliterations. For the transliterations alone we would probably be better off with a stable C.UTF-8, which doesn't depend on transliteration fixes in whichever locale would be chosen to provide a UTF-8 variant.

> If nobody beats me, I’ll digest-and-write-a-proposal as suggested.

I'd say go on :)

(Of course we'll need to wait for libc to provide the locale (post-squeeze I guess) before changing the policy.)

Samuel
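As background to the remark about unsetting LC_ALL and setting LC_CTYPE: POSIX resolves each locale category by looking at LC_ALL first, then at the category's own variable, then at LANG. The sketch below models only that lookup order. The function name `effective_locale` and the locale names are illustrative assumptions, and the sketch says nothing about the point made above that the LC_CTYPE data itself differs from one locale to another.

```python
def effective_locale(env, category):
    """Return the locale name that applies to `category` under `env`.

    Follows the POSIX lookup order: LC_ALL first, then the variable named
    after the category, then LANG, and finally the "C" locale. Empty values
    count as unset.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


if __name__ == "__main__":
    env = {"LANG": "de_DE.UTF-8", "LC_ALL": "de_DE.UTF-8", "LC_CTYPE": "C.UTF-8"}
    print(effective_locale(env, "LC_CTYPE"))    # LC_ALL still wins here
    del env["LC_ALL"]
    print(effective_locale(env, "LC_CTYPE"))    # now LC_CTYPE applies
    print(effective_locale(env, "LC_COLLATE"))  # falls back to LANG
```

This is why LC_ALL has to be unset before a per-category setting such as LC_CTYPE can take effect.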
Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale
Giacomo A. Catenazzi, on Fri 03 Sep 2010 15:26:47 +0200, wrote:
> BTW I think we should wait some more time. Last week I saw a bug on the debian-glibc list: printf fails if it finds an invalid UTF-8 character (when the locale uses UTF-8). Note that this is allowed by POSIX, which distinguishes raw strings from parts that use locale definitions.
>
> So I don't think a C.UTF-8 is safe.

It's not safe as a system default, yes. But we're not talking about making the system default a UTF-8 locale. We're talking about providing one for those packages which need it. Such packages should already know what they are doing, and should probably actually prefer to get such an error properly.

> But a good release goal for squeeze+1.

I wasn't planning to push it for Squeeze actually, unless glibc people think it's ok to add it.

Samuel
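The difference between treating data as raw bytes and interpreting it as UTF-8 text can be illustrated outside of glibc. What follows is a loose Python analogy, not a reproduction of the printf behaviour discussed above: strict decoding reports invalid input as an error, which is the outcome a package that deliberately selects a UTF-8 locale would arguably want, while lenient decoding hides the problem. The function names are made up for this sketch.

```python
def decode_strict(raw):
    """Decode bytes as UTF-8, failing loudly on invalid input."""
    return raw.decode("utf-8")  # errors="strict" is the default


def decode_lenient(raw):
    """Decode bytes as UTF-8, substituting U+FFFD for invalid sequences."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    bad = b"caf\xe9"  # Latin-1 encoded text, not valid UTF-8
    try:
        decode_strict(bad)
    except UnicodeDecodeError as exc:
        print("strict decoding failed:", exc.reason)
    print(decode_lenient(bad))
```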
Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale
On Fri, Sep 03, 2010 at 01:37:24AM +0200, Samuel Thibault wrote:
> Russ Allbery, on Thu 02 Sep 2010 16:24:56 -0700, wrote:
>> Generally what that means is that someone needs to digest the discussion in the thread
>
> Well, it's mostly
> - some people saying it's useless,
> - while other people saying I need it,
> and also
> - en_US.UTF-8 is just fine
> vs.
> - en_US.UTF-8 sucks, we really need C.UTF-8 instead
> without any convergence.

I think reading back through the entire log, people who were initially rather opposed to the proposal did come around once they appreciated exactly what the changes would be, and why they were needed. The conversation was mostly constructive bar some initial misunderstandings about what the changes actually meant. It did flesh out some of the issues WRT standards conformance and what might break if the default was changed, but this bug isn't really about the default, it's about having a standard UTF-8 locale available.

Andrew Macmillan's message in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776#167 is a rather good summary of the issues and the big picture behind the motives for changing. Introducing a C.UTF-8 is a trivial change to make and does not impact any existing software. It doesn't mandate a specific national locale, nor does it alter the existing C locale. To quote:

  The proposal, at this stage is only that the C.UTF-8 locale is *installed* and *available* by default. Not that it *be* the default, but that it *be there* as a default. People will naturally continue to be free to uninstall it, or to leave their locale to 'C'.

There were no objections to having a UTF-8 locale installed and available by default, just to it *being* the default. Taking this first small step is IMO important to do, preferably for squeeze if possible. Since it's a tiny one-liner change, there should be no trouble in getting this done.

Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale
Samuel Thibault dixit:
> believe that's something that shouldn't break Squeeze at all.

I also believe it cannot possibly do that.

bye,
//mirabilos
-- 
“It is inappropriate to require that a time represented as seconds since the Epoch precisely represent the number of seconds between the referenced time and the Epoch.”
    -- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2
Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale
On Fri, Sep 03, 2010 at 04:20:27PM +0200, Samuel Thibault wrote:
> Roger Leigh, on Fri 03 Sep 2010 14:52:39 +0100, wrote:
>> On Fri, Sep 03, 2010 at 01:37:24AM +0200, Samuel Thibault wrote:
>>> without any convergence.
>>
>> I think reading back through the entire log,
>
> Thanks for having done it!
>
>> people who were initially rather opposed to the proposal did come around once they appreciated exactly what the changes would be, and why they were needed.
>
> Ok. There was still a question of en_US.UTF-8 vs C.UTF-8, but I believe the "en_US.UTF-8 is fine enough" argument doesn't hold any more since some other people say that it isn't for them.
>
>> There were no objections to having a UTF-8 locale installed and available by default, just to it *being* the default. Taking this first small step is IMO important to do, preferably for squeeze if possible. Since it's a tiny one-liner change, this should be no trouble in getting this done.
>
> I believe so too, I just didn't want to push it too much, but yes, I believe that's something that shouldn't break Squeeze at all.

That's not something that is allowed anymore at this stage of the freeze; you will have to get an exception from the release team first.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale
Ben Finney <ben+deb...@benfinney.id.au> writes:
> Would a less confusing way to make this distinction be to say something like: “The minimal Debian installation must have a locale available that uses the UTF-8 character encoding.”?

The other angle here is that it can't just be any UTF-8 locale, since that isn't very helpful to software that needs to choose a UTF-8 locale on an automated basis. Lintian, for example, just needs *some* locale that's UTF-8, but I don't want to have to try en_US.UTF-8 and then fr.UTF-8 and then pt_BR.UTF-8 and so on.

I think we need to explicitly require that a *specific* UTF-8 locale be available. C.UTF-8 has a lot of appeal since it's the minimal UTF-8 locale and it doesn't get into issues of favoring one particular language and its corresponding collation rules, etc.

-- 
Russ Allbery (r...@debian.org)               http://www.eyrie.org/~eagle/
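The guessing game described above (try one UTF-8 locale, then the next, and so on) might look roughly like the following. This is a sketch of the workaround that a guaranteed locale would make unnecessary, not Lintian's actual code; the function name `first_available` and the candidate list are assumptions made for illustration.

```python
import locale


def first_available(candidates, category=locale.LC_CTYPE):
    """Return the first locale name in candidates that setlocale accepts.

    Returns None if none of them can be selected. The process locale is
    restored before returning.
    """
    saved = locale.setlocale(category)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
            except locale.Error:
                continue  # not installed here, try the next guess
            return name
        return None
    finally:
        locale.setlocale(category, saved)


if __name__ == "__main__":
    guesses = ["C.UTF-8", "en_US.UTF-8", "fr_FR.UTF-8", "pt_BR.UTF-8"]
    print(first_available(guesses))
```

With a single locale mandated by policy, the candidate list shrinks to one entry and the `None` case should no longer occur on a conforming system.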
Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale
Aurelien Jarno, on Fri 03 Sep 2010 19:16:40 +0200, wrote:
> On Fri, Sep 03, 2010 at 04:20:27PM +0200, Samuel Thibault wrote:
>> Roger Leigh, on Fri 03 Sep 2010 14:52:39 +0100, wrote:
>>> There were no objections to having a UTF-8 locale installed and available by default, just to it *being* the default. Taking this first small step is IMO important to do, preferably for squeeze if possible. Since it's a tiny one-liner change, this should be no trouble in getting this done.
>>
>> I believe so too, I just didn't want to push it too much, but yes, I believe that's something that shouldn't break Squeeze at all.
>
> That's not something allowed anymore at this period of the freeze, you will have to get an exception from the release team first.

Ok. I don't feel any urgency so I won't ask for it myself.

Samuel