Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread INADA Naoki
>> If we chose "Always use UTF-8 for fs encoding", I think >> PYTHONFSENCODING envvar should be >> added again. (It should be used from startup: decoding command line >> argument). > > Last time I implemented PYTHONFSENCODING, I had many major issues: > https://mail.python.org/pipermail/python-de

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread INADA Naoki
On Fri, Jan 13, 2017 at 12:12 AM, Victor Stinner wrote: > 2017-01-12 1:23 GMT+01:00 INADA Naoki : >> I'm ±0 to surrogateescape by default. I feel +1 for stdout and -1 for stdin. > > The use case is to be able to write a Python 3 program which works > with UNIX pipes without failing with encoding

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread Chris Barker
On Thu, Jan 12, 2017 at 7:50 AM, Stephen J. Turnbull wrote: > > So I see no downside to using utf-8 when the C locale is defined. > > You don't have much incentive to look for one, and I doubt you have > the experience of the edge cases (if you do, please correct me), so > that does not surprise

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread Stephen J. Turnbull
Chris Barker writes: > 2) There are non-ascii file names, etc. on this supposedly ASCII system. In > which case, do folks expect their Python programs to find these issues and > raise errors? They may well expect that their Python program will not let > them try to save a non ASCII filename, f

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread Stephen J. Turnbull
Stephan Houben writes: > I think this may be a minor concern ultimately, but it would be > nice if we had some API to at least reliably answer the question > "can I safely output non-ASCII myself?" You can't; stdout might be a TTY, pipe, or socket in which case you have no way to determine tha
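As Stephen says, Python cannot know what the TTY, pipe, or socket at the other end will accept. The most a program can check is whether text is encodable with the codec Python configured for the stream. A minimal best-effort sketch; `can_encode` is a hypothetical helper, and a True result says nothing about how the consumer will interpret the bytes:

```python
import sys


def can_encode(text, encoding=None):
    """Best-effort check: can `text` be encoded strictly with `encoding`?

    Defaults to the encoding Python chose for sys.stdout. This only tests
    Python's own codec; it cannot tell what the terminal, pipe, or socket
    on the other side will actually accept.
    """
    if encoding is None:
        # sys.stdout may be None (e.g. pythonw) or replaced by an object
        # without an encoding attribute; fall back to ASCII then.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    try:
        text.encode(encoding)  # the default error handler is "strict"
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(can_encode("plain ASCII"))
    print(can_encode("caf\u00e9", "ascii"))
    print(can_encode("caf\u00e9", "utf-8"))
```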

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread Victor Stinner
2017-01-12 1:23 GMT+01:00 INADA Naoki : > I'm ±0 to surrogateescape by default. I feel +1 for stdout and -1 for stdin. The use case is to be able to write a Python 3 program which works with UNIX pipes without failing with encoding errors: https://www.python.org/dev/peps/pep-0540/#producer-consum
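The pipe use case relies on the surrogateescape error handler: bytes that are not valid UTF-8 decode to lone surrogates and encode back to the same bytes, so a filter in the middle of a pipe can treat its data as str without crashing on, or corrupting, undecodable input. A minimal sketch; `transform` and `run_filter` are hypothetical names, and the decoding is done by hand on binary streams rather than through the stdio configuration the PEP proposes:

```python
import io


def transform(line: bytes) -> bytes:
    """Upper-case one line of input, passing undecodable bytes through.

    Invalid UTF-8 bytes become lone surrogates (U+DC80..U+DCFF) when
    decoded with surrogateescape, and are turned back into the very same
    bytes when encoded with it.
    """
    text = line.decode("utf-8", "surrogateescape")
    # str.upper() has no case mapping for surrogates, so they survive.
    return text.upper().encode("utf-8", "surrogateescape")


def run_filter(src, dst) -> None:
    """Copy binary stream `src` to `dst`, transforming it line by line."""
    for line in src:
        dst.write(transform(line))
    dst.flush()


if __name__ == "__main__":
    # In a real pipe: run_filter(sys.stdin.buffer, sys.stdout.buffer)
    out = io.BytesIO()
    run_filter(io.BytesIO(b"caf\xe9\nok\n"), out)
    print(out.getvalue())  # b'CAF\xe9\nOK\n'
```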

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-12 Thread Stephan Houben
Hi Petr, 2017-01-11 12:22 GMT+01:00 Petr Viktorin : > > For example, this may mean that a built-in Python string sort will give you >> a different ordering than invoking the external "sort" command. >> I have been bitten by this kind of issues, leading to spurious "diffs" if >> you try to use sort

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Nick Coghlan
On 11 January 2017 at 17:05, Stephen J. Turnbull wrote: > Anyway, I need to look more carefully at the actual PEPs and see if > there's something concrete to worry about. But remember, we have > about 18 months to chew over this if necessary FWIW, I'm hoping to backport whatever improved handlin

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Chris Barker
It seems to me that having a C locale can mean two things: 1) It really is meant to be ASCII 2) It's mis-configured (or un-configured), meaning the system encoding is unknown. If (2), then utf-8 is a fine default. If (1), then there are two options: 1) Everything on the system really is ASCII -
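From inside Python the two cases Chris describes cannot be told apart; what a program can observe is whether LC_CTYPE ended up as "C" or "POSIX". A minimal sketch with a hypothetical helper `is_legacy_c_locale`, assuming a POSIX system where Python has already set LC_CTYPE from the environment at startup:

```python
import locale


def is_legacy_c_locale() -> bool:
    """Return True if the current LC_CTYPE is the C or POSIX locale.

    Calling setlocale() with only a category queries the setting without
    changing it. On Python 3.7+ a C locale inherited from the environment
    may already have been coerced to C.UTF-8 (PEP 538), in which case this
    returns False.
    """
    current = locale.setlocale(locale.LC_CTYPE)
    return current in ("C", "POSIX")


if __name__ == "__main__":
    print(locale.setlocale(locale.LC_CTYPE), is_legacy_c_locale())
```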

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread INADA Naoki
> My PEP 540 is different than Nick's PEP 538, even for the POSIX > locale. I propose to always use the surrogateescape error handler, > whereas Nick wants to keep the strict error handler for inputs and > outputs. > https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler > > The surro

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Victor Stinner
2017-01-06 10:50 GMT+01:00 M.-A. Lemburg : > Victor: I think you are taking the UTF-8 idea a bit too far. > Nick was trying to address the situation where the locale is > set to "C", or rather not set at all (in which case the lib C > defaults to the "C" locale). The latter is a fairly standard > s

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread INADA Naoki
On Wed, Jan 11, 2017 at 7:46 PM, Stephan Houben wrote: > Hi INADA Naoki, > > (Sorry, I am unsure if INADA or Naoki is your first name...) Never mind, I don't care about name ordering. (INADA is family name). > > While I am very much in favour of everything working "out of the box", > an issue is

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Petr Viktorin
On 01/11/2017 11:46 AM, Stephan Houben wrote: Hi INADA Naoki, (Sorry, I am unsure if INADA or Naoki is your first name...) While I am very much in favour of everything working "out of the box", an issue is that we don't have control over external code (be it Python extensions or external proces

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Stephan Houben
Hi INADA Naoki, (Sorry, I am unsure if INADA or Naoki is your first name...) While I am very much in favour of everything working "out of the box", an issue is that we don't have control over external code (be it Python extensions or external processes invoked from Python). And that code will on

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread INADA Naoki
> > > (I wonder if we can use LC_CTYPE=UTF-8...) > > Syntactically incorrect: that means the language UTF-8. > "LC_CTYPE=.UTF-8" might work, but IIRC the language tag is required, > the region and encoding are optional. Thus ja_JP, ja.UTF-8 are OK, > but .UTF-8 is not. I'm sorry. I know it, but

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread Stephen J. Turnbull
INADA Naoki writes: > Of course, we can use LC_CTYPE=en_US.UTF-8, instead of LANG or LC_ALL. You can also use LC_COLLATE=C. > (I wonder if we can use LC_CTYPE=UTF-8...) Syntactically incorrect: that means the language UTF-8. "LC_CTYPE=.UTF-8" might work, but IIRC the language tag is required,

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-11 Thread INADA Naoki
Here is one example of locale pitfall. --- # from http://unix.stackexchange.com/questions/169739/why-is-coreutils-sort-slower-than-python $ cat letters.py import string import random def main(): for _ in range(1_000_000): c = random.choice(string.ascii_letters) print(c) mai
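The pitfall behind the linked question is that coreutils `sort` collates according to LC_COLLATE, while Python's sorted() compares code points, so the two orderings only agree under the C locale. A minimal sketch contrasting the two; the locale-aware result depends on which locale is configured and installed, so only the code-point result is fixed:

```python
import locale

words = ["banana", "Apple", "apple", "Banana"]

# Python compares str by code point: every upper-case ASCII letter sorts
# before every lower-case one, whatever the locale says.
print(sorted(words))  # ['Apple', 'Banana', 'apple', 'banana']

# Locale-aware collation, roughly what coreutils `sort` does with the
# LC_COLLATE taken from the environment.
try:
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    pass  # unknown locale in the environment: stay in the C locale
print(sorted(words, key=locale.strxfrm))
```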

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-10 Thread INADA Naoki
> > That kind of thing makes me very nervous, and I think justifiably so. > And it's only *sufficient* to justify a change to Python's defaults if > Python checks for and accurately identifies when it's in a container. > In my company, we use network boot servers. To reduce boot image, the image i

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-10 Thread Stephen J. Turnbull
INADA Naoki writes: > When talking about general Docker image, using C locale is OK for > most cases. In other words, images using C locale is properly > configured. s/properly/compatibly/. "Proper" has strong connotations of "according to protocol". Configuring LC_CTYPE for ASCII expecting

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-10 Thread INADA Naoki
> > Of course. The question is not "should cb@noaa properly configure > docker?", it's "Can docker properly configure docker (soon enough)? > And if not, should we configure Python?" The third question depends > on whether fixing it for you breaks things for others. When talking about general Do

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-10 Thread Stephen J. Turnbull
Chris Barker - NOAA Federal writes: > >> How common is this problem? > > > > Last 2 or 3 years, I don't recall having been bitten by such issue. > > We just got bitten by this on our CI server. Granted, we could fix it > by properly configuring docker, but it would have been nice if it " > ju

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-10 Thread Chris Barker - NOAA Federal
>> How common is this problem? > > Last 2 or 3 years, I don't recall having been bitten by such issue. We just got bitten by this on our CI server. Granted, we could fix it by properly configuring docker, but it would have been nice if it "just worked" -CHB

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-09 Thread INADA Naoki
> > The problem is if people have locales set for non-UTF-8, which Chinese > people often do ("GB18030 isn't just a good idea, it's the law"). > Especially forcing stdout to something other than the locale is likely > to mess things up. Oh, I didn't know non-UTF-8 is used for LC_CTYPE in these yea

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-09 Thread Stephan Houben
Hi Stephen, 2017-01-09 19:42 GMT+01:00 Stephen J. Turnbull < turnbull.stephen...@u.tsukuba.ac.jp>: > > Private sector may be up to date, but academic sector > (and from the state of e-stat.go.jp, government in general, I suspect) > is stuck in the Jomon era. > I went to that page, checked the HTM

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-09 Thread Stephen J. Turnbull
INADA Naoki writes: > But when I see non UTF-8 text, I don't change locale to read such > text. Nobody does. The problem is if people have locales set for non-UTF-8, which Chinese people often do ("GB18030 isn't just a good idea, it's the law"). Especially forcing stdout to something other tha
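Stephen's objection concerns users whose locale encoding is not UTF-8. A minimal illustration, assuming a terminal configured for GB18030: the bytes a forced-UTF-8 program writes are not the bytes that terminal expects, and GB18030 input fed to a strict UTF-8 decoder is rejected:

```python
text = "\u4e2d\u6587"  # the two Chinese characters for "Chinese (written)"

gb_bytes = text.encode("gb18030")   # what a GB18030 locale produces
utf8_bytes = text.encode("utf-8")   # what a forced-UTF-8 program emits

print(gb_bytes == utf8_bytes)  # False: the byte streams differ

try:
    gb_bytes.decode("utf-8")  # forcing UTF-8 on GB18030 input
except UnicodeDecodeError as exc:
    print("strict UTF-8 decode failed:", exc.reason)

# Decoded with the encoding the data was actually written in, it works.
print(gb_bytes.decode("gb18030") == text)  # True
```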

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-09 Thread Barry Warsaw
On Jan 06, 2017, at 11:08 PM, Steve Dower wrote: >Passing universal_newlines will use whatever locale.getdefaultencoding() There is no locale.getdefaultencoding(); I think you mean locale.getpreferredencoding(False). (See the "Changed in version 3.3" note in §17.5.1.1 of the stdlib docs.) >univ
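The encodings this thread argues about can all be inspected at run time. A minimal sketch (the `encoding_report` helper is hypothetical) that collects the ones Python derives from the environment, including the locale.getpreferredencoding(False) value Barry mentions; the printed values depend on platform, locale, and Python version:

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings Python derived from the environment."""
    return {
        # Used by os.fsencode()/os.fsdecode(), os.listdir(), sys.argv, ...
        "filesystem": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # Default for open() and subprocess text mode; passing False
        # reads the current LC_CTYPE encoding without calling setlocale().
        "preferred": locale.getpreferredencoding(False),
        # The stdio streams; may differ if PYTHONIOENCODING is set.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name:18} {value}")
```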

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-08 Thread INADA Naoki
On Sun, Jan 8, 2017 at 1:47 AM, Stephen J. Turnbull wrote: > INADA Naoki writes: > > > I want UTF-8 mode is enabled by default (opt-out option) even if > > locale is not POSIX, > > like `PYTHONLEGACYWINDOWSFSENCODING`. > > > > Users depends on locale know what locale is and how to configure i

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-07 Thread Nick Coghlan
On 8 January 2017 at 02:47, Stephen J. Turnbull wrote: > I agree that people around me mostly know only two encodings: "works > for me" and "mojibake", but they also use locales configured for them > by technical staff. On top of that, international students (the most > likely victims of "UTF-8 b

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-07 Thread Stephen J. Turnbull
INADA Naoki writes: > I want UTF-8 mode is enabled by default (opt-out option) even if > locale is not POSIX, > like `PYTHONLEGACYWINDOWSFSENCODING`. > > Users depends on locale know what locale is and how to configure it. > They can understand difference between locale mode and UTF-8 mode

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Steve Dower
state that with any certainty.) Cheers, Steve Top-posted from my Windows Phone -Original Message- From: "Barry Warsaw" Sent: 1/6/2017 14:04 To: "python-ideas@python.org" Subject: Re: [Python-ideas] PEP 540: Add a new UTF-8 mode On Jan 05, 2017, at 05:50 PM, Victor Sti

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Victor Stinner
2017-01-07 1:06 GMT+01:00 Barry Warsaw : > For some reason it's not configured: (...) Ok, thanks for the information. > I'm not sure why that's the default inside a chroot. I found at least one good reason to use the POSIX locale to build a package: it helps to get reproducible builds, see: htt

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Barry Warsaw
On Jan 06, 2017, at 11:33 PM, Victor Stinner wrote: >Barry: About chroot, why do you get a C locale? Is it because no >locale is explicitly configured? Or because no locale is installed in >the chroot? For some reason it's not configured: % schroot -u root -c sid-amd64 (sid-amd64)# locale LANG=

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Victor Stinner
2017-01-06 22:20 GMT+01:00 Barry Warsaw : >>Because I have the impression that nowadays all Linux distributions are UTF-8 >>by default and you have to show some bloody-mindedness to end up with a POSIX >>locale. > > It can still happen in some corner cases, even on Debian and Ubuntu where > C.UTF-8

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Chris Angelico
On Sat, Jan 7, 2017 at 8:20 AM, Barry Warsaw wrote: > On Jan 06, 2017, at 07:22 AM, Stephan Houben wrote: > >>Because I have the impression that nowadays all Linux distributions are UTF-8 >>by default and you have to show some bloody-mindedness to end up with a POSIX >>locale. > > It can still hap

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Barry Warsaw
On Jan 06, 2017, at 07:22 AM, Stephan Houben wrote: >Because I have the impression that nowadays all Linux distributions are UTF-8 >by default and you have to show some bloody-mindedness to end up with a POSIX >locale. It can still happen in some corner cases, even on Debian and Ubuntu where C.UT
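Whether C.UTF-8 (or any other locale) is actually installed varies from system to system, which is part of why the C locale still shows up. A minimal sketch of a hypothetical `locale_available` helper that probes a locale name by trying to set it and then restores the previous setting:

```python
import locale


def locale_available(name: str, category: int = locale.LC_CTYPE) -> bool:
    """Return True if setlocale() accepts `name` for `category`.

    The previous setting is restored afterwards. setlocale() changes
    process-wide state, so avoid calling this while other threads run.
    """
    saved = locale.setlocale(category)
    try:
        locale.setlocale(category, name)
    except locale.Error:
        return False
    finally:
        locale.setlocale(category, saved)
    return True


if __name__ == "__main__":
    for candidate in ("C", "C.UTF-8", "en_US.UTF-8"):
        print(candidate, locale_available(candidate))
```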

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Barry Warsaw
On Jan 05, 2017, at 05:50 PM, Victor Stinner wrote: >I guess that all users and most developers are more in the "UNIX mode" >camp. *If* we want to change the default, I suggest to use the "UNIX >mode" by default. FWIW, it seems to be a general and widespread recommendation to always pass universa
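Passing universal_newlines=True (spelled text=True since Python 3.7) decodes the child's output with locale.getpreferredencoding(False), so the result still depends on the locale. Since Python 3.6, subprocess also accepts `encoding` and `errors` arguments. A minimal sketch that pins both ends to UTF-8, assuming the child is another Python interpreter whose stdout is fixed via PYTHONIOENCODING:

```python
import os
import subprocess
import sys

# Child process: another interpreter that prints a non-ASCII string.
# PYTHONIOENCODING pins the child's stdout encoding to UTF-8.
child_env = dict(os.environ, PYTHONIOENCODING="utf-8")

result = subprocess.run(
    [sys.executable, "-c", "print('caf\\u00e9')"],
    stdout=subprocess.PIPE,
    env=child_env,
    check=True,
    # Decode with an explicit codec instead of relying on
    # universal_newlines=True and the locale's preferred encoding.
    encoding="utf-8",
    errors="strict",
)
print(repr(result.stdout))
```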

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Oleg Broytman
On Fri, Jan 06, 2017 at 10:15:52AM +0900, INADA Naoki wrote: > >> Always use UTF-8 > >> > >> > >> Python already always uses the UTF-8 encoding on Mac OS X, Android and > >> Windows. > >> Since UTF-8 became the de facto encoding, it makes sense to always use it > >> on all > >> p

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Stephan Houben
Hi Victor, 2017-01-06 13:01 GMT+01:00 Victor Stinner : > > What do you mean by "eating mojibake"? OK, I erroneously understood that the failure mode was that mojibake was produced. > Users complain because their > application is stopped by a Python exception. Got it. > Currently, most Python 3

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Victor Stinner
2017-01-06 10:50 GMT+01:00 M.-A. Lemburg : > Victor: I think you are taking the UTF-8 idea a bit too far. Hum, sorry, the PEP is still a draft, the rationale is far from perfect yet. Let me try to simplify the issue: users are unable to configure a locale for various reasons and expect that Python

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Victor Stinner
2017-01-06 7:22 GMT+01:00 Stephan Houben : > How common is this problem? Last 2 or 3 years, I don't recall having been bitten by such issue. On the bug tracker, new issues are opened infrequently. * http://bugs.python.org/issue19977 opened at 2013-12-13, closed at 2014-04-27 * http://bugs.python.o

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Victor Stinner
2017-01-06 8:21 GMT+01:00 INADA Naoki : > I want UTF-8 mode is enabled by default (opt-out option) even if > locale is not POSIX, > like `PYTHONLEGACYWINDOWSFSENCODING`. You do, I don't :-) It shouldn't be hard to find very concrete issues from the mojibake issues described at: https://www.python

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread M.-A. Lemburg
On 06.01.2017 04:32, Nick Coghlan wrote: > On 6 January 2017 at 12:37, Victor Stinner wrote: >> 2017-01-06 3:10 GMT+01:00 Stephen J. Turnbull >> : >>> I've quoted Victor out of context, and his other posts make me very >>> doubtful that he considers this a serious alternative. That said, I'm >>>

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread Stephan Houben
Hi all, One meta-question I have which may already have been discussed much earlier in this whole proposal series, is: How common is this problem? Because I have the impression that nowadays all Linux distributions are UTF-8 by default and you have to show some bloody-mindedness to end up with a

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-06 Thread INADA Naoki
LGTM. Some comments: I want UTF-8 mode is enabled by default (opt-out option) even if locale is not POSIX, like `PYTHONLEGACYWINDOWSFSENCODING`. Users depends on locale know what locale is and how to configure it. They can understand difference between locale mode and UTF-8 mode and they can opt

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Steven D'Aprano
On Fri, Jan 06, 2017 at 02:54:49AM +0100, Victor Stinner wrote: > Let's say that you have the filename b'nonascii\xff': it's decoded as > 'nonascii\udcff' by the UTF-8 mode. How do GUIs handle such filename? > (I don't know the answer, it's a real question ;-)) I ran this in Python 2.7 to create
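Victor's question is what a GUI should do with such a name. A lone surrogate cannot be encoded as strict UTF-8, so a toolkit that encodes strictly will fail on it. One common approach, assumed here rather than taken from any particular toolkit, is to keep the surrogateescape str for file operations and derive a separate string for display; `display_name` is a hypothetical helper:

```python
raw = b"nonascii\xff"  # a file name that is not valid UTF-8

# What UTF-8 decoding with surrogateescape produces for it.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'nonascii\udcff'

try:
    name.encode("utf-8")  # strict: lone surrogates are not encodable
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)


def display_name(fs_name: str) -> str:
    """Hypothetical helper: a printable version of a file name for a UI.

    Recovers the original bytes, then decodes them with 'replace' so each
    undecodable byte shows as U+FFFD. Keep using fs_name, not this result,
    when opening or renaming the file.
    """
    original = fs_name.encode("utf-8", "surrogateescape")
    return original.decode("utf-8", "replace")


print(ascii(display_name(name)))  # 'nonascii\ufffd'
```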

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Nick Coghlan
On 6 January 2017 at 12:37, Victor Stinner wrote: > 2017-01-06 3:10 GMT+01:00 Stephen J. Turnbull > : >> I've quoted Victor out of context, and his other posts make me very >> doubtful that he considers this a serious alternative. That said, I'm >> +1 on "don't!" > > The "always ignore locale and

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
2017-01-06 3:10 GMT+01:00 Stephen J. Turnbull : > The point of this, I suppose, is that piping to xargs works by > default. Please read the second (latest) version of my PEP 540 which contains a new "Use Cases" section which helps to define issues and the behaviour of the different modes.

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
2017-01-06 3:10 GMT+01:00 Stephen J. Turnbull : > I've quoted Victor out of context, and his other posts make me very > doubtful that he considers this a serious alternative. That said, I'm > +1 on "don't!" The "always ignore locale and force UTF-8" option has supporters. For example, Nick Coghla

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Stephen J. Turnbull
Victor Stinner writes: > Python 3.6 is not exactly in the first or the latter category: "it > depends". > > To read data from the operating system, Python 3.6 behaves in "UNIX > mode": os.listdir() *does* return invalid filenames, it uses a funny > encoding using surrogates. > > To write
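The "funny encoding using surrogates" is the surrogateescape scheme of PEP 383: os.listdir() returns undecodable names as str containing lone surrogates, and os.fsencode() turns them back into the exact original bytes. A minimal sketch of that round trip, assuming a POSIX system (the Windows filesystem error handler behaves differently):

```python
import os

raw = b"report-\xff.txt"  # a POSIX file name that is not valid UTF-8

# os.fsdecode() applies the filesystem encoding and its error handler
# (surrogateescape on POSIX), the same conversion os.listdir() performs
# when given a str path.
name = os.fsdecode(raw)
print(ascii(name))  # with a UTF-8 filesystem encoding: 'report-\udcff.txt'

# os.fsencode() reverses the conversion and restores the original bytes.
print(os.fsencode(name) == raw)  # True
```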

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Stephen J. Turnbull
Oleg Broytman writes: > On Thu, Jan 05, 2017 at 04:38:22PM +0100, Victor Stinner > wrote: > > Since UTF-8 became the de facto encoding, it makes sense to always > > use it on all platforms with any locale. > >Please don't! I've quoted Victor out of context, and his other posts make m

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
2017-01-06 2:15 GMT+01:00 INADA Naoki : >>> Always use UTF-8 (...) >>Please don't! (...) > > For stdio (including console), PYTHONIOENCODING can be used for > supporting legacy system. > e.g. `export PYTHONIOENCODING=$(locale charmap)` The problem with ignoring the locale by default and forcin
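INADA's `export PYTHONIOENCODING=$(locale charmap)` works because PYTHONIOENCODING overrides the stdio encoding regardless of the locale. A minimal sketch that checks this in a child interpreter; `child_stdout_encoding` is a hypothetical helper and latin-1 is an arbitrary choice:

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding: str) -> str:
    """Start a child interpreter with PYTHONIOENCODING set and report
    the encoding its sys.stdout ended up with."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    ).stdout
    return out.decode("ascii").strip()


if __name__ == "__main__":
    reported = child_stdout_encoding("latin-1")
    # Compare canonical codec names, since the spelling may be normalized.
    same = codecs.lookup(reported).name == codecs.lookup("latin-1").name
    print(reported, same)
```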

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
Ok, I modified my PEP: the POSIX locale now enables the UTF-8 mode. 2017-01-05 18:10 GMT+01:00 Victor Stinner : > A common request is that "Python just works" without having to pass a > command line option or set an environment variable. Maybe the default > behaviour should be left unchanged, but
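This rule is what eventually shipped when PEP 540 landed in Python 3.7: under the C or POSIX locale the UTF-8 Mode turns on unless PYTHONUTF8=0 disables it. A minimal sketch, assuming Python 3.7+ on a POSIX system, that starts child interpreters with LC_ALL=C and reports sys.flags.utf8_mode; `utf8_mode_under` is a hypothetical helper:

```python
import os
import subprocess
import sys


def utf8_mode_under(overrides: dict) -> str:
    """Report sys.flags.utf8_mode of a child interpreter started with a
    cleaned locale environment plus `overrides`."""
    env = {
        key: value
        for key, value in os.environ.items()
        # Drop inherited locale settings and any explicit UTF-8 Mode switch.
        if key not in ("LANG", "LANGUAGE", "PYTHONUTF8")
        and not key.startswith("LC_")
    }
    env.update(overrides)
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    ).stdout
    return out.decode("ascii").strip()


if __name__ == "__main__":
    print("LC_ALL=C:", utf8_mode_under({"LC_ALL": "C"}))
    print("LC_ALL=C, PYTHONUTF8=0:",
          utf8_mode_under({"LC_ALL": "C", "PYTHONUTF8": "0"}))
```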

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
2017-01-06 0:35 GMT+01:00 Steven D'Aprano : >> Python 3 promotes Unicode everywhere including filenames. A solution to >> support filenames not decodable from the locale encoding was found: the >> ``surrogateescape`` error handler (`PEP 393 >> `_), store u

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread INADA Naoki
>> Always use UTF-8 >> >> >> Python already always uses the UTF-8 encoding on Mac OS X, Android and >> Windows. >> Since UTF-8 became the de facto encoding, it makes sense to always use it on >> all >> platforms with any locale. > >Please don't! I use different locales and enco

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Steven D'Aprano
On Thu, Jan 05, 2017 at 04:38:22PM +0100, Victor Stinner wrote: [...] > Python 3 promotes Unicode everywhere including filenames. A solution to > support filenames not decodable from the locale encoding was found: the > ``surrogateescape`` error handler (`PEP 393 >

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Oleg Broytman
Hi! On Thu, Jan 05, 2017 at 04:38:22PM +0100, Victor Stinner wrote: > Always use UTF-8 > > > Python already always uses the UTF-8 encoding on Mac OS X, Android and Windows. > Since UTF-8 became the de facto encoding, it makes sense to always use it on > all > platforms with any

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
2017-01-05 17:50 GMT+01:00 Victor Stinner : > In its current shape, my PEP 540 leaves Python default unchanged, but > adds two modes: UTF-8 and UTF-8 strict. The UTF-8 mode is more or less > the UNIX mode generalized for all inputs and outputs: mojibake is a > feature, just pass bytes unchanged. Th

Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
> https://www.python.org/dev/peps/pep-0540/ I read the PEP 538, PEP 540, and issues related to switching to UTF-8. At least, I can say one thing: people have different points of view :-) To understand why people disagree, I tried to categorize the different point of views and Python expectations:

[Python-ideas] PEP 540: Add a new UTF-8 mode

2017-01-05 Thread Victor Stinner
Hi, Nick Coghlan asked me to review his PEP 538 "Coercing the legacy C locale to C.UTF-8": https://www.python.org/dev/peps/pep-0538/ Nick wants to change the default behaviour. I'm not sure that I'm brave enough to follow this direction, so I proposed my old "-X utf8" command line idea as a new P
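The "-X utf8" idea became the UTF-8 Mode of Python 3.7, which can also be enabled with PYTHONUTF8=1 and is visible as sys.flags.utf8_mode. A minimal sketch, assuming Python 3.7+, that starts a child interpreter with the option and prints the flag together with two encodings it affects:

```python
import subprocess
import sys

# Program run by the child: report the UTF-8 Mode flag and two of the
# encodings it affects.
CHECK = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode, sys.getfilesystemencoding(), "
    "locale.getpreferredencoding(False))"
)

output = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", CHECK],
    stdout=subprocess.PIPE,
    check=True,
).stdout.decode("ascii")

flag, fs_encoding, preferred = output.split()
print(flag, fs_encoding, preferred)
```

The command-line option takes priority over a PYTHONUTF8 value inherited from the environment, which is why the sketch does not need to clean the child's environment.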