[issue23993] Use surrogateescape error handler by default in open() if the LC_CTYPE locale is C at startup

2015-08-31 Thread Nick Coghlan

Nick Coghlan added the comment:

I found this discussion again while looking for issue #19977 to reference from 
issue #24968.

"fixed" wasn't the right resolution, so I've moved it to "postponed" - the SSH 
locale forwarding problem highlighted again in #24968 means I think there's a 
discussion worth having about reading /etc/locale.conf when it's available, 
rather than always trusting the glibc locale settings.

--
resolution: fixed -> postponed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23993] Use surrogateescape error handler by default in open() if the LC_CTYPE locale is C at startup

2015-05-25 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@gmail.com:


--
resolution:  -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23993
___



[issue23993] Use surrogateescape error handler by default in open() if the LC_CTYPE locale is C at startup

2015-05-25 Thread STINNER Victor

STINNER Victor added the comment:

Without strong support, I don't want to put this in Python 3.5. It's too late 
(we have reached the feature freeze).

For Python 3.6, we may experiment with using UTF-8 as the Python filesystem 
encoding when the LC_CTYPE locale is POSIX (C).
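
To see what a given interpreter currently picks, a minimal sketch along these 
lines can help. The helper name describe_text_defaults is hypothetical; it only 
reports the values of standard library calls and does not change any setting. 
The results depend on the Python version and on the environment the process 
was started in.

```python
import locale
import sys


def describe_text_defaults():
    """Collect the settings that influence how Python decodes OS data.

    Hypothetical helper for illustration; it reads values and changes nothing.
    """
    return {
        # LC_CTYPE as currently set for this process (query only).
        "lc_ctype": locale.setlocale(locale.LC_CTYPE, None),
        # Encoding used by open() when no encoding argument is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used for file names and other OS-level strings.
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_text_defaults().items():
        print(f"{name}: {value}")
```

Running the script once normally and once with LC_ALL=C in the environment 
shows whether the two cases differ on a particular system.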

--




[issue23993] Use surrogateescape error handler by default in open() if the LC_CTYPE locale is C at startup

2015-04-25 Thread Nick Coghlan

Nick Coghlan added the comment:

If a Linux distro is using systemd (which is essentially all recent versions of 
popular distros, including RHEL/CentOS, although it won't land in Ubuntu LTS 
until 16.04), then cron jobs and service daemons will get their locale set 
properly based on the contents of /etc/locale.conf. Thus "use an init system 
that reliably sets the locale correctly for cron jobs and service daemons" is 
the correct fix for this problem.

Unfortunately, there are still an awful lot of Linux systems out there using 
other init systems that don't reliably set the locale, and for those, "Python 3 
shouldn't be worse than Python 2" is a desirable behavioural goal here.

Thus, I think it makes sense for Python to special case the C locale by 
assuming it's always the wrong setting, and thus surrogateescape is going to be 
needed on all system interfaces. While it won't be a perfect fix, at least 
we'll be able to roundtrip data within the system appropriately, even if it 
still gets corrupted in the face of encoding conversions.
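
The roundtrip behaviour described above can be sketched as follows. This is an 
illustration only, assuming the C locale maps to the ASCII codec; the byte 
string is made-up sample data, not something taken from this issue.

```python
# Sample data (assumption): a Latin-1 encoded file name, not valid ASCII.
raw = b"caf\xe9.txt"

# With surrogateescape, each undecodable byte 0xNN becomes the lone
# surrogate U+DCNN instead of raising UnicodeDecodeError.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same codec and handler restores the original bytes.
roundtripped = text.encode("ascii", errors="surrogateescape")

# A conversion to another encoding without the handler still fails,
# because lone surrogates are not encodable in strict mode.
try:
    text.encode("utf-8")
    conversion_failed = False
except UnicodeEncodeError:
    conversion_failed = True
```

The data survives a trip through str and back unchanged, but the str is not 
real text: anything that tries to re-encode it strictly will still raise.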

--




[issue23993] Use surrogateescape error handler by default in open() if the LC_CTYPE locale is C at startup

2015-04-19 Thread STINNER Victor

STINNER Victor added the comment:

Related issues and discussions:
- [Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?
  https://mail.python.org/pipermail/python-dev/2011-June/112086.html
- Issue #12451: open: avoid the locale encoding when possible
  https://bugs.python.org/issue12451

--




[issue23993] Use surrogateescape error handler by default in open() if the LC_CTYPE locale is C at startup

2015-04-19 Thread R. David Murray

R. David Murray added the comment:

Well, previously our answer has been "you have to understand unicode".  If we 
are going to change that, it probably needs a python-dev discussion.  But like 
I said, providing the *tools* to make it possible to easily do this, just not 
as a default, seems like mostly a no-brainer.  It's making it the default that 
is controversial, IMO.  (Call me -0.5 at this point in the discussion, as 
regards making it the default).

--




[issue23993] Use surrogateescape error handler by default in open() if the LC_CTYPE locale is C at startup

2015-04-18 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@gmail.com:


--
title: Use surrogateescape error handler by default in open() if the locale is 
C -> Use surrogateescape error handler by default in open() if the LC_CTYPE 
locale is C at startup




[issue23993] Use surrogateescape error handler by default in open() if the LC_CTYPE locale is C at startup

2015-04-18 Thread STINNER Victor

STINNER Victor added the comment:

For a more concrete use case, see the makefile problem in Mercurial wiki page:
http://mercurial.selenic.com/wiki/EncodingStrategy#The_.22makefile_problem.22

--




[issue23993] Use surrogateescape error handler by default in open() if the LC_CTYPE locale is C at startup

2015-04-18 Thread R. David Murray

R. David Murray added the comment:

Hmm.  Upon reflection I guess I can see the validity of "if you are using the C 
locale you or the OS are broken anyway, so we'll just pass the bytes through".  
I'm not entirely convinced this won't cause issues, but I suppose it might not 
cause any more issues than having things break due to the C locale does.

It is, however, going to return us to the days when a program that works fine 
most of the time suddenly blows up in the face of non-ascii data, and that's my 
biggest concern.

I'd certainly be fine with it if it wasn't the default (that is, programs that 
need this have to opt in to it).
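
For reference, opting in is already possible today by passing the error 
handler to open() explicitly. The sketch below is an illustration under 
assumptions: the helper name read_text_lossless, the file names, and the sample 
bytes are all made up, and "ascii" stands in for the encoding chosen under the 
C locale.

```python
import os
import tempfile


def read_text_lossless(path, encoding="ascii"):
    """Read text, keeping undecodable bytes as lone surrogates (opt-in).

    Hypothetical helper; it only wraps the standard open() arguments.
    """
    with open(path, encoding=encoding, errors="surrogateescape") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"name: caf\xc3\xa9\n")  # UTF-8 bytes, not valid ASCII

    # The strict default error handler fails on the first non-ASCII byte.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # The opt-in version reads the file and can write it back unchanged.
    text = read_text_lossless(path)
    copy_path = os.path.join(tmp, "copy.txt")
    with open(copy_path, "w", encoding="ascii",
              errors="surrogateescape", newline="") as f:
        f.write(text)
    with open(copy_path, "rb") as f:
        copied = f.read()
```

The proposal in this issue would make the second behaviour the default under 
the C locale; the sketch only shows what a program can request for itself.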

--




[issue23993] Use surrogateescape error handler by default in open() if the LC_CTYPE locale is C at startup

2015-04-18 Thread STINNER Victor

STINNER Victor added the comment:

> if you are using the C locale you or the OS are broken anyway, so we'll just 
> pass the bytes through

Exactly. Even if you use Unicode, the Python 3 str type, you store text as raw 
bytes (in a custom format, as surrogate characters).

> I'm not entirely convinced this won't cause issues, but I suppose it might 
> not cause any more issues than having things break due to the C locale does.

The most obvious issue is the comeback of mojibake. Since you manipulate raw 
bytes, it's easy to concatenate two byte strings encoded in two different 
encodings.
https://unicodebook.readthedocs.org/definitions.html#mojibake
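
A small illustration of that failure mode, using made-up sample data: the same 
character encoded with two different codecs and then joined as bytes.

```python
# Sample data (assumption): "é" encoded in two different encodings.
latin1_part = "é".encode("latin-1")  # one byte: 0xE9
utf8_part = "é".encode("utf-8")      # two bytes: 0xC3 0xA9

# Concatenating bytes never fails, whatever the encodings are.
mixed = latin1_part + utf8_part

# Decoding as Latin-1 "succeeds" but garbles the UTF-8 half (mojibake).
as_latin1 = mixed.decode("latin-1")

# Decoding as strict UTF-8 fails on the Latin-1 half instead.
try:
    mixed.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# With surrogateescape nothing fails, and the bad byte is carried along
# silently as a lone surrogate.
as_escaped = mixed.decode("utf-8", errors="surrogateescape")
```

No single encoding decodes the mixed string correctly, and with 
surrogateescape the problem is deferred rather than reported.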

The real question is not how bad it is to manipulate text as bytes. The problem 
is that a working application written for Python 2 starts to fail randomly (on 
non-ASCII characters) on Python 3 when the LC_CTYPE locale is the POSIX locale 
(C). The first question a developer asks is then: should I keep Python 2, or 
write my application in a language which doesn't force me to understand Unicode?

--
